mirror of
https://github.com/velocitatem/PHANTOM.git
synced 2026-05-31 08:33:36 +00:00
adding docs
298
SETUP.md
Normal file
@@ -0,0 +1,298 @@
|
||||
# PHANTOM: setup for operators and partners
|
||||
|
||||
This guide walks a team from **business context** (what you sell, how you price, what traffic you worry about) through a **running PHANTOM stack**, **behavioral kernels and contamination**, and **RL training / benchmarking**. The math lives in the thesis PDF; here we tie operations to that math without re-deriving it. References to the thesis use **chapter numbers** only (build the PDF locally if you need line-level citations).
|
||||
|
||||
**Thesis (PDF):** [thesis-latest.pdf](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf)
|
||||
|
||||
---
|
||||
|
||||
## 1. Who this is for / prerequisites
|
||||
|
||||
**Audience:** Engineers and researchers who run Docker, a Next.js app, and Python tooling; product or risk stakeholders who define experiment goals and acceptable UX tradeoffs.
|
||||
|
||||
**Skills:** Docker Compose, Node/npm, Python 3.8+, basic Kafka/Redis mental model.
|
||||
|
||||
**Decide up front:**
|
||||
|
||||
- **Vertical vs demo:** The repo ships `hotel` and `airline` storefront modes (`STORE_MODE`). Anything beyond that is custom integration work.
- **Data residency:** Event streams and training artifacts default to paths under the repo (overridable via `PHANTOM_*` env vars in `lib/config.py`). Decide where logs and models may live before you point production-like traffic at the stack.
- **Experiment governance:** Who may run human vs agent sessions, how sessions are labeled or weak-labeled for research, and retention policy for interaction logs.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
The formal model assumes each session is generated by a latent **actor class** $Y \in \{H, A\}$ (human vs agent). Your deployment choices implicitly assert **which sessions are valid for estimating human vs agent behavior** and whether experimental conditions are stable. If you mix exploratory QA traffic with labeled experiments without recording that fact, you blur the empirical partitions $D_H$ and $D_A$ that the methodology needs for transition kernels and contamination studies. See the **Introduction** (research questions) and **Methodology**, Problem Formalization, in the thesis PDF.
|
||||
|
||||
---
|
||||
|
||||
## 2. Business fit framing
|
||||
|
||||
**What PHANTOM is for:** Studying how **automated browsing and transaction orchestration** interact with **session-based pricing**: behavior generates a demand proxy $\hat{q}$; pricing policies map interaction history to prices; **Cost of Information (COI)** is the premium the platform can sustain above a floor when information is scarce. Agent-mediated **reconnaissance in one session** and **purchase in another** undermines that asymmetry; the thesis proves a **COI erosion** mechanism under many independent price queries.
|
||||
|
||||
**What you must supply:**
|
||||
|
||||
- A **product catalog** path: defaults assume Supabase-backed product data (`NEXT_PUBLIC_SUPABASE_URL`, `NEXT_PUBLIC_SUPABASE_ANON_KEY`).
- A plan for **interaction and price events** reaching the ingestion path (backend → Kafka) or an adapter you maintain.
- Clear **experiment goals:** e.g. compare human vs agent KPIs under the same task, measure margin under varying contamination $\alpha$.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Aggregate demand in the thesis is a **mixture** over human and agent types with contamination $\alpha$ plus noise $\epsilon_t$; see the mixture demand discussion in **Chapter 3 (Methodology)**. COI is defined as $\mathbb{E}[P]-\underline{p}$; the **COI framework** and theorem in the same chapter explain why saturated agent querying collapses extractable premium. Your business scenario determines which **actions** enter $\hat{q}$ and how interpretable $\alpha$ is for your traffic.
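
To make the two quantities above concrete, here is a minimal Python sketch. It assumes the mixture is a convex combination of human and agent demand weighted by $\alpha$ plus additive noise, and it estimates COI as the sample mean of quoted prices minus the floor. The function names and the exact mixture form are illustrative assumptions, not taken from the thesis or the repo; check Chapter 3 for the authoritative definitions.

```python
from statistics import mean
from typing import Sequence


def mixture_demand(q_human: float, q_agent: float, alpha: float, noise: float = 0.0) -> float:
    """Aggregate demand as a mixture of human and agent demand plus noise.

    Assumed form (hypothetical): Q = (1 - alpha) * Q_H + alpha * Q_A + eps.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * q_human + alpha * q_agent + noise


def cost_of_information(quoted_prices: Sequence[float], price_floor: float) -> float:
    """Empirical COI: sample mean of quoted prices minus the price floor."""
    return mean(quoted_prices) - price_floor
```

For example, quotes of 12, 14, and 10 against a floor of 10 give an empirical COI of 2; if saturated agent querying drives quotes toward the floor, this estimate shrinks toward zero.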
|
||||
|
||||
---
|
||||
|
||||
## 3. Environment and secrets
|
||||
|
||||
**Bootstrap files (from repo root):**
|
||||
|
||||
```bash
npm install
cp .env.example .env
cp .env.sweep.example .env.sweep
```
|
||||
|
||||
**Core `.env` (platform + web + docker):** See `[.env.example](.env.example)`. You must also set the variables called out in `[README.md](README.md)` for a full stack: `NEXT_PUBLIC_SUPABASE_URL`, `NEXT_PUBLIC_SUPABASE_ANON_KEY`, `AIRFLOW_FERNET_KEY`, `AIRFLOW_SECRET_KEY` (and provider ports per your compose file).
|
||||
|
||||
**Training / sweeps (`.env.sweep`):** Used by `make train`, `make benchmark`, and sweep agents. It typically holds `WANDB_API_KEY`, optional `WANDB_ENTITY` / `WANDB_PROJECT`, `GITHUB_TOKEN` for bootstrap flows, and `SWEEP_ID` for W&B sweep workers. See `[.env.sweep.example](.env.sweep.example)`.
|
||||
|
||||
**Security:** Never commit real `.env` or `.env.sweep` files. Rotate keys if they leak.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Splitting **online platform credentials** (ingestion, catalog, Kafka) from **offline training credentials** (W&B, cloud TPUs, GitHub tokens for workers) mirrors the **hybrid Kappa–Lambda** data loop in the thesis: streaming observation vs batch / long-running training jobs. That split is named in the **Terminology** appendix of the thesis PDF.
|
||||
|
||||
---
|
||||
|
||||
## 4. Bring-up (commands)
|
||||
|
||||
Aligned with `[README.md](README.md)`:
|
||||
|
||||
```bash
npm install
cp .env.example .env
cp .env.sweep.example .env.sweep
# edit .env: Supabase, Airflow keys, etc.

make platform.up
make web.dev
```
|
||||
|
||||
**Sanity checks:**
|
||||
|
||||
|
||||
| Endpoint | Role |
| --- | --- |
| `http://localhost:3000` | Next.js storefront |
| `http://localhost:5000/health` | Backend ingest API |
| `http://localhost:5001/health` | Pricing provider |
| `http://localhost:8085` | Airflow UI (default compose port) |
| `http://localhost:8084` or configured `REDPANDA_CONSOLE_PORT` | Kafka console (see your `.env`) |
|
||||
|
||||
|
||||
**Optional tests:** `make test.backend` (with venv/tooling as in Makefile); `make test.e2e` requires backend, web, and Airflow up per README.
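
The sanity checks above can be scripted. The sketch below is not part of the repo; it assumes the default ports from the table and assumes that a healthy service answers with HTTP 200, which you should confirm against your own deployment. The `fetch` parameter is a hypothetical hook so the check can be exercised without a running stack.

```python
from typing import Callable, Dict, Mapping
from urllib.request import urlopen

# Default endpoints from the sanity-check table; adjust to your .env ports.
ENDPOINTS = {
    "storefront": "http://localhost:3000",
    "backend": "http://localhost:5000/health",
    "provider": "http://localhost:5001/health",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_endpoints(
    endpoints: Mapping[str, str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each service name to True if it answered 200, False otherwise."""
    results = {}
    for name, url in endpoints.items():
        try:
            results[name] = fetch(url) == 200
        except OSError:
            # URLError, HTTPError, timeouts, and refused connections
            # are all OSError subclasses.
            results[name] = False
    return results
```

Calling `check_endpoints(ENDPOINTS)` after `make platform.up` and `make web.dev` returns one boolean per service, which is easy to print or assert on in a smoke test.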
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
A correctly wired stack logs **trajectories** $\tau_s$ (sequences of events) and **price exposure** together. **Chapter 3** defines events $e_{s,k}=(a,i,t)$ and proxies $\hat{q}$ from weighted actions—without joint logging of behavior and quotes, you cannot recover the objects the theory reasons about (Problem Formalization).
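
As a rough illustration of these objects, the sketch below models an event as (action, item, time) tagged with a session id, groups events into time-ordered trajectories, and computes a per-item demand proxy as a weighted sum of actions. The `Event` fields, the action names, and the weights are hypothetical placeholders; the repo's actual event schema and the thesis's weighting may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping


@dataclass(frozen=True)
class Event:
    session_id: str   # session s
    action: str       # a
    item_id: str      # i
    timestamp: float  # t


# Hypothetical action weights; tune these to your own instrumentation.
ACTION_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 10.0}


def build_trajectories(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Group events by session and order each session by timestamp."""
    by_session = defaultdict(list)
    for ev in events:
        by_session[ev.session_id].append(ev)
    return {sid: sorted(evs, key=lambda e: e.timestamp) for sid, evs in by_session.items()}


def demand_proxy(events: Iterable[Event], weights: Mapping[str, float] = ACTION_WEIGHTS) -> Dict[str, float]:
    """Per-item demand proxy: weighted count of observed actions."""
    proxy = defaultdict(float)
    for ev in events:
        # Unknown action types contribute nothing rather than raising.
        proxy[ev.item_id] += weights.get(ev.action, 0.0)
    return dict(proxy)
```

Price quotes would need to be logged with the same session and item identifiers so that each trajectory can be joined to the prices the session actually saw.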
|
||||
|
||||
---
|
||||
|
||||
## 5. Service map
|
||||
|
||||
```mermaid
flowchart LR
U[Human / Agent Browser] --> W[Next.js Web App]
W -->|Price requests| P[Pricing Provider]
W -->|Interaction events| B[Backend Ingest API]
B --> K[Kafka]
K --> A[Airflow + Worker Jobs]
A --> R[Redis Model Registry]
P -->|Session/global prices| W
E[Research Engine + Experiments] --> A
E --> R
```
|
||||
|
||||
|
||||
|
||||
**Ports (typical; confirm in `docker-compose` and `.env`):** `BACKEND_PORT` (5000), `PROVIDER_PORT` (5001), `KAFKA_PORT`, `REDIS_PORT`, Airflow `AIRFLOW_WEBSERVER_PORT` (8085 default), Redpanda console.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
The platform **observes** behavioral proxies and quoted prices, not the latent demand curve $d(p\mid\theta)$. The distinction between $\hat{q}$ and true demand is explicit in **Chapter 3**. Misattributing proxy noise to “true” elasticity breaks both estimation and any causal story about COI.
|
||||
|
||||
---
|
||||
|
||||
## 6. Tailoring to your business
|
||||
|
||||
**Storefront mode:** `STORE_MODE=hotel` or `airline` (see `[web/src/lib/config.ts](web/src/lib/config.ts)` and env). This switches catalog and UI, not the core ingestion pattern.
|
||||
|
||||
**API base / environment:** `NEXT_PUBLIC_API_BASE`, `NEXT_PUBLIC_APP_ENV` (validated in `config.ts`).
|
||||
|
||||
**Paths for data and runs:** Override with `PHANTOM_DATA_DIR`, `PHANTOM_SIM_RUNS_DIR`, `PHANTOM_MODEL_REGISTRY_DIR`, `PHANTOM_COLLECTED_DATA_DIR`, etc. (`[lib/config.py](lib/config.py)`).
|
||||
|
||||
**Honest scope:** A new vertical (custom product ontology, checkout rules, pricing rules) means **new UI, events, and possibly new reward features** in the engine. Budget engineering time; the repo is a research platform, not a turnkey SaaS skin for arbitrary catalogs without code changes.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Transition kernels $\hat{\mathcal{T}}_H,\hat{\mathcal{T}}_A$ are estimated on a **finite action / state space** derived from your instrumentation. Changing catalog depth or event taxonomy changes the MDP state space; old kernel estimates are not portable. See the transition kernel discussion in **Chapter 3**.
|
||||
|
||||
---
|
||||
|
||||
## 7. Data collection and experiments
|
||||
|
||||
**Flow:** Browser → backend → **Kafka** → downstream consumers (Airflow DAGs, notebooks, ETL under `experiments/`). Ensure **session identity**, **item identifiers**, and **action types** are consistent enough to build trajectories.
|
||||
|
||||
**Weak labels:** The thesis discusses partitioning data into human vs agent subsets for MLE transition counts. In production you may only have heuristic labels—document bias explicitly.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Distinguishability (sub-question SQ1 in the **Introduction**) asks whether $H$ vs $A$ is identifiable from behavior alone. Your labeling and experimental design determine whether $\Delta_H,\Delta_A$ and $f(\tau)$ are meaningful or dominated by noise. Symbols appear in the **Terminology** appendix ($\Delta_H,\Delta_A$, $f(\tau)$, contamination generator $\mathcal{G}(\alpha)$).
|
||||
|
||||
---
|
||||
|
||||
## 8. Transition kernels and agent scoring (theory → practice)
|
||||
|
||||
**Theory:** Sessions yield trajectories $\tau_s$. For each actor class $y \in \{H, A\}$, the thesis estimates a **Markov transition kernel** by counting transitions and normalizing (MLE):

$$
\hat{P}(s' \mid s) = \frac{N(s,s')}{\sum_k N(s,k)}
$$
|
||||
|
||||
Human and agent prototypes $\hat{\mathcal{T}}_H,\hat{\mathcal{T}}_A$ support comparing an empirical kernel from a partial trajectory to prototypes (e.g. KL-style divergences $\Delta_H,\Delta_A$) and mapping to a **weak agent probability** $f(\tau)$. See **Chapter 3** and the **Terminology** appendix.
|
||||
|
||||
**Code:** `[engine/lib/coi.py](engine/lib/coi.py)` (`compute_agent_probability`: empirical transition counts vs human/agent reference dicts, KL-style terms, mapped via `[lib/agent_probability.py](lib/agent_probability.py)`).
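
The count-and-normalize estimator and the divergence comparison can be sketched in a few lines of standalone Python. This is not the repo's `compute_agent_probability`; it is an illustration under stated assumptions: additive smoothing is added so KL terms stay finite, divergences are summed only over states the partial trajectory actually left, and the mapping to $f(\tau)$ is a logistic function of $\Delta_H - \Delta_A$. All function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence

Kernel = Dict[str, Dict[str, float]]


def estimate_kernel(trajectories: Sequence[Sequence[str]], states: Sequence[str],
                    smoothing: float = 1e-3) -> Kernel:
    """Smoothed MLE: P(s'|s) = (N(s,s') + c) / (sum_k N(s,k) + c * |S|).

    Assumes every state appearing in `trajectories` is listed in `states`.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive so that KL terms stay finite")
    counts = defaultdict(Counter)
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            counts[s][s_next] += 1
    kernel = {}
    for s in states:
        total = sum(counts[s].values()) + smoothing * len(states)
        kernel[s] = {k: (counts[s][k] + smoothing) / total for k in states}
    return kernel


def divergence(trajectory: Sequence[str], reference: Kernel, states: Sequence[str],
               smoothing: float = 1e-3) -> float:
    """Sum of row-wise KL(empirical || reference) over states the trajectory left."""
    empirical = estimate_kernel([trajectory], states, smoothing)
    visited = set(trajectory[:-1])
    return sum(
        p * math.log(p / reference[s][k])
        for s in visited
        for k, p in empirical[s].items()
    )


def agent_probability(trajectory: Sequence[str], human_ref: Kernel, agent_ref: Kernel,
                      states: Sequence[str]) -> float:
    """Assumed mapping: logistic function of Delta_H - Delta_A."""
    x = divergence(trajectory, human_ref, states) - divergence(trajectory, agent_ref, states)
    # Numerically stable sigmoid.
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))
```

With reference kernels fit on labeled partitions, a trajectory that sits far from the human prototype and close to the agent prototype scores near 1, and the reverse scores near 0. With smoothing removed, `estimate_kernel` reduces to the MLE formula above but leaves rows with no observations undefined.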
|
||||
|
||||
**Optional narrative:** `[blog/02-behavioral-fingerprinting.md](blog/02-behavioral-fingerprinting.md)` walks a concrete study design (not required for operators).
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
If reference kernels are fit on **stale** or **mislabeled** partitions, $\Delta_H-\Delta_A$ is not interpretable as distinguishability. Ground claims in SQ1 (**Introduction**) and the kernel subsection of **Chapter 3**.
|
||||
|
||||
---
|
||||
|
||||
## 9. Contamination generator $\mathcal{G}(\alpha)$
|
||||
|
||||
**Theory:** Given clean trajectories, $\mathcal{G}(\alpha)$ injects synthetic agent trajectories until the effective mixture reaches contamination $\alpha\in[0,1]$, defining training scenarios for robust policies (**Chapter 3**). Catalog-scale block expansion of kernels is discussed there with validation caveats—treat large product spaces as **research-grade** until your team signs off.
|
||||
|
||||
**Code:** `[engine/engine.py](engine/engine.py)` — `MarketEngine` mixes human/agent demand, uses `get_adjusted_transitions` / `sample_behavior_from_transitions`, and `alpha` when combining actor types and building demand proxies (`estimate_demand`). This is the **simulator** path, not a drop-in replacement for your production database.
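
For intuition only, here is a toy version of the injection step, separate from `MarketEngine`. It assumes contamination is measured as the share of trajectories in the resulting dataset (the formal $\alpha$ is a demand contribution, so this equivalence is an assumption), and it samples synthetic agent trajectories by walking an agent transition kernel. `sample_trajectory`, `contaminate`, and their parameters are hypothetical names.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Kernel = Dict[str, Dict[str, float]]


def sample_trajectory(kernel: Kernel, start: str, length: int, rng: random.Random) -> List[str]:
    """Random walk of `length` states through a transition kernel."""
    traj = [start]
    for _ in range(length - 1):
        row = kernel[traj[-1]]
        traj.append(rng.choices(list(row), weights=list(row.values()))[0])
    return traj


def contaminate(clean: Sequence[Sequence[str]], agent_kernel: Kernel, alpha: float,
                start: str = "view", length: int = 4, seed: int = 0) -> List[Tuple[List[str], str]]:
    """Inject synthetic agent trajectories until their share reaches alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    n_human = len(clean)
    if alpha == 1.0:
        # Pure-agent scenario: drop the clean data, keep the dataset size.
        humans, n_agent = [], max(n_human, 1)
    else:
        humans = [list(t) for t in clean]
        # Smallest n_agent with n_agent / (n_human + n_agent) >= alpha;
        # the small epsilon guards against floating-point overshoot in ceil.
        n_agent = math.ceil(alpha * n_human / (1.0 - alpha) - 1e-9)
    agents = [sample_trajectory(agent_kernel, start, length, rng) for _ in range(n_agent)]
    labeled = [(t, "H") for t in humans] + [(t, "A") for t in agents]
    rng.shuffle(labeled)
    return labeled
```

Fixing `seed` keeps a contamination scenario reproducible, which matters when comparing policies trained at different $\alpha$ values.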
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
$\alpha$ in mixture $Q(p)$ is **agentic demand contribution** in the formal model, not necessarily “bot share of page views” unless your instrumentation equates them. Mismeasuring $\alpha$ biases robust objectives tied to a fixed contamination level.
|
||||
|
||||
---
|
||||
|
||||
## 10. Training and evaluation — local workflow
|
||||
|
||||
**Environment:** Python venv via Nx (`make install` / `nx run research:install`). Training commands load `.env.sweep`.
|
||||
|
||||
```bash
make train LOCAL_TRAIN_ARGS='--algo ppo --total-timesteps 50000'
make benchmark LOCAL_BENCHMARK_ARGS='--tiers static,surge,linear,qtable,ppo --alpha-values 0.0,0.3 --episodes 3 --no-wandb'
make benchmark.simple
```
|
||||
|
||||
Entrypoints: `[engine/train.py](engine/train.py)`, `[engine/benchmark.py](engine/benchmark.py)`, `[engine/spec.py](engine/spec.py)` (Nx wraps these—see `project.json` / research targets).
|
||||
|
||||
**Artifacts:** `[lib/config.py](lib/config.py)` — `PHANTOM_SIM_RUNS_DIR` (default `sim/rl/runs`), `PHANTOM_MODEL_REGISTRY_DIR`, etc.
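
The override pattern is the usual "environment variable with a default" lookup. The helper below is a hypothetical sketch of that pattern, not the contents of `lib/config.py`; only the `PHANTOM_SIM_RUNS_DIR` default of `sim/rl/runs` comes from this guide.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_dir(var: str, default: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by env var `var`, or `default` if unset or empty."""
    env = os.environ if env is None else env
    return Path(env.get(var) or default)


# Example: where RL run artifacts would land for the current process.
SIM_RUNS_DIR = resolve_dir("PHANTOM_SIM_RUNS_DIR", "sim/rl/runs")
```

Passing an explicit `env` mapping makes the lookup easy to test and lets a launcher preview where artifacts will be written before starting a long run.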
|
||||
|
||||
**TensorBoard (optional):** `[docker-compose.yml](docker-compose.yml)` includes `tensorboard-rl` on host port **6007** (`./sim/rl/runs`) and `tensorboard-ml` on **6006** (`./experiments/ml/runs`).
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Local runs instantiate the **offline defense gym**: policies trained on simulator-induced distributions approximate the DR-RL narrative in **Chapter 3**, but hyperparameters ($\lambda$ on COI leakage, $\eta$ on UX, robust radius) change the effective ambiguity set. Cross-check `engine/` against the thesis before claiming figure-for-figure replication.
|
||||
|
||||
---
|
||||
|
||||
## 11. Training and evaluation — remote / scaled deployment
|
||||
|
||||
For **research at scale** (cloud quota and secrets required):
|
||||
|
||||
|
||||
| Mechanism | Role |
| --- | --- |
| `[submit_ray_job.sh](submit_ray_job.sh)` | Ray jobs with `.env` injected; `RAY_MODE=single\|distributed\|benchmark\|sweep`. Set the script’s `ROOT` to your clone path. |
| `make tpu.ray.bootstrap` / `tpu.ray.*` | TPU Ray bootstrap (`TPU_CONF`, e.g. `tpu_orchestration/configs/v4_spot_us.conf`). |
| `make train.agent` / `make benchmark.agent` | W&B sweeps: `SWEEP_ID` in `.env.sweep`. |
| `make train.bootstrap` | Worker bootstrap: `REPO_URL`, `SWEEP_ID`, `GITHUB_TOKEN`. |
| `make docker.train.publish` | Trainer image (`TRAIN_IMAGE_REF` in Makefile). |
|
||||
|
||||
|
||||
See `submit_ray_job.sh` for env vars (`WANDB_*`, `PHANTOM_*` TPU toggles).
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Distributed training does not change the **definitions** of the Stackelberg game or Wasserstein ambiguity; it changes compute and variance of empirical estimates. Align random seeds and data protocol across nodes or split results explicitly—otherwise you mix distributions in a way a single empirical law $\hat{P}_N$ in the thesis does not describe.
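
One way to keep seeds aligned across nodes is to derive each worker's seed deterministically from a single base seed and the worker's rank, so that reruns reproduce the same per-worker streams. The helper below is a hypothetical sketch of that idea using only the standard library; it is not how the repo or Ray assigns seeds.

```python
import hashlib


def worker_seed(base_seed: int, worker_rank: int) -> int:
    """Derive a reproducible 32-bit seed from a base seed and a worker rank."""
    digest = hashlib.sha256(f"{base_seed}:{worker_rank}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


# Example: seeds for a four-worker job launched with base seed 42.
SEEDS = [worker_seed(42, rank) for rank in range(4)]
```

Recording the base seed and worker count alongside results is then enough to regenerate every worker's seed when auditing a run.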
|
||||
|
||||
---
|
||||
|
||||
## 12. Evaluation, artifacts, and audit trail
|
||||
|
||||
**Benchmarks:** The `make benchmark*` targets sweep tiers and $\alpha$; the CLI includes robustness knobs (see the default `BENCHMARK_ARGS` in `submit_ray_job.sh`: `--robust-radius`, `--lambda-coi`, `--eta-ux`, etc.).
|
||||
|
||||
**Audit trail:** Store `git` SHA, CLI argv, non-secret `.env.sweep` keys, and W&B run IDs with published tables. For scientific claims, cite **Chapters 4–5 (Results, Discussion)** in the thesis PDF.
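
A small manifest written next to each published table covers the items listed above. The sketch below is hypothetical and not part of the repo: it filters out any variable whose name contains a secret-looking marker (an assumed heuristic, so review the output before publishing) and leaves the W&B run ID as a caller-supplied value.

```python
import json
import subprocess
from typing import Dict, Mapping, Optional, Sequence

# Assumed heuristic: variable names containing these markers are treated as secrets.
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def current_git_sha() -> str:
    """Return the SHA of the checked-out commit (requires git on PATH)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


def build_manifest(git_sha: str, argv: Sequence[str], sweep_env: Mapping[str, str],
                   wandb_run_id: Optional[str] = None) -> Dict[str, object]:
    """Collect provenance for one run, dropping secret-looking env keys."""
    public_env = {
        k: v for k, v in sweep_env.items()
        if not any(marker in k.upper() for marker in SECRET_MARKERS)
    }
    return {
        "git_sha": git_sha,
        "argv": list(argv),
        "env": public_env,
        "wandb_run_id": wandb_run_id,
    }


def write_manifest(manifest: Mapping[str, object], path: str) -> None:
    """Write the manifest as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
```

A typical call would pass `current_git_sha()`, `sys.argv`, and the parsed `.env.sweep` contents, then store the resulting JSON with the benchmark outputs.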
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Evaluation quality depends on both **simulator fidelity** and **contamination modeling**. Separate theorem statements (assumption-based) from empirical curves (`engine`-dependent).
|
||||
|
||||
---
|
||||
|
||||
## 13. Operational suggestions
|
||||
|
||||
- **Staging:** Non-production namespaces; separate Kafka topics and Supabase projects where possible.
- **Rate limits / abuse:** Protect ingest endpoints; respect participant privacy.
- **Human vs agent sessions:** Comparable cohorts; record experimental condition in metadata.
- **Contracts:** `tests/e2e/` encodes minimal flows; use them when APIs change.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Non-stationary noise $\epsilon_t$ and drifting $\alpha$ confound benchmark interpretation. **Chapter 3** discusses mixture identification: isolate treatments when possible and document confounders when not.
|
||||
|
||||
---
|
||||
|
||||
## 14. Roadmap / gaps (honesty)
|
||||
|
||||
**Relatively turnkey:** Local dockerized stack, demo verticals, engine benchmarks, documented env and paths.
|
||||
|
||||
**Typically custom:** Production catalog without Supabase, identity/fraud layers, legal review of logging, Kafka/Airflow SLAs, hardening the pricing provider for real money.
|
||||
|
||||
**Thesis vs code:** The PDF is the **spec**; not every robustness term or large-catalog kernel construction is production-verified—see caveats in **Chapter 3**.
|
||||
|
||||
### Theoretical implications
|
||||
|
||||
Theorems in the thesis can be **stronger** than what observational firm logs support. The COI result assumes a clean experimental reading of the pricing policy; live market data may only support weaker claims.
|
||||
|
||||
---
|
||||
|
||||
## 15. Theory and thesis cross-references (quick index)
|
||||
|
||||
Use the **PDF table of contents** with these anchors:
|
||||
|
||||
|
||||
| Topic | Thesis location |
| --- | --- |
| Research questions (margin, distinguishability, contamination, mitigation) | **Introduction** |
| Sessions, events, $\hat{q}$, mixture $Q(p)$, $\alpha$ | **Chapter 3**: Problem Formalization, mixture demand |
| COI definition and erosion theorem | **Chapter 3**: COI framework |
| Transition kernels, MLE, $\mathcal{G}(\alpha)$ | **Chapter 3** |
| DR-RL, ambiguity sets, Stackelberg | **Chapter 3** |
| Symbol glossary (COI leakage, $f(\tau)$, UX, surrogates) | **Appendix: Terminology** |
| Empirical results and limitations | **Chapters 4–5** |
|
||||
|
||||
|
||||
---
|
||||
|
||||
## 16. Quick file index (code)
|
||||
|
||||
|
||||
| File | Role |
| --- | --- |
| `[engine/lib/coi.py](engine/lib/coi.py)` | KL-style trajectory comparison; agent probability. |
| `[engine/engine.py](engine/engine.py)` | `MarketEngine`, mixture, demand proxy path. |
| `[lib/agent_probability.py](lib/agent_probability.py)` | Divergence → probability score. |
| `[lib/config.py](lib/config.py)` | Paths and ports for artifacts. |
| `[engine/train.py](engine/train.py)`, `[engine/benchmark.py](engine/benchmark.py)` | CLI entrypoints. |
| `[tpu_orchestration/](tpu_orchestration/)` | TPU configs and helpers. |
|
||||
|
||||
|
||||
You do **not** need a running storefront for many **offline** benchmarks if the research Python environment is installed; you **do** need aligned instrumentation to connect production trajectories to kernel estimation.
|
||||