P4P Interaction Layer
diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
new file mode 100644
index 0000000..01f94e1
--- /dev/null
+++ b/docs/mkdocs.yml
@@ -0,0 +1,53 @@
+site_name: PHANTOM Platform
+site_description: Operator and research documentation for the PHANTOM dynamic pricing research platform.
+site_url: https://velocitatem.github.io/PHANTOM/documentation/
+site_author: Daniel Rösel
+
+repo_url: https://github.com/velocitatem/PHANTOM
+repo_name: velocitatem/PHANTOM
+
+docs_dir: src
+site_dir: documentation
+strict: true
+
+theme:
+  name: material
+  palette:
+    - scheme: default
+      primary: indigo
+      toggle:
+        icon: material/brightness-7
+        name: Switch to dark mode
+    - scheme: slate
+      primary: indigo
+      toggle:
+        icon: material/brightness-4
+        name: Switch to light mode
+  features:
+    - navigation.instant
+    - navigation.tracking
+    - content.code.copy
+    - search.suggest
+    - search.highlight
+
+nav:
+  - Home: index.md
+  - Setup: platform-setup.md
+  - Business overview: business.md
+  - Architecture: architecture.md
+  - Configuration: configuration.md
+  - Glossary: glossary.md
+  - Roadmap & implementation notes: roadmap.md
+
+markdown_extensions:
+  - pymdownx.snippets:
+      base_path:
+        - ..
+  - pymdownx.superfences
+  - admonition
+  - tables
+  - toc:
+      permalink: true
+
+plugins:
+  - search
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 0000000..d14bca3
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1 @@
+mkdocs-material>=9.5,<10
diff --git a/docs/src/architecture.md b/docs/src/architecture.md
new file mode 100644
index 0000000..9da03b3
--- /dev/null
+++ b/docs/src/architecture.md
@@ -0,0 +1,30 @@
+# Architecture
+
+## System map
+
+```mermaid
+flowchart LR
+ U[Human / Agent Browser] --> W[Next.js Web App]
+ W -->|Price requests| P[Pricing Provider]
+ W -->|Interaction events| B[Backend Ingest API]
+ B --> K[Kafka]
+ K --> A[Airflow + Worker Jobs]
+ A --> R[Redis Model Registry]
+ P -->|Session/global prices| W
+ E[Research Engine + Experiments] --> A
+ E --> R
+```
+
+
+
+## Event and training path (conceptual)
+
+1. **Online:** The browser emits events; the backend publishes to **Kafka**; schedulers and workers consume for ETL and model registry updates.
+2. **Offline:** Notebooks and scripts under `experiments/` transform logs; **`engine/`** runs simulations, training, and benchmarks; artifacts land under paths from [`lib/config.py`](https://github.com/velocitatem/PHANTOM/blob/main/lib/config.py).
+3. **Feedback:** Trained or rule-based policies surface through the **pricing provider** to the web app.
+
+## Where to read more
+
+- Ports and health checks: [README](https://github.com/velocitatem/PHANTOM/blob/main/README.md) and [Configuration](configuration.md).
+- Formal notation for sessions, \(\hat{q}\), and mixture demand: **Chapter 3 (Methodology)** in the thesis PDF.
+
diff --git a/docs/src/business.md b/docs/src/business.md
new file mode 100644
index 0000000..a6dc8bb
--- /dev/null
+++ b/docs/src/business.md
@@ -0,0 +1,21 @@
+# Business overview
+
+PHANTOM targets **platform operators and researchers** who need to:
+
+1. **Observe** session-level behavior and price quotes together (trajectories and policies—not just clicks).
+2. **Separate** human-driven demand signals from agent-mediated reconnaissance where possible (distinguishability and contamination \(\alpha\) in the thesis).
+3. **Evaluate** pricing policies that remain useful when **Cost of Information (COI)** is under pressure from automated querying (formal COI framework and theorem in the thesis PDF).
+
+## What this product is not
+
+- A drop-in fraud API that returns a “bot score” for every request without your event schema.
+- A certified compliance guarantee for regulated pricing: it is a **research stack** with configurable experiments.
+- A hosted SaaS: you run the stack (or adapt components) under your infrastructure policy.
+
+## Self-service story (ideal path)
+
+A team connects their **catalog** (today: Supabase-backed flows in this repo), streams **interaction events** through the ingest path, runs **labeled or weak-labeled** human vs agent sessions, estimates **behavioral kernels**, varies **contamination** in simulation, and **trains or benchmarks** robust policies via `engine/`. Steps and caveats are in [Setup](platform-setup.md) (same content as root `SETUP.md`).
+
+## Thesis link
+
+Problem statement, contributions, and research questions: **Introduction** and abstract in the [thesis PDF](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf).
\ No newline at end of file
diff --git a/docs/src/configuration.md b/docs/src/configuration.md
new file mode 100644
index 0000000..73d7438
--- /dev/null
+++ b/docs/src/configuration.md
@@ -0,0 +1,63 @@
+# Configuration reference
+
+This page condenses tables from [`README.md`](https://github.com/velocitatem/PHANTOM/blob/main/README.md) and points to code. Authoritative env templates: [`.env.example`](https://github.com/velocitatem/PHANTOM/blob/main/.env.example), [`.env.sweep.example`](https://github.com/velocitatem/PHANTOM/blob/main/.env.sweep.example).
+
+## Core runtime (`.env`)
+
+
+| Variable | Purpose | Typical value |
+| ------------------------------- | ------------------------------ | ----------------------- |
+| `STORE_MODE` | Web mode (`hotel` / `airline`) | `hotel` |
+| `BACKEND_PORT` | Backend API | `5000` |
+| `PROVIDER_PORT` | Pricing provider | `5001` |
+| `KAFKA_HOST` | Kafka broker host | `localhost` |
+| `KAFKA_PORT` | Kafka port | `9092` |
+| `REDIS_PORT` | Redis port | `6377` |
+| `REDPANDA_CONSOLE_PORT` | Kafka UI | `8084` (see compose) |
+| `NEXT_PUBLIC_SUPABASE_URL` | Catalog / data | required for full stack |
+| `NEXT_PUBLIC_SUPABASE_ANON_KEY` | Catalog / data | required |
+| `AIRFLOW_FERNET_KEY` | Airflow | required |
+| `AIRFLOW_SECRET_KEY` | Airflow web | required |
+
+
+Web client validation: [`web/src/lib/config.ts`](https://github.com/velocitatem/PHANTOM/blob/main/web/src/lib/config.ts).
+
+## Training / sweeps (`.env.sweep`)
+
+
+| Variable | Purpose |
+| --------------- | ----------------------------------------------- |
+| `WANDB_API_KEY` | Weights & Biases |
+| `WANDB_ENTITY` | Optional override |
+| `WANDB_PROJECT` | Project name (default `capstone`) |
+| `GITHUB_TOKEN` | Bootstrap / workers |
+| `SWEEP_ID` | Sweep agents (`train.agent`, `benchmark.agent`) |
+
+
+## Path overrides (`PHANTOM_*`)
+
+Defined in [`lib/config.py`](https://github.com/velocitatem/PHANTOM/blob/main/lib/config.py):
+
+
+| Variable | Default (conceptual) |
+| ---------------------------- | ----------------------------------- |
+| `PHANTOM_DATA_DIR` | `data/` |
+| `PHANTOM_EXPERIMENTS_DIR` | `experiments/` |
+| `PHANTOM_SIM_RUNS_DIR` | `sim/rl/runs` |
+| `PHANTOM_MODEL_REGISTRY_DIR` | `data/models` |
+| `PHANTOM_COLLECTED_DATA_DIR` | `experiments/agents/collected_data` |
+
+
+## Makefile entrypoints
+
+
+| Goal | Command |
+| ---------------- | ------------------------------------------- |
+| Platform up/down | `make platform.up` / `make platform.down` |
+| Web dev | `make web.dev` |
+| Train | `make train` (+ `LOCAL_TRAIN_ARGS`) |
+| Benchmark | `make benchmark` (+ `LOCAL_BENCHMARK_ARGS`) |
+| Docs site | `make docs.platform` |
+
+
+See `make help` for the full list.
\ No newline at end of file
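
The `PHANTOM_*` table in `docs/src/configuration.md` describes environment overrides with fallback defaults. A minimal sketch of that lookup pattern is shown here for readers of the page; it is not the contents of `lib/config.py`. The function name `resolve_paths` and the `DEFAULTS` mapping are hypothetical, and the default values are copied from the table, which itself labels them "conceptual".

```python
import os
from pathlib import Path

# Defaults copied from the "Path overrides" table; the authoritative values
# live in lib/config.py and may differ.
DEFAULTS = {
    "PHANTOM_DATA_DIR": "data/",
    "PHANTOM_EXPERIMENTS_DIR": "experiments/",
    "PHANTOM_SIM_RUNS_DIR": "sim/rl/runs",
    "PHANTOM_MODEL_REGISTRY_DIR": "data/models",
    "PHANTOM_COLLECTED_DATA_DIR": "experiments/agents/collected_data",
}


def resolve_paths(env=None):
    """Return {variable: Path}, preferring non-empty values found in `env`.

    `env` defaults to os.environ; passing a plain dict makes the lookup
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return {name: Path(env.get(name) or default) for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, path in resolve_paths().items():
        print(f"{name}={path}")
```

Treating an empty string as "unset" is a design choice of this sketch, so that `PHANTOM_DATA_DIR=` in a `.env` file falls back to the default instead of resolving to the current directory.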
diff --git a/docs/src/glossary.md b/docs/src/glossary.md
new file mode 100644
index 0000000..5774101
--- /dev/null
+++ b/docs/src/glossary.md
@@ -0,0 +1,17 @@
+# Glossary
+
+The definitions below are short and operational; see the thesis **Terminology** appendix in the [PDF](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf) for full precision.
+
+| Term | Meaning (operational) |
+| --- | --- |
+| **COI (Cost of Information)** | Expected price premium above a floor under the platform’s policy; thesis KPI for pricing power. |
+| **Trajectory \(\tau_s\)** | Ordered session events used as the behavioral record. |
+| **Demand proxy \(\hat{q}\)** | Weighted aggregation of actions—what the platform observes instead of true demand. |
+| **Contamination \(\alpha\)** | Agent share in the mixture demand model (thesis); not automatically “% of bots” in raw logs. |
+| **Transition kernel \(\hat{\mathcal{T}}\)** | MLE Markov model over behavioral states / events for class \(H\) or \(A\). |
+| **\(\Delta_H,\Delta_A\)** | Divergence scores vs human/agent prototypes (thesis notation). |
+| **\(f(\tau)\)** | Weak agent probability from trajectory (implementation: `engine/lib/coi.py`). |
+| **\(\mathcal{G}(\alpha)\)** | Contamination generator: synthetic agent trajectories to reach mixture level \(\alpha\). |
+| **DR-RL** | Distributionally robust reinforcement learning; the training approach described in the thesis. |
+| **Ambiguity set / Wasserstein** | Robust optimization neighborhood around an empirical demand law. |
+| **Kappa–Lambda architecture** | Thesis term for streaming (online) vs batch/offline learning loops. |
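
The glossary entries for the transition kernel and the divergence scores can be made concrete with a small example. The sketch below estimates a first-order Markov kernel by maximum likelihood (normalized transition counts) and scores a session by its KL divergence from a human and an agent prototype. It is an illustration under stated assumptions, not the code in `engine/lib/coi.py`: the names `fit_kernel` and `divergence`, the additive smoothing constant, the visit-weighted averaging over rows, and the toy `view`/`cart`/`buy` states are all hypothetical.

```python
import math
from collections import Counter


def fit_kernel(trajectories, states, smoothing=1e-3):
    """MLE transition kernel as nested dicts; every row sums to 1.

    Additive smoothing (an assumption of this sketch) keeps all
    probabilities positive so the KL terms below stay finite.
    """
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj, traj[1:]))  # consecutive (from, to) pairs
    kernel = {}
    for s in states:
        total = sum(counts[(s, t)] for t in states) + smoothing * len(states)
        kernel[s] = {t: (counts[(s, t)] + smoothing) / total for t in states}
    return kernel


def divergence(trajectory, prototype, states, smoothing=1e-3):
    """KL divergence from the session's own kernel to a prototype kernel.

    Rows are weighted by how often the session leaves each state, so
    states the session never visits do not contribute.
    """
    session = fit_kernel([trajectory], states, smoothing)
    visits = Counter(trajectory[:-1])
    total = sum(visits[s] for s in states)
    if total == 0:
        return 0.0
    return sum(
        (visits[s] / total) * session[s][t] * math.log(session[s][t] / prototype[s][t])
        for s in states
        for t in states
    )


# Toy data: humans progress toward purchase, agents only browse.
STATES = ["view", "cart", "buy"]
human_kernel = fit_kernel([["view", "cart", "buy"], ["view", "view", "cart", "buy"]], STATES)
agent_kernel = fit_kernel([["view", "view", "view", "view"], ["view", "view", "view"]], STATES)

session = ["view", "view", "view", "view"]
delta_h = divergence(session, human_kernel, STATES)
delta_a = divergence(session, agent_kernel, STATES)
print(f"delta_H={delta_h:.4f} delta_A={delta_a:.4f}")
```

In this toy setup the browsing-only session sits closer to the agent prototype than to the human one. How the thesis combines the two scores into the weak agent probability is not reproduced here.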
diff --git a/docs/src/index.md b/docs/src/index.md
new file mode 100644
index 0000000..caa59e9
--- /dev/null
+++ b/docs/src/index.md
@@ -0,0 +1,21 @@
+# PHANTOM platform documentation
+
+Welcome. This site mirrors the **operator and research** documentation for the PHANTOM repository: a research platform for studying **dynamic pricing** under **LLM-mediated browsing and transaction orchestration**, with ties to the academic thesis.
+
+## Start here
+
+| Document | What it covers |
+| --- | --- |
+| [Setup](platform-setup.md) | Full walkthrough: Docker/web/ingest, kernels, contamination, RL training, and audit—content from `SETUP.md` in the repo. |
+| [Configuration reference](configuration.md) | Env vars, paths, and Makefile entrypoints in one place. |
+| [Roadmap & implementation notes](roadmap.md) | What is turnkey vs research-grade; thesis vs code. |
+
+## Canonical sources in the repo
+
+- Thesis PDF: [thesis-latest.pdf](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf)
+- Root onboarding: single file [`SETUP.md`](https://github.com/velocitatem/PHANTOM/blob/main/SETUP.md) (included on this site via snippets—edit that file to change content).
+- Quick start and command tables: [`README.md`](https://github.com/velocitatem/PHANTOM/blob/main/README.md)
+
+## Academic project page
+
+The research landing page (figures, abstract, links) is the site root on GitHub Pages: [velocitatem.github.io/PHANTOM/](https://velocitatem.github.io/PHANTOM/). Open **Documentation** in the Project Links menu there to return to this subsite.
diff --git a/docs/src/platform-setup.md b/docs/src/platform-setup.md
new file mode 100644
index 0000000..682f010
--- /dev/null
+++ b/docs/src/platform-setup.md
@@ -0,0 +1,5 @@
+# Setup
+
+The content below is included from the repository root file `SETUP.md` (single source of truth: platform bring-up, kernels, contamination, RL training, and thesis pointers by chapter).
+
+--8<-- "SETUP.md"
diff --git a/docs/src/roadmap.md b/docs/src/roadmap.md
new file mode 100644
index 0000000..d16f496
--- /dev/null
+++ b/docs/src/roadmap.md
@@ -0,0 +1,26 @@
+# Roadmap & implementation notes
+
+This page is the **honesty pass** from the documentation plan: what clients can expect today versus what remains research-heavy.
+
+## Turnkey in this repository
+
+- **Local stack:** Docker Compose services for backend, Kafka, Redis, Airflow, pricing provider, etc.; Next.js via `make web.dev` (see [Platform setup](platform-setup.md)).
+- **Demo verticals:** `hotel` and `airline` storefront modes.
+- **Engine:** Benchmarks and training entrypoints (`make train`, `make benchmark`), KL-based agent scoring in [`engine/lib/coi.py`](https://github.com/velocitatem/PHANTOM/blob/main/engine/lib/coi.py), simulator mixing in [`engine/engine.py`](https://github.com/velocitatem/PHANTOM/blob/main/engine/engine.py).
+- **Orchestration hooks:** Ray/TPU scripts (`submit_ray_job.sh`, `make tpu.ray.*`), W&B sweep agents, Docker trainer publish target.
+
+## Usually requires custom engineering
+
+- **Non-Supabase catalog** or checkout flows without adapting the web + backend contracts.
+- **Production SLAs** on Kafka, schema registry, or PII boundaries for your jurisdiction.
+- **Tight coupling** to a legacy pricing engine without mapping its API to the provider abstraction.
+
+## Thesis vs code
+
+- The **thesis** states theorems and constructions (COI erosion, kernels, \(\mathcal{G}(\alpha)\), DR-RL).
+- The **codebase** implements a **subset** of that story for experiments: verify CLI flags and simulator assumptions before claiming 1:1 equivalence with every equation.
+- **Catalog-scale kernel expansion** is discussed in **Chapter 3** with explicit validation caveats—do not assume row-stochasticity and Markov structure are automatically preserved at full product cardinality without review.
+
+## Suggested client messaging
+
+Position PHANTOM as a **reproducible research and evaluation stack** for agent-aware pricing, with a path to custom integration—not as a black-box “turn on anti-agent pricing” product without data and engineering investment.
\ No newline at end of file
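
The roadmap's caveat about row-stochasticity at catalog scale suggests a cheap sanity check before an expanded kernel is used. The following is a minimal sketch of such a check, assuming the kernel is available as a list of rows of floats; the function name `is_row_stochastic` and the tolerance are hypothetical and not part of the repository.

```python
import math


def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1 within `tol`."""
    return all(
        all(p >= -tol for p in row) and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in matrix
    )


if __name__ == "__main__":
    expanded = [[0.5, 0.5], [1.0, 0.0]]
    if not is_row_stochastic(expanded):
        raise ValueError("expanded kernel is not row-stochastic; review the expansion step")
```

A check like this only covers the row-sum condition. Whether the Markov assumption still holds after expansion needs separate validation against data.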
diff --git a/paper/src/chapters/mdp_agent.pdf b/paper/src/chapters/mdp_agent.pdf
index 17d299e..b0911f1 100644
Binary files a/paper/src/chapters/mdp_agent.pdf and b/paper/src/chapters/mdp_agent.pdf differ
diff --git a/paper/src/chapters/mdp_human.pdf b/paper/src/chapters/mdp_human.pdf
index af63cd5..cced37d 100644
Binary files a/paper/src/chapters/mdp_human.pdf and b/paper/src/chapters/mdp_human.pdf differ
diff --git a/paper/src/main.tex b/paper/src/main.tex
index 2046342..c3422cc 100644
--- a/paper/src/main.tex
+++ b/paper/src/main.tex
@@ -18,7 +18,7 @@
\end{titlepage}
\begin{abstract}
-With accelerated growth of Lager Language Model agents in e-commerce a novel adversarial dynamic to digital markets emerges. This paper address the vulnerability of dynamic pricing systems to AI intermediaries that decouple the information gather stages from the transaction execution. By conducing reconnaissance isolates sessions, agents circumvent the ``Cost of Information'' (COI) defined as the accumulated price premium typically thought demand expression estimators.
+With the accelerated growth of Large Language Model agents in e-commerce, a novel adversarial dynamic emerges in digital markets. This paper addresses the vulnerability of dynamic pricing systems to AI intermediaries that decouple the information-gathering stage from transaction execution. By conducting reconnaissance in isolated sessions, agents circumvent the ``Cost of Information'' (COI), defined as the accumulated price premium typically extracted through demand expression estimators.
We formally define this phenomenon and derive the Cost of Information Theorem, proving that as the saturation of independent, utility-maximizing agents increases, the platform’s ability to sustain a COI converges to zero, rendering standard dynamic pricing mechanisms incentive-incompatible.
To respond to this threat we propose a defensive framework which integrates behavioral economics with Adversarially Distributionally Robust Optimization (DRO). We introduce a custom e-commerce research platform built on hybrid Kappa-Lambda architecture, designed to capture and simulate high-fidelity controlled interaction trajectories. We further demonstrate through modeling that human and agent behaviors exhibit distinct transition probability kernels, enabling the construction of discriminative models based on Kullback-Leibler divergence.
These behavioral signals serve as inputs for a Distributionally Robust Reinforcement Learning (DR-RL) agent. We formulate the pricing problem as a Stackelberg game where the learner optimizes against an ambiguity set of demand distributions defined by the Wasserstein distance. This approach allows the pricing policy to remain robust against non-stationary contamination without overfitting to deterministic demand curves. The research validates a mechanism for preserving margin integrity and market equilibrium in an agent-mediated economy, while minimizing degradation to the legitimate human user experience (UX).