diff --git a/SETUP.md b/SETUP.md
index 0fc72a9..5f9623e 100644
--- a/SETUP.md
+++ b/SETUP.md
@@ -26,7 +26,9 @@ The formal model assumes each session is generated by a latent **actor class** $
 ## 2. Business fit framing

-**What PHANTOM is for:** Studying how **automated browsing and transaction orchestration** interact with **session-based pricing**: behavior generates a demand proxy $\hat{q}$; pricing policies map interaction history to prices; **Cost of Information (COI)** is the premium the platform can sustain above a floor when information is scarce. Agent-mediated **reconnaissance in one session** and **purchase in another** undermines that asymmetry; the thesis proves a **COI erosion** mechanism under many independent price queries.
+**The problem PHANTOM addresses:** Session-based pricing accumulates demand signals across a user's browsing history and raises quoted prices accordingly; the resulting markup above the floor is the **Cost of Information (COI)** premium. LLM agents undercut this by separating reconnaissance (many isolated sessions, no signal accumulation) from execution (a clean session that is quoted the floor price). The thesis proves that as the number of independent querying agents grows, the realizable price collapses to a minimum order statistic and COI approaches zero.
+
+**What PHANTOM gives you:** A controlled platform to measure how much COI is at risk under real agent traffic, simulate that risk across contamination levels $\alpha \in [0,1]$, and train pricing policies that remain robust to it. The pipeline runs from raw interaction logs through behavioral kernel estimation and a contamination generator to a Distributionally Robust RL (DR-RL) gym.

 **What you must supply:**
diff --git a/docs/index.html b/docs/index.html
index 89062f6..7aa0c21 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -340,10 +340,13 @@

Abstract

- When you shop online, prices often change based on how much interest you show: the more you browse, the more the site learns about your intent and may raise prices accordingly. This works because stores assume that a curious, engaged shopper is more likely to buy. But AI assistants are now doing the shopping research on behalf of users: they browse in one session to gather price information and then let the user purchase in a fresh session at the lower, unadjusted price. The store never sees the connection between the two, so it never gets to factor in that genuine intent, and it loses the revenue it would have earned.
+ Dynamic pricing extracts margin by exploiting the gap between what a platform knows and what a buyer knows. A user who browses a hotel across several sessions signals intent; the platform raises the price accordingly. The premium that this information asymmetry sustains, the Cost of Information (COI), is the economic engine behind session-based pricing in travel, hospitality, and e-commerce.

- PHANTOM studies this problem and builds defenses against it. We created a realistic fake store (in hotel and airline modes) where both real people and AI agents were given shopping tasks, and we recorded every click, scroll, and page visit. By comparing how humans and AI agents move through a site, we found clear patterns that tell them apart. We then used those patterns to build a smarter pricing system that can recognize when it is likely talking to an AI scout and adjust its strategy accordingly, protecting the store's margins without making things worse for genuine shoppers.
+ LLM agents break the engine. An agent conducting reconnaissance in isolated sessions accumulates zero demand signal, then routes the purchase through a clean session at the floor price. As the number of independent querying agents grows, the realizable price converges to its minimum order statistic and COI collapses to zero. This is not a future risk; it is a structural failure mode in any pricing system that treats sessions independently.
+
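The minimum-order-statistic claim can be illustrated with a toy simulation. This is a minimal sketch, not the thesis model: it assumes each isolated query is quoted an independent price drawn uniformly between a floor and the floor plus a maximum premium, and the function names and the default values 100.0 and 50.0 are hypothetical. Under that assumption the expected best quote over n queries is floor + max_premium / (n + 1), so the premium shrinks toward zero as n grows.

```python
import random


def expected_min_quote(n_queries, floor=100.0, max_premium=50.0,
                       trials=20000, seed=0):
    """Monte Carlo estimate of the expected lowest quote over n_queries.

    Assumption (illustrative only): quotes are i.i.d. uniform on
    [floor, floor + max_premium].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(floor + max_premium * rng.random()
                     for _ in range(n_queries))
    return total / trials


def closed_form_min(n_queries, floor=100.0, max_premium=50.0):
    """Expected minimum of n i.i.d. uniform quotes: floor + premium / (n + 1)."""
    return floor + max_premium / (n_queries + 1)


if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, round(expected_min_quote(n), 2), round(closed_form_min(n), 2))
```

The uniform quote distribution is chosen only because its minimum has a simple closed form; any distribution with support down to the floor shows the same qualitative collapse.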

+

+ PHANTOM formalizes the failure, measures it on real human and agent interaction data, and builds a defense. We prove the COI erosion theorem, collect 29 labeled sessions (13 human, 16 agent) across hotel and airline storefronts under goal-driven tasks, learn class-specific Markov transition kernels, and train a Distributionally Robust RL pricing policy over a Wasserstein ambiguity set. Behavioral separability is statistically significant (Mann–Whitney U = 2.0, p = 0.0006). The per-session agent probability signal f(τ) feeds directly into the robust policy reward as a COI-leakage penalty.

@@ -355,18 +358,30 @@
-

Project Scope

+

How it works

- The current thesis revision extends both theory and implementation. The main research question is how a pricing system can preserve margin integrity when browsing and purchasing are increasingly orchestrated by AI agents.
+ The methodology runs in three stages: observe, distinguish, defend.

-
-   • Formal contribution: a Cost of Information erosion theorem showing why price-query saturation can collapse dynamic pricing power.
-   • System contribution: a hybrid online/offline stack (Next.js storefront, pricing provider, Kafka event streams, Airflow ETL, Redis serving layer).
-   • Modeling contribution: class-specific transition kernels for human and agent behavior, with KL-divergence based separability scores.
-   • Control contribution: a contamination-aware DR-RL pricing policy trained under distributional uncertainty using Wasserstein-style robustness.
+ +

Stage 1 — Observe

- Controlled trials currently include balanced human and agent sessions with goal-driven tasks across hotel and airline interfaces. Early separability results are strong (Mann-Whitney U=2.0, p=0.0006), while robust pricing gains remain regime-dependent and are being calibrated in larger sweeps.
+ Both human participants and LLM agents are assigned goal-driven tasks on a live instrumented storefront (hotel or airline mode). Every interaction is logged as an event tuple (action, item, timestamp). Actions are partitioned into four semantic categories (cart, dwell, navigation, filter) with decreasing signal weights (4.0, 2.0, 1.0, and 0.5, respectively) calibrated by the KL divergence between human and agent transition rows. Price quotes are streamed to a separate Kafka topic, enabling joint analysis of behavior and pricing exposure. The platform runs a surge-discount heuristic during collection to expose participants to state-dependent prices.
+
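The event tuple and category weights described above could be represented as follows. This is a minimal sketch under stated assumptions: the raw action names in `ACTION_CATEGORY`, the `Event` class, and the choice to aggregate weights by summation are all hypothetical and are not taken from the PHANTOM code; only the four categories and their weights come from the text.

```python
from dataclasses import dataclass

# Category weights as stated in the text.
SIGNAL_WEIGHTS = {"cart": 4.0, "dwell": 2.0, "navigation": 1.0, "filter": 0.5}

# Hypothetical mapping from raw action names to the four categories.
ACTION_CATEGORY = {
    "add_to_cart": "cart",
    "view_item": "dwell",
    "page_change": "navigation",
    "apply_filter": "filter",
}


@dataclass(frozen=True)
class Event:
    """One logged interaction: (action, item, timestamp)."""
    action: str
    item: str
    timestamp: float


def session_signal(events):
    """Sum category weights over a session (summation is an assumption)."""
    return sum(SIGNAL_WEIGHTS[ACTION_CATEGORY[e.action]] for e in events)


if __name__ == "__main__":
    session = [
        Event("page_change", "hotel-list", 0.0),
        Event("view_item", "hotel-17", 3.2),
        Event("add_to_cart", "hotel-17", 9.8),
    ]
    print(session_signal(session))
```

Keeping the action-to-category mapping separate from the weights means the weights can be recalibrated without touching the event schema.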

+ +

Stage 2 — Distinguish

+

+ From the labeled session trajectories, we estimate class-specific Markov transition kernels $\bar{T}_H$ (human) and $\bar{T}_A$ (agent) by maximum likelihood. For any new partial trajectory $\tau'$ with empirical transition matrix $\hat{T}'$, we compute the KL divergence to each prototype:
+

+

+ $\Delta_H = D_{\mathrm{KL}}(\hat{T}' \,\|\, \bar{T}_H), \qquad \Delta_A = D_{\mathrm{KL}}(\hat{T}' \,\|\, \bar{T}_A)$
+

+

+ The gap score $g(\tau') = \Delta_H - \Delta_A$ maps to a weak agent probability via a temperature-controlled logistic function, $f(\tau') = \sigma\big((\Delta_H - \Delta_A)/T\big)$, where the scalar $T$ is the temperature (not a transition matrix). This is a continuous signal, not a binary bot flag. The Mann–Whitney test on gap scores between the 13-human and 16-agent cohorts yields U = 2.0, p = 0.0006, indicating that the behavioral distributions are well separated.
+
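The Stage 2 computation can be sketched in plain Python. This is an illustrative sketch, not the repository's implementation: the state set reuses the four action categories, additive smoothing is an assumption added so that every KL term stays finite (it makes the estimate a smoothed variant of the maximum-likelihood kernel), and summing row-wise KL divergences without weighting is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter

STATES = ("cart", "dwell", "navigation", "filter")


def estimate_kernel(trajectories, smoothing=1.0):
    """Row-normalized transition counts with additive smoothing (assumed)."""
    counts = Counter()
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[(src, dst)] += 1
    kernel = {}
    for src in STATES:
        total = sum(counts[(src, dst)] for dst in STATES) + smoothing * len(STATES)
        kernel[src] = {dst: (counts[(src, dst)] + smoothing) / total
                       for dst in STATES}
    return kernel


def kl_divergence(p_kernel, q_kernel):
    """Unweighted sum over rows of D_KL(p_row || q_row) (assumed aggregation)."""
    total = 0.0
    for src in STATES:
        for dst in STATES:
            p, q = p_kernel[src][dst], q_kernel[src][dst]
            total += p * math.log(p / q)
    return total


def agent_probability(partial, human_kernel, agent_kernel, temperature=1.0):
    """f(tau') = sigmoid((Delta_H - Delta_A) / T) for a partial trajectory."""
    t_hat = estimate_kernel([partial])
    delta_h = kl_divergence(t_hat, human_kernel)
    delta_a = kl_divergence(t_hat, agent_kernel)
    gap = delta_h - delta_a
    return 1.0 / (1.0 + math.exp(-gap / temperature))


if __name__ == "__main__":
    # Toy trajectories, invented for illustration only.
    humans = [["navigation", "dwell", "dwell", "cart"]] * 3
    agents = [["filter", "navigation", "filter", "navigation"]] * 3
    t_h, t_a = estimate_kernel(humans), estimate_kernel(agents)
    print(agent_probability(["filter", "navigation", "filter"], t_h, t_a))
```

A trajectory that sits closer to the agent prototype has a smaller $\Delta_A$ than $\Delta_H$, so the gap is positive and the score exceeds 0.5; lowering the temperature sharpens the score toward 0 or 1.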

+ +

Stage 3 — Defend

+

+ A contamination generator $G(\alpha)$ mixes real human trajectories with synthetic agent trajectories drawn from the agent kernel $\bar{T}_A$ to produce training distributions at any contamination level $\alpha \in [0, 1]$. The pricing policy is trained as a Stackelberg leader against a Wasserstein ambiguity set around the generator's empirical distribution, minimizing worst-case regret over plausible demand shifts. The per-step reward penalizes COI leakage (weighted by $f(\tau')$) while a UX index bounds harm to legitimate users. Sweeps ran across 384 TPU chips (v4, v5e, v6e Trillium) covering six contamination levels and multiple algorithm variants (PPO, A2C, DQN, Q-table).
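One way to picture the contamination generator is as a per-session coin flip with bias α. The sketch below is an assumption-laden illustration rather than the project's generator: the function names, the fixed trajectory length, the fixed start state, and the per-session Bernoulli mixing rule are all hypothetical, and the agent kernel is assumed to be a nested dict of transition probabilities.

```python
import random


def sample_agent_trajectory(agent_kernel, start, length, rng):
    """Roll out a synthetic trajectory from a Markov transition kernel."""
    traj = [start]
    for _ in range(length - 1):
        row = agent_kernel[traj[-1]]
        states = list(row)
        weights = [row[s] for s in states]
        traj.append(rng.choices(states, weights=weights, k=1)[0])
    return traj


def contaminate(human_trajectories, agent_kernel, alpha, n_sessions,
                length=8, start="navigation", seed=0):
    """Return n_sessions labeled trajectories; each is synthetic-agent with
    probability alpha, otherwise a resampled real human trajectory."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    batch = []
    for _ in range(n_sessions):
        if rng.random() < alpha:
            traj = sample_agent_trajectory(agent_kernel, start, length, rng)
            batch.append(("agent", traj))
        else:
            batch.append(("human", list(rng.choice(human_trajectories))))
    return batch


if __name__ == "__main__":
    # Toy two-state kernel and human logs, invented for illustration.
    kernel = {
        "navigation": {"navigation": 0.2, "filter": 0.8},
        "filter": {"navigation": 0.9, "filter": 0.1},
    }
    humans = [["navigation", "dwell", "cart"]]
    for label, traj in contaminate(humans, kernel, alpha=0.5, n_sessions=4):
        print(label, traj)
```

Sweeping `alpha` over a grid then yields one training distribution per contamination level, which is the shape of input the robust policy training stage expects according to the text.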

diff --git a/docs/src/business.md b/docs/src/business.md
index a6dc8bb..faed0b1 100644
--- a/docs/src/business.md
+++ b/docs/src/business.md
@@ -1,21 +1,39 @@
 # Business overview

-PHANTOM targets **platform operators and researchers** who need to:
+Dynamic pricing extracts margin by exploiting the information asymmetry between buyer and seller. When a user browses a flight or hotel across multiple sessions, each interaction accumulates demand signals that push the quoted price upward. That is the mechanism working as intended.

-1. **Observe** session-level behavior and price quotes together (trajectories and policies, not just clicks).
-2. **Separate** human-driven demand signals from agent-mediated reconnaissance where possible (distinguishability and contamination $\alpha$ in the thesis).
-3. **Evaluate** pricing policies that remain useful when **Cost of Information (COI)** is under pressure from automated querying (formal COI framework and theorem in the thesis PDF).
+LLM agents break it. An agent can conduct reconnaissance across dozens of isolated sessions, at machine speed, and then execute a purchase through a clean session that looks like a first-time visitor. The platform sees a low-engagement session and quotes a floor price. The margin that should have been captured, the **Cost of Information (COI)**, vanishes. At scale this is not a theoretical concern; it is a structural leak in any session-based pricing system.

-## What this product is not
+**PHANTOM is a research platform for studying and defending against that leak.**

-- A drop-in fraud API that returns a “bot score” for every request without your event schema.
-- A certified compliance guarantee for regulated pricing: it is a **research stack** with configurable experiments.
-- A hosted SaaS: you run the stack (or adapt components) under your infrastructure policy.
+## Who it is for

-## Self-service story (ideal path)
+| Role | What they get |
+|---|---|
+| Pricing and revenue researchers | A controlled lab with instrumented human and agent sessions, behavioral kernel estimation, and contamination simulation at configurable levels |
+| Platform engineers evaluating agent risk | A concrete pipeline from behavioral event logs to a per-session agent-probability signal, ready to feed into an existing pricing provider |
+| RL practitioners | A Distributionally Robust RL gym built on a Wasserstein ambiguity set, with benchmark tiers and sweep tooling out of the box |

-A team connects their **catalog** (today: Supabase-backed flows in this repo), streams **interaction events** through the ingest path, runs **labeled or weak-labeled** human vs agent sessions, estimates **behavioral kernels**, varies **contamination** in simulation, and **trains or benchmarks** robust policies via `engine/`. Steps and caveats are in [Setup](platform-setup.md) (same content as root `SETUP.md`).
+## Core capabilities

-## Thesis link
+**Behavioral fingerprinting.** PHANTOM logs interaction trajectories at the event level (action, item, timestamp) and fits separate Markov transition kernels for human and agent sessions via MLE. Per-session divergence scores (Δ_H, Δ_A) and a learned agent-probability signal f(τ) are computed on partial trajectories in real time, giving the pricing layer a continuous signal rather than a binary bot flag.

-Problem statement, contributions, and research questions: **Introduction** and abstract in the [thesis PDF](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf).
\ No newline at end of file
+**Contamination simulation.** The contamination generator G(α) mixes real human trajectories with synthetic agent trajectories at a configurable ratio α. This lets you evaluate pricing robustness across the full spectrum from purely human traffic to fully automated demand, without needing live agent traffic in production.
+
+**Robust policy training.** The defense gym trains pricing policies against the worst-case demand distribution within a Wasserstein ball around the generator's empirical distribution. The reward function penalizes COI leakage (weighted by agent probability) while bounding UX degradation for legitimate users.
+
+## The path from logs to defense
+
+A team connects their catalog and ingest path → streams interaction events through Kafka → labels or weak-labels sessions → estimates behavioral kernels → varies α in simulation → trains and benchmarks robust policies. The full walkthrough is in [Setup](platform-setup.md).
+
+## Scope and honest caveats
+
+This is a **research stack**, not a hosted service:
+
+- It ships two demo verticals (`hotel`, `airline`); a new catalog requires engineering work on events and reward features.
+- Kernel estimates are research-grade until validated on your traffic distribution.
+- There is no built-in compliance layer for regulated pricing markets.
+
+The thesis PDF contains the formal proofs, the COI erosion theorem, and the full DR-RL formulation. The code operationalizes those constructs: every term in the reward function maps to something computed from your logs.
+
+**Thesis PDF:** [thesis-latest.pdf](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf). The Introduction and Chapter 3 cover the problem statement, contributions, and formal model.
\ No newline at end of file
diff --git a/docs/src/index.md b/docs/src/index.md
index caa59e9..307b053 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -1,21 +1,23 @@
-# PHANTOM platform documentation
+# PHANTOM

-Welcome. This site mirrors the **operator and research** documentation for the PHANTOM repository: a research platform for studying **dynamic pricing** under **LLM-mediated browsing and transaction orchestration**, with ties to the academic thesis.
+LLM agents are quietly eroding the pricing power of dynamic pricing systems. They conduct reconnaissance across isolated sessions at machine speed and execute purchases through clean sessions that are quoted floor prices. The margin that should have accumulated never does.

-## Start here
+PHANTOM is a research platform for measuring, simulating, and defending against that erosion. It provides behavioral fingerprinting of human vs agent sessions, a contamination generator for controlled experiments, and a Distributionally Robust RL gym for training pricing policies that hold up under automated demand.

-| Document | Audience |
+---
+
+## Where to start
+
+| Document | What it covers |
 | --- | --- |
-| [Setup](platform-setup.md) | Full walkthrough: Docker/web/ingest, kernels, contamination, RL training, and audit; content from `SETUP.md` in the repo. |
-| [Configuration reference](configuration.md) | Env vars, paths, and Makefile entrypoints in one place. |
-| [Roadmap & implementation notes](roadmap.md) | What is turnkey vs research-grade; thesis vs code. |
+| [Business overview](business.md) | The problem, capabilities, and who this is for |
+| [Setup](platform-setup.md) | Full bring-up: Docker stack, ingest, behavioral kernels, contamination, RL training |
+| [Architecture](architecture.md) | Service map and data flow |
+| [Configuration reference](configuration.md) | Env vars, paths, and Makefile targets |
+| [Roadmap & notes](roadmap.md) | What is turnkey vs research-grade |

-## Canonical sources in the repo
+## Key references

-- Thesis PDF: [thesis-latest.pdf](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf)
-- Root onboarding: single file [`SETUP.md`](https://github.com/velocitatem/PHANTOM/blob/main/SETUP.md) (included on this site via snippets; edit that file to change content).
-- Quick start and command tables: [`README.md`](https://github.com/velocitatem/PHANTOM/blob/main/README.md)
-
-## Academic project page
-
-The research landing page (figures, abstract, links) is the site root on GitHub Pages: [velocitatem.github.io/PHANTOM/](https://velocitatem.github.io/PHANTOM/). Open **Documentation** in the Project Links menu there to return to this subsite.
+- **Thesis PDF:** [thesis-latest.pdf](https://pub-d5b94a3c29fd40c6b3881946e463fdb7.r2.dev/thesis-latest.pdf) (formal model, COI erosion proof, DR-RL formulation)
+- **Repo root:** [`SETUP.md`](https://github.com/velocitatem/PHANTOM/blob/main/SETUP.md) | [`README.md`](https://github.com/velocitatem/PHANTOM/blob/main/README.md)
+- **Academic landing page:** [velocitatem.github.io/PHANTOM/](https://velocitatem.github.io/PHANTOM/)