diff --git a/SETUP.md b/SETUP.md
index 0fc72a9..5f9623e 100644
--- a/SETUP.md
+++ b/SETUP.md
@@ -26,7 +26,9 @@ The formal model assumes each session is generated by a latent **actor class** $
 
 ## 2. Business fit framing
 
-**What PHANTOM is for:** Studying how **automated browsing and transaction orchestration** interact with **session-based pricing**: behavior generates a demand proxy $\hat{q}$; pricing policies map interaction history to prices; **Cost of Information (COI)** is the premium the platform can sustain above a floor when information is scarce. Agent-mediated **reconnaissance in one session** and **purchase in another** undermines that asymmetry; the thesis proves a **COI erosion** mechanism under many independent price queries.
+**The problem PHANTOM addresses:** Session-based pricing accumulates demand signals across a user's browsing history and raises quoted prices accordingly; the resulting markup is the **Cost of Information (COI)** premium. LLM agents undercut this by separating reconnaissance (many isolated sessions, no signal accumulation) from execution (a clean session that quotes a floor price). The thesis proves that as the number of independent querying agents grows, the realizable price collapses to a minimum order statistic and COI approaches zero.
+
+**What PHANTOM gives you:** A controlled platform to measure how much COI is at risk under real agent traffic, simulate that risk across contamination levels $\alpha \in [0,1]$, and train pricing policies that remain robust. The pipeline runs from raw interaction logs through behavioral kernel estimation and a contamination generator to a DR-RL gym.
 
 **What you must supply:**
 
diff --git a/docs/index.html b/docs/index.html
index 89062f6..7aa0c21 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -340,10 +340,13 @@
- When you shop online, prices often change based on how much interest you show: the more you browse, the more the site learns about your intent and may raise prices accordingly. This works because stores assume that a curious, engaged shopper is more likely to buy. But AI assistants are now doing the shopping research on behalf of users: they browse in one session to gather price information and then let the user purchase in a fresh session at the lower, unadjusted price. The store never sees the connection between the two, so it never gets to factor in that genuine intent, and it loses the revenue it would have earned.
+ Dynamic pricing extracts margin by exploiting the gap between what a platform knows and what a buyer knows. A user who browses a hotel across several sessions signals intent; the platform raises the price accordingly. The premium that this information asymmetry sustains, the Cost of Information (COI), is the economic engine behind session-based pricing in travel, hospitality, and e-commerce.
- PHANTOM studies this problem and builds defenses against it. We created a realistic fake store (in hotel and airline modes) where both real people and AI agents were given shopping tasks, and we recorded every click, scroll, and page visit. By comparing how humans and AI agents move through a site, we found clear patterns that tell them apart. We then used those patterns to build a smarter pricing system that can recognize when it is likely talking to an AI scout and adjust its strategy accordingly, protecting the store's margins without making things worse for genuine shoppers.
+ LLM agents break the engine. An agent conducting reconnaissance in isolated sessions accumulates zero demand signal, then routes the purchase through a clean session at the floor price. As the number of independent querying agents grows, the realizable price converges to its minimum order statistic and COI collapses to zero. This is not a future risk; it is a structural failure mode in any pricing system that treats sessions independently.
+
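The erosion claim can be illustrated with a short order-statistic argument. This is a sketch under simplifying assumptions, not the thesis's theorem statement: it assumes the $n$ agent sessions receive quotes $P_1, \dots, P_n$ drawn i.i.d. from a distribution $F$ with bounded support whose lower endpoint is the floor $p_{\min}$, and the notation $\mathrm{COI}_n$ is introduced here only for illustration.

```latex
P_{(1:n)} = \min_{1 \le i \le n} P_i, \qquad
\Pr\big(P_{(1:n)} > p\big) = \big(1 - F(p)\big)^{n} \xrightarrow[n \to \infty]{} 0
\quad \text{for every } p > p_{\min},
\qquad\text{hence}\qquad
\mathrm{COI}_n := \mathbb{E}\big[P_{(1:n)}\big] - p_{\min} \xrightarrow[n \to \infty]{} 0 .
```

Under these assumptions, a buyer who can execute at the best of $n$ independent quotes pays a price that concentrates at the floor, so the premium above the floor vanishes as $n$ grows.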
+
+ PHANTOM formalizes the failure, measures it on real human and agent interaction data, and builds a defense. We prove the COI erosion theorem, collect 29 labeled sessions (13 human, 16 agent) across hotel and airline storefronts under goal-driven tasks, learn class-specific Markov transition kernels, and train a Distributionally Robust RL pricing policy over a Wasserstein ambiguity set. Behavioral separability is statistically significant (Mann–Whitney U = 2.0, p = 0.0006). The per-session agent probability signal f(τ) feeds directly into the robust policy reward as a COI-leakage penalty.
- The current thesis revision extends both theory and implementation. The main research question is how a pricing system can preserve margin integrity when browsing and purchasing are increasingly orchestrated by AI agents.
+ The methodology runs in three stages: observe, distinguish, defend.
-
- Controlled trials currently include balanced human and agent sessions with goal-driven tasks across hotel and airline interfaces. Early separability results are strong (Mann-Whitney U=2.0, p=0.0006), while robust pricing gains remain regime-dependent and are being calibrated in larger sweeps.
+ Both human participants and LLM agents are assigned goal-driven tasks on a live instrumented storefront (hotel or airline mode). Every interaction is logged as a timestamped event tuple (action, item, timestamp). Actions are partitioned into four semantic categories — cart, dwell, navigation, filter — with decreasing signal weights (4.0, 2.0, 1.0, 0.5) calibrated by the KL divergence between human and agent transition rows. Price quotes are streamed to a separate Kafka topic, enabling joint analysis of behavior and pricing exposure. The platform runs a surge-discount heuristic during collection to expose participants to state-dependent prices.
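The event tuple and category weights described above can be sketched as follows. This is a minimal illustration, not the platform's logging code: the raw action names in `ACTION_CATEGORY`, the `Event` class, and the choice to sum weights into a single per-session score are all assumptions made for the example. Only the four categories and their weights (read in the order listed) come from the text.

```python
from dataclasses import dataclass

# Category weights as listed in the text: cart, dwell, navigation, filter.
CATEGORY_WEIGHTS = {"cart": 4.0, "dwell": 2.0, "navigation": 1.0, "filter": 0.5}

# Hypothetical mapping from raw action names to the four categories.
ACTION_CATEGORY = {
    "add_to_cart": "cart",
    "view_item": "dwell",
    "page_change": "navigation",
    "apply_filter": "filter",
}


@dataclass(frozen=True)
class Event:
    """One logged interaction: (action, item, timestamp)."""
    action: str
    item: str
    timestamp: float


def weighted_signal(events):
    """Sum the category weight of every event; unknown actions contribute 0.

    Summation is an assumption: the source does not say how weights
    are aggregated into a session-level signal.
    """
    return sum(
        CATEGORY_WEIGHTS.get(ACTION_CATEGORY.get(e.action, ""), 0.0)
        for e in events
    )


session = [
    Event("view_item", "hotel_12", 0.0),
    Event("apply_filter", "price_max", 3.2),
    Event("add_to_cart", "hotel_12", 9.8),
]
print(weighted_signal(session))  # 2.0 + 0.5 + 4.0 = 6.5
```

Keeping the category mapping separate from the weights means the weights can be recalibrated without touching the logged events.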
+
+ From the labeled session trajectories, we estimate class-specific Markov transition kernels $\hat{T}_H$ and $\hat{T}_A$ by maximum likelihood. For any new partial trajectory $\tau'$, we compute the KL divergence of its empirical kernel $\hat{T}'$ to each class prototype:
+
+
+ $$\Delta_H = D_{\mathrm{KL}}\big(\hat{T}' \,\|\, \bar{T}_H\big), \qquad \Delta_A = D_{\mathrm{KL}}\big(\hat{T}' \,\|\, \bar{T}_A\big)$$
+
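+
+ The gap score $g(\tau') = \Delta_H - \Delta_A$ maps to a weak agent probability via a temperature-controlled logistic function: $f(\tau') = \sigma\big(g(\tau') / T\big)$, where the temperature $T$ is a positive scalar, not a transition kernel. A trajectory that sits closer to the agent prototype has $\Delta_A < \Delta_H$, so $g(\tau') > 0$ and $f(\tau') > 0.5$. This is a continuous signal, not a binary bot flag. The Mann–Whitney test on gap scores between the 13-human and 16-agent cohorts yields U = 2.0, p = 0.0006, so the behavioral distributions are well separated.
+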
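The kernel estimate, the two divergences, and the logistic map can be sketched end to end in a few lines. This is a hedged illustration, not the project's implementation. Several choices are assumptions: the state space is taken to be the four action categories, additive smoothing is added so the KL terms stay finite, the divergence between two kernels is taken as the unweighted mean of row-wise KL divergences, and the class prototypes are simply the smoothed estimates. The toy trajectories are invented for the example and are not study data.

```python
import math
from collections import Counter

STATES = ("cart", "dwell", "navigation", "filter")


def estimate_kernel(trajectories, smoothing=1.0):
    """Row-normalized transition counts with additive smoothing.

    With smoothing=0 this is the plain maximum-likelihood estimate,
    but rows with no observations would then divide by zero.
    """
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj, traj[1:]))
    kernel = {}
    for s in STATES:
        total = sum(counts[(s, t)] for t in STATES) + smoothing * len(STATES)
        kernel[s] = {t: (counts[(s, t)] + smoothing) / total for t in STATES}
    return kernel


def kernel_kl(p, q):
    """Mean over rows of KL(p[s] || q[s]) in nats; assumes q entries > 0."""
    total = 0.0
    for s in STATES:
        total += sum(
            p[s][t] * math.log(p[s][t] / q[s][t]) for t in STATES if p[s][t] > 0
        )
    return total / len(STATES)


def agent_probability(traj, kernel_h, kernel_a, temperature=1.0):
    """f = sigmoid((Delta_H - Delta_A) / temperature)."""
    t_new = estimate_kernel([traj])
    gap = kernel_kl(t_new, kernel_h) - kernel_kl(t_new, kernel_a)
    return 1.0 / (1.0 + math.exp(-gap / temperature))


# Invented toy trajectories, for illustration only.
human = [
    ["navigation", "dwell", "dwell", "filter", "dwell", "cart"],
    ["navigation", "filter", "dwell", "dwell", "cart"],
]
agent = [
    ["navigation", "navigation", "navigation", "filter", "navigation"],
    ["filter", "navigation", "navigation", "navigation"],
]
kernel_h = estimate_kernel(human)
kernel_a = estimate_kernel(agent)

p_agent_like = agent_probability(
    ["navigation", "navigation", "filter", "navigation"], kernel_h, kernel_a
)
p_human_like = agent_probability(
    ["navigation", "dwell", "dwell", "cart"], kernel_h, kernel_a
)
print(p_agent_like, p_human_like)
```

A lower temperature pushes the output toward 0 or 1 for the same gap, while a higher temperature keeps the signal closer to 0.5. Very small temperatures can overflow `math.exp` in this simple form.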
+
+
+ A contamination generator G(α) mixes real human trajectories with synthetic agent trajectories drawn from T̂A to produce training distributions at any contamination level α ∈ [0, 1]. The pricing policy is trained as a Stackelberg leader against a Wasserstein ambiguity set around the generator's empirical distribution, minimizing worst-case regret over plausible demand shifts. The per-step reward penalizes COI leakage, weighted by f(τ'), while a UX index bounds harm to legitimate users. Sweeps ran across 384 TPU chips (v4, v5e, v6e Trillium) covering six contamination levels and multiple algorithm variants (PPO, A2C, DQN, Q-table).
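The mixing step of the contamination generator can be sketched as below. This is a minimal illustration under assumptions, not the project's generator: the function names, the fixed trajectory length, the start state, the per-trajectory Bernoulli(α) mixing rule, and the toy agent kernel are all hypothetical. The source states only that human trajectories are mixed with synthetic agent trajectories drawn from the estimated agent kernel at level α.

```python
import random

# Hypothetical agent kernel over four action categories (rows sum to 1).
TOY_AGENT_KERNEL = {
    "navigation": {"navigation": 0.6, "filter": 0.2, "dwell": 0.1, "cart": 0.1},
    "filter": {"navigation": 0.7, "filter": 0.1, "dwell": 0.1, "cart": 0.1},
    "dwell": {"navigation": 0.5, "filter": 0.2, "dwell": 0.2, "cart": 0.1},
    "cart": {"navigation": 0.4, "filter": 0.2, "dwell": 0.2, "cart": 0.2},
}


def sample_trajectory(kernel, start, length, rng):
    """Roll out a Markov chain of the given length from the kernel."""
    traj = [start]
    for _ in range(length - 1):
        row = kernel[traj[-1]]
        states = list(row)
        weights = [row[s] for s in states]
        traj.append(rng.choices(states, weights=weights, k=1)[0])
    return traj


def contaminate(human_trajs, agent_kernel, alpha, n,
                length=6, start="navigation", seed=0):
    """Draw n labeled trajectories; each is synthetic-agent with prob. alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        if rng.random() < alpha:
            traj = sample_trajectory(agent_kernel, start, length, rng)
            batch.append(("agent", traj))
        else:
            batch.append(("human", rng.choice(human_trajs)))
    return batch


# Invented human trajectories, for illustration only.
humans = [
    ["navigation", "dwell", "dwell", "cart"],
    ["navigation", "filter", "dwell", "cart"],
]
clean = contaminate(humans, TOY_AGENT_KERNEL, alpha=0.0, n=20)
mixed = contaminate(humans, TOY_AGENT_KERNEL, alpha=0.5, n=20)
full = contaminate(humans, TOY_AGENT_KERNEL, alpha=1.0, n=20)
print(sum(label == "agent" for label, _ in mixed), "of 20 are synthetic agents")
```

Seeding the generator makes each contamination level reproducible, which helps when comparing policies across a sweep of α values.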