From 97902f39a347bb4efc8451b43d1bde61cd125b13 Mon Sep 17 00:00:00 2001
From: Daniel Rosel <daniel@alves.world>
Date: Wed, 8 Apr 2026 22:33:15 +0200
Subject: [PATCH] updated main

---
 paper/src/chapters/03-methodology.tex |   2 +-
 paper/src/chapters/mdp_agent.pdf      | Bin 10932 -> 10932 bytes
 paper/src/chapters/mdp_human.pdf      | Bin 11953 -> 11953 bytes
 paper/src/main.tex                    |   6 ++----
 4 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/paper/src/chapters/03-methodology.tex b/paper/src/chapters/03-methodology.tex
index 2e9be48..5e68c23 100644
--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -143,7 +143,7 @@ The architecture of this platform begins with the deployed web-apps posting inte
 
 \paragraph{Public Web Artifact} We transition the Kappa like architecture of the data collection to a Lambda architecture for actual learning in a surrogate environment. This allows us to move faster on data which is provided and helps us create a feedback loop for production deployment. To support further research in this intersection of fields we release P4P \footnote{\url{https://github.com/velocitatem/p4p}} as a public repository providing the interaction layer of the PHANTOM framework. This provides a configurable storefront which can be tailored to any commercial setting with a standardized session-level event tracking. We document the API adapters or what the framework expects in terms of schemas for pricing providers and log ingestion servicse. The repository is intended for controlled experimentation and method replication rather than production commerce deployment.
 
-\paragraph{Public Dataset} For reproducibility of the behavioral analysis and distinguishability experiments, we also release the interaction dataset used in this thesis as \textit{WhoClickedIt}. The dataset is hosted on Hugging Face \footnote{\url{https://huggingface.co/datasets/velocitatem/whoclickedit}} and is distributed as one flattened event sheet (\texttt{whoclicked.csv}) with explicit labels (\texttt{actor\_type}, \texttt{is\_agent}, and \texttt{record\_type}). The associated dataset card specifies the schema, collection process, and known limitations; a full copy is included in Appendix~\ref{app:whoclicked_card}.
+\paragraph{Public Dataset} For reproducibility of the behavioral analysis and distinguishability experiments, we also release the interaction dataset used in this thesis as \textit{WhoClickedIt}. The dataset is hosted on Hugging Face\footnote{\url{https://huggingface.co/datasets/velocitatem/whoclickedit}} and is distributed as one flattened event sheet (\texttt{whoclicked.csv}) with explicit labels (\texttt{actor\_type}, \texttt{is\_agent}, and \texttt{record\_type}). The dataset card on that page documents the schema, collection process, and known limitations.
 
 
 \subsubsection{DevOps Principles}
diff --git a/paper/src/chapters/mdp_agent.pdf b/paper/src/chapters/mdp_agent.pdf
index f9a1eb978ebeec22047c2ad833246010b28cac89..7ec2159365eff0dcb20d6163f3c9d72ce4f61b46 100644
GIT binary patch
delta 280
zcmV+z0q6d-RkT&Gh$w%UWnhz7Yn9@I3Npr^sBdErAzo@>G$l#L{P#;*+cENVKkmsn
zhs(hto?*!%0lwrBUtzJzqM|@@xwnL*loi<sK=M{!Ej}Q?aKdMg_PT2vc(mR2iqhz7
zr^sNYWNkF~oagftvzv3R-6Tm*D<j)mt?XQzDvpP53`tk(<{*C^VRqOECNh??T!@88
z^A{$b8H-1gzgT$k)<IuNLm@{<)`NGt)^ewRd|c+A$kR1=#@AEq#(9B7Iw_7ucP)w{
z3}GHzdOD)Br$#om4Q}c_;#WA+(Kzk>)`7ljMW^E~K%YAB6ItCEMHY+6^X!Gn4dJc7
e!@q)i>ExHL9{mga4IcIU(k}u9Z%?z4DI)=al8cT2

delta 280
zcmV+z0q6d-RkT&Gh$w$pWzZ%u)+)sZE65lFQQyWMLK<pdG$l#L{P#<$b&NdSk9%^?
z;Zl%<7Fdvkfi7r5mzXb$q^gjguf0Nk5Up|vfc#B24j*73IO20ewHZ1Onrt?s;$TfP
zNM(^qp{)VUXqj=O7w>zw&hwGBR`eGwU1n@Mo87*#N_cI$o$!AO>2A#kD+nhgqYKW9
z7s8$inN22tvG5eEqy8YQiaBDk5xlELi>>(yaY=Zr<SWpEt{$;F9|R8kq1an9^r)&h
z#QEkj(g~dcwW4!<bd%Q!zrx!=O;a7VcIsP8b=+?w45=eODrW{OmBakuc?v@5Z{c0I
e!@q*H@Z!s~_kIiE2KRbl=@$bXZ&0(5DI)=T>V)P1

diff --git a/paper/src/chapters/mdp_human.pdf b/paper/src/chapters/mdp_human.pdf
index b40208093adfb0bb4468b1819ca692f1e28754fe..ebe5c34a7095a60196d73e820ef1a1bde5179fe7 100644
GIT binary patch
delta 291
zcmV+;0o?wvU9nxTdn|v!YQr!PMDO~Fxs*B-Y)N(;H<%paKq#dUlHN)WLJ^LjSVodd
z^Y@jU*rn*Sj~&en%RmV$v8059Eg4}e<g1d@HL~|>uaF%?s~iE4z3Q{W2RJB>_#9DV
zhR%bL&1O^_tT_)-StL?uYrs-gr3I3U_q|(Z*+^R}`iqt>HMW0UEN<UeCA>D>PI!f6
zx8{_Ww4fE|IWMXw${#6ROeTM^@DQw{{vfQ1IbyOAys5JmTk{j*((w3mS%H_!)f~I?
zLEuo#ioG>MkGhUSoNq28ozN*zD>~OlH+h}#3%niFG&Nysr@pmR$Ne_KkUH|Ca%Qkn
pIpnkFDF~&%g?AR=4*v=^!iz7{-us0c-0Ov<Uj_YdRI`sQA^}b~jiCSl

delta 291
zcmV+;0o?wvU9nxTdn|uZYr-%Th2Qfl&dZo(&?d2sPKpm&kTC|LdmDQQG1S0lN|KKG
z@0V2T7<sxM_vD<zWuSx!RFrVAiV?QJd?83(qj<mc8pU3=+7kf9t2udmfP><I&k=3y
z&;>BEUXO~sv!_96hnbSbS+JZf^AfXj2)$nw#YkHx`?FC#x2}K9vg<ca%V2ER$e=N6
zR-AG{OS<H&DogQ1`6H#-Wbzja579d5_tNQ<BPAQbn?4!2u|F{`jgLQzCAg>-_t;&C
z0*~^p*f~4&sOvPO`RX##2^}MKvh#g%Q<n+9Ah<zK(>883`r9ad*liMwsS`hFZwIHe
p$NcVjj6xNz;e8go!@q)C8RVC3Z~fv8ZuR2QF9#KGRkM#RA^}=3k)Hqn

diff --git a/paper/src/main.tex b/paper/src/main.tex
index 49ee31f..afc4daf 100644
--- a/paper/src/main.tex
+++ b/paper/src/main.tex
@@ -18,10 +18,7 @@
 \end{titlepage}
 
 \begin{abstract}
-With accelerated growth of Large Language Model agents in e-commerce a novel adversarial dynamic to digital markets emerges. This paper address the vulnerability of dynamic pricing systems to AI intermediaries that decouple the information gather stages from the transaction execution. By conducing reconnaissance isolates sessions, agents circumvent the ``Cost of Information'' (COI) defined as the accumulated price premium typically thought demand expression estimators.
-We formally define this phenomenon and derive the Cost of Information Theorem, proving that as the saturation of independent, utility-maximizing agents increases, the platform’s ability to sustain a COI converges to zero, rendering standard dynamic pricing mechanisms incentive-incompatible.
-To respond to this threat we propose a defensive framework which integrates behavioral economics with Adversarially Distributionally Robust Optimization (DRO). We introduce a custom e-commerce research platform built on hybrid Kappa-Lambda architecture, designed to capture and simulate high-fidelity controlled interaction trajectories. We further demonstrate through modeling that human and agent behaviors exhibit distinct transition probability kernels, enabling the construction of discriminative models based on Kullback-Leibler divergence.
-These behavioral signals serve as inputs for a Distributionally Robust Reinforcement Learning (DR-RL) agent. We formulate the pricing problem as a Stackelberg game where the learner optimizes against an ambiguity set of demand distributions defined by the Wasserstein distance. This approach allows the pricing policy to remain robust against non-stationary contamination without overfitting to deterministic demand curves. The research validates a mechanism for preserving margin integrity and market equilibrium in an agent-mediated economy, while minimizing degradation to the legitimate human user experience (UX).
+With the accelerated growth of Large Language Model agents in e-commerce, a novel adversarial dynamic emerges in digital markets. This paper addresses the vulnerability of dynamic pricing systems to AI intermediaries that decouple the information-gathering stage from transaction execution. By conducting reconnaissance in isolated sessions, agents circumvent the ``Cost of Information'' (COI), defined as the price premium typically accumulated via demand expression estimators. We formally define this phenomenon and derive the Cost of Information Theorem, proving that as the saturation of independent, utility-maximizing agents increases, the platform's ability to sustain a COI converges to zero, rendering standard dynamic pricing mechanisms incentive-incompatible. To respond to this threat, we propose a defensive framework which integrates behavioral economics with Adversarially Distributionally Robust Optimization (DRO). We introduce a custom e-commerce research platform built on a hybrid Kappa-Lambda architecture, designed to capture and simulate high-fidelity controlled interaction trajectories. We further demonstrate through modeling that human and agent behaviors exhibit distinct transition probability kernels, enabling the construction of discriminative models based on Kullback-Leibler divergence. These behavioral signals serve as inputs for a Distributionally Robust Reinforcement Learning (DR-RL) agent. We formulate the pricing problem as a Stackelberg game where the learner optimizes against an ambiguity set of demand distributions defined by the Wasserstein distance. This approach allows the pricing policy to remain robust against non-stationary contamination without overfitting to deterministic demand curves. Extensive TPU-accelerated factorial training demonstrates that while agent contamination causally reduces short-term revenue, our robust mechanism successfully preserves COI margin integrity and market equilibrium, particularly under higher contamination ratios and larger catalog sizes. Additionally, we show that integrating a balanced UX penalty drastically reduces supra-competitive pricing tendencies, minimizing degradation to the legitimate human user experience. Finally, we release our custom interaction framework and dataset as public artifacts to support future research in agent-mediated traffic.
 \end{abstract}
 
 \noindent\textbf{Keywords:} Dynamic Pricing, LLM Agents, Adversarial Machine Learning, E-commerce, Behavioral Detection, Reinforcement Learning
@@ -111,6 +108,7 @@ v4             &  64 & 275 & $64  \times 275 = 17{,}600$  \\
 Converting to petaFLOPS: $160{,}320\;\text{TFLOPS} = 160.32\;\text{PFLOPS} \approx 160\;\text{PFLOPS}$. This is the theoretical peak under sustained BF16 arithmetic; realized throughput depends on memory bandwidth utilization and inter-chip communication overhead, but the figure serves as a useful upper bound for provisioning decisions.
 
 
+
 \section{KL divergence when the reference has zeros}
 \label{app:kl_zeros}