updated main

2026-05-31 16:43:36 +00:00 · 2026-04-08 22:33:15 +02:00
parent 86c06176ae
commit 97902f39a3
4 changed files with 3 additions and 5 deletions
--- a/paper/src/main.tex
+++ b/paper/src/main.tex
@@ -18,10 +18,7 @@
 \end{titlepage}

 \begin{abstract}
-With accelerated growth of Large Language Model agents in e-commerce a novel adversarial dynamic to digital markets emerges. This paper address the vulnerability of dynamic pricing systems to AI intermediaries that decouple the information gather stages from the transaction execution. By conducing reconnaissance isolates sessions, agents circumvent the ``Cost of Information'' (COI) defined as the accumulated price premium typically thought demand expression estimators.
-We formally define this phenomenon and derive the Cost of Information Theorem, proving that as the saturation of independent, utility-maximizing agents increases, the platform’s ability to sustain a COI converges to zero, rendering standard dynamic pricing mechanisms incentive-incompatible.
-To respond to this threat we propose a defensive framework which integrates behavioral economics with Adversarially Distributionally Robust Optimization (DRO). We introduce a custom e-commerce research platform built on hybrid Kappa-Lambda architecture, designed to capture and simulate high-fidelity controlled interaction trajectories. We further demonstrate through modeling that human and agent behaviors exhibit distinct transition probability kernels, enabling the construction of discriminative models based on Kullback-Leibler divergence.
-These behavioral signals serve as inputs for a Distributionally Robust Reinforcement Learning (DR-RL) agent. We formulate the pricing problem as a Stackelberg game where the learner optimizes against an ambiguity set of demand distributions defined by the Wasserstein distance. This approach allows the pricing policy to remain robust against non-stationary contamination without overfitting to deterministic demand curves. The research validates a mechanism for preserving margin integrity and market equilibrium in an agent-mediated economy, while minimizing degradation to the legitimate human user experience (UX).
+With the accelerated growth of Large Language Model (LLM) agents in e-commerce, a novel adversarial dynamic is emerging in digital markets. This paper addresses the vulnerability of dynamic pricing systems to AI intermediaries that decouple information gathering from transaction execution. By conducting reconnaissance in isolated sessions, agents circumvent the ``Cost of Information'' (COI), defined as the accumulated price premium typically extracted through demand-expression estimators. We formally define this phenomenon and derive the Cost of Information Theorem, proving that as the saturation of independent, utility-maximizing agents increases, the platform's ability to sustain a COI converges to zero, rendering standard dynamic pricing mechanisms incentive-incompatible. To respond to this threat, we propose a defensive framework that integrates behavioral economics with Adversarially Distributionally Robust Optimization (DRO). We introduce a custom e-commerce research platform built on a hybrid Kappa-Lambda architecture, designed to capture and simulate high-fidelity, controlled interaction trajectories. We further demonstrate through modeling that human and agent behaviors exhibit distinct transition probability kernels, enabling the construction of discriminative models based on Kullback-Leibler divergence. These behavioral signals serve as inputs for a Distributionally Robust Reinforcement Learning (DR-RL) agent. We formulate the pricing problem as a Stackelberg game in which the learner optimizes against an ambiguity set of demand distributions defined by the Wasserstein distance. This approach allows the pricing policy to remain robust against non-stationary contamination without overfitting to deterministic demand curves. Extensive TPU-accelerated factorial training shows that, while agent contamination causally reduces short-term revenue, our robust mechanism preserves COI margin integrity and market equilibrium, particularly under higher contamination ratios and larger catalog sizes. Additionally, we show that integrating a balanced user-experience (UX) penalty drastically reduces supra-competitive pricing tendencies, minimizing degradation of the experience of legitimate human users. Finally, we release our custom interaction framework and dataset as public artifacts to support future research on agent-mediated traffic.
 \end{abstract}

 \noindent\textbf{Keywords:} Dynamic Pricing, LLM Agents, Adversarial Machine Learning, E-commerce, Behavioral Detection, Reinforcement Learning
@@ -111,6 +108,7 @@ v4             &  64 & 275 & $64  \times 275 = 17{,}600$  \\
 Converting to petaFLOPS: $160{,}320\;\text{TFLOPS} = 160.32\;\text{PFLOPS} \approx 160\;\text{PFLOPS}$. This is the theoretical peak under sustained BF16 arithmetic; realized throughput depends on memory bandwidth utilization and inter-chip communication overhead, but the figure serves as a useful upper bound for provisioning decisions.


+
 \section{KL divergence when the reference has zeros}
 \label{app:kl_zeros}