building summary properly

2026-04-08 22:19:14 +02:00
parent 97a6bf3974
commit 86c06176ae
3 changed files with 33 additions and 28 deletions


@@ -1,26 +1,26 @@
% -*- TeX-master: t -*-
% Two-page summary: one self-contained source file (no \input chapters).
% Text is stitched from the thesis chapters using the author's wording; trim if the PDF exceeds two pages.
\documentclass[10pt,letterpaper]{article}
\input{preamble}
\begin{document}
\singlespacing
\setlength{\parskip}{0.25em}
\setlength{\parskip}{0.35em}
\setlength{\parindent}{0pt}
\small
\fancyhead[L]{PHANTOM Summary}
\fancyhead[L]{}
\begin{center}
{\large\bfseries PHANTOM: Pricing Heuristics Against Non-human Transaction Orchestration Mechanisms}\\[0.3em]
{\normalsize Daniel Rösel}\\[0.12em]
{\small Bachelor of Computer Science \& Artificial Intelligence, IE University, Madrid}\\[0.12em]
{\small Supervised by Alberto Martín Izquierdo \quad\textbar\quad \today}
{\small\url{https://velocitatem.github.io/PHANTOM/}}\\[0.65em]
{\large\bfseries PHANTOM: Pricing Heuristics Against Non-human\\[0.15em] Transaction Orchestration Mechanisms}\\[0.55em]
{\normalsize Daniel Rösel\footnote{Bachelor of Computer Science \& Artificial Intelligence @ IE University, Madrid}}\\[0.55em]
{\small Supervised by Alberto Martín Izquierdo}\\[0.35em]
{\small \today}
\end{center}
\vspace{0.35em}
\vspace{0.75em}
\noindent
To understand every facet of the present work, we start with the nature of agents, agentic computer use, and web automation, and complement this with economic reasoning and strategic interaction.
The final area to cover is data-driven dynamic pricing under uncertainty.
The key technical risk is not ``agents buying things'' per se, but agents shaping the behavioral and demand signals that downstream pricing systems consume and depend on \parencite{xia_evaluation-driven_2025}.
@@ -34,8 +34,7 @@ In order to create an environment in which prices can be tested against a demand
The key component of this mediation between agents and commercial platforms lies in the transaction costs related to information gathering and negotiation.
As proposed by \textcite{shahidi_coasean_2025}, these costs are bound to collapse towards zero (which we demonstrate mathematically), calling for a re-evaluation of the boundaries between firms and markets.
\vspace{0.3em}
\noindent
\vspace{0.5em}
In this paper we present an exploration of, and a defense against, the presence of new commercial entities on digitally powered platforms, with the aim of preserving market equilibrium in the age of AI.
We formally define interaction data as coming from some actor which can either be an agent ($A$) or human ($H$).
Dynamic pricing algorithms rely on directly translating demand features $q$ to new price assignments $\hat{p}$ across a catalogue of products of size $N$.
@@ -43,8 +42,7 @@ This opens opportunities to design a \textit{tabula rasa} of digital market mech
We propose a robust optimization objective defined in our methodology, transforming the pricing problem into a form of Distributionally Robust Optimization \parencite{kuhn_distributionally_2025} where the learner must guard against adversarial contamination in observed demand distributions.
For the purposes of this research, an agent is an algorithmic loop with the ability to access a web platform and perform actions such as clicks, scrolls, and input field fills.
\vspace{0.3em}
\noindent
\vspace{0.5em}
The platform does not directly observe the true underlying demand function $d(p)$, with $d(p) \in \mathbb{R}^{+}$.
Instead, it observes a behavioral proxy $\hat{q}_t \in \mathbb{R}^{+}$, a composite signal derived from the mixture of actor types.
The total observed demand is a stochastic process governed by the naively defined mixture $Q(p) = (1-\alpha) \cdot \mathbb{E}_{\theta \sim \mathcal{D}_H}[d(p\mid Y=H,\theta)] + \alpha \cdot \mathbb{E}_{\theta \sim \mathcal{D}_A}[d(p\mid Y=A,\theta)] + \epsilon_t$ where $\alpha \in [0, 1]$ represents the contamination parameter (proportion of agents) and $\epsilon_t$ is non-stationary market noise.
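As a purely illustrative reading of this mixture (the numbers are hypothetical, not measured), take $\alpha = 0.2$, an expected human demand of $100$ units, and an expected agent demand of $40$ units at the same price $p$.
Ignoring the noise term $\epsilon_t$,
\[
Q(p) = (1 - 0.2) \cdot 100 + 0.2 \cdot 40 = 80 + 8 = 88,
\]
so the platform would observe a demand signal $12\%$ below the purely human level even though human behavior is unchanged.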
@@ -53,8 +51,7 @@ We quantify this markup as the \textit{Cost of Information} (COI), which represe
We formally demonstrate that standard dynamic pricing mechanisms are not incentive-compatible with high-frequency agentic traffic.
As the number of independent competitive agents $N$ querying the system grows, the platform's ability to sustain a COI vanishes.
\vspace{0.3em}
\noindent
\vspace{0.5em}
To ground our research in real interactions, we built a robust e-commerce web platform.
In its architecture, the deployed web apps post interaction data to our backend, which processes each interaction and stores it in a Kafka cluster.
This cluster serves as our data reservoir, associating each interaction with its session and, importantly, with the experiment it belongs to.
@@ -74,8 +71,7 @@ The second half uses collected behavioral traces to distinguish classes $Y \in \
Our process follows three stages: (1) observe and \textit{vectorize} behavioral interactions, (2) learn distinguishability to characterize human versus agent patterns, and (3) use the learned signal to train a defensive policy in a controlled dynamic-pricing simulator.
Our web platform (developed in a similar spirit to RecSim \parencite{ie_recsim_2019}) gives us a controlled environment where tasks are assigned to human and agentic actors and then executed.
\vspace{0.3em}
\noindent
\vspace{0.5em}
Because sessions are collected under controlled experimental conditions where each actor is assigned a known type at the start of the trial, labels $Y_s \in \{H, A\}$ are available as ground truth rather than as the output of a heuristic classifier.
We therefore estimate separate transition kernels directly from each labeled partition $\mathcal{D}_H$ and $\mathcal{D}_A$, treating the resulting $\hat{\mathcal{T}}_H$ and $\hat{\mathcal{T}}_A$ as the ground-truth behavioral profiles for each class.
This allows us to construct a \textit{Contamination Generator} $\mathcal{G}(\alpha)$.
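One plausible reading of such a generator, stated here as an assumption rather than as the thesis definition, first draws a session label and then a trajectory from the matching kernel:
\[
\Pr(Y_s = A) = \alpha, \qquad
x_{t+1} \mid x_t, Y_s \sim
\left\{
\begin{array}{ll}
\hat{\mathcal{T}}_A(\cdot \mid x_t) & \mathrm{if}\ Y_s = A,\\
\hat{\mathcal{T}}_H(\cdot \mid x_t) & \mathrm{if}\ Y_s = H,
\end{array}
\right.
\]
where $x_t$ is a hypothetical symbol for the vectorized interaction state at step $t$.
Under this sketch, $\alpha = 0$ yields purely human traffic and $\alpha = 1$ purely agentic traffic.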
@@ -90,27 +86,23 @@ As part of reward engineering, we keep a UX factor ($UX\in[0,1]$) as an auxiliar
Our training budget is provisioned through TPU Research Cloud and spans 384 chips across TPU v4, v5e, and v6e generations, with a spot-heavy allocation plus an on-demand reserve.
At peak BF16 throughput this corresponds to approximately $160$\,PFLOPS of aggregate compute.
\vspace{0.3em}
\noindent
\vspace{0.5em}
The sign structure is consistent with the theoretical expectation: human sessions produce negative gap scores (closer to the human centroid, far from the agent centroid) while agent sessions produce positive gap scores (closer to the agent centroid).
The two-sided test result ($p<0.001$) at $n_H=13$, $n_A=16$ indicates strong rank distinction between groups, providing evidence that the transition kernels are distinguishable enough to justify their use as a control signal in downstream pricing.
Interpreted on the contamination grid, a $+0.1$ increase in $\alpha$ corresponds to an average revenue decrease of about $9{,}014$ units, and the robust check preserves both direction and significance.
The ability to extract COI is greater in the presence of robustness within the training loop; empirical evidence shows that agent contamination reduces revenue and that robustness is condition-dependent, requiring explicit calibration rather than a one-size-fits-all penalty.
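The reported revenue effect of about $9{,}014$ units per $+0.1$ in $\alpha$ can be read as a constant slope over the grid, which is an illustrative assumption and not a fitted result:
\[
\frac{\Delta R}{\Delta \alpha} \approx \frac{-9{,}014}{0.1} = -90{,}140,
\]
where $R$ is a hypothetical symbol for average revenue.
Under that assumption, moving from $\alpha = 0.2$ to $\alpha = 0.5$ would lower average revenue by roughly $0.3 \times 90{,}140 = 27{,}042$ units.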
\vspace{0.3em}
\noindent
\vspace{0.5em}
Our analysis of the interaction dynamics between the platform and non-human actors suggests that the current static pricing models are insufficient for an agent-mediated economy.
This technology also has a more bitter side: ethical concerns arise from deploying black-box solutions that set prices based on behavioral attributes.
\vspace{0.3em}
\noindent
\vspace{0.5em}
Contributions include formalization of non-human transaction orchestration in e-commerce as a distinct source of contamination, definition of COI together with a theorem showing its erosion under increasing agent saturation, a controlled e-commerce research platform built on a hybrid Kappa-Lambda architecture, empirical validation of behavioral distinguishability, translation of distinguishability into a distributionally robust reinforcement learning formulation, and release of a reusable public experimental artifact.
\vspace{0.3em}
\noindent\textbf{Acknowledgments.}
\vspace{0.65em}
\noindent\textbf{Acknowledgments.}\quad
This research was supported by the TPU Research Cloud program, which provided access to Google Cloud TPU accelerators (including TPU v4, v5e, and v6e).
Eugene Bykovets, PhD---ETH.
\textbf{Project page:} \url{https://velocitatem.github.io/PHANTOM/}
\renewcommand*{\bibfont}{\footnotesize}
\printbibliography[title={References}]