diff --git a/paper/src/auto/main.el b/paper/src/auto/main.el index a64eef1..31e5a9a 100644 --- a/paper/src/auto/main.el +++ b/paper/src/auto/main.el @@ -17,6 +17,10 @@ "chapters/05-discussion" "chapters/06-conclusion" "article" - "art12")) + "art12") + (LaTeX-add-labels + "app:compute_budget" + "tab:compute_derivation" + "app:whoclicked_card")) :latex) diff --git a/paper/src/chapters/03-methodology.tex b/paper/src/chapters/03-methodology.tex index e07dcac..a8acbeb 100644 --- a/paper/src/chapters/03-methodology.tex +++ b/paper/src/chapters/03-methodology.tex @@ -94,7 +94,8 @@ where $\mathbb{E}[P]$ is the expected price charged by the policy and $\underlin We now formally demonstrate that standard dynamic pricing mechanisms are not incentive-compatible with high-frequency agentic traffic. As the number of independent competitive agents $N$ querying the system grows, the platform's ability to sustain a COI vanishes. - A fundamental assumption for our claim lies in the alignment of the AI agent through its prompt which has been demonstrated by \cite{fish_algorithmic_2025} to cause strong collusive behavior under linguistic nudges. This assumption can be generalized to the human user asking the agent to research products with a minimizing objective. +\paragraph{Assumption Scope} +The theorem and core experiments in this thesis assume a non-collusive independent-session setting: each agent queries prices independently and does not share sampled quotes across agents. Collusive coordination is outside the current proof scope and is treated as an extension scenario. \begin{theorem}[COI Erosion in the Limit] Let $N$ be the number of independent, utility-maximizing agents querying the platform. Let $p_{(1)}$ be the first order statistic (minimum) of the prices offered to these agents. As $N \to \infty$, the Cost of Information converges to 0. 
@@ -331,7 +332,7 @@ where $\mathcal{S}_e$ denotes the set of destination events that follow $e$ in t To obtain this statistic, we aggregate transitions by triggering event $e$ and treat normalized outgoing probabilities as categorical distributions $P_e$ (human) and $Q_e$ (agent). We intersect shared event labels, then accumulate log-ratio contributions over shared destinations. Large contributions, including near-zero $Q_e(k)$ cases, identify transitions where one actor class is difficult to mimic. -With these divergence features we train a contrastive model to estimate a weak agent probability $f(\tau)\in[0,1]$, which we later use as a weighting and control signal. +With these divergence features we compute a weak agent probability $f(\tau')\in[0,1]$ directly from divergence gaps, which we later use as a weighting and control signal. \subsubsection{Transition Probability Estimation} @@ -375,10 +376,36 @@ Because contamination level $\alpha$ and demand shift are non-stationary online, \Delta_A &= D_{KL}(\hat{\mathcal{T}}^\prime \parallel \bar{\mathcal{T}}_A) \end{align} -This yields two centroid-like heuristics that act as a session-level agent score in the engine. On a per-customer or use-case basis a similar study should be done in order to obtain ground truth behavior models for humans and agents and their specific interaction with a given products website. +From these two divergences we define the gap score: +\begin{equation} +g(\tau') := \Delta_H(\tau') - \Delta_A(\tau'). +\end{equation} +Positive values indicate trajectories farther from the human centroid and closer to the agent centroid. + +We map this gap to a weak agent probability using a temperature-controlled logistic map: +\begin{equation} +f(\tau') := P(Y=A\mid\tau') = \operatorname{softmax}\left(-\frac{\Delta_A}{T},-\frac{\Delta_H}{T}\right)_A = \sigma\left(\frac{\Delta_H-\Delta_A}{T}\right), \quad T>0. 
+\end{equation} +The session-level control signal injected into pricing is then +\begin{equation} +\hat{\alpha}(\tau') := f(\tau'). +\end{equation} + +This turns distinguishability into an operational control input in the engine. On a per-customer or use-case basis, a similar data collection and fitting process should be repeated to obtain domain-specific behavior kernels. In implementation, we maintain an alternating game-history stack (our \textit{Limbo} stack) and execute it explicitly every epoch with exactly two transitions: first the platform publishes a price vector (leader move), then the market responds with trajectory-derived demand (follower move). +To avoid notation drift, we separate two COI objects used for different purposes: +\begin{align} +\text{COI}_{\text{level}}(\pi) &:= \mathbb{E}[P]-\underline{p} \quad \text{(global reporting KPI)} \\ +\text{COI}_{\text{leak}}(p,\tau') &:= f(\tau')\cdot \text{InfoValue}(p,\tau') \quad \text{(local control penalty)} +\end{align} +where $\text{COI}_{\text{level}}$ is evaluated at policy level and $\text{COI}_{\text{leak}}$ is evaluated per observed quote during training. We connect local leakage to expected global erosion with the operational assumption +\begin{equation} +\mathbb{E}[\Delta\text{COI}_{\text{level},t} \mid \tau_t'] \approx -\kappa\,\text{COI}_{\text{leak}}(p_t,\tau_t') + \xi_t, +\end{equation} +where $\kappa>0$ and $\xi_t$ is residual noise. This keeps theorem-level COI erosion (global, asymptotic) distinct from training-time leakage control (local surrogate). 
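As a hedged illustration of the gap-to-probability map and the local leakage penalty defined in this hunk, the following Python sketch computes $f(\tau')$ from two divergence values. The function names, the default temperature, and the scalar `info_value` input are hypothetical: the diff does not show how $\text{InfoValue}(p,\tau')$ is computed, so it is treated here as a given number. The two-class softmax is assumed to act on temperature-scaled negative divergences, which is what makes it agree with the logistic form $\sigma((\Delta_H-\Delta_A)/T)$.

```python
import math


def agent_probability(delta_h: float, delta_a: float, temperature: float = 1.0) -> float:
    """Weak agent probability f = sigma((delta_h - delta_a) / T), for T > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = (delta_h - delta_a) / temperature
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax_agent_component(delta_h: float, delta_a: float, temperature: float = 1.0) -> float:
    """Agent component of a two-class softmax over (-delta_a / T, -delta_h / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logit_a = -delta_a / temperature
    logit_h = -delta_h / temperature
    shift = max(logit_a, logit_h)  # subtract the max to avoid overflow
    exp_a = math.exp(logit_a - shift)
    exp_h = math.exp(logit_h - shift)
    return exp_a / (exp_a + exp_h)


def coi_leak(agent_prob: float, info_value: float) -> float:
    """Local leakage penalty f(tau') * InfoValue; info_value is a placeholder input."""
    return agent_prob * info_value


if __name__ == "__main__":
    # Illustrative divergences only, not values from the thesis experiments.
    f = agent_probability(delta_h=2.0, delta_a=0.5, temperature=0.5)
    print(f, softmax_agent_component(2.0, 0.5, 0.5), coi_leak(f, info_value=10.0))
```

A smaller $T$ sharpens the map toward a hard 0/1 decision, while a larger $T$ keeps $f(\tau')$ near 0.5 unless the gap is large.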
+ % Mention discretized action space and the clipping and over shotting in continuous action spaces % Also talk about catastrophic economics, we add termination on bankrupcy or zero demand so market collaps diff --git a/paper/src/chapters/04-results.tex b/paper/src/chapters/04-results.tex index f1e4f56..45208ae 100644 --- a/paper/src/chapters/04-results.tex +++ b/paper/src/chapters/04-results.tex @@ -40,7 +40,12 @@ We report two preliminary stages before the full factorial interpretation. First \subsubsection{The Impact of Contamination on Revenue} -A linear fit test on run-level data ($n=95$) shows a strong negative association between contamination and mean revenue. The fitted model mapping $\alpha \to \text{revenue}$ result in $t(93)=-8.2148$, $p=1.20\times 10^{-12}$, $R^2=0.4205$, and a 95\% confidence interval for the slope of $[-75{,}288.76,\,-45{,}975.13]$. In practical terms, a $+0.1$ increase in $\alpha$ corresponds to an average decrease of about $6{,}063$ revenue units within our environment. +The contamination--revenue slope is estimated on a controlled cohort (single sweep, baseline policy, $n_{\text{products}}=100$, $n=95$). In this setting, contamination $\alpha$ is set exogenously by the experiment, so the slope identifies the within-sweep causal effect of contamination on revenue under fixed policy and environment settings. The fitted linear model is + +\[ +\widehat{y}=348{,}823.41-90{,}140.53\,\alpha, +\] +with $t(93)=-61.45$, $p=4.27\times10^{-77}$, $R^2=0.976$, and a 95\% confidence interval for the slope of $[-93{,}053.38,\,-87{,}227.68]$. Interpreted on the contamination grid, a $+0.1$ increase in $\alpha$ corresponds to an average revenue decrease of about $9{,}014$ units. A heteroskedasticity-robust check (HC1) preserves the same direction and significance ($t=-41.25$, $p=1.42\times10^{-61}$), supporting a large and statistically stable impact in this controlled regime. 
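The slope test reported in this hunk can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the analysis script behind the reported numbers: the function name `ols_slope_test`, the returned keys, and the synthetic data in the usage example are all hypothetical. It fits a simple linear model, then returns the classical and HC1 heteroskedasticity-robust standard errors of the slope, the corresponding $t$ statistics, and $R^2$. The $p$-values are omitted because they need a Student-$t$ distribution function from a statistics library.

```python
import math


def ols_slope_test(xs, ys):
    """Simple linear regression y = b0 + b1 * x with classical and HC1 slope errors."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    sst = sum((y - y_bar) ** 2 for y in ys)
    # Classical standard error with n - 2 degrees of freedom.
    se = math.sqrt(sse / (n - 2) / sxx)
    # HC1: White sandwich estimator scaled by n / (n - 2).
    meat = sum(((x - x_bar) ** 2) * e * e for x, e in zip(xs, residuals))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx ** 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "se": se,
        "t": slope / se,
        "se_hc1": se_hc1,
        "t_hc1": slope / se_hc1,
        "r2": 1.0 - sse / sst,
        "df": n - 2,
    }


if __name__ == "__main__":
    # Synthetic data for illustration only; these are not the thesis runs.
    alphas = [i / 10 for i in range(10)]
    revenue = [10.0 - 5.0 * a + (0.1 if i % 2 == 0 else -0.1) for i, a in enumerate(alphas)]
    fit = ols_slope_test(alphas, revenue)
    # Effect of a +0.1 step in alpha is 0.1 * slope.
    print(fit["slope"], 0.1 * fit["slope"], fit["t"], fit["t_hc1"], fit["r2"])
```

Multiplying the fitted slope by the grid step gives the per-step effect, which is how the $+0.1$ interpretation in the paragraph above follows from the slope estimate.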
\subsubsection{Large Scale Factorial Training} @@ -58,7 +63,6 @@ In our complete training runs we logged $\approx 180$ days of net compute time. \caption{Revenue curves by contamination for the final cohort. The baseline remains above the defended curve in most cells, but the gap narrows in the high-contamination region.} \label{fig:final_focus_revenue_by_alpha} \end{figure} -% TODO: we need a similar plot which shows the COI preserved (what we gain across teh multiple conatmination leves, showing that the robust method has better COI optimization.) \begin{figure}[ht] \centering diff --git a/paper/src/main.tex b/paper/src/main.tex index f31edd9..555fbc3 100644 --- a/paper/src/main.tex +++ b/paper/src/main.tex @@ -110,19 +110,6 @@ v4 & 64 & 275 & $64 \times 275 = 17{,}600$ \\ Converting to petaFLOPS: $160{,}320\;\text{TFLOPS} = 160.32\;\text{PFLOPS} \approx 160\;\text{PFLOPS}$. This is the theoretical peak under sustained BF16 arithmetic; realized throughput depends on memory bandwidth utilization and inter-chip communication overhead, but the figure serves as a useful upper bound for provisioning decisions. -\section{Slope-Test Verification: Revenue vs. Contamination} -\label{app:alpha_revenue_slope} - -This appendix provides a compact verification of the slope result reported in the main results section. Using the same run-level pairs $x_i=\texttt{study/alpha}_i$ and $y_i=\texttt{eval/revenue\_mean}_i$ ($n=95$), we re-checked the ordinary least squares slope test in Python with standard test routines (SciPy two-sided $t$ test for the slope). - -\[ -\widehat{y}=326{,}878.57-60{,}631.95\,x, -\] -\[ -t(93)=-8.2148,\qquad p=1.2038\times 10^{-12},\qquad R^2=0.4205,\qquad 95\%\,\text{CI}_{\beta_1}=[-75{,}288.76,\,-45{,}975.13]. -\] - -The Python verification reproduces the reported coefficients and inference values, confirming that the slope-test results are correct under standard methods. \section{whoclickedit Dataset Card} \label{app:whoclicked_card}