chore: refactor, fix citations, and update data, refs, and appendices

2026-07-16 01:53:37 +00:00 · 2026-03-15 21:15:23 +01:00
parent 0521a63937
commit 375445f260
5 changed files with 82 additions and 118 deletions
--- a/paper/src/main.tex
+++ b/paper/src/main.tex
@@ -46,15 +46,44 @@ These behavioral signals serve as inputs for a Distributionally Robust Reinforce
 \appendix
 \section{Terminology}
 \begin{description}
-\item[Agent $A$] An actor of non-human nature, powered by an LLM.
-\item[Human $H$] An individual human with some job to be done.
-\item[Actor $\theta$] Defines a type of class which is either Agent or Human and has the capability to carry out actions on a web platform.
-\item[Platform] Any web-based platform which serves an interface to a collection of items that can be purchased, each at some price $p_i$.
-\item[Behavioral Model] A mathematical model predicting what action comes after a series of prior actions.
-\item[LLM] Large Language Model served by some provider with the abstracted capability of tool calling.
-\item[TPU] Tensor Processing Unit which is a unique kind of chip architecture developed by Google.
-\item[Trajectory] Defined as a series of unspecified length, collecting data on states of some object over time.
-% TODO: maybe define other things in a similar succient manner
+\item[Agent $A$] A non-human actor, typically an LLM-driven system that executes web actions toward a goal.
+\item[Human $H$] A human participant interacting with the platform to complete a task.
+\item[Actor Type $\theta$] A latent class parameter describing whether a session is generated by a human or an agent profile.
+\item[Platform] A web interface exposing purchasable items and their offered prices.
+\item[Session $s$] A bounded interaction record tied to one actor and one session identifier.
+\item[Event $e_{s,k}$] A single interaction tuple in a session, including action, item target, and timestamp.
+\item[Trajectory $\tau_s$] The ordered sequence of events generated within a session.
+\item[Demand Proxy $\hat{q}_{t,i}$] A weighted aggregate of observed actions used as an operational substitute for latent demand.
+\item[Action Weight Function $\omega(a)$] A mapping from action type to signal strength in the demand proxy.
+\item[True Demand $d(p;\theta)$] The latent purchase response as a function of price and actor type.
+\item[Contamination $\alpha$] The proportion of agent-generated traffic in the session mixture.
+\item[Non-stationary Noise $\epsilon_t$] Time-varying residual variation not explained by the actor mixture.
+\item[Pricing Policy $\pi(\tau)$] A function mapping observed interaction history to an offered price.
+\item[Cost of Information (COI)] The expected premium above the minimum viable price induced by the pricing policy.
+\item[COI Leakage] A per-quote penalty term modeling information revealed to reconnaissance behavior.
+\item[First-Order Statistic $p_{(1)}$] The minimum observed price among multiple independent queries.
+\item[Transition Kernel $\mathcal{T}$] A Markov transition matrix over behavioral states or actions.
+\item[Separability] The degree to which human and agent sessions can be distinguished from behavior alone.
+\item[KL Divergence $D_{KL}$] A relative-entropy measure used to compare session transition structure against class prototypes.
+\item[Divergence Scores $\Delta_H,\Delta_A$] Session-level distances to human and agent transition centroids.
+\item[Weak Agent Probability $f(\tau)$] A session-level score estimating the likelihood that a trajectory is agent-generated.
+\item[Contamination Generator $\mathcal{G}(\alpha)$] A simulator component that injects synthetic agent trajectories to reach a target mixture level.
+\item[Stackelberg Game] A leader-follower formulation where the platform sets prices and demand responds.
+\item[Ambiguity Set $\mathcal{U}_{\epsilon}$] A set of plausible demand distributions considered under distributional uncertainty.
+\item[Wasserstein Ball] A distance-bounded neighborhood around an empirical distribution used in robust optimization.
+\item[DR-RL] Distributionally Robust Reinforcement Learning for policies trained against worst-case distributional shifts.
+\item[Nominal Contamination $\alpha_0$] The baseline contamination level around which robust candidates are evaluated.
+\item[Robustness Radius $\epsilon_\alpha$] The local interval width used for inner minimization over contamination scenarios.
+\item[Query-Tax Surrogate] A constant leakage proxy assigning fixed penalty to suspected reconnaissance queries.
+\item[Revelation Surrogate] A leakage proxy based on $-\log\pi(p\mid\tau)$ to penalize highly informative quotes.
+\item[Limbo Stack] The alternating game-history buffer that stores leader price moves and follower demand responses.
+\item[UX Index] A bounded user-experience metric tracked to evaluate policy side effects on legitimate users.
+\item[Look-to-Book Ratio] The ratio of search-like interactions to completed purchases, used as an operational contamination indicator.
+\item[Hybrid Kappa-Lambda Architecture] A data design combining streaming ingestion with offline and batch learning loops.
+\item[MDP / POMDP] Sequential decision models with full observability (MDP) or partial observability (POMDP).
+\item[Behavioral Model] A model predicting what action is likely to follow from prior actions.
+\item[LLM] Large Language Model served through an inference provider with tool-use capability.
+\item[TPU] Tensor Processing Unit, a specialized accelerator architecture developed by Google.
 \end{description}

 \section{Aggregate Compute Budget Derivation}
@@ -81,109 +110,19 @@ v4             &  64 & 275 & $64  \times 275 = 17{,}600$  \\

 Converting to petaFLOPS: $160{,}320\;\text{TFLOPS} = 160.32\;\text{PFLOPS} \approx 160\;\text{PFLOPS}$. This is the theoretical peak under sustained BF16 arithmetic; realized throughput depends on memory bandwidth utilization and inter-chip communication overhead, but the figure serves as a useful upper bound for provisioning decisions.

-\section{Full Slope-Test Derivation: Revenue vs. Contamination}
+\section{Slope-Test Verification: Revenue vs. Contamination}
 \label{app:alpha_revenue_slope}

-This appendix gives the full ordinary least squares computation for the linear effect of contamination on mean revenue. Let
+This appendix gives a compact verification of the slope result reported in the main results section. Using the same run-level pairs $x_i=\texttt{study/alpha}_i$ and $y_i=\texttt{eval/revenue\_mean}_i$ ($n=95$), we refit the ordinary least squares model $y_i=\beta_0+\beta_1 x_i+\varepsilon_i$ in Python and re-ran the two-sided $t$ test of $H_0:\beta_1=0$ with SciPy.
+
 \[
-x_i = \texttt{study/alpha}_i, \qquad y_i = \texttt{eval/revenue\_mean}_i,
+\widehat{y}=326{,}878.57-60{,}631.95\,x,
 \]
-and fit
 \[
-y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,\dots,n.
-\]
-The slope test is
-\[
-H_0: \beta_1 = 0 \qquad \text{vs.} \qquad H_1: \beta_1 \neq 0.
+t(93)=-8.2148,\qquad p=1.2038\times 10^{-12},\qquad R^2=0.4205,\qquad 95\%\,\text{CI}_{\beta_1}=[-75{,}288.76,\,-45{,}975.13].
 \]

-\subsection{Sample moments and least-squares coefficients}
-
-From the data:
-\[
-n=95, \qquad \bar{x}=0.3810526316, \qquad \bar{y}=303{,}774.6096.
-\]
-Define
-\[
-S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad S_{xy}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}).
-\]
-Numerically,
-\[
-S_{xx}=7.0508947368, \qquad S_{xy}=-427{,}509.4691.
-\]
-The least-squares slope and intercept are
-\[
-\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-427{,}509.4691}{7.0508947368} = -60{,}631.9460,
-\]
-\[
-\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 303{,}774.6096 - (-60{,}631.9460)(0.3810526316) = 326{,}878.5722.
-\]
-So the fitted line is
-\[
-\hat{y} = 326{,}878.5722 - 60{,}631.9460\,x.
-\]
-
-\subsection{Residual variance and standard error of the slope}
-
-For each observation, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ and $e_i = y_i - \hat{y}_i$. The residual sum of squares is
-\[
-\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = 35{,}721{,}896{,}352.27375.
-\]
-With $df=n-2=93$,
-\[
-\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2} = \frac{35{,}721{,}896{,}352.27375}{93} = 384{,}106{,}412.3900.
-\]
-The slope standard error is
-\[
-SE(\hat{\beta}_1) = \sqrt{\frac{\mathrm{MSE}}{S_{xx}}} = \sqrt{\frac{384{,}106{,}412.3900}{7.0508947368}} = 7{,}380.8038.
-\]
-
-\subsection{t-statistic, p-value, and confidence interval}
-
-Under $H_0: \beta_1=0$,
-\[
-t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-60{,}631.9460}{7{,}380.8038} = -8.2148,
-\]
-with $df=93$. The two-sided p-value is
-\[
-p = 2\,\Pr\left(T_{93} \ge |t|\right) = 1.2038\times 10^{-12}.
-\]
-The 95\% confidence interval is
-\[
-\hat{\beta}_1 \pm t_{0.975,93}\,SE(\hat{\beta}_1)
-= -60{,}631.9460 \pm (1.9858)(7{,}380.8038)
-= [-75{,}288.7597,\,-45{,}975.1324].
-\]
-
-\subsection{Effect size and fit statistics}
-
-The sample correlation is $r=-0.64846$, so
-\[
-R^2 = r^2 = 0.4205.
-\]
-Hence, 42.05\% of the variation in \texttt{eval/revenue\_mean} is explained by a linear trend in \texttt{study/alpha}.
-
-The slope interpretation is direct:
-\[
-\hat{\beta}_1 = -60{,}631.9460 \quad \Rightarrow \quad \Delta y \approx -6{,}063.19 \text{ for } \Delta x = +0.1.
-\]
-From $\alpha=0$ to $\alpha=0.8$, the fitted drop is
-\[
-0.8\times (-60{,}631.9460) = -48{,}505.5568,
-\]
-so the model predicts roughly $48{,}506$ lower revenue units on average.
-
-\subsection{Conclusion of the slope test}
-
-The estimated model is
-\[
-\hat{y}=326{,}878.57-60{,}631.95\,x,
-\]
-with
-\[
-t(93)=-8.2148, \qquad p=1.2038\times 10^{-12}, \qquad 95\%\,\text{CI}=[-75{,}288.76,\,-45{,}975.13].
-\]
-The slope is therefore strongly negative and statistically different from zero.
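+The Python check reproduces the reported coefficients and inference values: the slope is strongly negative and statistically different from zero, and a linear trend in \texttt{study/alpha} explains about 42\% of the variation in \texttt{eval/revenue\_mean}.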

 % \input{../build/concatenated_code}
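
The slope-test verification described in the new appendix text can be sketched as follows. This is a minimal illustration, not the authors' script: the run-level pairs are not part of this diff, so the sketch starts from the summary moments printed in the removed derivation ($n$, $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, SSE). The function name `slope_test_from_moments` and the dictionary keys are hypothetical. With the raw pairs available, `scipy.stats.linregress(x, y)` would return the slope, intercept, correlation, two-sided p-value, and slope standard error directly.

```python
import math

from scipy import stats


def slope_test_from_moments(n, x_bar, y_bar, s_xx, s_xy, sse, level=0.95):
    """Two-sided t test for the OLS slope, computed from summary moments.

    Hypothetical helper: the inputs are the sample moments quoted in the
    removed derivation, not the raw (alpha, revenue) pairs.
    """
    df = n - 2                              # residual degrees of freedom
    slope = s_xy / s_xx                     # beta_1 hat
    intercept = y_bar - slope * x_bar       # beta_0 hat
    mse = sse / df                          # residual variance estimate
    se_slope = math.sqrt(mse / s_xx)        # standard error of beta_1 hat
    t_stat = slope / se_slope               # test statistic under H0: beta_1 = 0
    p_value = 2.0 * stats.t.sf(abs(t_stat), df)
    t_crit = stats.t.ppf(0.5 + level / 2.0, df)
    ssr = slope * s_xy                      # regression sum of squares
    r_squared = ssr / (ssr + sse)           # SST = SSR + SSE in simple OLS
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_stat": t_stat,
        "p_value": p_value,
        "ci_low": slope - t_crit * se_slope,
        "ci_high": slope + t_crit * se_slope,
        "r_squared": r_squared,
    }


# Summary moments as printed in the removed derivation.
result = slope_test_from_moments(
    n=95,
    x_bar=0.3810526316,
    y_bar=303774.6096,
    s_xx=7.0508947368,
    s_xy=-427509.4691,
    sse=35721896352.27375,
)

for name, value in result.items():
    print(f"{name:>10s} = {value:.6g}")
```

Working from moments keeps the check independent of how the runs are stored. The printed values should agree with the fitted line, $t(93)$, confidence interval, and $R^2$ quoted in the appendix up to the rounding of the input moments.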