chore: refactoring, proper citation and updating on data and refs and appendices

2026-03-15 21:15:23 +01:00
parent 0521a63937
commit 375445f260
5 changed files with 82 additions and 118 deletions

@@ -46,15 +46,44 @@ These behavioral signals serve as inputs for a Distributionally Robust Reinforce
\appendix
\section{Terminology}
\begin{description}
\item[Agent $A$] An actor of non-human nature, powered by an LLM.
\item[Human $H$] An individual human with some job to be done.
\item[Actor $\theta$] Defines a type of class which is either Agent or Human and has the capability to carry out actions on a web platform.
\item[Platform] Any web-based platform which serves an interface to a collection of items that can be purchased, each at some price $p_i$.
\item[Behavioral Model] A mathematical model predicting what action comes after a series of prior actions.
\item[LLM] Large Language Model served by some provider with the abstracted capability of tool calling.
\item[TPU] Tensor Processing Unit which is a unique kind of chip architecture developed by Google.
\item[Trajectory] Defined as a series of unspecified length, collecting data on states of some object over time.
% TODO: maybe define other things in a similarly succinct manner
\item[Agent $A$] A non-human actor, typically an LLM-driven system that executes web actions toward a goal.
\item[Human $H$] A human participant interacting with the platform to complete a task.
\item[Actor Type $\theta$] A latent class parameter describing whether a session is generated by a human or an agent profile.
\item[Platform] A web interface exposing purchasable items and their offered prices.
\item[Session $s$] A bounded interaction record tied to one actor and one session identifier.
\item[Event $e_{s,k}$] A single interaction tuple in a session, including action, item target, and timestamp.
\item[Trajectory $\tau_s$] The ordered sequence of events generated within a session.
\item[Demand Proxy $\hat{q}_{t,i}$] A weighted aggregate of observed actions used as an operational substitute for latent demand.
\item[Action Weight Function $\omega(a)$] A mapping from action type to signal strength in the demand proxy.
\item[True Demand $d(p;\theta)$] The latent purchase response as a function of price and actor type.
\item[Contamination $\alpha$] The proportion of agent-generated traffic in the session mixture.
\item[Non-stationary Noise $\epsilon_t$] Time-varying residual variation not explained by the actor mixture.
\item[Pricing Policy $\pi(\tau)$] A function mapping observed interaction history to an offered price.
\item[Cost of Information (COI)] The expected premium above the minimum viable price induced by the pricing policy.
\item[COI Leakage] A per-quote penalty term modeling information revealed to reconnaissance behavior.
\item[First-Order Statistic $p_{(1)}$] The minimum observed price among multiple independent queries.
\item[Transition Kernel $\mathcal{T}$] A Markov transition matrix over behavioral states or actions.
\item[Separability] The degree to which human and agent sessions can be distinguished from behavior alone.
\item[KL Divergence $D_{KL}$] A relative-entropy measure used to compare session transition structure against class prototypes.
\item[Divergence Scores $\Delta_H,\Delta_A$] Session-level distances to human and agent transition centroids.
\item[Weak Agent Probability $f(\tau)$] A session-level score estimating the likelihood that a trajectory is agent-generated.
\item[Contamination Generator $\mathcal{G}(\alpha)$] A simulator component that injects synthetic agent trajectories to reach a target mixture level.
\item[Stackelberg Game] A leader-follower formulation where the platform sets prices and demand responds.
\item[Ambiguity Set $\mathcal{U}_{\epsilon}$] A set of plausible demand distributions considered under distributional uncertainty.
\item[Wasserstein Ball] A distance-bounded neighborhood around an empirical distribution used in robust optimization.
\item[DR-RL] Distributionally Robust Reinforcement Learning for policies trained against worst-case distributional shifts.
\item[Nominal Contamination $\alpha_0$] The baseline contamination level around which robust candidates are evaluated.
\item[Robustness Radius $\epsilon_\alpha$] The local interval width used for inner minimization over contamination scenarios.
\item[Query-Tax Surrogate] A constant leakage proxy assigning fixed penalty to suspected reconnaissance queries.
\item[Revelation Surrogate] A leakage proxy based on $-\log\pi(p\mid\tau)$ to penalize highly informative quotes.
\item[Limbo Stack] The alternating game-history buffer that stores leader price moves and follower demand responses.
\item[UX Index] A bounded user-experience metric tracked to evaluate policy side effects on legitimate users.
\item[Look-to-Book Ratio] The ratio of search-like interactions to completed purchases, used as an operational contamination indicator.
\item[Hybrid Kappa-Lambda Architecture] A data design combining streaming ingestion with offline and batch learning loops.
\item[MDP / POMDP] Sequential decision models with full observability (MDP) or partial observability (POMDP).
\item[Behavioral Model] A model predicting what action is likely to follow from prior actions.
\item[LLM] Large Language Model served through an inference provider with tool-use capability.
\item[TPU] Tensor Processing Unit, a specialized accelerator architecture developed by Google.
\end{description}
\section{Aggregate Compute Budget Derivation}
@@ -81,109 +110,19 @@ v4 & 64 & 275 & $64 \times 275 = 17{,}600$ \\
Converting to petaFLOPS: $160{,}320\;\text{TFLOPS} = 160.32\;\text{PFLOPS} \approx 160\;\text{PFLOPS}$. This is the theoretical peak under sustained BF16 arithmetic; realized throughput depends on memory bandwidth utilization and inter-chip communication overhead, but the figure serves as a useful upper bound for provisioning decisions.
\section{Full Slope-Test Derivation: Revenue vs. Contamination}
\section{Slope-Test Verification: Revenue vs. Contamination}
\label{app:alpha_revenue_slope}
This appendix gives the full ordinary least squares computation for the linear effect of contamination on mean revenue. Let
This appendix provides a compact verification of the slope result reported in the main results section. Using the same run-level pairs $x_i=\texttt{study/alpha}_i$ and $y_i=\texttt{eval/revenue\_mean}_i$ ($n=95$), we re-checked the ordinary least squares slope test in Python with standard test routines (SciPy two-sided $t$ test for the slope).
\[
x_i = \texttt{study/alpha}_i, \qquad y_i = \texttt{eval/revenue\_mean}_i,
\widehat{y}=326{,}878.57-60{,}631.95\,x,
\]
and fit
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i=1,\dots,n.
\]
The slope test is
\[
H_0: \beta_1 = 0 \qquad \text{vs.} \qquad H_1: \beta_1 \neq 0.
t(93)=-8.2148,\qquad p=1.2038\times 10^{-12},\qquad R^2=0.4205,\qquad 95\%\,\text{CI}_{\beta_1}=[-75{,}288.76,\,-45{,}975.13].
\]
\subsection{Sample moments and least-squares coefficients}
From the data:
\[
n=95, \qquad \bar{x}=0.3810526316, \qquad \bar{y}=303{,}774.6096.
\]
Define
\[
S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad S_{xy}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}).
\]
Numerically,
\[
S_{xx}=7.0508947368, \qquad S_{xy}=-427{,}509.4691.
\]
The least-squares slope and intercept are
\[
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-427{,}509.4691}{7.0508947368} = -60{,}631.9460,
\]
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 303{,}774.6096 - (-60{,}631.9460)(0.3810526316) = 326{,}878.5722.
\]
So the fitted line is
\[
\hat{y} = 326{,}878.5722 - 60{,}631.9460\,x.
\]
\subsection{Residual variance and standard error of the slope}
For each observation, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ and $e_i = y_i - \hat{y}_i$. The residual sum of squares is
\[
\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = 35{,}721{,}896{,}352.27375.
\]
With $df=n-2=93$,
\[
\mathrm{MSE} = \frac{\mathrm{SSE}}{n-2} = \frac{35{,}721{,}896{,}352.27375}{93} = 384{,}106{,}412.3900.
\]
The slope standard error is
\[
SE(\hat{\beta}_1) = \sqrt{\frac{\mathrm{MSE}}{S_{xx}}} = \sqrt{\frac{384{,}106{,}412.3900}{7.0508947368}} = 7{,}380.8038.
\]
\subsection{t-statistic, p-value, and confidence interval}
Under $H_0: \beta_1=0$,
\[
t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-60{,}631.9460}{7{,}380.8038} = -8.2148,
\]
with $df=93$. The two-sided p-value is
\[
p = 2\,\Pr\left(T_{93} \ge |t|\right) = 1.2038\times 10^{-12}.
\]
The 95\% confidence interval is
\[
\hat{\beta}_1 \pm t_{0.975,93}\,SE(\hat{\beta}_1)
= -60{,}631.9460 \pm (1.9858)(7{,}380.8038)
= [-75{,}288.7597,\,-45{,}975.1324].
\]
\subsection{Effect size and fit statistics}
The sample correlation is $r=-0.64846$, so
\[
R^2 = r^2 = 0.4205.
\]
Hence, 42.05\% of the variation in \texttt{eval/revenue\_mean} is explained by a linear trend in \texttt{study/alpha}.
The slope interpretation is direct:
\[
\hat{\beta}_1 = -60{,}631.9460 \quad \Rightarrow \quad \Delta y \approx -6{,}063.19 \text{ for } \Delta x = +0.1.
\]
From $\alpha=0$ to $\alpha=0.8$, the fitted drop is
\[
0.8\times (-60{,}631.9460) = -48{,}505.5568,
\]
so the model predicts roughly $48{,}506$ lower revenue units on average.
\subsection{Conclusion of the slope test}
The estimated model is
\[
\hat{y}=326{,}878.57-60{,}631.95\,x,
\]
with
\[
t(93)=-8.2148, \qquad p=1.2038\times 10^{-12}, \qquad 95\%\,\text{CI}=[-75{,}288.76,\,-45{,}975.13].
\]
The slope is therefore strongly negative and statistically different from zero.
The Python verification reproduces the reported coefficients and inference values, so the slope-test results are consistent with standard ordinary least squares inference.
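As an illustration only, the arithmetic behind the slope test can be re-derived from summary statistics alone. The sketch below is not the original verification script: it assumes the sample moments quoted in the full derivation ($n$, $\bar{x}$, $\bar{y}$, $S_{xx}$, $S_{xy}$, SSE) and the critical value $t_{0.975,93}=1.9858$, and the function name \texttt{slope\_test} is hypothetical. It recomputes the slope, intercept, standard error, $t$ statistic, confidence interval, and $R^2$ using only the Python standard library.

```python
import math

# Summary statistics as quoted in the full derivation (n = 95 runs).
N = 95
X_BAR = 0.3810526316          # mean of study/alpha
Y_BAR = 303_774.6096          # mean of eval/revenue_mean
S_XX = 7.0508947368           # sum of squared deviations of x
S_XY = -427_509.4691          # sum of cross-deviations of x and y
SSE = 35_721_896_352.27375    # residual sum of squares
T_CRIT = 1.9858               # t_{0.975, 93}, taken from the text (assumed)


def slope_test(n, x_bar, y_bar, s_xx, s_xy, sse, t_crit):
    """Recompute OLS slope inference from summary statistics only."""
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    mse = sse / (n - 2)                  # residual variance, df = n - 2
    se_slope = math.sqrt(mse / s_xx)     # standard error of the slope
    t_stat = slope / se_slope
    half_width = t_crit * se_slope
    ssr = slope * s_xy                   # regression sum of squares
    r_squared = ssr / (ssr + sse)        # SSR / SST with SST = SSR + SSE
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_stat": t_stat,
        "ci_low": slope - half_width,
        "ci_high": slope + half_width,
        "r_squared": r_squared,
    }


result = slope_test(N, X_BAR, Y_BAR, S_XX, S_XY, SSE, T_CRIT)
for name, value in result.items():
    print(f"{name:>10s} = {value:,.4f}")
```

The two-sided $p$-value is omitted because it requires the CDF of the Student $t$ distribution, which the standard library does not provide; with SciPy available, a survival-function call such as \texttt{scipy.stats.t.sf} on $|t|$ with 93 degrees of freedom, doubled, would be one way to obtain it.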
% \input{../build/concatenated_code}