catchup: rogue scripts

2026-07-16 01:53:37 +00:00 · 2026-02-27 12:45:46 +01:00
parent e8a9716f69
commit 5444a4ea13
27 changed files with 6908 additions and 2 deletions
--- a/paper/src/chapters/slacberger.tex
+++ b/paper/src/chapters/slacberger.tex
@@ -0,0 +1,69 @@
+
+\section{Problem Formulation: A Stackelberg Game Approach}
+\label{sec:math_formulation}
+
+We formalize the interaction between the dynamic pricing system and its traffic, human and non-human, as a \textit{Stackelberg game} (leader-follower) with incomplete information. This framework captures the hierarchical nature of the problem: the Platform (Leader) commits to a pricing policy, and the Actors (Followers), both Humans and Agents, observe the resulting prices and react strategically.
+
+\subsection{The Players and Objectives}
+
+Let $t \in \{1, \dots, T\}$ denote discrete time steps. At each step, the interaction involves the following players:
+
+\paragraph{1. The Leader (The Platform)}
+The e-commerce platform acts as the leader and commits to a pricing policy $\pi$ that maps states to prices. At time $t$, given a state $s_t \in \mathcal{S}$ (encoding inventory, time of day, and interaction history), the platform sets a price $p_t \in [p_{\min}, p_{\max}]$.
+
+The platform's objective is to maximize expected cumulative revenue from genuine human transactions while limiting the distortion that agent interactions introduce into its demand estimates.
+
+\paragraph{2. The Followers (The Demand Mixture)}
+The observed demand is not a monolithic signal but a mixture of two distinct populations with divergent objective functions. Let $u$ denote an incoming actor. The type of the actor $\theta \in \{H, A\}$ is a latent variable, where $H$ denotes a Human and $A$ denotes an Agent.
+
+\begin{itemize}
+    \item \textbf{The Human ($H$):} Acts as a \textit{myopic utility maximizer}. A human $i$ has a private valuation $v_i$ for the product. They execute a purchase decision $d_i \in \{0, 1\}$ based on the consumer surplus:
+    \begin{equation}
+        d_i(p_t) = \mathbb{I}(v_i - p_t \geq 0)
+    \end{equation}
+    where $\mathbb{I}(\cdot)$ is the indicator function. The aggregate human demand $q_H(p_t)$ follows a standard downward-sloping demand curve $D(p_t)$.
+
+    \item \textbf{The Agent ($A$):} Acts as an \textit{information maximizer} (reconnaissance). The agent does not intend to purchase at the displayed price $p_t$ unless an arbitrage condition is met. Instead, it generates interaction events (queries) to estimate the platform's pricing function $f$. The agent's reward $R_A$ is the information gain from a query, net of its cost:
+    \begin{equation}
+        R_A(p_t) = \mathcal{H}(\mathcal{P}) - \mathcal{H}(\mathcal{P} \mid p_t) - c_{\mathrm{query}}
+    \end{equation}
+    where $\mathcal{H}(\mathcal{P})$ is the entropy of the agent's belief about the price distribution $\mathcal{P}$, $\mathcal{H}(\mathcal{P} \mid p_t)$ is the entropy remaining after observing $p_t$, and $c_{\mathrm{query}}$ is the marginal cost of a query (assumed $\approx 0$ for LLM-based agents).
+\end{itemize}
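+
+As a brief sketch of why the human threshold rule yields a downward-sloping curve, assume (an added assumption for illustration, not part of the model above) that valuations $v_i$ are drawn i.i.d. from a continuous distribution with CDF $F_V$, and normalize $q_H$ to the purchase probability of a single human visitor. The symbol $F_V$ is hypothetical and introduced only here. Then
+\[
+    q_H(p_t) = \mathbb{E}\left[ d_i(p_t) \right] = \Pr(v_i \geq p_t) = 1 - F_V(p_t),
+\]
+which is non-increasing in $p_t$ because $F_V$ is non-decreasing. Under this assumption, the shape of $D(p_t)$ is determined entirely by the valuation distribution.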
+
+\subsection{The Demand Contamination Model}
+
+% MAYBE alpha has to be \lambda which we also need to formally define still
+
+The core difficulty in this setting is that the platform observes only the aggregate interaction volume $\hat{q}_t$, which is a contaminated signal. Let $\alpha_t \in [0, 1]$ represent the proportion of traffic generated by agents at time $t$. The observed signal is:
+
+\begin{equation}
+    \hat{q}_t(p_t) = (1 - \alpha_t) \cdot q_H(p_t) + \alpha_t \cdot q_A(p_t) + \epsilon_t
+\end{equation}
+
+where:
+\begin{itemize}
+    \item $q_H(p_t)$ is the \textit{true signal} (conversion intent).
+    \item $q_A(p_t)$ is the \textit{adversarial noise} (reconnaissance queries).
+    \item $\epsilon_t$ is random market noise.
+\end{itemize}
+
+Crucially, $q_A(p_t)$ contributes volume but not revenue, and its value to the platform is often inversely related to that of $q_H(p_t)$: agents may flood the system with queries during high-volatility periods to map price boundaries, artificially inflating $\hat{q}_t$ without converting.
+
+\subsection{The Optimization Objective: Robust Revenue}
+
+Standard dynamic pricing algorithms (e.g., those based on Thompson Sampling or UCB) implicitly assume $\alpha_t = 0$ and estimate demand as $\hat{D}(p) \approx \mathbb{E}[\hat{q} \mid p]$. In the presence of agents ($\alpha_t > 0$), this estimator is biased, leading to the \textit{Cost of Information} (COI) defined in Section 3.2.
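+
+The size of this bias can be sketched directly from the contamination model. Assuming, for illustration only, that $\mathbb{E}[\epsilon_t \mid p_t] = 0$ and that $\alpha_t$ is treated as fixed given $p_t$, taking expectations of the observed signal gives
+\[
+    \mathbb{E}[\hat{q}_t \mid p_t] - q_H(p_t) = \alpha_t \left( q_A(p_t) - q_H(p_t) \right),
+\]
+so the naive estimator is unbiased only if $\alpha_t = 0$ or agents happen to query at the human rate. With hypothetical values $\alpha_t = 0.3$, $q_H(p_t) = 0.10$, and $q_A(p_t) = 0.50$, the expected observation is $0.7 \cdot 0.10 + 0.3 \cdot 0.50 = 0.22$, more than double the true human demand.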
+
+We propose a robust optimization objective. The platform seeks a pricing policy $\pi^*$ that maximizes worst-case human revenue over a statistically plausible set $\mathcal{U} \subseteq [0, 1]$ of contamination rates:
+
+\begin{equation}
+    \pi^* = \argmax_{\pi} \sum_{t=1}^T \mathbb{E}_{s_t} \left[ \min_{\alpha_t \in \mathcal{U}} \left( p_t \cdot \hat{q}_t(p_t \mid \theta = H) \right) - \lambda \cdot \mathcal{L}_{\mathrm{detect}}(\hat{q}_t) \right]
+\end{equation}
+
+Here:
+\begin{itemize}
+    \item The first term, $p_t \cdot \hat{q}_t(p_t \mid \theta = H)$, is the revenue attributed strictly to the estimated human segment; this estimate depends on the assumed contamination rate $\alpha_t$, which is why the minimum is taken over $\mathcal{U}$.
+    \item $\mathcal{L}_{\mathrm{detect}}$ is a penalty for failing to separate the human and agent distributions (the cost of confusion).
+    \item $\lambda$ is a hyperparameter that trades off revenue exploitation against robust detection.
+\end{itemize}
+
+This formulation casts the pricing problem as a \textit{distributionally robust optimization} (DRO) problem, in which the learner must guard against adversarial perturbations (agent traffic) of the observed demand distribution.
--- a/paper/src/graphics/gcp.png
+++ b/paper/src/graphics/gcp.png
--- a/paper/src/graphics/gcp.webp
+++ b/paper/src/graphics/gcp.webp