work summary and notes

2026-07-16 01:53:37 +00:00 · 2026-04-09 19:58:08 +02:00
parent 895a004807
commit 51de0cbdc5
6 changed files with 23 additions and 6 deletions
--- a/paper/src/chapters/01-intro.tex
+++ b/paper/src/chapters/01-intro.tex
@@ -7,6 +7,7 @@
 %% \end{figure}

 \section{Introduction}
+\label{sec:introduction}

 In this paper we present an exploration and defense against the presence of new commercial entities in digitally powered platforms, preserving market equilibrium in the age of AI. This research establishes the following contributions: definition and formalization of non-human transactors in e-commerce platforms, development of a testing-ground for capturing the behavioral essence of these transactors across a large variety of digital systems, construction of a discriminative model (to prove distinguishability) as a guiding teacher for downstream mitigation of contamination by non-human entities, translation of such learned distinguishability into existing dynamic pricing machine learning loops, and finally establishment of a high-level KPI-affecting causal effect and cost-saving framework for the future of internet commerce in the presence of such non-human learners.

@@ -26,13 +27,14 @@ Dynamic pricing systems, as presented by \textcite{mueller_low-rank_2019}, often
 We formally define interaction data as coming from some actor which can either be an agent ($A$) or human ($H$). For purposes of this research, an agent is an algorithmic loop with the ability to access a web platform and perform actions such as clicks, scrolls, and input field fills. The loop terminates when the internal large language model judges the provided task definition as complete. A detailed breakdown can be found in \cref{algagent-loop}.

 \subsection{Research Questions}
+\label{sec:research_questions}

 This dissertation is organized around one main research question and three supporting pillar questions:
 \begin{enumerate}
    \item[\textbf{Main RQ}] How can dynamic pricing systems preserve margin integrity when transaction orchestration is increasingly mediated by non-human agents?
-    \item[\textbf{SQ1}] \textit{Distinguishability}: Can agent and human sessions be reliably distinguished from behavioral interaction signals alone, without relying on network-level or device fingerprinting?
-    \item[\textbf{SQ2}] \textit{Theoretical Impact}: What is the formal relationship between agent contamination levels and the erosion of pricing power in dynamic pricing systems?
-    \item[\textbf{SQ3}] \textit{Robust Mitigation}: How can pricing policies be constructed to maintain margin integrity under unknown and non-stationary levels of agent contamination?
+    \item[\textbf{SQ1}] \hypertarget{sq1}{}\textit{Distinguishability}: Can agent and human sessions be reliably distinguished from behavioral interaction signals alone, without relying on network-level or device fingerprinting?
+    \item[\textbf{SQ2}] \hypertarget{sq2}{}\textit{Theoretical Impact}: What is the formal relationship between agent contamination levels and the erosion of pricing power in dynamic pricing systems?
+    \item[\textbf{SQ3}] \hypertarget{sq3}{}\textit{Robust Mitigation}: How can pricing policies be constructed to maintain margin integrity under unknown and non-stationary levels of agent contamination?
 \end{enumerate}


@@ -65,3 +67,6 @@ Extract final result $r$ from terminal state\;


 The previously described goal of distinguishability allows us to formulate a task which entails taking raw interaction data for either actor and creating a composite demand estimate $\hat{q}$. We propose a robust optimization objective defined in our methodology, transforming the pricing problem into a form of distributionally robust optimization \parencite{kuhn_distributionally_2025} in which the learner guards against adversarial contamination in observed demand \emph{distributions}. The decision rule (in the policy) must perform when the data-generating mechanism is not a single known distribution but any member of an ambiguity set described only partially. Here that mechanism is a mixture whose weight and components need not be stationary.
+
+% The contributions of this thesis are best understood as a dependency chain centered on dynamic pricing under agent-mediated contamination. The work begins with a formal account of why non-human reconnaissance threatens pricing power, then constructs a controlled platform capable of generating the interaction data needed to study that threat empirically. On top of that substrate, session behavior is modeled to determine whether human and agent traffic can be separated from behavioral traces alone. The resulting contamination estimate is then translated into the pricing loop itself, where it serves as an operational signal for robust control under distributional uncertainty. The breadth of the thesis is therefore a consequence of the problem structure: the theoretical, behavioral, systems, and control components are not separate projects, but successive requirements of a single argument.
+Our work's contributions are best understood as a dependency chain centered on dynamic pricing. The work begins with a formal account of why a non-human mediator threatens pricing power; we then construct a platform capable of generating the interaction data needed to study that threat. On top of that \textit{substrate} we build behavioral models to determine whether human and agent traffic can be separated. The resulting contamination estimate is then translated into the pricing core itself, where it serves as a key signal for robust control under distributional uncertainty. The breadth of the thesis is therefore a consequence of the problem structure: the theoretical, behavioral, systems, and control components are not separate projects, but successive requirements of a single argument.
--- a/paper/src/chapters/02-literature-review.tex
+++ b/paper/src/chapters/02-literature-review.tex
@@ -1,4 +1,5 @@
 \section{Literature Review}
+\label{sec:literature_review}

 To situate the work we review agents and agentic computer use, web automation, economic reasoning, and strategic interaction, then turn to data-driven dynamic pricing under uncertainty. The main technical risk is not ``agents buying things'' in isolation but agents reshaping the behavioral and demand signals on which downstream pricing depends. Related litigation is already underway---for example \textcite{noauthor_amazoncom_2026} under the Computer Fraud and Abuse Act. Mediating actors surface classic concerns such as false-name bidding \parencite{yokoo_effect_2004} or pseudonymous re-entry which can whitewash reputation and weaken defenses \parencite{feldman_free-riding_2004}. Dynamic pricing assumes demand proxies are behaviorally meaningful, whereas classical bot detection targets security and access control. The gap we target is a principled way to separate non-human reconnaissance from genuine human demand expression and to fold that signal into pricing without degrading legitimate users (we track harm with a user-experience index), for the stakeholders named in the introduction.

--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -1,4 +1,5 @@
 \section{Methodology}
+\label{sec:methodology}

 % Extra notes and clarifications: we observed some humans and get their transition probabilities between event types
 % We modify behavioral profiles of transition matrices with price elasticity matrices generated by sample valuations of a distributing.
--- a/paper/src/chapters/04-results.tex
+++ b/paper/src/chapters/04-results.tex
@@ -1,4 +1,10 @@
 \section{Results}
+\label{sec:results}
+
+% The gap we target is not detection for its own sake but whether behavioral signals can support pricing decisions once agent traffic is present. This section follows the supporting questions in \cref{sec:research_questions}: we first establish session-level distinguishability (behavioral evidence and a rank test), then estimate how contamination shifts revenue in a controlled sweep, and finally compare robust and baseline policies under factorial training with COI and revenue readouts. The ordering is deliberate---each stage feeds the next so that separability, contamination effects, and policy outcomes form one connected line of evidence.
+
+In our work, the gap we target is not detection for its own sake. Our aim is to understand which behavioral signals can support pricing decisions once agent traffic is present. This section follows the path laid out in \cref{sec:research_questions}: we first establish distinguishability (behavioral evidence and a rank test), then estimate how contamination shifts revenue in an adversarial environment, and finally compare robust and baseline pricing under factorial training.
+
 \begin{figure}[ht]
    \centering
    \input{chapters/figures/supra/supra.tex}
@@ -40,7 +46,7 @@ We report two preliminary stages before the full factorial interpretation. First

 \subsubsection{The Impact of Contamination on Revenue}

-The contamination--revenue slope is estimated on a controlled cohort (single sweep, baseline policy, $n_{\text{products}}=100$, $n=95$). In this setting, contamination $\alpha$ is set exogenously by the experiment, so the slope identifies the within-sweep causal effect of contamination on revenue under fixed policy and environment settings.
+The contamination--revenue slope is estimated on a controlled cohort (single sweep, baseline policy, $n_{\text{products}}=100$, $n=95$). In this setting, contamination $\alpha$ is set exogenously by the experiment, so the slope identifies the within-sweep causal effect of contamination on revenue under fixed policy and environment settings. These results address our second supporting question, \hyperlink{sq2}{\textbf{SQ2}} (\textit{Theoretical Impact}), from \cref{sec:research_questions}.

 \begin{table}[ht]
 \centering
@@ -95,11 +101,11 @@ In our complete training runs we logged $\approx 180$ days of net compute time.


 \subsection{Interpretation and Insights}
-The Mann-Whitney result ($p<0.001$) confirms that per-session divergence gaps distinguish the two actor classes with near-zero overlap in rank ordering. This is the condition required for distinguishability to act as a useful control signal in the pricing loop rather than just an auxiliary classifier score.
+The Mann-Whitney result ($p<0.001$) confirms that per-session divergence gaps distinguish the two actor classes with near-zero overlap in rank ordering. This is the condition required for distinguishability to act as a useful control signal in the pricing loop rather than just an auxiliary classifier score. This result bears directly on our first supporting question, \hyperlink{sq1}{\textbf{SQ1}} (\textit{Distinguishability}), from \cref{sec:research_questions}.

 The first calibration and paired benchmark runs additionally confirm three practical points aligned with the thesis. First, the control loop is reproducible end-to-end (training, evaluation, artifact generation) across algorithms and contamination levels. Second, policy class materially changes price trajectories and resulting COI/revenue profiles under identical environment settings. Third, objective improvements from robustness are regime-dependent in the current baseline, which is consistent with the thesis claim that contamination-aware pricing needs explicit calibration rather than a one-size-fits-all penalty.

-We also note that maximizing revenue in isolation can favor aggressive high-price behavior, even in our early runs, the non-robust aggregate shows slightly higher mean COI and margin. For this reason, all subsequent reporting in this thesis is interpreted on a multi-metric basis (objective, revenue, COI, and stability), and not by revenue alone.
+We also note that maximizing revenue in isolation can favor aggressive high-price behavior: even in our early runs, the non-robust aggregate shows slightly higher mean COI and margin. For this reason, all subsequent reporting in this thesis is interpreted on a multi-metric basis (objective, revenue, COI, and stability), and not by revenue alone. This bears directly on our third supporting question, \hyperlink{sq3}{\textbf{SQ3}} (\textit{Robust Mitigation}), from \cref{sec:research_questions}.


 \subsection{Anomalies}
--- a/paper/src/chapters/05-discussion.tex
+++ b/paper/src/chapters/05-discussion.tex
@@ -1,4 +1,5 @@
 \section{Discussion}
+\label{sec:discussion}

 % TODO: Gpdr here

--- a/paper/src/chapters/06-conclusion.tex
+++ b/paper/src/chapters/06-conclusion.tex
@@ -1,8 +1,11 @@
 \section{Conclusion}
+\label{sec:conclusion}

 This thesis examined reinforcement-learning policies for dynamic pricing when a fraction of traffic is orchestrated by non-human agents intent on extracting information before purchase. We introduced COI-oriented metrics, a behavioral distinguishability layer, and a distributionally robust training loop; empirical runs show where robustness helps and where it must be tuned.

 \subsection{Summary of contributions}
+Our work comprises a chain of interdependent components which together yield measurable results. To give a clear picture, we outline the specific contribution of each. The theoretical component formalizes why agent-mediated reconnaissance erodes pricing power, the behavioral component establishes that such contamination is detectable from interaction traces alone, the control component translates that distinguishability into a robust pricing mechanism, and the systems component provides the controlled experimental environment required to observe, test, and reproduce these effects.
+
 \begin{itemize}
    \item TPU-accelerated parallelization of the behavioral simulation and reinforcement learning pipeline, making large factorial sweeps tractable.
    \item Formalization of non-human transaction orchestration in e-commerce as a distinct source of contamination in dynamic pricing systems.