work summary and notes

2026-06-01 00:53:36 +00:00 · 2026-04-09 19:58:08 +02:00
parent 895a004807
commit 51de0cbdc5
6 changed files with 23 additions and 6 deletions
--- a/paper/src/chapters/04-results.tex
+++ b/paper/src/chapters/04-results.tex
@@ -1,4 +1,10 @@
 \section{Results}
+\label{sec:results}
+
+% The gap we target is not detection for its own sake but whether behavioral signals can support pricing decisions once agent traffic is present. This section follows the supporting questions in \cref{sec:research_questions}: we first establish session-level distinguishability (behavioral evidence and a rank test), then estimate how contamination shifts revenue in a controlled sweep, and finally compare robust and baseline policies under factorial training with COI and revenue readouts. The ordering is deliberate---each stage feeds the next so that separability, contamination effects, and policy outcomes form one connected line of evidence.
+
+The gap we target is not detection for its own sake; our aim is to understand whether behavioral signals can support pricing decisions once agent traffic is present. This section pieces together the path laid out in \cref{sec:research_questions}. We first establish distinguishability (behavioral evidence and a rank test), then estimate how contamination shifts revenue in an adversarial environment, and finally compare robust and baseline pricing under factorial training.
+
 \begin{figure}[ht]
    \centering
    \input{chapters/figures/supra/supra.tex}
@@ -40,7 +46,7 @@ We report two preliminary stages before the full factorial interpretation. First

 \subsubsection{The Impact of Contamination on Revenue}

-The contamination--revenue slope is estimated on a controlled cohort (single sweep, baseline policy, $n_{\text{products}}=100$, $n=95$). In this setting, contamination $\alpha$ is set exogenously by the experiment, so the slope identifies the within-sweep causal effect of contamination on revenue under fixed policy and environment settings.
+The contamination--revenue slope is estimated on a controlled cohort (single sweep, baseline policy, $n_{\text{products}}=100$, $n=95$). In this setting, contamination $\alpha$ is set exogenously by the experiment, so the slope identifies the within-sweep causal effect of contamination on revenue under fixed policy and environment settings. These results address our second research question \hyperlink{sq2}{\textbf{SQ2}} (\textit{Theoretical Impact}) from \cref{sec:research_questions}.

 \begin{table}[ht]
 \centering
@@ -95,11 +101,11 @@ In our complete training runs we logged $\approx 180$ days of net compute time.


 \subsection{Interpretation and Insights}
-The Mann-Whitney result ($p<0.001$) confirms that per-session divergence gaps distinguish the two actor classes with near-zero overlap in rank ordering. This is the condition required for distinguishability to act as a useful control signal in the pricing loop rather than just an auxiliary classifier score.
+The Mann-Whitney result ($p<0.001$) confirms that per-session divergence gaps distinguish the two actor classes with near-zero overlap in rank ordering. This is the condition required for distinguishability to act as a useful control signal in the pricing loop rather than just an auxiliary classifier score. This result bears directly on our first research question \hyperlink{sq1}{\textbf{SQ1}} (\textit{Distinguishability}) from \cref{sec:research_questions}.

 The first calibration and paired benchmark runs additionally confirm three practical points aligned with the thesis. First, the control loop is reproducible end-to-end (training, evaluation, artifact generation) across algorithms and contamination levels. Second, policy class materially changes price trajectories and resulting COI/revenue profiles under identical environment settings. Third, objective improvements from robustness are regime-dependent in the current baseline, which is consistent with the thesis claim that contamination-aware pricing needs explicit calibration rather than a one-size-fits-all penalty.

-We also note that maximizing revenue in isolation can favor aggressive high-price behavior, even in our early runs, the non-robust aggregate shows slightly higher mean COI and margin. For this reason, all subsequent reporting in this thesis is interpreted on a multi-metric basis (objective, revenue, COI, and stability), and not by revenue alone.
+We also note that maximizing revenue in isolation can favor aggressive high-price behavior: even in our early runs, the non-robust aggregate shows slightly higher mean COI and margin. For this reason, all subsequent reporting in this thesis is interpreted on a multi-metric basis (objective, revenue, COI, and stability), and not by revenue alone. These observations bear directly on our third research question \hyperlink{sq3}{\textbf{SQ3}} (\textit{Robust Mitigation}) from \cref{sec:research_questions}.


 \subsection{Anomalies}