From e0b074161b227e9d12991113d5bae6ceedc9f936 Mon Sep 17 00:00:00 2001
From: Daniel Rosel
Date: Mon, 2 Feb 2026 12:08:24 +0100
Subject: [PATCH] fix: typo

---
 paper/src/chapters/03-methodology.tex | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper/src/chapters/03-methodology.tex b/paper/src/chapters/03-methodology.tex
index 540ae68..ff859e9 100644
--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -300,7 +300,7 @@ where $R(p, d)$ is the revenue function and $\lambda$ weighs the penalty for inf
 Another proposed formulation of the optimal policy would be to adjust the ambiguity set dyanmically over the live computed divergence where $\epsilon(\Delta_H)$ to adjust the ball around or estimator according to each behavioral signal emited through a given trajctory. We state this as a possibility but do not peruse it due to literature suggesting that wesserstine methods do not require absolute continuity and are better with ``black swans'' \parencite{kuhn_wasserstein_2024}.
 \subsubsection{Actor Implementation}
-In our simulation, the "Follower" is implemented as a set of Actors. Each Actor is initialized with a type $\theta$ which samples a specific demand curve $d(p; \theta)$ from the latent distribution. This formalization ensures that our DR-RL agent does not overfit to a single deterministic demand function but learns a policy robust to the distributional uncertainty defined by $\mathcal{U}_\epsilon$.
+In our simulation, the ``follower'' is implemented as a set of Actors. Each Actor is initialized with a type $\theta$ which samples a specific demand curve $d(p; \theta)$ from the latent distribution. This formalization ensures that our DR-RL agent does not overfit to a single deterministic demand function but learns a policy robust to the distributional uncertainty defined by $\mathcal{U}_\epsilon$.
 As part of our reward engineering we think about the UX factor ($UX \in [0,1]$) whic his our proxy for user experience degradation, this is computed as a mixture of contribution from the separability model metric of $\frac{1}{\text{Specificity}}$.
@@ -320,7 +320,7 @@ We also need to think about a policy like taxation to the agents Strategy-Proof
 We now present the complete pricing mechanism that integrates the behavioral separability, contamination estimation, and robust optimization components developed in the preceding sections. Algorithm~\ref{alg:phantom_loop_clean} formalizes the defensive pricing loop as a Stackelberg game where the platform (leader) sets prices and the aggregate demand (follower) responds through observed session trajectories.
 \begin{algorithm}[t]
-\caption{PHANTOM defensive pricing loop (bachelor-thesis level)}
+\caption{PHANTOM defensive pricing loop}
 \label{alg:phantom_loop_clean}
 \DontPrintSemicolon
 \SetKwInOut{Input}{Input}\SetKwInOut{Output}{Output}