From a3dc5125df447602abcd9b50d40a96fb86f876c5 Mon Sep 17 00:00:00 2001
From: Daniel Rosel
Date: Thu, 9 Apr 2026 20:25:37 +0200
Subject: [PATCH] fix: typos and flow

---
 paper/src/chapters/03-methodology.tex | 6 +++---
 paper/src/chapters/05-discussion.tex  | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/paper/src/chapters/03-methodology.tex b/paper/src/chapters/03-methodology.tex
index 17b0071..11dc265 100644
--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -140,9 +140,9 @@ This result implies that standard pricing policies $\pi$ cannot extract the same
 In order for our research to have grounding in interactions we built a robust e-commerce web-platform. In this framing Kappa represents streamed processing and Lambda batch operations as is given by terminology in big-data processing. We initially conducted a survey of the leading platforms of airlines and hotel booking sites to identify the specific interface patterns that effectively manage complex travel data. To better understand the playing field, we collected artifacts on design across various airlines and hotels. While both sectors rely on tabbed service selection and left-sidebar filtering to streamline navigation, they diverge in result presentation: airlines utilize visual date-price bars and multi-step wizards to optimize for logistical transparency, whereas hotel platforms leverage image-led cards and scarcity triggers to drive emotional engagement and urgency. Our web framework defines a highly agnostic boilerplate which can be seeded with any data-modality with an easy-to-tailor pattern, which we leverage to define a \texttt{hotel} and \texttt{airline} mode. Both modes are then individually deployed via an environment-level argument which adjusts the proxy routing with custom middleware in Next.js to render only the desired mode. The purpose of this was to create a baseline adaptable to any use-case or desired commercial application.
-The architecture begins with deployed web applications posting interaction data to a backend that stores each record in Apache Kafka. Kafka acts as the reservoir linking sessions to experiments. Behavioral events and, separately, price observations from the pricing-provider microservice (invoked by the frontend) land in Kafka topics. A scheduled Airflow pipeline (with manual triggers) consumes the stream and the final pricing stage writes vectors to Redis for low-latency reads by the provider and display in the client. This design pattern allows us to generalize to other commercial settings, where Kafka is used for durability and replay, Redis for serving and quick queries. We invested in this stack to keep runs reproducible and to limit extraneous variance so the same skeleton applies across e-commerce settings
+The architecture begins with deployed web applications posting interaction data to a backend that stores each record in Apache Kafka. Kafka acts as the reservoir linking sessions to experiments. Behavioral events and, separately, price observations from the pricing-provider microservice (invoked by the frontend) land in Kafka topics. A scheduled Airflow pipeline (with manual triggers) consumes the stream and the final pricing stage writes vectors to Redis for low-latency reads by the provider and display in the client. This design pattern allows us to generalize to other commercial settings, where Kafka is used for durability and replay, Redis for serving and quick queries. We invested in this stack to keep runs reproducible and to limit extraneous variance so the same skeleton applies across e-commerce settings.
-\paragraph{Public Web Artifact} We transition the Kappa like architecture of the data collection to a Lambda architecture for actual learning in a surrogate environment. This allows us to move faster on data which is provided and helps us create a feedback loop for production deployment. To support further research in this intersection of fields we release P4P \footnote{\url{https://github.com/velocitatem/p4p}} as a public repository providing the interaction layer of the PHANTOM framework. This provides a configurable storefront which can be tailored to any commercial setting with a standardized session-level event tracking. We document the API adapters and expected schemas for pricing providers and log ingestion services. The repository is intended for controlled experimentation and method replication rather than production commerce deployment.
+\paragraph{Public Web Artifact} We transition the Kappa-like architecture of the data collection to a Lambda architecture for actual learning in a surrogate environment. This allows us to move faster on data which is provided and helps us create a feedback loop for production deployment. To support further research in this intersection of fields we release P4P \footnote{\url{https://github.com/velocitatem/p4p}} as a public repository providing the interaction layer of the PHANTOM framework. This provides a configurable storefront which can be tailored to any commercial setting with a standardized session-level event tracking. We document the API adapters and expected schemas for pricing providers and log ingestion services. The repository is intended for controlled experimentation and method replication rather than production commerce deployment.
 \paragraph{Public Dataset} For reproducibility of the behavioral analysis and distinguishability experiments, we also release the interaction dataset used in this thesis as \textit{WhoClickedIt}. The dataset is hosted on Hugging Face \footnote{\url{https://huggingface.co/datasets/velocitatem/whoclickedit}} and is distributed as one flattened event sheet (\texttt{whoclicked.csv}) with explicit labels (\texttt{actor\_type}, \texttt{is\_agent}, and \texttt{record\_type}). The dataset card on that page documents the schema, collection process, and known limitations.
@@ -498,7 +498,7 @@ In practice, we parameterize this with a session-level leakage term:
 \begin{equation}
 \text{COI}_{\text{leak}}(p,\tau') = f(\tau')\cdot \text{InfoValue}(p,\tau')
 \end{equation}
-where $f(\tau')$ is the weak agent probability and $\text{InfoValue}$ is implemented either as a constant query-tax surrogate or as a revelation surrogate $-\log\pi(p\mid\tau')$. This is the surprise of the probability of a certain price-setting probability. Essentially, we proxy the leakage term as a surprise of the price our policy is setting, weighted by the contamination estimate. Appendix~\ref{app:revelation_log} expands on why the logarithm is used in the revelation surrogate.
+where $f(\tau')$ is the weak agent probability and $\text{InfoValue}$ is implemented either as a constant query-tax surrogate or as a revelation surrogate $-\log\pi(p\mid\tau')$. This is the surprise of a certain price-setting probability. Essentially, we proxy the leakage term as a surprise of the price our policy is setting, weighted by the contamination estimate. Appendix~\ref{app:revelation_log} expands on why the logarithm is used in the revelation surrogate.
 The inner minimization selects the contamination candidate that makes the penalized reward smallest, so the outer policy update faces the worst plausible leakage scenario inside the ambiguity set rather than an average case.
diff --git a/paper/src/chapters/05-discussion.tex b/paper/src/chapters/05-discussion.tex
index b13486c..7e7cb5f 100644
--- a/paper/src/chapters/05-discussion.tex
+++ b/paper/src/chapters/05-discussion.tex
@@ -21,6 +21,6 @@ Behavior-based pricing raises predictable ethics questions when models are opaqu
 We balance human and agent sessions near one-to-one so cohorts are comparable despite different population sizes. The row-level dataset still contains thousands of events.
 % Rapid change in agent capabilities and user expectations induces model drift; the UX term in reward shaping was included partly to penalize policies that sacrifice legitimate users for short-run revenue. Reinforcement learning adds its own risks---reward hacking and limited interpretability---which matter when policies touch live revenue; deployment would require monitoring and constraints beyond what we exercised here.
-With the exponential growth in capability of agents aswell as user expectations, a degree of model drift is expected in this setting. The computational requirements for continuous extraction of margin as demonstrated by our work are required by the persistent speed of the market. Reinforcement learning that sacrifices legitimate user experience for short run revenue does not hold up in the long run. Reward hacking, to which pricing algorithms are not impervious due to their limited interpretability is a significant risk for a company if live revenue is in play. Deployment requires consistent monitoring and constraints beyond what was done as exercise in this work.
+With the exponential growth in capability of agents aswell as user expectations, a degree of model drift is expected in this setting. The computational requirements for continuous extraction of margin as demonstrated by our work are required by the persistent speed of the market. Reinforcement learning that sacrifices legitimate user experience for short run revenue does not hold up in the long run. Reward hacking, to which pricing algorithms are not impervious due to their limited interpretability, is a significant risk for a company if live revenue is in play. Deployment requires consistent monitoring and constraints beyond what was done as an exercise in this work.
 % \subsection{Implications of Findings} Interpretation of results and altenrative scenarios with broader market implications.