fixing typos and consistency

This commit is contained in:
2025-12-15 21:13:05 +01:00
parent 53ccbc8289
commit 0a1149b460
4 changed files with 31 additions and 31 deletions

View File

@@ -1,21 +1,21 @@
\section{Literature Review}
To better understand all wedges of the work, we must start by exploring the nature of agents and agentic computer use and web automation, complementing that with economic reasoning and strategic interaction. The final surface to cover, leads us to data-driven dynamic pricing under uncertainty. The key technical risk is not ``agents buying things'' per se, but agents shaping the behavioral and demand signals that downstream pricing systems consume and depend on. The introduction of these mediating actor entities into economic systems, is further creating a threat of false-name bidding, which prior research has explored in a trading context. Dynamic pricing assumes demand proxies are behaviorally meaningful, while bot detection aims at security and access control. The missing bridge is a principled framework for separating non-human reconnaissance from genuine human demand expression and integrating that separation into pricing heuristics without degrading legitimate user experience (in our research tracked by the user-experience index). This gap, is what our contribution aims to address - particularly for the afformentioned stakeholder groups.
To better understand all wedges of the work, we must start by exploring the nature of agents and agentic computer use and web automation, complementing that with economic reasoning and strategic interaction. The final surface to cover, leads us to data-driven dynamic pricing under uncertainty. The key technical risk is not ``agents buying things'' per se, but agents shaping the behavioral and demand signals that downstream pricing systems consume and depend on. The introduction of these mediating actor entities into economic systems, is further creating a threat of false-name bidding, which prior research has explored in a trading context. Dynamic pricing assumes demand proxies are behaviorally meaningful, while bot detection aims at security and access control. The missing bridge is a principled framework for separating non-human reconnaissance from genuine human demand expression and integrating that separation into pricing heuristics without degrading legitimate user experience (in our research tracked by the user-experience index). This gap, is what our contribution aims to address - particularly for the aforementioned stakeholder groups.
\subsection{Agent Taxonomy and Definitions}
An agent in the contex of artificial inteligence is generally defined by anything that can reason and act uppon observations of its environments (collected through some sensory inputs) and carry out actions trough effectors. Moreover, a rational agent is an entity that is capable of perceiving the world around them and taking actions to advance specified goals. This definition by \cite{Russell} is further developed in an economic context by \cite{Parkes2015}, suggesting AI research attempts to construct a synthetic \textit{homo economicus}, which may also be termed \textit{machina economicus}.
An agent in the context of artificial intelligence is generally defined by anything that can reason and act upon observations of its environments (collected through some sensory inputs) and carry out actions through effectors. Moreover, a rational agent is an entity that is capable of perceiving the world around them and taking actions to advance specified goals. This definition by \cite{Russell} is further developed in an economic context by \cite{Parkes2015}, suggesting AI research attempts to construct a synthetic \textit{homo economicus}, which may also be termed \textit{machina economicus}.
A specific class or taxon of this \textit{machina economicus}, the Large Language Model (LLM) agent, is defined as an autonomous system capable of achieving goals and adapting post-training, often without needing explicit code or fundamental model changes. \cite{Xia2025}
We must however acknowledge the current SOTA as presented by OSWORLD simulations in \cite{Xie} have demonstrated that multi-modal tasks across desktop and web interaction modes, have a top-performing score of only 12.24\% sucess, whereas humans have a higher 72\% sucess rate. This weakness matters for this research because it clarifies the near-term threat model: practical exploitation does not require a fully competent ``computer assistant'', only enough automation to perform high-volume reconnaissance actions (search/filter/open product pages, probe availability/price boundaries) that can contaminate behavioral signals. With the expected growth of these capabilities, this threat only becomes more perilous to revenue management systems.
We must however acknowledge the current SOTA as presented by OSWORLD simulations in \cite{Xie} have demonstrated that multi-modal tasks across desktop and web interaction modes, have a top-performing score of only 12.24\% success, whereas humans have a higher 72\% success rate. This weakness matters for this research because it clarifies the near-term threat model: practical exploitation does not require a fully competent ``computer assistant'', only enough automation to perform high-volume reconnaissance actions (search/filter/open product pages, probe availability/price boundaries) that can contaminate behavioral signals. With the expected growth of these capabilities, this threat only becomes more perilous to revenue management systems.
We model an agent session as producing some events with lower in-session conversion levels relative to humans, this we state in our assumption that $P(\text{purchase} \vert A) \ll P\text{purcahse} \vert H)$ but with a potentially higher volatility in $\hat{q}$, which we observe through the look-to-book metrics in our simulation.
We model an agent session as producing some events with lower in-session conversion levels relative to humans, this we state in our assumption that $P(\text{purchase} \vert A) \ll P(\text{purchase} \vert H)$ but with a potentially higher volatility in $\hat{q}$, which we observe through the look-to-book metrics in our simulation.
\subsection{Economic Agents: From Homo Economicus to Machina Economicus}
Existing behvarioal economic models tend to be criticized for the assumption of rational behavior, as is embodied in the term of homo economicus. The definition of a machina economicus by \cite{Parkes2015} is quite apropriate for our case, particularly becuase these assumptions of rationality have been argued to be a very adequeate reference for AI research by \cite{Varian}. For modeling this behavior, the trajectories of these agents can be formally defined to be partially observable Markov decision processes. \cite{Xie} Agents are however not to be confused with web-bots which have previously been known as automated software applications or scrapers which are set with a purpose of carrying out specific tasks on the internet, without a higher level of internal judgement. \cite{Imperva2025} In our research, we refer to this actor simply as an Agent belonging to the distribution $A$.
Existing behavioral economic models tend to be criticized for the assumption of rational behavior, as is embodied in the term of homo economicus. The definition of a machina economicus by \cite{Parkes2015} is quite appropriate for our case, particularly because these assumptions of rationality have been argued to be a very adequate reference for AI research by \cite{Varian}. For modeling this behavior, the trajectories of these agents can be formally defined to be partially observable Markov decision processes. \cite{Xie} Agents are however not to be confused with web-bots which have previously been known as automated software applications or scrapers which are set with a purpose of carrying out specific tasks on the internet, without a higher level of internal judgement. \cite{Imperva2025} In our research, we refer to this actor simply as an Agent belonging to the distribution $A$.
This economic framing also helps separate two related but distinct phenomena of agents as buyers (changing market demand composition), and agents as information gatherers (changing the observed itneractions used by pricing/recommendation systems). The thesis focuses on the second, where information acquisition strategically precedes purchase execution.
This economic framing also helps separate two related but distinct phenomena of agents as buyers (changing market demand composition), and agents as information gatherers (changing the observed interactions used by pricing/recommendation systems). The thesis focuses on the second, where information acquisition strategically precedes purchase execution.
\subsection{Problem Evidence and Market Impact}
@@ -30,7 +30,7 @@ The statistical issue of contamination in dynamic pricing systems that observe d
Documented instances of agent-driven market disruptions - Quantitative evidence of pricing manipulation - Case studies from affected industries
\subsection{Theoretical Foundations: Economic Prallels}
\subsection{Theoretical Foundations: Economic Parallels}
Economic foundations: relating the problem to options pricing theory. Cost of Information (COI) concept and its relevance

View File

@@ -2,14 +2,14 @@
\subsection{Problem Formalization}
In a commercial setting we can collect behavioral data on any actors interactions within a platform we have control over. This collection is done through sessions such each session belongs to an actor class $Y_s \in \{H,A\}$ with randomized assignment. This lets us build a trajectory $\tau_s$ of observable interaction events $\tau_s=(e_{s,1},\ldots,e_{s,L_s})$ where each event is defined as $e_{s,k} = (a_{s,k},i_{s,k},t_{s,k})$. We additionally define the rest of the components in each event acordingly:
In a commercial setting we can collect behavioral data on any actors interactions within a platform we have control over. This collection is done through sessions such each session belongs to an actor class $Y_s \in \{H,A\}$ with randomized assignment. This lets us build a trajectory $\tau_s$ of observable interaction events $\tau_s=(e_{s,1},\ldots,e_{s,L_s})$ where each event is defined as $e_{s,k} = (a_{s,k},i_{s,k},t_{s,k})$. We additionally define the rest of the components in each event accordingly:
\begin{itemize}
\item $a_{s,k} \in A$ where $A = \{\text{page\_view}, \text{view\_item\_page}, \text{add\_item}\}$. % TODO: translate all from /home/velocitatem/Documents/Projects/PHANTOM/web/src/lib/events.ts into this latex
\item $a_{s,k} \in \mathcal{A}$ where $\mathcal{A} = \{\text{page\_view}, \text{view\_item\_page}, \text{add\_item}\}$. % TODO: translate all from /home/velocitatem/Documents/Projects/PHANTOM/web/src/lib/events.ts into this latex
\item $i_{s,k} \in \{1, \ldots, N\}$ which is the product association per-event (if applicable).
\item $t_{s,k}$ which is the timestamp mapped to the session.
\end{itemize}
What the platform observes is the interaction logs $\tau_s$, price query logs and purchase signals. It is important to note that our pricing pipeline works not direclty with observed true human demand, but rather a behavioral proxy which is a composite of $H+A$.
What the platform observes is the interaction logs $\tau_s$, price query logs and purchase signals. It is important to note that our pricing pipeline works not directly with observed true human demand, but rather a behavioral proxy which is a composite of $q_H+q_A$.
Each interaction $i$ gives us some information about the willingness to pay ($v$) of a given customer, which we can try to estimate and measure against the true baseline.
@@ -20,7 +20,7 @@ $$
This lets us formalize the quality of our proxy $\hat{v}$ about the true $v$ from observing $\tau$ from any session $s$
\subsubsection{Proxy Definition for Demand Estimation}
Our proxy estimator is a critical component which has direct impact all downstream tasks, we start with a mapping of weights $\omega: A \to \mathbb{R}_+$ where for an epoch $t$ and product $i$ the observed demand proxy of a session $s$ looks like:
Our proxy estimator is a critical component which has direct impact all downstream tasks, we start with a mapping of weights $\omega: \mathcal{A} \to \mathbb{R}_+$ where for an epoch $t$ and product $i$ the observed demand proxy of a session $s$ looks like:
$$
\hat{q}_{t,i} = \sum_{e_{s,k}\in t} \omega(a_{s,k}) \cdot \mathbf{1} [i_{s,k}=i]
$$
@@ -41,16 +41,16 @@ Mathematical demonstration and validation of the COI and citation backed evidenc
\subsection{System Architecture}
In order for our research to have grounding in interactions we built a robust e-commerce web-platform. We initially conducted a survey of the leading platforms of airlines and hotel booking sites to identify the specific interface patterns that effectively manage complex travel data. Our analysis revealed a clear industry standard: while both sectors rely on tabbed service selection and left-sidebar filtering to streamline navigation, they diverge in result presentation: airlines utilize visual date-price bars and multi-step wizards to optimize for logistical transparency, whereas hotel platforms leverage image-led cards and scarcity triggers to drive emotional engagement and urgency. Our web framework defines a highly agnostic boilerplane which can be seeded with any data-modality with an easy-to-tailor pattern, which we leverage to define a \texttt{hotel} and \texttt{airline} mode. Both modes are then individually deployed via an envrionment level argument which adjusts the proxy routing with a custom middleware inside next.js to render only the desired mode. The purpose of this was to create a baseline adaptable to any use-case or desired commercial application.
In order for our research to have grounding in interactions we built a robust e-commerce web-platform. We initially conducted a survey of the leading platforms of airlines and hotel booking sites to identify the specific interface patterns that effectively manage complex travel data. Our analysis revealed a clear industry standard: while both sectors rely on tabbed service selection and left-sidebar filtering to streamline navigation, they diverge in result presentation: airlines utilize visual date-price bars and multi-step wizards to optimize for logistical transparency, whereas hotel platforms leverage image-led cards and scarcity triggers to drive emotional engagement and urgency. Our web framework defines a highly agnostic boilerplate which can be seeded with any data-modality with an easy-to-tailor pattern, which we leverage to define a \texttt{hotel} and \texttt{airline} mode. Both modes are then individually deployed via an environment level argument which adjusts the proxy routing with a custom middleware inside next.js to render only the desired mode. The purpose of this was to create a baseline adaptable to any use-case or desired commercial application.
\
The architectuer of this platform begins with the deployed web-apps posting interaction data to our backend which processes them and stores each ingested interaction into a kafka cluster. This serves as our data reservoir tracking and associating each interaction with its session and importantly with which experiment it belongs to. Not only do we track the behavioral interactions, but our pricing provider micro-service, once called by the frontend reports the observed/queried price-product into kafka. This kafak cluster is subscribed to by our pipeline which is configured on a schedule in Airflow, with the possibility of manual trigger. The final stage of the pricing pipeline, submits computed dyanmic pricing results into a redis database for quick updates which is then read by the pricing provider and displayed on the webapp. This is a very generic end-to-end mechanism which is applicable to a variety of different e-commerce tasks.
The architecture of this platform begins with the deployed web-apps posting interaction data to our backend which processes them and stores each ingested interaction into a kafka cluster. This serves as our data reservoir tracking and associating each interaction with its session and importantly with which experiment it belongs to. Not only do we track the behavioral interactions, but our pricing provider micro-service, once called by the frontend reports the observed/queried price-product into kafka. This kafka cluster is subscribed to by our pipeline which is configured on a schedule in Airflow, with the possibility of manual trigger. The final stage of the pricing pipeline, submits computed dynamic pricing results into a redis database for quick updates which is then read by the pricing provider and displayed on the webapp. This is a very generic end-to-end mechanism which is applicable to a variety of different e-commerce tasks.
\subsubsection{DevOps Principles}
\subsubsection{Online Dynamic Pricing}
The dynamic pricing done is handled by a pipeline which computes a demand estimate on a per-product basis of a specific window of the data, defined by the period $T$ which by default is 5 mintues. This dynamic pricing pipeline computes a demand estimate vector $\hat{q} \in \mathbb{R}^N$ by a weighted sum of interactions for each product, it additionally computes a price elasticity vector $\hat{\epsilon}$ in the same dimensions as our demand. The final features matrix is of the size $N \times 2$ which we translate to a new price vector $\hat{p} \in \mathbb{R}^N$. The transformation that governs this dynamic pricing is a very simple surge-based pricing (a special case of our later defined policy $\pi$):
The dynamic pricing done is handled by a pipeline which computes a demand estimate on a per-product basis of a specific window of the data, defined by the period $T$ which by default is 5 minutes. This dynamic pricing pipeline computes a demand estimate vector $\hat{q} \in \mathbb{R}^N$ by a weighted sum of interactions for each product, it additionally computes a price elasticity vector $\hat{\epsilon}$ in the same dimensions as our demand. The final features matrix is of the size $N \times 2$ which we translate to a new price vector $\hat{p} \in \mathbb{R}^N$. The transformation that governs this dynamic pricing is a very simple surge-based pricing (a special case of our later defined policy $\pi$):
\begin{equation}
\hat{p}_i = \begin{cases}
@@ -67,10 +67,10 @@ where $p_0 \in \mathbb{R}^N$ is the base price vector (which is seeded into our
The experimentation begins with the design of goals, with careful consideration to assure a uniform spanning across different variables within each product-architecture of either the hotel or airline platforms. Our crafted collection of goals (jobs to be done) is then tracked in a postgress database with one table to track goals and another table to track different experiment runs, and their associated goals in a experiment-goal one-to-one relationship.
The purpose of this effort to gather data on interactions, is the first half of our research. With this collected data on behavioral characteristics, enhanced by our feature augmentation, we can create distribution separation into two bins $y \in \{A,H\}$ with a certain probability $p$ dependent on the session-specific features. To adddres the second loop of our system, we use this gained capability of discrimination to enhance the learner design involved in our surrogate dynamic pricing task which simulates an independent dynamic pricing scenario under which we can train a more controlled policy with the ability to account for true demand signals under conditions of contamination from non-human actors.
The purpose of this effort to gather data on interactions, is the first half of our research. With this collected data on behavioral characteristics, enhanced by our feature augmentation, we can create distribution separation into two bins $y \in \{A,H\}$ with a certain probability $p$ dependent on the session-specific features. To address the second loop of our system, we use this gained capability of discrimination to enhance the learner design involved in our surrogate dynamic pricing task which simulates an independent dynamic pricing scenario under which we can train a more controlled policy with the ability to account for true demand signals under conditions of contamination from non-human actors.
Our approach can be well summarized by a three-stage division, first we intend to observe and \textit{vectorize} the behavioral interaction data from our experiments, we then develop the separability which helps us deepen the semantic understanding of the behavioral patterns. Finally we use our newly gained learner to leverage a defensive mechanism within the simulation stage of a controled dynamic pricing loop.
Our approach can be well summarized by a three-stage division, first we intend to observe and \textit{vectorize} the behavioral interaction data from our experiments, we then develop the separability which helps us deepen the semantic understanding of the behavioral patterns. Finally we use our newly gained learner to leverage a defensive mechanism within the simulation stage of a controlled dynamic pricing loop.
\begin{figure}[ht]
\resizebox{\columnwidth}{!}{%
@@ -85,44 +85,44 @@ Study methodology and approach. Data acquisition strategy. Defined objectives an
\subsection{Discriminative Model Design}
With data collected from our platform we have a series of observed interactions, with each interaction having a mapping to a specific \texttt{sessionId} and \texttt{experimentId} which allows us to join all components of the experiment design into an information rich feature vector for each session in our observed data. To develop more explicitly the demand estimation, we propose a decomposition of the proxy $\hat{q_t}$ into two latent components:
With data collected from our platform we have a series of observed interactions, with each interaction having a mapping to a specific \texttt{sessionId} and \texttt{experimentId} which allows us to join all components of the experiment design into an information rich feature vector for each session in our observed data. To develop more explicitly the demand estimation, we propose a decomposition of the proxy $\hat{q}_t$ into two latent components:
$$
\hat{q_t} = \hat{q_t}^H + \hat{q_t}^A
\hat{q}_t = \hat{q}_t^H + \hat{q}_t^A
$$
\subsubsection{Feature Development}
The schema of our features is developed in \cref{tab:features} which shows the diferent types of features we produce in order to train our model to understand the origin of the traffic and to which distribution it belongs to. The features can be computed on a rolling basis of each session, for online deployment, however for our purposes it is currently compouted uniquely for each \texttt{sessionId} in our historical data.
The schema of our features is developed in \cref{tab:features} which shows the different types of features we produce in order to train our model to understand the origin of the traffic and to which distribution it belongs to. The features can be computed on a rolling basis of each session, for online deployment, however for our purposes it is currently computed uniquely for each \texttt{sessionId} in our historical data.
\input{chapters/feature_table.tex}
The problem we have is constrained by two fronteirs, one is extreme (paranoid) detection which includes methods such as CAPTCHA or more mechanical solutions to traffic blocking and detection. % TODO: talk about more methodologies here
On the other hand, a more lax system without detection (myopic) defines the lower bound of performance for our solution. Our goal is to achieve a paretto optimal detection sytem which creates a balance across the dimension of performance aswell as a more subjective but none the less important user experience index. To meaure or approach to this optimal solution we define a strong evalutation platform to compare our solutions to this learning task. Following the no free lunch theorem we must be proliphic in our approach to finding the correct method.
The problem we have is constrained by two frontiers, one is extreme (paranoid) detection which includes methods such as CAPTCHA or more mechanical solutions to traffic blocking and detection. % TODO: talk about more methodologies here
On the other hand, a more lax system without detection (myopic) defines the lower bound of performance for our solution. Our goal is to achieve a Pareto optimal detection system which creates a balance across the dimension of performance as well as a more subjective but none the less important user experience index. To measure our approach to this optimal solution we define a strong evaluation platform to compare our solutions to this learning task. Following the no free lunch theorem we must be prolific in our approach to finding the correct method.
\subsection{Dynamic Pricing Algorithm Analysis}
Deep dive into how the algorithm works, different kinds and justification for chosen appraoches + agent impact modeling and quantification.
Deep dive into how the algorithm works, different kinds and justification for chosen approaches + agent impact modeling and quantification.
\subsection{Reinforcement Learning Formulation}
We define our surrogate commercial environment within which we can accurately control for all the variables such as the true demand, providing a clear transaparency of the entire system. We start with a product catalogue of size $N$ with random supply initialization per-product. At every step the commercial simulation recieves a price vector $p$ according to which we simulate a set of interactions $I$ with a certain proportion $l_a$ of agents contributing interactions. The interactions serve as a proxy to estimating the true demand $q(p)$ which is composed of two separate demand generators $q_A(p)$ and $q_H(p)$.
On top of this our gym environment has a built demand estimator callback which is defined individually by each pricing engine. This engine is constructed to interact with the gym environment with the gyn environment at each step running a cycle via the comercial environment, creating an observation of all the interactions $I$ and a baseline vector which tells us the ground truth of demand, sales statistic and revenue. The engine is then responsible for learning the pricing policy prociding a pricing vector $p_{t+1}$ motivated by a per-episode summary reward composed by.
We define our surrogate commercial environment within which we can accurately control for all the variables such as the true demand, providing a clear transparency of the entire system. We start with a product catalogue of size $N$ with random supply initialization per-product. At every step the commercial simulation receives a price vector $p$ according to which we simulate a set of interactions $I$ with a certain proportion $l_a$ of agents contributing interactions. The interactions serve as a proxy to estimating the true demand $q(p)$ which is composed of two separate demand generators $q_A(p)$ and $q_H(p)$.
On top of this our gym environment has a built demand estimator callback which is defined individually by each pricing engine. This engine is constructed to interact with the gym environment with the gym environment at each step running a cycle via the commercial environment, creating an observation of all the interactions $I$ and a baseline vector which tells us the ground truth of demand, sales statistic and revenue. The engine is then responsible for learning the pricing policy providing a pricing vector $p_{t+1}$ motivated by a per-episode summary reward composed by.
$$
R = \text{revenue} - \text{COI} - \text{UX friction index}
$$
As part of our reward engineering we want to take inot account the cost of information in our reward with a weight.
As part of our reward engineering we want to take into account the cost of information in our reward with a weight.
Our pricing engine can be modeled by the mapping:
$$
\pi : \mathbb{R}^N_+ \times H_t \to \mathbb{R}_+^N
\pi : \mathbb{R}^N_+ \times \mathcal{H}_t \to \mathbb{R}_+^N
$$
where $H_t$ is the history and state we keep track of, allowing us to define a progression of prices as $p_{t+1} \gets \pi(\hat{q}_t,H_t)$. With this we can establish that $\tau$ influences $p_{t+1}$ through $\hat{q}_t$
where $\mathcal{H}_t$ is the history and state we keep track of, allowing us to define a progression of prices as $p_{t+1} \gets \pi(\hat{q}_t,\mathcal{H}_t)$. With this we can establish that $\tau$ influences $p_{t+1}$ through $\hat{q}_t$
How do we define the state space, action space and reward function breakdown and algorithm benchmarking.

View File

@@ -31,7 +31,7 @@
}
\begin{abstract}
The primary objective of this thesis is to develop and validate pricing heuristics that protect e-commerce platforms from systematic exploitation by Large Language Model (LLM) agents within dynamic pricing environments. As AI agents increasingly mediate consumer transactions, they enable users to circumvent the Cost of Information (the price premium accumulated through demand signal expression) by conducting reconnaissance in isolated sessions before executing purchases through clean sessions at base prices. This research will make an anticipatory contribution by adapting recommendation system methodologies to distinguish between genuine human browsing behaviour and agent-orchestrated information gathering, thereby enabling pricing systems to maintain margin integrity without degrading the user experience for legitimate customers or getting rid of leads generated by LLMs.
The primary objective of this thesis is to develop and validate pricing heuristics that protect e-commerce platforms from systematic exploitation by Large Language Model (LLM) agents within dynamic pricing environments. As AI agents increasingly mediate consumer transactions, they enable users to circumvent the Cost of Information (the price premium accumulated through demand signal expression) by conducting reconnaissance in isolated sessions before executing purchases through clean sessions at base prices. This research will make an anticipatory contribution by adapting recommendation system methodologies to distinguish between genuine human browsing behavior and agent-orchestrated information gathering, thereby enabling pricing systems to maintain margin integrity without degrading the user experience for legitimate customers or getting rid of leads generated by LLMs.
\end{abstract}
\maketitle
@@ -57,8 +57,8 @@ The primary objective of this thesis is to develop and validate pricing heuristi
\appendix
\section{Terminology}
\begin{description}
\item[Agent $a$] An actor of non-human nature, powered by an LLM.
\item[Human $h$] An individual human with some job to be done.
\item[Agent $A$] An actor of non-human nature, powered by an LLM.
\item[Human $H$] An individual human with some job to be done.
\end{description}
\input{../build/concatenated_code}