mdp additionally

2026-07-15 17:43:36 +00:00 · 2026-01-10 11:55:53 +01:00
parent ea5e432672
commit b5a067e2f9
3 changed files with 8 additions and 1 deletions
--- a/paper/src/chapters/02-literature-review.tex
+++ b/paper/src/chapters/02-literature-review.tex
@@ -31,7 +31,7 @@ When dynamic pricing algorithms operate on highly contaminated or noisy data, th



-Early hints of exploration of prices in a standard English auction explored in \cite{varian_economic_nodate} which hints at exploration of prices in a sequential manner, which leads to a marginally different cost to the bidder than the reservation price of the seller. This is a setting in which there is no cost incured by the buyer for their actions or exploring prices in the market. They propose that any agent responsable for the pricing of a good must be imune to dynamic strategies which might extract private information from a market. A key take-away which relates to the Vickery auction mechanism (also called a \textit{direct mechanism}) suggests that not only would defenses against such exploitation be necessary, but the construction of a mechanism in which revelation of the true willingness to pay is the dominant strategy for commerce.
+An early treatment of price exploration in a standard English auction appears in \cite{varian_economic_1995}, which hints at bidders exploring prices sequentially, leading to a cost to the bidder that differs only marginally from the seller's reservation price. In this setting, the buyer incurs no cost for their actions or for exploring prices in the market. The paper proposes that any agent responsible for pricing a good must be immune to dynamic strategies that might extract private information from a market. A key takeaway, which relates to the Vickrey auction mechanism (also called a \textit{direct mechanism}), is that not only are defenses against such exploitation necessary, but so is the construction of a mechanism in which revealing the true willingness to pay is the dominant strategy.

 As in classical revenue-maximizing auctions \cite{roughgarden_cs364a_2013}, we assume that the human actor in our system has a private valuation $v$, which we formally draw from distributions defined later. Importantly, the agent proxy has no mechanism to convey this private information into the demand data, which directly impacts the pricing systems.

--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -187,6 +187,13 @@ To develop a robust pricing agent, we require a simulation environment capable o
 \subsubsection{GOFAI-Based Separability}
 We employ Good Old-Fashioned AI (GOFAI) heuristics to generate initial weak labels for separability. We define a set of rule-based predicates $\phi_j: \tau \to \{0, 1\}$ to partition the dataset $\mathcal{D}$ into high-confidence sets $\mathcal{D}_H$ and $\mathcal{D}_A$. We construct a distinct MDP for each behavioral profile (human and agent) and, from these, compute the divergence $D_{KL}$ between the two.

+\begin{figure}[ht]
+    \centering
+    \includegraphics[width=0.8\textwidth]{chapters/mdp_human.pdf}
+    \caption{Markov Decision Process visualization illustrating the behavioral transition dynamics for human and agent actor profiles. The state space and transition probabilities are learned from observed session trajectories to enable generative contamination.}
+    \label{fig:mdp_viz}
+\end{figure}
+
 \subsubsection{Transition Probability Estimation}
 For both subsets, we model the session dynamics as a Markov Decision Process (MDP) and estimate the transition kernel $\mathcal{T}$. The probability of transitioning to state $s'$ given state $s$ is estimated via maximum likelihood:
 \begin{equation}
--- a/paper/src/chapters/mdp_human.pdf
+++ b/paper/src/chapters/mdp_human.pdf
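
The transition-probability estimation described in 03-methodology.tex can be illustrated with a minimal Python sketch. It assumes sessions are lists of discrete state labels and uses the standard count-based maximum-likelihood estimate of $P(s' \mid s)$. The function names, the example state labels, and the `eps` smoothing floor are hypothetical and not taken from the paper; the paper's own equation may differ in detail.

```python
from collections import Counter, defaultdict
import math


def estimate_transitions(sessions):
    """Count-based maximum-likelihood estimate of P(s' | s).

    `sessions` is an iterable of state sequences (assumed format).
    Returns a nested dict: kernel[s][s_next] = probability.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each observed consecutive pair (s, s_next).
        for s, s_next in zip(session, session[1:]):
            counts[s][s_next] += 1

    kernel = {}
    for s, next_counts in counts.items():
        total = sum(next_counts.values())
        kernel[s] = {s2: c / total for s2, c in next_counts.items()}
    return kernel


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) for two next-state distributions given as dicts.

    `eps` is a hypothetical floor that avoids log(0) when q assigns
    zero mass to a state that p supports.
    """
    return sum(
        prob * math.log(prob / max(q.get(state, 0.0), eps))
        for state, prob in p.items()
        if prob > 0.0
    )


# Hypothetical sessions for the high-confidence human and agent subsets.
human_sessions = [["browse", "cart", "buy"], ["browse", "browse", "cart"]]
agent_sessions = [["browse", "buy"], ["browse", "buy"]]

human_kernel = estimate_transitions(human_sessions)
agent_kernel = estimate_transitions(agent_sessions)
```

Comparing `human_kernel["browse"]` with `agent_kernel["browse"]` via `kl_divergence` gives a per-state divergence; how these are aggregated into a single $D_{KL}$ is left open here because the diff does not specify it.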