noisy formulations

2026-07-15 17:43:36 +00:00 · 2025-12-15 19:38:25 +01:00
parent b5461b9277
commit 5fb5f5b858
1 changed files with 35 additions and 7 deletions
--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -2,13 +2,31 @@

 \subsection{Problem Formalization}

-Mathematical formalization of agent-induced pricing distortions. Formal definition of potential loss mechanisms $\alpha D$
+In a commercial setting we can collect behavioral data on any actor's interactions within a platform we control. The data is collected per session, such that each session $s$ belongs to an actor class $Y_s \in \{H,A\}$ (human or agent) with randomized assignment. This lets us build a trajectory $\tau_s$ of observable interaction events, $\tau_s=(e_{s,1},\ldots,e_{s,L_s})$, where each event is defined as $e_{s,k} = (a_{s,k},i_{s,k},t_{s,k})$.
+\begin{itemize}
+\item $a_{s,k} \in A$
+\end{itemize}
+$A = \{\text{page\_view}, \text{view\_item\_page}, \text{add\_item}\}$. % TODO: translate all from /home/velocitatem/Documents/Projects/PHANTOM/web/src/lib/events.ts into this latex
+Each interaction $i$ gives us some information about the willingness to pay ($v$) of a given customer, which we can try to estimate and measure against the true baseline.

-We consider a business across time during which we have an evolving vector $p_t \in \mathbb{R}^N$ where $N$ is the number of products in our catalogue. our price vector is directly dependent on a demand function $q_t$ which we define as a linear method of a price elasticity matrix $B_t$. This is the same setup that Microsoft created in their research.
+$$
+I(\tau) = \mathbb{E}[v \vert \tau] - \mathbb{E}[v]
+$$
+
+This lets us formalize the quality of our proxy $\hat{v}$ for the true $v$, obtained by observing the trajectory $\tau$ of any session $s$.


 \subsection{Cost of Information Framework}

+
+The Cost of Information (COI) proposed in our research serves as a proxy for understanding and representing the complex interaction patterns between humans and agents. It is the expected markup that a platform applies to a product based on derived demand signals.
+
+$$
+\mathrm{COI}(\tau) = \mathbb{E}[p(\tau)] - p_0
+$$
+
+where the vector $p_0$ is both the initial state of the system and the base price of each product. We also define a pricing method at any time $t$ as $t \mapsto p_t \in \mathbb{R}_+^N$, satisfying the cap $p_t \in \{p \in \mathbb{R}_+^N \mid \underline{p} \leq p \leq \overline{p}\}$, whose bounds $\underline{p}$ and $\overline{p}$ act as our business constraints. We treat $p_t$ as the price vector shown to an actor, both experimentally and in simulation.
+
 Mathematical demonstration and validation of the COI and citation backed evidence, and framework overview + show harm to user via other cost distortions. Maybe split into 3.2.1 (COI Theory) and 3.2.2 (Framework Design)

 \subsection{System Architecture}
@@ -73,11 +91,21 @@ Deep dive into how the algorithm works, different kinds and justification for ch

 \subsection{Reinforcement Learning Formulation}

-Rewards to consider:
-\begin{itemize}
-\item A formulation of how well the business is doing over a longer period of time in terms of sales and revenue lost, compared to expected revenue which would be generated by ground truth demand.
-\item
-\end{itemize}
+We define a surrogate commercial environment in which we can accurately control all variables, such as the true demand, making the entire system transparent. We start with a product catalogue of size $N$ with random per-product supply initialization. At every step the commercial simulation receives a price vector $p$, according to which we simulate a set of interactions $I$, with a proportion $l_a$ of those interactions contributed by agents. The interactions serve as a proxy for estimating the true demand $q(p)$, which is composed of two separate demand generators, $q_A(p)$ and $q_H(p)$.
+On top of this, our gym environment has a built-in demand estimator callback, which is defined individually by each pricing engine. The engine is constructed to interact with the gym environment: at each step the gym environment runs one cycle of the commercial environment, creating an observation of all the interactions $I$ and a baseline vector that gives the ground truth of demand, sales statistics and revenue. The engine is then responsible for learning the pricing policy, providing a pricing vector $p_{t+1}$ motivated by a per-episode summary reward composed as follows:
+
+$$
+R = \text{revenue} - \text{COI} - \text{UX friction index}
+$$
+
+
+As part of our reward engineering we want the cost of information to enter the reward with a weight. Our pricing engine can be modeled by:
+$$
+\textmd{O}: X \to p_{t+1}
+$$
+
+where $X$
+

 How do we define the state space, action space and reward function breakdown and algorithm benchmarking.
 POSSIBLY: Expand into full subsections: 3.6.1 (State-Action Space), 3.6.2 (Reward Design), 3.6.3 (Benchmarking)