noisy formulations

This commit is contained in:
2025-12-15 19:38:25 +01:00
parent b5461b9277
commit 5fb5f5b858


@@ -2,13 +2,31 @@
\subsection{Problem Formalization}
This subsection gives a mathematical formalization of agent-induced pricing distortions and a formal definition of the potential loss mechanisms $\alpha D$.

In a commercial setting we can collect behavioral data on any actor's interactions within a platform we control. We collect this data in sessions, where each session $s$ belongs to an actor class $Y_s \in \{H,A\}$ (human or agent) with randomized assignment. Each session yields a trajectory $\tau_s$ of observable interaction events, $\tau_s=(e_{s,1},\ldots,e_{s,L_s})$, where each event is defined as $e_{s,k} = (a_{s,k},i_{s,k},t_{s,k})$.
\begin{itemize}
\item $a_{s,k} \in \mathcal{A}$
\end{itemize}
$\mathcal{A} = \{\text{page\_view}, \text{view\_item\_page}, \text{add\_item}\}$. % TODO: translate all from /home/velocitatem/Documents/Projects/PHANTOM/web/src/lib/events.ts into this latex
We consider a business over time, with an evolving price vector $p_t \in \mathbb{R}^N$, where $N$ is the number of products in our catalogue. The price vector depends directly on a demand function $q_t$, which we define as a linear function parameterized by a price elasticity matrix $B_t$. This is the same setup used in research by Microsoft.

Each interaction $i$ gives us some information about a given customer's willingness to pay $v$, which we can try to estimate and measure against the true baseline:
$$
I(\tau) = \mathbb{E}[v \mid \tau] - \mathbb{E}[v]
$$
This lets us formalize the quality of our proxy $\hat{v}$ for the true $v$ after observing the trajectory $\tau$ of any session $s$.
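
As a small worked illustration of $I(\tau)$, with purely hypothetical numbers: suppose the population average willingness to pay is $\mathbb{E}[v] = 20$, and that after observing a trajectory $\tau$ containing an add\_item event the conditional estimate rises to $\mathbb{E}[v \mid \tau] = 26$. Then
$$
I(\tau) = \mathbb{E}[v \mid \tau] - \mathbb{E}[v] = 26 - 20 = 6
$$
Under this reading, a positive $I(\tau)$ means the trajectory signals above-average willingness to pay, a negative value signals below-average willingness to pay, and $I(\tau) = 0$ means the trajectory carries no information about $v$ in expectation.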
\subsection{Cost of Information Framework}
The Cost of Information (COI) proposed in our research serves as a proxy for understanding and representing the complex interaction patterns between humans and agents. It is the expected markup a platform applies to a product based on derived demand signals.
$$
\mathrm{COI}(\tau) = \mathbb{E}[p(\tau)] - p_0
$$
Here the vector $p_0$ is both the initial state of the system and the base price of each product. At any time $t$ the pricing method produces a price vector $p_t \in \mathbb{R}_+^N$ that satisfies the price caps $\{p \in \mathbb{R}_+^N \mid \underline{p} \leq p \leq \overline{p}\}$, which act as our business constraints. We treat $p_t$ as the price vector shown to an actor, both experimentally and in simulation.
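
A minimal worked example, using hypothetical values for a catalogue of $N = 2$ products: take base prices $p_0 = (10, 20)^\top$ and expected prices after observing $\tau$ of $\mathbb{E}[p(\tau)] = (11.5, 19)^\top$. The cost of information is then computed per product:
$$
\mathbb{E}[p(\tau)] - p_0 = (11.5, 19)^\top - (10, 20)^\top = (1.5, -1)^\top
$$
so the first product carries a markup of $1.5$ and the second a discount of $1$. One possible way to enforce the business constraints, stated here as an assumption rather than the method used in this work, is to clip a proposed price $\tilde{p}$ (a symbol introduced only for this example) componentwise onto the bounds:
$$
p = \min\bigl(\max(\tilde{p}, \underline{p}), \overline{p}\bigr)
$$
With $\underline{p} = (8, 16)^\top$ and $\overline{p} = (12, 24)^\top$, a proposal $\tilde{p} = (13, 15)^\top$ would be mapped to $p = (12, 16)^\top$.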
Mathematical demonstration and validation of the COI and citation backed evidence, and framework overview + show harm to user via other cost distortions. Maybe split into 3.2.1 (COI Theory) and 3.2.2 (Framework Design)
\subsection{System Architecture}
@@ -73,11 +91,21 @@ Deep dive into how the algorithm works, different kinds and justification for ch
\subsection{Reinforcement Learning Formulation}
We define a surrogate commercial environment in which we can control all variables, such as the true demand, which makes the entire system transparent. We start with a product catalogue of size $N$ with random per-product supply initialization. At every step the commercial simulation receives a price vector $p$, according to which we simulate a set of interactions $I$ with a certain proportion $l_a$ of agents contributing interactions. The interactions serve as a proxy for estimating the true demand $q(p)$, which is composed of two separate demand generators, $q_A(p)$ and $q_H(p)$.

On top of this, our gym environment has a built-in demand estimator callback, which each pricing engine defines individually. The engine interacts with the gym environment: at each step the gym environment runs one cycle of the commercial environment and produces an observation of all the interactions $I$, together with a baseline vector that gives the ground truth of demand, sales statistics, and revenue. The engine is then responsible for learning the pricing policy and providing a price vector $p_{t+1}$, motivated by a per-episode summary reward.

Rewards to consider:
\begin{itemize}
\item A formulation of how well the business is doing over a longer period of time in terms of sales and revenue lost, compared to the expected revenue that would be generated by ground truth demand.
\end{itemize}
The per-episode reward is composed as:
$$
R = \text{revenue} - \text{COI} - \text{UX friction index}
$$
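
One way to make the trade-off between these terms explicit is to attach nonnegative weights to the penalty terms. The weights $\lambda$ and $\mu$ below are hypothetical symbols introduced only for this sketch, and the COI is assumed to be reduced to a scalar summary over products:
$$
R_{\lambda,\mu} = \text{revenue} - \lambda \cdot \text{COI} - \mu \cdot \text{UX friction index}, \qquad \lambda, \mu \geq 0
$$
For instance, with illustrative values of revenue $= 1000$, COI $= 50$, UX friction index $= 20$, $\lambda = 2$ and $\mu = 1$, the reward is $1000 - 2 \cdot 50 - 1 \cdot 20 = 880$. Setting $\lambda = \mu = 1$ recovers the unweighted reward, which for the same values gives $1000 - 50 - 20 = 930$.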
As part of our reward engineering, we want the cost of information to enter the reward with a weight. Our pricing engine can be modeled by:
$$
\textmd{O}: X \to p_{t+1}
$$
where $X$
How do we define the state space, action space and reward function breakdown and algorithm benchmarking.
POSSIBLY: Expand into full subsections: 3.6.1 (State-Action Space), 3.6.2 (Reward Design), 3.6.3 (Benchmarking)