mirror of
https://github.com/velocitatem/PHANTOM.git
synced 2026-05-31 16:43:36 +00:00
noisy formulations
@@ -2,13 +2,31 @@
\subsection{Problem Formalization}
Mathematical formalization of agent-induced pricing distortions. Formal definition of potential loss mechanisms $\alpha D$.

In a commercial setting, we can collect behavioral data on any actor's interactions within a platform we control. This data is collected through sessions, where each session $s$ belongs to an actor class $Y_s \in \{H, A\}$ (human or agent) with randomized assignment. From each session we build a trajectory of observable interaction events $\tau_s = (e_{s,1}, \ldots, e_{s,L_s})$, where each event is defined as $e_{s,k} = (a_{s,k}, i_{s,k}, t_{s,k})$.
\begin{itemize}
\item $a_{s,k} \in A$, the event type
\end{itemize}
$A = \{\text{page\_view}, \text{view\_item\_page}, \text{add\_item}\}$. % TODO: translate all from /home/velocitatem/Documents/Projects/PHANTOM/web/src/lib/events.ts into this latex
Each interaction $i$ gives us some information about a given customer's willingness to pay ($v$), which we can estimate and measure against the true baseline.

We consider a business over time, during which we maintain an evolving price vector $p_t \in \mathbb{R}^N$, where $N$ is the number of products in our catalogue. The price vector depends directly on a demand function $q_t$, which we define as a linear function of price parameterized by a price elasticity matrix $B_t$. This is the same setup used in Microsoft's research.
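As a sketch of what such a linear demand model could look like, the display below assumes an intercept vector $c_t \in \mathbb{R}^N$ for the demand at zero price. The symbol $c_t$ is introduced here for illustration only, and the sign convention for $B_t$ is likewise an assumption. Under this convention the diagonal entries of $B_t$ would typically be negative (own-price effects), while the off-diagonal entries capture substitution or complementarity between products:
$$
q_t(p_t) = c_t + B_t \, p_t
$$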
$$
I(\tau) = \mathbb{E}[v \mid \tau] - \mathbb{E}[v]
$$

This lets us formalize the quality of our proxy $\hat{v}$ for the true $v$, given the trajectory $\tau$ observed in any session $s$.
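For intuition, consider purely illustrative numbers (assumed, not measured): suppose the population average is $\mathbb{E}[v] = 50$ and sessions whose trajectory contains an \texttt{add\_item} event have $\mathbb{E}[v \mid \tau] = 65$. Observing such a trajectory then shifts the expected willingness to pay upward by 15, whereas a trajectory with $I(\tau) = 0$ would leave the expectation unchanged:
$$
I(\tau) = \mathbb{E}[v \mid \tau] - \mathbb{E}[v] = 65 - 50 = 15
$$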
\subsection{Cost of Information Framework}
The Cost of Information (COI) proposed in our research serves as a proxy for understanding and representing the complex interaction patterns between humans and agents. It is the expected markup a platform applies to a product based on derived demand signals.
$$
\text{COI}(\tau) = \mathbb{E}[p(\tau)] - p_0
$$
where the vector $p_0$ is both the initial state of the system and the base price of each product. We also define a pricing method that, at any time $t$, produces a price vector $p_t \in \mathbb{R}_+^N$ satisfying the price cap $p_t \in \{p \in \mathbb{R}_+^N \mid \underline{p} \leq p \leq \overline{p}\}$, which acts as our business constraint. We treat $p_t$ as the price vector shown to an actor, both experimentally and in simulation.
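As a small numeric illustration under assumed values (a two-product catalogue with made-up prices), take $p_0 = (10, 20)$ and suppose the expected price shown after observing $\tau$ is $\mathbb{E}[p(\tau)] = (11.5, 20)$. The first product then carries an expected markup of 1.5, while the second stays at its base price:
$$
\text{COI}(\tau) = (11.5, 20) - (10, 20) = (1.5, 0)
$$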

Mathematical demonstration and validation of the COI with citation-backed evidence, a framework overview, and a demonstration of harm to the user via other cost distortions. Maybe split into 3.2.1 (COI Theory) and 3.2.2 (Framework Design).
\subsection{System Architecture}
@@ -73,11 +91,21 @@ Deep dive into how the algorithm works, different kinds and justification for ch
\subsection{Reinforcement Learning Formulation}
Rewards to consider:
\begin{itemize}
\item A formulation of how well the business performs over a longer period, in terms of sales and revenue lost relative to the expected revenue that ground-truth demand would generate.
\item
\end{itemize}
We define a surrogate commercial environment in which we can accurately control all variables, such as the true demand, providing full transparency into the entire system. We start with a product catalogue of size $N$ with random per-product supply initialization. At every step, the commercial simulation receives a price vector $p$, according to which we simulate a set of interactions $I$, with a proportion $l_a$ of those interactions contributed by agents. The interactions serve as a proxy for estimating the true demand $q(p)$, which is composed of two separate demand generators, $q_A(p)$ and $q_H(p)$.
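One simple way to read this composition, offered only as an assumption since the exact combination rule is not fixed here, is a convex mixture weighted by the agent proportion $l_a \in [0, 1]$. With $l_a = 0$ it reduces to purely human demand $q_H(p)$, and with $l_a = 1$ to purely agent-generated demand $q_A(p)$:
$$
q(p) = l_a \, q_A(p) + (1 - l_a) \, q_H(p)
$$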
On top of this, our gym environment has a built-in demand estimator callback, which is defined individually by each pricing engine. The engine is constructed to interact with the gym environment: at each step, the gym environment runs a cycle through the commercial environment, producing an observation of all interactions $I$ and a baseline vector that gives the ground-truth demand, sales statistics, and revenue. The engine is then responsible for learning the pricing policy, providing a pricing vector $p_{t+1}$ motivated by a per-episode summary reward composed of:
$$
R = \text{revenue} - \text{COI} - \text{UX friction index}
$$
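Since the cost of information is meant to enter the reward with a weight, one hedged way to write this uses non-negative coefficients $\lambda_{\text{COI}}$ and $\lambda_{\text{UX}}$. Both symbols are hypothetical and introduced here only for illustration. Setting both to 1 recovers the unweighted form, while a larger $\lambda_{\text{COI}}$ penalizes information-driven markup more heavily:
$$
R = \text{revenue} - \lambda_{\text{COI}} \cdot \text{COI} - \lambda_{\text{UX}} \cdot \text{UX friction index}
$$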

As part of our reward engineering, we want the reward to take the cost of information into account with a weight. Our pricing engine can be modeled by:
$$
\textmd{O}: X \to p_{t+1}
$$
where $X$

How do we define the state space, action space, reward function breakdown, and algorithm benchmarking?

POSSIBLY: Expand into full subsections: 3.6.1 (State-Action Space), 3.6.2 (Reward Design), 3.6.3 (Benchmarking)