diff --git a/paper/src/auto/main.el b/paper/src/auto/main.el index 3e186b1..517411a 100644 --- a/paper/src/auto/main.el +++ b/paper/src/auto/main.el @@ -6,7 +6,7 @@ (setq TeX-command-extra-options "-file-line-error -interaction=nonstopmode") (TeX-add-to-alist 'LaTeX-provided-class-options - '(("report" "12pt") ("article" "12pt") ("acmart" "sigconf" "nonacm" "natbib=false"))) + '(("report" "12pt") ("article" "12pt") ("acmart" "sigconf" "nonacm" "natbib=false" "manuscript"))) (TeX-run-style-hooks "latex2e" "preamble" diff --git a/paper/src/chapters/03-methodology.tex b/paper/src/chapters/03-methodology.tex index fc5d8f5..ae48c3a 100644 --- a/paper/src/chapters/03-methodology.tex +++ b/paper/src/chapters/03-methodology.tex @@ -7,8 +7,6 @@ Mathematical formalization of agent-induced pricing distortions. Formal definiti We consider a business across time during which we have an evolving vector $p_t \in \mathbb{R}^N$ where $N$ is the number of products in our catalogue. Our price vector depends directly on a demand function $q_t$, which we define as a linear function of a price elasticity matrix $B_t$. This is the same setup that Microsoft created in their research. - - \subsection{Cost of Information Framework} Mathematical demonstration and validation of the COI and citation-backed evidence, and framework overview + show harm to user via other cost distortions. Maybe split into 3.2.1 (COI Theory) and 3.2.2 (Framework Design) @@ -43,6 +41,9 @@ The experimentation begins with the design of goals, with careful consideration This effort to gather data on interactions is the first half of our research. With this collected data on behavioral characteristics, enhanced by our feature augmentation, we can separate the distribution into two bins $y \in \{A,H\}$ with a certain probability $p$ dependent on the session-specific features. 
To address the second loop of our system, we use this gained capability of discrimination to enhance the learner design involved in our surrogate dynamic pricing task, which simulates an independent dynamic pricing scenario under which we can train a more controlled policy with the ability to account for true demand signals under conditions of contamination from non-human actors. + +Our approach can be summarized in three stages. First, we observe and \textit{vectorize} the behavioral interaction data from our experiments. We then develop the separability, which helps us deepen the semantic understanding of the behavioral patterns. Finally, we use our newly gained learner to leverage a defensive mechanism within the simulation stage of a controlled dynamic pricing loop. + \begin{figure}[ht] \resizebox{\columnwidth}{!}{% \input{chapters/loop_figure.tex} @@ -71,6 +72,13 @@ On the other hand, a more lax system without detection (myopic) defines the lowe Deep dive into how the algorithm works, different kinds and justification for chosen approaches + agent impact modeling and quantification. \subsection{Reinforcement Learning Formulation} + +Rewards to consider: +\begin{itemize} +\item A formulation of how well the business is doing over a longer period of time in terms of sales and revenue lost, compared to expected revenue which would be generated by ground truth demand. +\item +\end{itemize} + How do we define the state space, action space and reward function breakdown and algorithm benchmarking. 
POSSIBLY: Expand into full subsections: 3.6.1 (State-Action Space), 3.6.2 (Reward Design), 3.6.3 (Benchmarking) diff --git a/paper/src/main.tex b/paper/src/main.tex index 7098e97..1790f56 100644 --- a/paper/src/main.tex +++ b/paper/src/main.tex @@ -36,10 +36,6 @@ The primary objective of this thesis is to develop and validate pricing heuristi \maketitle -\begin{acks} - Eugene Bykovets, PhD - ETH for helping with problem formulation \\ - Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC). -\end{acks} \input{chapters/01-intro} \input{chapters/02-literature-review} @@ -49,11 +45,21 @@ The primary objective of this thesis is to develop and validate pricing heuristi \input{chapters/06-conclusion} +\begin{acks} + Eugene Bykovets, PhD - ETH for helping with problem formulation \\ + Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC). +\end{acks} + \printbibliography \clearpage \onecolumn \appendix +\section{Terminology} +\begin{description} +\item[Agent $a$] An actor of non-human nature, powered by an LLM. +\item[Human $h$] An individual human with some job to be done. +\end{description} \input{../build/concatenated_code} \end{document}