chore: refactoring, proper citations, and updates to data, references, and appendices

2026-07-16 01:53:37 +00:00 · 2026-03-15 21:15:23 +01:00
parent 0521a63937
commit 375445f260
5 changed files with 82 additions and 118 deletions
--- a/paper/src/chapters/03-methodology.tex
+++ b/paper/src/chapters/03-methodology.tex
@@ -246,7 +246,8 @@ v4 & 64 (32 + 32) & us-central2-b & 32 Spot + 32 On-demand \\
 \end{tabular}
 \end{table}

-For connections from Madrid, we prioritize the europe-west4 allocation for latency-sensitive runs with the benefit of having the most grouped chips within a single region. This regional grouping is important for the deployment of our Kubernetes cluster which cannot span multiple regions. All sweep metadata, model checkpoints, and reward traces are logged in Weights \& Biases. Hardware specifications are from the official Google Cloud TPU documentation \parencite{noauthor_tpu_2026,noauthor_tpu_2025-1,noauthor_tpu_2025}.
+For connections from Madrid, we prioritize the europe-west4 allocation for latency-sensitive runs; it also offers the largest number of chips grouped within a single region. This regional grouping matters for deploying our Kubernetes cluster, which cannot span multiple regions. All sweep metadata, model checkpoints, and reward traces are logged in Weights \& Biases. % TODO: cite this (from bib)
+Hardware specifications are from the official Google Cloud TPU documentation \parencite{noauthor_tpu_2026,noauthor_tpu_2025-1,noauthor_tpu_2025}.

 Design of training processes: we build the Docker image with layer caching in mind. To speed up rebuilds, we place the most volatile steps towards the end of the image build. In practice, dependency installation is isolated in its own layers, so edits to the source code do not trigger dependency rebuilds. Only when we update the entry point for training a sweep does Docker also rebuild the source-code copy stage.

@@ -388,8 +389,10 @@ The complete pricing-demand-trajectory loop is illustrated in Figure~\ref{fig:or

 \begin{figure}[ht]
 \centering
-\[
-\text{Oracle}(\vec{p}_{t-1},\vec{\hat{q}})\to
+{\setlength{\arraycolsep}{4pt}%
+\resizebox{0.98\linewidth}{!}{$
+\begin{aligned}
+&\text{Oracle}(\vec{p}_{t-1},\vec{\hat{q}})\to
 \begin{pmatrix}
 p_0\\
 p_1\\
@@ -398,14 +401,15 @@ p_N
 \end{pmatrix}
 \underrightarrow{d_i \sim \mathcal{N}_{\vec{p}}}
 \begin{pmatrix}d_0\\ d_1\\ \cdots \\ d_N\end{pmatrix}
-\underrightarrow{\vec{d}\times \tau_\theta \to \tau^\prime}
+\underrightarrow{\vec{d}\otimes \tau_\theta}
 \begin{bmatrix}
 0.01 & 0.02 & \cdots & 0.3 \\
 0.41 & 0.24 & \cdots & 0.0 \\
 \cdots & \cdots & \cdots & \cdots \\
 0.51 & 0.09 & \cdots & 0.1 \\
 \end{bmatrix}
-\underrightarrow{\tau_k \sim \tau^\prime}
+\\
+&\underrightarrow{\tau_k \sim \tau^\prime}
 \{\tau_k\}_{k=0}^K \to \hat{Q}(\tau_k)
 \to \begin{pmatrix}
 \hat{q}_0 \\
@@ -414,8 +418,10 @@ p_N
 \hat{q}_N \\
 \end{pmatrix}
 \to \text{Oracle}(\cdot)
-\]
-\caption{Oracle-based pricing loop: historical price and demand state map to a new price vector; each product samples demand curves from $\mathcal{N}_{\vec{p}}$; trajectories are generated by mixing demand with behavioral kernels $\tau_\theta$ into transition matrix $\tau'$; sampled trajectories $\{\tau_k\}$ aggregate through proxy $Q(\cdot)$ to yield updated demand $\vec{\hat{q}}$, closing the feedback loop.}
+\end{aligned}
+$}%
+}
+\caption{Oracle-based pricing loop: historical price and demand state map to a new price vector; each product samples demand curves from $\mathcal{N}_{\vec{p}}$; trajectories are generated via the Kronecker product $\vec{d}\otimes\tau_\theta$ into transition matrix $\tau'$; sampled trajectories $\{\tau_k\}$ aggregate through proxy $\hat{Q}(\cdot)$ to yield updated demand $\vec{\hat{q}}$, closing the feedback loop.}
 \label{fig:oracle_flow}
 \end{figure}

@@ -498,7 +504,7 @@ The algorithm operates in discrete epochs indexed by $t$. At each epoch, the pla

 \subsection{Parallelization Strategy}

-To avoid preemption of compute mid-training we settle on using a v4 generation, 40 chip compute node with 5 parallel workers. The login node creates an orchestration node with Ray and we distribute ray compute nodes per each other worker.
+To avoid mid-training preemption of compute, we settle on a v4-generation, 40-chip compute node with 5 parallel workers. The login node creates an orchestration node with Ray \parencite{moritz_ray_2018}, and we assign one Ray compute node to each of the other workers.

 \subsubsection{Computational Cost Analysis of the Simulation Step}
 The per-step cost of Algorithm~\ref{alg:phantom_loop_clean} is not uniform across its components. To inform hardware provisioning and to identify where algorithmic improvements are most impactful, we profile the hot path of the engine using Python's \texttt{cProfile} instrumentation over 20 environment steps under two configurations: a baseline with the robustness inner loop disabled ($K=1$, $\epsilon_\alpha=0$) and a standard robust setting ($K=5$, $\epsilon_\alpha=0.2$). Both runs use $M=10$ sessions per market call and $N=3$ products.
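The profiling procedure described above can be sketched roughly as follows. This is a minimal illustration rather than the engine's actual code: \texttt{env\_step}, \texttt{market\_call}, \texttt{profile\_config}, and \texttt{count\_calls} are hypothetical stand-ins introduced here, and the way $\epsilon_\alpha$ perturbs the inner loop is an assumption made only so that the two configurations differ. Only the use of \texttt{cProfile} over 20 steps with $M=10$, $N=3$, and the two $(K,\epsilon_\alpha)$ settings comes from the text.

```python
import cProfile
import pstats
import random


def market_call(n_sessions, n_products, rng):
    """Hypothetical stand-in for one market call over M sessions and N products."""
    return sum(rng.random() for _ in range(n_sessions * n_products))


def env_step(k_inner, eps_alpha, rng, n_sessions=10, n_products=3):
    """Hypothetical stand-in for one environment step.

    The inner loop runs K times; eps_alpha scales a random perturbation
    (an assumption for illustration, not the paper's robustness operator).
    """
    total = 0.0
    for _ in range(k_inner):
        scale = 1.0 + eps_alpha * (2.0 * rng.random() - 1.0)
        total += scale * market_call(n_sessions, n_products, rng)
    return total / k_inner


def profile_config(k_inner, eps_alpha, n_steps=20, seed=0):
    """Profile n_steps environment steps and return the collected pstats.Stats."""
    rng = random.Random(seed)
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_steps):
        env_step(k_inner, eps_alpha, rng)
    profiler.disable()
    return pstats.Stats(profiler)


def count_calls(stats, func_name):
    """Total number of calls recorded for functions named func_name."""
    # stats.stats maps (file, line, name) -> (prim. calls, calls, tottime, cumtime, callers)
    return sum(v[1] for k, v in stats.stats.items() if k[2] == func_name)


if __name__ == "__main__":
    for label, (k, eps) in {"baseline": (1, 0.0), "robust": (5, 0.2)}.items():
        stats = profile_config(k, eps)
        print(f"--- {label}: K={k}, eps_alpha={eps} ---")
        # Show the five entries with the largest cumulative time.
        stats.sort_stats("cumulative").print_stats(5)
```

Comparing call counts and cumulative times between the two runs indicates which components scale with $K$; in this toy stand-in, the number of market calls grows linearly with the inner-loop count.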