mirror of
https://github.com/velocitatem/PHANTOM.git
synced 2026-05-31 16:43:36 +00:00
class separability significance
@@ -297,8 +297,13 @@ To train a robust pricing learner, we need a simulator that can generate realist
\subsubsection{Ground-Truth Separability}
Because sessions are collected under controlled experimental conditions where each actor is assigned a known type at the start of the trial, labels $y_s \in \{H, A\}$ are available as ground truth rather than as the output of a heuristic classifier. We therefore estimate separate transition kernels directly from each labeled partition $\mathcal{D}_H$ and $\mathcal{D}_A$, treating the resulting $\hat{\mathcal{T}}_H$ and $\hat{\mathcal{T}}_A$ as the ground-truth behavioral profiles for each class. We then ask a direct methodological question: are the kernels separable enough to justify downstream pricing control that depends on that separability?
To answer this, we compute the average KL divergence between transition probability matrices. This statistic gives global separability and event-level diagnostics at the same time. In our balanced dataset (50\% human, 50\% agent), the average divergence is approximately $1.8$. To contextualize this value, we compare it against an intra-class baseline computed from randomly selected transitions.
% To contextualize this figure a useful intra-class baseline is to randomly split D_H into two equal halves, estimate a kernel from each half, compute the same average KL statistic, and repeat for B bootstrap samples (e.g. B=100). The resulting null distribution (mean +/- std) gives the divergence expected purely from estimation noise at this sample size. A between-class KL substantially above this null confirms the separation is real and not a finite-sample artefact. In practice: for each of B splits, partition D_H 50/50 without replacement, run build_kernel() on each half, average the per-state KL values, and collect the B scores into a reference distribution to compare against the 1.8 figure.
To answer this, we compute the average KL divergence between the class-conditional transition probability matrices. The average summarizes global separability, while its per-event terms provide event-level diagnostics. To test whether the observed between-class value exceeds finite-sample estimation noise, we compute an intra-class bootstrap baseline: we repeatedly split $\mathcal{D}_H$ and $\mathcal{D}_A$ each into two random halves, fit a transition kernel on each half, and recompute the same average KL statistic for each split.
Formally, for $B$ bootstrap splits per class we obtain reference samples $\{d_{H,b}^{\text{intra}}\}_{b=1}^B$ and $\{d_{A,b}^{\text{intra}}\}_{b=1}^B$, then compare the between-class divergence $d^{\text{inter}}$ against the pooled null distribution. We report pooled mean and variance, lift ratio $d^{\text{inter}}/\mathbb{E}[d^{\text{intra}}]$, and the empirical one-sided p-value
\begin{equation}
\hat p = \frac{1 + \sum_{j=1}^{2B}\mathbf{1}\{d_j^{\text{intra}} \ge d^{\text{inter}}\}}{2B + 1},
\end{equation}
which gives a direct significance check for separability before using divergence-derived control signals in pricing.
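
For concreteness, the reported summaries can be written out explicitly. This is a sketch under stated assumptions: the pooled sample is indexed $d_1^{\text{intra}},\dots,d_{2B}^{\text{intra}}$ as in the p-value above, the symbols $\bar d^{\text{intra}}$, $s^2_{\text{intra}}$, and $\lambda$ are introduced here for illustration only, and the unbiased $2B-1$ normalization of the variance is an assumption rather than something fixed by the text.
\begin{align*}
\bar d^{\text{intra}} &= \frac{1}{2B}\sum_{j=1}^{2B} d_j^{\text{intra}}, \\
s^2_{\text{intra}} &= \frac{1}{2B-1}\sum_{j=1}^{2B}\bigl(d_j^{\text{intra}} - \bar d^{\text{intra}}\bigr)^2, \\
\lambda &= \frac{d^{\text{inter}}}{\bar d^{\text{intra}}}.
\end{align*}
Because of the add-one correction, $\hat p$ can never be exactly zero: its smallest attainable value is $1/(2B+1)$, which for example with $B=100$ is $1/201 \approx 0.005$. The number of splits therefore bounds the resolution of the significance check.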
\begin{definition}[Kullback-Leibler Divergence for Transition Distributions]
Let $P_e$ and $Q_e$ be categorical distributions over destination states following event $e$, derived from human and agent trajectories respectively. The KL divergence between these distributions is:
@@ -8,15 +8,48 @@
\subsection{Behavioral Analysis}
Include Markov chains of the transition matrices and compare the distributions (see the divergence metrics).
The transition-kernel analysis is evaluated with both between-class divergence and an intra-class bootstrap null baseline. This allows us to separate real behavioral differences from finite-sample estimation noise.
\begin{table}[ht]
\centering
\caption{Divergence significance using intra-class bootstrap baseline ($B=100$ per class).}
\label{tab:divergence_significance}
\begin{tabular}{lcccc}
\toprule
Metric & Mean KL & Std & 5\% quantile & 95\% quantile \\
\midrule
Between-class (Human vs Agent) & 5.3067 & -- & -- & -- \\
Human intra-class split & 2.5271 & 1.2501 & 0.6845 & 4.6015 \\
Agent intra-class split & 1.2065 & 1.2607 & 0.2177 & 4.2345 \\
\bottomrule
\end{tabular}
\end{table}
For this run ($n_H=11$, $n_A=7$, $B=100$), the pooled lift ratio is $2.84\times$ and the empirical one-sided p-value is $0.0149$, both computed as defined in Section~\ref{sec:tpe}. This places the between-class divergence clearly above the intra-class null and supports the use of divergence-derived contamination signals in downstream pricing control.
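
As a consistency check, the two summary figures can be reproduced from Table~\ref{tab:divergence_significance} alone. This is a sketch that assumes the pooled intra-class mean is the equal-weight average of the two class means (reasonable because both classes contribute $B=100$ splits) and that $\hat p$ follows the add-one form defined in Section~\ref{sec:tpe}. The symbol $k$ is introduced here for the number of pooled intra-class splits whose divergence is at least $d^{\text{inter}}$.
\begin{align*}
\mathbb{E}[d^{\text{intra}}] &\approx \tfrac{1}{2}\,(2.5271 + 1.2065) = 1.8668, \\
\frac{d^{\text{inter}}}{\mathbb{E}[d^{\text{intra}}]} &\approx \frac{5.3067}{1.8668} \approx 2.84, \\
\hat p &= \frac{1 + k}{2B + 1} = \frac{1 + k}{201}.
\end{align*}
The reported $\hat p = 0.0149$ corresponds to $k = 2$, since $3/201 \approx 0.0149$. Under these assumptions, 2 of the 200 pooled intra-class splits reached the between-class value.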
\subsection{Experimental Outcomes}
Align with the defined objectives; show results and whether they are statistically significant.
To evaluate the contribution of the robustness terms, we compare two policies on the same environment family: (i) a robust pricing policy trained with a COI-aware reward and an adversarial contamination step, and (ii) a non-robust baseline trained with a revenue-only reward (\texttt{--no-robust}).
\begin{table}[ht]
\centering
\caption{Pricing policy benchmark for robust vs non-robust training.}
\label{tab:pricing_benchmark}
\begin{tabular}{lcccc}
\toprule
Policy & Eval reward & Eval revenue & COI leakage & Margin collapse rate \\
\midrule
Robust policy & \textit{TBD} & \textit{TBD} & \textit{TBD} & \textit{TBD} \\
Non-robust baseline (\texttt{--no-robust}) & \textit{TBD} & \textit{TBD} & \textit{TBD} & \textit{TBD} \\
\bottomrule
\end{tabular}
\end{table}
This comparison isolates the effect of robustness terms from model capacity and optimization settings, and provides the benchmark needed for interpreting the value of COI-aware control.
\subsection{Interpretation and Insights}
Draw inferences from the observed patterns and present the key findings.
Between-class divergence substantially above the intra-class null indicates that the two actor classes are behaviorally separable at the transition-kernel level. In pricing experiments, this is the condition required for separability to act as a useful control signal rather than just an auxiliary classifier score.
\subsection{Anomalies}