class separability significance

2026-02-28 21:38:46 +01:00
parent 8f20359c8c
commit 233ce3be34
5 changed files with 285 additions and 57 deletions


@@ -297,8 +297,13 @@ To train a robust pricing learner, we need a simulator that can generate realist
\subsubsection{Ground-Truth Separability}
Because sessions are collected under controlled experimental conditions where each actor is assigned a known type at the start of the trial, labels $y_s \in \{H, A\}$ are available as ground truth rather than as the output of a heuristic classifier. We therefore estimate separate transition kernels directly from each labeled partition $\mathcal{D}_H$ and $\mathcal{D}_A$, treating the resulting $\hat{\mathcal{T}}_H$ and $\hat{\mathcal{T}}_A$ as the ground-truth behavioral profiles for each class. We then ask a direct methodological question: are the kernels separable enough to justify downstream pricing control that depends on that separability?
To answer this, we compute average KL divergence between transition probability matrices. This statistic gives global separability and event-level diagnostics at the same time. In our balanced dataset (50\% human, 50\% agent), the average divergence is approximately $1.8$. To contextualize this divergence metric we compare with an intra-class comparison baseline of randomly selected transitions.
% To contextualize this figure a useful intra-class baseline is to randomly split D_H into two equal halves, estimate a kernel from each half, compute the same average KL statistic, and repeat for B bootstrap samples (e.g. B=100). The resulting null distribution (mean +/- std) gives the divergence expected purely from estimation noise at this sample size. A between-class KL substantially above this null confirms the separation is real and not a finite-sample artefact. In practice: for each of B splits, partition D_H 50/50 without replacement, run build_kernel() on each half, average the per-state KL values, and collect the B scores into a reference distribution to compare against the 1.8 figure.
To answer this, we compute average KL divergence between transition probability matrices. This statistic gives global separability and event-level diagnostics at the same time. To test whether the observed between-class value exceeds finite-sample estimation noise, we compute an intra-class bootstrap baseline by repeatedly splitting $\mathcal{D}_H$ and $\mathcal{D}_A$ into two random halves, fitting a transition kernel on each half, and re-computing the same average KL statistic for each split.
Formally, for $B$ bootstrap splits per class we obtain reference samples $\{d_{H,b}^{\text{intra}}\}_{b=1}^B$ and $\{d_{A,b}^{\text{intra}}\}_{b=1}^B$, then compare the between-class divergence $d^{\text{inter}}$ against the pooled null distribution of all $2B$ intra-class values. We report the pooled mean and variance, the lift ratio $d^{\text{inter}}/\mathbb{E}[d^{\text{intra}}]$, and the empirical one-sided p-value
\begin{equation}
\hat p = \frac{1 + \sum_{j=1}^{2B}\mathbf{1}\{d_j^{\text{intra}} \ge d^{\text{inter}}\}}{2B + 1},
\end{equation}
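where $d_j^{\text{intra}}$, $j = 1, \dots, 2B$, indexes the pooled intra-class samples from both classes. This gives a direct significance check for separability before divergence-derived control signals are used in pricing.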
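The split-and-compare procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: sessions are assumed to be lists of integer state indices, \texttt{build\_kernel} is a hypothetical stand-in that uses additive smoothing (\texttt{alpha}) so that every KL term is finite, and the average is taken over source states in the direction $D_{\mathrm{KL}}(P \,\|\, Q)$.

```python
import numpy as np


def build_kernel(sessions, n_states, alpha=1.0):
    """Estimate a row-stochastic transition matrix from integer state sequences.

    Hypothetical stand-in: additive smoothing (alpha > 0) is an assumption made
    here so that no transition probability is exactly zero.
    """
    counts = np.full((n_states, n_states), alpha, dtype=float)
    for seq in sessions:
        for src, dst in zip(seq[:-1], seq[1:]):
            counts[src, dst] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def avg_kl(P, Q):
    """Average over source states of KL(P_e || Q_e) (direction is an assumption)."""
    return float(np.mean(np.sum(P * np.log(P / Q), axis=1)))


def intra_class_null(sessions, n_states, B, rng):
    """Split one class into two random halves B times; return the B divergences."""
    n = len(sessions)
    scores = np.empty(B)
    for b in range(B):
        perm = rng.permutation(n)  # random split without replacement
        half_a = [sessions[i] for i in perm[: n // 2]]
        half_b = [sessions[i] for i in perm[n // 2:]]
        scores[b] = avg_kl(build_kernel(half_a, n_states),
                           build_kernel(half_b, n_states))
    return scores


def empirical_p_value(d_inter, d_intra_pooled):
    """One-sided p-value with the +1 correction in numerator and denominator."""
    exceed = int(np.sum(d_intra_pooled >= d_inter))
    return (1 + exceed) / (len(d_intra_pooled) + 1)
```

In this sketch the pooled null would be formed by concatenating the outputs of \texttt{intra\_class\_null} for $\mathcal{D}_H$ and $\mathcal{D}_A$, and the lift ratio is $d^{\text{inter}}$ divided by the mean of that pooled array.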
\begin{definition}[Kullback-Leibler Divergence for Transition Distributions]
Let $P_e$ and $Q_e$ be categorical distributions over destination states following event $e$, derived from human and agent trajectories respectively. The KL divergence between these distributions is:


@@ -8,15 +8,48 @@
\subsection{Behavioral Analysis}
Include markov chains of transition matrices, compare distributions (look at Divergence metrics)
The transition-kernel analysis is evaluated with both between-class divergence and an intra-class bootstrap null baseline. This allows us to separate real behavioral differences from finite-sample estimation noise.
\begin{table}[ht]
\centering
\caption{Divergence significance using intra-class bootstrap baseline ($B=100$ per class).}
\label{tab:divergence_significance}
\begin{tabular}{lcccc}
\toprule
Metric & Mean KL & Std & 5\% quantile & 95\% quantile \\
\midrule
Between-class (Human vs Agent) & 5.3067 & -- & -- & -- \\
Human intra-class split & 2.5271 & 1.2501 & 0.6845 & 4.6015 \\
Agent intra-class split & 1.2065 & 1.2607 & 0.2177 & 4.2345 \\
\bottomrule
\end{tabular}
\end{table}
For this run ($n_H=11$, $n_A=7$, $B=100$), the pooled lift ratio is $2.84\times$ and the empirical one-sided p-value is $0.0149$, both computed as defined in Section~\ref{sec:tpe}. The between-class divergence ($5.3067$) lies above the 95\% quantile of both intra-class reference distributions ($4.6015$ for humans, $4.2345$ for agents), which supports the use of divergence-derived contamination signals in downstream pricing control.
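As a consistency check, the reported lift and p-value can be recovered from the table entries. The calculation assumes that the pooled null weights both classes equally (which holds for $B=100$ splits per class) and that the p-value uses the $+1$ correction in both numerator and denominator:
\[
\mathbb{E}[d^{\text{intra}}] \approx \frac{2.5271 + 1.2065}{2} = 1.8668,
\qquad
\frac{d^{\text{inter}}}{\mathbb{E}[d^{\text{intra}}]} \approx \frac{5.3067}{1.8668} \approx 2.84,
\]
\[
\hat p = \frac{1 + k}{2B + 1} = \frac{1 + 2}{201} \approx 0.0149,
\]
where $k = 2$ is the number of pooled intra-class splits with $d_j^{\text{intra}} \ge d^{\text{inter}}$ implied by the reported p-value.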
\subsection{Experimental Outcomes}
Align with defined objectives, show results and statistical significance (or not).
To evaluate robustness contributions, we compare two policies on the same environment family: (i) robust pricing with COI-aware reward and adversarial contamination step, and (ii) non-robust baseline with revenue-only reward (\texttt{--no-robust}).
\begin{table}[ht]
\centering
\caption{Pricing policy benchmark for robust vs non-robust training.}
\label{tab:pricing_benchmark}
\begin{tabular}{lcccc}
\toprule
Policy & Eval reward & Eval revenue & COI leakage & Margin collapse rate \\
\midrule
Robust policy & \textit{TBD} & \textit{TBD} & \textit{TBD} & \textit{TBD} \\
Non-robust baseline (\texttt{--no-robust}) & \textit{TBD} & \textit{TBD} & \textit{TBD} & \textit{TBD} \\
\bottomrule
\end{tabular}
\end{table}
This comparison isolates the effect of robustness terms from model capacity and optimization settings, and provides the benchmark needed for interpreting the value of COI-aware control.
\subsection{Interpretation and Insights}
Inference from given patterns and show key findings.
Between-class divergence substantially above the intra-class null indicates that the two actor classes are behaviorally separable at the transition-kernel level. In pricing experiments, this is the condition required for separability to act as a useful control signal rather than just an auxiliary classifier score.
\subsection{Anomalies}