more on revelation

2026-04-09 10:29:38 +02:00
parent 02328b20f2
commit 835e10d6ef
4 changed files with 7 additions and 14 deletions


@@ -22,6 +22,7 @@
(LaTeX-add-labels
"app:compute_budget"
"tab:compute_derivation"
"app:kl_zeros"))
"app:kl_zeros"
"app:revelation_log"))
:latex)


@@ -478,7 +478,7 @@ In practice, we parameterize this with a session-level leakage term:
\begin{equation}
\text{COI}_{\text{leak}}(p,\tau') = f(\tau')\cdot \text{InfoValue}(p,\tau')
\end{equation}
where $f(\tau')$ is the weak agent probability and $\text{InfoValue}$ is implemented either as a constant query-tax surrogate or as a revelation surrogate $-\log\pi(p\mid\tau')$. In the latter case, leakage is \emph{contamination-weighted surprisal}: $f(\tau')$ scales how much we treat the session as agentic, and $-\log\pi(p\mid\tau')$ scores how unexpected the realized quote is under the policy's own distribution over prices. Appendix~\ref{app:revelation_log} records why the logarithm is the conventional choice for that second factor.
where $f(\tau')$ is the weak agent probability and $\text{InfoValue}$ is implemented either as a constant query-tax surrogate or as a revelation surrogate $-\log\pi(p\mid\tau')$. The latter is the surprisal of the quoted price $p$ under the policy's own distribution over prices. In other words, we proxy the leakage term by how surprising the price set by our policy is, weighted by the contamination estimate $f(\tau')$. Appendix~\ref{app:revelation_log} explains why the logarithm is used in the revelation surrogate.
The inner minimization selects the contamination candidate that makes the penalized reward smallest, so the outer policy update faces the worst plausible leakage scenario inside the ambiguity set rather than an average case.


@@ -94,10 +94,8 @@ In code we do the boring fix: add a tiny floor $\varepsilon$ to both the numerat
\section{Why the logarithm appears in the revelation surrogate}
\label{app:revelation_log}
Recall that $\text{COI}_{\text{leak}}(p,\tau') = f(\tau')\cdot\text{InfoValue}(p,\tau')$. The query-tax surrogate fixes $\text{InfoValue}$ to a positive constant: every suspected reconnaissance quote is penalized equally, which tracks the erosion story where independent query volume drives COI to zero. The revelation surrogate instead sets $\text{InfoValue}(p,\tau') = -\log \pi(p\mid\tau')$, where $\pi(\cdot\mid\tau')$ is the pricing policy's distribution over quoted prices in context $\tau'$ (after whatever discretization or binning the engine uses).
$\text{COI}_{\text{leak}} = f(\tau')\cdot\text{InfoValue}$. Either $\text{InfoValue}=c>0$ (query-tax) or $\text{InfoValue}=-\log\pi(p\mid\tau')$ (revelation), with $\pi(\cdot\mid\tau')$ the policy over quoted prices in context $\tau'$.
For an outcome with probability $q$, the quantity $-\log q$ is \emph{surprisal}: likely draws are unsurprising, rare draws are highly surprising. That matches the informal ``surprise'' people talk about in recommender systems when they formalize novelty as low predicted probability---here the model is our own policy. The log is the standard information-theoretic way to turn ``how probable was this draw?'' into a penalty that grows sharply in the tails. In the reconnaissance reading, a price from a thin slice of the policy's support is more identifying than a typical quote.
So the revelation form is \emph{contamination-weighted surprisal}: $f(\tau')$ scales how agent-like we judge the session, and $-\log\pi(p\mid\tau')$ scales how informative that price is relative to $\pi(\cdot\mid\tau')$. In code you still floor $\pi(p\mid\tau')$ away from zero so tail bins do not explode the penalty, same spirit as Appendix~\ref{app:kl_zeros}.
For probability $q$, $-\log q$ is surprisal; for independent events, $-\log\prod_i q_i=\sum_i(-\log q_i)$. The revelation surrogate is that surprisal under $\pi(\cdot\mid\tau')$, scaled by $f(\tau')$. Use $\max\{\pi,\varepsilon\}$ so the term stays finite (cf.\ Appendix~\ref{app:kl_zeros}).
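As an illustrative calculation (all numbers here are hypothetical, and we assume the natural logarithm), take $f(\tau')=0.8$. A modal quote with $\pi(p\mid\tau')=0.5$ and a tail quote with $\pi(p\mid\tau')=0.01$ give
\[
0.8\cdot(-\log 0.5)\approx 0.8\cdot 0.69\approx 0.55,
\qquad
0.8\cdot(-\log 0.01)\approx 0.8\cdot 4.61\approx 3.68 .
\]
A bin with zero mass would make the term infinite. With an assumed floor $\varepsilon=10^{-6}$, the surprisal is capped at $-\log 10^{-6}\approx 13.82$, so the leakage term is at most about $0.8\cdot 13.82\approx 11.05$.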
\end{document}


@@ -126,15 +126,9 @@ In code we do the basic fix: add a tiny floor $\varepsilon$ to both the numerato
\section{Why the logarithm appears in the revelation surrogate}
\label{app:revelation_log}
Recall that $\text{COI}_{\text{leak}}(p,\tau') = f(\tau')\cdot\text{InfoValue}(p,\tau')$. The query-tax surrogate fixes $\text{InfoValue}$ to a positive constant: every suspected reconnaissance quote is penalized equally, which tracks the erosion theorem where independent query volume drives COI to zero. The revelation surrogate instead sets
\begin{equation}
\text{InfoValue}(p,\tau') = -\log \pi(p\mid\tau'),
\end{equation}
where $\pi(\cdot\mid\tau')$ is the pricing policy's distribution over quoted prices in context $\tau'$ (after whatever discretization or binning the engine uses).
Leakage is $\text{COI}_{\text{leak}} = f(\tau')\cdot\text{InfoValue}$. The query-tax form fixes $\text{InfoValue}=c>0$. The revelation form sets $\text{InfoValue}(p,\tau')=-\log\pi(p\mid\tau')$, with $\pi(\cdot\mid\tau')$ the policy distribution over quoted prices in context $\tau'$ (discretized as in the engine).
For an outcome that occurs with probability $q$, the quantity $-\log q$ is the usual \emph{surprisal}: likely draws have small surprisal, rare draws have large surprisal. That is the same ``surprise'' people import into recommender systems when they formalize novelty as low predicted probability under a model---here the model is our own policy. The log is not decorative: it is the standard information-theoretic coding of ``how unexpected was this draw under $\pi$?'' In the reconnaissance reading, a quote from a thin slice of the policy's support is more identifying than a modal quote, because it pins down what the rule is willing to do in places where little mass sits.
Put together, the revelation form is \emph{contamination-weighted surprisal}: $f(\tau')$ scales how agent-like we judge the session, and $-\log\pi(p\mid\tau')$ scales how informative that realized price is relative to $\pi(\cdot\mid\tau')$. In implementation you still floor $\pi(p\mid\tau')$ away from zero so tail bins do not explode the penalty---the same honesty as Appendix~\ref{app:kl_zeros}: we use a regularized surrogate, not a literal infinite penalty.
For an outcome with probability $q$, the quantity $-\log q$ is its \emph{surprisal}. For independent events, surprisal is additive: $-\log\prod_i q_i=\sum_i(-\log q_i)$. The revelation term is the surprisal of the realized price $p$ under $\pi(\cdot\mid\tau')$, multiplied by $f(\tau')$. In practice we use $\max\{\pi,\varepsilon\}$ in place of $\pi$ so the log stays finite (same spirit as Appendix~\ref{app:kl_zeros}).
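To make the additivity concrete, consider a small check with hypothetical numbers (natural logarithm assumed): two independent quotes with probabilities $q_1=0.5$ and $q_2=0.01$ under the policy. Their joint surprisal equals the sum of the individual surprisals,
\[
-\log(q_1 q_2) = -\log 0.005 \approx 5.30
\qquad\text{and}\qquad
(-\log q_1)+(-\log q_2)\approx 0.69+4.61 = 5.30 .
\]
Under this independence assumption, per-quote penalties within a session can therefore be summed without changing their information-theoretic reading.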
% \input{../build/concatenated_code}