From d36a34ead97b1c61960be68b9e30ad5559c663cb Mon Sep 17 00:00:00 2001 From: Daniel Rosel Date: Fri, 10 Apr 2026 11:29:21 +0200 Subject: [PATCH] updating appendix --- paper/src/chapters/mdp_agent.pdf | Bin 10932 -> 10932 bytes paper/src/chapters/mdp_human.pdf | Bin 11953 -> 11953 bytes paper/src/main.tex | 2 +- 3 files changed, 1 insertion(+), 1 deletion(-) diff --git a/paper/src/chapters/mdp_agent.pdf b/paper/src/chapters/mdp_agent.pdf index 83e10d3e525abf81b3b745ea82e4aff20b7db528..37df0f5cfd8f533a526b80a5be8c54a7c38386d4 100644 GIT binary patch delta 284 zcmV+%0ptF(RkT&GgeZS{in)|J6l|^HI6s&i;y@^+5R%+V4?+=+p;$(eOY`=XoH(WE zv>!W~8J55q6_{}*Kr>=A$8=t>vP5>eb{g4U*4i-u*;{jV_<#V%0iOe^Oy7DCTdxPj z-kNi-v_+z1V+|;!MVceIc;C5ImJPJFvb!|OrN-8i$?Y4frEh;syOmxe*{%c^1<#2S zi6!MPTs(6=8BP9T;VD>0{a#ufbHro=c-QAfZp=@J%fl0kd6AaKYZik&rm zhq8=AoNq1z9nmpRD_hq^H+3EHE4=OXI8|Y5tG^qi54%l-A$8;j?M!c#c9=dqk3p#X iExZeN_*bx!UVaam`ra?x;9f5*{Q~zGZ$-0_DI)<3`-J`g delta 284 zcmV+%0ptF(RkT&GgeZT0in){~6l|^HI6s&iVnZmUV3OQQ4?+=+p;$(eOY`=X>^P<9 zv>!W~8J55z%CKY+2Q4Y06&9;3%5x;=Tc?mzqEVkrXy{P>@Tgdi8g=DYSPCWz%!{0rC9?NxN63zOSFi$tr&8i-V^(gl)>_nlj3*+6Tnx=W*7YHWW!n~mRCQob?mUU@>Y zUrR1?z7SjpS%~=)myev!CX>HdcnH=}e^i!Yj+kr!Z&WpEYkopp9-e&8SCIL#xW{gN z5I8LEii0(Mhq8=AoJW^|PUsw{Rjuoyo4!u?1>W{FO}ntQr|(A7>9CD3q>lVV&h%E3 p!~E`f4npn6@GchI;a|a>^6Jafw|?OUw|Zge7x&h0N3)MDA_20>kHi1~ diff --git a/paper/src/main.tex b/paper/src/main.tex index 2959d04..9bf7f99 100644 --- a/paper/src/main.tex +++ b/paper/src/main.tex @@ -125,7 +125,7 @@ The textbook definition $D_{\mathrm{KL}}(P\parallel Q)=\sum_k P(k)\log(P(k)/Q(k) In code we do the basic fix: add a tiny floor $\varepsilon$ to both the numerator and denominator inside the log so nothing is exactly zero, which turns the sum into a finite, smoothed surrogate rather than a literal KL to raw counts. We also skip source states that do not exist at all in the reference kernel, because there is nowhere honest to compare against. 
 This keeps the pipeline running and the divergence scores on a comparable scale, at the cost that the number is regularized KL behavior, not a purist information-theoretic quantity, which is acceptable here because we only use the gap between human-anchored and agent-anchored scores as a weak separability signal.
-\section{Why the logarithm appears in the revelation surrogate}
+\section{Expanding the Intuition of Information Value in the Reward}
 \label{app:revelation_log}
 Leakage is $\text{COI}_{\text{leak}} = f(\tau')\cdot\text{InfoValue}$. The query-tax form fixes $\text{InfoValue}=c>0$. The revelation form sets $\text{InfoValue}(p,\tau')=-\log\pi(p\mid\tau')$, with $\pi(\cdot\mid\tau')$ the policy distribution over quoted prices in context $\tau'$ (discretized as in the engine).
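
The two InfoValue forms in the appendix text can be illustrated with a minimal Python sketch. This is not the engine's actual code: the function names, the representation of $\pi(\cdot\mid\tau')$ as a plain dict from discretized price to probability, and the `eps` floor that guards against $\log 0$ are all assumptions made for illustration.

```python
import math
from typing import Hashable, Mapping


def info_value_query_tax(c: float = 1.0) -> float:
    """Query-tax form: InfoValue is a fixed constant c > 0."""
    if c <= 0:
        raise ValueError("c must be strictly positive")
    return c


def info_value_revelation(
    price: Hashable,
    policy: Mapping[Hashable, float],
    eps: float = 1e-12,  # hypothetical floor, not taken from the paper
) -> float:
    """Revelation form: InfoValue(p, tau') = -log pi(p | tau').

    `policy` stands in for pi(. | tau') over discretized quoted prices.
    Prices missing from the dict are treated as probability 0 and floored
    at `eps` so the logarithm stays finite.
    """
    prob = policy.get(price, 0.0)
    return -math.log(max(prob, eps))


def leakage(f_tau: float, info_value: float) -> float:
    """COI_leak = f(tau') * InfoValue."""
    return f_tau * info_value


# Hypothetical policy over three discretized price points.
policy = {10.0: 0.5, 12.0: 0.25, 14.0: 0.25}

common = info_value_revelation(10.0, policy)  # likely quote, low surprisal
rare = info_value_revelation(12.0, policy)    # less likely quote, higher surprisal
flat = info_value_query_tax(1.0)              # same cost for every quote
```

Under the revelation form a quote the policy rarely makes carries more information value than a common one, while the query-tax form charges the same constant regardless of which price is quoted.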