From d36a34ead97b1c61960be68b9e30ad5559c663cb Mon Sep 17 00:00:00 2001
From: Daniel Rosel <daniel@alves.world>
Date: Fri, 10 Apr 2026 11:29:21 +0200
Subject: [PATCH] Retitle revelation appendix section and update MDP PDFs

---
 paper/src/chapters/mdp_agent.pdf | Bin 10932 -> 10932 bytes
 paper/src/chapters/mdp_human.pdf | Bin 11953 -> 11953 bytes
 paper/src/main.tex               |   2 +-
 3 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/paper/src/chapters/mdp_agent.pdf b/paper/src/chapters/mdp_agent.pdf
index 83e10d3e525abf81b3b745ea82e4aff20b7db528..37df0f5cfd8f533a526b80a5be8c54a7c38386d4 100644
GIT binary patch
delta 284
zcmV+%0ptF(RkT&GgeZS{in)|J6l|^HI6s&i;y@^+5R%+V4?+=+p;$(eOY`=XoH(WE
zv>!W~8J55q6_{}*Kr>=A$8=t>vP5>eb{g4U*4i-u*;{jV_<#V%0iOe^Oy7DCTdxPj
z-kNi-v_+z1V+|;!MVceIc;C5ImJPJFvb!|OrN-8i$?Y4frEh;syOmxe*{%c^1<#2S
zi6!MPTs(6=8BP9T;VD>0{a#ufbHro=c-QAfZp=@J%fl0kd<iOO`H0>6AaKYZik&rm
zhq8=AoNq1z9nmpRD_hq^H+3EHE4=OXI8|Y5tG^qi54%l-A$8;j?M!c#c9=dqk3p#X
iExZeN_*bx!UVaam`ra?x;9f5*{Q~zGZ$-0_DI)<3`-J`g

delta 284
zcmV+%0ptF(RkT&GgeZT0in){~6l|^HI6s&iVnZmUV3OQQ4?+=+p;$(eOY`=X>^P<9
zv>!W~8J55z%CKY+2Q4Y06&9;3%5x;=Tc?mzqE<Enki0b)iw|%x9Pv4zLieo$MZ4Xg
zsEoe!QW?yoXp9EMw2o8EuFiFKn<N9Rjp(k8w6Qkzd_I0-lyHBIZcE`5X62SMp0Sh|
z5ns{bh4E*`=99@^EIfJZps$2cAxB6yfOmCi#9sgSxXeGkPB);8Hjmh?^8$<Xp*R@b
zcgXWFgn4us=!8z58qwM=xXIguU*Sxzrm65-rTT89Iv(}``qY6RmDRnG%3|^GJb9sX
iV|eTD@ULJYocIsY^}V0J!M&be`UUgeZ$h(?DI)><y^<vW

diff --git a/paper/src/chapters/mdp_human.pdf b/paper/src/chapters/mdp_human.pdf
index 41751c1a5ab428f936f4314db8b746ceb76219df..4803a60d8b57eebe2471bbc8d0914e7f7b274250 100644
GIT binary patch
delta 291
zcmV+;0o?wvU9nxTdn|uZYr-%Th2Q%t&dXS3&?d3AD#ZsY$QT1r-^Lz7dZ~rclq4PV
z-!G}wG4gaj?#Vfa%fTX^W5FT;zTgorFfa0`ERmkCJs~~FhHL~NeQRrr4+t<E@j0SO
z51j*#Hk(m#FuEQT8N^DqMuR7OnPiAB&h>VkrXy{P>@Tgdi8g=DY<By`kaVr?cG3~z
z-C8h_vy3y&#foPyOgu9-n@s*<;mKPE{XrTEIYP1#yi?uEt^V<GnSWxHFCp?p@rd0y
zFR;iSioMZ8kFpFym~Sp4ozTfsBRks%H+7xxE1Vf<nkv7wqwiMHalZ}Frw;r`Ru4vz
p#r)xU@<Qcq;jI?G!@q)+bn;6#_kR8c_j-Qm7yA%zO0$nGA^`{mjo|<Q

delta 291
zcmV+;0o?wvU9nxTdn|v!YQr!PMDO~Fxs*B-Y)MJ%G?*NcKq#dUlHN)WLS7t0v5X{_
z=I<-naZ1r?A3K^ER)8~+W67C>SPCWz%!{0rC9?NxN63z<CdU9|uTACf0SS&1J_ocj
zed{6EW-};`)>OSFi$tr&8i-V^(gl)>_nlj3*+6Tnx=W*7YHWW!n~mRCQob?mUU@>Y
zUrR1?z7SjpS%~=)myev!CX>HdcnH=}e^i!Yj+kr!Z&WpEYkopp9-e&8SCIL#xW{gN
z5I8LEii0(Mhq8=AoJW^|PUsw{Rjuoyo4!u?1>W{FO}ntQr|(A7>9CD3q>lVV&h%E3
p!~E`f4npn6@GchI;a|a>^6Jafw|?OUw|Zge7x&h0N3)MDA_20>kHi1~

diff --git a/paper/src/main.tex b/paper/src/main.tex
index 2959d04..9bf7f99 100644
--- a/paper/src/main.tex
+++ b/paper/src/main.tex
@@ -125,7 +125,7 @@ The textbook definition $D_{\mathrm{KL}}(P\parallel Q)=\sum_k P(k)\log(P(k)/Q(k)
 In code we do the basic fix: add a tiny floor $\varepsilon$ to both the numerator and denominator inside the log so nothing is exactly zero, which turns the sum into a finite, smoothed surrogate rather than a literal KL to raw counts. We also skip source states that do not exist at all in the reference kernel, because there is nowhere honest to compare against. This keeps the pipeline running and the divergence scores on a comparable scale, at the cost that the number is regularized KL behavior, not a purist information-theoretic quantity, which is acceptable here because we only use the gap between human-anchored and agent-anchored scores as a weak separability signal.
 
 
-\section{Why the logarithm appears in the revelation surrogate}
+\section{Intuition for Information Value in the Reward}
 \label{app:revelation_log}
 
 Leakage is $\text{COI}_{\text{leak}} = f(\tau')\cdot\text{InfoValue}$. The query-tax form fixes $\text{InfoValue}=c>0$. The revelation form sets $\text{InfoValue}(p,\tau')=-\log\pi(p\mid\tau')$, with $\pi(\cdot\mid\tau')$ the policy distribution over quoted prices in context $\tau'$ (discretized as in the engine).
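The two InfoValue forms described in the paragraph above can be contrasted with a minimal sketch. It is an illustration under stated assumptions, not the engine's implementation: the function names, the dictionary representation of the discretized policy, the example probabilities, and the use of the natural logarithm are all hypothetical choices made here.

```python
import math


def info_value_query_tax(c: float = 1.0) -> float:
    """Query-tax form: InfoValue is a fixed constant c > 0 (c is an assumed value)."""
    if c <= 0:
        raise ValueError("c must be strictly positive")
    return c


def info_value_revelation(policy: dict, price) -> float:
    """Revelation form: -log pi(p | tau') over a discretized price grid.

    `policy` maps each price bin to its probability in one context tau'.
    Natural log is assumed; the source only writes `log`.
    """
    prob = policy[price]
    if prob <= 0:
        raise ValueError("price must have positive probability under the policy")
    return -math.log(prob)


def coi_leak(f_tau: float, info_value: float) -> float:
    """Leakage term: COI_leak = f(tau') * InfoValue."""
    return f_tau * info_value


# Hypothetical policy over three quoted-price bins in one context tau'.
policy = {10: 0.7, 12: 0.2, 15: 0.1}

common = info_value_revelation(policy, 10)  # likely quote, small surprisal
rare = info_value_revelation(policy, 15)    # unlikely quote, large surprisal
flat = info_value_query_tax(1.0)            # same charge for every quote

leak_rare = coi_leak(0.5, rare)             # f(tau') = 0.5 is an assumed value
```

Under the revelation form a rarely quoted price carries more information value than a common one, whereas the query-tax form charges every quote the same amount.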