adding missing ideas and appendix kl

2026-04-08 21:46:54 +02:00
parent cc823ec63c
commit e72e3c81c1
5 changed files with 16 additions and 11 deletions

Converting to petaFLOPS: $160{,}320\;\text{TFLOPS} = 160.32\;\text{PFLOPS} \approx 160\;\text{PFLOPS}$. This is the theoretical peak BF16 throughput. Realized throughput depends on memory-bandwidth utilization and inter-chip communication overhead, but the figure serves as a useful upper bound for provisioning decisions.
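The petaFLOPS figure is plain arithmetic: the aggregate peak is the sum over table rows of chips times per-chip peak TFLOPS, and dividing by $1{,}000$ converts TFLOPS to PFLOPS. The following is a minimal Python sketch. The helper names are hypothetical, and only the v4 row (64 chips at 275 TFLOPS each) is used as input. The remaining rows of the table would be appended in the same (chips, TFLOPS-per-chip) form.

\begin{lstlisting}[language=Python, caption={Aggregate peak throughput and unit conversion (illustrative sketch)}]
def aggregate_peak_tflops(rows):
    """Sum chips x per-chip peak TFLOPS over (chips, tflops_per_chip) rows."""
    return sum(chips * tflops_per_chip for chips, tflops_per_chip in rows)


def tflops_to_pflops(tflops):
    """Convert TFLOPS to PFLOPS (1 PFLOPS = 1,000 TFLOPS)."""
    return tflops / 1_000


# Example with the v4 row only: 64 chips x 275 TFLOPS each.
v4_peak = aggregate_peak_tflops([(64, 275)])
total_pflops = tflops_to_pflops(160_320)
\end{lstlisting}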
\section{whoclickedit Dataset Card}
\label{app:whoclicked_card}
For transparency and reproducibility, this appendix includes the full dataset card used for the public release of the \texttt{whoclickedit} dataset.
\lstinputlisting[
caption={whoclickedit dataset card (README snapshot)},
label={lst:whoclicked_dataset_card}
]{chapters/auto/whoclicked_dataset_card.md}

\section{KL divergence when the reference has zeros}
\label{app:kl_zeros}
The textbook definition $D_{\mathrm{KL}}(P\parallel Q)=\sum_k P(k)\log\bigl(P(k)/Q(k)\bigr)$ cannot be used as-is when our empirical reference assigns $Q(k)=0$ to a transition that the session distribution still visits. If $P(k)>0$ and $Q(k)=0$, that term is $+\infty$, and so is the whole divergence. With only 29 sessions, the estimated transition rows are very sparse, so such zeros are common rather than exceptional.

In code we apply the simplest fix: a small floor $\varepsilon$ is added to both the numerator and the denominator inside the logarithm, so that no ratio involves an exact zero. This turns the sum into a finite, smoothed surrogate rather than a literal KL divergence against the raw counts. We also skip source states that are absent from the reference kernel altogether, because there is no reference row to compare against. These choices keep the pipeline running and the divergence scores on a comparable scale. The cost is that the resulting number reflects regularized KL behavior rather than the exact information-theoretic quantity. This is acceptable here because we use only the gap between the human-anchored and agent-anchored scores, and only as a weak separability signal.
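The smoothing and skipping rules described in this appendix can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the pipeline's actual code. We read ``add a floor'' as additive smoothing, $\log\bigl((P(k)+\varepsilon)/(Q(k)+\varepsilon)\bigr)$. The value $\varepsilon=10^{-9}$, the dictionary representation of the transition kernel, the unweighted mean over shared source states, and the names \texttt{smoothed\_kl\_row} and \texttt{smoothed\_kl\_kernel} are all hypothetical choices made for the example.

\begin{lstlisting}[language=Python, caption={Smoothed KL surrogate over transition kernels (illustrative sketch)}]
import math
from typing import Dict, Hashable

# Hypothetical representation: source state -> {next state: probability}.
Row = Dict[Hashable, float]
Kernel = Dict[Hashable, Row]


def smoothed_kl_row(p_row: Row, q_row: Row, eps: float = 1e-9) -> float:
    """Smoothed surrogate for D_KL(P || Q) on a single transition row.

    Assumes an additive floor inside the log: log((P(k)+eps) / (Q(k)+eps)).
    Terms with P(k) = 0 contribute nothing, so only visited next states count.
    """
    total = 0.0
    for k, p in p_row.items():
        if p <= 0.0:
            continue
        q = q_row.get(k, 0.0)  # next state unseen in the reference -> Q(k) = 0
        total += p * math.log((p + eps) / (q + eps))
    return total


def smoothed_kl_kernel(p_kernel: Kernel, q_kernel: Kernel, eps: float = 1e-9) -> float:
    """Unweighted mean of the row surrogate over source states in both kernels.

    Source states absent from the reference kernel are skipped.
    The unweighted mean is a hypothetical aggregation choice.
    Returns NaN when no source state is shared.
    """
    shared = [s for s in p_kernel if s in q_kernel]
    if not shared:
        return float("nan")
    return sum(smoothed_kl_row(p_kernel[s], q_kernel[s], eps) for s in shared) / len(shared)


# Usage: from "a" the session visits "y", which the reference never does,
# and it also visits source state "z", which the reference lacks entirely.
session = {"a": {"x": 0.5, "y": 0.5}, "z": {"x": 1.0}}
reference = {"a": {"x": 1.0}}
score = smoothed_kl_kernel(session, reference)  # finite; "z" is skipped
\end{lstlisting}

Without the floor, the \texttt{"y"} term would divide by zero. With it, the term is large but finite, so rows with unseen transitions still yield comparable scores. Under these assumptions, the human-versus-agent gap would be the difference of two such calls, one against each reference kernel.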
% \input{../build/concatenated_code}