From 03b4996bea715f5876170c31f75c28329a4184fe Mon Sep 17 00:00:00 2001 From: Daniel Rosel Date: Fri, 10 Apr 2026 11:27:15 +0200 Subject: [PATCH] more on future work --- paper/src/chapters/06-conclusion.tex | 6 ++++-- paper/src/chapters/mdp_agent.pdf | Bin 10931 -> 10932 bytes paper/src/chapters/mdp_human.pdf | Bin 11953 -> 11953 bytes 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/paper/src/chapters/06-conclusion.tex b/paper/src/chapters/06-conclusion.tex index 76b5153..1c87465 100644 --- a/paper/src/chapters/06-conclusion.tex +++ b/paper/src/chapters/06-conclusion.tex @@ -1,7 +1,7 @@ \section{Conclusion} \label{sec:conclusion} -This thesis examined reinforcement-learning policies for dynamic pricing when a fraction of traffic is orchestrated by non-human agents intent on extracting information before purchase. We introduced COI-oriented metrics, a behavioral distinguishability layer, and a distributionally robust training loop; empirical runs show where robustness helps and where it must be tuned. +This thesis examined reinforcement-learning policies for dynamic pricing when a fraction of traffic is orchestrated by non-human agents intent on extracting information before purchase. We introduced COI-oriented metrics, a behavioral distinguishability layer, and a distributionally robust training loop. Empirical runs show where robustness helps and where it must be tuned. \subsection{Summary of contributions} Our work has yielded a broad set of dependencies which we carefully orchestrated to give us measurable results. To give a clear picture we outline the specific contributions of each stage of our work. 
The theoretical component formalizes why agent-mediated reconnaissance erodes pricing power, the behavioral component establishes that such contamination is detectable from interaction traces alone, the control component translates that distinguishability into a robust pricing mechanism, and the systems component provides the controlled experimental environment required to observe, test, and reproduce these effects. @@ -20,6 +20,8 @@ Our work has yielded a broad set of dependencies which we carefully orchestrated \subsection{Limitations and future work} -Several constraints are intentional and could be relaxed later. Action weights in the demand proxy are hand-set; learning them from data is an obvious next step. The Stackelberg interface assumes a clean alternation between platform move and market response; richer histories (multi-agent, multi-platform) would need a less rigid state definition. Non-perishable catalog supply in the simulator widens the sim-to-real gap for inventory-constrained domains. Within-session contamination is modeled as stable; time-varying $\alpha$ inside a session would better match some attack patterns. +Several constraints are intentional and could be relaxed later. Action weights in the demand proxy are currently derived from simple divergence rankings; learning them from data is an obvious next step. We propose jointly learning the demand proxy, policy, and simulator parameters instead of treating them as separate modules. Another avenue we could not cover in this work is incorporating Bayesian methods to better capture demand uncertainty and to propagate that uncertainty into the reward signal. +The Stackelberg interface assumes a clean alternation between platform moves and market responses. Richer histories (multi-agent, multi-platform) would need a less rigid state definition. Non-perishable catalog supply in the simulator widens the sim-to-real gap for inventory-constrained domains. 
Within-session contamination is modeled as stable; a time-varying $\alpha$ inside a session would better match some attack patterns. Before any deployment, human baselines should grow beyond the convenience sample used here, catalog scaling laws should be re-checked when transition matrices grow with SKU count, and the full pipeline should be re-validated under production traffic volumes, governance constraints, and product mixes. +We conclude our work with enthusiasm for future developments in agent-mediated commerce. We are excited to provide foundations for these developments and hope to see future work in a similar spirit. diff --git a/paper/src/chapters/mdp_agent.pdf b/paper/src/chapters/mdp_agent.pdf index 21460dfb20d52faa2eb8283c3f01db8a7948e373..83e10d3e525abf81b3b745ea82e4aff20b7db528 100644 GIT binary patch delta 323 zcmV-J0lfaRRkT&GeJFp;YQr!PgztWexs)apY^~xrKbRb1Lnx(SlH5uULJ^LkSVodd z^Y)eOIHl;cA3K^EmcSy)uw)SjEh(ZE7OO1Eb0p_mr;t>lRyG2Vyfqh#4{$IX@j0ME z_pJj(yWOCujK1_z8O)?;j0VNDj#JF8&UJR1Bm=FD=&p^lu{MA8d_I0-lyHr1OW_n| z<(4y^v6L7QU(w=)@n^>7lgVE!JbCM&uY^${M@Tk+cXesRUjO*G%s;+PH=v9*kJzpA z0*myaI2he`$n!9Sd2|`*gif9s(b_J!$=ifq;Y_clsqkB+`fj8;9`*tH)PWzB)xD9* zV)5`id7*S;cy`C7Y281)sA{NzjnyVG93d%%JyCG|^VE;nr~-O4WY`!TgO9%yhD^m=W-+ zeN5tvq?C|6AvAp<@iQU8V)7RYPuAM$Yo>(D;gU_@T{Im#${!n-*ca#N4m6{?HFj^T zKqFlhCnd)LMd1cF&n^>P(8W^4dObKdeqZn_j2gu@=7hl0Yb>5R%?X4?OtIX&*b98CHNZ zkz>i3gjfnD3e1a~l_j$GYe&eAswT$(WUo!-@Bs;q6FvvDGkxnJ*k&^*j@DGYCW}O? z#u|uJtkMONi}#&dXW2k&tGY|0U21GSn~mRCQob?mUU@>YUrT>3bG{H<2w8~v6PJ&i z&nA<0B=+^YHNN%Tppf$&R3B6vbe`?eGoV-?uvsoeTTA)L!3vK zfllZgs8y}&qMN=>_yykfG)=p(wWse!)9J8{Fr<$BM9%b9lf(S(c@9GD$M7!P;a|a> a^6Jafw|?OUw|WO*=@<9bZ%4C_Eg}KMXNZpg delta 276 zcmV+v0qg#;U9nxTiY$M2Y&V!3l0Yb>5R%?X4?-4>pjbwdOY`@YoH(WEw2vLl49h?X zE3u@6gDn|h74oViO@r+HGALvR(J4m&WUsn)_y7mR5uX#a;*|_I_}yESqR&#cJP%Im?I{ez?*8d*qEOXmxjlm%No2~R`=Mw4+4kcuGm{M4rrP<#QEwn z(HWfrwW4=Jbd#4Ezrfp3&C@n)?bNrH>bT!T7*a=mRL+c6Du?{;c?v@5ui;&|!@q)C a;l-EfZvDayZuJMk(l7Y*Z%DI`Eg}IfNR4>_