Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions
Marc Brooks, Gabriel Durham, Kihyuk Hong, Ambuj Tewari

TL;DR
This paper introduces GAMBITTS, a Thompson sampling bandit algorithm for GenAI-powered personalized interventions. It explicitly models both the treatment and reward processes, which improves learning efficiency and, under certain conditions, yields stronger regret guarantees in adaptive decision-making.
Contribution
The paper proposes GAMBITTS, a new bandit method that explicitly incorporates the generative structure of treatments and rewards, enhancing policy learning in GenAI-driven applications.
Findings
GAMBITTS outperforms standard bandit algorithms in simulations.
It provides stronger regret bounds under certain conditions.
Leveraging treatment information accelerates reward estimation.
Abstract
Recent advances in generative artificial intelligence (GenAI) models have enabled the generation of personalized content that adapts to up-to-date user context. While personalized decision systems are often modeled using bandit formulations, the integration of GenAI introduces new structure into otherwise classical sequential learning problems. In GenAI-powered interventions, the agent selects a query, but the environment experiences a stochastic response drawn from the generative model. Standard bandit methods do not explicitly account for this structure, where actions influence rewards only through stochastic, observed treatments. We introduce generator-mediated bandit-Thompson sampling (GAMBITTS), a bandit approach designed for this action/treatment split, using mobile health interventions with large language model-generated text as a motivating case study. GAMBITTS explicitly models…
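The action/treatment split described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the GAMBITTS algorithm from the paper: it assumes each query produces a low-dimensional Gaussian "treatment embedding" (a stand-in for features of LLM-generated text), that the reward is linear in that embedding, and that both noise levels are known. The function name, the dimensions, and all numeric values are hypothetical and chosen for illustration.

```python
import numpy as np


def run_generator_mediated_ts(n_rounds=2000, seed=0):
    """Toy Thompson sampling where actions affect reward only through an
    observed, stochastic treatment. Illustrative assumptions throughout."""
    rng = np.random.default_rng(seed)
    n_actions, dim = 3, 2

    # Hypothetical environment (unknown to the agent).
    true_means = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # E[treatment | query]
    true_theta = np.array([1.0, 0.5])                             # reward weights
    gen_sd, reward_sd = 0.5, 0.5                                  # assumed known noise levels

    # Reward model: Gaussian posterior over theta, prior N(0, I).
    theta_precision = np.eye(dim)
    theta_b = np.zeros(dim)
    # Treatment model: per-query Gaussian posterior over the mean embedding, prior N(0, I).
    z_sum = np.zeros((n_actions, dim))
    counts = np.zeros(n_actions)

    pulls = np.zeros(n_actions, dtype=int)
    for _ in range(n_rounds):
        # Sample reward weights from their posterior.
        theta_cov = np.linalg.inv(theta_precision)
        theta_sample = rng.multivariate_normal(theta_cov @ theta_b, theta_cov)

        # Sample each query's mean treatment from its posterior.
        mean_precision = 1.0 + counts / gen_sd**2
        mean_post = (z_sum / gen_sd**2) / mean_precision[:, None]
        mean_sample = mean_post + rng.normal(size=(n_actions, dim)) / np.sqrt(mean_precision)[:, None]

        # Choose the query whose sampled treatment has the highest sampled reward.
        action = int(np.argmax(mean_sample @ theta_sample))

        # The generator emits a stochastic treatment; reward depends on it alone.
        z = true_means[action] + gen_sd * rng.normal(size=dim)
        reward = z @ true_theta + reward_sd * rng.normal()

        # Update both models with the observed treatment and reward.
        theta_precision += np.outer(z, z) / reward_sd**2
        theta_b += z * reward / reward_sd**2
        z_sum[action] += z
        counts[action] += 1
        pulls[action] += 1

    theta_hat = np.linalg.inv(theta_precision) @ theta_b
    return pulls, theta_hat
```

Calling `run_generator_mediated_ts()` returns the number of times each query was selected and the posterior mean of the reward weights. Because the reward model is shared across queries, every observed treatment informs it regardless of which query produced the treatment; this toy setup is one way to see why using treatment information could speed up reward estimation.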
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Artificial Intelligence in Healthcare and Education
