Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions

Marc Brooks; Gabriel Durham; Kihyuk Hong; Ambuj Tewari

arXiv:2505.16311 · stat.ML · May 23, 2025

TL;DR

This paper introduces GAMBITTS, a Thompson sampling bandit algorithm for GenAI-powered personalized interventions. It explicitly models both the treatment and reward processes, which improves learning efficiency and, under certain conditions, yields stronger regret guarantees.

Contribution

The paper proposes GAMBITTS, a bandit method that explicitly incorporates the generative structure of treatments and rewards to improve policy learning in GenAI-driven applications.

Findings

01. GAMBITTS outperforms standard bandit algorithms in simulations.

02. GAMBITTS provides stronger regret bounds under certain conditions.

03. Leveraging treatment information accelerates reward estimation.

Abstract

Recent advances in generative artificial intelligence (GenAI) models have enabled the generation of personalized content that adapts to up-to-date user context. While personalized decision systems are often modeled using bandit formulations, the integration of GenAI introduces new structure into otherwise classical sequential learning problems. In GenAI-powered interventions, the agent selects a query, but the environment experiences a stochastic response drawn from the generative model. Standard bandit methods do not explicitly account for this structure, where actions influence rewards only through stochastic, observed treatments. We introduce generator-mediated bandit-Thompson sampling (GAMBITTS), a bandit approach designed for this action/treatment split, using mobile health interventions with large language model-generated text as a motivating case study. GAMBITTS explicitly models…
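The action/treatment split described in the abstract can be illustrated with a minimal sketch. This is not the authors' GAMBITTS implementation. It is a toy Thompson sampler under loudly stated assumptions: three hypothetical queries, a simulated "generator" that emits a 2-dimensional treatment embedding with a known mean per query, a linear-Gaussian reward with known noise variance, and a standard normal prior. All names (`run_sketch`, `centers`, `gen_sd`, `theta_true`) are illustrative. The point it shows is that the reward model is fit on the observed treatment, not on the chosen action.

```python
import numpy as np

def run_sketch(n_rounds=500, seed=0):
    """Toy generator-mediated Thompson sampling (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Assumption: each query a yields a treatment embedding z ~ N(centers[a], gen_sd^2 I).
    centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    gen_sd = 0.3
    # Assumption: reward is linear in the treatment, r = z @ theta_true + noise.
    theta_true = np.array([1.0, -0.5])  # unknown to the agent
    noise_sd = 0.5

    d = centers.shape[1]
    precision = np.eye(d)  # posterior precision of theta, prior N(0, I)
    b = np.zeros(d)        # precision-weighted sum of z * r
    actions = []

    for _ in range(n_rounds):
        # Draw theta from the current Gaussian posterior.
        cov = np.linalg.inv(precision)
        mean = cov @ b
        theta_draw = mean + np.linalg.cholesky(cov) @ rng.standard_normal(d)

        # Expected reward of each query under the draw, averaging over the
        # generator's output; with a linear reward this is centers @ theta_draw.
        a = int(np.argmax(centers @ theta_draw))

        # The environment experiences a stochastic, observed treatment.
        z = centers[a] + gen_sd * rng.standard_normal(d)
        r = z @ theta_true + noise_sd * rng.standard_normal()

        # Update the reward model using the treatment z, not the action a.
        precision += np.outer(z, z) / noise_sd**2
        b += z * r / noise_sd**2
        actions.append(a)

    theta_hat = np.linalg.inv(precision) @ b
    return np.array(actions), theta_hat

actions, theta_hat = run_sketch()
print("share of query 0 in last 100 rounds:", np.mean(actions[-100:] == 0))
print("posterior mean of theta:", theta_hat)
```

Because every observed treatment informs the same reward parameters, rounds spent on one query also sharpen the estimates for the others. In this toy setting, that sharing is what "leveraging treatment information" amounts to.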

Videos

Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Artificial Intelligence in Healthcare and Education