Finite-Time Guarantees for Multi-Agent Combinatorial Bandits with Nonstationary Rewards

Katherine B. Adams; Justin J. Boutilier; Qinyang He; Yonatan Mintz

arXiv:2508.20923 · cs.LG · August 29, 2025

Open Access

TL;DR

This paper introduces a novel framework for combinatorial multi-armed bandits with nonstationary rewards, addressing dynamic effects like habituation and recovery, with theoretical guarantees and practical success in health intervention case studies.

Contribution

The paper is the first to incorporate nonstationary reward effects into combinatorial bandit algorithms, providing theoretical guarantees and demonstrating real-world effectiveness.

Findings

01 Algorithms with theoretical guarantees on dynamic regret.

02 Practical case study showing threefold improvement in program enrollment.

03 Bridging adaptive learning theory with behavioral intervention applications.
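
For readers unfamiliar with the term in the first finding, dynamic regret is commonly defined by comparing the learner against the best action in each period instead of a single fixed action. The block below is a generic textbook-style form, not the paper's definition: the symbols $T$ (horizon), $\mathcal{S}$ (feasible subsets of agents), $S_t$ (subset chosen in period $t$), and $\mu_t$ (expected reward in period $t$) are assumed here for illustration. When rewards depend on past selections, as with habituation and recovery, the benchmark may instead be an optimal policy, so the paper's exact comparator may differ.

```latex
% Generic dynamic regret over T periods (assumed notation, not from the paper)
R_T^{\mathrm{dyn}}
  = \sum_{t=1}^{T} \Bigl( \max_{S \in \mathcal{S}} \mu_t(S) - \mu_t(S_t) \Bigr)
```

A finite-time guarantee in this sense is an explicit upper bound on $R_T^{\mathrm{dyn}}$ that holds for every finite $T$, as opposed to a purely asymptotic statement.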

Abstract

We study a sequential resource allocation problem where a decision maker selects subsets of agents at each period to maximize overall outcomes without prior knowledge of individual-level effects. Our framework applies to settings such as community health interventions, targeted digital advertising, and workforce retention programs, where intervention effects evolve dynamically. Agents may exhibit habituation (diminished response from frequent selection) or recovery (enhanced response from infrequent selection). The technical challenge centers on nonstationary reward distributions that lead to changing intervention effects over time. The problem requires balancing two key competing objectives: heterogeneous individual rewards and the exploration-exploitation tradeoff in terms of learning for improved future decisions as opposed to maximizing immediate outcomes. Our contribution…
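
The habituation and recovery effects described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' model or algorithm: the linear fatigue state, the constants `HABITUATION` and `RECOVERY`, and all function names are hypothetical, and the selection rule is an oracle that already knows each agent's base response rate, so it shows only how nonstationarity changes the best subset over time, not how to learn it.

```python
import random

# Hypothetical constants chosen for illustration only (not from the paper).
HABITUATION = 0.3  # fatigue added each time an agent is selected
RECOVERY = 0.1     # fatigue removed each period an agent is not selected


def expected_reward(base, fatigue):
    """Mean response of one agent: base rate scaled down by current fatigue."""
    return base * (1.0 - fatigue)


def step_fatigue(fatigue, selected):
    """Selection raises fatigue (habituation); rest lowers it (recovery)."""
    if selected:
        return min(1.0, fatigue + HABITUATION)
    return max(0.0, fatigue - RECOVERY)


def greedy_select(bases, fatigues, budget):
    """Oracle rule: pick the `budget` agents with the highest current mean."""
    ranked = sorted(
        range(len(bases)),
        key=lambda i: expected_reward(bases[i], fatigues[i]),
        reverse=True,
    )
    return set(ranked[:budget])


def simulate(bases, budget, horizon, seed=0):
    """Run the oracle for `horizon` periods; return total reward and choices."""
    rng = random.Random(seed)
    fatigues = [0.0] * len(bases)
    total = 0
    history = []
    for _ in range(horizon):
        chosen = greedy_select(bases, fatigues, budget)
        for i in chosen:
            # Bernoulli response with the agent's current (nonstationary) mean.
            if rng.random() < expected_reward(bases[i], fatigues[i]):
                total += 1
        fatigues = [step_fatigue(f, i in chosen) for i, f in enumerate(fatigues)]
        history.append(chosen)
    return total, history


if __name__ == "__main__":
    total, history = simulate(bases=[0.9, 0.5, 0.4], budget=1, horizon=10)
    print("total reward:", total)
    print("selections:", [sorted(s) for s in history])
```

Even though the first agent has the highest base rate, repeated selection erodes its response, so the myopic oracle rotates to other agents while it recovers. A learning algorithm faces the same dynamics without knowing the base rates or the fatigue parameters.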

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems