Bandit Learning in General Open Multi-agent Systems

Mengfan Xu

arXiv:2605.06202·cs.LG·May 8, 2026

TL;DR

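This paper studies bandit learning in open multi-agent systems, where agents arrive and depart over time. It addresses the non-stationarity induced by newly arriving agents and the dependence of information accumulation on agent patterns, and it introduces new concepts and algorithms with provable regret guarantees.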

Contribution

The paper formulates a unified open-system bandit framework with general dynamics, introduces novel concepts such as the pre-training degree and stability, and develops certified global-UCB algorithms with tight regret bounds.

Findings

01. Regret scales linearly with entry uncertainty via pre-training degree.

02. Stable regimes' regret depends on identifying persistent optimal arms.

03. Lower bounds confirm the tightness of the proposed regret dependencies.

Abstract

Recent developments in digital platforms have highlighted the prevalence of open systems, where agents can arrive and depart over time. While bandit learning in open systems has recently received initial attention, existing work imposes structural assumptions that are frequently violated in practice. A learning paradigm for general open systems creates fresh challenges: newly arriving agents induce endogenous non-stationarity; agent patterns determine how quickly information accumulates; and new agents make regret scale further with the time horizon. To this end, we formulate a unified open-system bandit problem with general dynamics, including heterogeneous rewards and general agent patterns. We introduce new concepts to capture the inherent complexities: the pre-training degree of new agents quantifies how much information an agent carries upon entry, stability measures…
