Regret Lower Bounds in Multi-agent Multi-armed Bandit
Mengfan Xu, Diego Klabjan

TL;DR
This paper provides the first comprehensive analysis of regret lower bounds in multi-agent multi-armed bandit problems across various settings, establishing tight bounds and bridging gaps with existing upper bounds.
Contribution
The paper establishes tight regret lower bounds for multiple multi-agent bandit scenarios, covering stochastic and adversarial rewards as well as connected and disconnected communication graphs.
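The connected/disconnected distinction asks whether information can eventually travel between every pair of agents. The Python sketch below checks this with a breadth-first search. It assumes a static undirected communication graph given as an edge list. The function name `is_connected` and this graph representation are illustrative choices, not taken from the paper.

```python
from collections import deque


def is_connected(num_agents: int, edges: list[tuple[int, int]]) -> bool:
    """Return True if the undirected graph over agents 0..num_agents-1 is connected.

    Assumption (illustrative only): the communication graph is static and undirected.
    """
    if num_agents == 0:
        return True
    # Build adjacency lists; each edge lets both endpoints exchange information.
    adj = {m: [] for m in range(num_agents)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Breadth-first search from agent 0 collects every reachable agent.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    # Connected iff every agent was reached.
    return len(seen) == num_agents


# A 4-agent ring is connected; keeping only two disjoint edges is not.
print(is_connected(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
print(is_connected(4, [(0, 1), (2, 3)]))
```

On a disconnected graph, some agents can never learn from others. This is why the connected and disconnected settings are treated separately.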
Findings
Lower bound of Ω(log T) for stochastic rewards on connected graphs
Lower bound of Ω(√T) for the mean-gap independent stochastic case
Lower bound of Ω(T^{2/3}) for adversarial rewards
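These are asymptotic orders, so a concrete horizon helps show how far apart the three rates are. The Python sketch below evaluates only the growth functions. The helper `rate_orders` is hypothetical, and it drops whatever constants the actual bounds carry, so the numbers show relative scaling, not actual regret values.

```python
import math


def rate_orders(T: float) -> dict[str, float]:
    """Evaluate the growth functions behind the stated lower bounds, constants omitted."""
    return {
        "log": math.log(T),         # Omega(log T): stochastic rewards, connected graphs
        "sqrt": math.sqrt(T),       # Omega(sqrt T): mean-gap independent stochastic case
        "two_thirds": T ** (2 / 3), # Omega(T^{2/3}): adversarial rewards
    }


for T in (10**3, 10**6):
    print(T, {k: round(v, 1) for k, v in rate_orders(T).items()})
```

As T grows, the adversarial rate T^{2/3} pulls ahead of √T, which in turn dominates log T. In other words, the adversarial setting forces polynomially more regret than the stochastic one.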
Abstract
The multi-armed bandit problem has motivated methods with provable regret upper bounds, and the corresponding lower bounds have been studied extensively in this context. Recently, the multi-agent multi-armed bandit has gained significant traction in various domains: individual clients face bandit problems in a distributed manner, and the objective is the overall system performance, typically measured by regret. Efficient algorithms with regret upper bounds have emerged. However, the corresponding regret lower bounds have received limited attention. The exception is a recent lower bound for adversarial settings, which still leaves a gap with the known upper bounds. To this end, we provide the first comprehensive study of regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are…
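The abstract measures overall system performance by regret. The Python sketch below computes a system-level pseudo-regret under one common formulation from this literature. That formulation is an assumption here, because the paper's exact definition is not reproduced in this summary. In it, each agent m has local arm means, the system values each arm by the average of those local means, and regret is averaged over agents. The function name `system_pseudo_regret` and its array layout are hypothetical.

```python
import numpy as np


def system_pseudo_regret(local_means: np.ndarray, actions: np.ndarray) -> float:
    """System pseudo-regret under an assumed multi-agent formulation.

    local_means: shape (M, K); local_means[m, k] is agent m's expected reward for arm k.
    actions: integer array of shape (T, M); actions[t, m] is the arm agent m pulls at round t.
    Assumption: the global value of arm k is the average of its local means across
    agents, and regret is summed over rounds and averaged over agents.
    """
    global_means = local_means.mean(axis=0)   # shape (K,): global value of each arm
    best = global_means.max()                  # value of the globally best arm
    num_agents = actions.shape[1]
    gaps = best - global_means[actions]        # shape (T, M): per-pull shortfall
    return float(gaps.sum() / num_agents)


# Two agents with heterogeneous local means: agent 1 locally prefers arm 1,
# but arm 0 is globally best (global means 0.5 vs 0.4).
local = np.array([[0.9, 0.2], [0.1, 0.6]])
acts = np.array([[0, 1]] * 3)  # agent 0 pulls arm 0, agent 1 pulls arm 1, for 3 rounds
print(system_pseudo_regret(local, acts))
```

The example shows why a system-level objective differs from each agent solving its own bandit. An agent that greedily follows its local means can still incur regret with respect to the global objective.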
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Optimization and Search Problems
