Refined Lower Bounds for Adversarial Bandits
Sébastien Gerchinovitz (IMT, AOC), Tor Lattimore

TL;DR
This paper establishes new lower bounds on adversarial bandit regret, showing that recent upper bounds are nearly tight and proving fundamental impossibility results that differentiate bandit from full-information settings.
Contribution
It introduces refined lower bounds for adversarial bandit regret and proves key impossibility results that highlight limitations of bandit algorithms.
Findings
Recent upper bounds are close to tight.
The existence of a single arm that is optimal in every round does not improve the worst-case regret.
Regret cannot scale with the effective range of losses.
Abstract
We provide new lower bounds on the regret that must be suffered by adversarial bandit algorithms. The new results show that recent upper bounds that either (a) hold with high probability, (b) depend on the total loss of the best arm, or (c) depend on the quadratic variation of the losses are close to tight. In addition, we prove two impossibility results. First, the existence of a single arm that is optimal in every round cannot improve the regret in the worst case. Second, the regret cannot scale with the effective range of the losses. In contrast, both results are possible in the full-information setting.
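The quantities named in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function name `bandit_quantities` and the exact formulas are assumptions based on the usual textbook definitions (regret against the best fixed arm, quadratic variation as the summed squared distance of each round's loss vector from the mean loss vector, and effective range as the largest within-round gap between arms). The paper's own definitions may differ in normalization.

```python
def bandit_quantities(losses, actions):
    """Compute illustrative quantities for one run of a bandit game.

    losses[t][i] is the loss of arm i in round t (assumed to lie in [0, 1]);
    actions[t] is the index of the arm the learner played in round t.
    All definitions here are standard-textbook assumptions, not the paper's.
    """
    T = len(losses)
    K = len(losses[0])

    # Cumulative loss of each fixed arm over all rounds.
    arm_totals = [sum(losses[t][i] for t in range(T)) for i in range(K)]
    best_arm_loss = min(arm_totals)

    # Loss actually suffered by the learner.
    learner_loss = sum(losses[t][actions[t]] for t in range(T))

    # Regret against the best fixed arm in hindsight.
    regret = learner_loss - best_arm_loss

    # Quadratic variation: squared distance of each loss vector to the mean vector.
    mean = [arm_totals[i] / T for i in range(K)]
    quadratic_variation = sum(
        (losses[t][i] - mean[i]) ** 2 for t in range(T) for i in range(K)
    )

    # Effective range: largest gap between arms within a single round.
    effective_range = max(max(row) - min(row) for row in losses)

    # Indices of arms that are optimal in every single round (may be empty).
    always_optimal = [
        i for i in range(K) if all(losses[t][i] == min(losses[t]) for t in range(T))
    ]

    return {
        "regret": regret,
        "best_arm_loss": best_arm_loss,
        "quadratic_variation": quadratic_variation,
        "effective_range": effective_range,
        "always_optimal": always_optimal,
    }


# Hypothetical example: losses vary a lot across rounds, but the two arms
# never differ by more than 0.1 within a round, and arm 0 is always optimal.
example_losses = [[0.5, 0.6], [0.9, 1.0], [0.1, 0.2]]
example_actions = [1, 0, 1]
result = bandit_quantities(example_losses, example_actions)
```

In this toy instance the effective range is small (about 0.1) even though the losses span most of [0, 1], and arm 0 is optimal in every round. These are the two kinds of structure that, per the abstract, help in the full-information setting but cannot improve worst-case regret under bandit feedback.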
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
