Linear Bandits on Ellipsoids: Minimax Optimal Algorithms
Raymond Zhang, Hedi Hadiji, Richard Combes

TL;DR
This paper introduces a minimax optimal algorithm for linear stochastic bandits whose action set is an ellipsoid. The algorithm combines a novel sequential estimation procedure with an explore-and-commit strategy, and it is computationally efficient.
Contribution
The paper presents the first minimax optimal algorithm for linear bandits with ellipsoidal action sets. Its approach is non-classical (it is neither optimistic nor sampling-based) and computationally efficient.
Findings
Achieves regret matching the minimax lower bound up to a universal multiplicative constant.
Runs in polynomial time, making the algorithm computationally efficient.
Establishes local asymptotic minimax optimality, a stronger guarantee than minimax optimality.
Abstract
We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$, where $d$ is the dimension, $T$ the time horizon, $\sigma^2$ the noise variance, $A$ a matrix defining the set of actions and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound up to a multiplicative universal constant. The algorithm is non-classical in the sense that it is not optimistic, and it is not a sampling algorithm. The main idea is to combine a novel sequential procedure that estimates $\|\theta\|$ with an explore-and-commit strategy informed by this estimate. The algorithm is highly…
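The abstract describes the algorithm only at a high level, so the sketch below is not the authors' procedure. It is a minimal, generic explore-then-commit baseline written under assumptions of our own: the action set is taken to be {x : x^T A^{-1} x <= 1} with A positive definite (so the best action for θ is Aθ/‖θ‖_A, with value ‖θ‖_A = sqrt(θ^T A θ)), and each reward is ⟨θ, x⟩ plus Gaussian noise of standard deviation σ. The exploration length `n_explore` is a hypothetical fixed parameter; in the paper, exploration is instead informed by a sequential estimate of ‖θ‖. The helper `lower_bound_scale` evaluates min(dσ√T + d‖θ‖_A, ‖θ‖_A T), the order of the stated lower bound with constants dropped. All function names are illustrative.

```python
import numpy as np


def lower_bound_scale(d, T, sigma, theta_norm_A):
    """Order of the regret lower bound min(d*sigma*sqrt(T) + d*||theta||_A, ||theta||_A*T),
    with universal constants dropped."""
    return min(d * sigma * np.sqrt(T) + d * theta_norm_A, theta_norm_A * T)


def best_action(theta, A):
    """Maximiser of <theta, x> over the ASSUMED ellipsoid {x : x^T A^{-1} x <= 1}.

    The maximum value is ||theta||_A = sqrt(theta^T A theta), attained at A theta / ||theta||_A.
    """
    norm_A = np.sqrt(theta @ A @ theta)
    if norm_A == 0.0:
        return np.zeros_like(theta)
    return A @ theta / norm_A


def explore_then_commit(theta, A, sigma, T, n_explore, rng):
    """Generic explore-then-commit on an ellipsoid (hypothetical; NOT the paper's algorithm).

    Exploration: cycle through the +/- endpoints of the ellipsoid's principal axes.
    Commit: play the greedy action for the least-squares estimate of theta.
    Returns the cumulative pseudo-regret over T rounds.
    """
    if n_explore < 1:
        raise ValueError("n_explore must be at least 1")
    eigvals, eigvecs = np.linalg.eigh(A)
    # +/- sqrt(lambda_i) * u_i satisfies x^T A^{-1} x = 1, i.e. lies on the boundary.
    axes = [s * np.sqrt(lam) * u
            for lam, u in zip(eigvals, eigvecs.T) for s in (1.0, -1.0)]
    opt_value = np.sqrt(theta @ A @ theta)  # ||theta||_A, the best achievable mean reward
    X, y = [], []
    regret = 0.0
    for t in range(min(n_explore, T)):
        x = axes[t % len(axes)]
        X.append(x)
        y.append(theta @ x + sigma * rng.standard_normal())  # noisy linear reward
        regret += opt_value - theta @ x
    if T > n_explore:
        # Least-squares estimate of theta from the exploration data.
        theta_hat, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        x_commit = best_action(theta_hat, A)
        regret += (T - n_explore) * (opt_value - theta @ x_commit)
    return regret
```

For example, `explore_then_commit(np.array([1.0, 0.5]), np.diag([1.0, 4.0]), 1.0, 1000, 100, np.random.default_rng(0))` returns one simulated regret value that can be compared against `lower_bound_scale(2, 1000, 1.0, np.sqrt(2.0))`. A fixed exploration length is the simplest choice; tuning it requires knowing the scale of ‖θ‖, which is exactly the quantity the abstract says the paper estimates sequentially.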
Taxonomy
Topics: Advanced Bandit Algorithms Research
Methods: Sparse Evolutionary Training
