Linear Bandits on Ellipsoids: Minimax Optimal Algorithms

Raymond Zhang; Hedi Hadiji; Richard Combes

arXiv:2502.17175·stat.ML·February 25, 2025

TL;DR

This paper introduces a minimax optimal algorithm for linear stochastic bandits whose action set is an ellipsoid. The algorithm combines a novel sequential procedure for estimating the norm of the unknown parameter with an explore-and-commit strategy informed by that estimate, and it is computationally efficient.

Contribution

The paper presents the first known minimax optimal algorithm for linear bandits with ellipsoidal action sets. The approach is non-classical: the algorithm is neither optimistic nor a sampling algorithm, and it remains computationally efficient.

Findings

01

Achieves regret matching the minimax lower bound up to a multiplicative universal constant.

02

Algorithm is computationally efficient with polynomial time complexity.

03

Demonstrates local asymptotic minimax optimality.

Abstract

We consider linear stochastic bandits where the set of actions is an ellipsoid. We provide the first known minimax optimal algorithm for this problem. We first derive a novel information-theoretic lower bound on the regret of any algorithm, which must be at least $\Omega(\min(d \sigma \sqrt{T} + d \|\theta\|_{A}, \|\theta\|_{A} T))$ where $d$ is the dimension, $T$ the time horizon, $\sigma^{2}$ the noise variance, $A$ a matrix defining the set of actions and $\theta$ the vector of unknown parameters. We then provide an algorithm whose regret matches this bound up to a multiplicative universal constant. The algorithm is non-classical in the sense that it is not optimistic, and it is not a sampling algorithm. The main idea is to combine a novel sequential procedure to estimate $\|\theta\|$, followed by an explore-and-commit strategy informed by this estimate. The algorithm is highly…
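To make the two regimes of the lower bound concrete, the following minimal sketch evaluates the expression inside the $\Omega(\cdot)$ with all universal constants omitted. It rests on two assumptions that the abstract does not spell out: that $\|\theta\|_{A}$ denotes the weighted norm $\sqrt{\theta^{\top} A \theta}$, and that the first term scales as $d \sigma \sqrt{T}$, the usual minimax rate for linear bandits. The function names `weighted_norm` and `lower_bound_rate` are hypothetical and chosen for illustration only.

```python
import numpy as np


def weighted_norm(theta, A):
    """Return sqrt(theta^T A theta), the ASSUMED meaning of ||theta||_A."""
    theta = np.asarray(theta, dtype=float)
    A = np.asarray(A, dtype=float)
    return float(np.sqrt(theta @ A @ theta))


def lower_bound_rate(theta, A, sigma, T):
    """Evaluate min(d*sigma*sqrt(T) + d*||theta||_A, ||theta||_A * T).

    Universal constants are omitted, so this gives only the order of
    magnitude of the bound, not an exact regret value.
    """
    d = len(theta)
    norm_a = weighted_norm(theta, A)
    # Regime where the learner has enough rounds to learn theta.
    learning_term = d * sigma * np.sqrt(T) + d * norm_a
    # Regime where the horizon is too short: regret grows linearly in T.
    trivial_term = norm_a * T
    return float(min(learning_term, trivial_term))


if __name__ == "__main__":
    theta = [3.0, 4.0]
    A = np.eye(2)  # with A = I the ellipsoid is the unit ball
    for T in (1, 100, 10_000):
        print(T, lower_bound_rate(theta, A, sigma=1.0, T=T))
```

Under these assumptions, short horizons fall in the linear regime $\|\theta\|_{A} T$, while longer horizons fall in the $d \sigma \sqrt{T} + d \|\theta\|_{A}$ regime.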

Taxonomy

Topics: Advanced Bandit Algorithms Research

Methods: Sparse Evolutionary Training