Optimal Batched Linear Bandits
Xuanfei Ren, Tianyuan Jin, Pan Xu

TL;DR
The paper introduces E$^4$, a batched linear bandit algorithm that achieves both minimax-optimal and asymptotically optimal regret with optimal batch complexity, and outperforms baseline algorithms in experiments.
Contribution
E$^4$ is the first algorithm to simultaneously attain minimax and asymptotic optimality in regret with optimal batch complexities in linear bandits.
Findings
Achieves finite-time minimax optimal regret with O(log log T) batches.
Achieves asymptotically optimal regret with only 3 batches as T→∞.
Outperforms baseline algorithms in experiments on challenging instances.
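To give a feel for how small an O(log log T) batch count is, the sketch below builds a batch schedule from the exponent grid t_l = T^(1 - 2^(-l)), a construction often used in the batched-bandit literature. This page does not state which grid E$^4$ uses, so the grid itself and the helper name `batch_grid` are assumptions made purely for illustration.

```python
import math


def batch_grid(T: int) -> list[int]:
    """Return batch end times for horizon T (hypothetical helper).

    Assumption: uses the grid t_l = ceil(T^(1 - 2^-l)), which is NOT
    confirmed to be the schedule used by E^4. The last batch always ends at T.
    """
    if T < 2:
        raise ValueError("T must be at least 2")
    # Upper bound on the number of batches: ceil(log2(log2(T))) + 1.
    max_batches = max(1, math.ceil(math.log2(math.log2(T)))) + 1
    ends: list[int] = []
    for l in range(1, max_batches):
        t = min(T, math.ceil(T ** (1 - 2.0 ** (-l))))
        # Keep only strictly increasing end points.
        if not ends or t > ends[-1]:
            ends.append(t)
    # Final batch runs to the end of the horizon.
    if ends[-1] < T:
        ends.append(T)
    return ends


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**12):
        grid = batch_grid(horizon)
        print(horizon, len(grid), grid)
```

Under this assumed grid, raising the horizon from 10^6 to 10^12 rounds adds at most one batch, which is the doubly logarithmic growth that the first finding refers to.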
Abstract
We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret with only O(log log T) batches, and the asymptotically optimal regret with only 3 batches as T→∞, where T is the time horizon. We further prove a lower bound on the batch complexity of linear contextual bandits showing that any asymptotically optimal algorithm must require at least 3 batches in expectation as T→∞, which indicates E$^4$ achieves the asymptotic optimality in regret and batch complexity simultaneously. To the best of our knowledge, E$^4$ is the first algorithm for linear bandits that simultaneously achieves the minimax and asymptotic optimality in regret with the corresponding optimal batch…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Machine Learning and Algorithms
