Experimental Design for Regret Minimization in Linear Bandits
Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

TL;DR
This paper introduces a new experimental design-based algorithm for regret minimization in linear and combinatorial bandits, outperforming optimism-based methods by balancing information gain and reward, with strong theoretical guarantees.
Contribution
The paper presents a novel experimental design approach that achieves state-of-the-art finite-time regret bounds, runs efficiently in the combinatorial semi-bandit setting, and also covers pure exploration.
Findings
Achieves regret bounds that scale with the Gaussian width of the action set
Runs efficiently in the combinatorial semi-bandit setting, using only a linear maximization oracle
Gives the first example in which optimism fails in the semi-bandit regime while the proposed algorithm succeeds
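To make the Gaussian width and the linear maximization oracle concrete, here is a minimal sketch, not the paper's algorithm. It assumes a hypothetical action set made of all k-subsets of d base items, encoded as 0/1 indicator vectors. For that set the oracle reduces to picking the k largest weights. The Gaussian width, E[max over x of <g, x>] with g drawn from a standard normal, is then estimated by Monte Carlo. The names `top_k_oracle` and `estimate_gaussian_width` are illustrative only.

```python
import random


def top_k_oracle(weights, k):
    """Linear maximization oracle for the set of all k-subsets of items.

    Returns the 0/1 indicator vector with exactly k ones that maximizes
    the inner product with `weights` (the k largest coordinates).
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(weights))]


def estimate_gaussian_width(oracle, d, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[max_x <g, x>] for g ~ N(0, I_d).

    The action set is accessed only through `oracle`, so its elements
    are never enumerated.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x = oracle(g)  # maximizer of <g, x> over the action set
        total += sum(gi * xi for gi, xi in zip(g, x))
    return total / n_samples


# Assumed toy instance: choose 3 of 20 items.
d, k = 20, 3
width = estimate_gaussian_width(lambda g: top_k_oracle(g, k), d)
print(f"estimated Gaussian width: {width:.3f}")
```

Because the action set is touched only through the oracle, the cost of each sample is one sort of d numbers, even though the number of k-subsets grows combinatorially in d.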
Abstract
In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms--which have been shown to be suboptimal in many cases--our approach carefully plans which action to take by balancing the tradeoff between information gain and reward, overcoming the failures of optimism. In addition, we leverage tools from the theory of suprema of empirical processes to obtain regret guarantees that scale with the Gaussian width of the action set, avoiding wasteful union bounds. We provide state-of-the-art finite-time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regimes. In the combinatorial semi-bandit setting, we show that our algorithm is computationally efficient and relies only on calls to a…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications
