Experimental Design for Regret Minimization in Linear Bandits

Andrew Wagenmaker; Julian Katz-Samuels; Kevin Jamieson

arXiv:2011.00576 · cs.LG · March 2, 2021 · 1 cites

TL;DR

This paper introduces a new experimental design-based algorithm for regret minimization in linear and combinatorial bandits, outperforming optimism-based methods by balancing information gain and reward, with strong theoretical guarantees.

Contribution

The paper presents a novel experimental design approach that achieves state-of-the-art regret bounds and applies efficiently in semi-bandit settings, including pure exploration.

Findings

01 Achieves regret bounds that scale with the Gaussian width of the action set

02 Runs efficiently in the combinatorial semi-bandit setting using a linear maximization oracle

03 Gives the first example in which optimism fails in the semi-bandit regime while the proposed algorithm succeeds

Abstract

In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms, which have been shown to be suboptimal in many cases, our approach carefully plans which action to take by balancing the tradeoff between information gain and reward, overcoming the failures of optimism. In addition, we leverage tools from the theory of suprema of empirical processes to obtain regret guarantees that scale with the Gaussian width of the action set, avoiding wasteful union bounds. We provide state-of-the-art finite-time regret guarantees and show that our algorithm can be applied in both the bandit and semi-bandit feedback regimes. In the combinatorial semi-bandit setting, we show that our algorithm is computationally efficient and relies only on calls to a…
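To make the "Gaussian width" in the abstract concrete, here is a minimal sketch that estimates the textbook quantity E[max over x in X of ⟨g, x⟩], with g a standard Gaussian vector, by Monte Carlo. It is not the paper's algorithm, and the paper's regret bounds may use a normalized or otherwise modified variant of this quantity. The function name `gaussian_width`, the random action set, and the sample count are all assumptions made for illustration.

```python
import numpy as np


def gaussian_width(actions, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max_x <g, x>] with g ~ N(0, I).

    `actions` is an (n_actions, dim) array; this helper is hypothetical
    and only illustrates the standard definition of Gaussian width.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, actions.shape[1]))
    # For each Gaussian draw, take the largest inner product over all actions,
    # then average those maxima across draws.
    return float((g @ actions.T).max(axis=1).mean())


# Hypothetical action set: 200 unit-norm arms in 5 dimensions.
rng = np.random.default_rng(1)
arms = rng.standard_normal((200, 5))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

width = gaussian_width(arms)
# A union bound over the arms gives a scale of sqrt(2 log |X|).
union_bound = np.sqrt(2 * np.log(len(arms)))
# For unit-norm arms, <g, x> <= ||g||, and E||g|| <= sqrt(dim).
dim_bound = np.sqrt(arms.shape[1])
print(f"width ~ {width:.2f}, sqrt(2 log|X|) = {union_bound:.2f}, sqrt(d) = {dim_bound:.2f}")
```

For unit-norm arms the width is bounded by both the union-bound scale and the square root of the dimension. This is one way to read the abstract's remark about avoiding wasteful union bounds: when the action set is large but geometrically simple, the width can be much smaller than a count-based bound suggests.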

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Auction Theory and Applications