Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures

Hamish Flynn; David Reeb; Melih Kandemir; Jan Peters

arXiv:2309.14298·stat.ML·September 6, 2024

TL;DR

This paper introduces improved algorithms for stochastic linear bandits that leverage novel tail bounds for adaptive martingale mixtures, resulting in tighter confidence sequences, better regret guarantees, and enhanced empirical performance.

Contribution

The paper develops a new tail bound for adaptive martingale mixtures and uses it to construct tighter confidence sequences for linear bandits, improving regret bounds and empirical results.

Findings

01. The confidence sequences are tighter than those of existing methods.

02. The resulting linear bandit algorithm achieves competitive worst-case regret guarantees.

03. Performance improves on hyperparameter tuning tasks.

Abstract

We present improved algorithms with worst-case regret guarantees for the stochastic linear bandit problem. The widely used "optimism in the face of uncertainty" principle reduces a stochastic bandit problem to the construction of a confidence sequence for the unknown reward function. The performance of the resulting bandit algorithm depends on the size of the confidence sequence, with smaller confidence sets yielding better empirical performance and stronger regret guarantees. In this work, we use a novel tail bound for adaptive martingale mixtures to construct confidence sequences which are suitable for stochastic bandits. These confidence sequences allow for efficient action selection via convex programming. We prove that a linear bandit algorithm based on our confidence sequences is guaranteed to achieve competitive worst-case regret. We show that our confidence sequences are tighter…
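To make the "optimism in the face of uncertainty" reduction concrete, here is a minimal Python sketch of an optimistic linear bandit over a finite action set. It is not the paper's method: the confidence radius below is the standard self-normalized ellipsoidal bound of Abbasi-Yadkori et al. (2011), swapped in as a familiar stand-in for the martingale-mixture bound, and a finite action set is assumed so that the optimistic action can be found by enumeration rather than by convex programming. All function and parameter names (`confidence_radius`, `optimistic_action`, `run_bandit`, `lam`, `noise_sd`, `theta_bound`) are hypothetical.

```python
import numpy as np


def confidence_radius(V, lam, noise_sd, theta_bound, delta):
    """Radius of the ellipsoid {theta : ||theta - theta_hat||_V <= radius}.

    Assumption: this is the standard self-normalized bound, not the
    martingale-mixture bound from the paper.
    """
    d = V.shape[0]
    _, logdet = np.linalg.slogdet(V)
    log_term = logdet - d * np.log(lam) + 2.0 * np.log(1.0 / delta)
    return noise_sd * np.sqrt(log_term) + np.sqrt(lam) * theta_bound


def optimistic_action(actions, V, b, radius):
    """Index of the action with the largest upper confidence bound."""
    theta_hat = np.linalg.solve(V, b)        # regularized least-squares estimate
    V_inv_aT = np.linalg.solve(V, actions.T)  # shape (d, K)
    # widths[k] = sqrt(a_k^T V^{-1} a_k)
    widths = np.sqrt(np.einsum("kd,dk->k", actions, V_inv_aT))
    ucb = actions @ theta_hat + radius * widths
    return int(np.argmax(ucb))


def run_bandit(theta_star, actions, horizon, lam=1.0, noise_sd=0.1,
               delta=0.05, seed=0):
    """Simulate the optimistic algorithm and return its cumulative regret."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    V = lam * np.eye(d)   # regularized design matrix
    b = np.zeros(d)       # sum of reward-weighted actions
    theta_bound = np.linalg.norm(theta_star)  # assumed known norm bound
    best = np.max(actions @ theta_star)
    regret = 0.0
    for _ in range(horizon):
        radius = confidence_radius(V, lam, noise_sd, theta_bound, delta)
        a = actions[optimistic_action(actions, V, b, radius)]
        reward = a @ theta_star + noise_sd * rng.normal()
        V += np.outer(a, a)
        b += reward * a
        regret += best - a @ theta_star
    return regret
```

The only place the confidence sequence enters is `radius`: a smaller valid radius shrinks the exploration bonus on every action, which is why tighter confidence sets translate into less over-exploration and lower regret.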

Videos

Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Distributed Sensor Networks and Detection Algorithms