Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures
Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters

TL;DR
This paper introduces improved algorithms for stochastic linear bandits that leverage novel tail bounds for adaptive martingale mixtures, resulting in tighter confidence sequences, better regret guarantees, and enhanced empirical performance.
Contribution
The paper develops a new tail bound for adaptive martingale mixtures and uses it to construct tighter confidence sequences for linear bandits, improving regret bounds and empirical results.
Findings
The confidence sequences are tighter than those of existing methods.
The resulting linear bandit algorithm achieves competitive worst-case regret guarantees.
The tighter confidence sequences improve performance on hyperparameter tuning tasks.
Abstract
We present improved algorithms with worst-case regret guarantees for the stochastic linear bandit problem. The widely used "optimism in the face of uncertainty" principle reduces a stochastic bandit problem to the construction of a confidence sequence for the unknown reward function. The performance of the resulting bandit algorithm depends on the size of the confidence sequence, with smaller confidence sets yielding better empirical performance and stronger regret guarantees. In this work, we use a novel tail bound for adaptive martingale mixtures to construct confidence sequences which are suitable for stochastic bandits. These confidence sequences allow for efficient action selection via convex programming. We prove that a linear bandit algorithm based on our confidence sequences is guaranteed to achieve competitive worst-case regret. We show that our confidence sequences are tighter…
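To make the "optimism in the face of uncertainty" reduction concrete, here is a minimal sketch of an optimistic linear bandit loop. It is not the paper's algorithm. It assumes a finite action set and a standard ellipsoidal confidence set centred on the ridge estimate, for which the optimistic action has a closed form, so no convex program is needed. The confidence radius is a fixed, hand-picked constant rather than one derived from the martingale-mixture tail bound. The function names and the parameters `radius`, `reg`, and `noise_std` are hypothetical.

```python
import numpy as np


def ucb_action(actions, V, b, radius):
    """Return the index of the optimistic action.

    Assumes the confidence set {theta : ||theta - theta_hat||_V <= radius},
    for which the largest plausible reward of action a is
    a @ theta_hat + radius * sqrt(a @ V^{-1} @ a).
    """
    theta_hat = np.linalg.solve(V, b)  # ridge estimate
    V_inv = np.linalg.inv(V)
    # Per-action width sqrt(a^T V^{-1} a), computed for all rows at once.
    widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
    return int(np.argmax(actions @ theta_hat + radius * widths))


def run_ofu(actions, theta_star, horizon, radius=1.0, reg=1.0,
            noise_std=0.1, seed=0):
    """Simulate the optimistic loop; return (cumulative regret, final estimate)."""
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V = reg * np.eye(d)  # regularised design matrix
    b = np.zeros(d)      # running sum of reward * action
    best = np.max(actions @ theta_star)
    regret = 0.0
    for _ in range(horizon):
        a = actions[ucb_action(actions, V, b, radius)]
        reward = a @ theta_star + noise_std * rng.normal()
        V += np.outer(a, a)
        b += reward * a
        regret += best - a @ theta_star
    return regret, np.linalg.solve(V, b)
```

In this sketch a smaller `radius` means a smaller confidence set and less exploration, which is the mechanism by which tighter confidence sequences translate into lower regret, provided the set still contains the true parameter.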
Taxonomy
Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Distributed Sensor Networks and Detection Algorithms
