Information Directed Sampling and Bandits with Heteroscedastic Noise
Johannes Kirschner, Andreas Krause

TL;DR
This paper introduces a new framework for bandit problems with heteroscedastic noise, developing algorithms that adapt to varying noise levels and outperform traditional methods like UCB and Thompson Sampling in such settings.
Contribution
It proposes a frequentist regret analysis and a novel Information Directed Sampling approach tailored for heteroscedastic noise, with theoretical guarantees and practical algorithms.
Findings
New high-probability regret bounds for heteroscedastic noise
Algorithms outperform UCB and Thompson Sampling in heteroscedastic settings
Bounds recover known results in homoscedastic case
Abstract
In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy evaluations. Typically, the observation noise is assumed to be independent of the evaluation point and to satisfy a tail bound uniformly on the domain, a restrictive assumption for many applications. In this work, we consider bandits with heteroscedastic noise, where we explicitly allow the noise distribution to depend on the evaluation point. We show that this leads to new trade-offs for information and regret, which are not taken into account by existing approaches like upper confidence bound algorithms (UCB) or Thompson Sampling. To address these shortcomings, we introduce a frequentist regret analysis framework that is similar to the Bayesian framework of Russo and Van Roy (2014), and we prove a new high-probability regret bound for general, possibly randomized policies, which…
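The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it only models a finite-armed bandit whose noise level depends on the chosen arm, and shows why a noisy arm yields less information per evaluation. All names and numbers (`MEANS`, `NOISE_STD`, `pull`, `standard_error`, `pulls_needed`) are hypothetical, and Gaussian noise is assumed purely for illustration.

```python
import numpy as np

# Hypothetical 3-armed instance; values are illustrative, not from the paper.
MEANS = np.array([0.5, 0.45, 0.1])      # unknown expected reward of each arm
NOISE_STD = np.array([1.0, 0.1, 0.1])   # noise level depends on the arm

def pull(arm, rng):
    """Return one noisy evaluation of the chosen arm (Gaussian noise assumed)."""
    return MEANS[arm] + NOISE_STD[arm] * rng.standard_normal()

def standard_error(arm, n_pulls):
    """Standard deviation of the sample mean after n_pulls independent pulls."""
    return NOISE_STD[arm] / np.sqrt(n_pulls)

def pulls_needed(arm, target_error):
    """Number of pulls needed to bring the standard error down to target_error."""
    return int(np.ceil((NOISE_STD[arm] / target_error) ** 2))

rng = np.random.default_rng(0)
samples = {a: np.array([pull(a, rng) for _ in range(2000)]) for a in range(3)}
empirical_std = {a: float(samples[a].std()) for a in samples}

for a in range(3):
    print(f"arm {a}: mean={MEANS[a]:.2f}, "
          f"empirical noise std={empirical_std[a]:.3f}, "
          f"pulls for 0.05 error={pulls_needed(a, 0.05)}")
```

In this toy instance, arm 0 has the highest mean but is ten times noisier than arm 1, so estimating its mean to a given accuracy takes roughly a hundred times as many evaluations. A policy that ignores arm-dependent noise cannot weigh this cost of information against the small regret of pulling arm 1, which is the kind of trade-off the abstract refers to.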
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
