Information Directed Sampling and Bandits with Heteroscedastic Noise

Johannes Kirschner; Andreas Krause

arXiv:1801.09667 · stat.ML · April 20, 2018 · COLT · 73 cites

TL;DR

This paper studies stochastic bandits with heteroscedastic noise, where the noise distribution may depend on the evaluation point. It shows that this dependence creates trade-offs between information and regret that UCB and Thompson Sampling do not take into account, and it develops algorithms that adapt to point-dependent noise levels.

Contribution

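The paper proposes a frequentist regret analysis framework, similar to the Bayesian framework of Russo and Van Roy (2014), together with an Information Directed Sampling approach tailored to heteroscedastic noise. It proves a high-probability regret bound for general, possibly randomized policies.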

Findings

01. New high-probability regret bounds for heteroscedastic noise

02. Algorithms outperform UCB and Thompson Sampling in heteroscedastic settings

03. Bounds recover known results in homoscedastic case

Abstract

In the stochastic bandit problem, the goal is to maximize an unknown function via a sequence of noisy evaluations. Typically, the observation noise is assumed to be independent of the evaluation point and to satisfy a tail bound uniformly on the domain; a restrictive assumption for many applications. In this work, we consider bandits with heteroscedastic noise, where we explicitly allow the noise distribution to depend on the evaluation point. We show that this leads to new trade-offs for information and regret, which are not taken into account by existing approaches like upper confidence bound algorithms (UCB) or Thompson Sampling. To address these shortcomings, we introduce a frequentist regret analysis framework, that is similar to the Bayesian framework of Russo and Van Roy (2014), and we prove a new high-probability regret bound for general, possibly randomized policies, which…
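
To make the setting concrete, the following is a minimal sketch of a bandit whose noise level depends on the evaluation point. It is an illustration only, not the paper's code: the linear reward model, the Gaussian noise, the parameter values, and the helper name `make_heteroscedastic_bandit` are all assumptions chosen for this example.

```python
import random
import statistics


def make_heteroscedastic_bandit(theta, arms, noise_std):
    """Return pull(i, rng) for a toy bandit (hypothetical helper).

    Assumptions for illustration: the mean reward of arm i is the inner
    product of arms[i] and theta, and the noise is Gaussian with a
    standard deviation noise_std[i] that depends on the chosen arm.
    """
    def pull(i, rng):
        mean = sum(a * t for a, t in zip(arms[i], theta))
        return mean + rng.gauss(0.0, noise_std[i])

    return pull


rng = random.Random(0)
theta = [1.0, 0.5]                       # unknown to the learner
arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
noise_std = [2.0, 0.1, 0.5]              # arm-dependent noise level

pull = make_heteroscedastic_bandit(theta, arms, noise_std)

# Draw repeated observations from each arm and compare their spread.
samples = {i: [pull(i, rng) for _ in range(2000)] for i in range(len(arms))}
for i, obs in samples.items():
    print(i, round(statistics.mean(obs), 2), round(statistics.pstdev(obs), 2))
```

In this toy instance the first arm has the highest mean reward but also by far the noisiest observations, while the second arm is cheap to estimate accurately. A policy that ignores the per-arm noise level treats both observations as equally informative, which is the kind of information and regret trade-off the abstract refers to.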

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms