Adaptation to the Range in $K$-Armed Bandits

Hédi Hadiji; Gilles Stoltz

arXiv:2006.03378 · math.ST · June 16, 2022 · J. Mach. Learn. Res.

TL;DR

This paper investigates stochastic bandit problems whose reward distributions are bounded on a range that is not known in advance. It reveals a fundamental trade-off between distribution-dependent and distribution-free regret bounds and proposes a strategy that achieves the rates this trade-off allows.

Contribution

It introduces a new trade-off in regret bounds for bandits with unknown ranges and provides a strategy that achieves the optimal rates within this trade-off.

Findings

01. A new trade-off between distribution-dependent and distribution-free regret bounds.

02. A strategy achieving the regret rates indicated by this trade-off.

03. Fundamental limits on learning the range in bandit problems.

Abstract

We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m, M]$. We do not assume that the range $[m, M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents one from simultaneously achieving the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.
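
For context, the known-range setting that the abstract contrasts against can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' strategy: the function `ucb_known_range` and its parameters are hypothetical, rewards are assumed to be rescaled Bernoulli draws on $[m, M]$, and the index is a standard UCB1-style rule whose exploration bonus is multiplied by $M - m$. That multiplication is exactly where knowledge of the range enters, and it is the information the paper assumes is unavailable. Using standard notation assumed here (not taken from the paper), the sketch tracks a single-run version of the pseudo-regret

$$R_T = T\mu^\star - \sum_{a=1}^{K} \mu_a\,\mathbb{E}[N_a(T)] = \sum_{a=1}^{K} \Delta_a\,\mathbb{E}[N_a(T)],$$

where $\mu^\star$ is the best mean, $\Delta_a = \mu^\star - \mu_a$ is the gap of arm $a$, and $N_a(T)$ counts the pulls of arm $a$ up to round $T$.

```python
import math
import random


def ucb_known_range(means, m, M, T, seed=0):
    """Hypothetical UCB1-style baseline that is told the range [m, M] in advance.

    Illustration only, not the paper's strategy. Arm a pays
    m + (M - m) * Bernoulli(p_a), with p_a chosen so that its mean is means[a].
    The exploration bonus is scaled by the width M - m, which is the
    quantity the paper assumes to be unknown.
    Returns (pull counts per arm, realized pseudo-regret sum of gaps).
    """
    if M <= m:
        raise ValueError("the range must satisfy M > m")
    if not all(m <= mu <= M for mu in means):
        raise ValueError("every mean must lie in [m, M]")

    rng = random.Random(seed)  # seeded for reproducibility
    K = len(means)
    width = M - m
    counts = [0] * K
    sums = [0.0] * K
    best = max(means)
    pseudo_regret = 0.0

    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # pull each arm once so every empirical mean is defined
        else:
            # Empirical mean plus a confidence bonus scaled by the known width.
            arm = max(
                range(K),
                key=lambda a: sums[a] / counts[a]
                + width * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Success probability of the rescaled Bernoulli with mean means[arm].
        p = (means[arm] - m) / width
        reward = m + width * (1.0 if rng.random() < p else 0.0)
        counts[arm] += 1
        sums[arm] += reward
        pseudo_regret += best - means[arm]  # add the gap of the pulled arm

    return counts, pseudo_regret
```

For example, `ucb_known_range([0.0, 1.0, 2.0], m=-1.0, M=3.0, T=2000)` concentrates most pulls on the third arm. Because the bonus is multiplied by `M - m`, a strategy that had to guess this width would explore too little with a guess that is too small and too much with one that is too large; the paper shows that learning the range instead comes at an unavoidable cost, expressed by the new trade-off.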

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Stochastic Gradient Optimization Techniques