Adaptation to the Range in $K$-Armed Bandits
Hédi Hadiji, Gilles Stoltz

TL;DR
This paper investigates stochastic bandit problems with unknown bounded distributions, revealing a fundamental trade-off between distribution-dependent and distribution-free regret bounds, and proposes a strategy that balances these rates.
Contribution
It introduces a new trade-off in regret bounds for bandits with unknown ranges and provides a strategy that achieves the optimal rates within this trade-off.
Findings
A new trade-off between distribution-dependent and distribution-free regret bounds.
A strategy achieving the indicated regret rates.
Fundamental limits on learning the range in bandit problems.
Abstract
We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m, M]$. We do not assume that the range $[m, M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents simultaneously achieving the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.
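For readers less familiar with the terminology, the two kinds of regret bounds contrasted in the abstract can be sketched as follows. The notation is the standard one for stochastic bandits and is an assumption here, not taken from this page: $\nu = (\nu_1, \dots, \nu_K)$ is the bandit problem, $\mu_a$ the mean of arm $a$, $\mu^\star$ the largest mean, $A_t$ the arm pulled at round $t$, and $\mathcal{D}_{m,M}$ the set of problems whose arms are all supported on $[m, M]$. The constants $C$ and $C(\nu)$ are left unspecified.

```latex
% Expected regret after T rounds on the problem \nu (standard definition, assumed)
R_T(\nu) = T \mu^\star - \mathbb{E}\Big[ \sum_{t=1}^{T} \mu_{A_t} \Big],
\qquad \mu^\star = \max_{1 \le a \le K} \mu_a .

% Distribution-free bound: holds uniformly over all problems with range [m, M];
% the constant C may depend on K but not on \nu or T
\sup_{\nu \in \mathcal{D}_{m,M}} R_T(\nu) \le C \, (M - m) \, \sqrt{T} .

% Distribution-dependent bound: holds for each fixed problem \nu,
% with a constant C(\nu) that depends on \nu (e.g. through the gaps \mu^\star - \mu_a)
R_T(\nu) \le C(\nu) \, \ln T \quad \text{for all sufficiently large } T .
```

When the range is known, both types of bounds are typically available for a single strategy. The abstract states that when the range is unknown, this is no longer possible.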
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Stochastic Gradient Optimization Techniques
