Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective
Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, and Yunzong Xu

TL;DR
This paper introduces a framework for understanding and achieving instance-dependent regret bounds in contextual bandits and reinforcement learning, providing new algorithms and complexity measures that adapt to problem difficulty.
Contribution
It develops a family of complexity measures necessary and sufficient for instance-dependent regret bounds and proposes new adaptive, oracle-efficient algorithms for contextual bandits and reinforcement learning.
Findings
Algorithms adapt to problem difficulty and often outperform existing methods.
Theoretical complexity measures characterize when instance-dependent bounds are achievable.
Empirical results show superior performance on challenging exploration tasks.
Abstract
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. Finally, we provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and…
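To make the abstract's opening point about "easy" multi-armed bandit problems concrete, here is a minimal sketch of the classical UCB1 algorithm on Bernoulli arms. It is not one of the paper's algorithms, and it does not touch the contextual setting. The function name `ucb1_regret`, the arm means, the horizon, and the seed are all illustrative assumptions.

```python
import math
import random


def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms and return the cumulative pseudo-regret.

    `means` holds the (hypothetical) true success probability of each arm.
    Pseudo-regret adds up the gap between the best arm's mean and the mean
    of the arm actually pulled, so it never depends on the sampled rewards
    directly, only on which arms were chosen.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]

    return regret


if __name__ == "__main__":
    # Two illustrative instances that differ only in the gap between arms.
    for means in ([0.9, 0.5], [0.9, 0.85]):
        gap = max(means) - min(means)
        print(f"gap={gap:.2f}  regret={ucb1_regret(means, horizon=5000):.1f}")
```

Running the script for several gaps and horizons gives a feel for how the regret of a gap-adaptive method depends on the instance rather than only on the horizon. Whether a comparable dependence is achievable with rich policy classes is the question the abstract raises for contextual bandits.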
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques
