Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan J. Foster; Alexander Rakhlin; David Simchi-Levi; Yunzong Xu

arXiv:2010.03104 · cs.LG · October 8, 2020 · 21 cites

Open Access

TL;DR

This paper introduces a framework for understanding and achieving instance-dependent regret bounds in contextual bandits and reinforcement learning, providing new algorithms and complexity measures that adapt to problem difficulty.

Contribution

It develops a family of complexity measures necessary and sufficient for instance-dependent regret bounds and proposes new adaptive, oracle-efficient algorithms for contextual bandits and reinforcement learning.

Findings

01

Algorithms adapt to problem difficulty and often outperform existing methods.

02

Theoretical complexity measures characterize when instance-dependent bounds are achievable.

03

Empirical results show superior performance on challenging exploration tasks.

Abstract

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. Finally, we provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and…
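To make the notion of an "easy" instance concrete, the following is a minimal sketch for the classical multi-armed bandit setting only, not the paper's contextual bandit algorithms. It assumes the standard textbook scalings, with constants dropped: a gap-dependent regret of order sum over suboptimal arms of log(T)/gap, and a worst-case regret of order sqrt(K·T). The function names and the example arm means are hypothetical and chosen purely for illustration.

```python
import math


def gaps(means):
    """Gap of each arm: best mean reward minus that arm's mean reward."""
    best = max(means)
    return [best - m for m in means]


def gap_dependent_scale(means, horizon):
    """Textbook gap-dependent scale: sum over suboptimal arms of log(T) / gap.

    Constants are dropped; this is an order-of-magnitude illustration only.
    """
    return sum(math.log(horizon) / g for g in gaps(means) if g > 0)


def minimax_scale(num_arms, horizon):
    """Textbook worst-case scale sqrt(K * T), constants dropped."""
    return math.sqrt(num_arms * horizon)


# Hypothetical instances: one with a large gap, one with nearly tied arms.
easy = [0.9, 0.5, 0.4]
hard = [0.9, 0.899, 0.898]
T = 10_000

for name, means in [("easy", easy), ("hard", hard)]:
    instance = gap_dependent_scale(means, T)
    worst = minimax_scale(len(means), T)
    # An adaptive guarantee is only useful if it is never worse than the
    # worst-case rate, so the smaller of the two scales is the relevant one.
    print(name, round(instance, 1), round(worst, 1), round(min(instance, worst), 1))
```

On the large-gap instance the gap-dependent scale is far below the worst-case scale, while on the nearly tied instance it is far above it. This is the sense in which an algorithm should adapt to the gap when possible while still attaining the minimax rate in the worst case.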

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques