When is Agnostic Reinforcement Learning Statistically Tractable?
Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro

TL;DR
This paper investigates the sample complexity of agnostic PAC reinforcement learning. It introduces the spanning capacity as a key complexity measure, reveals a fundamental separation between the generative and online access models, and proposes a new algorithm for statistically efficient online learning.
Contribution
It introduces the spanning capacity, a new complexity measure for agnostic RL that depends only on the policy class. It then analyzes what this measure implies for learnability under different access models and proposes a novel algorithm called POPLER.
Findings
Spanning capacity characterizes PAC learnability with a generative model.
There exist policy classes with bounded spanning capacity that nevertheless require a superpolynomial number of samples to learn online.
The POPLER algorithm enables statistically efficient online RL under certain structural conditions.
Abstract
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Π, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an ε-suboptimal policy with respect to Π? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Π and is independent of the MDP dynamics. With a generative model, we show that for any policy class Π, bounded spanning capacity characterizes PAC learnability. However, for online RL, the situation is more subtle. We show there exists a policy class with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between…
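The abstract describes the spanning capacity only as a quantity that depends on Π alone. As a rough illustration, the following brute-force sketch assumes one reading of the definition: the largest number of distinct state-action pairs (s_h, a_h) at any single layer h that policies in Π can jointly reach, maximized over all deterministic MDPs. This definition is our assumption and is not stated in the summary above. The sketch further assumes a tiny finite state set shared across layers and stationary policies given as Python dicts. The function names `reachable_pairs` and `spanning_capacity` are hypothetical. Enumeration is exponential, so this is usable only at toy sizes.

```python
from itertools import product


def reachable_pairs(policies, s1, transitions, horizon):
    """For one deterministic MDP, return, for each layer h, the set of
    (state, action) pairs visited by at least one policy in the class.

    policies: list of dicts mapping state -> action (hypothetical encoding).
    transitions: list of horizon - 1 dicts mapping (state, action) -> next state.
    """
    layers = [set() for _ in range(horizon)]
    for pi in policies:
        s = s1
        for h in range(horizon):
            a = pi[s]
            layers[h].add((s, a))
            if h < horizon - 1:
                s = transitions[h][(s, a)]  # deterministic next state
    return layers


def spanning_capacity(policies, states, actions, horizon):
    """Brute force under the ASSUMED definition: max over layers h and over
    every deterministic MDP on `states` of the number of distinct
    (s_h, a_h) pairs reachable by some policy in the class."""
    pairs = [(s, a) for s in states for a in actions]
    # Every deterministic transition function for a single layer.
    one_layer = [dict(zip(pairs, nxt))
                 for nxt in product(states, repeat=len(pairs))]
    best = 0
    for s1 in states:  # worst-case initial state
        # Worst-case (possibly layer-dependent) dynamics.
        for transitions in product(one_layer, repeat=horizon - 1):
            layers = reachable_pairs(policies, s1, list(transitions), horizon)
            best = max(best, max(len(layer) for layer in layers))
    return best


states, actions = [0, 1, 2], [0, 1]
all_policies = [dict(zip(states, acts))
                for acts in product(actions, repeat=len(states))]
constant_policies = [{s: a for s in states} for a in actions]
print(spanning_capacity(all_policies, states, actions, 2))
print(spanning_capacity(constant_policies, states, actions, 2))
```

With three states, two actions and horizon 2, the class of all 8 stationary policies reaches 4 pairs at layer 2 (that is, |A|^H), while the two constant policies reach only 2 (that is, |Π|). Because the worst case over dynamics is taken inside the measure, the resulting number is a property of the policy class alone.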
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms
