Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model
Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen

TL;DR
This paper shows that model-based reinforcement learning with a generative model attains minimax-optimal sample complexity across the full range of sample sizes, removing the sample size barrier that restricted earlier guarantees for MDPs.
Contribution
It establishes the first minimax-optimal sample complexity guarantees for model-based RL algorithms that hold across all feasible sample sizes, including the small-sample regime below the previous barrier.
Findings
Overcomes the sample size barrier for infinite-horizon MDPs.
Certifies minimax optimality of two model-based algorithms.
Extends results to finite-horizon MDPs with plain planning.
Abstract
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the…
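To make the model-based recipe named in the abstract concrete, here is a minimal Python sketch of one way a perturbed model-based planner could look: draw a fixed number of next-state samples per state-action pair from the simulator, form the empirical transition kernel, add a small random perturbation to the rewards, and plan in the resulting empirical MDP. The toy two-state MDP, the function names, the uniform perturbation of size `xi`, and the use of value iteration as the planner are all assumptions of this illustration, not details taken from the summary above.

```python
import random


def estimate_model(sample_next_state, num_states, num_actions, n_per_pair):
    """Empirical transition kernel from n_per_pair simulator draws per (s, a)."""
    p_hat = [[[0.0] * num_states for _ in range(num_actions)]
             for _ in range(num_states)]
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(n_per_pair):
                p_hat[s][a][sample_next_state(s, a)] += 1.0 / n_per_pair
    return p_hat


def q_values(p, r, gamma, v):
    """One Bellman backup: Q(s, a) = r(s, a) + gamma * sum_t p(t | s, a) * v(t)."""
    return [[r[s][a] + gamma * sum(pt * vt for pt, vt in zip(p[s][a], v))
             for a in range(len(r[s]))] for s in range(len(r))]


def plan(p, r, gamma, iters=500):
    """Value iteration on the given MDP; returns (values, greedy policy)."""
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [max(row) for row in q_values(p, r, gamma, v)]
    q = q_values(p, r, gamma, v)
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return v, policy


def perturbed_model_based(sample_next_state, r, gamma, n_per_pair, xi, rng):
    """Plan in the empirical MDP after adding small random noise to the rewards."""
    num_states, num_actions = len(r), len(r[0])
    p_hat = estimate_model(sample_next_state, num_states, num_actions, n_per_pair)
    # Assumed perturbation: i.i.d. Uniform(0, xi) noise on every reward entry.
    r_pert = [[r[s][a] + rng.uniform(0.0, xi) for a in range(num_actions)]
              for s in range(num_states)]
    return plan(p_hat, r_pert, gamma)


# Hypothetical toy MDP: state 1 pays reward 1, state 0 pays nothing.
rng = random.Random(0)
gamma, xi = 0.9, 1e-3
true_p = [[[0.9, 0.1], [0.1, 0.9]],   # state 0: action 0 mostly stays, action 1 mostly moves
          [[0.1, 0.9], [0.5, 0.5]]]   # state 1: action 0 mostly stays, action 1 is a coin flip
reward = [[0.0, 0.0], [1.0, 1.0]]


def simulator(s, a):
    """Generative model: one next-state draw from the true kernel."""
    return rng.choices([0, 1], weights=true_p[s][a])[0]


values, policy = perturbed_model_based(simulator, reward, gamma, 200, xi, rng)
print(policy, values)
```

In this toy instance the planner only ever touches the simulator through `estimate_model`, so the total sample size is the number of state-action pairs times `n_per_pair`; the question the abstract addresses is how small that total can be while the returned policy remains near-optimal.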
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
