Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal
Alekh Agarwal, Sham Kakade, Lin F. Yang

TL;DR
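This paper proves that the simple plug-in approach to model-based planning, which estimates the MDP from generative-model samples and then plans in the resulting empirical MDP, is minimax optimal in the non-asymptotic regime: for a fixed sample size, the policy it finds is as good as any algorithm can guarantee.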
Contribution
It establishes the minimax optimality of the naive plug-in method in model-based reinforcement learning with a generative model, using an analysis built on a novel absorbing MDP construction.
Findings
Plug-in approach achieves minimax optimal policy quality.
Any efficient planning algorithm can be used in the empirical MDP.
Introduces a novel absorbing MDP construction for analysis.
Abstract
This work considers the sample and computational complexity of obtaining an ε-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP. We ask arguably the most basic and unresolved question in model-based planning: is the naive "plug-in" approach, non-asymptotically, minimax optimal in the quality of the policy it finds, given a fixed sample size? Here, the non-asymptotic regime refers to when the sample size is sublinear in the model size. With access to a generative model, we resolve this question in the strongest possible sense: our main result shows that any high accuracy…
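The plug-in procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy MDP, the function names (`sample_empirical_model`, `value_iteration`, `policy_value`), the choice of value iteration as the planner, and the assumption that rewards are known are all illustrative assumptions. The sketch draws a fixed number of next-state samples per state-action pair from a simulated generative model, forms the count-based maximum likelihood estimate of the transitions, plans in the empirical MDP, and evaluates the resulting policy in the true MDP.

```python
import numpy as np

def sample_empirical_model(P, n_samples, rng):
    """Query a simulated generative model n_samples times per (s, a)
    and return the count-based maximum likelihood transition estimate."""
    S, A, _ = P.shape
    P_hat = np.zeros_like(P)
    for s in range(S):
        for a in range(A):
            next_states = rng.choice(S, size=n_samples, p=P[s, a])
            P_hat[s, a] = np.bincount(next_states, minlength=S) / n_samples
    return P_hat

def value_iteration(P, R, gamma, tol=1e-8, max_iter=10_000):
    """Plan in the MDP (P, R); returns a greedy policy and its value estimate.
    Value iteration is one possible planner, chosen here for brevity."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_new = (R + gamma * P @ V).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * P @ V          # shape (S, A)
    return Q.argmax(axis=1), V

def policy_value(P, R, gamma, policy):
    """Exact value of a deterministic policy via (I - gamma * P_pi) V = R_pi."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi, R_pi = P[idx, policy], R[idx, policy]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

# Hypothetical toy problem: 5 states, 3 actions, known rewards in [0, 1].
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
R = rng.uniform(size=(S, A))

P_hat = sample_empirical_model(P_true, n_samples=2000, rng=rng)
pi_hat, _ = value_iteration(P_hat, R, gamma)       # plan in the empirical MDP
_, V_star = value_iteration(P_true, R, gamma)      # optimal values, true MDP
V_pi_hat = policy_value(P_true, R, gamma, pi_hat)  # plug-in policy, true MDP
gap = np.max(V_star - V_pi_hat)                    # suboptimality of pi_hat
print(f"worst-case suboptimality of the plug-in policy: {gap:.4f}")
```

Increasing `n_samples` should shrink `gap` in this toy setting; how fast it shrinks as a function of the sample size is the question the paper answers.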
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification
