The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis
Matthew Zurek, Yudong Chen

TL;DR
This paper provides a theoretical analysis of the simple plug-in approach for learning near-optimal policies in average-reward MDPs, establishing its optimal sample complexity without prior knowledge of problem parameters.
Contribution
It proves that the plug-in approach attains optimal sample complexity bounds in average-reward MDPs, filling a gap in theoretical understanding and removing the need for prior problem information.
Findings
Achieves optimal diameter- and mixing-based sample complexities.
Provides span-based bounds and matching lower bounds.
Improves the analysis of the discounted plug-in approach, removing horizon restrictions.
Abstract
We study the sample complexity of the plug-in approach for learning ε-optimal policies in average-reward Markov decision processes (MDPs) with a generative model. The plug-in approach constructs a model estimate, then computes an average-reward optimal policy in the estimated model. Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed. Unlike the more well-studied discounted MDP reduction method, the plug-in approach requires no prior problem information or parameter tuning. Our results fill this gap and address the limitations of prior approaches, as we show that the plug-in approach is optimal in several well-studied settings without using prior knowledge. Specifically, it achieves the optimal diameter- and mixing-based sample complexities of …
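The two steps of the plug-in approach described in the abstract (estimate the model from generative-model samples, then solve the estimated model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not taken from the paper: the two-state toy MDP `TRUE_P` and `REWARD`, the helper names, the per-pair sample count `n`, and the use of relative value iteration as the planner. The abstract only says that an average-reward optimal policy is computed in the estimated model and does not specify a solver. Relative value iteration is assumed to converge here because the toy chain is aperiodic.

```python
import random

# Hypothetical toy MDP (illustration only): TRUE_P[s][a] is the next-state distribution.
TRUE_P = [
    [[0.9, 0.1], [0.1, 0.9]],  # state 0: action 0 mostly stays, action 1 mostly moves to state 1
    [[0.1, 0.9], [0.9, 0.1]],  # state 1: action 0 mostly stays, action 1 mostly moves to state 0
]
REWARD = [[0.0, 0.0], [1.0, 0.0]]  # only (state 1, action 0) pays reward


def sample_next_state(s, a, rng):
    """Generative model: draw one next state for the pair (s, a)."""
    return rng.choices(range(len(TRUE_P)), weights=TRUE_P[s][a])[0]


def estimate_model(sampler, num_states, num_actions, n, rng):
    """Step 1: empirical transition kernel from n samples per state-action pair."""
    p_hat = [[[0.0] * num_states for _ in range(num_actions)]
             for _ in range(num_states)]
    for s in range(num_states):
        for a in range(num_actions):
            for _ in range(n):
                p_hat[s][a][sampler(s, a, rng)] += 1.0 / n
    return p_hat


def relative_value_iteration(p, r, iters=2000, tol=1e-10):
    """Step 2 (assumed solver): greedy policy and gain estimate for the model (p, r)."""
    num_states = len(p)
    h = [0.0] * num_states  # bias estimate, normalized so that h[0] == 0
    gain = 0.0
    q = [[0.0] * len(p[s]) for s in range(num_states)]
    for _ in range(iters):
        q = [[r[s][a] + sum(p[s][a][t] * h[t] for t in range(num_states))
              for a in range(len(p[s]))] for s in range(num_states)]
        backed_up = [max(row) for row in q]
        gain = backed_up[0]  # since h[0] == 0, this tracks the average reward
        new_h = [v - gain for v in backed_up]
        diff = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if diff < tol:
            break
    policy = [max(range(len(q[s])), key=lambda a: q[s][a])
              for s in range(num_states)]
    return policy, gain


rng = random.Random(0)
p_hat = estimate_model(sample_next_state, num_states=2, num_actions=2, n=500, rng=rng)
policy, gain = relative_value_iteration(p_hat, REWARD)
print(policy, round(gain, 3))
```

Note that the sketch needs no discount factor, span bound, or mixing-time estimate as input, which is the property the abstract emphasizes. The sample count `n` is fixed by hand here; how large it must be for ε-optimality is what the paper's bounds address.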
Taxonomy
Topics: Auction Theory and Applications
