Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model
Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

TL;DR
This paper presents an algorithm for efficiently computing near-optimal policies in discounted Markov Decision Processes using a generative model, achieving near-optimal time and sample complexities that improve upon previous bounds.
Contribution
The paper introduces a new algorithm with near-optimal bounds for solving discounted MDPs via generative sampling, matching lower bounds up to logarithmic factors.
Findings
Achieves $O\left(\frac{|S||A|}{(1-\gamma)^3 \epsilon^2} \cdot \text{log factors}\right)$ time and sample complexity for near-optimal policy computation.
Improves the previous best bounds by a factor of $(1-\gamma)^{-1}$ for fixed accuracy and confidence.
Extends results to finite-horizon MDPs with nearly matching lower bounds.
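To make these findings concrete, the sketch below evaluates the bound expression stated in the abstract. It is only an illustration. The constant hidden by $O(\cdot)$ is unknown, so setting it to 1 is an assumption, and the helper names `sample_bound` and `improvement_factor` are hypothetical. `improvement_factor` simply reports $(1-\gamma)^{-1}$, the gain claimed for fixed accuracy.

```python
import math


def sample_bound(num_states: int, num_actions: int, gamma: float,
                 eps: float, delta: float) -> float:
    """Evaluate |S||A| / ((1-gamma)^3 eps^2) * log(|S||A| / ((1-gamma) delta eps))
    * log(1 / ((1-gamma) eps)), assuming the hidden O(.) constant is 1."""
    horizon = 1.0 / (1.0 - gamma)          # effective horizon (1 - gamma)^{-1}
    sa = num_states * num_actions          # |S||A|
    leading = sa * horizon**3 / eps**2     # polynomial part of the bound
    log_conf = math.log(sa * horizon / (delta * eps))  # first log factor
    log_acc = math.log(horizon / eps)                  # second log factor
    return leading * log_conf * log_acc


def improvement_factor(gamma: float) -> float:
    """Claimed gain over the previous best bound for fixed accuracy: (1 - gamma)^{-1}."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # Hypothetical problem size: 100 states, 10 actions, eps = 0.1, delta = 0.05.
    for gamma in (0.9, 0.99, 0.999):
        print(f"gamma={gamma}: bound ~ {sample_bound(100, 10, gamma, 0.1, 0.05):.3e}, "
              f"improvement ~ {improvement_factor(gamma):.0f}x")
```

Moving $\gamma$ from 0.99 to 0.999 multiplies the polynomial part of the bound by 1000. This is why the exponent on $(1-\gamma)^{-1}$ dominates the cost in the long-horizon regime.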
Abstract
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with states $S$, actions $A$, discount factor $\gamma \in (0,1)$, and rewards in range $[0,1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1-\delta$ where both the time spent and the number of samples taken are upper bounded by \[ O\left[\frac{|S||A|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|S||A|}{(1-\gamma)\delta \epsilon} \right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right] ~. \] For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1-\gamma)^{-1}$ and matches the…
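The generative-model access described in the abstract can be sketched in code. This is a minimal sketch, assuming a tabular MDP given as a transition array `P[s, a, s']` and a reward table `R[s, a]` with entries in $[0,1]$; `GenerativeModel` and `sampled_value_iteration` are hypothetical names. It implements a naive baseline (value iteration with fresh Monte Carlo estimates of the expected next-state value), not the authors' algorithm, and it does not attain the bound above. NumPy's `choice` with explicit probabilities is not an $O(1)$-time sampler, so the class only mimics the access interface while counting samples.

```python
import numpy as np


class GenerativeModel:
    """Hypothetical generative model: returns i.i.d. draws s' ~ P(. | s, a) for any (s, a)."""

    def __init__(self, P: np.ndarray, rng: np.random.Generator):
        self.P = P                # shape (S, A, S); each P[s, a] sums to 1
        self.rng = rng
        self.num_samples = 0      # total number of transitions drawn so far

    def sample(self, s: int, a: int, m: int) -> np.ndarray:
        self.num_samples += m
        return self.rng.choice(self.P.shape[2], size=m, p=self.P[s, a])


def sampled_value_iteration(model: GenerativeModel, R: np.ndarray, gamma: float,
                            num_iters: int, m: int):
    """Naive baseline: each iteration estimates E[v(s')] from m fresh samples per (s, a)."""
    S, A = R.shape
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(num_iters):
        q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                next_states = model.sample(s, a, m)
                # Monte Carlo estimate of the Bellman backup for (s, a)
                q[s, a] = R[s, a] + gamma * v[next_states].mean()
        v = q.max(axis=1)
    policy = q.argmax(axis=1)     # greedy policy with respect to the last Q estimate
    return policy, v
```

For example, with two states and deterministic transitions where action 1 in state 0 moves to an absorbing state that pays reward 1 per step, 200 iterations at $\gamma = 0.9$ return a policy that picks action 1 in state 0, and `model.num_samples` equals $200 \cdot |S| \cdot |A| \cdot m$.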
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Formal Methods in Verification
