Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

Aaron Sidford; Mengdi Wang; Xian Wu; Lin F. Yang; Yinyu Ye

arXiv:1806.01492·math.OC·June 7, 2019·30 cites

TL;DR

This paper presents an algorithm for efficiently computing near-optimal policies in discounted Markov Decision Processes using a generative model, achieving near-optimal time and sample complexities that improve upon previous bounds.

Contribution

The paper introduces a new algorithm with near-optimal bounds for solving discounted MDPs via generative sampling, matching lower bounds up to logarithmic factors.

Findings

01

Achieves $O\left(\frac{|S||A|}{(1-\gamma)^3 \epsilon^2}\right)$ time and sample complexity, up to logarithmic factors, for near-optimal policy computation.

02

Improves previous bounds by a factor of $(1-\gamma)^{-1}$ for fixed accuracy and confidence.

03

Extends results to finite-horizon MDPs with nearly matching lower bounds.
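
To give a feel for the first two findings, the following worked arithmetic shows how the leading term scales at one illustrative discount factor. The value $\gamma = 0.99$ is chosen only as an example, and the comparison assumes the earlier bound differs from the new one only in the stated $(1-\gamma)^{-1}$ factor, with logarithmic factors and constants ignored.

```latex
% Illustrative arithmetic only; \gamma = 0.99 is an assumed example value.
\[
  \frac{1}{(1-\gamma)^{3}} \;=\; \frac{1}{(0.01)^{3}} \;=\; 10^{6},
  \qquad
  \frac{1}{(1-\gamma)^{4}} \;=\; \frac{1}{(0.01)^{4}} \;=\; 10^{8},
\]
\[
  \text{so the claimed improvement factor is}\quad
  (1-\gamma)^{-1} \;=\; \frac{10^{8}}{10^{6}} \;=\; 100 .
\]
```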

Abstract

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with states $S$, actions $A$, discount factor $\gamma \in (0, 1)$, and rewards in the range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ where both the time spent and the number of samples taken are upper bounded by \[ O\left[\frac{|S||A|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|S||A|}{(1-\gamma)\delta \epsilon} \right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right] ~. \] For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the…
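
The setting in the abstract can be illustrated with a minimal Python sketch. It does two things: `bound_expression` evaluates the expression inside the $O[\cdot]$ above with the hidden constant omitted, and `sampled_q_value_iteration` runs a plain sampled value iteration that touches the transition function only through a generative sampler. This is not the paper's algorithm and carries none of its guarantees; all function names, the toy two-state MDP, and the parameter values are assumptions made for illustration.

```python
import math
import random


def bound_expression(num_states, num_actions, gamma, eps, delta):
    """Evaluate the expression inside O[...] from the abstract.

    The constant hidden by the O-notation is omitted, so the result only
    shows how the bound scales; it is not an actual sample count.
    """
    sa = num_states * num_actions
    lead = sa / ((1 - gamma) ** 3 * eps ** 2)
    log_a = math.log(sa / ((1 - gamma) * delta * eps))
    log_b = math.log(1 / ((1 - gamma) * eps))
    return lead * log_a * log_b


def sampled_q_value_iteration(sample_next_state, rewards, num_states,
                              num_actions, gamma, samples_per_pair,
                              iterations, rng):
    """Naive sampled value iteration (an illustrative baseline only).

    sample_next_state(s, a, rng) plays the role of the generative model:
    it returns one next state drawn from the transition distribution.
    """
    q = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(iterations):
        v = [max(row) for row in q]  # greedy state values from current Q
        new_q = [[0.0] * num_actions for _ in range(num_states)]
        for s in range(num_states):
            for a in range(num_actions):
                total = 0.0
                for _ in range(samples_per_pair):
                    total += v[sample_next_state(s, a, rng)]
                # Empirical Bellman backup using the sampled average.
                new_q[s][a] = rewards[s][a] + gamma * total / samples_per_pair
        q = new_q
    policy = [max(range(num_actions), key=lambda a, s=s: q[s][a])
              for s in range(num_states)]
    return q, policy


# Hypothetical toy MDP: state 1 is absorbing and rewarding; from state 0,
# action 1 reaches state 1 with probability 0.9, action 0 stays in state 0.
TOY_REWARDS = [[0.0, 0.0], [1.0, 0.5]]


def toy_sample_next_state(s, a, rng):
    if s == 1:
        return 1
    if a == 1:
        return 1 if rng.random() < 0.9 else 0
    return 0
```

Calling `sampled_q_value_iteration(toy_sample_next_state, TOY_REWARDS, 2, 2, 0.9, 20, 200, random.Random(0))` returns the estimated Q-table and a greedy policy for the toy model. Each iteration draws `samples_per_pair` samples for every state-action pair, so the total sample count of this naive scheme is `num_states * num_actions * samples_per_pair * iterations`.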

Code & Models

Repositories

uclaopt/AsyncQVI

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Formal Methods in Verification