AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity

Yibo Zeng; Fei Feng; Wotao Yin

arXiv:1812.00885 · math.OC · February 25, 2020 · 1 citation

TL;DR

AsyncQVI is an asynchronous-parallel Q-value iteration algorithm for discounted Markov decision processes. It achieves near-optimal sample complexity with memory proportional to the number of states, which makes it suitable for large-scale problems, and it shows high efficiency and linear parallel speedup in numerical tests.

Contribution

The paper introduces AsyncQVI, the first asynchronous-parallel algorithm for discounted MDPs whose sample complexity nearly matches the theoretical lower bound, combined with a low memory footprint and high efficiency.

Findings

01. Achieves a sample complexity that nearly matches the theoretical lower bound.

02. Demonstrates high efficiency and linear parallel speedup in numerical tests.

03. Uses memory of size proportional to the number of states, which suits large-scale applications.
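To illustrate how a Q-value iteration can run with memory proportional to the number of states, here is a minimal single-threaded sketch. It is not the authors' algorithm: the toy MDP, the function names `make_toy_mdp` and `sampled_q_value_iteration`, and the plain "keep the larger value" rule are all assumptions made for illustration. The point it shows is that each update estimates one Q-value from generative-model samples and immediately folds it into a value vector and a policy vector, so no |S| x |A| Q-table is ever stored. In an asynchronous-parallel setting, several workers would run the inner update concurrently on the shared vectors.

```python
import random


def make_toy_mdp(n_states, n_actions, seed=0):
    """Build a hypothetical random MDP and return a generative-model sampler."""
    rng = random.Random(seed)
    # Unnormalised transition weights and rewards in [0, 1); illustration only.
    weights = [[[rng.random() for _ in range(n_states)]
                for _ in range(n_actions)]
               for _ in range(n_states)]
    rewards = [[rng.random() for _ in range(n_actions)]
               for _ in range(n_states)]

    def sample(state, action, sample_rng):
        """Return one (next_state, reward) draw for the given state-action pair."""
        next_state = sample_rng.choices(
            range(n_states), weights=weights[state][action])[0]
        return next_state, rewards[state][action]

    return sample


def sampled_q_value_iteration(sample, n_states, n_actions, gamma,
                              n_updates, n_samples, seed=0):
    """Coordinate-wise sampled Q-value updates using only O(|S|) storage."""
    rng = random.Random(seed)
    v = [0.0] * n_states      # value estimate, one entry per state
    policy = [0] * n_states   # greedy action seen so far, one entry per state
    for _ in range(n_updates):
        s = rng.randrange(n_states)
        a = rng.randrange(n_actions)
        # Monte Carlo estimate of Q(s, a) from n_samples generative-model draws.
        total = 0.0
        for _ in range(n_samples):
            s_next, r = sample(s, a, rng)
            total += r + gamma * v[s_next]
        q = total / n_samples
        # Keep the estimate only if it improves the stored value; the Q-value
        # itself is discarded, which is what keeps the memory at O(|S|).
        if q > v[s]:
            v[s] = q
            policy[s] = a
    return v, policy


sample = make_toy_mdp(n_states=5, n_actions=3, seed=1)
v, policy = sampled_q_value_iteration(
    sample, n_states=5, n_actions=3, gamma=0.9,
    n_updates=2000, n_samples=20, seed=2)
print(v, policy)
```

Because the rewards in this toy problem lie in [0, 1), every stored value stays between 0 and 1/(1 - γ), which is a quick sanity check on the output.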

Abstract

In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma \in (0, 1)$, AsyncQVI uses memory of size $O(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1 - \delta$ using $\tilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|}{(1 - \gamma)^{5} \varepsilon^{2}} \log\left(\frac{1}{\delta}\right)\right)$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes whose sample complexity nearly matches the theoretical lower bound. The relatively low memory footprint and parallel ability make AsyncQVI suitable for large-scale applications. In numerical tests, we…
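To get a feel for how the stated sample bound scales, the short sketch below evaluates only its leading term. The function name `leading_sample_term` is hypothetical, and the constants and logarithmic factors hidden by the $\tilde{O}$ notation are omitted, so the absolute numbers mean nothing; only ratios between settings are informative.

```python
import math


def leading_sample_term(n_states, n_actions, gamma, eps, delta):
    """Leading term |S||A| / ((1 - gamma)^5 * eps^2) * log(1 / delta).

    Constants and extra log factors hidden by the O-tilde are NOT included.
    """
    return (n_states * n_actions
            / ((1.0 - gamma) ** 5 * eps ** 2)
            * math.log(1.0 / delta))


base = leading_sample_term(1000, 10, gamma=0.9, eps=0.1, delta=0.05)
half_eps = leading_sample_term(1000, 10, gamma=0.9, eps=0.05, delta=0.05)
longer_horizon = leading_sample_term(1000, 10, gamma=0.95, eps=0.1, delta=0.05)

# Halving eps multiplies the term by 4; halving (1 - gamma) multiplies it by 2^5 = 32.
print(half_eps / base, longer_horizon / base)
```

The fifth-power dependence on $1/(1 - \gamma)$ dominates in practice: moving the discount factor from 0.9 to 0.95 costs far more samples than halving the target accuracy.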

Code & Models

Repositories

uclaopt/AsyncQVI (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification