AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Discounted Markov Decision Processes with Near-Optimal Sample Complexity
Yibo Zeng, Fei Feng, Wotao Yin

TL;DR
AsyncQVI is an asynchronous-parallel Q-value iteration algorithm for discounted Markov decision processes. It achieves near-optimal sample complexity with a low memory footprint, which makes it suitable for large-scale problems, and it shows linear parallel speedup in numerical tests.
Contribution
The paper introduces AsyncQVI, the first asynchronous-parallel algorithm for discounted MDPs with near-optimal sample complexity. It combines a low memory footprint with high efficiency.
Findings
Achieves a sample complexity that nearly matches the theoretical lower bound.
Demonstrates high efficiency and linear parallel speedup in numerical tests.
Uses memory of size proportional to the number of states, suitable for large-scale applications.
Abstract
In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration for discounted Markov decision processes whose transition and reward can only be sampled through a generative model. Given such a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma \in (0,1)$, AsyncQVI uses memory of size $\mathcal{O}(|\mathcal{S}|)$ and returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ using $\tilde{\mathcal{O}}\left(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\log\left(\frac{1}{\delta}\right)\right)$ samples. AsyncQVI is also the first asynchronous-parallel algorithm for discounted Markov decision processes with a sample complexity that nearly matches the theoretical lower bound. The relatively low memory footprint and parallel ability make AsyncQVI suitable for large-scale applications. In numerical tests, we…
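To make the setting in the abstract concrete, the following is a minimal single-threaded sketch of sampled Q-value iteration with a generative model. It is not the authors' algorithm: the function names, the toy MDP, and the random coordinate selection (a stand-in for asynchronous workers) are all assumptions made for illustration. The one property it shares with the description above is that it stores only a value vector and a policy vector, each of length equal to the number of states, rather than a full table over state-action pairs.

```python
import random


def sampled_q_update(sample, state, action, values, gamma, num_samples):
    """Monte Carlo estimate of Q(state, action) from a generative model.

    `sample(state, action)` is a hypothetical oracle returning
    (next_state, reward).
    """
    total = 0.0
    for _ in range(num_samples):
        next_state, reward = sample(state, action)
        total += reward + gamma * values[next_state]
    return total / num_samples


def coordinate_q_value_iteration(sample, num_states, num_actions, gamma,
                                 sweeps, num_samples, seed=0):
    """Update one randomly chosen (state, action) pair at a time.

    Only `values` and `policy` are stored, so memory grows with the
    number of states, not with states times actions.
    """
    rng = random.Random(seed)
    values = [0.0] * num_states
    policy = [0] * num_states
    for _ in range(sweeps * num_states * num_actions):
        s = rng.randrange(num_states)
        a = rng.randrange(num_actions)
        q = sampled_q_update(sample, s, a, values, gamma, num_samples)
        # Keep the action only if it improves the stored value.
        if q > values[s]:
            values[s] = q
            policy[s] = a
    return values, policy


def toy_sample(state, action):
    """Hypothetical deterministic two-state MDP used only for this example."""
    if state == 0:
        # Action 0 stays in state 0; action 1 moves to state 1. No reward.
        return (0, 0.0) if action == 0 else (1, 0.0)
    # In state 1, action 0 stays and pays 1; action 1 returns to state 0.
    return (1, 1.0) if action == 0 else (0, 0.0)


values, policy = coordinate_q_value_iteration(
    toy_sample, num_states=2, num_actions=2, gamma=0.5,
    sweeps=200, num_samples=1)
print(values, policy)
```

In this toy problem the generative model is deterministic, so one sample per update suffices. With a stochastic model, the monotone update above can lock in an overestimate, which is one reason a real method needs a careful sample budget and error analysis.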
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Formal Methods in Verification
