Truncated Variance Reduced Value Iteration
Yujia Jin, Ishani Karmarkar, Aaron Sidford, and Jiayi Wang

TL;DR
This paper introduces faster randomized algorithms for computing near-optimal policies in discounted Markov decision processes, improving efficiency over previous methods by leveraging variance reduction and truncation techniques.
Contribution
The paper presents novel variance-reduced value iteration algorithms with improved time complexity for both sampling and offline settings in MDPs.
Findings
Achieves faster running times than the prior state of the art in both the sampling and offline settings.
Introduces variance truncation to enhance sampling procedures.
Bridges the gap between model-free and model-based methods.
Abstract
We provide faster randomized algorithms for computing an ε-optimal policy in a discounted Markov decision process with -state-action pairs, bounded rewards, and discount factor γ. We provide an -time algorithm in the sampling setting, where the probability transition matrix is unknown but accessible through a generative model which can be queried in -time, and an -time algorithm in the offline setting where the probability transition matrix is known and -sparse. These results improve upon the prior state-of-the-art which either ran in time [Sidford, Wang, Wu, Ye 2018] in the sampling setting, time…
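To make the sampling setting concrete, the following is a minimal sketch of plain sampled value iteration, where the solver never reads the transition matrix directly and only draws next states from a generative model. It is not the paper's truncated variance-reduced algorithm. The toy two-state MDP, the names `generative_model` and `sampled_value_iteration`, and the iteration and sample counts are all hypothetical choices made for illustration.

```python
import random

GAMMA = 0.9          # discount factor (illustrative value)
STATES = [0, 1]
ACTIONS = [0, 1]

# Hypothetical toy MDP. P[(s, a)] is the next-state distribution and
# R[(s, a)] is a bounded reward in [0, 1]. P is hidden from the solver
# and is only used inside generative_model below.
P = {(0, 0): [0.9, 0.1], (0, 1): [0.1, 0.9],
     (1, 0): [0.9, 0.1], (1, 1): [0.1, 0.9]}
R = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 1.0}


def generative_model(rng, s, a):
    """Return one sampled next state for the state-action pair (s, a)."""
    return rng.choices(STATES, weights=P[(s, a)])[0]


def sampled_value_iteration(rng, num_iters=150, samples_per_pair=500):
    """Approximate value iteration using only generative-model queries."""
    v = [0.0 for _ in STATES]
    policy = [0 for _ in STATES]
    for _ in range(num_iters):
        new_v = []
        for s in STATES:
            q_values = []
            for a in ACTIONS:
                # Monte Carlo estimate of E[v(s')] for s' drawn from P(. | s, a).
                total = 0.0
                for _ in range(samples_per_pair):
                    total += v[generative_model(rng, s, a)]
                q_values.append(R[(s, a)] + GAMMA * total / samples_per_pair)
            best = max(ACTIONS, key=lambda a: q_values[a])
            policy[s] = best
            new_v.append(q_values[best])
        v = new_v
    return v, policy


if __name__ == "__main__":
    values, greedy_policy = sampled_value_iteration(random.Random(0))
    print(greedy_policy, [round(x, 2) for x in values])
```

For this toy instance the exact optimal values can be solved by hand (8.1 for state 0 and 9.1 for state 1, with action 1 optimal in both states), which gives a reference point for the sampled estimates. Every iteration here redraws a fresh batch of samples for every state-action pair; variance reduction techniques aim to cut that per-iteration sampling cost.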
Taxonomy
Topics: Neural Networks and Applications
