Truncated Variance Reduced Value Iteration

Yujia Jin, Ishani Karmarkar, Aaron Sidford, and Jiayi Wang

arXiv:2405.12952·cs.LG·May 22, 2024

TL;DR

This paper introduces faster randomized algorithms for computing near-optimal policies in discounted Markov decision processes, improving efficiency over previous methods by leveraging variance reduction and truncation techniques.

Contribution

The paper presents novel variance-reduced value iteration algorithms with improved time complexity for both sampling and offline settings in MDPs.

Findings

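01. In the sampling setting, lowers the additive term of the running time from $A_{tot} (1 - γ)^{-3}$ to $A_{tot} (1 - γ)^{-2}$ while keeping the leading $A_{tot} (1 - γ)^{-3} ϵ^{-2}$ term.

02. Truncates the progress of the value iterates so that the variance-reduced sampling steps have lower variance.

03. Bridges the sample-complexity gap between model-free and model-based methods.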
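The variance-reduction and truncation ideas named in the findings can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the oracle `sample_expected_value`, the clipping rule used as "truncation", the function names, and every parameter value are assumptions chosen for illustration on a tiny random MDP. The idea sketched is that each epoch estimates $P v_0$ once with many samples, then estimates only the correction $P (v - v_0)$ with few samples while keeping $v$ close to $v_0$.

```python
import numpy as np


def exact_value_iteration(P, r, gamma, iters=2000):
    """Reference solution using the true transition tensor P of shape (S, A, S)."""
    v = np.zeros(P.shape[0])
    for _ in range(iters):
        v = (r + gamma * (P @ v)).max(axis=1)
    return v


def sample_expected_value(P, u, n, rng):
    """Hypothetical generative-model oracle: for each (s, a), draw n next
    states from P[s, a] and average u over them, estimating (P u)[s, a]."""
    S, A, _ = P.shape
    est = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            nxt = rng.choice(S, size=n, p=P[s, a])
            est[s, a] = u[nxt].mean()
    return est


def truncated_vr_epoch(P, r, gamma, v0, n_anchor, n_inner, iters, clip, rng):
    """One epoch: an accurate estimate of P v0, then cheap sampled corrections."""
    anchor = sample_expected_value(P, v0, n_anchor, rng)  # many samples, once
    v = v0.copy()
    for _ in range(iters):
        # Few samples are used here; v - v0 is bounded by the clipping below.
        delta = sample_expected_value(P, v - v0, n_inner, rng)
        q = r + gamma * (anchor + delta)
        # Truncation (illustrative rule, an assumption): stay within `clip` of v0.
        v = np.clip(q.max(axis=1), v0 - clip, v0 + clip)
    return v, q.argmax(axis=1)


def truncated_vr_value_iteration(P, r, gamma, epochs, rng, **kw):
    """Run several epochs, re-anchoring at the latest iterate each time."""
    v = np.zeros(P.shape[0])
    policy = None
    for _ in range(epochs):
        v, policy = truncated_vr_epoch(P, r, gamma, v, rng=rng, **kw)
    return v, policy


rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))  # random MDP, shape (S, A, S)
r = rng.uniform(0.0, 1.0, size=(S, A))      # bounded rewards in [0, 1]

v_hat, policy = truncated_vr_value_iteration(
    P, r, gamma, epochs=8, rng=rng,
    n_anchor=20000, n_inner=200, iters=30, clip=3.0,
)
v_star = exact_value_iteration(P, r, gamma)
print("max value error:", np.abs(v_hat - v_star).max())
```

The design choice worth noting is the split between `n_anchor` and `n_inner`: the expensive estimate is computed once per epoch, and the per-iteration estimate only has to resolve a quantity whose magnitude is bounded by `clip`. How the paper chooses its sample sizes and truncation thresholds is not stated in this summary, so the constants above should be read as placeholders.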

Abstract

We provide faster randomized algorithms for computing an $ϵ$-optimal policy in a discounted Markov decision process with $A_{tot}$-state-action pairs, bounded rewards, and discount factor $γ$. We provide an $\tilde{O}(A_{tot} [(1 - γ)^{-3} ϵ^{-2} + (1 - γ)^{-2}])$-time algorithm in the sampling setting, where the probability transition matrix is unknown but accessible through a generative model which can be queried in $\tilde{O}(1)$-time, and an $\tilde{O}(s + A_{tot} (1 - γ)^{-2})$-time algorithm in the offline setting, where the probability transition matrix is known and $s$-sparse. These results improve upon the prior state-of-the-art, which either ran in $\tilde{O}(A_{tot} [(1 - γ)^{-3} ϵ^{-2} + (1 - γ)^{-3}])$ time [Sidford, Wang, Wu, Ye 2018] in the sampling setting, $\tilde{O}(s + A_{tot} (1 - γ)^{-3})$ time…
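To see where the sampling-setting improvement matters, the two sampling bounds quoted in the abstract can be compared directly. The shorthand $H = (1 - γ)^{-1}$ and the names $T_{\text{new}}$ and $T_{\text{prior}}$ are introduced here for convenience and are not notation from the source. Polylogarithmic factors hidden by $\tilde{O}$ are ignored.

```latex
% H := (1 - \gamma)^{-1} is shorthand assumed here, not notation from the source.
\begin{align*}
T_{\text{new}}   &= A_{tot}\,\bigl[H^{3}\epsilon^{-2} + H^{2}\bigr], \qquad
T_{\text{prior}}  = A_{tot}\,\bigl[H^{3}\epsilon^{-2} + H^{3}\bigr], \\
\frac{T_{\text{prior}}}{T_{\text{new}}}
  &= \frac{\epsilon^{-2} + 1}{\epsilon^{-2} + H^{-1}}
   = \frac{1 + \epsilon^{2}}{1 + \epsilon^{2}/H}.
\end{align*}
% \epsilon \le 1:       ratio \le 2, so the two bounds agree up to a constant.
% \epsilon = \sqrt{H}:  ratio = (1 + H)/2, a gain of order (1 - \gamma)^{-1}.
```

Put differently, the leading $(1 - γ)^{-3} ϵ^{-2}$ term dominates the new bound whenever $ϵ \le (1 - γ)^{-1/2}$, whereas in the prior bound it dominates only for $ϵ \le 1$.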

Videos

Truncated Variance Reduced Value Iteration · SlidesLive

Taxonomy

Topics: Neural Networks and Applications