Solving Ergodic Markov Decision Processes and Perfect Information Zero-sum Stochastic Games by Variance Reduced Deflated Value Iteration

Marianne Akian; Stéphane Gaubert; Zheng Qu; Omar Saadi

arXiv:1909.06185·math.OC·September 16, 2019·CDC

TL;DR

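This paper extends the variance-reduced value iteration of Sidford, Wang, Wu and Ye (2018) from discounted Markov decision processes to mean-payoff Markov decision processes and perfect information zero-sum stochastic games. It obtains sublinear complexity bounds, assuming a distinguished state is accessible from all initial states and for all policies, by reducing the mean-payoff problem to a discounted one and analyzing the result with non-linear spectral theory.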

Contribution

The paper reduces the mean-payoff problem to a discounted problem by a Doob h-transform combined with a deflation technique, so that the variance-reduced value iteration developed for discounted MDPs can be applied to mean-payoff problems.

Findings

01. Achieves sublinear complexity bounds for mean-payoff problems

02. Extends variance reduction methods to zero-sum stochastic games

03. Uses spectral theory for complexity analysis

Abstract

Recently, Sidford, Wang, Wu and Ye (2018) developed an algorithm combining variance reduction techniques with value iteration to solve discounted Markov decision processes. This algorithm has a sublinear complexity when the discount factor is fixed. Here, we extend this approach to mean-payoff problems, including both Markov decision processes and perfect information zero-sum stochastic games. We obtain sublinear complexity bounds, assuming there is a distinguished state which is accessible from all initial states and for all policies. Our method is based on a reduction from the mean payoff problem to the discounted problem by a Doob h-transform, combined with a deflation technique. The complexity analysis of this algorithm uses at the same time the techniques developed by Sidford et al. in the discounted case and non-linear spectral theory techniques (Collatz-Wielandt characterization…
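To make the variance reduction idea mentioned in the abstract more concrete, here is a minimal Python sketch for the discounted case only. It is not the authors' algorithm and omits the Doob h-transform, the deflation step, and the error-control and epoch schedule of Sidford et al. The core idea, as we understand it, is to compute the expectations P v0 exactly once at a reference vector v0 and then estimate only the correction P (v - v0) by sampling, which has low variance when v stays close to v0. All names (`variance_reduced_vi`, `sampled_mean`, `exact_backup`) and the nested-list layout `P[s][a][t]`, `r[s][a]` are assumptions made for illustration.

```python
import random


def exact_backup(P, r, gamma, v):
    """One exact Bellman backup: max over a of r[s][a] + gamma * sum_t P[s][a][t] * v[t]."""
    return [
        max(
            r[s][a] + gamma * sum(p * x for p, x in zip(P[s][a], v))
            for a in range(len(P[s]))
        )
        for s in range(len(P))
    ]


def sampled_mean(probs, w, m, rng):
    """Monte Carlo estimate of sum_t probs[t] * w[t] from m sampled next states."""
    draws = rng.choices(range(len(probs)), weights=probs, k=m)
    return sum(w[t] for t in draws) / m


def variance_reduced_vi(P, r, gamma, v0, n_iters, m, rng):
    """Sketch of value iteration with a variance-reduced estimate of P v.

    Hypothetical helper: P v is approximated by (exact P v0) + (sampled P (v - v0)).
    """
    n = len(P)
    # Exact expectations at the reference point v0, computed once.
    x0 = [
        [sum(p * x for p, x in zip(P[s][a], v0)) for a in range(len(P[s]))]
        for s in range(n)
    ]
    v = list(v0)
    for _ in range(n_iters):
        # Only the difference to the reference point is estimated by sampling.
        diff = [v[t] - v0[t] for t in range(n)]
        v = [
            max(
                r[s][a] + gamma * (x0[s][a] + sampled_mean(P[s][a], diff, m, rng))
                for a in range(len(P[s]))
            )
            for s in range(n)
        ]
    return v
```

When every transition is deterministic, the sampled correction is exact, so the sketch coincides with plain value iteration up to floating-point rounding; this gives a simple sanity check. For stochastic transitions, the quality of the estimate depends on the sample count `m` and on how far `v` drifts from `v0`.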
