Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time

Mengdi Wang

arXiv:1704.01869 · math.OC · June 4, 2019 · 20 cites

TL;DR

This paper introduces a randomized linear programming algorithm for the discounted Markov decision problem. It finds an $ϵ$-optimal policy in nearly-linear run time in the worst case, and in run time sublinear in the input size when the Markov decision process is ergodic and specified in certain special data formats.

Contribution

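The paper presents a novel randomized LP algorithm that leverages value-policy duality and binary-tree data structures to adaptively sample state-action-state transitions and make exponentiated primal-dual updates. It solves discounted MDPs in nearly-linear run time in the worst case, and in sublinear run time in the ergodic case with special data formats.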
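For orientation, the value-policy duality can be read off the standard textbook linear program for a discounted MDP. The notation below is an assumption for illustration and is not taken from the paper: $P_a$ is the transition matrix under action $a$, $r_a$ the reward vector, $γ \in (0,1)$ the discount factor, and $ξ$ an initial state distribution.

$$
\begin{aligned}
\text{(primal, value)}\quad & \min_{v}\ (1-γ)\, ξ^\top v \quad \text{s.t.}\quad (I - γ P_a)\, v \ge r_a \ \ \text{for all actions } a, \\
\text{(dual, policy)}\quad & \max_{μ \ge 0}\ \sum_a μ_a^\top r_a \quad \text{s.t.}\quad \sum_a (I - γ P_a)^\top μ_a = (1-γ)\, ξ .
\end{aligned}
$$

Under this scaling the dual variables sum to one, so $μ$ can be read as a distribution over state-action pairs, and a randomized policy is recovered by taking the probability of action $a$ in state $s$ proportional to $μ_a(s)$.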

Findings

01

Achieves an $ϵ$-optimal policy in nearly-linear time in the worst case.

02

For ergodic MDPs specified in special data formats, finds an $ϵ$-optimal policy in time linear in the number of state-action pairs, which is sublinear in the input size.

03

Provides new complexity benchmarks for stochastic dynamic programming.

Abstract

We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality and binary-tree data structures, the algorithm adaptively samples state-action-state transitions and makes exponentiated primal-dual updates. We show that it finds an $ϵ$-optimal policy using nearly-linear run time in the worst case. When the Markov decision process is ergodic and specified in some special data formats, the algorithm finds an $ϵ$-optimal policy using run time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide a new venue and complexity benchmarks for solving stochastic dynamic programs.
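To make the role of a binary-tree data structure concrete, here is a minimal sketch of a sum tree: a generic structure that samples an index in proportion to its weight and updates a single weight, each in O(log n) time. It is an illustration under assumptions, not the paper's implementation; the class name `SumTree` and its methods are hypothetical.

```python
import random


class SumTree:
    """Complete binary tree over n nonnegative weights.

    Leaves hold the weights; each internal node holds the sum of its subtree.
    Hypothetical illustration, not the data structure from the paper.
    """

    def __init__(self, weights):
        self.n = len(weights)
        # Pad the number of leaves up to a power of two.
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = float(w)
        # Fill internal nodes bottom-up.
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def total(self):
        """Sum of all weights (stored at the root)."""
        return self.tree[1]

    def update(self, i, w):
        """Set weight i to w and repair the sums on the path to the root."""
        node = self.size + i
        self.tree[node] = float(w)
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def find(self, u):
        """Return the index whose cumulative-weight interval contains u,
        for 0 <= u < total()."""
        node = 1
        while node < self.size:
            left = 2 * node
            if u < self.tree[left]:
                node = left
            else:
                u -= self.tree[left]
                node = left + 1
        # Clamp in case floating-point rounding lands on a padded leaf.
        return min(node - self.size, self.n - 1)

    def sample(self, rng=random):
        """Draw an index with probability proportional to its weight."""
        return self.find(rng.random() * self.total())
```

With such a structure, a multiplicative (exponentiated) change to one coordinate is a single `update` call, and the next sample is drawn from the new distribution without renormalizing all n weights.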

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research