Linear Programming for Large-Scale Markov Decision Problems

Yasin Abbasi-Yadkori; Peter L. Bartlett; Alan Malek

arXiv:1402.6763 · math.OC · February 28, 2014 · 30 cites

Open Access

TL;DR

This paper introduces linear-programming-based algorithms for average-cost control of large-scale Markov decision processes. The algorithms compete with a low-dimensional family of policies, and their performance bounds do not depend on the size of the state space.

Contribution

The paper develops two algorithms for approximate policy optimization in large MDPs, one based on stochastic convex optimization and one based on constraint sampling.

Findings

01. The algorithms' performance approaches the best achievable by any policy in the comparison class.

02. The performance bounds are independent of the size of the state space.

03. Preliminary experiments confirm the algorithms' effectiveness in queuing applications.

Abstract

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost. Since it is intractable to compete with the optimal policy for large scale problems, we pursue the more modest goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we consider a neighborhood of a low-dimensional subset of the set of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling. In both cases, we give bounds that show that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most…
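To make the dual formulation concrete, here is a minimal Python sketch on a toy problem. It is not the authors' algorithm: it only illustrates that the dual LP variable is a distribution over state-action pairs that is nonnegative, sums to one, and has matching inflow and outflow at every state, and that its objective value is the average cost. The two-state MDP, the cost table, and the helper names (`stationary_state_action`, `average_cost`, `constraint_violation`) are all assumptions made for illustration.

```python
# Toy MDP (all numbers are illustrative assumptions, not from the paper).
# P[x][a][y] = probability of moving from state x to state y under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
# cost[x][a] = immediate cost of taking action a in state x.
cost = [[1.0, 2.0], [0.5, 3.0]]


def stationary_state_action(policy, iters=2000):
    """Return mu[x][a], the stationary state-action distribution of `policy`.

    policy[x][a] is the probability of choosing action a in state x.
    Uses power iteration on the state chain induced by the policy.
    """
    n = len(P)
    nu = [1.0 / n] * n  # start from the uniform state distribution
    for _ in range(iters):
        nxt = [0.0] * n
        for x in range(n):
            for a, pa in enumerate(policy[x]):
                for y in range(n):
                    nxt[y] += nu[x] * pa * P[x][a][y]
        nu = nxt
    return [[nu[x] * pa for pa in policy[x]] for x in range(n)]


def average_cost(mu):
    """Dual LP objective: expected cost under the state-action distribution."""
    return sum(mu[x][a] * cost[x][a]
               for x in range(len(mu)) for a in range(len(mu[x])))


def constraint_violation(mu):
    """Total violation of normalization, nonnegativity, and flow balance."""
    n = len(mu)
    norm = abs(sum(map(sum, mu)) - 1.0)
    neg = sum(max(0.0, -m) for row in mu for m in row)
    flow = 0.0
    for y in range(n):
        inflow = sum(mu[x][a] * P[x][a][y]
                     for x in range(n) for a in range(len(mu[x])))
        flow += abs(sum(mu[y]) - inflow)  # mass leaving y vs. mass entering y
    return norm + neg + flow


policy = [[1.0, 0.0], [1.0, 0.0]]  # hypothetical policy: always take action 0
mu = stationary_state_action(policy)
print(average_cost(mu), constraint_violation(mu))
```

In this toy setting the full LP would search over all feasible `mu`. The abstract's approach instead restricts attention to a neighborhood of a low-dimensional, feature-defined subset of such distributions, which is what avoids dependence on the number of states.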

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research