Linear Programming for Large-Scale Markov Decision Problems
Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

TL;DR
This paper introduces scalable algorithms for large-scale Markov decision processes using linear programming, focusing on low-dimensional policy classes and providing performance bounds independent of state space size.
Contribution
It develops two novel algorithms based on stochastic convex optimization and constraint sampling for approximate policy optimization in large MDPs.
Findings
Both algorithms achieve performance that approaches the best achievable by any policy in the comparison class.
Performance bounds are independent of state space size.
Preliminary experiments indicate that the algorithms are effective in a queuing application.
Abstract
We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost. Since it is intractable to compete with the optimal policy for large scale problems, we pursue the more modest goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we consider a neighborhood of a low-dimensional subset of the set of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling. In both cases, we give bounds that show that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most…
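To make the dual formulation concrete, the following is a minimal sketch on a hypothetical 2-state, 2-action MDP; every number and function name in it is invented for illustration and none comes from the paper. It builds the stationary state-action distribution mu(x, a) induced by each deterministic policy, checks that mu satisfies the dual LP constraints (nonnegative, sums to one, flow into each state equals flow out), and evaluates the linear objective, which is the average cost. It does not implement either of the paper's algorithms: for a problem this small, the LP optimum can be found by brute-force enumeration of deterministic policies.

```python
# Hypothetical toy MDP (all values are assumptions made for illustration).
# P1[x][a] = probability of moving to state 1 from state x under action a.
P1 = [[0.2, 0.7], [0.4, 0.9]]
# COST[x][a] = immediate cost of taking action a in state x.
COST = [[1.0, 3.0], [2.0, 0.5]]


def state_action_distribution(policy):
    """Stationary distribution mu[x][a] of a deterministic policy (policy[x] = a)."""
    p01 = P1[0][policy[0]]          # probability of moving 0 -> 1
    p10 = 1.0 - P1[1][policy[1]]    # probability of moving 1 -> 0
    # Closed-form stationary distribution of a two-state Markov chain.
    nu = [p10 / (p01 + p10), p01 / (p01 + p10)]
    return [[nu[x] if policy[x] == a else 0.0 for a in range(2)] for x in range(2)]


def average_cost(mu):
    """Dual LP objective: sum over (x, a) of mu(x, a) * cost(x, a)."""
    return sum(mu[x][a] * COST[x][a] for x in range(2) for a in range(2))


def flow_residual(mu):
    """Largest violation of: sum_a mu(x', a) = sum_{x, a} mu(x, a) P(x' | x, a)."""
    worst = 0.0
    for x_next in range(2):
        outflow = sum(mu[x_next])
        inflow = sum(
            mu[x][a] * (P1[x][a] if x_next == 1 else 1.0 - P1[x][a])
            for x in range(2)
            for a in range(2)
        )
        worst = max(worst, abs(outflow - inflow))
    return worst


# Enumerate the four deterministic policies and evaluate each one.
policies = [(a0, a1) for a0 in range(2) for a1 in range(2)]
distributions = {pi: state_action_distribution(pi) for pi in policies}
results = {pi: average_cost(mu) for pi, mu in distributions.items()}
best_policy = min(results, key=results.get)

if __name__ == "__main__":
    for pi in policies:
        print(pi, round(results[pi], 4), flow_residual(distributions[pi]))
    print("best policy:", best_policy)
```

Enumeration is only viable here because the state space has two elements. The number of LP variables and constraints grows with the number of states, which is what motivates restricting attention to a low-dimensional family of distributions defined through state-action features.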
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
