From Convex Optimization to MDPs: A Review of First-Order, Second-Order and Quasi-Newton Methods for MDPs
Julien Grand-Clément

TL;DR
This review explores the deep connections between classical MDP algorithms and modern convex optimization methods, highlighting recent advances and proposing new algorithmic approaches inspired by optimization techniques.
Contribution
It classifies MDP algorithms into first-order, second-order, and quasi-Newton categories, linking them to convex optimization methods and suggesting new algorithmic developments.
Findings
Value Iteration linked to first-order methods
Policy Iteration linked to second-order methods
Quasi-Newton methods like Anderson acceleration applied to MDPs
Abstract
In this paper we present a review of the connections between classical algorithms for solving Markov Decision Processes (MDPs) and classical gradient-based algorithms in convex optimization. Some of these connections date as far back as the 1980s, but they have gained momentum in recent years and have led to faster algorithms for solving MDPs. In particular, two of the most popular methods for solving MDPs, Value Iteration and Policy Iteration, can be linked to first-order and second-order methods in convex optimization. In addition, recent results in quasi-Newton methods lead to novel algorithms for MDPs, such as Anderson acceleration. By explicitly classifying algorithms for MDPs as first-order, second-order, and quasi-Newton methods, we hope to provide a better understanding of these algorithms and, by further expanding this analogy, to help develop novel algorithms for MDPs, based…
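The link between the two classical algorithms and first-order and second-order methods can be illustrated with a small sketch. It assumes one common reading of the analogy, which the abstract does not spell out: solving an MDP means finding a root of the residual F(v) = v - T(v), where T is the Bellman optimality operator. Under that reading, Value Iteration takes a unit step along the residual, while Policy Iteration takes a Newton-like step using the matrix I - gamma * P_pi for the current greedy policy. The function names (`bellman`, `value_iteration`, `policy_iteration`) and the random test MDP are hypothetical and chosen only for illustration.

```python
import numpy as np


def bellman(v, P, r, gamma):
    """Apply the Bellman optimality operator T to v.

    P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a); r has shape (S, A).
    Returns T(v) and a greedy policy with respect to v.
    """
    q = r + gamma * np.einsum("ast,t->sa", P, v)  # q[s, a]
    return q.max(axis=1), q.argmax(axis=1)


def value_iteration(P, r, gamma, tol=1e-10, max_iter=10_000):
    """First-order view: v <- v - F(v) with F(v) = v - T(v), i.e. v <- T(v)."""
    v = np.zeros(r.shape[0])
    for _ in range(max_iter):
        Tv, _ = bellman(v, P, r, gamma)
        if np.max(np.abs(Tv - v)) < tol:
            return Tv
        v = Tv
    return v


def policy_iteration(P, r, gamma, max_iter=1_000):
    """Second-order view: v <- v - J^{-1} F(v) with J = I - gamma * P_pi."""
    S = r.shape[0]
    v = np.zeros(S)
    pi = None
    for _ in range(max_iter):
        _, new_pi = bellman(v, P, r, gamma)
        if pi is not None and np.array_equal(new_pi, pi):
            break  # greedy policy is stable, so v is a fixed point of T
        pi = new_pi
        P_pi = P[pi, np.arange(S)]   # row s is P[pi[s], s, :]
        r_pi = r[np.arange(S), pi]   # reward of the greedy action in each state
        # With F(v) = (I - gamma P_pi) v - r_pi, the Newton-like update
        # v - J^{-1} F(v) simplifies to the policy evaluation solve below.
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return v


# Small random MDP (illustrative data only, not taken from the paper).
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions
r = rng.random((S, A))

v_vi = value_iteration(P, r, gamma)
v_pi = policy_iteration(P, r, gamma)
```

Both routines should agree on the optimal value function up to the Value Iteration tolerance. In this sketch the difference lies in the cost profile: each Value Iteration step is cheap but the error contracts only by a factor of gamma, whereas each Policy Iteration step requires a linear solve but typically needs far fewer iterations.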
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs · Optimization and Search Problems
