A Note on Optimization Formulations of Markov Decision Processes
Lexing Ying, Yuhua Zhu

TL;DR
This note reviews optimization formulations of Markov decision processes, covering linear programming, the Bellman equation, and the policy gradient method, for discounted and undiscounted processes under both the standard and the entropy-regularized settings.
Contribution
It summarizes the primal, dual, and primal-dual problems of the linear programming formulation of MDPs and clarifies how they connect to the Bellman equation and the policy gradient method in each setting.
Findings
Unified view of MDP optimization formulations
Connections between linear programming, Bellman equations, and policy gradients
Clarification of formulations in entropy-regularized settings
Abstract
This note summarizes the optimization formulations used in the study of Markov decision processes. We consider both the discounted and undiscounted processes under the standard and the entropy-regularized settings. For each setting, we first summarize the primal, dual, and primal-dual problems of the linear programming formulation. We then detail the connections between these problems and other formulations for Markov decision processes such as the Bellman equation and the policy gradient method.
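The link between the linear programming formulation and the Bellman equation can be checked numerically on a toy problem. The sketch below is not taken from the note. It assumes the reward-maximization convention and the standard discounted LP pair: the primal minimizes sum_s e_s v_s subject to v_s >= r_s^a + gamma * sum_t P^a_{st} v_t for every state s and action a, and the dual maximizes sum_{s,a} r_s^a mu_s^a over mu >= 0 subject to sum_a mu_s^a - gamma * sum_{t,a} P^a_{ts} mu_t^a = e_s. The 2-state, 2-action MDP, the weights E, and all function names are hypothetical choices made for illustration.

```python
# Hypothetical 2-state, 2-action discounted MDP; all numbers are illustrative assumptions.
GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
# P[a][s][t]: probability of moving from state s to state t under action a.
P = [
    [[0.8, 0.2], [0.3, 0.7]],  # action 0
    [[0.1, 0.9], [0.6, 0.4]],  # action 1
]
# R[s][a]: reward for taking action a in state s.
R = [[1.0, 0.0], [0.5, 2.0]]
# E[s]: positive state weights in the primal objective (assumed uniform here).
E = [0.5, 0.5]


def q_value(v, s, a):
    """Right-hand side of the primal constraint for the pair (s, a)."""
    return R[s][a] + GAMMA * sum(P[a][s][t] * v[t] for t in range(N_STATES))


def value_iteration(tol=1e-12, max_iter=100_000):
    """Solve the Bellman equation v_s = max_a q(v, s, a) by fixed-point iteration."""
    v = [0.0] * N_STATES
    for _ in range(max_iter):
        new_v = [max(q_value(v, s, a) for a in range(N_ACTIONS))
                 for s in range(N_STATES)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def greedy_policy(v):
    """Deterministic policy that picks a maximizing action in each state."""
    return [max(range(N_ACTIONS), key=lambda a: q_value(v, s, a))
            for s in range(N_STATES)]


def state_occupancy(policy, n_iter=5000):
    """Iterate d = E + gamma * P_pi^T d; the dual variable is mu[s][a] = d[s]
    when a == policy[s] and 0 otherwise."""
    d = E[:]
    for _ in range(n_iter):
        d = [E[s] + GAMMA * sum(P[policy[t]][t][s] * d[t] for t in range(N_STATES))
             for s in range(N_STATES)]
    return d


v_star = value_iteration()
policy = greedy_policy(v_star)
d = state_occupancy(policy)

# Primal objective at the Bellman solution and dual objective at the
# occupancy measure of the greedy policy.
primal_obj = sum(E[s] * v_star[s] for s in range(N_STATES))
dual_obj = sum(d[s] * R[s][policy[s]] for s in range(N_STATES))

print("v* =", v_star)
print("greedy policy =", policy)
print("primal objective =", primal_obj)
print("dual objective =", dual_obj)
```

On this toy instance the Bellman solution satisfies every primal constraint, with equality at the greedy action, and the two objective values agree up to numerical tolerance, which is the strong-duality relationship the LP formulation relies on. Value iteration is used here only as a convenient solver; an LP solver applied to the primal should return the same v.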
Taxonomy
Topics: Climate Change Policy and Economics · Optimization and Variational Analysis · Economic theories and models
