A Note on Optimization Formulations of Markov Decision Processes

Lexing Ying; Yuhua Zhu

arXiv:2012.09417 · math.OC · December 18, 2020

Open Access

TL;DR

This note reviews optimization formulations for Markov decision processes (linear programming, Bellman equations, and policy gradients) for both discounted and undiscounted processes, under the standard and the entropy-regularized settings.

Contribution

For each setting, the note summarizes the primal, dual, and primal-dual problems of the linear programming formulation and details how they connect to the Bellman equation and the policy gradient method.

Findings

01 Unified view of MDP optimization formulations

02 Connections between linear programming, Bellman equations, and policy gradients

03 Clarification of formulations in entropy-regularized settings

Abstract

This note summarizes the optimization formulations used in the study of Markov decision processes. We consider both the discounted and undiscounted processes under the standard and the entropy-regularized settings. For each setting, we first summarize the primal, dual, and primal-dual problems of the linear programming formulation. We then detail the connections between these problems and other formulations for Markov decision processes such as the Bellman equation and the policy gradient method.
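
To make the connection between the linear programming formulation and the Bellman equation concrete, here is a minimal sketch for the discounted, unregularized case. It assumes the common textbook form of the primal LP (minimize sum_s rho(s) v(s) subject to v(s) >= r(s,a) + gamma * sum_t P(t|s,a) v(t)); the note's own notation and normalization may differ. The toy MDP, the weights `RHO`, and all function names are hypothetical and chosen only for illustration. The sketch solves the Bellman equation by value iteration, checks that the solution is feasible for the primal constraints, and compares the primal objective with the dual objective evaluated at the discounted occupancy measure of the greedy policy.

```python
# Toy 2-state, 2-action discounted MDP. All numbers are illustrative assumptions.
N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9
# P[s][a][t]: probability of moving from state s to state t under action a.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
]
# R[s][a]: expected one-step reward for taking action a in state s.
R = [[1.0, 0.0], [0.0, 2.0]]
# RHO[s]: positive state weights in the primal LP objective (assumed here).
RHO = [0.5, 0.5]


def q_values(v):
    """Right-hand side of the primal constraints: r(s,a) + gamma * E[v(next)]."""
    return [
        [R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(N_STATES))
         for a in range(N_ACTIONS)]
        for s in range(N_STATES)
    ]


def value_iteration(tol=1e-12, max_iter=10_000):
    """Iterate the Bellman optimality operator until successive iterates agree."""
    v = [0.0] * N_STATES
    for _ in range(max_iter):
        new_v = [max(row) for row in q_values(v)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def occupancy(policy, sweeps=2000):
    """Discounted state occupancy d solving d = rho + gamma * P_pi^T d,
    computed by fixed-point iteration (converges because gamma < 1)."""
    d = RHO[:]
    for _ in range(sweeps):
        d = [
            RHO[t] + GAMMA * sum(P[s][policy[s]][t] * d[s] for s in range(N_STATES))
            for t in range(N_STATES)
        ]
    return d


v_star = value_iteration()
q_star = q_values(v_star)
# Greedy policy with respect to the Bellman solution.
policy = [max(range(N_ACTIONS), key=lambda a: q_star[s][a]) for s in range(N_STATES)]

# Primal feasibility: v(s) >= r(s,a) + gamma * sum_t P(t|s,a) v(t) for all (s, a).
primal_feasible = all(
    v_star[s] >= q_star[s][a] - 1e-9
    for s in range(N_STATES) for a in range(N_ACTIONS)
)

# Primal objective at v_star, and dual objective at the greedy occupancy measure.
primal_obj = sum(RHO[s] * v_star[s] for s in range(N_STATES))
d = occupancy(policy)
dual_obj = sum(d[s] * R[s][policy[s]] for s in range(N_STATES))

print("v* =", v_star, "policy =", policy)
print("primal feasible:", primal_feasible)
print("primal objective:", primal_obj, "dual objective:", dual_obj)
```

In this sketch the two objectives agree up to numerical tolerance, which is the strong-duality relationship between the value function (primal variable) and the occupancy measure (dual variable). No LP solver is used; the Bellman fixed point stands in for the primal minimizer.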

Taxonomy

Topics: Climate Change Policy and Economics · Optimization and Variational Analysis · Economic theories and models