On the convex formulations of robust Markov decision processes

Julien Grand-Clément; Marek Petrik

arXiv:2209.10187·math.OC·December 14, 2023·1 cites

Open Access

TL;DR

This paper gives the first convex optimization formulation of robust Markov decision processes (RMDPs), under the sa-rectangularity and s-rectangularity assumptions, with a number of variables and constraints polynomial in the number of states and actions.

Contribution

It presents a novel convex formulation of RMDPs using entropic regularization and exponential change of variables, applicable to various uncertainty sets.
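
For context, the nominal MDP formulation that this contribution generalizes is the standard linear program over the value vector. The sketch below uses common textbook notation that does not appear on this page and should be read as an assumption: a discounted infinite-horizon problem with states \mathcal{S}, actions \mathcal{A}, rewards r_{sa}, transition vectors p_{sa}, discount factor \gamma \in [0,1), and sa-rectangular uncertainty sets \mathcal{P}_{sa}. It is not the paper's formulation, only an illustration of why the robust case is harder.

```latex
% Nominal MDP: a linear program in the value vector v.
\min_{v \in \mathbb{R}^{|\mathcal{S}|}} \ \sum_{s \in \mathcal{S}} v_s
\quad \text{s.t.} \quad
v_s \ \ge\ r_{sa} + \gamma\, p_{sa}^{\top} v
\qquad \forall s \in \mathcal{S},\ a \in \mathcal{A}.

% Direct robust analog under sa-rectangularity.
% The right-hand side is a minimum of affine functions of v, hence concave in v,
% so the constraint does not define a convex set in general.
v_s \ \ge\ \min_{p \in \mathcal{P}_{sa}} \bigl( r_{sa} + \gamma\, p^{\top} v \bigr)
\qquad \forall s \in \mathcal{S},\ a \in \mathcal{A}.
```

The entropic regularization and exponential change of variables named above are the paper's route around this nonconvexity. Their exact form is not given on this page, so it is not reproduced here.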

Findings

01. A convex formulation whose number of variables and constraints is polynomial in the number of states and actions.

02. Simplified formulations for polyhedral, ellipsoidal, and entropy-based uncertainty sets.

03. Reformulation of RMDPs as conic programs with exponential, quadratic, and non-negative cones.

Abstract

Robust Markov decision processes (RMDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases,…
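
The abstract notes that value iteration extends directly to RMDPs. A minimal sketch of that extension follows, under assumptions that are not from the paper: a discounted problem, sa-rectangular uncertainty, and each uncertainty set given as a finite list of candidate next-state distributions (for the convex hull of such a list, the worst case of a linear objective is attained at one of the listed points). The function name `robust_value_iteration` and the toy data are hypothetical.

```python
def robust_value_iteration(rewards, kernels, gamma=0.9, tol=1e-10, max_iter=100_000):
    """Robust value iteration for an sa-rectangular RMDP (illustrative sketch).

    rewards[s][a]  : immediate reward for action a in state s
    kernels[s][a]  : list of candidate next-state distributions for (s, a);
                     the adversary picks the worst one independently per (s, a)
    """
    n_states = len(rewards)
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = []
        for s in range(n_states):
            action_values = []
            for a in range(len(rewards[s])):
                # Inner minimization: worst-case expected return over the candidates.
                worst = min(
                    rewards[s][a] + gamma * sum(p[t] * v[t] for t in range(n_states))
                    for p in kernels[s][a]
                )
                action_values.append(worst)
            # Outer maximization: best action against the worst-case transitions.
            new_v.append(max(action_values))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state, two-action example.
rewards = [[1.0, 0.0], [0.0, 2.0]]
nominal = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.1, 0.9]]]
alternative = [[[0.6, 0.4], [0.5, 0.5]], [[0.5, 0.5], [0.4, 0.6]]]

# Uncertainty set per (s, a): the nominal distribution plus one alternative.
ambiguity = [[[nominal[s][a], alternative[s][a]] for a in range(2)] for s in range(2)]
# Singleton sets recover ordinary (non-robust) value iteration.
nominal_only = [[[nominal[s][a]] for a in range(2)] for s in range(2)]

v_robust = robust_value_iteration(rewards, ambiguity)
v_nominal = robust_value_iteration(rewards, nominal_only)
```

Because each uncertainty set here contains the nominal distribution, the robust values can never exceed the nominal ones. This iterative scheme is the baseline that a convex formulation would complement, not the method proposed in the paper.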

Taxonomy

Topics: Process Optimization and Integration