On the convex formulations of robust Markov decision processes
Julien Grand-Clément, Marek Petrik

TL;DR
This paper introduces the first convex optimization formulation of robust Markov decision processes (RMDPs), derived under the classical sa-rectangularity and s-rectangularity assumptions.
Contribution
The paper presents a convex formulation of RMDPs, obtained through entropic regularization and an exponential change of variables, and simplifies it for polyhedral, ellipsoidal, and entropy-based uncertainty sets.
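For intuition about what entropic regularization does, the sketch below shows the standard log-sum-exp smoothing of a maximum. It is a generic illustration rather than the paper's formulation; the function name `soft_max_value` and the parameter `beta` are assumptions introduced here.

```python
import math

def soft_max_value(values, beta):
    """Log-sum-exp smoothing of max(values); beta > 0 controls the tightness."""
    m = max(values)  # shift by the maximum for numerical stability
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

values = [1.0, 2.0, 3.0]
for beta in (1.0, 10.0, 100.0):
    smooth = soft_max_value(values, beta)
    # The smoothed value always lies in [max(values), max(values) + log(n) / beta].
    gap_bound = math.log(len(values)) / beta
    print(f"beta={beta}: smooth={smooth:.6f}, gap bound={gap_bound:.6f}")
```

The gap between the smoothed value and the true maximum is at most log(n)/beta, so a larger `beta` gives a tighter approximation.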
Findings
A convex formulation whose number of variables and constraints is polynomial in the number of states and actions, although the constraints have large coefficients.
Simplified formulations for polyhedral, ellipsoidal, and entropy-based uncertainty sets.
Reformulation of RMDPs as conic programs with exponential, quadratic, and non-negative cones.
Abstract
Robust Markov decision processes (RMDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and exponential change of variables, we derive a convex formulation with a number of variables and constraints polynomial in the number of states and actions, but with large coefficients in the constraints. We further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or entropy-based uncertainty sets, showing that, in these cases,…
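The abstract notes that value iteration extends directly to RMDPs. A minimal sketch of that extension is given below, assuming an sa-rectangular uncertainty set made of a finite list of candidate transition models; this finite-set assumption, the function name `robust_value_iteration`, and the toy numbers are all hypothetical and not taken from the paper.

```python
def robust_value_iteration(rewards, kernels, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration with a finite, sa-rectangular uncertainty set.

    rewards[s][a]       : immediate reward for action a in state s
    kernels[k][s][a][t] : probability of moving from s to t under a in model k
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    v = [0.0] * n_states

    def q_value(s, a, values):
        # The adversary picks the worst candidate model independently per (s, a).
        worst = min(
            sum(p * v_t for p, v_t in zip(model[s][a], values)) for model in kernels
        )
        return rewards[s][a] + gamma * worst

    for _ in range(max_iter):
        v_new = [max(q_value(s, a, v) for a in range(n_actions)) for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(v_new, v)) < tol
        v = v_new
        if converged:
            break

    # Greedy policy with respect to the final robust value estimate.
    policy = [max(range(n_actions), key=lambda a: q_value(s, a, v)) for s in range(n_states)]
    return v, policy

# Toy problem (hypothetical numbers): 2 states, 2 actions, 2 candidate models.
rewards = [[1.0, 0.0], [0.0, 2.0]]
nominal = [[[0.9, 0.1], [0.5, 0.5]], [[0.5, 0.5], [0.1, 0.9]]]
perturbed = [[[0.6, 0.4], [0.5, 0.5]], [[0.5, 0.5], [0.4, 0.6]]]

v_robust, policy = robust_value_iteration(rewards, [nominal, perturbed])
v_nominal, _ = robust_value_iteration(rewards, [nominal])
print("robust values:", v_robust, "policy:", policy)
print("nominal values:", v_nominal)
```

Because the adversary can only lower the expected next-state value, the robust values never exceed the nominal ones.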
Taxonomy
Topics: Process Optimization and Integration
