Policy Gradient Algorithms for Robust MDPs with Non-Rectangular Uncertainty Sets

Mengmeng Li; Daniel Kuhn; Tobias Sutter

arXiv:2305.19004 · math.OC · September 30, 2025 · 2 cites

TL;DR

This paper introduces new policy gradient algorithms for robust MDPs with non-rectangular uncertainty sets, providing the first complete solution scheme with global guarantees and demonstrating favorable numerical performance.

Contribution

It develops the first comprehensive algorithms for robust MDPs with non-rectangular uncertainty sets, including evaluation, policy gradient, and actor-critic methods with theoretical guarantees.

Findings

01 Algorithms outperform state-of-the-art methods in experiments

02 Proposed methods provide global optimality guarantees

03 Approximation error scales with non-rectangularity measure

Abstract

We propose policy gradient algorithms for robust infinite-horizon Markov decision processes (MDPs) with non-rectangular uncertainty sets, thereby addressing an open challenge in the robust MDP literature. Indeed, uncertainty sets that display statistical optimality properties and make optimal use of limited data often fail to be rectangular. Unfortunately, the corresponding robust MDPs cannot be solved with dynamic programming techniques and are in fact provably intractable. We first present a randomized projected Langevin dynamics algorithm that solves the robust policy evaluation problem to global optimality but is inefficient. We also propose a deterministic policy gradient method that is efficient but solves the robust policy evaluation problem only approximately, and we prove that the approximation error scales with a new measure of non-rectangularity of the uncertainty set.…
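To make the idea of a randomized projected Langevin scheme for robust policy evaluation more concrete, the following is a minimal sketch on a toy problem. It is not the authors' algorithm. It assumes a fixed policy on a two-state chain, a one-parameter uncertainty set {(1 - theta) P0 + theta P1 : theta in [0, 1]} in which both rows of the transition matrix are coupled through the same theta (so the set is not a product of per-state sets), and a worst case defined as the maximum expected discounted cost. All names (`discounted_cost`, `projected_langevin`, `worst_case_objective`, `P0`, `P1`) and all numerical values are hypothetical choices for illustration.

```python
import numpy as np

def discounted_cost(P, c, rho, gamma):
    """Expected discounted cost rho^T (I - gamma P)^{-1} c for a fixed policy."""
    n = P.shape[0]
    v = np.linalg.solve(np.eye(n) - gamma * P, c)  # value vector of the policy
    return float(rho @ v)

def projected_langevin(f, lo=0.0, hi=1.0, theta0=0.5,
                       step=0.01, beta=50.0, iters=2000, seed=0):
    """Generic projected Langevin ascent on the interval [lo, hi].

    Each iteration takes a gradient step, adds Gaussian noise scaled by
    sqrt(2 * step / beta), and projects back onto [lo, hi] by clipping.
    The best iterate seen so far is returned.
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    best_theta, best_val = theta, f(theta)
    h = 1e-5  # finite-difference half-width (assumed, for simplicity)
    for _ in range(iters):
        a, b = max(theta - h, lo), min(theta + h, hi)
        grad = (f(b) - f(a)) / (b - a)
        noise = np.sqrt(2.0 * step / beta) * rng.standard_normal()
        theta = float(np.clip(theta + step * grad + noise, lo, hi))
        val = f(theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

# Hypothetical toy data: two extreme kernels, cost only in state 2.
P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
P1 = np.array([[0.5, 0.5], [0.1, 0.9]])
c = np.array([0.0, 1.0])
rho = np.array([1.0, 0.0])  # initial distribution
gamma = 0.9

def worst_case_objective(theta):
    # Both rows depend on the same theta, which couples the states.
    return discounted_cost((1.0 - theta) * P0 + theta * P1, c, rho, gamma)

best_theta, best_val = projected_langevin(worst_case_objective)
print(best_theta, best_val)
```

In this toy case the projection is a simple clip onto an interval. For a general non-rectangular set the projection step would be a more involved convex or nonconvex subproblem, and the noise level and step size would need to be tuned accordingly.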

Taxonomy

Topics: Risk and Portfolio Optimization · Electric Power System Optimization · Auction Theory and Applications