Policy Gradient for Robust Markov Decision Processes

Qiuhao Wang; Shaohang Xu; Chin Pang Ho; Marek Petrik

arXiv:2410.22114 · cs.LG · November 1, 2024

TL;DR

This paper introduces Double-Loop Robust Policy Mirror Descent (DRPMD), a policy gradient method for robust MDPs that accounts for model ambiguity and is guaranteed to converge to a globally optimal policy.

Contribution

The paper presents DRPMD, a policy gradient algorithm for robust MDPs, together with a convergence analysis under direct and softmax parameterizations, new parametric transition kernels, and empirical validation.

Findings

01. DRPMD guarantees convergence to a globally optimal policy.

02. Empirical results demonstrate robustness across various settings.

03. New parametric transition kernels extend applicability to continuous spaces.

Abstract

We develop a generic policy gradient method with the global optimality guarantee for robust Markov Decision Processes (MDPs). While policy gradient methods are widely used for solving dynamic decision problems due to their scalable and efficient nature, adapting these methods to account for model ambiguity has been challenging, often making it impractical to learn robust policies. This paper introduces a novel policy gradient method, Double-Loop Robust Policy Mirror Descent (DRPMD), for solving robust MDPs. DRPMD employs a general mirror descent update rule for the policy optimization with adaptive tolerance per iteration, guaranteeing convergence to a globally optimal policy. We provide a comprehensive analysis of DRPMD, including new convergence results under both direct and softmax parameterizations, and provide novel insights into the inner problem solution through Transition Mirror…
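The double-loop structure described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: it assumes a cost-minimizing agent, a finite (s,a)-rectangular ambiguity set given as `K` candidate kernels, a KL mirror map for the policy step, and a tolerance schedule of `1/(t+1)^2`. The inner loop here is a plain fixed-point robust policy evaluation, swapped in for simplicity in place of the paper's inner solver. All function and variable names are hypothetical.

```python
import numpy as np

def worst_case_q(pi, cost, candidates, gamma, tol):
    """Inner loop (assumed form): robust evaluation of a fixed policy.

    candidates has shape (K, S, A, S); for every (s, a) the adversary
    picks the candidate row that maximizes the expected next-state cost.
    """
    v = np.zeros(cost.shape[0])
    while True:
        next_v = candidates @ v                  # shape (K, S, A)
        q = cost + gamma * next_v.max(axis=0)    # worst case per (s, a)
        v_new = (pi * q).sum(axis=1)
        if np.max(np.abs(v_new - v)) < tol:      # stop at the given tolerance
            return q, v_new
        v = v_new

def robust_policy_mirror_descent(cost, candidates, gamma=0.9, eta=1.0, iters=200):
    """Outer loop: KL mirror descent on the policy against worst-case costs."""
    S, A = cost.shape
    log_pi = np.full((S, A), -np.log(A))         # uniform initial policy
    for t in range(iters):
        tol = 1.0 / (t + 1) ** 2                 # assumed shrinking tolerance
        pi = np.exp(log_pi)
        q, _ = worst_case_q(pi, cost, candidates, gamma, tol)
        # KL mirror step: pi_new(a|s) proportional to pi(a|s) * exp(-eta * q(s, a))
        log_pi = log_pi - eta * q
        log_pi -= log_pi.max(axis=1, keepdims=True)
        log_pi -= np.log(np.exp(log_pi).sum(axis=1, keepdims=True))
    return np.exp(log_pi)

# Toy instance: transitions do not depend on the action, action 0 is free.
rng = np.random.default_rng(0)
S, A, K = 2, 2, 3
base = rng.dirichlet(np.ones(S), size=(K, S))            # shape (K, S, S)
candidates = np.repeat(base[:, :, None, :], A, axis=2)   # shape (K, S, A, S)
cost = np.array([[0.0, 1.0], [0.0, 1.0]])
pi = robust_policy_mirror_descent(cost, candidates)
print(pi)
```

In this toy instance the transitions are identical for both actions, so the worst-case cost-to-go differs between actions only by the immediate cost, and the policy should concentrate on action 0 in every state.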

Code & Models

Repositories

JerrisonWang/JMLR-DRPMD (PyTorch, official)

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Softmax