Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity

Runyu Zhang; Yang Hu; Na Li

arXiv:2306.11626 · math.OC · May 27, 2024 · 1 citation

Open Access · 1 repository

TL;DR

This paper establishes an equivalence between a new risk-sensitive MDP formulation and soft robust MDPs, derives policy gradient methods with convergence guarantees, and introduces a sample-efficient offline learning algorithm.

Contribution

It introduces a novel risk-sensitive MDP formulation, proves its equivalence with soft robust MDPs, and develops policy gradient and sample complexity results.

Findings

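01. Established an equivalence between the paper's new risk-sensitive MDP formulation and a class of soft robust MDPs.
02. Derived a policy gradient theorem, with global convergence guarantees for the exact policy gradient method (tabular setting, direct parameterization).
03. Proposed a sample-efficient offline learning algorithm (RFZI).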
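
A small numerical sketch of the kind of equivalence described in finding 01, using one standard special case that is an assumption here rather than the paper's general construction: for a single step with a KL-divergence penalty scaled by 1/β around a nominal next-state distribution p0, the soft robust inner problem, min over p of E_p[V] + (1/β) KL(p || p0), has the closed-form minimizer p*(s') ∝ p0(s') exp(-β V(s')), and its optimal value equals the entropic risk -(1/β) log E_p0[exp(-β V)]. The helper names below (entropic_risk, worst_case_distribution, penalized_objective, random_simplex_point) are hypothetical and exist only for this illustration.

```python
import math
import random

def entropic_risk(values, p0, beta):
    """-(1/beta) * log E_{p0}[exp(-beta V)], shifted by min(V) for numerical stability."""
    v_min = min(values)
    z = sum(q * math.exp(-beta * (v - v_min)) for q, v in zip(p0, values))
    return v_min - math.log(z) / beta

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def penalized_objective(p, values, p0, beta):
    """Soft robust inner objective (assumed KL penalty): E_p[V] + (1/beta) * KL(p || p0)."""
    return sum(pi * v for pi, v in zip(p, values)) + kl(p, p0) / beta

def worst_case_distribution(values, p0, beta):
    """Closed-form minimizer: p*(s') proportional to p0(s') * exp(-beta * V(s'))."""
    v_min = min(values)
    w = [q * math.exp(-beta * (v - v_min)) for q, v in zip(p0, values)]
    total = sum(w)
    return [wi / total for wi in w]

def random_simplex_point(n, rng):
    """Uniform sample from the probability simplex via normalized exponentials."""
    x = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(x)
    return [xi / s for xi in x]

rng = random.Random(0)
p0 = random_simplex_point(5, rng)          # hypothetical nominal distribution
V = [rng.gauss(0.0, 1.0) for _ in range(5)]  # hypothetical next-state values
beta = 2.0

p_star = worst_case_distribution(V, p0, beta)
soft_robust = penalized_objective(p_star, V, p0, beta)
print(soft_robust, entropic_risk(V, p0, beta))  # the two numbers should agree up to rounding
```

Random points in the simplex should never give a smaller penalized objective than p*, and the entropic risk sits below the nominal expectation E_p0[V], which is the risk-averse direction.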

Abstract

Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both powerful tools for making decisions in the presence of uncertainties. Previous efforts have aimed to establish their connections, revealing equivalences in specific formulations. This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure (Ruszczyński 2010), and establishes its equivalence with a class of soft robust MDP (RMDP) problems, including the standard RMDP as a special case. Leveraging this equivalence, we further derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method under the tabular setting with direct parameterization. This forms a sharp contrast to the Markov risk measure, known to be potentially…
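
The abstract's policy gradient results are stated for the tabular setting. As a hedged sketch of what evaluating a fixed policy in a tabular soft robust MDP can look like, the code below assumes the same KL penalty as a per-step special case, so the Bellman update becomes V(s) = Σ_a π(a|s) [r(s,a) + γ ρ_β(V; P0(·|s,a))], with ρ_β the entropic risk. This per-step form, the toy MDP, and all function names are assumptions for illustration; it is policy evaluation only, not the paper's policy gradient method or RFZI.

```python
import math

def entropic_risk(values, p0, beta):
    """-(1/beta) * log E_{p0}[exp(-beta V)], shifted by min(V) for numerical stability."""
    v_min = min(values)
    z = sum(q * math.exp(-beta * (v - v_min)) for q, v in zip(p0, values))
    return v_min - math.log(z) / beta

def expectation(values, p0):
    """Nominal (risk-neutral) next-state value E_{p0}[V]."""
    return sum(q * v for q, v in zip(p0, values))

def policy_evaluation(P0, R, pi, gamma, risk_map, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration V(s) = sum_a pi[s][a] * (R[s][a] + gamma * risk_map(V, P0[s][a])).

    P0[s][a]: nominal next-state distribution, R[s][a]: reward, pi[s][a]: action probability.
    If risk_map is monotone and translation-invariant (the expectation and the
    entropic risk both are), the update is a gamma-contraction in the sup norm.
    """
    n_states = len(P0)
    V = [0.0] * n_states
    for _ in range(max_iter):
        V_new = [
            sum(pi[s][a] * (R[s][a] + gamma * risk_map(V, P0[s][a]))
                for a in range(len(pi[s])))
            for s in range(n_states)
        ]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V

# Hypothetical 2-state, 2-action MDP used only for illustration.
P0 = [
    [[0.9, 0.1], [0.5, 0.5]],  # state 0: next-state distributions for actions 0 and 1
    [[0.2, 0.8], [0.6, 0.4]],  # state 1
]
R = [[1.0, 0.5], [0.0, 2.0]]
pi = [[0.5, 0.5], [0.3, 0.7]]
gamma = 0.9

V_nominal = policy_evaluation(P0, R, pi, gamma, expectation)
V_soft = policy_evaluation(P0, R, pi, gamma, lambda v, p: entropic_risk(v, p, beta=1.0))
print(V_nominal, V_soft)
```

Because ρ_β never exceeds the nominal expectation and is monotone, the soft robust values come out no larger than the nominal ones, and as β shrinks toward 0 they approach the nominal values.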

Code & Models

Repositories

huyangsh/risk-sensitive-RL_ICRL-2024 (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics