Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity
Runyu Zhang, Yang Hu, Na Li

TL;DR
This paper establishes an equivalence between a new risk-sensitive MDP formulation and soft robust MDPs, derives policy gradient methods with convergence guarantees, and introduces a sample-efficient offline learning algorithm.
Contribution
It introduces a novel risk-sensitive MDP formulation, proves its equivalence with soft robust MDPs, and develops policy gradient and sample complexity results.
Findings
Established equivalence between risk-sensitive MDPs and soft robust MDPs.
Derived policy gradient theorem with convergence guarantees.
Proposed a sample-efficient offline learning algorithm (RFZI).
Abstract
Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both powerful tools for making decisions in the presence of uncertainties. Previous efforts have aimed to establish their connections, revealing equivalences in specific formulations. This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure (Ruszczyński 2010), and establishes its equivalence with a class of soft robust MDP (RMDP) problems, including the standard RMDP as a special case. Leveraging this equivalence, we further derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method under the tabular setting with direct parameterization. This forms a sharp contrast to the Markov risk measure, known to be potentially…
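To make the flavor of a risk-sensitive / soft robust equivalence concrete, the sketch below checks one textbook instance numerically for a single transition step: the entropic risk of a next-state value equals the worst-case expected value when the adversary pays a KL-divergence penalty for deviating from the nominal distribution. This is an illustration under stated assumptions, not the paper's general formulation or its RFZI algorithm; the function names, the parameter `beta`, and the example numbers are hypothetical.

```python
import numpy as np

def entropic_risk(values, probs, beta):
    """Risk-sensitive evaluation: -(1/beta) * log E_P[exp(-beta * V)]."""
    z = -beta * values
    m = z.max()  # shift for numerical stability (log-sum-exp trick)
    return float(-(m + np.log(np.sum(probs * np.exp(z - m)))) / beta)

def penalized_value(values, probs, q, beta):
    """Soft robust objective for a candidate Q: E_Q[V] + (1/beta) * KL(Q || P)."""
    mask = q > 0  # terms with q = 0 contribute nothing to the KL sum
    kl = np.sum(q[mask] * np.log(q[mask] / probs[mask]))
    return float(q @ values + kl / beta)

def soft_robust_value(values, probs, beta):
    """Minimize the penalized objective over Q using the closed-form
    minimizer, which is proportional to P * exp(-beta * V)."""
    z = -beta * values
    w = probs * np.exp(z - z.max())
    q_star = w / w.sum()
    return penalized_value(values, probs, q_star, beta), q_star

if __name__ == "__main__":
    V = np.array([1.0, 0.0, 2.0])   # hypothetical next-state values
    P = np.array([0.5, 0.3, 0.2])   # hypothetical nominal transition probabilities
    beta = 2.0                      # hypothetical risk / penalty parameter
    print("entropic risk    :", entropic_risk(V, P, beta))
    print("soft robust value:", soft_robust_value(V, P, beta)[0])
```

The two printed numbers agree up to floating-point error. As `beta` grows, the KL penalty weakens and the value moves toward the worst-case outcome; as `beta` shrinks toward zero, it approaches the plain expectation under the nominal distribution.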
Taxonomy
Topics: Reinforcement Learning in Robotics
