Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

Uri Gadot; Esther Derman; Navdeep Kumar; Maxence Mohamed Elfatihi; Kfir Levy; Shie Mannor

arXiv:2309.01107 · cs.LG · February 13, 2024 · 1 citation

TL;DR

This paper studies robust Markov decision processes with non-rectangular (coupled) reward uncertainty, connects them to regularizing the policy's visitation frequency, and proposes a policy-gradient method that yields less conservative, more practical policies.

Contribution

The paper proposes a new class of reward-robust MDPs with coupled reward uncertainty and develops a policy-gradient algorithm with convergence guarantees.

Findings

01 Learned policies demonstrate increased robustness.

02 Reduced conservativeness compared to traditional methods.

03 Numerical experiments validate theoretical claims.

Abstract

In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance sensitivity to misspecified environments. Yet, to preserve computational tractability, the uncertainty set is traditionally independently structured for each state. This so-called rectangularity condition is solely motivated by computational concerns. As a result, it lacks a practical incentive and may lead to overly conservative behavior. In this work, we study coupled reward RMDPs where the transition kernel is fixed, but the reward function lies within an $\alpha$-radius from a nominal one. We draw a direct connection between this type of non-rectangular reward-RMDPs and applying policy visitation frequency regularization. We introduce a…
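
The connection between an $\alpha$-radius reward set and frequency regularization can be illustrated with a small sketch. It assumes the radius is measured in the ℓ2 norm over all state-action pairs (the abstract does not state which norm is used), and every name and number below is hypothetical. With a fixed kernel, the return is linear in the reward, so by the Cauchy-Schwarz inequality the worst-case return over the ball equals the nominal return minus $\alpha$ times the ℓ2 norm of the discounted visitation frequency. This is a toy check of that identity, not the paper's algorithm.

```python
import math

def occupancy(P, pi, mu, gamma, iters=2000):
    """Discounted state-action visitation frequency d(s, a) of policy pi.

    P[s][a][s2] is the fixed transition kernel, pi[s][a] the policy and
    mu[s] the initial state distribution. The infinite discounted sum is
    truncated after `iters` steps.
    """
    n_s, n_a = len(P), len(P[0])
    state_dist = list(mu)              # Pr(s_t = s), starting at t = 0
    d_state = [0.0] * n_s
    discount = 1.0
    for _ in range(iters):
        for s in range(n_s):
            d_state[s] += discount * state_dist[s]
        nxt = [0.0] * n_s
        for s in range(n_s):
            for a in range(n_a):
                for s2 in range(n_s):
                    nxt[s2] += state_dist[s] * pi[s][a] * P[s][a][s2]
        state_dist = nxt
        discount *= gamma
    return [[d_state[s] * pi[s][a] for a in range(n_a)] for s in range(n_s)]

def nominal_return(d, r):
    """Return of the policy under reward r: the inner product <d, r>."""
    return sum(d[s][a] * r[s][a] for s in range(len(d)) for a in range(len(d[0])))

def l2_norm(d):
    return math.sqrt(sum(x * x for row in d for x in row))

def robust_return(d, r0, alpha):
    """Worst-case return over {r : ||r - r0||_2 <= alpha} (assumed l2 ball)."""
    return nominal_return(d, r0) - alpha * l2_norm(d)

def worst_case_reward(d, r0, alpha):
    """The adversarial reward in the assumed l2 ball: r0 - alpha * d / ||d||_2."""
    norm = l2_norm(d)
    return [[r0[s][a] - alpha * d[s][a] / norm for a in range(len(d[0]))]
            for s in range(len(d))]

# Hypothetical two-state, two-action MDP, made up for illustration only.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
pi = [[0.5, 0.5], [0.3, 0.7]]
mu = [1.0, 0.0]
r0 = [[1.0, 0.0], [0.0, 2.0]]
gamma, alpha = 0.9, 0.1

d = occupancy(P, pi, mu, gamma)
print("nominal return:", nominal_return(d, r0))
print("robust return: ", robust_return(d, r0, alpha))
```

Because the penalty depends on the whole visitation vector rather than on each state separately, the adversary's perturbation is coupled across states, which is what distinguishes this setting from a rectangular one.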

Videos

Solving Non-rectangular Reward-Robust MDPs via Frequency Regularization · Underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Fault Detection and Control Systems