Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

Kaixin Wang; Uri Gadot; Navdeep Kumar; Kfir Levy; Shie Mannor

arXiv:2306.05859 · cs.LG · February 13, 2024 · 1 cites

Open Access

TL;DR

EWoK is an online method that lets existing non-robust RL algorithms learn robust policies in high-dimensional RMDPs by estimating the worst transition kernel. It is evaluated on environments ranging from Cartpole to the DeepMind Control Suite.

Contribution

The paper introduces EWoK, a flexible, scalable approach that enables any off-the-shelf RL algorithm to handle robust MDPs by estimating worst-case transition kernels.

Findings

01. EWoK effectively learns robust policies in high-dimensional environments.

02. EWoK outperforms existing methods in robustness and scalability.

03. EWoK can be integrated with various off-the-shelf RL algorithms.

Abstract

Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations of the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solving RMDPs that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and…
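The idea of "simulating the worst scenarios" while leaving the learner untouched can be illustrated with a small sketch. This is not the paper's implementation: it assumes a simulator that can be sampled several times from the same state-action pair, and it assumes that pessimism is introduced by resampling among candidate next states with weights that favour low estimated value. The names `softmin_weights`, `sample_pessimistic_next_state`, `n_candidates`, and `kappa` are hypothetical and do not appear in the abstract.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, TypeVar

State = TypeVar("State")


def softmin_weights(values: Sequence[float], kappa: float) -> List[float]:
    """Return weights proportional to exp(-v / kappa), normalised to sum to 1.

    Lower values receive larger weights; kappa (hypothetical temperature)
    controls how strongly low-value candidates are favoured.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    v_min = min(values)  # shift by the minimum for numerical stability
    unnorm = [math.exp(-(v - v_min) / kappa) for v in values]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def sample_pessimistic_next_state(
    sample_nominal: Callable[[], State],
    value_fn: Callable[[State], float],
    n_candidates: int = 8,
    kappa: float = 1.0,
    rng: Optional[random.Random] = None,
) -> State:
    """Draw candidate next states from the nominal simulator, then resample
    one of them with probability weighted toward low estimated value.

    Assumption: sample_nominal() returns an independent next-state sample
    for a fixed (state, action) pair; value_fn is the agent's current
    value estimate. Both are placeholders for illustration.
    """
    rng = rng or random.Random()
    candidates = [sample_nominal() for _ in range(n_candidates)]
    weights = softmin_weights([value_fn(s) for s in candidates], kappa)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Under these assumptions, a wrapper around the environment would return the resampled state to whichever RL algorithm is in use, so the learning update itself stays unchanged. Which uncertainty set and reweighting rule the paper actually uses should be checked against the full text.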

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)