Robust Reinforcement Learning for Continuous Control with Model Misspecification
Daniel J. Mankowitz, Nir Levine, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller

TL;DR
This paper introduces a framework for making continuous control RL algorithms robust to perturbations in the transition dynamics (model misspecification). Policies are trained to optimize a worst-case expected return and outperform their non-robust counterparts in nine Mujoco domains with environment perturbations.
Contribution
It incorporates robust and soft-robust objectives into Maximum a-posteriori Policy Optimization (MPO), derives the corresponding entropy-regularized Bellman operators, and validates the resulting robustness on Mujoco domains and a high-dimensional robotic hand.
Findings
Robust and soft-robust policies outperform their non-robust counterparts in nine Mujoco domains with environment perturbations.
Enhanced robustness demonstrated on a high-dimensional robotic hand.
Framework adaptable to other RL algorithms and offline data learning.
Abstract
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst case expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both, robust and soft-robust policies, outperform their non-robust counterparts in nine Mujoco domains with environment perturbations. In addition, we show improved robust performance on a high-dimensional,…
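The difference between the robust and soft-robust objectives described in the abstract can be illustrated with a small tabular sketch. This is not the paper's MPO-based method: it omits entropy regularization and function approximation, and it assumes a finite, hypothetical uncertainty set of transition models. The function name `backup_value_iteration`, the helper `make_model`, and the two-state toy problem are all invented for illustration. The robust backup takes the minimum expected next-state value over the models; the soft-robust backup takes a weighted average.

```python
def backup_value_iteration(models, rewards, gamma=0.9, weights=None, iters=300):
    """Tabular value iteration over a finite set of transition models.

    models:  list of models; models[k][s][a] is a list of next-state probabilities
    rewards: rewards[s][a]
    weights: None  -> robust backup (minimum over models)
             list  -> soft-robust backup (weighted average over models)
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                # Expected next-state value under each candidate model.
                nxt = [sum(p * v for p, v in zip(P[s][a], V)) for P in models]
                if weights is None:
                    agg = min(nxt)  # worst case over the uncertainty set
                else:
                    agg = sum(w * x for w, x in zip(weights, nxt))
                Q[s][a] = rewards[s][a] + gamma * agg
        V = [max(q) for q in Q]  # synchronous update after all backups
    policy = [q.index(max(q)) for q in Q]
    return V, policy


def make_model(p):
    """Hypothetical 2-state model: the risky action reaches state 1 with probability p."""
    return [
        [[1.0, 0.0], [1.0 - p, p]],  # state 0: action 0 (safe) stays, action 1 is risky
        [[0.0, 1.0], [0.0, 1.0]],    # state 1: absorbing
    ]


models = [make_model(0.9), make_model(0.1)]  # nominal and perturbed dynamics
rewards = [[0.6, 0.0], [1.0, 1.0]]

V_rob, pi_rob = backup_value_iteration(models, rewards)
V_soft, pi_soft = backup_value_iteration(models, rewards, weights=[0.5, 0.5])
V_nom, pi_nom = backup_value_iteration(models[:1], rewards)

print("robust:     ", pi_rob, V_rob)
print("soft-robust:", pi_soft, V_soft)
print("nominal:    ", pi_nom, V_nom)
```

In this toy problem the robust policy picks the safe action in state 0, while the soft-robust and nominal policies pick the risky one. That matches the abstract's description of the soft-robust objective as less conservative.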
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
