Robust Reinforcement Learning for Continuous Control with Model Misspecification

Daniel J. Mankowitz; Nir Levine; Rae Jeong; Yuanyuan Shi; Jackie Kay; Abbas Abdolmaleki; Jost Tobias Springenberg; Timothy Mann; Todd Hester; Martin Riedmiller

arXiv:1906.07516 · cs.LG · February 12, 2020 · 38 cites

Open Access

TL;DR

This paper introduces a framework for making continuous control RL algorithms robust to perturbations in the transition dynamics (model misspecification). Policies are trained to optimize a worst-case expected return, and the resulting robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations.

Contribution

It integrates robust and soft-robust policy optimization into MPO, derives a corresponding entropy-regularized Bellman operator for each objective, and validates the improved robustness on a range of control tasks.

Findings

01. Robust policies outperform non-robust ones in MuJoCo domains with environment perturbations.

02. Enhanced robustness is demonstrated on a high-dimensional robotic hand.

03. The framework is adaptable to other RL algorithms and to learning from offline data.

Abstract

We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst case expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both, robust and soft-robust policies, outperform their non-robust counterparts in nine Mujoco domains with environment perturbations. In addition, we show improved robust performance on a high-dimensional,…
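To make the difference between the two objectives concrete, here is a minimal tabular sketch of one policy-evaluation backup under each. It is an illustration only, not the paper's operators: it omits the entropy regularization entirely, assumes a finite uncertainty set of K transition models, and takes the worst case independently per state-action pair. The function names, array shapes, and the `weights` distribution over models are all assumptions introduced here.

```python
import numpy as np


def robust_backup(V, reward, models, policy, gamma=0.9):
    """One worst-case backup (illustrative; no entropy term).

    V:      (S,)         current value estimate
    reward: (S, A)       immediate rewards
    models: (K, S, A, S) hypothetical finite set of transition models
    policy: (S, A)       action probabilities per state
    """
    next_values = models @ V                # (K, S, A): E[V(s')] under each model
    worst_case = next_values.min(axis=0)    # (S, A): pessimistic over the set
    q = reward + gamma * worst_case
    return (policy * q).sum(axis=1)         # (S,)


def soft_robust_backup(V, reward, models, policy, weights, gamma=0.9):
    """One soft-robust backup: average over models instead of taking the min.

    weights: (K,) assumed probability distribution over the models.
    """
    next_values = models @ V                                # (K, S, A)
    averaged = np.tensordot(weights, next_values, axes=1)   # (S, A)
    q = reward + gamma * averaged
    return (policy * q).sum(axis=1)                         # (S,)


if __name__ == "__main__":
    # Toy problem: 2 states, 1 action, 2 candidate transition models.
    reward = np.array([[0.0], [1.0]])
    models = np.zeros((2, 2, 1, 2))
    models[0, :, 0, 1] = 1.0   # model 0 always moves to state 1
    models[1, :, 0, 0] = 1.0   # model 1 always moves to state 0
    policy = np.ones((2, 1))
    V = np.array([0.0, 10.0])
    print(robust_backup(V, reward, models, policy, gamma=0.5))
    print(soft_robust_backup(V, reward, models, policy,
                             np.array([0.5, 0.5]), gamma=0.5))
```

Because a minimum never exceeds a weighted average of the same values, the robust backup is never larger than the soft-robust one for the same inputs, which is the sense in which the soft-robust objective is less conservative.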

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning