A Bayesian Approach to Robust Reinforcement Learning

Esther Derman; Daniel Mankowitz; Timothy Mann; Shie Mannor

arXiv:1905.08188·cs.LG·July 25, 2019·20 cites

Open Access

TL;DR

This paper introduces a Bayesian method for robust reinforcement learning using the Uncertainty Robust Bellman Equation, enabling safer exploration, faster adaptation to changing dynamics, and improved robustness in uncertain environments.

Contribution

It proposes the URBE framework and the DQN-URBE algorithm, which advance robust RL by learning the uncertainty set adaptively from observations and by scaling the approach to higher-dimensional domains.

Findings

01. URBE encourages safe exploration and adapts uncertainty sets to new data.

02. DQN-URBE scales to high-dimensional domains.

03. The method outperforms fixed-uncertainty approaches in adaptability and robustness.

Abstract

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE) which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE…
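To make the worst-case idea in the abstract concrete, here is a minimal sketch of a generic robust Bellman backup. It is not the paper's URBE or DQN-URBE. It assumes a small tabular problem, a finite uncertainty set given as `K` candidate transition models, and a worst case chosen independently for each state-action pair. The function name `robust_value_iteration` and the arguments `P_set`, `R`, and `gamma` are hypothetical names chosen for illustration.

```python
import numpy as np


def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite set of transition models.

    P_set: array of shape (K, S, A, S); P_set[k, s, a] is a distribution
           over next states under candidate model k (illustrative assumption).
    R:     array of shape (S, A) with known rewards.
    Returns the robust value function V (shape (S,)) and a greedy policy.
    """
    P_set = np.asarray(P_set, dtype=float)
    R = np.asarray(R, dtype=float)
    n_states = R.shape[0]
    V = np.zeros(n_states)

    def backup(values):
        # Expected next-state value under every candidate model: (K, S, A).
        expected = P_set @ values
        # Worst case over models, taken separately for each (s, a) pair.
        return R + gamma * expected.min(axis=0)

    for _ in range(max_iter):
        V_new = backup(V).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    Q = backup(V)
    return V, Q.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three random candidate models for a 4-state, 2-action problem.
    P = rng.random((3, 4, 2, 4))
    P /= P.sum(axis=-1, keepdims=True)
    R = rng.random((4, 2))
    V, policy = robust_value_iteration(P, R)
    print("robust values:", V)
    print("robust policy:", policy)
```

Because the backup takes a minimum over models, the resulting values are never larger than those obtained by planning under any single candidate model. That is the conservatism the abstract trades off against performance. In this sketch the uncertainty set is fixed, whereas the paper adapts it to new observations.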

Taxonomy

Topics: Reinforcement Learning in Robotics · Control Systems and Identification · Advanced Control Systems Optimization