Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Yue Wu, Shuangfei Zhai, Nitish Srivastava, Joshua Susskind, Jian Zhang, Ruslan Salakhutdinov, Hanlin Goh

arXiv:2105.08140 · cs.LG · May 19, 2021 · 20 cites


TL;DR

This paper introduces UWAC, an offline RL algorithm that handles uncertainty by down-weighting the contribution of out-of-distribution (OOD) state-action pairs to its training objectives, which improves stability and performance across a range of tasks.

Contribution

The paper proposes a novel uncertainty-aware actor-critic method that uses dropout-based uncertainty estimation to improve the stability and performance of offline RL.
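Dropout-based uncertainty estimation is commonly done with Monte Carlo (MC) dropout: dropout stays active at inference time, the Q-network is evaluated several times on the same input, and the spread of the outputs serves as the uncertainty estimate. The sketch below illustrates that idea under stated assumptions; it is not the authors' code. The toy NumPy Q-network (`q_forward`, `mc_dropout_q`, `W1`, `W2`), its layer sizes, the dropout rate, and the sample count are all hypothetical, and the network is randomly initialized rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy Q-network: one ReLU hidden layer with random (untrained) weights.
STATE_DIM, ACTION_DIM, HIDDEN = 3, 2, 64
W1 = rng.normal(scale=0.5, size=(STATE_DIM + ACTION_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.5, size=(HIDDEN, 1))


def q_forward(state, action, drop_p, rng):
    """One stochastic forward pass with dropout left on (MC dropout)."""
    x = np.concatenate([state, action], axis=-1)      # (batch, STATE_DIM + ACTION_DIM)
    h = np.maximum(0.0, x @ W1 + b1)                  # (batch, HIDDEN)
    # Inverted dropout: drop each unit with prob. drop_p, rescale the survivors.
    mask = rng.random(h.shape) >= drop_p
    h = h * mask / (1.0 - drop_p)
    return (h @ W2)[..., 0]                           # (batch,)


def mc_dropout_q(state, action, rng, n_samples=100, drop_p=0.1):
    """Mean and variance of Q(s, a) across n_samples independent dropout masks."""
    samples = np.stack(
        [q_forward(state, action, drop_p, rng) for _ in range(n_samples)]
    )                                                 # (n_samples, batch)
    return samples.mean(axis=0), samples.var(axis=0)


states = rng.normal(size=(4, STATE_DIM))
actions = rng.normal(size=(4, ACTION_DIM))
q_mean, q_var = mc_dropout_q(states, actions, np.random.default_rng(1))
```

With a trained critic, state-action pairs resembling the dataset would be expected to give low variance, while OOD pairs tend to give higher variance; that variance is the signal an uncertainty-weighted method can use to down-weight them.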

Findings

1. UWAC improves training stability.
2. UWAC outperforms existing offline RL methods.
3. UWAC achieves significant gains on datasets with sparse human demonstrations.

Abstract

Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key missing ingredient from the existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly. Implementation-wise, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC out-performs…
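To make "down-weights their contribution in the training objectives" concrete, here is a minimal sketch of an uncertainty-weighted critic (TD) loss. It assumes inverse-variance weights of the form β / Var[Q(s', a')], clipped at a maximum so near-zero variance cannot blow up the loss; the exact weighting and constants used by UWAC may differ. The function names and the parameters `beta` and `max_weight` are hypothetical, and `next_q_mean` / `next_q_var` are assumed to be MC-dropout statistics of the target Q-function at (s', a') with a' drawn from the current policy.

```python
import numpy as np


def uncertainty_weights(q_var, beta=1.0, max_weight=2.0, eps=1e-6):
    """Inverse-variance weights: high-variance (likely OOD) pairs get small weights."""
    # Assumed form beta / Var, clipped at max_weight; eps avoids division by zero.
    return np.minimum(beta / (q_var + eps), max_weight)


def weighted_td_loss(q_pred, reward, done, next_q_mean, next_q_var,
                     gamma=0.99, beta=1.0, max_weight=2.0):
    """Mean squared TD error, with each transition scaled by its target's uncertainty weight."""
    # Standard one-step bootstrapped target; terminal transitions (done=1) do not bootstrap.
    target = reward + gamma * (1.0 - done) * next_q_mean
    w = uncertainty_weights(next_q_var, beta, max_weight)
    return float(np.mean(w * (q_pred - target) ** 2))


# Two transitions with identical TD errors but very different target uncertainty.
q_pred = np.array([1.0, 1.0])
reward = np.zeros(2)
done = np.ones(2)
next_q_mean = np.zeros(2)
loss = weighted_td_loss(q_pred, reward, done, next_q_mean, np.array([0.5, 100.0]))
```

In this example the high-variance transition contributes roughly 200 times less to the loss than the low-variance one, so bootstrapping from uncertain (OOD) targets has little influence on the critic update. The same weights could also scale a policy (actor) loss; that part is omitted here.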


Videos

Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Neural dynamics and brain function

Methods: Q-Learning