Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble

Gaon An; Seungyong Moon; Jang-Hyun Kim; Hyun Oh Song

arXiv:2110.01548·cs.LG·October 6, 2021·48 cites

TL;DR

This paper introduces an uncertainty-based offline RL method using Q-ensemble diversification, which outperforms existing methods by effectively penalizing OOD data without requiring explicit data distribution estimation.

Contribution

It proposes a novel ensemble-diversified actor-critic algorithm leveraging clipped Q-learning to improve offline RL performance with fewer networks.

Findings

01. Outperforms existing offline RL methods on D4RL benchmarks

02. Ensemble diversification reduces the number of Q-networks needed tenfold

03. Clipped Q-learning effectively penalizes high-uncertainty OOD data
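
To make the last finding concrete, here is a minimal NumPy sketch of a clipped Q-learning target as it is commonly formulated: the bootstrap value is the minimum over the ensemble's predictions, so actions on which the critics disagree receive a larger implicit penalty. This is an illustration under stated assumptions, not the authors' implementation; the function name `clipped_q_target`, the discount value, and the toy Q-values are all hypothetical.

```python
import numpy as np


def clipped_q_target(reward, done, next_q_ensemble, gamma=0.99):
    """Bellman target using the minimum over an ensemble of Q-estimates.

    reward, done: arrays of shape (batch,)
    next_q_ensemble: array of shape (n_critics, batch) holding each
        critic's estimate of Q(s', a') for the next state-action pair.
    """
    # Pessimistic value: the lowest estimate among the ensemble members.
    min_next_q = next_q_ensemble.min(axis=0)
    # Terminal transitions (done == 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * min_next_q


# Toy values (hypothetical): three critics, two transitions.
# Column 0: critics agree closely. Column 1: critics disagree strongly.
next_q = np.array([
    [1.00, 1.0],
    [1.02, 3.0],
    [0.98, -1.0],
])
reward = np.zeros(2)
done = np.zeros(2)

target = clipped_q_target(reward, done, next_q)

# Gap between the ensemble mean and the minimum: the implicit penalty.
penalty = next_q.mean(axis=0) - next_q.min(axis=0)
```

Both columns have the same ensemble mean, yet the second transition, where the critics disagree, ends up with a much lower target. That gap is the sense in which taking the minimum penalizes high-uncertainty predictions without estimating the data distribution explicitly.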

Abstract

Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset. However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which themselves can be a non-trivial problem. Moreover, these methods under-utilize the generalization ability of deep neural networks and often fall into suboptimal solutions too close to the given dataset. In this work, we propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the…

Code & Models

Videos

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Elevator Systems and Control

Methods: Q-Learning