Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Chenjia Bai; Lingxiao Wang; Zhuoran Yang; Zhihong Deng; Animesh Garg; Peng Liu; Zhaoran Wang

arXiv:2202.11566·cs.LG·February 24, 2022·24 cites

TL;DR

This paper introduces PBRL, a novel offline RL algorithm that uses uncertainty quantification and pessimistic updates based on bootstrapped Q-function disagreement, improving performance without explicit policy constraints.

Contribution

The paper proposes a purely uncertainty-driven offline RL method using bootstrapped Q-function disagreement for pessimistic updates, with a new OOD sampling technique and theoretical guarantees.

Findings

01. PBRL outperforms state-of-the-art algorithms on D4RL benchmarks.

02. The method provides provable uncertainty quantification in linear MDPs.

03. PBRL avoids explicit policy constraints, enabling better generalization.

Abstract

Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by out-of-distribution (OOD) actions. Previous methods tackle this problem by penalizing the Q-values of OOD actions or constraining the trained policy to be close to the behavior policy. Nevertheless, such methods typically prevent the generalization of value functions beyond the offline data and also lack a precise characterization of OOD data. In this paper, we propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints. Specifically, PBRL conducts uncertainty quantification via the disagreement of bootstrapped Q-functions, and performs pessimistic updates by penalizing the…
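The core idea in the abstract, measuring uncertainty as disagreement among bootstrapped Q-functions and penalizing the update by it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of the ensemble standard deviation as the disagreement measure, and the parameters `gamma` (discount) and `beta` (penalty weight) are assumptions made here for illustration only.

```python
from statistics import pstdev


def ensemble_uncertainty(q_values):
    """Disagreement among ensemble Q-estimates for one (state, action) pair.

    Assumption: disagreement is measured as the population standard
    deviation across ensemble members.
    """
    return pstdev(q_values)


def pessimistic_targets(reward, next_q_values, gamma=0.99, beta=1.0):
    """Return one penalized TD target per ensemble member.

    next_q_values holds each member's estimate Q_k(s', a').
    Each target subtracts beta * uncertainty from the member's own
    next-state value before discounting (hypothetical form).
    """
    u = ensemble_uncertainty(next_q_values)
    return [reward + gamma * (q - beta * u) for q in next_q_values]


# When all members agree, the uncertainty is zero and no penalty applies.
agree = pessimistic_targets(1.0, [2.0, 2.0, 2.0], gamma=0.5, beta=1.0)

# When members disagree, every target is pulled down by the same penalty.
disagree = pessimistic_targets(1.0, [1.0, 3.0], gamma=0.5, beta=1.0)
```

In this sketch a larger `beta` makes the update more pessimistic wherever the ensemble disagrees, which is typically where the dataset offers little coverage.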

Code & Models

Repositories

baichenjia/pbrl
PyTorch · Official

Videos

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning