Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning
Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang

TL;DR
This paper introduces PBRL, a novel offline RL algorithm that uses uncertainty quantification and pessimistic updates based on bootstrapped Q-function disagreement, improving performance without explicit policy constraints.
Contribution
The paper proposes a purely uncertainty-driven offline RL method using bootstrapped Q-function disagreement for pessimistic updates, with a new OOD sampling technique and theoretical guarantees.
Findings
The authors report that PBRL outperforms state-of-the-art offline RL algorithms on the D4RL benchmarks.
The method provides provable uncertainty quantification in linear MDPs.
PBRL avoids explicit policy constraints, enabling better generalization.
Abstract
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by out-of-distribution (OOD) actions. Previous methods tackle this problem by penalizing the Q-values of OOD actions or constraining the trained policy to be close to the behavior policy. Nevertheless, such methods typically prevent the generalization of value functions beyond the offline data and also lack a precise characterization of OOD data. In this paper, we propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints. Specifically, PBRL conducts uncertainty quantification via the disagreement of bootstrapped Q-functions, and performs pessimistic updates by penalizing the…
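The idea of measuring uncertainty as disagreement among bootstrapped Q-functions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract is cut off before it states exactly what is penalized, so subtracting a scaled standard deviation from each ensemble member's next-state value is an assumption made here for illustration. The function names and the parameters `gamma` (discount) and `beta` (penalty weight) are likewise hypothetical.

```python
import numpy as np


def ensemble_uncertainty(q_values):
    """Disagreement across ensemble members.

    q_values has shape (K, B): K bootstrapped Q-functions evaluated on a
    batch of B state-action pairs. Returns the per-sample standard
    deviation across the K members, shape (B,).
    """
    return q_values.std(axis=0)


def pessimistic_targets(rewards, next_q_values, gamma=0.99, beta=1.0):
    """TD targets with an uncertainty penalty (assumed form, see lead-in).

    rewards: shape (B,)
    next_q_values: shape (K, B), each member's estimate at the next step
    gamma, beta: hypothetical discount factor and penalty weight
    Returns targets of shape (K, B), one row per ensemble member.
    """
    penalty = beta * ensemble_uncertainty(next_q_values)  # shape (B,)
    return rewards[None, :] + gamma * (next_q_values - penalty[None, :])


# Toy batch: 3 ensemble members, 2 transitions.
# Members agree on the first transition and disagree on the second.
next_q = np.array([[1.0, 2.0],
                   [1.0, 4.0],
                   [1.0, 6.0]])
rewards = np.zeros(2)
targets = pessimistic_targets(rewards, next_q, gamma=1.0, beta=1.0)
```

In this toy batch the first transition is left unpenalized because all members agree, while the second has every member's target lowered by the same disagreement term. That is the qualitative behavior an uncertainty-driven penalty is meant to have on OOD inputs.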
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
