Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song

TL;DR
This paper introduces an uncertainty-based offline RL method built on a diversified Q-ensemble. It outperforms existing methods by penalizing out-of-distribution (OOD) data according to the uncertainty of its Q-value estimates, without requiring explicit estimation of the data distribution.
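A toy comparison shows how an ensemble can penalize OOD data without modelling the data distribution. This is a minimal sketch, not the authors' code. It assumes the penalty comes from taking the minimum over N critics' Q-value predictions (clipped Q-learning), and the Q-values are made-up numbers. Both actions have the same ensemble mean, but the critics disagree on the second one, so its clipped value is much lower.

```python
import numpy as np

def clipped_q(q_ensemble: np.ndarray) -> np.ndarray:
    """Pessimistic estimate: minimum over the ensemble axis (axis 0).

    q_ensemble has shape (N, num_actions): each row holds one critic's
    Q-value predictions for the candidate actions at a single state.
    """
    return q_ensemble.min(axis=0)

# Hypothetical predictions from N = 4 critics for two candidate actions.
# Both columns have mean 1.0; the critics agree on action 0
# (in-distribution-like) and disagree on action 1 (OOD-like).
q = np.array([
    [1.0, 2.0],
    [1.1, 0.0],
    [0.9, 1.5],
    [1.0, 0.5],
])

mean = q.mean(axis=0)             # equal for both actions
std = q.std(axis=0)               # small for action 0, large for action 1
lower = clipped_q(q)              # minimum per action: 0.9 and 0.0
best = int(lower.argmax())        # clipped estimate selects action 0
print("mean:", mean, "std:", std.round(3), "clipped:", lower, "best:", best)
```

The mean alone cannot separate the two actions. The clipped estimate prefers the action the critics agree on, and it does so without any model of which actions appear in the dataset.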
Contribution
The paper proposes an ensemble-diversified actor-critic algorithm that builds on clipped Q-learning and improves offline RL performance while requiring fewer Q-networks than an undiversified ensemble.
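A clipped Q-learning target for an ensemble actor-critic might look like the hedged sketch below. It assumes a SAC-style entropy-regularized target. The function names, `gamma`, `alpha`, and array shapes are illustrative choices, because the source does not specify these details. Every critic regresses toward the same target, which is built from the smallest of the N target-critic predictions at the next state.

```python
import numpy as np

def clipped_soft_target(
    rewards: np.ndarray,      # (B,)
    dones: np.ndarray,        # (B,), 1.0 at terminal transitions, else 0.0
    next_q: np.ndarray,       # (N, B): target critics' Q(s', a'), a' ~ pi(.|s')
    next_log_pi: np.ndarray,  # (B,): log pi(a' | s')
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> np.ndarray:
    """Soft Bellman target using the minimum over an N-critic ensemble.

    Assumption: SAC-style entropy regularization with a fixed alpha.
    """
    min_next_q = next_q.min(axis=0)                # clipping: most pessimistic critic
    soft_value = min_next_q - alpha * next_log_pi  # entropy bonus
    return rewards + gamma * (1.0 - dones) * soft_value

def critic_loss(q_pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared TD error; q_pred has shape (N, B), one row per critic."""
    return float(np.mean((q_pred - target[None, :]) ** 2))

# Toy batch with B = 2 transitions and N = 3 critics.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q = np.array([[2.0, 5.0],
                   [3.0, 4.0],
                   [1.0, 6.0]])
next_log_pi = np.array([-1.0, -1.0])

target = clipped_soft_target(rewards, dones, next_q, next_log_pi, gamma=0.5, alpha=0.1)
loss = critic_loss(np.zeros((3, 2)), target)
print(target, loss)
```

Taking the minimum makes the target more pessimistic as N grows, which is where the penalty on uncertain actions comes from. The terminal transition's target reduces to its reward.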
Findings
Outperforms existing offline RL methods on D4RL benchmarks
Ensemble diversification reduces the number of required ensemble networks to one tenth
Clipped Q-learning effectively penalizes high-uncertainty OOD data
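The findings credit ensemble diversification with cutting the number of critics needed. The summary does not give the exact form of the diversification term. The sketch below assumes it penalizes alignment (pairwise cosine similarity) between the critics' gradients with respect to the action. This is an illustrative reading, not a confirmed reproduction of the paper's loss. The quadratic toy critics and every function name are hypothetical. In practice the action-gradients would come from automatic differentiation.

```python
import numpy as np

def gradient_diversity_penalty(action_grads: np.ndarray, eps: float = 1e-8) -> float:
    """Mean pairwise cosine similarity between critics' action-gradients.

    action_grads has shape (N, B, A): entry [j, b] is the gradient of
    critic j's Q-value with respect to the action of sample b.
    Sums over ordered pairs i != j, divides by N - 1, averages over B.
    """
    n = action_grads.shape[0]
    unit = action_grads / (np.linalg.norm(action_grads, axis=-1, keepdims=True) + eps)
    sims = np.einsum("ibk,jbk->bij", unit, unit)          # (B, N, N) cosine matrix
    off_diag = sims.sum(axis=(1, 2)) - np.trace(sims, axis1=1, axis2=2)
    return float(np.mean(off_diag / (n - 1)))

def quadratic_critic_grads(actions: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Hypothetical toy critics Q_j(a) = -||a - c_j||^2, so dQ_j/da = -2 (a - c_j)."""
    return -2.0 * (actions[None, :, :] - centers[:, None, :])   # (N, B, A)

actions = np.array([[0.0, 0.0], [1.0, 1.0]])     # B = 2 samples, A = 2 action dims
aligned = np.array([[1.0, 0.0], [1.0, 0.0]])     # two identical critics
diverse = np.array([[1.0, 0.0], [-1.0, 0.0]])    # two critics with different optima

p_aligned = gradient_diversity_penalty(quadratic_critic_grads(actions, aligned))
p_diverse = gradient_diversity_penalty(quadratic_critic_grads(actions, diverse))
print(p_aligned, p_diverse)   # identical critics give the maximum value, about 2.0 for N = 2
```

Under this reading, training would add a weighted copy of the penalty to the critic loss. Minimizing it pushes the critics' action-gradients apart, so their predictions diverge faster away from the data.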
Abstract
Offline reinforcement learning (offline RL), which aims to find an optimal policy from a previously collected static dataset, bears algorithmic difficulties due to function approximation errors from out-of-distribution (OOD) data points. To this end, offline RL algorithms adopt either a constraint or a penalty term that explicitly guides the policy to stay close to the given dataset. However, prior methods typically require accurate estimation of the behavior policy or sampling from OOD data points, which themselves can be a non-trivial problem. Moreover, these methods under-utilize the generalization ability of deep neural networks and often fall into suboptimal solutions too close to the given dataset. In this work, we propose an uncertainty-based offline RL method that takes into account the confidence of the Q-value prediction and does not require any estimation or sampling of the…
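The abstract says the method accounts for the confidence of the Q-value prediction. Suppose the critics' predictions at a state-action pair are roughly Gaussian with mean m and standard deviation σ. In that case the ensemble minimum behaves like m minus a multiple of σ, so low-confidence (high-σ) Q-values are penalized more. The sketch below computes that multiple with a standard order-statistics approximation (Blom's formula with α = π/8) and checks it against a Monte Carlo estimate. The Gaussian assumption and the function names are ours, not taken from the source.

```python
import math
import random
from statistics import NormalDist

def min_penalty_coefficient(n: int) -> float:
    """Approximate k_N in E[min of N draws] ~= mean - k_N * std.

    Blom-style order-statistic approximation with alpha = pi/8, assuming
    the N ensemble predictions are i.i.d. Gaussian around their mean.
    """
    alpha = math.pi / 8
    return NormalDist().inv_cdf((n - alpha) / (n - 2 * alpha + 1))

def mc_expected_min(n: int, trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[min of n standard normal draws]."""
    rng = random.Random(seed)
    total = sum(min(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials))
    return total / trials

for n in (1, 2, 10, 50):
    # k_1 is 0 (a single critic gets no penalty); k_N grows with N.
    print(n, round(min_penalty_coefficient(n), 3), round(-mc_expected_min(n), 3))
```

Because k_N grows with the ensemble size, a larger N yields a stronger uncertainty penalty from the same clipped target.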
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Elevator Systems and Control
Methods: Q-Learning
