Conservative Q-Learning for Offline Reinforcement Learning

Aviral Kumar; Aurick Zhou; George Tucker; Sergey Levine

arXiv:2006.04779 · cs.LG · August 20, 2020 · 536 cites

TL;DR

Conservative Q-learning (CQL) is an offline RL method that learns a conservative Q-function to counter the value overestimation caused by distributional shift between the dataset and the learned policy, which improves policy performance, particularly on complex, multi-modal datasets.

Contribution

This paper introduces CQL, an offline RL algorithm whose learned Q-function provably lower-bounds the value of the current policy, together with a practical implementation that outperforms existing methods on various control tasks.

Findings

01

CQL policies often attain 2-5 times higher final return than existing offline RL methods, especially on complex and multi-modal data distributions.

02

CQL effectively reduces overestimation bias in value functions.

03

CQL performs well on both discrete and continuous control domains.

Abstract

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static datasets without further interaction. However, in practice, offline RL presents a major challenge, and standard off-policy RL methods can fail due to overestimation of values induced by the distributional shift between the dataset and the learned policy, especially when training on complex and multi-modal data distributions. In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function such that the expected value of a policy under this Q-function lower-bounds its true value. We theoretically show that CQL produces a lower bound on the value of the current policy and that it…
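The idea of a conservative Q-function can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a single state with discrete actions and uses a log-sum-exp penalty, one variant of the regularizer described in the full paper. The function names, the weight `alpha`, and the toy Q-values are hypothetical and chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, dataset_action):
    """Conservative penalty for one state (illustrative assumption).

    Pushes down a soft maximum of Q over all actions and pushes up
    the Q-value of the action actually present in the dataset.
    The result is always non-negative.
    """
    return logsumexp(q_values) - q_values[dataset_action]


def conservative_loss(q_values, dataset_action, td_target, alpha=1.0):
    """Squared Bellman error plus the weighted conservative penalty."""
    td_error = q_values[dataset_action] - td_target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, dataset_action)


# Toy comparison: action 0 is in the dataset, action 1 is unseen.
inflated = [1.0, 5.0, 0.5]   # unseen action has an overestimated Q-value
modest = [1.0, 1.0, 0.5]     # unseen action is not overestimated
print(cql_penalty(inflated, 0) > cql_penalty(modest, 0))
```

The penalty grows when actions outside the dataset receive large Q-values, so minimizing the combined loss discourages the overestimation that the abstract attributes to distributional shift. Setting `alpha=0.0` in this sketch recovers the plain squared Bellman error.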

Code & Models

Videos

Conservative Q-Learning for Offline Reinforcement Learning · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research

Methods: Random Ensemble Mixture · Dense Connections · Convolution · Deep Q-Network · Q-Learning