Mildly Conservative Q-Learning for Offline Reinforcement Learning

Jiafei Lyu; Xiaoteng Ma; Xiu Li; Zongqing Lu

arXiv:2206.04745·cs.LG·February 22, 2024·27 cites

TL;DR

This paper introduces Mildly Conservative Q-learning (MCQ), a novel offline RL method that balances conservatism and generalization, leading to improved performance on D4RL benchmark tasks and better offline-to-online transfer.

Contribution

MCQ actively trains OOD actions with pseudo Q values, providing a theoretically justified approach that enhances offline RL performance without excessive pessimism.
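The central idea, giving OOD actions pseudo Q values rather than only penalizing them, can be illustrated in a toy tabular setting. The sketch below is not the authors' implementation. It assumes deterministic transitions stored in a small `dataset` dict, treats any state-action pair absent from that dict as OOD, and uses a hypothetical margin `delta`; the function name `mild_backup` is likewise illustrative. Logged pairs receive the ordinary Bellman backup, while each OOD action is regressed toward the best logged Q value at the same state minus `delta`, so it can never be valued above the actions the data supports.

```python
# Toy tabular illustration of assigning pseudo Q values to OOD actions.
# Illustrative sketch only, not the MCQ reference code. Assumptions:
# deterministic transitions, OOD = any (s, a) pair missing from `dataset`,
# and a hypothetical margin `delta`.

def mild_backup(q, dataset, gamma=0.9, delta=0.1):
    """One synchronous backup of the table q[s][a].

    dataset maps (state, action) -> (reward, next_state) for logged pairs.
    Logged pairs get the usual Bellman target; unlogged (OOD) pairs get a
    pseudo target equal to the best logged Q value at that state minus delta.
    """
    new_q = [row[:] for row in q]
    for s, row in enumerate(q):
        seen = [a for a in range(len(row)) if (s, a) in dataset]
        best_seen = max(row[a] for a in seen)  # assumes every state has data
        for a in range(len(row)):
            if (s, a) in dataset:
                r, s_next = dataset[(s, a)]
                new_q[s][a] = r + gamma * max(q[s_next])
            else:
                new_q[s][a] = best_seen - delta
    return new_q


# Two states, three actions; action 2 is never logged, nor is action 1 in state 1.
dataset = {(0, 0): (1.0, 1), (0, 1): (0.0, 0), (1, 0): (0.5, 0)}
q = [[0.0] * 3 for _ in range(2)]
for _ in range(500):
    q = mild_backup(q, dataset)

print([round(v, 3) for v in q[0]])  # → [7.632, 6.868, 7.532]
```

At the fixed point, the unlogged action in state 0 sits exactly `delta` below the best logged action instead of inheriting an arbitrary, possibly overestimated value.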

Findings

01. MCQ outperforms prior methods on D4RL benchmarks.
02. MCQ demonstrates superior transfer from offline to online settings.
03. MCQ maintains conservative estimates without overestimating OOD actions.

Abstract

Offline reinforcement learning (RL) is the task of learning from a static logged dataset without continually interacting with the environment. The distribution shift between the learned policy and the behavior policy requires the value function to stay conservative so that out-of-distribution (OOD) actions are not severely overestimated. However, existing approaches, which penalize unseen actions or regularize toward the behavior policy, are too pessimistic: they suppress the generalization of the value function and hinder performance improvement. This paper explores conservatism that is mild but sufficient for offline learning without harming generalization. We propose Mildly Conservative Q-learning (MCQ), where OOD actions are actively trained by assigning them proper pseudo Q values. We theoretically show that MCQ induces a policy that behaves at least as well…
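With function approximation, one way to realize the same idea is to add a second regression term for actions proposed by the learned policy. The sketch below is modeled loosely on the abstract's description and is not confirmed by the text on this page: the weighting `lam`, the sample count `n_samples`, the function name `mild_critic_loss`, and the `behavior_sampler` (a stand-in for some learned model of the behavior policy) are all assumptions. It only evaluates the loss; gradients, target networks, twin critics, and terminal-state handling are omitted.

```python
import random


def mild_critic_loss(q_fn, batch, policy, behavior_sampler,
                     gamma=0.99, lam=0.7, n_samples=10):
    """Hypothetical weighted critic loss (values only, no gradients).

    lam * mean squared TD error on logged (s, a, r, s_next) tuples
    + (1 - lam) * mean squared gap between Q(s, policy(s)) and a pseudo
    target: the best Q among n_samples actions drawn from behavior_sampler.
    """
    td_sum, ood_sum = 0.0, 0.0
    for s, a, r, s_next in batch:
        # Ordinary TD target, bootstrapping with the current policy's action.
        y = r + gamma * q_fn(s_next, policy(s_next))
        td_sum += (q_fn(s, a) - y) ** 2
        # Pseudo target for the (possibly OOD) policy action at s.
        pseudo = max(q_fn(s, behavior_sampler(s)) for _ in range(n_samples))
        ood_sum += (q_fn(s, policy(s)) - pseudo) ** 2
    n = len(batch)
    return lam * td_sum / n + (1 - lam) * ood_sum / n


# Toy usage with a hand-written Q function and 1-D states and actions.
random.seed(0)
loss = mild_critic_loss(
    q_fn=lambda s, a: s * a,
    batch=[(1.0, 0.5, 0.0, 1.0), (0.5, 0.2, 1.0, 1.0)],
    policy=lambda s: 1.0,
    behavior_sampler=lambda s: random.uniform(0.0, 0.5),
)
print(round(loss, 4))
```

Setting `lam` closer to 1 weights the ordinary TD term more heavily; how the original work balances the two terms is not stated on this page.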

Code & Models

Videos

Mildly Conservative Q-Learning for Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Q-Learning