ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

Kun Wu; Yinuo Zhao; Zhiyuan Xu; Zhengping Che; Chengxiang Yin; Chi Harold Liu; Feifei Feng; Jian Tang

arXiv:2412.16848·cs.LG·March 18, 2025

TL;DR

ACL-QL introduces an adaptive conservative control mechanism in offline reinforcement learning, allowing nuanced Q-value regulation per state-action pair, leading to improved performance and safety over fixed conservative methods.

Contribution

The paper presents a novel framework and algorithm for adaptively controlling conservative levels in Q-learning, addressing over-conservatism and fixed constraint issues in offline RL.

Findings

01 Achieves state-of-the-art results on D4RL benchmarks.

02 Effectively balances conservatism and performance.

03 Demonstrates the benefits of adaptive conservative control.

Abstract

Offline Reinforcement Learning (RL), which operates solely on static datasets without further interactions with the environment, provides an appealing alternative for learning a safe and promising control policy. Prevailing methods typically learn a conservative policy to mitigate Q-value overestimation, but they are prone to overdoing it, yielding an overly conservative policy. Moreover, they optimize all samples equally with fixed constraints and so cannot control the conservative level in a fine-grained manner, which degrades performance. To address these two challenges in a unified way, we propose a framework, Adaptive Conservative Level in Q-Learning (ACL-QL), which limits the Q-values to a mild range and enables adaptive control of the conservative level over each state-action pair, i.e., lifting the Q-values more…
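To make the contrast between a fixed and a per-pair conservative level concrete, here is a minimal sketch. It is not the paper's loss: the function name `weighted_conservative_penalty`, the weight arguments, and the toy numbers are all hypothetical, and the form is a generic conservative regularizer (push down Q-values on out-of-distribution actions, lift Q-values on dataset actions) with one weight per sample instead of a single global coefficient.

```python
from typing import Sequence


def weighted_conservative_penalty(
    q_ood: Sequence[float],
    q_data: Sequence[float],
    w_ood: Sequence[float],
    w_data: Sequence[float],
) -> float:
    """Hypothetical per-sample weighted conservative penalty.

    q_ood  : Q-values at actions not taken in the dataset (to be pushed down)
    q_data : Q-values at dataset actions (to be lifted)
    w_ood, w_data : one non-negative weight per sample; these stand in for
                    an adaptive conservative level and are an assumption
                    of this sketch, not the paper's definition.
    """
    if len(q_ood) != len(w_ood) or len(q_data) != len(w_data):
        raise ValueError("each Q-value needs exactly one weight")
    if not q_ood or not q_data:
        raise ValueError("inputs must be non-empty")

    # Weighted mean of out-of-distribution Q-values: minimizing this lowers them.
    push_down = sum(w * q for w, q in zip(w_ood, q_ood)) / len(q_ood)
    # Weighted mean of dataset Q-values: subtracting it raises them.
    lift_up = sum(w * q for w, q in zip(w_data, q_data)) / len(q_data)
    return push_down - lift_up


# Toy Q-values (made up for illustration).
q_ood = [3.0, 1.0]
q_data = [2.0, 2.0]

# Fixed level: every sample shares the same coefficient.
alpha = 0.5
fixed = weighted_conservative_penalty(q_ood, q_data, [alpha, alpha], [alpha, alpha])

# Per-sample level: penalize only the first out-of-distribution action.
adaptive = weighted_conservative_penalty(q_ood, q_data, [1.0, 0.0], [0.5, 0.5])

print(fixed, adaptive)
```

With equal weights the function reduces to a fixed-coefficient penalty; with per-sample weights, each state-action pair contributes a different amount of conservatism. How such weights would be produced and constrained is left open here.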

Taxonomy

Methods: Q-Learning