ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning
Kun Wu, Yinuo Zhao, Zhiyuan Xu, Zhengping Che, Chengxiang Yin, Chi Harold Liu, Feifei Feng, Jian Tang

TL;DR
ACL-QL introduces an adaptive conservative control mechanism in offline reinforcement learning, allowing nuanced Q-value regulation per state-action pair, leading to improved performance and safety over fixed conservative methods.
Contribution
The paper presents a novel framework and algorithm for adaptively controlling conservative levels in Q-learning, addressing over-conservatism and fixed constraint issues in offline RL.
Findings
Achieves state-of-the-art results on D4RL benchmarks.
Effectively balances conservatism and performance.
Demonstrates the benefits of adaptive conservative control.
Abstract
Offline Reinforcement Learning (RL), which operates solely on static datasets without further interactions with the environment, provides an appealing alternative to learning a safe and promising control policy. The prevailing methods typically learn a conservative policy to mitigate the problem of Q-value overestimation, but they are prone to overdoing it, leading to an overly conservative policy. Moreover, they optimize all samples equally with fixed constraints, lacking the nuanced ability to control conservative levels in a fine-grained manner. Consequently, this limitation results in a performance decline. To address the above two challenges in a unified way, we propose a framework, Adaptive Conservative Level in Q-Learning (ACL-QL), which limits the Q-values in a mild range and enables adaptive control on the conservative level over each state-action pair, i.e., lifting the Q-values more…
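To make the idea of a per-pair conservative level concrete, here is a minimal sketch of a Q-learning loss in which each sample carries its own penalty weights. It is an illustration under stated assumptions, not the paper's objective. The function name `weighted_conservative_loss`, the weight arguments `w_data` and `w_ood`, and the choice of a squared TD error are all hypothetical. How ACL-QL actually produces its adaptive weights is not described in the text shown here, so the weights are plain inputs.

```python
from typing import Sequence


def weighted_conservative_loss(
    q_data: Sequence[float],     # Q(s, a) for actions taken from the dataset
    q_ood: Sequence[float],      # Q(s, a') for out-of-distribution actions
    td_target: Sequence[float],  # bootstrapped targets for the dataset actions
    w_data: Sequence[float],     # hypothetical per-sample "lift" weights
    w_ood: Sequence[float],      # hypothetical per-sample "push-down" weights
) -> float:
    """Mean of a TD error plus a per-sample weighted conservative term.

    With constant weights this behaves like a fixed conservative penalty;
    letting the weights differ per sample is what makes the level adaptive.
    """
    n = len(q_data)
    total = 0.0
    for qd, qo, y, wd, wo in zip(q_data, q_ood, td_target, w_data, w_ood):
        td_error = 0.5 * (qd - y) ** 2
        # Minimizing +wo * qo pushes the out-of-distribution Q-value down;
        # minimizing -wd * qd lifts the in-dataset Q-value up.
        total += td_error + wo * qo - wd * qd
    return total / n


# All weights zero: the loss reduces to the plain mean TD error.
plain = weighted_conservative_loss(
    [1.0, 2.0], [3.0, 5.0], [1.0, 4.0], [0.0, 0.0], [0.0, 0.0]
)

# Different weights per sample: each pair is regularized to a different degree.
adaptive = weighted_conservative_loss(
    [1.0, 2.0], [3.0, 5.0], [1.0, 4.0], [0.0, 0.5], [1.0, 0.0]
)
```

In this sketch, a method with fixed constraints corresponds to passing the same weight for every sample, while an adaptive method would supply weights that vary per state-action pair.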
Taxonomy
Methods: Q-Learning
