Imagination-Limited Q-Learning for Offline Reinforcement Learning

Wenhui Liu; Zhijian Wu; Jingchao Wang; Dingjiang Huang; Shuigeng Zhou

arXiv:2505.12211·cs.LG·May 20, 2025

TL;DR

This paper introduces Imagination-Limited Q-learning (ILQ), an offline RL method that keeps value estimates for out-of-distribution (OOD) actions optimistic only within limits set by behavior values, and reports state-of-the-art results on D4RL benchmarks.

Contribution

ILQ uses a dynamics model to imagine and clip OOD action-values, maintaining optimism without overestimation, and provides theoretical convergence guarantees.

Findings

1. Achieves state-of-the-art performance on D4RL benchmarks.
2. Proves convergence and bounded error in tabular MDPs.
3. Effectively mitigates bias in OOD value estimates.

Abstract

Offline reinforcement learning seeks to derive improved policies entirely from historical data but often struggles with over-optimistic value estimates for out-of-distribution (OOD) actions. This issue is typically mitigated via policy constraint or conservative value regularization methods. However, these approaches may impose overly restrictive constraints or yield biased value estimates, potentially limiting performance improvements. To balance exploitation and restriction, we propose an Imagination-Limited Q-learning (ILQ) method, which aims to maintain the optimism that OOD actions deserve within appropriate limits. Specifically, we utilize the dynamics model to imagine OOD action-values, and then clip the imagined values with the maximum behavior values. This design preserves a reasonable evaluation of OOD actions as far as possible while avoiding over-optimism. Theoretically, we prove the…
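
The imagine-then-clip step described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the one-step form of the imagined value (model reward plus discounted maximum next-state value), the function name `clipped_imagined_targets`, and all array names are assumptions made for illustration, and the sketch assumes every state has at least one action in the dataset.

```python
import numpy as np

def clipped_imagined_targets(q, model_reward, model_next_state, in_data, gamma=0.99):
    """Return Q-targets where OOD entries are imagined, then clipped.

    q:                (S, A) float array, current Q-table
    model_reward:     (S, A) float array, rewards predicted by a learned model (hypothetical)
    model_next_state: (S, A) int array, next states predicted by the model (hypothetical)
    in_data:          (S, A) bool array, True where (s, a) appears in the dataset
    """
    # Imagined value: one-step lookahead through the learned dynamics model
    # (assumed form; the abstract does not spell out the formula).
    imagined = model_reward + gamma * q[model_next_state].max(axis=-1)

    # Maximum behavior value: best Q over the actions seen in the data at each state.
    behavior_max = np.where(in_data, q, -np.inf).max(axis=1, keepdims=True)

    # Clip the imagined value so it never exceeds the maximum behavior value.
    limited = np.minimum(imagined, behavior_max)

    # Only OOD entries receive the limited target; in-data entries are left
    # unchanged here (they would normally use the usual Bellman target).
    return np.where(in_data, q, limited)
```

In this sketch an OOD action whose imagined value is high is capped at the best in-data value for that state, while an OOD action whose imagined value is low keeps that lower estimate.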

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Q-Learning · Contrastive Language-Image Pre-training