Imagination-Limited Q-Learning for Offline Reinforcement Learning
Wenhui Liu, Zhijian Wu, Jingchao Wang, Dingjiang Huang, Shuigeng Zhou

TL;DR
This paper introduces Imagination-Limited Q-learning (ILQ), a novel offline RL method that balances optimism for out-of-distribution actions with conservative evaluation, achieving state-of-the-art results on D4RL benchmarks.
Contribution
ILQ uses a dynamics model to imagine and clip OOD action-values, maintaining optimism without overestimation, and provides theoretical convergence guarantees.
Findings
Achieves state-of-the-art performance on D4RL benchmarks.
Proves convergence and bounded error in tabular MDPs.
Effectively mitigates bias in OOD value estimates.
Abstract
Offline reinforcement learning seeks to derive improved policies entirely from historical data but often struggles with over-optimistic value estimates for out-of-distribution (OOD) actions. This issue is typically mitigated via policy constraint or conservative value regularization methods. However, these approaches may impose overly restrictive constraints or produce biased value estimates, potentially limiting performance improvements. To balance exploitation and restriction, we propose an Imagination-Limited Q-learning (ILQ) method, which aims to maintain the optimism that OOD actions deserve within appropriate limits. Specifically, we utilize the dynamics model to imagine OOD action-values, and then clip the imagined values with the maximum behavior values. Such a design maintains reasonable evaluation of OOD actions to the furthest extent, while avoiding over-optimism. Theoretically, we prove the…
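The imagine-then-clip step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the one-step form of the imagined value, the function names `imagine_value` and `clipped_imagined_value`, and all numbers are assumptions made for illustration. It only shows the general idea of bounding a model-based estimate for an OOD action by the largest value among actions that appear in the data at the same state.

```python
import numpy as np


def imagine_value(reward_hat, next_state_value_hat, gamma=0.99):
    """One-step imagined action-value from a dynamics model's predictions.

    Assumed form (not taken from the paper): predicted reward plus the
    discounted value of the predicted next state.
    """
    return reward_hat + gamma * next_state_value_hat


def clipped_imagined_value(imagined_value, behavior_values):
    """Clip an imagined OOD action-value by the maximum behavior value.

    `behavior_values` holds Q(s, a) for actions observed in the dataset
    at the same state s (hypothetical inputs).
    """
    limit = np.max(behavior_values)
    return np.minimum(imagined_value, limit)


# Toy numbers, invented purely for illustration.
behavior_values = np.array([1.0, 2.5, 2.0])

# An optimistic imagined value: 1.0 + 0.5 * 4.0 = 3.0, above the limit 2.5.
imagined = imagine_value(reward_hat=1.0, next_state_value_hat=4.0, gamma=0.5)
clipped = clipped_imagined_value(imagined, behavior_values)

# An imagined value below the limit passes through unchanged.
unchanged = clipped_imagined_value(1.5, behavior_values)
```

In this sketch the clip only binds when the imagined value exceeds the best in-data value, so a low imagined estimate is not pushed further down, which matches the stated goal of keeping reasonable optimism while avoiding over-optimism.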
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Q-Learning · Contrastive Language-Image Pre-training
