PIQL: Projective Implicit Q-Learning with Support Constraint for Offline Reinforcement Learning
Xinchen Han, Hossam Afifi, Michel Marot

TL;DR
PIQL adds a support constraint to implicit Q-learning and replaces its fixed expectile hyperparameter with a projection-based parameter, reaching state-of-the-art results on the D4RL and NeoRL2 benchmarks.
Contribution
The paper proposes Projective IQL (PIQL), an offline RL method that extends IQL with a multi-step, projection-based value estimate in policy evaluation and a support constraint in policy improvement, aiming for better adaptability and performance.
Findings
Achieves state-of-the-art results on D4RL and NeoRL2 benchmarks.
Demonstrates robust performance across diverse offline RL domains.
Guarantees monotonic policy improvement with theoretical support.
Abstract
Offline Reinforcement Learning (RL) faces a fundamental challenge of extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL) employs expectile regression to achieve in-sample learning. Nevertheless, IQL relies on a fixed expectile hyperparameter and a density-based policy improvement method, both of which impede its adaptability and performance. In this paper, we propose Projective IQL (PIQL), a projective variant of IQL enhanced with a support constraint. In the policy evaluation stage, PIQL substitutes the fixed expectile hyperparameter with a projection-based parameter and extends the one-step value estimation to a multi-step formulation. In the policy improvement stage, PIQL adopts a support constraint instead of a density constraint, ensuring closer alignment with the policy evaluation. Theoretically, we demonstrate that PIQL maintains the…
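For readers unfamiliar with the expectile regression that IQL relies on, the following is a minimal sketch of the standard asymmetric squared loss with a fixed expectile hyperparameter `tau`. It assumes NumPy, and the function names `expectile_loss` and `fit_expectile` are illustrative, not taken from the paper. It shows only the fixed-`tau` baseline that PIQL replaces; the projection-based parameter itself is not specified in this summary and is not reproduced here.

```python
import numpy as np


def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric squared loss for expectile regression.

    With u = q - v, positive residuals are weighted by tau and
    non-positive residuals by (1 - tau). tau = 0.5 recovers half the
    mean squared error; tau closer to 1 pushes v toward the upper
    end of the in-sample Q-values.
    """
    u = np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)
    weight = np.where(u > 0, tau, 1.0 - tau)
    return float(np.mean(weight * u ** 2))


def fit_expectile(samples, tau, steps=2000, lr=0.05):
    """Find the scalar v that minimizes the expectile loss over samples.

    Plain gradient descent on a single scalar; a hypothetical helper
    used only to illustrate how tau shifts the estimate.
    """
    samples = np.asarray(samples, dtype=float)
    v = float(samples.mean())
    for _ in range(steps):
        u = samples - v
        weight = np.where(u > 0, tau, 1.0 - tau)
        grad = -2.0 * np.mean(weight * u)  # d/dv of mean(weight * u^2)
        v -= lr * grad
    return v


if __name__ == "__main__":
    q_samples = [0.0, 1.0, 2.0, 3.0, 10.0]
    for tau in (0.5, 0.7, 0.9):
        print(tau, fit_expectile(q_samples, tau))
```

With `tau = 0.5` the fitted value is the sample mean, and larger `tau` moves it toward the largest sample without exceeding it. This is why a single fixed `tau` acts as a trade-off knob in IQL, which the abstract identifies as a limitation.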
Taxonomy
Topics: Elevator Systems and Control · Reinforcement Learning in Robotics
Methods: Q-Learning · Implicit Q-Learning
