Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov, Ashvin Nair, Sergey Levine

TL;DR
Implicit Q-Learning (IQL) is an offline RL method that improves the policy without ever evaluating actions outside the dataset. It estimates the value of the best actions implicitly and outperforms prior offline RL methods on D4RL benchmarks.
Contribution
We introduce IQL, a new offline RL algorithm that estimates the value of the best actions implicitly, avoiding direct evaluation of unseen actions and achieving state-of-the-art results.
Findings
IQL outperforms previous offline RL methods on D4RL benchmarks.
IQL effectively fine-tunes with online interaction after offline training.
The learned policy improves substantially over the best behavior in the data through generalization, without ever evaluating actions outside the dataset.
Abstract
Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the…
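The excerpt above is cut off before it names the approximation. As a rough illustration only, the sketch below assumes the asymmetric least-squares (expectile) loss that is commonly associated with IQL. It fits a single scalar value to the Q-values of actions that appear in the data for one state, so no unseen action is ever queried. The names expectile_loss and fit_expectile, the toy q_values array, and the gradient-descent settings are all hypothetical and not taken from the authors' code.

```python
import numpy as np

def expectile_loss(diff, tau):
    """Asymmetric squared loss: weight tau where diff > 0, else (1 - tau)."""
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

def fit_expectile(samples, tau, lr=0.1, steps=2000):
    """Minimize mean expectile_loss(samples - v, tau) over a scalar v
    with plain gradient descent (hypothetical helper, illustration only)."""
    v = float(np.mean(samples))
    for _ in range(steps):
        diff = samples - v
        weight = np.where(diff > 0, tau, 1.0 - tau)
        grad = -2.0 * np.mean(weight * diff)  # derivative of the mean loss w.r.t. v
        v -= lr * grad
    return v

# Hypothetical Q-values of the dataset actions observed in one state.
q_values = np.array([1.0, 2.0, 3.0, 10.0])

v_mean = fit_expectile(q_values, tau=0.5)       # tau = 0.5 recovers the mean
v_high = fit_expectile(q_values, tau=0.9)       # larger tau leans toward the best action
v_near_max = fit_expectile(q_values, tau=0.99)  # approaches, but never exceeds, the max
```

Under these assumptions, raising tau moves the fitted value from the average of the in-dataset Q-values toward their maximum, which is one way to approximate "the value of the best actions" using only actions that appear in the data.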
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Data Stream Mining Techniques
Methods: Implicit Q-Learning
