Offline Reinforcement Learning with Implicit Q-Learning

Ilya Kostrikov; Ashvin Nair; Sergey Levine

arXiv:2110.06169 · cs.LG · October 13, 2021 · 129 cites

TL;DR

Implicit Q-Learning (IQL) is an offline RL method that improves the policy without evaluating actions absent from the dataset: it implicitly estimates the value of the best actions the data supports, and it outperforms prior methods on the D4RL benchmarks.

Contribution

We introduce IQL, a new offline RL algorithm that implicitly estimates the value of the best actions supported by the dataset, never directly evaluating unseen actions, and achieves state-of-the-art results on the D4RL benchmarks.
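
The "implicit" value estimate in IQL is based on expectile regression, an asymmetric squared loss. With tau = 0.5 it recovers the mean; as tau approaches 1 it approaches the maximum of the values it is fitted to. The toy sketch below was written for this page and is not the authors' code. It fits a single scalar V to a fixed list of Q-values for the actions that appear in the dataset at one state. The names `expectile_weight`, `expectile_loss` and `fit_expectile`, the learning rate, the step count and the sample values are all illustrative assumptions.

```python
from typing import Sequence


def expectile_weight(diff: float, tau: float) -> float:
    """Weight |tau - 1(diff < 0)| applied to diff = target - estimate."""
    return tau if diff >= 0 else 1.0 - tau


def expectile_loss(diff: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2."""
    return expectile_weight(diff, tau) * diff * diff


def fit_expectile(samples: Sequence[float], tau: float, lr: float = 0.1, steps: int = 2000) -> float:
    """Gradient descent on mean(expectile_loss(x - v, tau)) over the scalar v."""
    v = sum(samples) / len(samples)  # the mean is the exact solution when tau = 0.5
    for _ in range(steps):
        # d/dv of weight * (x - v)**2 is -2 * weight * (x - v)
        grad = sum(-2.0 * expectile_weight(x - v, tau) * (x - v) for x in samples) / len(samples)
        v -= lr * grad
    return v


# Hypothetical Q-values of the dataset actions observed at a single state.
dataset_q = [1.0, 2.0, 3.0, 4.0, 5.0]
for tau in (0.5, 0.9, 0.99):
    print(tau, round(fit_expectile(dataset_q, tau), 3))
```

For these values the fit is 3.0 at tau = 0.5, about 4.23 at tau = 0.9 and about 4.90 at tau = 0.99. The estimate climbs toward the best in-dataset value using only values that were actually observed, with no maximization over unseen actions.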

Findings

01

IQL outperforms previous offline RL methods on D4RL benchmarks.

02

IQL can be fine-tuned effectively with online interaction after offline training.

03

Through generalization, the learned policy improves substantially over the best behavior in the data without explicitly evaluating out-of-dataset actions.

Abstract

Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy, and therefore need to either constrain these actions to be in-distribution, or else regularize their values. We propose an offline RL method that never needs to evaluate actions outside of the dataset, but still enables the learned policy to improve substantially over the best behavior in the data through generalization. The main insight in our work is that, instead of evaluating unseen actions from the latest policy, we can approximate the…
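
The abstract's central claim is that the policy can improve without ever querying the value of an action outside the dataset. The simplified tabular sketch below illustrates that idea. It was written for this page and is not the authors' implementation. It computes three per-batch quantities as we understand IQL to use them:

1. An expectile loss that pulls V(s) toward an upper expectile of Q(s, a) over dataset actions.
2. A TD loss whose target is r + gamma * V(s') rather than a maximum over next actions.
3. Advantage-weighted regression (AWR) weights exp(beta * (Q - V)) for extracting the policy.

Target networks, function approximation and the optimizer are omitted. The dictionary-based Q and V tables, the transition tuple layout, the name `iql_losses`, the default values of tau, gamma and beta, and the cap of 100 on the weights are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done).
Transition = Tuple[str, str, float, str, bool]


def expectile(diff: float, tau: float) -> float:
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2."""
    return (tau if diff >= 0 else 1.0 - tau) * diff * diff


def iql_losses(
    batch: List[Transition],
    q: Dict[Tuple[str, str], float],
    v: Dict[str, float],
    tau: float = 0.7,
    gamma: float = 0.99,
    beta: float = 3.0,
    max_weight: float = 100.0,
) -> Tuple[float, float, List[float]]:
    """Return (value_loss, q_loss, awr_weights) for one batch of dataset transitions.

    q is only ever indexed with (s, a) pairs taken from the batch, so no
    out-of-dataset action is evaluated anywhere.
    """
    n = len(batch)
    # V(s) is pulled toward an upper expectile of Q(s, a) over dataset actions.
    value_loss = sum(expectile(q[(s, a)] - v[s], tau) for s, a, _, _, _ in batch) / n
    # Q(s, a) regresses toward r + gamma * V(s'); no max over next actions is needed.
    q_loss = sum(
        (r + gamma * (0.0 if done else v[s_next]) - q[(s, a)]) ** 2
        for s, a, r, s_next, done in batch
    ) / n
    # AWR weights exp(beta * (Q - V)), capped at max_weight (an assumed stabilizer).
    # A policy would then maximize sum_i w_i * log pi(a_i | s_i) over dataset actions.
    weights = [min(math.exp(beta * (q[(s, a)] - v[s])), max_weight) for s, a, _, _, _ in batch]
    return value_loss, q_loss, weights


q_table = {("s0", "a0"): 1.0, ("s0", "a1"): 2.0, ("s1", "a0"): 0.5}
v_table = {"s0": 1.5, "s1": 0.5}
batch = [("s0", "a0", 0.0, "s1", False), ("s0", "a1", 1.0, "s1", True)]
print(iql_losses(batch, q_table, v_table))
```

In this toy batch, action a1 has a Q-value above V(s0) and receives weight exp(1.5), about 4.48. Action a0 receives exp(-1.5), about 0.22. Policy extraction therefore favors the better action that is already in the data, and at no point does the computation need a sample from the current policy.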

Code & Models

Videos

Offline Reinforcement Learning with Implicit Q-Learning · slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Data Stream Mining Techniques

Methods: Implicit Q-Learning