Imitation Learning via Off-Policy Distribution Matching

Ilya Kostrikov; Ofir Nachum; Jonathan Tompson

arXiv:1912.05032·cs.LG·December 12, 2019·30 cites

TL;DR

This paper introduces ValueDICE, an off-policy distribution matching method for imitation learning that improves data efficiency and eliminates the need for separate reinforcement learning steps, achieving state-of-the-art results.

Contribution

It transforms the distribution ratio estimation objective into a completely off-policy objective, so that an imitation policy can be learned directly from it without explicit rewards.

Findings

01 Achieves state-of-the-art sample efficiency in benchmarks.

02 Eliminates the need for separate RL optimization.

03 Demonstrates superior performance over existing methods.

Abstract

When performing imitation learning from expert demonstrations, distribution matching is a popular approach, in which one alternates between estimating distribution ratios and then using these ratios as rewards in a standard reinforcement learning (RL) algorithm. Traditionally, estimation of the distribution ratio requires on-policy data, which has caused previous work to either be exorbitantly data-inefficient or alter the original objective in a manner that can drastically change its optimum. In this work, we show how the original distribution ratio estimation objective may be transformed in a principled manner to yield a completely off-policy objective. In addition to the data-efficiency that this provides, we are able to show that this objective also renders the use of a separate RL optimization unnecessary. Rather, an imitation policy may be learned directly from this objective…
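To make "estimating distribution ratios" concrete, here is a minimal sketch on a toy discrete problem. It uses the Donsker-Varadhan representation of the KL divergence, KL(p || q) = max over x of E_p[x] - log E_q[exp(x)], whose maximizer is the log-ratio log(p/q) up to an additive constant. The abstract does not say which divergence or estimator the paper uses, so this choice, the toy distributions `p` and `q`, and the helper names are all assumptions made for illustration and are not the ValueDICE objective itself.

```python
import math
import random


def kl_divergence(p, q):
    """Exact KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def dv_objective(x, p, q):
    """Donsker-Varadhan lower bound on KL(p || q) for a test function x."""
    expect_p = sum(pi * xi for pi, xi in zip(p, x))
    log_expect_q = math.log(sum(qi * math.exp(xi) for qi, xi in zip(q, x)))
    return expect_p - log_expect_q


# Hypothetical toy distributions: p stands in for the policy's
# state-action distribution, q for the expert's.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# The bound is tight when x is the log distribution ratio.
x_star = [math.log(pi / qi) for pi, qi in zip(p, q)]
kl = kl_divergence(p, q)
dv_at_optimum = dv_objective(x_star, p, q)

# Any other test function gives a value no larger than the true KL.
random.seed(0)
dv_best_random = max(
    dv_objective([random.gauss(0.0, 1.0) for _ in p], p, q)
    for _ in range(200)
)

print(f"KL(p || q)          = {kl:.6f}")
print(f"DV bound at log p/q = {dv_at_optimum:.6f}")
print(f"best random DV      = {dv_best_random:.6f}")
```

In this form the first term is an expectation under `p`. If `p` plays the role of the current policy's distribution, estimating that term from samples needs fresh on-policy data, which is the kind of dependence the abstract says the paper's transformation removes.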

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Human Pose and Action Recognition