Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

Mingfei Sun; Sam Devlin; Katja Hofmann; Shimon Whiteson

arXiv:2112.06054 · cs.LG · April 14, 2022

TL;DR

This paper introduces D2-Imitation, an off-policy imitation learning method that improves sample efficiency and requires no adversarial training, building instead on deterministic policies and temporal difference (TD) learning.

Contribution

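The paper presents an off-policy imitation approach that avoids adversarial training and min-max optimization. The approach rests on the similarity between the Bellman equation and the stationary state-action distribution equation, which yields a TD learning method, and it uses a deterministic policy to simplify that TD learning.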

Findings

1. D2-Imitation outperforms existing off-policy imitation methods in control tasks.
2. The approach achieves higher sample efficiency without adversarial training.
3. Empirical results validate the effectiveness of the proposed method.

Abstract

Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, even though these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation…
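
To make insight (1) concrete, the two equations can be written side by side in their standard textbook forms. This is a sketch for orientation only: the notation is an assumption and may differ from the paper's ($\mu_0$ is the initial state distribution, $P$ the transition kernel, $\gamma$ the discount factor, $d^\pi$ the discounted state-action distribution of policy $\pi$).

```latex
% Bellman equation for the action-value function of policy \pi
% (recursion runs forward: from (s, a) to the next pair (s', a')):
Q^\pi(s, a) = r(s, a)
  + \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')}
    \left[ Q^\pi(s', a') \right]

% Stationary (discounted) state-action distribution equation
% (recursion runs backward: from (s, a) to the preceding pair (s', a')):
d^\pi(s, a) = (1 - \gamma) \, \mu_0(s) \, \pi(a \mid s)
  + \gamma \, \pi(a \mid s) \sum_{s', a'} P(s \mid s', a') \, d^\pi(s', a')
```

Both are linear fixed-point equations with a "base" term plus a $\gamma$-discounted recursive term, which is the structural similarity the abstract refers to. For a deterministic policy, $\pi(a \mid s)$ collapses to an indicator that $a$ equals the policy's action at $s$, which is presumably what simplifies the resulting TD update; the paper itself should be consulted for the exact formulation.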

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

mingfeisun/d2-imitation · tf · Official

Videos

Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Robot Manipulation and Learning