Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency
Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson

TL;DR
This paper introduces D2-Imitation, an off-policy imitation learning method that improves sample efficiency without adversarial training by combining a deterministic policy with temporal difference (TD) learning.
Contribution
The paper presents an off-policy imitation approach that requires no adversarial training or min-max optimization. It uses the similarity between the Bellman equation and the stationary state-action distribution equation to derive a TD learning method, and it uses a deterministic policy to simplify that TD learning.
Findings
D2-Imitation outperforms existing off-policy imitation methods in control tasks.
The approach achieves higher sample efficiency without adversarial training.
Empirical results validate the effectiveness of the proposed method.
Abstract
Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, even though these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy, sample-efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation…
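The structural similarity named in insight (1) can be illustrated on a toy tabular problem. The sketch below is not the D2-Imitation algorithm; it only shows, under assumed numbers, that the Bellman equation for Q and the recursion for the discounted stationary state-action distribution are both fixed-point equations that can be solved by the same kind of iteration, one propagating values backward from successors and the other propagating probability mass forward from predecessors. The MDP (P, R, P0), the deterministic policy PI, and all function names are hypothetical and chosen purely for illustration.

```python
# Toy 2-state, 2-action MDP. All numbers are hypothetical.
GAMMA = 0.9
N_S, N_A = 2, 2
P = [  # P[s][a][s2] = probability of landing in s2 after taking a in s
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]
R = [[1.0, 0.0], [0.0, 2.0]]  # R[s][a] = reward
P0 = [1.0, 0.0]               # initial state distribution
PI = [0, 1]                   # deterministic policy: PI[s] = action


def bellman_q(n_iters=2000):
    """Iterate Q(s,a) = R(s,a) + gamma * sum_s2 P(s2|s,a) * Q(s2, PI(s2))."""
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(n_iters):
        q = [
            [
                R[s][a]
                + GAMMA * sum(P[s][a][s2] * q[s2][PI[s2]] for s2 in range(N_S))
                for a in range(N_A)
            ]
            for s in range(N_S)
        ]
    return q


def occupancy(n_iters=2000):
    """Iterate rho(s,a) = 1[a = PI(s)] * ((1 - gamma) * P0(s)
    + gamma * sum_{sp,ap} P(s|sp,ap) * rho(sp,ap))."""
    rho = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(n_iters):
        new = [[0.0] * N_A for _ in range(N_S)]
        for s in range(N_S):
            inflow = sum(
                P[sp][ap][s] * rho[sp][ap]
                for sp in range(N_S)
                for ap in range(N_A)
            )
            # A deterministic policy puts all mass on the single action PI[s].
            new[s][PI[s]] = (1 - GAMMA) * P0[s] + GAMMA * inflow
        rho = new
    return rho


q = bellman_q()
rho = occupancy()

# Both fixed points describe the same discounted return:
# sum_{s,a} rho(s,a) R(s,a) = (1 - gamma) * sum_s P0(s) Q(s, PI(s)).
lhs = sum(rho[s][a] * R[s][a] for s in range(N_S) for a in range(N_A))
rhs = (1 - GAMMA) * sum(P0[s] * q[s][PI[s]] for s in range(N_S))
total_mass = sum(rho[s][a] for s in range(N_S) for a in range(N_A))
print(lhs, rhs, total_mass)
```

In this sketch the deterministic policy collapses the sum over next actions to a single term in both recursions, which gives a rough sense of why insight (2) simplifies the TD update.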
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Robot Manipulation and Learning
