Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

Ilya Kostrikov; Kumar Krishna Agrawal; Debidatta Dwibedi; Sergey Levine; Jonathan Tompson

arXiv:1809.02925·cs.LG·October 16, 2018·67 cites

TL;DR

This paper introduces Discriminator-Actor-Critic, an off-policy reinforcement learning algorithm that reduces sample complexity and eliminates reward bias in adversarial imitation learning, enabling more efficient and generalizable imitation of expert behavior.

Contribution

The paper proposes a novel off-policy algorithm that addresses sample inefficiency and reward bias issues in adversarial imitation learning, improving performance and applicability.

Findings

01. Reduces policy-environment interaction sample complexity by an average factor of 10.

02. Uses a reward function designed to be unbiased, so the algorithm can be applied across many tasks.

03. Demonstrates improved imitation performance in multiple environments.

Abstract

We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can also lead to sub-optimal behavior in others. Secondly, even though these algorithms can learn from few expert demonstrations, they require a prohibitively large number of interactions with the environment in order to imitate the expert for many real-world applications. In order to address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. Furthermore, since our reward function is designed to be unbiased, we can apply our algorithm to many problems without making any…
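The reward bias mentioned in the abstract can be illustrated with a small sketch. It assumes two reward forms commonly used in adversarial imitation learning, -log(1 - D) and log D, where D is the discriminator's probability that a state-action pair came from the expert; the excerpt above does not spell these forms out, and the function names below are hypothetical. The point is only that the sign of each reward is fixed regardless of what the discriminator outputs, which can favor longer or shorter episodes independently of how well the agent imitates.

```python
import math


def sigmoid(x):
    """Map a discriminator logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def reward_positive(logit):
    """Assumed reward form -log(1 - D): positive for every logit."""
    return -math.log(1.0 - sigmoid(logit))


def reward_negative(logit):
    """Assumed reward form log D: negative for every logit."""
    return math.log(sigmoid(logit))


def episode_return(reward_fn, logit, steps):
    """Undiscounted return of an episode with a constant discriminator logit."""
    return sum(reward_fn(logit) for _ in range(steps))


# Rewards across a range of discriminator outputs (hypothetical logits).
logits = [-3.0, -1.0, 0.0, 1.0, 3.0]
pos = [reward_positive(z) for z in logits]
neg = [reward_negative(z) for z in logits]

# With the discriminator held fixed, episode length alone changes the return.
short_pos = episode_return(reward_positive, 0.0, 5)
long_pos = episode_return(reward_positive, 0.0, 10)
short_neg = episode_return(reward_negative, 0.0, 5)
long_neg = episode_return(reward_negative, 0.0, 10)
```

Under these assumptions, the always-positive form rewards simply staying in the episode longer, while the always-negative form rewards ending it sooner. This is one way such an implicit bias could help in some environments and hurt in others.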

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications