Primal Wasserstein Imitation Learning
Robert Dadashi, Léonard Hussenot, Matthieu Geist, Olivier Pietquin

TL;DR
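This paper introduces Primal Wasserstein Imitation Learning (PWIL), an imitation learning (IL) method whose reward function is derived offline from the primal form of the Wasserstein distance between the expert and agent state-action distributions. The reward requires little fine-tuning, and the method recovers expert behavior on MuJoCo continuous control tasks in a sample-efficient manner.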
Contribution
PWIL derives its reward function offline from the Wasserstein distance between the expert and agent state-action distributions. This differs from recent adversarial IL methods, which learn a reward function through interactions with the environment.
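To make the idea of a reward computed from demonstrations (rather than learned adversarially) concrete, the following is a minimal sketch, not the authors' implementation. It assumes a greedy coupling in which each agent state-action pair carries mass 1/T and is matched to the nearest expert pairs that still have mass left, and it assumes an exponential mapping from cost to reward. The function names `greedy_coupling_costs` and `costs_to_rewards`, the Euclidean distance, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def greedy_coupling_costs(agent_sa, expert_sa):
    """Per-step transport cost under a greedy coupling (illustrative sketch).

    agent_sa:  array of shape (T, k), agent state-action pairs in visit order.
    expert_sa: array of shape (D, k), expert state-action pairs.
    The summed costs upper-bound the 1-Wasserstein distance between the two
    empirical distributions, because the greedy coupling is feasible but not
    necessarily optimal.
    """
    agent_sa = np.asarray(agent_sa, dtype=float)
    expert_sa = np.asarray(expert_sa, dtype=float)
    T, D = len(agent_sa), len(expert_sa)
    expert_mass = np.full(D, 1.0 / D)  # mass still available on each expert atom
    costs = np.zeros(T)
    for i, x in enumerate(agent_sa):
        remaining = 1.0 / T  # mass carried by this agent atom
        dists = np.linalg.norm(expert_sa - x, axis=1)  # assumed Euclidean metric
        while remaining > 1e-12:
            # Only expert atoms with mass left are candidates.
            available = np.where(expert_mass > 1e-12, dists, np.inf)
            j = int(np.argmin(available))
            if not np.isfinite(available[j]):
                break  # guard against floating-point exhaustion of expert mass
            moved = min(remaining, expert_mass[j])
            costs[i] += moved * dists[j]
            expert_mass[j] -= moved
            remaining -= moved
    return costs


def costs_to_rewards(costs, scale=1.0):
    """Map costs to rewards in (0, 1]; the exponential form is an assumption."""
    return np.exp(-scale * np.asarray(costs, dtype=float))
```

Because the costs depend only on the fixed demonstrations and the pairs the agent has visited so far, no reward network has to be trained; the rewards can be passed to any standard reinforcement learning agent.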
Findings
Recovers expert behavior on a variety of MuJoCo continuous control tasks
Is sample efficient in terms of both agent and expert interactions with the environment
Produces an agent whose behavior matches the expert's as measured by the Wasserstein distance
Abstract
Imitation Learning (IL) methods seek to match the behavior of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL), which ties to the primal form of the Wasserstein distance between the expert and the agent state-action distributions. We present a reward function which is derived offline, as opposed to recent adversarial IL algorithms that learn a reward function through interactions with the environment, and which requires little fine-tuning. We show that we can recover expert behavior on a variety of continuous control tasks of the MuJoCo domain in a sample efficient manner in terms of agent interactions and of expert interactions with the environment. Finally, we show that the behavior of the agent we train matches the behavior of the expert with the Wasserstein…
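For reference, the primal form of the Wasserstein distance mentioned in the abstract can be written for two empirical state-action distributions as below. This is the standard optimal transport definition; the notation (T agent samples, D expert samples, coupling θ, metric d) is chosen here for illustration and may differ from the paper's.

```latex
\mathcal{W}_p^p(\hat{\rho}_\pi, \hat{\rho}_e)
  = \inf_{\theta \in \Theta} \sum_{i=1}^{T} \sum_{j=1}^{D}
    d\big((s_i^\pi, a_i^\pi), (s_j^e, a_j^e)\big)^p \, \theta[i, j],
\qquad
\Theta = \Big\{ \theta \in \mathbb{R}_+^{T \times D} :
  \sum_{j=1}^{D} \theta[i, j] = \tfrac{1}{T}, \;
  \sum_{i=1}^{T} \theta[i, j] = \tfrac{1}{D} \Big\}
```

Each θ[i, j] is the amount of mass moved from the i-th agent sample to the j-th expert sample. The two constraints say that every agent sample ships out exactly its mass 1/T and every expert sample receives exactly its mass 1/D.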
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
Methods: Primal Wasserstein Imitation Learning
