Imitation Learning from Observation through Optimal Transport
Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek

TL;DR
This paper introduces a simplified optimal transport-based method for Imitation Learning from Observation that effectively imitates expert behavior using only observational data, without learned models or adversarial training.
Contribution
It presents a model-free approach to ILfO that derives rewards from the Wasserstein distance, is compatible with any RL algorithm, and outperforms prior ILfO methods on continuous control tasks.
Findings
Achieves expert-level performance from a single observed expert trajectory
Outperforms existing ILfO methods in various tasks
Simplifies reward generation without adversarial training
Abstract
Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains even when…
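To make the idea of a transport-based reward concrete, the following is a minimal sketch of how per-step rewards could be derived from an optimal transport coupling between a learner's and an expert's state trajectories. It is an illustration, not the paper's implementation: the function name `ot_rewards`, the squared Euclidean cost, the uniform weights over time steps, and the entropic (Sinkhorn) approximation with parameters `eps` and `n_iters` are all assumptions made here.

```python
import numpy as np


def ot_rewards(learner_states, expert_states, eps=0.05, n_iters=200):
    """Hypothetical helper: per-step rewards from an entropic OT coupling.

    Assumes uniform weights on time steps and a squared Euclidean cost.
    Returns (rewards, plan); rewards[t] is the negative transport cost
    attributed to learner step t.
    """
    x = np.asarray(learner_states, dtype=float)
    y = np.asarray(expert_states, dtype=float)
    n, m = len(x), len(y)

    # Pairwise squared Euclidean cost between learner and expert states.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # Scale the regularization by the largest cost to keep exp() well-behaved.
    scale = cost.max() if cost.max() > 0 else 1.0
    kernel = np.exp(-cost / (eps * scale))

    a = np.full(n, 1.0 / n)  # uniform mass on learner steps
    b = np.full(m, 1.0 / m)  # uniform mass on expert steps
    u, v = np.ones(n), np.ones(m)

    # Sinkhorn iterations: alternately rescale rows and columns.
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)

    plan = u[:, None] * kernel * v[None, :]

    # Each learner step is penalized by the cost of the mass it transports.
    rewards = -(plan * cost).sum(axis=1)
    return rewards, plan


# Usage: a learner trajectory close to the expert's earns a higher return.
expert = np.linspace(0.0, 1.0, 5)[:, None]
near_rewards, _ = ot_rewards(expert + 0.01, expert)
far_rewards, _ = ot_rewards(expert + 1.0, expert)
print(near_rewards.sum() > far_rewards.sum())
```

Because the rewards depend only on states, a sketch like this needs no expert actions, and the resulting reward sequence could in principle be passed to any RL algorithm in place of an environment reward.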
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
