Imitation Learning by State-Only Distribution Matching
Damian Boborzi, Christoph-Nikolas Straehle, Jens S. Buchner, Lars Mikelsons

TL;DR
This paper introduces a non-adversarial imitation learning method that matches state transition distributions. The method improves robustness and sample efficiency over adversarial approaches, provides a reliable convergence metric, and achieves state-of-the-art results on continuous control tasks.
Contribution
It proposes a novel non-adversarial approach for imitation learning from observations using KL divergence minimization and density models, enhancing stability and interpretability.
Findings
Achieves state-of-the-art performance on continuous control benchmarks.
Provides a reliable convergence and performance estimation metric.
Demonstrates improved robustness and sample efficiency over adversarial methods.
Abstract
Imitation Learning from observation describes policy learning in a similar way to human learning. An agent's policy is trained by observing an expert performing a task. While many state-only imitation learning approaches are based on adversarial imitation learning, one main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator. If the true environment reward is unknown and cannot be used to select the best-performing model, this can result in bad real-world policy performance. We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric. Our training objective minimizes the Kullback-Leibler divergence (KLD) between the policy and expert state transition trajectories, which can be optimized in a non-adversarial fashion. Such methods demonstrate improved robustness when…
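To make the idea of a KLD between policy and expert state transitions concrete, here is a minimal sketch. It is not the paper's method: it assumes a toy 1-D environment (the hypothetical `rollout` helper), replaces the learned density models with a crude diagonal Gaussian fit over (s, s') pairs, and assumes the divergence is taken from the policy to the expert. All function names are illustrative.

```python
import math
import random

def fit_diag_gaussian(samples):
    """Fit per-dimension mean and variance to a list of equal-length vectors.
    Assumption: a diagonal Gaussian stands in for a learned density model."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(x[d] for x in samples) / n for d in range(dim)]
    variances = [
        max(sum((x[d] - means[d]) ** 2 for x in samples) / n, 1e-8)
        for d in range(dim)
    ]
    return means, variances

def kl_diag_gaussians(p, q):
    """Closed-form KL(p || q) for two diagonal Gaussians given as (means, variances)."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return sum(
        0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def rollout(drift, n=2000, seed=0):
    """Hypothetical toy dynamics: s' = s + drift + noise. Returns (s, s') pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)
        pairs.append((s, s + drift + rng.gauss(0.0, 0.1)))
    return pairs

# Density models over state transitions; no actions are used anywhere.
expert = fit_diag_gaussian(rollout(drift=1.0, seed=0))
close_policy = fit_diag_gaussian(rollout(drift=0.9, seed=1))
far_policy = fit_diag_gaussian(rollout(drift=0.0, seed=2))

# A smaller KL(policy || expert) suggests transitions closer to the expert's.
kl_close = kl_diag_gaussians(close_policy, expert)
kl_far = kl_diag_gaussians(far_policy, expert)
print(f"KL(close || expert) = {kl_close:.4f}")
print(f"KL(far   || expert) = {kl_far:.4f}")
```

Because the estimate needs only observed states, a quantity of this kind can be tracked during training without access to the environment reward, which is the role the abstract assigns to its convergence metric. A diagonal Gaussian ignores the correlation between s and s', so a real implementation would likely need a more expressive density model.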
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
Methods: Dense Connections · Adam · Experience Replay · Soft Actor Critic
