Imitation Learning by State-Only Distribution Matching

Damian Boborzi; Christoph-Nikolas Straehle; Jens S. Buchner; Lars Mikelsons

arXiv:2202.04332·cs.LG·October 2, 2024

TL;DR

This paper introduces a non-adversarial imitation learning method that matches the policy's state transition distribution to the expert's. It reports improved robustness and sample efficiency over adversarial methods, provides a reliable convergence metric, and achieves state-of-the-art results on continuous control tasks.

Contribution

It proposes a novel non-adversarial approach for imitation learning from observations using KL divergence minimization and density models, enhancing stability and interpretability.

Findings

01

Achieves state-of-the-art performance on continuous control benchmarks.

02

Provides a reliable convergence and performance estimation metric.

03

Demonstrates improved robustness and sample efficiency over adversarial methods.

Abstract

Imitation learning from observation describes policy learning in a similar way to human learning. An agent's policy is trained by observing an expert performing a task. While many state-only imitation learning approaches are based on adversarial imitation learning, one main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator. If the true environment reward is unknown and cannot be used to select the best-performing model, this can result in bad real-world policy performance. We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric. Our training objective minimizes the Kullback-Leibler divergence (KLD) between the policy and expert state transition trajectories, which can be optimized in a non-adversarial fashion. Such methods demonstrate improved robustness when…
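To make the objective in the abstract concrete, the following is a minimal sketch of a sample-based estimate of the KLD between a policy's and an expert's state transition distributions. It is not the authors' implementation: the diagonal Gaussians stand in for whatever learned density models the method uses, and the names `gaussian_logpdf`, `mc_kl`, and the toy dimensions are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log density of a diagonal Gaussian at point x (assumed stand-in density model)."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        for xi, m, s in zip(x, mean, std)
    )


def mc_kl(samples, logp_policy, logp_expert):
    """Monte Carlo estimate of KL(policy || expert) using samples drawn from the policy."""
    return sum(logp_policy(x) - logp_expert(x) for x in samples) / len(samples)


rng = random.Random(0)
dim = 4  # toy transition vector (s, s') built from 2-D states; an assumption

# Hypothetical policy and expert transition densities with unit variance.
pi_mean, pi_std = [0.0] * dim, [1.0] * dim
ex_mean, ex_std = [0.5] * dim, [1.0] * dim

# Transitions "visited by the policy": samples from the policy density.
transitions = [
    [rng.gauss(m, s) for m, s in zip(pi_mean, pi_std)] for _ in range(20000)
]

kl_estimate = mc_kl(
    transitions,
    lambda x: gaussian_logpdf(x, pi_mean, pi_std),
    lambda x: gaussian_logpdf(x, ex_mean, ex_std),
)

# Closed-form KL for two Gaussians with equal unit variances, used as a sanity check.
kl_exact = 0.5 * sum((a - b) ** 2 for a, b in zip(pi_mean, ex_mean))

print(f"estimate={kl_estimate:.3f} exact={kl_exact:.3f}")
```

Because the estimate only needs log densities evaluated on policy samples, no discriminator is trained, which is one way to read the "non-adversarial" claim. The estimated divergence itself can also be tracked during training, in the spirit of the convergence metric the abstract mentions.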

Code & Models

Repositories

FeMa42/soil-tdm
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research

Methods: Dense Connections · Adam · Experience Replay · Soft Actor Critic