Latent Wasserstein Adversarial Imitation Learning

Siqi Yang; Kai Yan; Alexander G. Schwing; Yu-Xiong Wang

arXiv:2603.05440 · cs.LG · March 6, 2026

TL;DR

This paper introduces LWAIL, a new imitation learning framework that uses a dynamics-aware latent space and Wasserstein distance to learn from limited state-only demonstrations, outperforming previous methods.

Contribution

The paper proposes LWAIL, a novel adversarial imitation learning approach that leverages a pre-trained latent space for state-only distribution matching, reducing the need for action data.
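
As a rough illustration of state-only distribution matching in a latent space, the sketch below computes the exact Wasserstein-1 distance between two small, equal-sized sets of states after mapping them through an encoder. Only states are used, no actions. The encoder `phi` is a hypothetical stand-in for the pre-trained latent map (it is not ICVF), and the function names are illustrative. For uniform empirical distributions of equal size, some optimal transport plan is a permutation, so the brute-force search is exact but only feasible for a handful of points.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]

def wasserstein1_uniform(xs: Sequence[State], ys: Sequence[State]) -> float:
    """Exact W1 between two uniform empirical distributions of equal size.

    With n points of mass 1/n on each side, some optimal transport plan is a
    permutation, so minimizing over all permutations is exact (O(n!) cost).
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty sets of equal size")
    n = len(xs)
    best = min(
        sum(math.dist(xs[i], ys[p[i]]) for i in range(n))
        for p in itertools.permutations(range(n))
    )
    return best / n

def latent_wasserstein1(phi: Callable[[State], State],
                        expert_states: Sequence[State],
                        agent_states: Sequence[State]) -> float:
    """W1 measured after mapping raw states through an encoder phi."""
    return wasserstein1_uniform([phi(s) for s in expert_states],
                                [phi(s) for s in agent_states])

# Hypothetical encoder (an assumption, not ICVF): stretches the second
# coordinate, standing in for a latent space where that direction matters more.
def phi(s: State) -> State:
    return (s[0], 3.0 * s[1])

expert = [(0.0, 0.0), (1.0, 0.0)]
agent = [(0.0, 1.0), (1.0, 1.0)]
print(wasserstein1_uniform(expert, agent))       # 1.0: raw Euclidean W1
print(latent_wasserstein1(phi, expert, agent))   # 3.0: W1 in the latent space
```

The same two state sets look closer or farther apart depending on the space the distance is measured in, which is why the choice of latent space matters. Practical Wasserstein AIL methods do not solve this assignment problem directly; they typically estimate the distance through a learned critic, which scales to large sample sets.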

Findings

01. LWAIL outperforms prior Wasserstein-based IL methods.
02. LWAIL achieves expert-level performance with only a few state-only episodes.
03. Experimental results on MuJoCo environments validate the effectiveness of LWAIL.

Abstract

Imitation Learning (IL) enables agents to mimic expert behavior by learning from demonstrations. However, traditional IL methods require large amounts of medium-to-high-quality demonstrations as well as actions of expert demonstrations, both of which are often unavailable. To reduce this need, we propose Latent Wasserstein Adversarial Imitation Learning (LWAIL), a novel adversarial imitation learning framework that focuses on state-only distribution matching. It benefits from the Wasserstein distance computed in a dynamics-aware latent space. This dynamics-aware latent space differs from prior work and is obtained via a pre-training stage, where we train the Intention Conditioned Value Function (ICVF) to capture a dynamics-aware structure of the state space using a small set of randomly generated state-only data. We show that this enhances the policy's understanding of state…
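
Adversarial Wasserstein IL methods usually work with the Kantorovich-Rubinstein dual, W1(P, Q) = sup over 1-Lipschitz f of E_P[f] - E_Q[f], and train a critic f to approximate that supremum. The sketch below restricts the critic to linear functions f(z) = w·z with ‖w‖₂ ≤ 1, which are 1-Lipschitz, so the optimum has a closed form (the normalized difference of the means) and its value is only a lower bound on W1. Using the critic's score on latent states as the agent's reward is a common pattern in Wasserstein AIL; the linear critic, the function names, and the reward form here are illustrative assumptions, not LWAIL's actual networks or objective.

```python
import math
from typing import List, Sequence, Tuple

Latent = Sequence[float]

def mean(points: Sequence[Latent]) -> List[float]:
    """Coordinate-wise mean of a non-empty set of latent vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def best_linear_critic(expert_z: Sequence[Latent],
                       agent_z: Sequence[Latent]) -> Tuple[List[float], float]:
    """Maximize E_expert[w.z] - E_agent[w.z] subject to ||w||_2 <= 1.

    The maximizer is w = d / ||d|| with d = mean(expert) - mean(agent), and the
    optimal value is ||d||, a lower bound on the Wasserstein-1 distance.
    """
    diff = [e - a for e, a in zip(mean(expert_z), mean(agent_z))]
    gap = math.hypot(*diff)
    if gap == 0.0:
        return [0.0] * len(diff), 0.0  # means coincide: a linear critic sees no gap
    return [d / gap for d in diff], gap

def critic_reward(w: Sequence[float], z: Latent) -> float:
    """Hypothetical reward for a visited latent state: the critic score w.z."""
    return sum(wi * zi for wi, zi in zip(w, z))

expert_z = [(0.0, 0.0), (0.0, 2.0)]
agent_z = [(3.0, 1.0), (3.0, 1.0)]
w, gap = best_linear_critic(expert_z, agent_z)
print(w, gap)                          # [-1.0, 0.0] 3.0
print(critic_reward(w, (0.0, 1.0)))    # expert-like state: higher score
print(critic_reward(w, (3.0, 1.0)))    # agent-like state: lower score
```

For these points the exact W1 is √10 ≈ 3.16 (all agent mass sits at (3, 1), so each expert point is transported there), so the linear critic's value of 3.0 underestimates it. Richer critics, for example neural networks kept approximately 1-Lipschitz by a constraint or penalty, are what tighten this bound in practice.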

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The idea of combining a latent, dynamics-aware embedding with a Wasserstein AIL objective is intuitive and well-motivated.
- The results show consistent gains over baselines in low-data settings, and the ablations suggest that learning a latent embedding indeed facilitates imitation learning.
- The paper provides a wide range of ablations (different embeddings, action noise, dynamics mismatch), which adds credibility to the empirical findings.

Weaknesses

- Overstated novelty. Methodologically, the approach is largely a combination of known components (Wasserstein AIL and ICVF-based embeddings) rather than a fundamentally new algorithm. The paper currently presents it as a new framework rather than as a targeted study of how these parts interact.
- Fairness of comparisons. While the paper shows LWAIL with and without the latent, it does not systematically test whether other IL algorithms (e.g., IQ-Learn, OPOLO, GAIfO) would also benefit from the s…

Reviewer 02 · Rating 2 · Confidence 4

Strengths

The paper identifies and addresses the weakness of using Euclidean metrics in Wasserstein IL by introducing a dynamics-aware latent space. Overall, the presentation is clear, well-structured, and easy to follow.
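
A toy illustration of why a Euclidean ground metric can mislead: in the small grid maze below (an assumed example, not taken from the paper), two cells on either side of a wall are only 2 units apart in Euclidean distance, yet an agent needs 6 moves to get from one to the other. The sketch measures the latter with breadth-first search on a known grid; a dynamics-aware latent space such as the ICVF-based one is intended, roughly, to reflect this kind of reachability structure, but this code computes it exactly and does not model ICVF.

```python
import math
from collections import deque
from typing import List, Tuple

Cell = Tuple[int, int]

# Assumed toy maze: '#' is a wall, '.' is free space; rows listed top to bottom.
MAZE = [
    ".#.",
    ".#.",
    "...",
]

def shortest_path_steps(grid: List[str], start: Cell, goal: Cell) -> float:
    """Minimum number of 4-connected moves from start to goal (BFS)."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (r, c), steps = frontier.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append(((nr, nc), steps + 1))
    return math.inf  # goal is unreachable from start

start, goal = (0, 0), (0, 2)
print(math.dist(start, goal))                   # 2.0: straight through the wall
print(shortest_path_steps(MAZE, start, goal))   # 6: the way around the wall
```

A Wasserstein distance built on the first metric would treat states on opposite sides of the wall as near-identical, while one built on reachability would not.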

Weaknesses

1. **Limited novelty**: The contribution is primarily a combination of existing ideas (Wasserstein IL, adversarial learning, and latent representations), with the main advance being the integration of ICVF embeddings.
2. **Narrow experimental scope**: The experiments focus mainly on locomotion tasks. Given the illustrative example in Fig. 1, navigation-style tasks (such as those in Fig. 3) would have been more suitable to highlight the strengths of the proposed approach.
3. **Questionable nece…

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The paper is clearly written, well-structured, and comprehensive, making the authors' contributions and methodology easy to follow.
- The paper's primary contribution, LWAIL, is novel in its integration of an ICVF-learned latent space into a Wasserstein AIL framework to create a dynamics-aware distance metric. This is significant as it demonstrates expert-level performance on several locomotion tasks using only a single state-only expert trajectory, a challenging and practical problem setting.

Weaknesses

- The empirical evaluation, while thorough on the chosen tasks, is limited to locomotion (MuJoCo) and a simple maze environment. These environments are state-based and have relatively stable dynamics. The applicability of LWAIL to more complex, high-dimensional problems (e.g., vision-based tasks) or environments with more stochastic dynamics is not explored.
- The paper does not include an ablation study on the sensitivity of key hyperparameters. Adversarial Imitation Learning frameworks are of…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning