Towards Generalisable Imitation Learning Through Conditioned Transition Estimation and Online Behaviour Alignment

Nathan Gavenski; Matteo Leonetti; Odinaldo Rodrigues

arXiv:2601.17563·cs.LG·January 27, 2026

Open Access

TL;DR

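This paper introduces UfO (Unsupervised Imitation Learning from Observation), a two-stage method that first approximates the teacher's true actions from observed state transitions without action supervision, then aligns the agent's trajectories with the teacher's. It is reported to outperform existing ILfO methods in generalization and stability.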

Contribution

The paper proposes a novel two-stage unsupervised imitation learning framework that estimates true actions and aligns behaviors, addressing key limitations of prior ILfO approaches.

Findings

01. UfO outperforms existing ILfO methods in five environments.

02. UfO demonstrates the smallest standard deviation, indicating better generalization.

03. UfO effectively estimates true actions without supervision.

Abstract

State-of-the-art imitation learning from observation (ILfO) methods have recently made significant progress, but they still have some limitations: they need action-based supervised optimisation, assume that states have a single optimal action, and tend to apply teacher actions without full consideration of the actual environment state. While the truth may be out there in observed trajectories, existing methods struggle to extract it without supervision. In this work, we propose Unsupervised Imitation Learning from Observation (UfO), which addresses all of these limitations. UfO learns a policy through a two-stage process, in which the agent first obtains an approximation of the teacher's true actions in the observed state transitions, and then refines the learned policy further by adjusting agent trajectories to closely align them with the teacher's. Experiments we conducted in five…

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications