Towards Generalisable Imitation Learning Through Conditioned Transition Estimation and Online Behaviour Alignment
Nathan Gavenski, Matteo Leonetti, Odinaldo Rodrigues

TL;DR
This paper introduces UfO, an unsupervised imitation learning method that estimates the teacher's true actions from observations and aligns the agent's behavior with the teacher's, outperforming existing ILfO methods in generalization and stability.
Contribution
The paper proposes a novel two-stage unsupervised imitation learning framework that estimates true actions and aligns behaviors, addressing key limitations of prior ILfO approaches.
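To make the action-estimation problem concrete, here is a minimal toy sketch of recovering candidate actions from a state-only demonstration. It is not the UfO algorithm: the corridor environment, the `step` dynamics, the action set, and all function names are hypothetical, and the dynamics are assumed to be known, which an ILfO method generally cannot assume. The sketch only shows that a single observed transition may be explained by more than one action.

```python
from typing import List, Set

ACTIONS = (-1, 0, 1)  # hypothetical action set: left, stay, right
N_STATES = 5          # hypothetical corridor with states 0..4


def step(state: int, action: int) -> int:
    """Toy deterministic dynamics (assumed known): move, then clamp to the corridor."""
    return max(0, min(N_STATES - 1, state + action))


def consistent_actions(state: int, next_state: int) -> Set[int]:
    """Return every action that reproduces the observed transition."""
    return {a for a in ACTIONS if step(state, a) == next_state}


def label_trajectory(states: List[int]) -> List[Set[int]]:
    """Attach candidate actions to each consecutive pair of observed states."""
    return [consistent_actions(s, s_next) for s, s_next in zip(states, states[1:])]


# A state-only teacher demonstration: no actions are recorded.
teacher_states = [0, 1, 2, 2, 3, 4, 4]
labels = label_trajectory(teacher_states)
print(labels)
```

In this toy setting the final transition (state 4 to state 4) is consistent with both "stay" and "right", because the wall absorbs the move. Observations alone therefore do not always pin down a unique action.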
Findings
UfO outperforms existing ILfO methods in five environments.
UfO shows the smallest standard deviation among the compared methods, which is taken as a sign of better generalization.
UfO effectively estimates true actions without supervision.
Abstract
State-of-the-art imitation learning from observation (ILfO) methods have recently made significant progress, but they still have some limitations: they need action-based supervised optimisation, assume that states have a single optimal action, and tend to apply teacher actions without full consideration of the actual environment state. While the truth may be out there in observed trajectories, existing methods struggle to extract it without supervision. In this work, we propose Unsupervised Imitation Learning from Observation (UfO), which addresses all of these limitations. UfO learns a policy through a two-stage process, in which the agent first obtains an approximation of the teacher's true actions in the observed state transitions, and then refines the learned policy further by adjusting agent trajectories to closely align them with the teacher's. Experiments we conducted in five…
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications
