Imitation Learning from a Single Temporally Misaligned Video

William Huey; Huaxiaoyue Wang; Anne Wu; Yoav Artzi; Sanjiban Choudhury

arXiv:2502.05397 · cs.LG · July 16, 2025

TL;DR

This paper introduces ORCA, a sequence-level reward function for imitation learning from a single, temporally misaligned demonstration, significantly improving task performance over frame-level methods.

Contribution

The paper proposes ORCA, a novel sequence-based alignment method that enforces correct temporal ordering, addressing limitations of frame-level matching in imitation learning from misaligned demonstrations.

Findings

01. ORCA achieves a 4.5x improvement on Meta-world tasks over frame-level matching methods.

02. ORCA achieves a 6.6x improvement on Humanoid-v4 tasks over frame-level matching methods.

03. Empirical analysis confirms ORCA's robustness to temporal misalignment.

Abstract

We examine the problem of learning sequential tasks from a single visual demonstration. A key challenge arises when demonstrations are temporally misaligned due to variations in timing, differences in embodiment, or inconsistencies in execution. Existing approaches treat imitation as a distribution-matching problem, aligning individual frames between the agent and the demonstration. However, we show that such frame-level matching fails to enforce temporal ordering or ensure consistent progress. Our key insight is that matching should instead be defined at the level of sequences. We propose that perfect matching occurs when one sequence successfully covers all the subgoals in the same order as the other sequence. We present ORCA (ORdered Coverage Alignment), a dense per-timestep reward function that measures the probability of the agent covering demonstration frames in the correct order.…
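The idea of rewarding ordered coverage can be illustrated with a small dynamic program. The sketch below is not the authors' implementation: the function names, the 1-D "frame" features, and the `exp(-distance)` matching probability are all assumptions chosen for illustration, and the recursion is a simplified stand-in for whatever ORCA actually computes. It scores, at each timestep, how well the agent's frames so far can cover every demonstration frame in order.

```python
import math

def match_probs(agent, demo):
    """P[t][j]: assumed similarity of agent frame t to demo frame j.

    Frames are plain floats here; exp(-|a - d|) is a hypothetical
    stand-in for a learned visual similarity.
    """
    return [[math.exp(-abs(a - d)) for d in demo] for a in agent]

def ordered_coverage_rewards(P):
    """Per-timestep score for covering all demo frames in order.

    cover[j] at time t is the best product of match probabilities over
    assignments t_0 <= t_1 <= ... <= t_j <= t of agent frames to demo
    frames 0..j. The reward at time t is the value for the last demo frame.
    """
    num_demo = len(P[0])
    prev = [0.0] * num_demo  # nothing is covered before the first frame
    rewards = []
    for row in P:
        cur = [0.0] * num_demo
        for j in range(num_demo):
            # Coverage of demo frames 0..j-1 using agent frames up to now.
            before = cur[j - 1] if j > 0 else 1.0
            # Either frame j was already covered earlier, or it is covered now.
            cur[j] = max(prev[j], before * row[j])
        rewards.append(cur[-1])
        prev = cur
    return rewards

demo = [0.0, 1.0, 2.0]
in_order = ordered_coverage_rewards(match_probs([0.0, 1.0, 2.0], demo))
reversed_ = ordered_coverage_rewards(match_probs([2.0, 1.0, 0.0], demo))
```

In this toy setup, an agent that visits the demonstration frames in order ends with a final reward of 1.0, while an agent that visits the same frames in reverse ends with a strictly lower reward, even though a frame-by-frame comparison that ignores order would see the same set of frames in both trajectories.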

Code & Models

Repositories

portal-cornell/orca (PyTorch, official)

Videos

Imitation Learning from a Single Temporally Misaligned Video · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications