One-Shot Imitation under Mismatched Execution
Kushal Kedia, Prithwish Dan, Angela Chao, Maximus Adrian Pace, Sanjiban Choudhury

TL;DR
RHyME is a framework that enables robots to imitate human demonstrations. It automatically pairs human and robot trajectories and, for each long-horizon robot trajectory, synthesizes a semantically equivalent human video from short human clips, avoiding the need for paired data or frame-level visual similarity.
Contribution
The paper introduces RHyME, a method for cross-embodiment imitation based on sequence-level optimal transport that requires neither paired data nor frame-level visual similarity.
Findings
Achieves an increase of over 50% in task success rate.
Successfully imitates cross-embodiment demonstrations both in simulation and in the real world.
Facilitates policy training without paired human-robot data.
Abstract
Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods for human-robot translation either depend on paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME…
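The idea of scoring a human clip against a robot trajectory with a sequence-level optimal transport cost, and then retrieving the best-scoring clip, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine cost, the entropy-regularized Sinkhorn solver, the uniform frame weights, and the names `cosine_cost`, `sinkhorn_ot_cost`, and `retrieve_clip` are all assumptions made for illustration, and the embeddings are assumed to come from some pretrained visual encoder that the abstract does not specify.

```python
import numpy as np


def cosine_cost(X, Y):
    """Pairwise cost matrix (1 - cosine similarity) between rows of X and Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T


def sinkhorn_ot_cost(X, Y, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two embedding sequences.

    Assumption: every frame carries equal mass (uniform marginals).
    """
    C = cosine_cost(X, Y)
    n, m = C.shape
    a = np.full(n, 1.0 / n)  # mass on each frame of the first sequence
    b = np.full(m, 1.0 / m)  # mass on each frame of the second sequence
    K = np.exp(-C / eps)     # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternate scaling updates so the plan matches both marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # transport plan
    return float(np.sum(P * C))


def retrieve_clip(robot_seq, human_clips):
    """Return the index of the human clip with the lowest OT cost, plus all costs."""
    costs = [sinkhorn_ot_cost(robot_seq, clip) for clip in human_clips]
    return int(np.argmin(costs)), costs


if __name__ == "__main__":
    # Toy embeddings: three robot frames and two candidate human clips.
    robot_seq = np.eye(4)[:3]
    slower_human = np.eye(4)[[0, 0, 1, 1, 2, 2]]  # same content, twice as many frames
    unrelated_human = np.eye(4)[[3, 3, 3]]        # different content
    best, costs = retrieve_clip(robot_seq, [unrelated_human, slower_human])
    print(best, costs)
```

Because the cost is computed over whole sequences rather than frame by frame, a human clip that shows the same content at a different speed can still match the robot trajectory closely. Note that this plain formulation ignores frame order; a practical system would likely need some way to account for temporal structure.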
Taxonomy
Topics: Software Testing and Debugging Techniques
