Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Juntao Ren, Priya Sundaresan, Dorsa Sadigh, Sanjiban Choudhury, Jeannette Bohg

TL;DR
This paper introduces an imitation learning approach that uses motion tracks (short-horizon 2D trajectories on an image) as a unified action representation for human hands and robot end-effectors, so that robots can learn tasks from just minutes of human video and a limited number of robot demonstrations.
Contribution
The paper proposes Motion Track Policy (MT-pi), a new IL method that uses 2D motion trajectories as actions, bridging human videos and robot demonstrations for efficient task learning.
Findings
Achieves an average success rate of 86.5% across 4 real-world tasks.
Outperforms state-of-the-art IL baselines that do not use human data or the motion-track action space by 40%.
Generalizes to scenarios only seen in human videos.
Abstract
Teaching robots to autonomously complete everyday tasks remains a challenge. Imitation Learning (IL) is a powerful approach that imbues robots with skills via demonstrations, but is limited by the labor-intensive process of collecting teleoperated robot data. Human videos offer a scalable alternative, but it remains difficult to directly train IL policies from them due to the lack of robot action labels. To address this, we propose to represent actions as short-horizon 2D trajectories on an image. These actions, or motion tracks, capture the predicted direction of motion for either human hands or robot end-effectors. We instantiate an IL policy called Motion Track Policy (MT-pi) which receives image observations and outputs motion tracks as actions. By leveraging this unified, cross-embodiment action space, MT-pi completes tasks with high success given just minutes of human video and…
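To make the idea of a motion track concrete, here is a minimal sketch of how a short-horizon 2D trajectory could be built from 3D keypoints. It is not the authors' implementation. It assumes a calibrated pinhole camera with intrinsics `K` and keypoint positions (on a hand or an end-effector) already expressed in the camera frame. The function names, array shapes, and the `horizon` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def project_points(points_cam, K):
    """Project 3D points in the camera frame, shape (N, 3), to pixels, shape (N, 2).

    Assumes a pinhole camera with 3x3 intrinsics K and points in front of the camera.
    """
    uvw = points_cam @ K.T            # homogeneous image coordinates (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]   # divide by depth to get (u, v)


def motion_track(keypoints_cam, K, t, horizon):
    """Build a 2D motion track for the keypoints starting at time step t.

    keypoints_cam: array of shape (T, N, 3), N keypoints over T time steps
                   (hypothetical layout, works for hand or end-effector points).
    Returns an array of shape (h, N, 2) with h <= horizon, one 2D point per
    keypoint per future step.
    """
    future = keypoints_cam[t:t + horizon]          # (h, N, 3)
    h, n, _ = future.shape
    pixels = project_points(future.reshape(-1, 3), K)
    return pixels.reshape(h, n, 2)


# Illustrative values only: a single keypoint moving along +x at 1 m depth.
K_example = np.array([[100.0, 0.0, 50.0],
                      [0.0, 100.0, 50.0],
                      [0.0, 0.0, 1.0]])
trajectory = np.stack(
    [np.array([[0.1 * i, 0.0, 1.0]]) for i in range(5)]
)  # shape (5, 1, 3)
track = motion_track(trajectory, K_example, t=1, horizon=3)
print(track.shape)
```

Because the track lives in image space, the same array layout can describe a human hand or a robot gripper, which is the property the abstract relies on for a cross-embodiment action space.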
