Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning

Juntao Ren; Priya Sundaresan; Dorsa Sadigh; Sanjiban Choudhury; Jeannette Bohg

arXiv:2501.06994 · cs.RO · October 14, 2025

TL;DR

This paper introduces an imitation learning approach that uses motion tracks as a unified action representation, letting robots learn tasks with high success rates from a few minutes of human video and a limited number of robot demonstrations.

Contribution

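The paper proposes Motion Track Policy (MT-pi), an IL method whose actions are short-horizon 2D trajectories on the image. Because the same representation describes both human hands and robot end-effectors, it bridges human videos and robot demonstrations for efficient task learning.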
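To make the representation concrete, here is a minimal sketch of how a motion track could be read off a sequence of per-frame pixel positions. It is an illustration only, not the authors' implementation: the names `motion_track` and `track_direction`, the choice to repeat the last position when the video ends early, and the sample pixel values are all assumptions made for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (u, v) pixel coordinates in the image


def motion_track(keypoints: List[Point], t: int, horizon: int) -> List[Point]:
    """Return the 2D positions of one keypoint over frames t+1 .. t+horizon.

    `keypoints` holds one pixel position per frame, for either a human hand
    or a robot end-effector. If the sequence ends before the horizon is
    reached, the last known position is repeated (an assumed padding rule).
    """
    if not 0 <= t < len(keypoints):
        raise IndexError("t must index a frame in the sequence")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    last = len(keypoints) - 1
    return [keypoints[min(t + k, last)] for k in range(1, horizon + 1)]


def track_direction(start: Point, track: List[Point]) -> Point:
    """Net pixel displacement from `start` to the final point of the track."""
    end = track[-1]
    return (end[0] - start[0], end[1] - start[1])


# Hypothetical pixel positions of a hand keypoint over four video frames.
hand_pixels = [(100.0, 200.0), (104.0, 198.0), (109.0, 195.0), (115.0, 191.0)]

track = motion_track(hand_pixels, t=0, horizon=2)
direction = track_direction(hand_pixels[0], track)
```

The same function applies unchanged to a list of robot end-effector pixel positions, which is the sense in which the action space is shared across embodiments.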

Findings

01. Achieves an 86.5% success rate across 4 real-world tasks.

02. Outperforms state-of-the-art IL baselines by 40%.

03. Generalizes to scenarios seen only in human videos.

Abstract

Teaching robots to autonomously complete everyday tasks remains a challenge. Imitation Learning (IL) is a powerful approach that imbues robots with skills via demonstrations, but is limited by the labor-intensive process of collecting teleoperated robot data. Human videos offer a scalable alternative, but it remains difficult to directly train IL policies from them due to the lack of robot action labels. To address this, we propose to represent actions as short-horizon 2D trajectories on an image. These actions, or motion tracks, capture the predicted direction of motion for either human hands or robot end-effectors. We instantiate an IL policy called Motion Track Policy (MT-pi) which receives image observations and outputs motion tracks as actions. By leveraging this unified, cross-embodiment action space, MT-pi completes tasks with high success given just minutes of human video and…
