TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos

Seungjae Lee; Yoonkyo Jung; Inkook Chun; Yao-Chih Lee; Zikui Cai; Hongjia Huang; Aayush Talreja; Tan Dat Dao; Yongyuan Liang; Jia-Bin Huang; Furong Huang

arXiv:2511.21690 · cs.RO · November 27, 2025

TL;DR

TraceGen introduces a 3D trace-space world model that lets robots learn new tasks from cross-embodiment videos, reaching 80% success from five target videos while running 50-600x faster at inference than pixel-based models.

Contribution

The paper presents TraceGen, a world model that predicts future motion in a symbolic 3D trace-space rather than in pixel space, which enables cross-embodiment learning from heterogeneous human and robot videos.

Findings

01. Achieves 80% success with five target videos across four tasks.

02. Offers 50-600x faster inference than pixel-based models.

03. Reaches 67.5% success with uncalibrated human videos on real robots.

Abstract

Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments (humans and different robots) are abundant, differences in embodiment, camera, and environment hinder their direct use. We address the small-data problem by introducing a unifying, symbolic representation, a compact 3D "trace-space" of scene-level trajectories, that enables learning from cross-embodiment, cross-environment, and cross-task videos. We present TraceGen, a world model that predicts future motion in trace-space rather than pixel space, abstracting away appearance while retaining the geometric structure needed for manipulation. To train TraceGen at scale, we develop TraceForge, a data pipeline that transforms heterogeneous human and robot videos into consistent 3D traces, yielding a corpus of 123K videos and 1.8M…
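
To make the idea of a "trace-space" concrete, the following is a minimal illustrative sketch, not the paper's data format or implementation. It assumes a trace can be stored as a set of tracked scene points, each with a short sequence of 3D positions; the names `Trace3D`, `points`, and `displacements` are hypothetical. The sketch shows how such a structure carries only geometry (where points move), with no appearance information.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Trace3D:
    """Hypothetical container: K tracked scene points, each with T 3D positions."""

    # points[k][t] is the (x, y, z) position of point k at time step t.
    points: List[List[Point3D]]

    def displacements(self) -> List[List[Point3D]]:
        """Return the per-step motion of each point (T - 1 deltas per point)."""
        return [
            [(b[0] - a[0], b[1] - a[1], b[2] - a[2]) for a, b in zip(traj, traj[1:])]
            for traj in self.points
        ]


# Two tracked points over three time steps: one moves, one stays still.
trace = Trace3D(
    points=[
        [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 0.9)],
        [(0.5, 0.5, 1.0), (0.5, 0.5, 1.0), (0.5, 0.5, 1.0)],
    ]
)
deltas = trace.displacements()
```

Under this assumed layout, a human video and a robot video of the same motion would yield similar displacement sequences even though their pixels look very different, which is the intuition behind predicting in trace-space instead of pixel space.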

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Social Robot Interaction and HRI