AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis
Junjie Ye, Rong Xue, Basile Van Hoorick, Pavel Tokmakov, Muhammad Zubair Irshad, Yue Wang, Vitor Guizilini

TL;DR
AnchorDream leverages pretrained video diffusion models conditioned on robot motion to synthesize diverse, embodiment-consistent robot data, significantly enhancing imitation learning datasets without explicit environment modeling.
Contribution
The paper introduces AnchorDream, an embodiment-aware world model built on pretrained video diffusion models, which scales a handful of demonstrations into large, diverse datasets while keeping robot motion plausible.
Findings
36.4% improvement in simulator benchmarks
Nearly doubled performance in real-world tasks
Effective scaling from few demonstrations
Abstract
The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and simulators offer limited diversity and fidelity with pronounced sim-to-real gaps. While generative models present an attractive solution, existing methods often alter only visual appearances without creating new behaviors, or suffer from embodiment inconsistencies that yield implausible motions. To address these limitations, we introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis. AnchorDream conditions the diffusion process on robot motion renderings, anchoring the embodiment to prevent hallucination while synthesizing objects and environments consistent with the robot's kinematics. Starting from only a handful of human teleoperation demonstrations, our…
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
