CRAFT: Video Diffusion for Bimanual Robot Data Generation

Jason Chen; I-Chun Arthur Liu; Gaurav Sukhatme; Daniel Seita

arXiv:2604.03552·cs.RO·April 7, 2026

TL;DR

CRAFT introduces a video diffusion framework that generates diverse, realistic bimanual robot demonstrations from limited real data, enhancing policy robustness and generalization in manipulation tasks.

Contribution

CRAFT conditions a video diffusion model on edge-based (Canny) structural cues extracted from simulator-generated trajectories, producing scalable, diverse, and physically plausible bimanual demonstration videos together with action labels.

Findings

01. CRAFT improves success rates over existing augmentation methods.

02. It enables large-scale, diverse demonstration generation from few real examples.

03. The approach enhances generalization in both simulated and real bimanual tasks.

Abstract

Bimanual robot learning from demonstrations is fundamentally limited by the cost and narrow visual diversity of real-world data, which constrains policy robustness across viewpoints, object configurations, and embodiments. We present Canny-guided Robot Data Generation using Video Diffusion Transformers (CRAFT), a video diffusion-based framework for scalable bimanual demonstration generation that synthesizes temporally coherent manipulation videos while producing action labels. By conditioning video diffusion on edge-based structural cues extracted from simulator-generated trajectories, CRAFT produces physically plausible trajectory variations and supports a unified augmentation pipeline spanning object pose changes, camera viewpoints, lighting and background variations, cross-embodiment transfer, and multi-view synthesis. We leverage a pre-trained video diffusion model to convert…
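
To make the idea of an edge-based structural cue concrete, here is a minimal sketch of turning a sequence of grayscale frames into per-frame binary edge maps that could serve as a conditioning signal. It is not the authors' pipeline: the function names `edge_map` and `edge_video`, the threshold value, and the toy frame are all hypothetical, and the detector is a thresholded Sobel gradient magnitude used as a simplified stand-in for Canny (it omits Gaussian smoothing, non-maximum suppression, and hysteresis).

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities

# Sobel kernels for horizontal and vertical intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_map(frame: Frame, threshold: float = 1.0) -> List[List[int]]:
    """Return a binary edge map (1 = edge) from thresholded Sobel magnitude.

    Simplified stand-in for Canny (an assumption of this sketch): no
    smoothing, non-maximum suppression, or hysteresis thresholding.
    """
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    # Border pixels stay 0 because the 3x3 window does not fit there.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = frame[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def edge_video(frames: List[Frame], threshold: float = 1.0) -> List[List[List[int]]]:
    """Apply edge_map to every frame, giving one edge map per frame."""
    return [edge_map(f, threshold) for f in frames]


# Toy 5x5 frame: dark left columns, bright right columns (a vertical step edge).
toy_frame = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
toy_edges = edge_video([toy_frame, toy_frame])
```

Because an edge map keeps object and arm outlines while discarding texture, color, and lighting, it is a plausible way to hold scene structure fixed while a generative model varies appearance. In practice a library Canny implementation would replace the hand-written detector here.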

Code & Models

Repositories

https://craftaug.github.io