RoboTransfer: Controllable Geometry-Consistent Video Diffusion for Manipulation Policy Transfer

Liu Liu; Xiaofeng Wang; Guosheng Zhao; Keyu Li; Wenkang Qin; Jiagang Zhu; Jiaxiong Qiu; Zheng Zhu; Guan Huang; Zhizhong Su

arXiv:2505.23171 · cs.CV · January 7, 2026

TL;DR

RoboTransfer is a diffusion-based framework that synthesizes geometrically consistent robotic videos with fine-grained control over scene elements, improving data quality for training manipulation policies and their generalization to diverse environments.

Contribution

The paper introduces a diffusion model that uses globally consistent 3D geometry and cross-view feature interactions for high-quality, controllable robotic video synthesis to aid policy transfer.

Findings

01. Generated videos show superior geometric consistency and visual fidelity.

02. Policies trained on RoboTransfer data generalize better to unseen scenarios.

03. The method enables fine-grained control over scene elements, such as background editing and object replacement.

Abstract

The goal of general-purpose robotics is to create agents that can seamlessly adapt to and operate in diverse, unstructured human environments. Imitation learning has become a key paradigm for robotic manipulation, yet collecting large-scale and diverse demonstrations is prohibitively expensive. Simulators provide a cost-effective alternative, but the sim-to-real gap remains a major obstacle to scalability. We present RoboTransfer, a diffusion-based video generation framework for synthesizing robotic data. By leveraging cross-view feature interactions and globally consistent 3D geometry, RoboTransfer ensures multi-view geometric consistency while enabling fine-grained control over scene elements, such as background editing and object replacement. Extensive experiments demonstrate that RoboTransfer produces videos with superior geometric consistency and visual fidelity. Furthermore,…

Code & Models

Models

HorizonRobotics/RoboTransfer (model on Hugging Face; 66 downloads, 1 like)

Datasets

HorizonRobotics/RoboTransfer-RealData (dataset; 112 downloads)
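
A minimal sketch of how the two listed repositories might be fetched, assuming both are hosted on the Hugging Face Hub under the IDs shown above and that the `huggingface_hub` package is installed. The helper names `download_kwargs` and `fetch` and the `robotransfer_cache` directory are hypothetical and chosen only for illustration; the internal file layout of either repository is not described on this page.

```python
from pathlib import Path

# Repository IDs as listed above. Hosting on the Hugging Face Hub is an
# assumption for the dataset entry.
REPOS = {
    "model": "HorizonRobotics/RoboTransfer",
    "dataset": "HorizonRobotics/RoboTransfer-RealData",
}


def download_kwargs(kind, cache_root="robotransfer_cache"):
    """Build keyword arguments for huggingface_hub.snapshot_download.

    `kind` is "model" or "dataset"; it doubles as the Hub `repo_type`.
    """
    repo_id = REPOS[kind]
    return {
        "repo_id": repo_id,
        "repo_type": kind,
        # Hypothetical local layout: one folder per repository name.
        "local_dir": str(Path(cache_root) / repo_id.split("/")[-1]),
    }


def fetch(kind, cache_root="robotransfer_cache"):
    """Download a full snapshot of the repository and return its local path."""
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(**download_kwargs(kind, cache_root))
```

Calling `fetch("model")` or `fetch("dataset")` needs network access and may download a large amount of data, so it is worth inspecting the output of `download_kwargs` first.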
