RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion

Xiujian Liang; Jiacheng Liu; Mingyang Sun; Qichen He; Cewu Lu; Jianhua Sun

arXiv:2511.22505·cs.RO·December 9, 2025

Open Access

TL;DR

This paper introduces RealD$^2$iff, a diffusion-based framework that synthesizes realistic noisy depth data from simulation, significantly enhancing zero-shot sim2real robot manipulation by bridging the visual gap caused by sensor noise.

Contribution

The work presents a hierarchical diffusion model with novel global and local noise modeling strategies, enabling realistic depth synthesis and zero-shot sim2real transfer in robotic manipulation.

Findings

01 Effective depth noise synthesis from simulation

02 Zero-shot sim2real robot manipulation achieved

03 No manual real sensor data collection needed

Abstract

Robot manipulation in the real world is fundamentally constrained by the visual sim2real gap, where depth observations collected in simulation fail to reflect the complex noise patterns inherent to real sensors. In this work, inspired by the denoising capability of diffusion models, we invert the conventional perspective and propose a clean-to-noisy paradigm that learns to synthesize noisy depth, thereby bridging the visual sim2real gap through purely simulation-driven robotic learning. Building on this idea, we introduce RealD$^2$iff, a hierarchical coarse-to-fine diffusion framework that decomposes depth noise into global structural distortions and fine-grained local perturbations. To enable progressive learning of these components, we further develop two complementary strategies: Frequency-Guided Supervision (FGS) for global structure modeling and Discrepancy-Guided Optimization…
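
To make the global/local split described in the abstract concrete, here is a minimal sketch of one way a depth-noise residual could be separated into a smooth, large-scale component and a fine-grained remainder. It is not the authors' method: the FFT low-pass filter, the function name `split_depth_noise`, the `cutoff` value, and the synthetic depth maps are all assumptions made for illustration only.

```python
import numpy as np


def split_depth_noise(clean, noisy, cutoff=0.1):
    """Split the residual (noisy - clean) into low- and high-frequency parts.

    `cutoff` is a hypothetical radius in cycles per pixel; it is an
    illustrative choice, not a value taken from the paper.
    """
    residual = noisy - clean
    spectrum = np.fft.fft2(residual)

    # Frequency coordinate of every FFT bin along each image axis.
    fy = np.fft.fftfreq(residual.shape[0])[:, None]
    fx = np.fft.fftfreq(residual.shape[1])[None, :]
    low_pass = np.sqrt(fx**2 + fy**2) <= cutoff

    # Low frequencies stand in for "global structural distortions";
    # whatever is left stands in for "fine-grained local perturbations".
    global_part = np.real(np.fft.ifft2(spectrum * low_pass))
    local_part = residual - global_part
    return global_part, local_part


# Synthetic stand-ins for a simulated (clean) and a sensor-like (noisy) depth map.
rng = np.random.default_rng(0)
h, w = 64, 64
clean = np.full((h, w), 1.0)                      # flat surface at 1 m
yy, xx = np.mgrid[0:h, 0:w]
smooth_bias = 0.02 * np.sin(2 * np.pi * xx / w)   # slow, large-scale warp
speckle = 0.005 * rng.standard_normal((h, w))     # per-pixel jitter
noisy = clean + smooth_bias + speckle

global_part, local_part = split_depth_noise(clean, noisy)
```

In this toy setup the two parts always sum back to the full residual, and the slow sinusoidal warp ends up almost entirely in `global_part`. A learned model such as the one the abstract describes would generate these components rather than filter them out of paired data.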

Taxonomy

Topics: Robot Manipulation and Learning · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis