Diff$^2$I2P: Differentiable Image-to-Point Cloud Registration with Diffusion Prior

Juncheng Mu; Chengwei Ren; Weixiang Zhang; Liang Pan; Xiao-Ping Zhang; Yue Gao

arXiv:2507.06651 · cs.CV · July 10, 2025



TL;DR

This paper introduces Diff$^2$I2P, a fully differentiable framework for image-to-point cloud (I2P) registration. It uses a diffusion prior to bridge the modality gap between images and point clouds, improving cross-modal correspondence accuracy and outperforming state-of-the-art methods.

Contribution

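The paper proposes a fully differentiable I2P registration framework built on a diffusion prior. It introduces Control-Side Score Distillation (CSD), which distills knowledge from a depth-conditioned diffusion model to directly optimize the predicted transformation, along with a new technique for correspondence estimation.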

Findings

01

Achieves an improvement of over 7% in registration recall on the 7-Scenes benchmark.

02

Outperforms existing state-of-the-art I2P registration methods.

03

Demonstrates the effectiveness of diffusion priors in cross-modal registration.
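For readers unfamiliar with the metric in the first finding, registration recall is commonly computed as the fraction of image-point cloud pairs whose estimated transformation is close enough to the ground truth. The sketch below is illustrative only: the RMSE-based criterion, the 0.1 m threshold, and all function names are assumptions, not details taken from this page or confirmed as the paper's evaluation protocol.

```python
import math

def apply_transform(R, t, p):
    # Rigid transform R @ p + t (R: 3x3 nested list, t: 3-vector).
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def transform_rmse(points, est, gt):
    # Root-mean-square distance between the points moved by the
    # estimated transform and by the ground-truth transform.
    sq = 0.0
    for p in points:
        a = apply_transform(est[0], est[1], p)
        b = apply_transform(gt[0], gt[1], p)
        sq += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(sq / len(points))

def registration_recall(rmses, threshold=0.1):
    # Fraction of pairs whose RMSE falls below the threshold.
    # threshold=0.1 (metres) is an assumed value for illustration.
    return sum(e < threshold for e in rmses) / len(rmses)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[0.0, 0.0, 1.0], [1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
gt = (IDENTITY, [0.0, 0.0, 0.0])
good = (IDENTITY, [0.05, 0.0, 0.0])  # estimate that is 5 cm off
bad = (IDENTITY, [0.5, 0.0, 0.0])    # estimate that is 50 cm off

errors = [transform_rmse(points, est, gt) for est in (good, bad)]
recall = registration_recall(errors)
```

In this toy case one of the two estimates passes the assumed threshold, so `recall` is 0.5.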

Abstract

Learning cross-modal correspondences is essential for image-to-point cloud (I2P) registration. Existing methods achieve this mostly by utilizing metric learning to enforce feature alignment across modalities, disregarding the inherent modality gap between image and point data. Consequently, this paradigm struggles to ensure accurate cross-modal correspondences. To this end, inspired by the cross-modal generation success of recent large diffusion models, we propose Diff$^2$I2P, a fully Differentiable I2P registration framework, leveraging a novel and effective Diffusion prior for bridging the modality gap. Specifically, we propose a Control-Side Score Distillation (CSD) technique to distill knowledge from a depth-conditioned diffusion model to directly optimize the predicted transformation. However, the gradients on the transformation fail to backpropagate onto the cross-modal features…
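As background for the abstract, I2P registration seeks a rigid transformation that places the point cloud in the camera frame so that 3D points project onto their matching image pixels. The sketch below shows only this generic geometric setup with a pinhole camera. It is not the paper's pipeline and does not implement CSD; the intrinsics, the sample correspondence, and the function names are all hypothetical.

```python
import math

def apply_transform(R, t, p):
    # Rigid transform R @ p + t (R: 3x3 nested list, t: 3-vector).
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project(p_cam, fx, fy, cx, cy):
    # Pinhole projection of a camera-frame point to pixel coordinates.
    x, y, z = p_cam
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(R, t, point, pixel, intrinsics):
    # Pixel distance between the projected 3D point and its matched pixel.
    u, v = project(apply_transform(R, t, point), *intrinsics)
    return math.hypot(u - pixel[0], v - pixel[1])

# Hypothetical intrinsics (fx, fy, cx, cy) and one 3D-2D correspondence.
K = (500.0, 500.0, 320.0, 240.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point = [0.5, -0.25, 2.0]
pixel = (445.0, 177.5)

# A candidate pose that agrees with the correspondence, and one that is
# shifted 0.5 m along the optical axis.
err_correct = reprojection_error(IDENTITY, [0.0, 0.0, 0.0], point, pixel, K)
err_shifted = reprojection_error(IDENTITY, [0.0, 0.0, 0.5], point, pixel, K)
```

The matching pose yields zero reprojection error, while the shifted pose yields a positive one. A registration method can be judged by how well its predicted transformation drives errors of this kind toward zero across correspondences.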


Taxonomy

Topics: 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization · Advanced Vision and Imaging