DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao, Amy Lin, Jeff Tan, Jason Y. Zhang, Deva Ramanan, Shubham, Tulsiani

TL;DR
DiffusionSfM introduces a novel diffusion-based approach for direct multi-view 3D scene reconstruction and camera pose estimation, outperforming traditional methods and modeling uncertainty.
Contribution
It presents a transformer-based diffusion model that directly predicts scene geometry and camera poses from multi-view images, bypassing traditional pipeline stages.
Findings
Outperforms classical SfM methods on synthetic datasets.
Achieves superior results compared to existing learning-based approaches.
Effectively models uncertainty in 3D reconstruction and pose estimation.
Abstract
Current Structure-from-Motion (SfM) methods typically follow a two-stage pipeline, combining learned or geometric pairwise reasoning with a subsequent global optimization step. In contrast, we propose a data-driven multi-view reasoning approach that directly infers 3D scene geometry and camera poses from multi-view images. Our framework, DiffusionSfM, parameterizes scene geometry and cameras as pixel-wise ray origins and endpoints in a global frame and employs a transformer-based denoising diffusion model to predict them from multi-view inputs. To address practical challenges in training diffusion models with missing data and unbounded scene coordinates, we introduce specialized mechanisms that ensure robust learning. We empirically validate DiffusionSfM on both synthetic and real datasets, demonstrating that it outperforms classical and learning-based approaches while naturally…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topics3D Shape Modeling and Analysis · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
MethodsDiffusion
