Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli

TL;DR
Fast3R introduces a Transformer-based multi-view 3D reconstruction method that processes over a thousand images in a single forward pass, significantly improving speed and scalability over previous pairwise approaches.
Contribution
The paper presents Fast3R, a multi-view 3D reconstruction framework that generalizes the pairwise DUSt3R approach to many views, using a Transformer architecture to process all input images in parallel.
Findings
Reports state-of-the-art performance on camera pose estimation and 3D reconstruction.
Significantly reduces inference time compared with pairwise methods such as DUSt3R, which require a costly global alignment step.
Demonstrates robustness and scalability in large-scale multi-view scenarios.
Abstract
Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R's Transformer-based architecture forwards N images in a single forward pass, bypassing the need for iterative alignment. Through extensive experiments on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error…
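To make the scaling argument in the abstract concrete, here is a minimal sketch that counts how many network passes a pairwise method would need compared with a single multi-view pass. It assumes, purely for illustration, that the pairwise method processes every unordered image pair exactly once; the function names `pairwise_pass_count` and `enumerate_pairs` are hypothetical and are not part of Fast3R or DUSt3R.

```python
from itertools import combinations


def pairwise_pass_count(num_views: int) -> int:
    """Unordered image pairs among num_views images: N * (N - 1) / 2.

    Assumption: a pairwise method runs one pass per unordered pair.
    """
    return num_views * (num_views - 1) // 2


def enumerate_pairs(num_views: int) -> list[tuple[int, int]]:
    """List the index pairs explicitly (only practical for small N)."""
    return list(combinations(range(num_views), 2))


if __name__ == "__main__":
    # A single multi-view forward pass handles all N images at once.
    single_pass_count = 1
    for n in (2, 10, 100, 1000):
        print(f"N={n}: pairwise passes={pairwise_pass_count(n)}, "
              f"multi-view passes={single_pass_count}")
```

The pair count grows quadratically with the number of views, and under this assumption a pairwise method still needs a global alignment step afterwards to merge the per-pair results. That is the cost the abstract says a single forward pass avoids.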
Taxonomy
Topics: Medical Imaging Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
