MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds
Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan

TL;DR
MV-DUSt3R+ is a fast, single-stage neural network that improves multi-view scene reconstruction, pose estimation, and novel view synthesis by efficiently exchanging information across multiple views without pairwise processing or global optimization.
Contribution
The paper introduces MV-DUSt3R+, a novel single-stage network with multi-view decoder blocks and cross-reference-view fusion, enabling robust, fast multi-view reconstruction and synthesis.
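To make the idea of exchanging information across views concrete, here is a minimal NumPy sketch of generic cross-view attention, in which the tokens of one view attend to the tokens of all other views at once. It is an illustration under stated assumptions, not the paper's decoder block: the function name `cross_view_attention`, the tensor layout `(views, tokens, dim)`, and the omission of learned query/key/value projections are all hypothetical simplifications.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(tokens, view_idx):
    """Let the tokens of one view attend to the tokens of every other view.

    tokens:   array of shape (V, T, D) with V >= 2 views, T tokens per view,
              and D feature channels (hypothetical layout).
    view_idx: index of the view whose tokens act as queries.
    Returns an array of shape (T, D).
    """
    num_views, _, dim = tokens.shape
    query = tokens[view_idx]  # (T, D)
    # Stack the tokens of all remaining views into one key/value set.
    others = np.concatenate(
        [tokens[v] for v in range(num_views) if v != view_idx], axis=0
    )  # ((V - 1) * T, D)
    # Scaled dot-product attention without learned projections (assumption).
    scores = query @ others.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    return weights @ others


rng = np.random.default_rng(0)
views = rng.normal(size=(3, 5, 8))  # 3 views, 5 tokens each, 8 channels
fused = cross_view_attention(views, view_idx=0)
print(fused.shape)
```

Because the keys and values are pooled from every other view, a single call covers any number of views, with no loop over view pairs.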
Findings
Significantly outperforms prior multi-view reconstruction methods.
Reduces inference time to 2 seconds for scene reconstruction.
Enhances robustness to reference view selection.
Abstract
Recent sparse multi-view scene reconstruction advances like DUSt3R and MASt3R no longer require camera calibration and camera pose estimation. However, they only process a pair of views at a time to infer pixel-aligned pointmaps. When dealing with more than two views, a combinatorial number of error-prone pairwise reconstructions are usually followed by an expensive global optimization, which often fails to rectify the pairwise reconstruction errors. To handle more views, reduce errors, and improve inference time, we propose the fast single-stage feed-forward network MV-DUSt3R. At its core are multi-view decoder blocks which exchange information across any number of views while considering one reference view. To make our method robust to reference view selection, we further propose MV-DUSt3R+, which employs cross-reference-view blocks to fuse information across different reference view…
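The "combinatorial number" of pairwise reconstructions mentioned in the abstract can be illustrated with a short calculation. The sketch below counts unordered view pairs, assuming one reconstruction per pair; the helper name `pairwise_count` is hypothetical, and the exact pairing scheme used by any given pairwise method may differ (for example, by processing ordered pairs or a subset of pairs).

```python
from math import comb


def pairwise_count(num_views: int) -> int:
    """Number of unordered view pairs among num_views views."""
    return comb(num_views, 2)


# The pair count grows quadratically with the number of views,
# whereas a single-stage network runs one forward pass regardless.
for n in (2, 4, 8, 16, 24):
    print(f"{n} views: {pairwise_count(n)} pairs")
```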
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
