MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds

Zhenggang Tang; Yuchen Fan; Dilin Wang; Hongyu Xu; Rakesh Ranjan; Alexander Schwing; Zhicheng Yan

arXiv:2412.06974·cs.CV·December 11, 2024

TL;DR

MV-DUSt3R+ is a fast, single-stage neural network that improves multi-view scene reconstruction, pose estimation, and novel view synthesis by efficiently exchanging information across multiple views without pairwise processing or global optimization.

Contribution

The paper introduces MV-DUSt3R+, a novel single-stage network with multi-view decoder blocks and cross-reference-view fusion, enabling robust, fast multi-view reconstruction and synthesis.

Findings

01

Significantly outperforms prior multi-view reconstruction methods.

02

Reduces inference time to 2 seconds for scene reconstruction.

03

Enhances robustness to reference view selection.

Abstract

Recent sparse multi-view scene reconstruction advances like DUSt3R and MASt3R no longer require camera calibration and camera pose estimation. However, they only process a pair of views at a time to infer pixel-aligned pointmaps. When dealing with more than two views, a combinatorial number of error-prone pairwise reconstructions are usually followed by an expensive global optimization, which often fails to rectify the pairwise reconstruction errors. To handle more views, reduce errors, and improve inference time, we propose the fast single-stage feed-forward network MV-DUSt3R. At its core are multi-view decoder blocks which exchange information across any number of views while considering one reference view. To make our method robust to reference view selection, we further propose MV-DUSt3R+, which employs cross-reference-view blocks to fuse information across different reference view…
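To make the "combinatorial number" of pairwise reconstructions concrete, here is a minimal sketch that counts how many two-view inferences a pairwise pipeline would need if it covered every pair of input views. The function name `pairwise_passes` and the `ordered` flag are hypothetical and used only for illustration. Whether a given pipeline runs every pair, or runs each pair in both directions, is an assumption and is not stated in the abstract.

```python
from math import comb


def pairwise_passes(num_views: int, ordered: bool = False) -> int:
    """Count two-view inferences needed to cover every pair of views.

    Hypothetical helper for illustration only; not part of any released code.
    ordered=False counts unordered pairs, N * (N - 1) / 2.
    ordered=True counts each pair in both directions, N * (N - 1).
    """
    if num_views < 2:
        return 0  # a single view (or none) forms no pair
    pairs = comb(num_views, 2)
    return 2 * pairs if ordered else pairs


if __name__ == "__main__":
    # The pair count grows quadratically with the number of views,
    # while a single-stage network runs one forward pass regardless.
    for n in (2, 4, 8, 16, 24):
        print(n, pairwise_passes(n), pairwise_passes(n, ordered=True))
```

Under these assumptions the count grows quadratically in the number of views, which is the cost that a single feed-forward pass over all views is meant to avoid.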

Code & Models

Repositories

facebookresearch/mvdust3r
pytorch

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques