Review of Feed-forward 3D Reconstruction: From DUSt3R to VGGT

Wei Zhang; Yihang Wu; Songhua Li; Wenjie Ma; Xin Ma; Qiang Li; Qi Wang

arXiv:2507.08448·cs.CV·July 14, 2025

TL;DR

This paper systematically reviews feed-forward deep learning models for 3D reconstruction, highlighting their technical frameworks, advantages over traditional methods, and future challenges in scalability and dynamic scene handling.

Contribution

It provides a comprehensive analysis of feed-forward 3D reconstruction models like DUSt3R, contrasting them with traditional and earlier learning-based methods, and discusses future research directions.

Findings

01. Feed-forward models enable direct, single-pass 3D reconstruction from images.

02. Transformer-based correspondence modeling improves accuracy.

03. Challenges include scalability and dynamic scene handling.

Abstract

3D reconstruction, which aims to recover the dense three-dimensional structure of a scene, is a cornerstone technology for numerous applications, including augmented/virtual reality, autonomous driving, and robotics. While traditional pipelines like Structure from Motion (SfM) and Multi-View Stereo (MVS) achieve high precision through iterative optimization, they are limited by complex workflows, high computational cost, and poor robustness in challenging scenarios like texture-less regions. Recently, deep learning has catalyzed a paradigm shift in 3D reconstruction. A new family of models, exemplified by DUSt3R, has pioneered a feed-forward approach. These models employ a unified deep network to jointly infer camera poses and dense geometry directly from an unconstrained set of images in a single forward pass. This survey provides a systematic review of this emerging domain. We begin…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

Methods: Sparse Evolutionary Training