A Comparison of Multi-View Stereo Methods for Photogrammetric 3D Reconstruction: From Traditional to Learning-Based Approaches
Yawen Li, George Vosselman, Francesco Nex

TL;DR
This paper compares traditional and learning-based multi-view stereo methods for photogrammetric 3D reconstruction, evaluating accuracy, speed, and robustness across aerial datasets.
Contribution
It provides a comprehensive evaluation of traditional and state-of-the-art learning-based MVS approaches, highlighting their strengths and limitations.
Findings
Traditional methods such as COLMAP are accurate but slower than learning-based alternatives.
Learning-based methods are faster and more robust in challenging scenarios.
End-to-end approaches achieve competitive accuracy with faster runtimes.
Abstract
Photogrammetric 3D reconstruction has long relied on traditional Structure-from-Motion (SfM) and Multi-View Stereo (MVS) methods, which provide high accuracy but face challenges in speed and scalability. Recently, learning-based MVS methods have emerged, aiming for faster and more efficient reconstruction. This work presents a comparative evaluation between a representative traditional MVS pipeline (COLMAP) and state-of-the-art learning-based approaches, including geometry-guided methods (MVSNet, PatchmatchNet, MVSAnywhere, MVSFormer++) and end-to-end frameworks (Stereo4D, FoundationStereo, DUSt3R, MASt3R, Fast3R, VGGT). Two experiments were conducted on different aerial scenarios. The first experiment used the MARS-LVIG dataset, where ground-truth 3D reconstruction was provided by LiDAR point clouds. The second experiment used a public scene from the Pix4D official website, with ground…
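The abstract describes comparing reconstructions against LiDAR ground-truth point clouds, but this summary does not state which metrics the authors used. As an illustration only, the following minimal sketch computes common cloud-to-cloud measures (mean nearest-neighbour distance in both directions, plus the fraction of points within a distance threshold). The function names, the threshold, and the toy coordinates are all hypothetical, and the brute-force distance matrix is only practical for small clouds; a spatial index such as a KD-tree would be the usual choice for real aerial data.

```python
import numpy as np


def nearest_distances(source, target):
    """For each point in `source`, Euclidean distance to its nearest point in `target`.

    Brute force: builds the full (len(source), len(target)) distance matrix.
    """
    diff = source[:, None, :] - target[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def cloud_metrics(reconstruction, ground_truth, threshold):
    """Hypothetical helper: generic accuracy/completeness style measures.

    These are NOT confirmed to be the paper's metrics.
    """
    rec_to_gt = nearest_distances(reconstruction, ground_truth)  # how far off the output is
    gt_to_rec = nearest_distances(ground_truth, reconstruction)  # how much of the scene is covered
    return {
        "accuracy_mean": float(rec_to_gt.mean()),
        "completeness_mean": float(gt_to_rec.mean()),
        "precision": float((rec_to_gt <= threshold).mean()),
        "recall": float((gt_to_rec <= threshold).mean()),
    }


# Toy data (made up): three ground-truth points, two reconstructed points.
ground_truth = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
reconstruction = np.array([[0.0, 0.0, 0.1], [1.0, 0.0, 0.1]])

metrics = cloud_metrics(reconstruction, ground_truth, threshold=0.5)
print(metrics)
```

In this toy case every reconstructed point lies close to the ground truth, while one ground-truth point has no nearby reconstructed point, so the reconstruction-to-ground-truth measures look good and the ground-truth-to-reconstruction measures expose the gap. Reporting both directions is what separates "accurate" from "complete".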
