TL;DR
This paper evaluates deep multi-view stereo methods on uncontrolled internet photo collections, revealing limitations of existing approaches and proposing a new training methodology suitable for real-world data.
Contribution
It introduces a new training approach enabling unsupervised deep MVS on real-world data and highlights the importance of evaluation in uncontrolled scenarios.
Findings
Complex unsupervised approaches fail to train on Internet photo collections in the wild.
Supervised deep depthmap-based MVS methods are state of the art when only a few Internet images are available.
Evaluation results differ significantly from controlled dataset benchmarks.
Abstract
Deep multi-view stereo (MVS) methods have been developed and extensively compared on simple datasets, where they now outperform classical approaches. In this paper, we ask whether the conclusions reached in controlled scenarios are still valid when working with Internet photo collections. We propose a methodology for evaluation and explore the influence of three aspects of deep MVS methods: network architecture, training data, and supervision. We make several key observations, which we extensively validate quantitatively and qualitatively, both for depth prediction and complete 3D reconstructions. First, complex unsupervised approaches cannot train on data in the wild. Our new approach makes it possible with three key elements: upsampling the output, softmin-based aggregation, and a single reconstruction loss. Second, supervised deep depthmap-based MVS methods are state-of-the-art for…
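To make the "softmin-based aggregation" element more concrete, the following is a minimal sketch of one common way to combine per-view reconstruction errors with softmin weights. It is an illustration under stated assumptions, not the paper's implementation: the function name `softmin_aggregate`, the sharpness parameter `beta`, and the `(V, H, W)` error layout are all hypothetical, and the abstract does not specify the exact formulation the authors use.

```python
import numpy as np

def softmin_aggregate(errors, beta=1.0):
    """Aggregate per-view error maps with softmin weights along the view axis.

    errors: array of shape (V, H, W), one error map per source view
            (assumed layout, not taken from the paper).
    beta:   hypothetical sharpness parameter; beta=0 gives the plain mean,
            large beta approaches a hard per-pixel minimum.
    """
    errors = np.asarray(errors, dtype=np.float64)
    # Shift by the per-pixel minimum so the exponentials stay numerically stable.
    shifted = -beta * (errors - errors.min(axis=0, keepdims=True))
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Low-error views receive high weight, so occluded or inconsistent views
    # contribute less to the aggregated value.
    return (weights * errors).sum(axis=0)

# Toy example: 3 source views, 4x4 error maps filled with random values.
rng = np.random.default_rng(0)
errors = rng.random((3, 4, 4))
agg = softmin_aggregate(errors, beta=10.0)
```

In this sketch the aggregated error always lies between the per-pixel minimum and the per-pixel mean over views, which is why a softmin is often used as a differentiable stand-in for selecting the best-matching view.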
