Long-tail Internet photo reconstruction
Yuan Li, Yuanbo Xiangli, Hadar Averbuch-Elor, Noah Snavely, Ruojin Cai

TL;DR
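This paper introduces MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling training images that mimic the camera distributions of long-tail scenes. Finetuning 3D foundation models with both improves reconstruction of sparse, noisy Internet photo collections.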
Contribution
It presents the MegaDepth-X dataset and a sampling method that simulates sparse, long-tail camera distributions from well-reconstructed landmarks, making 3D reconstruction more robust on long-tail Internet photo collections.
Findings
Finetuning with MegaDepth-X improves reconstruction in sparse scenes.
Enhanced models handle symmetric and repetitive scenes better.
The method maintains performance on standard dense 3D benchmarks.
Abstract
Internet photo collections exhibit an extremely long-tailed distribution: a few famous landmarks are densely photographed and easily reconstructed in 3D, while most real-world sites are represented with sparse, noisy, uneven imagery beyond the capabilities of both classical and learned 3D methods. We believe that tackling this long-tail regime represents one of the next frontiers for 3D foundation models. Although reliable ground-truth 3D supervision from sparse scenes is challenging to acquire, we observe that it can be effectively simulated by sampling sparse subsets from well-reconstructed Internet landmarks. To this end, we introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling sets of training images that mimic camera distributions in long-tail scenes. Finetuning 3D foundation models with these components yields…
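The idea of simulating sparse scenes by subsampling a dense reconstruction can be illustrated with a minimal sketch. This is not the paper's sampling strategy, which the abstract does not specify; it swaps in plain greedy farthest-point sampling over camera centers as a stand-in, and the function name sample_sparse_subset, its parameters, and the input format (a list of 3D camera centers) are all hypothetical.

```python
import math
import random


def sample_sparse_subset(centers, k, seed=0):
    """Return k distinct camera indices that are spread far apart.

    Illustrative assumption only: greedy farthest-point sampling on
    camera centers, standing in for a long-tail sampling strategy.
    """
    if not 0 < k <= len(centers):
        raise ValueError("k must be between 1 and the number of cameras")
    rng = random.Random(seed)
    first = rng.randrange(len(centers))
    chosen = [first]
    # Distance from each camera to its nearest already-chosen camera.
    nearest = [math.dist(c, centers[first]) for c in centers]
    nearest[first] = -1.0  # mark as taken so it is never picked again
    while len(chosen) < k:
        # Pick the camera farthest from everything chosen so far.
        nxt = max(range(len(centers)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [
            min(d, math.dist(c, centers[nxt]))
            for d, c in zip(nearest, centers)
        ]
        nearest[nxt] = -1.0
    return chosen


# Toy "dense" scene: 200 cameras on a circle around a landmark.
dense_centers = [
    (math.cos(2 * math.pi * i / 200), math.sin(2 * math.pi * i / 200), 0.0)
    for i in range(200)
]
sparse_indices = sample_sparse_subset(dense_centers, k=5, seed=1)
```

Because the dense reconstruction already supplies poses and depth for every image, any subset drawn this way inherits ground-truth supervision, which is the property the abstract relies on. A real sampler would presumably also model uneven and noisy coverage rather than only wide baselines.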
