TL;DR
Metric3D v2 introduces a versatile monocular geometric foundation model that achieves zero-shot metric depth and surface normal estimation from a single image, enabling accurate 3D recovery without task-specific training.
Contribution
It proposes a canonical camera space transformation module and a joint depth-normal optimization module, allowing stable training on large-scale, diverse data for zero-shot generalization.
Findings
Achieves zero-shot metric depth and normal estimation on in-the-wild images.
Trained on over 16 million images from diverse camera models.
Enables plausible single-image 3D metrology.
Abstract
We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training. We propose a canonical camera space transformation module, which…
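To illustrate why affine-invariant depth "cannot recover real-world metrics", here is a minimal sketch in plain Python. It is not code from the paper: the function name `fit_scale_shift`, the depth values, and the scale and shift constants are all hypothetical. It assumes an affine-invariant prediction equals the true depth up to an unknown scale and shift, and shows that turning it back into metres requires a least-squares fit against ground-truth metric depth, which is not available at zero-shot inference time.

```python
def fit_scale_shift(pred, gt):
    """Least-squares scale s and shift t such that s * pred + t approximates gt."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    s = cov / var_p
    t = mean_g - s * mean_p
    return s, t


# Hypothetical ground-truth metric depths in metres.
metric_depth = [1.0, 2.0, 4.0, 8.0]

# Hypothetical affine-invariant prediction: correct only up to an
# unknown scale (0.5) and shift (0.3), both chosen for illustration.
affine_pred = [0.5 * d + 0.3 for d in metric_depth]

# Recovering metres needs the ground truth to solve for scale and shift.
s, t = fit_scale_shift(affine_pred, metric_depth)
recovered = [s * p + t for p in affine_pred]
```

Without `metric_depth`, the values of `s` and `t` are undetermined, so any rescaled or shifted version of `affine_pred` is an equally valid output. That is the ambiguity a metric depth model has to resolve on its own.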
