Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

TL;DR
This paper introduces DMD, a diffusion-based model for zero-shot metric depth estimation that effectively handles indoor and outdoor scenes by using FOV conditioning and diverse training, achieving state-of-the-art results.
Contribution
The paper proposes a generic, task-agnostic diffusion model with FOV conditioning and synthetic augmentation for improved zero-shot depth estimation across indoor and outdoor scenes.
Findings
25% reduction in relative error (REL) on zero-shot indoor datasets
33% reduction in relative error (REL) on zero-shot outdoor datasets
Achieves state-of-the-art zero-shot depth estimation performance
Abstract
While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized multi-head architectures for jointly modeling indoor and outdoor scenes. In contrast, we advocate a generic, task-agnostic diffusion model, with several advancements such as log-scale depth parameterization to enable joint modeling of indoor and outdoor scenes, conditioning on the field-of-view (FOV) to handle scale ambiguity and synthetically augmenting FOV during training to generalize beyond the limited camera intrinsics in training datasets. Furthermore, by employing a more diverse training…
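The abstract names a log-scale depth parameterization and FOV conditioning with synthetic FOV augmentation. The sketch below is a minimal illustration of those ideas, not the authors' implementation. The function names, the depth bounds `d_min=0.5` and `d_max=80.0`, the [-1, 1] target range, and the use of center cropping as the augmentation are all assumptions made for this example. The FOV formulas are the standard pinhole-camera relations.

```python
import math


def depth_to_log_scale(depth_m, d_min=0.5, d_max=80.0):
    """Map metric depth (meters) to [-1, 1] on a log scale.

    d_min and d_max are assumed bounds chosen for illustration.
    """
    d = min(max(depth_m, d_min), d_max)  # clip to the supported range
    unit = math.log(d / d_min) / math.log(d_max / d_min)  # in [0, 1]
    return 2.0 * unit - 1.0


def log_scale_to_depth(x, d_min=0.5, d_max=80.0):
    """Inverse of depth_to_log_scale: recover metric depth in meters."""
    unit = (x + 1.0) / 2.0
    return d_min * math.exp(unit * math.log(d_max / d_min))


def horizontal_fov(width_px, focal_px):
    """Horizontal field of view (radians) of a pinhole camera."""
    return 2.0 * math.atan(width_px / (2.0 * focal_px))


def fov_after_center_crop(fov_rad, crop_fraction):
    """FOV after keeping the central crop_fraction of the image width.

    Cropping narrows the FOV while the focal length stays fixed, which is
    one simple (assumed) way to synthesize new FOV values from one image.
    """
    return 2.0 * math.atan(crop_fraction * math.tan(fov_rad / 2.0))


if __name__ == "__main__":
    for d in (0.5, 2.0, 10.0, 80.0):
        x = depth_to_log_scale(d)
        print(f"depth {d:5.1f} m -> {x:+.3f} -> {log_scale_to_depth(x):.3f} m")
    fov = horizontal_fov(640, 320)
    print(f"fov {math.degrees(fov):.1f} deg, "
          f"cropped {math.degrees(fov_after_center_crop(fov, 0.5)):.1f} deg")
```

On a log scale, a fixed step in the normalized value corresponds to a fixed ratio of depths, so near-range indoor depths and far-range outdoor depths share the output range more evenly than under a linear mapping. The FOV value computed here is the kind of scalar a model could be conditioned on to resolve scale ambiguity.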
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
Methods: Diffusion
