Towards Zero-Shot Scale-Aware Monocular Depth Estimation
Vitor Guizilini, Igor Vasiljevic, Dian Chen, Rares Ambrus, Adrien Gaidon

TL;DR
ZeroDepth introduces a novel monocular depth estimation framework that predicts metric scale across diverse domains by leveraging geometric embeddings and a decoupled encoder-decoder architecture, achieving state-of-the-art results.
Contribution
The paper presents ZeroDepth, a new approach that enables zero-shot, scale-aware monocular depth estimation across multiple domains without test-time scaling.
Findings
Achieves state-of-the-art performance on outdoor benchmarks like KITTI and nuScenes.
Outperforms methods trained in-domain that require test-time scaling.
Successfully generalizes to indoor datasets like NYUv2 without domain-specific tuning.
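In the monocular depth literature, a common form of "test-time scaling" is per-image median scaling. Before metrics are computed, the prediction is multiplied by median(ground truth) / median(prediction), which uses ground-truth information that a deployed system would not have. A scale-aware model is evaluated without this step. The sketch below illustrates the convention. The helper names `median_scale` and `abs_rel` are hypothetical, and ZeroDepth's exact evaluation protocol is not stated on this page.

```python
from statistics import median


def median_scale(pred, gt):
    """Rescale an up-to-scale depth prediction so its median matches ground truth.

    pred, gt: flat lists of per-pixel depths; gt <= 0 marks pixels without
    ground truth, which are ignored when estimating the scale factor.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    preds = [p for p, _ in valid]
    gts = [g for _, g in valid]
    # The scale factor is derived from ground truth, so this step "peeks" at the answer.
    scale = median(gts) / median(preds)
    return [p * scale for p in pred], scale


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# An up-to-scale prediction that is off by a global factor of 10.
scaled, s = median_scale([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(s, scaled)  # 10.0 [10.0, 20.0, 30.0]
```

A metric-scale model is scored with `abs_rel(pred, gt)` directly. A relative-depth model is scored with `abs_rel(median_scale(pred, gt)[0], gt)`, which hides any global scale error.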
Abstract
Monocular depth estimation is scale-ambiguous, and thus requires scale supervision to produce metric predictions. Even so, the resulting models will be geometry-specific, with learned scales that cannot be directly transferred across domains. Because of that, recent works focus instead on relative depth, eschewing scale in favor of improved up-to-scale zero-shot transfer. In this work we introduce ZeroDepth, a novel monocular depth estimation framework capable of predicting metric scale for arbitrary test images from different domains and camera parameters. This is achieved by (i) the use of input-level geometric embeddings that enable the network to learn a scale prior over objects; and (ii) decoupling the encoder and decoder stages, via a variational latent representation that is conditioned on single frame information. We evaluated ZeroDepth targeting both outdoor (KITTI, DDAD,…
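The abstract does not specify the form of the input-level geometric embeddings or of the variational latent. The sketch below makes three assumptions:

- The embeddings are per-pixel viewing-ray directions, K^{-1} [u, v, 1]^T, computed from the camera intrinsics (fx, fy, cx, cy).
- These rays are passed through a sin/cos Fourier encoding.
- The latent is sampled with the standard reparameterization trick, z = mu + sigma * eps.

Because the rays depend on the intrinsics, the same object seen through different lenses receives different embeddings. This is plausibly what lets a network relate apparent size to metric scale. All function names are hypothetical and not the authors' code. In a real model, `mu` and `logvar` would be produced by an encoder from a single frame.

```python
import math
import random


def pixel_rays(width, height, fx, fy, cx, cy):
    """Unit viewing-ray directions K^{-1} [u, v, 1]^T for every pixel, row-major."""
    rays = []
    for v in range(height):
        for u in range(width):
            # Sampling at pixel centres (u + 0.5) is an assumption; conventions vary.
            x = (u + 0.5 - cx) / fx
            y = (v + 0.5 - cy) / fy
            norm = math.sqrt(x * x + y * y + 1.0)
            rays.append((x / norm, y / norm, 1.0 / norm))
    return rays


def fourier_encode(vec, num_bands=4, max_freq=8.0):
    """Concatenate the input with sin/cos features at log-spaced frequencies 1..max_freq."""
    feats = list(vec)
    for k in range(num_bands):
        freq = max_freq ** (k / max(num_bands - 1, 1))
        for x in vec:
            feats.append(math.sin(math.pi * freq * x))
            feats.append(math.cos(math.pi * freq * x))
    return feats


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + exp(0.5 * logvar) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


# A 3x3 image whose principal point sits at the centre of the middle pixel.
rays = pixel_rays(3, 3, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(rays[4])                     # (0.0, 0.0, 1.0): the optical axis
print(len(fourier_encode(rays[0])))  # 3 + 4 bands * 3 dims * 2 = 27
z = sample_latent([1.0, -2.0], [-2.0, -2.0], random.Random(0))
```

The ray embeddings play the role the abstract assigns to geometric embeddings, supplying camera information at the input. Sampling `z` instead of using `mu` directly is what makes the latent variational.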
