GeoDiff: Geometry-Guided Diffusion for Metric Depth Estimation
Tuan Pham, Thanh-Tung Le, Xiaohui Xie, Stephan Mandt

TL;DR
GeoDiff introduces a geometry-guided diffusion framework that enhances monocular depth estimation with stereo cues, effectively recovering absolute metric depth across diverse environments without retraining.
Contribution
It reframes depth estimation as an inverse problem using pretrained diffusion models and stereo constraints, enabling accurate metric depth recovery without additional training.
Findings
Matches or surpasses state-of-the-art methods across indoor, outdoor, and complex environments.
Effective in challenging scenarios with translucent and specular surfaces.
No retraining required for the proposed approach.
Abstract
We introduce a novel framework for metric depth estimation that enhances pretrained diffusion-based monocular depth estimation (DB-MDE) models with stereo vision guidance. While existing DB-MDE methods excel at predicting relative depth, estimating absolute metric depth remains challenging due to scale ambiguities in single-image scenarios. To address this, we reframe depth estimation as an inverse problem, leveraging pretrained latent diffusion models (LDMs) conditioned on RGB images, combined with stereo-based geometric constraints, to learn scale and shift for accurate depth recovery. Our training-free solution seamlessly integrates into existing DB-MDE frameworks and generalizes across indoor, outdoor, and complex environments. Extensive experiments demonstrate that our approach matches or surpasses state-of-the-art methods, particularly in challenging scenarios involving…
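The abstract does not spell out how scale and shift are obtained inside the diffusion sampling loop. As a rough illustration of the underlying idea only, the sketch below aligns a relative depth map to stereo-derived metric depth with a closed-form least-squares fit. This is a simple stand-in, not GeoDiff's guided-diffusion procedure; the function names, the rectified-stereo assumption, and the parameters `focal_px` and `baseline_m` are all hypothetical and chosen for the example.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity map (pixels) from a rectified stereo pair to metric depth.

    Uses depth = focal_px * baseline_m / disparity. Pixels with non-positive
    disparity are marked invalid and left as NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.nan, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth, valid


def fit_scale_shift(relative_depth, metric_depth, valid_mask):
    """Least-squares fit of (s, t) so that s * relative_depth + t ~ metric_depth.

    Only pixels where valid_mask is True contribute to the fit.
    """
    x = np.asarray(relative_depth, dtype=np.float64)[valid_mask]
    y = np.asarray(metric_depth, dtype=np.float64)[valid_mask]
    # Design matrix with one column for the scale and one for the shift.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)


def to_metric(relative_depth, s, t):
    """Apply the fitted scale and shift to the whole relative depth map."""
    return s * np.asarray(relative_depth, dtype=np.float64) + t
```

In this simplified view, the monocular model supplies the dense relative depth, the stereo pair supplies sparse or noisy metric anchors, and the two fitted numbers resolve the scale ambiguity that a single image cannot.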
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Image Processing Techniques and Applications
