BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Xiang Zhang; Bingxin Ke; Hayko Riemenschneider; Nando Metzger; Anton Obukhov; Markus Gross; Konrad Schindler; Christopher Schroers

arXiv:2407.17952 · cs.CV · November 7, 2024

TL;DR

BetterDepth is a diffusion-based refiner that enhances zero-shot monocular depth estimation by combining global geometric accuracy with fine detail refinement, achieving state-of-the-art results efficiently.

Contribution

The paper introduces a plug-and-play diffusion refiner that improves existing MDE models by refining fine details while preserving the geometric correctness of their predictions. The refiner is trained with two proposed techniques: global pre-alignment and local patch masking.

Findings

1. State-of-the-art zero-shot MDE performance achieved

2. Effective refinement of details in complex scenes

3. Plug-and-play enhancement for existing models

Abstract

By training over large-scale datasets, zero-shot monocular depth estimation (MDE) methods show robust performance in the wild but often suffer from insufficient detail. Although recent diffusion-based MDE approaches exhibit a superior ability to extract details, they struggle in geometrically complex scenes that challenge their geometry prior, trained on less diverse 3D data. To leverage the complementary merits of both worlds, we propose BetterDepth to achieve geometrically correct affine-invariant MDE while capturing fine details. Specifically, BetterDepth is a conditional diffusion-based refiner that takes the prediction from pre-trained MDE models as depth conditioning, in which the global depth layout is well-captured, and iteratively refines details based on the input image. For the training of such a refiner, we propose global pre-alignment and local patch masking methods to…
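The abstract names global pre-alignment and local patch masking but is cut off before describing them. The sketch below is one plausible reading, not the authors' implementation. It assumes that pre-alignment is a least-squares scale-and-shift fit of the pre-trained model's depth prediction to a reference depth map (the usual way affine-invariant depth is compared), and that patch masking keeps only patches where the aligned prediction stays close to the reference. The function names `align_scale_shift` and `patch_mask`, the patch size, the tolerance, and the use of a per-patch mean error are all hypothetical choices made for illustration.

```python
import numpy as np


def align_scale_shift(pred, target):
    """Fit scale s and shift t by least squares so that s * pred + t ~ target.

    Assumption: this stands in for "global pre-alignment"; the source does
    not spell out the exact procedure.
    """
    x = pred.reshape(-1).astype(np.float64)
    y = target.reshape(-1).astype(np.float64)
    # Design matrix [x, 1] for the affine model y = s * x + t.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred + t, s, t


def patch_mask(aligned, target, patch=8, tol=0.05):
    """Return a boolean grid with True where a patch agrees with the target.

    Assumption: agreement is measured as mean absolute error per
    non-overlapping patch; `patch` and `tol` are hypothetical values.
    """
    h, w = target.shape
    ph, pw = h // patch, w // patch
    # Crop to a whole number of patches, then average the error per patch.
    err = np.abs(aligned - target)[: ph * patch, : pw * patch]
    err = err.reshape(ph, patch, pw, patch).mean(axis=(1, 3))
    return err <= tol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((16, 16))
    pred = (target - 0.3) / 2.0  # same layout, different scale and shift
    aligned, s, t = align_scale_shift(pred, target)
    print("scale:", s, "shift:", t)
    print(patch_mask(aligned, target))
```

Under this reading, the mask would restrict the training signal to regions where the conditioning depth already matches the reference, so the refiner learns to add detail rather than to override the global layout. Whether the paper uses a mean, a maximum, or another per-patch statistic is not stated in the abstract.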

Videos

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation · SlidesLive