TL;DR
This paper introduces a training-free method that turns a pre-trained diffusion model (Marigold) into a zero-shot metric depth estimator by incorporating defocus blur cues from a pair of images captured with different apertures.
Contribution
A novel approach that converts a pre-trained, scale-invariant diffusion depth model into a metric depth predictor using defocus cues, with no additional training.
Findings
Outperforms existing zero-shot MMDE methods on real datasets.
Effectively incorporates defocus cues at inference time.
Improves depth estimation accuracy and generalization.
Abstract
Recent monocular metric depth estimation (MMDE) methods have made notable progress towards zero-shot generalization. However, they still exhibit a significant performance drop on out-of-distribution datasets. We address this limitation by injecting defocus blur cues at inference time into Marigold, a pre-trained diffusion model for zero-shot, scale-invariant monocular depth estimation (MDE). Our method effectively turns Marigold into a metric depth predictor in a training-free manner. To incorporate defocus cues, we capture two images with a small and a large aperture from the same viewpoint. To recover metric depth, we then optimize the metric depth scaling parameters and the noise latents of Marigold at inference time using gradients from a loss function based on the defocus-blur image formation model. We compare our method against existing state-of-the-art zero-shot MMDE…
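To make the defocus-based loss concrete, here is a minimal NumPy sketch of the general idea, not the paper's implementation. It uses the standard thin-lens circle of confusion, c = A·|d − d_f|/d · f/(d_f − f), with aperture diameter A = f/N, focal length f, focus distance d_f and scene depth d. Several assumptions are ours: the small-aperture image is treated as all-in-focus; defocus is approximated by a per-pixel Gaussian with sigma ≈ c/2 (in pixels); metric depth is modeled as scale · relative depth + shift; a coarse grid search stands in for the paper's gradient-based optimization; and Marigold's noise latents are not modeled at all. All function names, camera values and data are hypothetical.

```python
import numpy as np


def coc_pixels(depth, focal_len, f_number, focus_dist, pixel_pitch):
    """Thin-lens circle-of-confusion diameter in pixels (distances in meters)."""
    aperture = focal_len / f_number  # aperture diameter A = f / N
    coc = aperture * np.abs(depth - focus_dist) / depth * focal_len / (focus_dist - focal_len)
    return coc / pixel_pitch


def render_defocus(sharp, sigma, radius=8):
    """Approximate spatially varying blur with a per-pixel Gaussian gather.

    Each output pixel is a normalized Gaussian-weighted average of its
    neighborhood, using that pixel's own sigma. This is a simplification;
    physically accurate defocus rendering scatters light and handles occlusion.
    """
    sigma = np.maximum(sigma, 1e-3)  # keeps in-focus pixels well defined
    h, w = sharp.shape
    padded = np.pad(sharp, radius, mode="reflect")
    acc = np.zeros_like(sharp)
    norm = np.zeros_like(sharp)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            weight = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma**2))
            acc += weight * padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            norm += weight  # center weight is 1, so norm never reaches zero
    return acc / norm


def defocus_loss(scale, shift, rel_depth, sharp, observed, cam):
    """MSE between the observed large-aperture image and a re-rendered one."""
    depth = scale * rel_depth + shift  # hypothetical affine map to metric depth
    sigma = 0.5 * coc_pixels(depth, **cam)  # assumption: Gaussian sigma ~ CoC / 2
    return np.mean((render_defocus(sharp, sigma) - observed) ** 2)


def fit_scale_shift(rel_depth, sharp, observed, cam, scales, shifts):
    """Coarse grid search (stand-in for gradient-based optimization)."""
    best_loss, best_s, best_t = np.inf, None, None
    for s in scales:
        for t in shifts:
            loss = defocus_loss(s, t, rel_depth, sharp, observed, cam)
            if loss < best_loss:
                best_loss, best_s, best_t = loss, s, t
    return best_s, best_t


# Synthetic example: all values below are made up for illustration.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))  # small-aperture image, treated as all-in-focus
rel_depth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))  # stand-in relative depth
cam = dict(focal_len=0.05, f_number=8.0, focus_dist=0.5, pixel_pitch=1e-4)
true_scale, true_shift = 2.0, 1.0
observed = render_defocus(sharp, 0.5 * coc_pixels(true_scale * rel_depth + true_shift, **cam))

scales = np.round(np.arange(0.5, 3.01, 0.25), 2)
shifts = np.round(np.arange(0.6, 1.51, 0.1), 2)
s_hat, t_hat = fit_scale_shift(rel_depth, sharp, observed, cam, scales, shifts)
print(s_hat, t_hat)  # should recover the true scale and shift on this toy data
```

The grid is chosen so that every candidate depth lies beyond the focus distance, where blur grows monotonically with depth; otherwise the thin-lens model is ambiguous between points in front of and behind the focal plane. Replacing the grid search with an autodiff framework would allow joint gradient-based updates of the scale, the shift and any other parameters, closer in spirit to the optimization the abstract describes.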