TL;DR
This paper uses depth-aware text-to-image diffusion models to synthesize challenging variants of easy scenes, then fine-tunes a monocular depth network on them through self-distillation, making depth estimation more robust and accurate on out-of-distribution data.
Contribution
It combines diffusion-based scene synthesis with self-distillation to fine-tune depth networks for better performance on difficult data.
Findings
Improved depth estimation accuracy on challenging benchmarks.
Effective synthesis of complex scenes with diffusion models.
Enhanced robustness of depth networks in out-of-distribution conditions.
Abstract
We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task. Starting with images that facilitate depth prediction due to the absence of unfavorable factors, we systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information. This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control, known for synthesizing high-quality image content from textual prompts while preserving the coherence of 3D structure between generated and source imagery. Subsequent fine-tuning of any monocular depth network is carried out through a self-distillation protocol that takes into account images generated using our strategy and its own depth predictions on simple, unchallenging scenes. Experiments on benchmarks tailored…
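The self-distillation idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the two-parameter "network", the MSE loss, the darkening used as a stand-in for a diffusion-generated hard image, and every name below (`predict`, `self_distill`, `easy`, `hard`) are assumptions made for illustration only. The one element taken from the abstract is the training signal: the network's own depth prediction on the easy image supervises its prediction on the challenging counterpart of the same scene.

```python
# Toy self-distillation sketch (hypothetical; not the paper's code).
# A "network" here is just depth = w * pixel + b, so gradients are written by hand.

def predict(params, image):
    """Apply the toy per-pixel depth model to a flat list of pixel intensities."""
    w, b = params
    return [w * px + b for px in image]

def mse(pred, target):
    """Mean squared error between two equally sized lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def self_distill(teacher_params, easy_image, hard_image, steps=2000, lr=0.5):
    """Fine-tune a copy of the model on the hard image using its own easy-image output."""
    # Pseudo-labels: the frozen model's depth on the easy scene (never updated).
    pseudo_depth = predict(teacher_params, easy_image)
    # The student starts from the same weights as the teacher.
    w, b = teacher_params
    n = len(hard_image)
    for _ in range(steps):
        pred = predict((w, b), hard_image)
        # Analytic MSE gradients with respect to w and b.
        grad_w = sum(2 * (p - t) * x for p, t, x in zip(pred, pseudo_depth, hard_image)) / n
        grad_b = sum(2 * (p - t) for p, t in zip(pred, pseudo_depth)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# Easy scene and a darkened copy standing in for a generated "hard" version
# that shares the same 3D structure (assumption: a real pipeline would use
# a depth-conditioned diffusion model here).
easy = [0.2, 0.4, 0.6, 0.8]
hard = [0.5 * px for px in easy]

teacher = (2.0, 0.0)
student = self_distill(teacher, easy, hard)

loss_before = mse(predict(teacher, hard), predict(teacher, easy))
loss_after = mse(predict(student, hard), predict(teacher, easy))
print(f"loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

In this toy setting the student learns to compensate for the darkening, so its prediction on the hard image matches what the original model predicted on the easy one. A real pipeline would presumably also keep easy images in the training mix so that performance on unchallenging scenes is preserved.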
Taxonomy
Methods: Sparse Evolutionary Training · Diffusion
