Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Gonzalo Martin Garcia, Karim Knaebel, Christian Schmidt, Daan de Geus, Alexander Hermans, Bastian Leibe

TL;DR
This paper identifies a flaw in the inference pipeline of diffusion-based depth estimators and shows that fixing it, combined with end-to-end fine-tuning, substantially improves both speed and accuracy, making these models more practical and competitive.
Contribution
The authors identify and fix a flaw in the inference pipeline, enabling faster and more accurate diffusion-based depth estimation through end-to-end fine-tuning.
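The summary does not name the flaw. As we read the paper, it concerns how a DDIM-style scheduler chooses timesteps when only a few (or one) inference steps are used. The sketch below compares the "leading" and "trailing" timestep-spacing rules used by the diffusers `DDIMScheduler`, reimplemented in NumPy so it runs on its own. The function name `ddim_timesteps` is hypothetical, and `steps_offset=1` is assumed from the Stable Diffusion scheduler config.

```python
import numpy as np

def ddim_timesteps(num_inference_steps, num_train_timesteps=1000,
                   spacing="leading", steps_offset=1):
    """Return the descending timesteps a DDIM-style sampler would visit.

    Hypothetical helper mirroring (as an assumption) the 'leading' and
    'trailing' spacing rules of the diffusers DDIMScheduler.
    """
    if spacing == "leading":
        # integer stride starting from 0, then shifted by steps_offset
        step_ratio = num_train_timesteps // num_inference_steps
        ts = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1]
        ts = ts.astype(np.int64) + steps_offset
    elif spacing == "trailing":
        # float stride counting down from the last training timestep
        step_ratio = num_train_timesteps / num_inference_steps
        ts = np.round(np.arange(num_train_timesteps, 0, -step_ratio)).astype(np.int64) - 1
    else:
        raise ValueError(f"unknown spacing: {spacing}")
    return ts

for n in (1, 4, 10):
    print(n, "leading:", ddim_timesteps(n, spacing="leading").tolist(),
          "trailing:", ddim_timesteps(n, spacing="trailing").tolist())
```

With a single step, "leading" spacing yields timestep 1 (the model is told its input is almost noise-free), while "trailing" yields 999, which matches the pure-noise input that single-step inference actually feeds in.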
Findings
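The fixed inference pipeline is more than 200× faster while performing comparably to the best previously reported configuration
End-to-end fine-tuned models outperform all other diffusion-based depth and normal estimation models on common zero-shot benchmarks
End-to-end fine-tuning also works when applied directly to Stable Diffusion, matching state-of-the-art performance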
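Why the timestep label matters so much for single-step inference can be seen from the noise schedule alone. The sketch assumes the Stable Diffusion "scaled_linear" beta schedule (0.00085 to 0.012 over 1000 steps) and a v-prediction parameterization, where the clean estimate is $x_0 = \sqrt{\bar\alpha_t}\,x_t - \sqrt{1-\bar\alpha_t}\,v$. The network output `v_pred` is a random stand-in, not a real model.

```python
import numpy as np

def sd_alphas_cumprod(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012):
    # "scaled_linear" schedule: linearly spaced sqrt(beta), then squared
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps) ** 2
    return np.cumprod(1.0 - betas)

def x0_from_v(x_t, v, alpha_bar_t):
    # v-prediction: x0 = sqrt(a) * x_t - sqrt(1 - a) * v
    return np.sqrt(alpha_bar_t) * x_t - np.sqrt(1.0 - alpha_bar_t) * v

alphas_cumprod = sd_alphas_cumprod()
rng = np.random.default_rng(0)
noise = rng.standard_normal(4)   # single-step input: a pure-noise latent
v_pred = rng.standard_normal(4)  # stand-in for the network's v output

gap = {}
for t in (1, 999):
    x0 = x0_from_v(noise, v_pred, alphas_cumprod[t])
    # mean distance between the "estimate" and the input noise
    gap[t] = float(np.abs(x0 - noise).mean())
    print(t, "weight on x_t:", round(float(np.sqrt(alphas_cumprod[t])), 4),
          "|x0 - noise|:", round(gap[t], 4))
```

At t = 1 the weight on the noisy input is close to 1, so the single-step "prediction" is essentially the input noise; at t = 999 the estimate is dominated by the network output.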
Abstract
Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. The fixed model performs comparably to the best previously reported configuration while being more than 200× faster. To optimize for downstream task performance, we perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models on common zero-shot benchmarks. We surprisingly find that this…
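The abstract mentions "task-specific losses" for end-to-end fine-tuning without listing them. As a hedged illustration, the sketch shows two common choices for these tasks: a scale-and-shift-invariant L1 loss for relative depth (aligning the prediction to the target by least squares first) and a mean angular error for surface normals. Both helpers are hypothetical and written in NumPy for self-containment; in an actual fine-tuning loop they would be computed on framework tensors so gradients flow back through the model.

```python
import numpy as np

def affine_invariant_l1(pred, target, mask=None):
    """Scale-and-shift-invariant L1 depth loss (hypothetical helper).

    Solves min_{s,b} ||s * pred + b - target||^2 over valid pixels in
    closed form, then returns the mean absolute error after alignment.
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    target = np.asarray(target, dtype=np.float64).ravel()
    mask = np.ones(pred.shape, dtype=bool) if mask is None else np.asarray(mask, dtype=bool).ravel()
    p, g = pred[mask], target[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)  # design matrix columns: [pred, 1]
    (s, b), *_ = np.linalg.lstsq(A, g, rcond=None)
    return float(np.mean(np.abs(s * p + b - g)))

def angular_loss(pred_normals, target_normals, eps=1e-8):
    """Mean angle in radians between normal maps of shape (..., 3) (hypothetical helper)."""
    p = np.asarray(pred_normals, dtype=np.float64)
    t = np.asarray(target_normals, dtype=np.float64)
    p = p / np.maximum(np.linalg.norm(p, axis=-1, keepdims=True), eps)
    t = t / np.maximum(np.linalg.norm(t, axis=-1, keepdims=True), eps)
    cos = np.clip(np.sum(p * t, axis=-1), -1.0, 1.0)  # guard arccos domain
    return float(np.mean(np.arccos(cos)))

rng = np.random.default_rng(1)
depth = rng.random(100)
print(affine_invariant_l1(depth, 2.0 * depth + 3.0))  # near zero: target is an affine copy
```

Aligning scale and shift before measuring error lets a model trained on relative depth be compared fairly against metric ground truth, which is the usual motivation for this kind of loss.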
Taxonomy
Topics: Medical Imaging Techniques and Applications
Methods: Diffusion
