What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
Guangkai Xu, Yongtao Ge, Mingyu Liu, Chengxiang Fan, Kangyang Xie, Zhiyue Zhao, Hao Chen, Chunhua Shen

TL;DR
This paper investigates key factors affecting the transfer efficiency of diffusion models for dense perception tasks, emphasizing data quality, the stochastic nature of diffusion, and supervision strategies, leading to a fast, effective fine-tuning paradigm called GenPercept.
Contribution
It provides a comprehensive analysis of diffusion model fine-tuning for perception tasks and introduces GenPercept, a one-step deterministic fine-tuning method with improved speed and performance.
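To make the contrast between multi-step stochastic inference and a one-step deterministic pass concrete, here is a minimal toy sketch in plain Python. It is not the GenPercept implementation: `toy_denoiser` is a hypothetical stand-in for a fine-tuned network, and feeding the image latent directly at a fixed timestep is an assumption made for illustration. It only shows that seeded noise makes predictions vary from run to run, while a single noise-free forward pass does not.

```python
import math
import random


def toy_denoiser(x, t):
    # Hypothetical stand-in for a fine-tuned network:
    # an elementwise map whose output is scaled by the timestep t.
    return [math.tanh(0.8 * v) * (1.0 - 0.1 * t) for v in x]


def multi_step_stochastic(image_latent, steps, seed):
    # Start from Gaussian noise and refine it over several steps,
    # conditioning each step on the image latent.
    rng = random.Random(seed)
    pred = [rng.gauss(0.0, 1.0) for _ in image_latent]
    for i in range(steps):
        t = 1.0 - i / steps
        denoised = toy_denoiser([a + b for a, b in zip(image_latent, pred)], t)
        pred = [0.5 * p + 0.5 * d for p, d in zip(pred, denoised)]
    return pred


def one_step_deterministic(image_latent, t_fixed=1.0):
    # Assumed setup: no noise input, fixed timestep, one forward pass.
    return toy_denoiser(image_latent, t_fixed)


latent = [0.2, -0.5, 1.0, 0.0]
stochastic_a = multi_step_stochastic(latent, steps=4, seed=0)
stochastic_b = multi_step_stochastic(latent, steps=4, seed=1)
deterministic_a = one_step_deterministic(latent)
deterministic_b = one_step_deterministic(latent)

print(stochastic_a != stochastic_b)        # different seeds, different predictions
print(deterministic_a == deterministic_b)  # same input, same prediction
```

In this toy setting, the stochastic variant would need several seeded runs averaged together to stabilize its output, whereas the deterministic variant needs one call.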
Findings
High-quality fine-tuning data is crucial for perception tasks.
The stochastic nature of diffusion models slightly hinders performance on deterministic perception tasks.
Task-specific image-level supervision enhances fine-grained details.
Abstract
Extensive pre-training with large data is indispensable for downstream geometry and semantic visual perception tasks. Thanks to large-scale text-to-image (T2I) pretraining, recent works show promising results by simply fine-tuning T2I diffusion models for dense perception tasks. However, several crucial design decisions in this process still lack comprehensive justification, encompassing the necessity of the multi-step stochastic diffusion mechanism, training strategy, inference ensemble strategy, and fine-tuning data quality. In this work, we conduct a thorough investigation into critical factors that affect transfer efficiency and performance when using diffusion priors. Our key findings are: 1) High-quality fine-tuning data is paramount for both semantic and geometry perception tasks. 2) The stochastic nature of diffusion models has a slightly negative impact on deterministic visual…
Taxonomy
Topics: Data Visualization and Analytics
Methods: Diffusion · ALIGN
