CleanDIFT: Diffusion Features without Noise
Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer

TL;DR
This paper introduces a lightweight unsupervised fine-tuning approach for diffusion models that produces high-quality, noise-free semantic features, significantly improving downstream task performance without the need for noise addition or ensembling.
Contribution
It presents a novel fine-tuning method enabling diffusion models to generate effective noise-free features, surpassing previous noisy feature extraction techniques.
Findings
Outperforms previous diffusion features in various tasks
Incurs lower computational cost than ensemble methods
Produces high-quality, noise-free semantic features
Abstract
Internal features from large-scale pre-trained diffusion models have recently been established as powerful semantic descriptors for a wide range of downstream tasks. Works that use these features generally need to add noise to images before passing them through the model to obtain the semantic features, as the models do not offer the most useful features when given images with little to no noise. We show that this noise has a critical impact on the usefulness of these features that cannot be remedied by ensembling with different random noises. We address this issue by introducing a lightweight, unsupervised fine-tuning method that enables diffusion backbones to provide high-quality, noise-free semantic features. We show that these features readily outperform previous diffusion features by a wide margin in a wide variety of extraction setups and downstream tasks, offering better…
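The extraction setup the abstract describes (noise the image, pass it through the backbone, optionally average over several noise draws) can be sketched as follows. This is a minimal illustration, not the authors' code: the linear beta schedule, the timestep value, and the function names are assumptions, and toy_features is a hypothetical stand-in for a real diffusion backbone. The fine-tuning method itself is not sketched, since this summary gives no details of it.

```python
import numpy as np


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal fraction at step t for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return float(np.prod(1.0 - betas[: t + 1]))


def add_noise(x0, t, rng):
    """Standard forward-diffusion noising: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_bar(t)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def toy_features(x, t):
    """Hypothetical stand-in for a backbone's internal features (ignores t)."""
    return np.tanh(x).mean(axis=-1)


def noisy_features(x0, t, rng, n_ensemble=1):
    """Noised extraction: average features over n_ensemble independent noise draws."""
    feats = [toy_features(add_noise(x0, t, rng), t) for _ in range(n_ensemble)]
    return np.mean(feats, axis=0)


def clean_features(x0):
    """Noise-free extraction: a single pass on the unmodified input."""
    return toy_features(x0, t=0)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 16))  # toy batch standing in for images
f_noisy = noisy_features(x0, t=261, rng=rng, n_ensemble=8)  # 8 backbone passes
f_clean = clean_features(x0)                                # 1 backbone pass
```

In this sketch the ensemble costs n_ensemble backbone passes per image while the noise-free path costs one, which is the cost difference the findings refer to. The sketch says nothing about feature quality; that depends on the fine-tuned backbone.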
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
Methods: Diffusion
