A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

TL;DR
This paper explores the complementary strengths of Stable Diffusion and DINO features for zero-shot semantic and dense correspondence, demonstrating that their fusion improves performance on benchmark datasets and enables applications like instance swapping.
Contribution
The paper introduces a simple method for fusing SD and DINO features that improves zero-shot correspondence performance and reveals the complementary properties of the two feature types.
Findings
Fusing SD and DINO features outperforms state-of-the-art methods on benchmark datasets.
SD features provide high-quality spatial information but less accurate semantics.
DINO features offer sparse, accurate semantic matches.
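One way to picture the fusion described above is the sketch below. It is not taken from the paper's code: it assumes that fusion means concatenating L2-normalized per-patch descriptors with a weighting factor, and that correspondences are read off by nearest-neighbor search on cosine similarity. The names `fuse_features`, `nearest_neighbor_matches`, and the parameter `alpha` are hypothetical, and random arrays stand in for real SD and DINO features.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row (one descriptor per patch) to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_features(sd_feat, dino_feat, alpha=0.5):
    """Concatenate normalized SD and DINO descriptors per patch.

    sd_feat:   (num_patches, sd_dim) array, assumed layout
    dino_feat: (num_patches, dino_dim) array, assumed layout
    alpha:     hypothetical weight balancing the two feature types
    """
    sd = l2_normalize(sd_feat)
    dino = l2_normalize(dino_feat)
    return np.concatenate([alpha * sd, (1.0 - alpha) * dino], axis=-1)

def nearest_neighbor_matches(src, tgt):
    """For each source patch, return the index of the most similar target patch."""
    sim = l2_normalize(src) @ l2_normalize(tgt).T  # cosine similarity matrix
    return sim.argmax(axis=1)

# Toy usage: random descriptors stand in for real features of two images.
rng = np.random.default_rng(0)
num_patches = 6
sd_a = rng.normal(size=(num_patches, 8))
dino_a = rng.normal(size=(num_patches, 4))
fused_a = fuse_features(sd_a, dino_a)

# The "second image" is the first with its patches shuffled, so every
# source patch has an exact counterpart somewhere in the target.
perm = rng.permutation(num_patches)
fused_b = fused_a[perm]
matches = nearest_neighbor_matches(fused_a, fused_b)
```

In this toy setup each source patch should be matched to the position its copy was shuffled to. With real features the matches are only approximate, and the choice of `alpha` would control how much the result leans on spatial detail from SD versus semantic matches from DINO.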
Abstract
Text-to-image diffusion models have made significant advances in generating and editing high-quality images. As a result, numerous approaches have explored the ability of diffusion model features to understand and process single images for downstream tasks, e.g., classification, semantic segmentation, and stylization. However, significantly less is known about what these features reveal across multiple, different images and objects. In this work, we exploit Stable Diffusion (SD) features for semantic and dense correspondence and discover that with simple post-processing, SD features can perform quantitatively similar to SOTA representations. Interestingly, the qualitative analysis reveals that SD features have very different properties compared to existing representation learning features, such as the recently released DINOv2: while DINOv2 provides sparse but accurate matches, SD…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Radiomics and Machine Learning in Medical Imaging
Methods: Diffusion
