Emergent Correspondence from Image Diffusion

Luming Tang; Menglin Jia; Qianqian Wang; Cheng Perng Phoo; Bharath Hariharan

arXiv:2306.03881 · cs.CV · December 8, 2023 · 54 cites

TL;DR

This paper demonstrates that image diffusion models inherently learn to establish semantic, geometric, and temporal correspondences without explicit supervision, and introduces DIFT, a method to extract these features for improved correspondence matching.

Contribution

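The paper introduces DIFT (DIffusion FeaTures), a simple strategy for extracting implicit correspondence knowledge from diffusion networks as image features. Without additional fine-tuning or task-specific supervision, DIFT outperforms both weakly-supervised methods and competitive off-the-shelf features.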

Findings

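01 DIFT outperforms weakly-supervised methods and competitive off-the-shelf features on semantic, geometric, and temporal correspondence, without task-specific fine-tuning or supervision.

02 DIFT from Stable Diffusion surpasses DINO and OpenCLIP by 19 and 14 accuracy points, respectively, on the SPair-71k benchmark.

03 DIFT matches or exceeds state-of-the-art supervised methods on multiple categories.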

Abstract

Finding correspondences between images is a fundamental problem in computer vision. In this paper, we show that correspondence emerges in image diffusion models without any explicit supervision. We propose a simple strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT), and use them to establish correspondences between real images. Without any additional fine-tuning or supervision on the task-specific data or annotations, DIFT is able to outperform both weakly-supervised methods and competitive off-the-shelf features in identifying semantic, geometric, and temporal correspondences. Particularly for semantic correspondence, DIFT from Stable Diffusion is able to outperform DINO and OpenCLIP by 19 and 14 accuracy points respectively on the challenging SPair-71k benchmark. It even outperforms the state-of-the-art supervised…
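To make the matching step concrete, here is a minimal sketch. It assumes each image has already been converted into a dense feature map with one descriptor per spatial location; how DIFT obtains those descriptors from the diffusion network is not shown. It also assumes a correspondence is chosen by nearest-neighbor search under cosine similarity, a common choice that the abstract above does not itself specify. The names `cosine_similarity` and `match_point` and the toy feature values are hypothetical, used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_point(src_feats, tgt_feats, src_yx):
    """Find the target location whose descriptor best matches a source point.

    src_feats, tgt_feats: nested lists shaped [H][W][C] (assumed layout).
    src_yx: (row, col) of the query point in the source feature map.
    Returns ((row, col), similarity) of the best match in the target map.
    """
    sy, sx = src_yx
    query = src_feats[sy][sx]
    best_yx, best_score = None, -math.inf
    for y, row in enumerate(tgt_feats):
        for x, descriptor in enumerate(row):
            score = cosine_similarity(query, descriptor)
            if score > best_score:
                best_yx, best_score = (y, x), score
    return best_yx, best_score


# Toy 2x2 feature maps with 3-channel descriptors (stand-ins for real features).
src = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
tgt = [
    [[0.0, 0.0, 2.0], [2.0, 2.0, 0.1]],
    [[0.0, 3.0, 0.0], [0.5, 0.0, 0.0]],
]

# Query the source point at row 0, col 1 and look for it in the target.
match, score = match_point(src, tgt, (0, 1))
print(match, score)
```

Because cosine similarity ignores descriptor magnitude, the query `[0, 1, 0]` matches the target descriptor `[0, 3, 0]` even though their scales differ. A practical version would vectorize this search over full-resolution feature maps.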

Code & Models

Videos

Emergent Correspondence from Image Diffusion · SlidesLive

Taxonomy

Topics: Advanced Neuroimaging Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Fetal and Pediatric Neurological Disorders

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer · Diffusion · self-DIstillation with NO labels (DINO)