Cross-Image Attention for Zero-Shot Appearance Transfer

Yuval Alaluf; Daniel Garibi; Or Patashnik; Hadar Averbuch-Elor; Daniel Cohen-Or

arXiv:2311.03335 · cs.CV · November 7, 2023 · 1 citation

TL;DR

This paper introduces a zero-shot method using cross-image attention in generative models to transfer appearance between objects with similar semantics but different shapes, without additional training.

Contribution

It proposes a novel cross-image attention mechanism that implicitly finds semantic correspondences during image generation, enabling appearance transfer without training or optimization.

Findings

01. Effective across diverse object categories
02. Robust to variations in shape, size, and viewpoint
03. Improves image quality through mechanisms that manipulate the noisy latent codes or the model's internal representations during denoising

Abstract

Recent advancements in text-to-image generative models have demonstrated a remarkable ability to capture a deep semantic understanding of images. In this work, we leverage this semantic knowledge to transfer the visual appearance between objects that share similar semantics but may differ significantly in shape. To achieve this, we build upon the self-attention layers of these generative models and introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images. Specifically, given a pair of images -- one depicting the target structure and the other specifying the desired appearance -- our cross-image attention combines the queries corresponding to the structure image with the keys and values of the appearance image. This operation, when applied during the denoising process, leverages the established semantic correspondences to generate an…
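The core operation described in the abstract can be sketched in a few lines. The snippet below is a minimal, single-head NumPy sketch, not the authors' implementation. It assumes the structure and appearance features are already available as (tokens x channels) arrays taken from the same self-attention layer, and that both images share that layer's projection matrices. The function names (`attention`, `cross_image_attention`) and the random inputs are hypothetical, chosen only for illustration. The only change from ordinary self-attention is where Q, K, and V come from: the output is softmax(Q_struct K_app^T / sqrt(d)) V_app.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Standard scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v, weights


def cross_image_attention(x_struct, x_app, w_q, w_k, w_v):
    # Hypothetical helper (illustration only): queries are projected from the
    # structure image's features, keys and values from the appearance image's
    # features, using the same (assumed shared) projection matrices.
    q = x_struct @ w_q
    k = x_app @ w_k
    v = x_app @ w_v
    return attention(q, k, v)


# Toy usage with random features standing in for real self-attention inputs.
rng = np.random.default_rng(0)
n_struct, n_app, c, d = 16, 20, 8, 8
x_struct = rng.standard_normal((n_struct, c))
x_app = rng.standard_normal((n_app, c))
w_q, w_k, w_v = (rng.standard_normal((c, d)) for _ in range(3))

out, weights = cross_image_attention(x_struct, x_app, w_q, w_k, w_v)
print(out.shape)      # (16, 8): one output row per structure token
print(weights.shape)  # (16, 20): structure tokens attending to appearance tokens
```

Because each output row is a softmax-weighted average of appearance values, each structure token draws its features mostly from the appearance tokens it attends to most strongly; this is one way to read the "implicit semantic correspondence" the abstract describes. Passing the same features for both images recovers ordinary self-attention.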

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis