Cross-Image Attention for Zero-Shot Appearance Transfer
Yuval Alaluf, Daniel Garibi, Or Patashnik, Hadar Averbuch-Elor, Daniel Cohen-Or

TL;DR
This paper introduces a zero-shot method using cross-image attention in generative models to transfer appearance between objects with similar semantics but different shapes, without additional training.
Contribution
It proposes a novel cross-image attention mechanism that implicitly finds semantic correspondences during image generation, enabling appearance transfer without training or optimization.
Findings
Effective across diverse object categories
Robust to shape, size, and viewpoint variations
Improves image quality through latent and internal representation manipulation
Abstract
Recent advancements in text-to-image generative models have demonstrated a remarkable ability to capture a deep semantic understanding of images. In this work, we leverage this semantic knowledge to transfer the visual appearance between objects that share similar semantics but may differ significantly in shape. To achieve this, we build upon the self-attention layers of these generative models and introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images. Specifically, given a pair of images -- one depicting the target structure and the other specifying the desired appearance -- our cross-image attention combines the queries corresponding to the structure image with the keys and values of the appearance image. This operation, when applied during the denoising process, leverages the established semantic correspondences to generate an…
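The query/key/value combination described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes standard scaled dot-product attention over a single head, and the names `cross_image_attention`, `q_struct`, `k_app`, and `v_app` are hypothetical, as are the random features standing in for real self-attention activations from a diffusion model.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_image_attention(q_struct, k_app, v_app):
    """Attend from structure-image queries to appearance-image keys/values.

    q_struct: (n_struct, d) queries from the structure image (assumed shape).
    k_app, v_app: (n_app, d) keys and values from the appearance image.
    Returns the attended features and the attention weights.
    """
    d = q_struct.shape[-1]
    # Similarity of every structure position to every appearance position.
    scores = q_struct @ k_app.T / np.sqrt(d)
    # Each row is a distribution over appearance positions, which acts as
    # a soft, implicit correspondence between the two images.
    weights = softmax(scores, axis=-1)
    # Each structure position receives a weighted mix of appearance values.
    return weights @ v_app, weights


# Toy stand-ins for per-position features; real features would come from
# the self-attention layers of the generative model.
rng = np.random.default_rng(0)
q_struct = rng.normal(size=(16, 8))
k_app = rng.normal(size=(20, 8))
v_app = rng.normal(size=(20, 8))

out, weights = cross_image_attention(q_struct, k_app, v_app)
```

In this sketch `out` has one row per structure-image position, so the spatial layout follows the structure image while the content of each row is drawn from the appearance image's values.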
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
