Visual Style Prompting with Swapping Self-Attention

Jaeseok Jeong; Junho Kim; Yunjey Choi; Gayoung Lee; Youngjung Uh

arXiv:2402.12974·cs.CV·February 22, 2024·3 cites

TL;DR

This paper introduces a style prompting method for diffusion-based text-to-image models. During denoising, it keeps the queries of the image being generated and swaps the keys and values in the late self-attention layers with those from reference features, which enables style control without fine-tuning and improves style fidelity and prompt accuracy.

Contribution

The proposed swapping self-attention approach allows style transfer in diffusion models without fine-tuning, enhancing style fidelity and prompt alignment.

Findings

01. Outperforms existing style transfer methods in style fidelity
02. Maintains high prompt accuracy across diverse styles
03. Requires no additional fine-tuning for style control

Abstract

In the evolving domain of text-to-image generation, diffusion models have emerged as powerful tools in content creation. Despite their remarkable capability, existing models still struggle to achieve controlled generation with a consistent style: they either require costly fine-tuning or transfer the visual elements inadequately because of content leakage. To address these challenges, we propose a novel approach, visual style prompting, to produce a diverse range of images while maintaining specific style elements and nuances. During the denoising process, we keep the query from the original features while swapping the key and value with those from the reference features in the late self-attention layers. This approach enables visual style prompting without any fine-tuning, ensuring that generated images maintain a faithful style. Through extensive evaluation across various styles and text prompts,…
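The swap described in the abstract can be illustrated with a small, dependency-free sketch. It is not the code from the official repository: the function names, the `swap_from` parameter, and the rule that "late" means every layer at or after a given index are assumptions made for illustration only. Features are plain lists of token vectors, and the reference features are assumed to come from a parallel denoising pass on the style reference.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention; q, k, v are lists of token vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def self_attention(layer_idx, original, reference, swap_from):
    """Self-attention with optional key/value swapping.

    `original` and `reference` are dicts holding "q", "k", "v" for the
    image being generated and for the style reference, respectively.
    `swap_from` is a hypothetical threshold: layers with an index at or
    above it are treated as "late" and use the reference keys/values.
    """
    q = original["q"]  # the query always comes from the original features
    if layer_idx >= swap_from:
        k, v = reference["k"], reference["v"]  # late layer: swap in reference
    else:
        k, v = original["k"], original["v"]  # early layer: ordinary self-attention
    return attention(q, k, v)
```

In a swapped layer, each output token is a weighted average of reference values, with the weights still determined by the original queries. That is one way to read the abstract's claim: layout follows the generated image, while appearance is drawn from the reference. Restricting the swap to late layers mirrors the abstract's wording; the exact layer range is a design parameter this sketch leaves open.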

Code & Models

Repositories

naver-ai/Visual-Style-Prompting (PyTorch, official)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Diffusion