Leveraging Diffusion Models for Stylization using Multiple Style Images
Dan Ruta, Abdelaziz Djelouah, Raphael Ortiz, Christopher Schroers

TL;DR
This paper introduces a diffusion-based stylization method that combines multiple style images with statistical feature alignment to match styles more accurately and to prevent content leaking from the style images. The authors report state-of-the-art results.
Contribution
The paper combines image prompt adapters with statistical alignment of attention features during denoising, using multiple style images, so that the method can intervene at both the cross-attention and the self-attention layers of the denoising UNet.
Findings
Achieves state-of-the-art stylization results.
Effectively prevents content leakage from style images.
Improves style matching accuracy with multiple style images.
Abstract
Recent advances in latent diffusion models have enabled exciting progress in image style transfer. However, several key issues remain. For example, existing methods still struggle to accurately match styles. They are often limited in the number of style images that can be used. Furthermore, they tend to entangle content and style in undesired ways. To address this, we propose leveraging multiple style images which helps better represent style features and prevent content leaking from the style images. We design a method that leverages both image prompt adapters and statistical alignment of the features during the denoising process. With this, our approach is designed such that it can intervene both at the cross-attention and the self-attention layers of the denoising UNet. For the statistical alignment, we employ clustering to distill a small representative set of attention features…
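The clustering and statistical alignment steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated and does not say which clustering algorithm or which statistics the authors use, so everything here is an assumption: plain k-means stands in for the clustering, per-channel mean and standard deviation matching (in the spirit of AdaIN) stands in for the alignment, and the function names `kmeans`, `channel_stats`, and `align` are hypothetical. The inputs are toy 2-D vectors, not real UNet attention features.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over lists of floats; returns k centroids.
    Assumption: the paper's clustering method is not specified in the abstract."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            # Assign each feature vector to its nearest centroid.
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def channel_stats(features):
    """Per-channel mean and (population) standard deviation."""
    n = len(features)
    means = [sum(col) / n for col in zip(*features)]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(zip(*features), means)]
    return means, stds

def align(content, style, eps=1e-6):
    """Shift and scale content features so each channel matches the style
    mean/std. Assumption: mean/std matching is a stand-in for the paper's
    statistical alignment."""
    c_mean, c_std = channel_stats(content)
    s_mean, s_std = channel_stats(style)
    return [[(x - cm) / (cs + eps) * ss + sm
             for x, cm, cs, sm, ss in zip(vec, c_mean, c_std, s_mean, s_std)]
            for vec in content]

rng = random.Random(1)
# Toy stand-ins for attention features pooled from several style images.
style_feats = [[rng.gauss(5.0, 2.0), rng.gauss(-1.0, 0.5)] for _ in range(200)]
content_feats = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]

# Distill a small representative set, then align content features to it.
representatives = kmeans(style_feats, k=8)
aligned = align(content_feats, representatives)
```

Pooling features from several style images before clustering is one plausible reading of how multiple style images could reduce dependence on the content of any single one. The paper itself should be consulted for the actual procedure.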
Taxonomy
Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques
