MixSA: Training-free Reference-based Sketch Extraction via Mixture-of-Self-Attention
Rui Yang, Xiaojun Wu, Shengfeng He

TL;DR
MixSA is a training-free sketch extraction method. It combines a mixture-of-self-attention technique with diffusion priors to extract sketches from images in the style of reference sketches, and it supports interpolation between styles and control over texture density.
Contribution
It introduces a novel training-free approach leveraging mixture-of-self-attention and diffusion priors for versatile sketch extraction and style interpolation.
Findings
Outperforms existing methods in perceptual quality metrics.
Enables style interpolation and precise control over sketch textures.
Addresses color averaging issues in sketch extraction.
Abstract
Current sketch extraction methods either require extensive training or fail to capture a wide range of artistic styles, limiting their practical applicability and versatility. We introduce Mixture-of-Self-Attention (MixSA), a training-free sketch extraction method that leverages strong diffusion priors for enhanced sketch perception. At its core, MixSA employs a mixture-of-self-attention technique, which manipulates self-attention layers by substituting the keys and values with those from reference sketches. This allows for the seamless integration of brushstroke elements into initial outline images, offering precise control over texture density and enabling interpolation between styles to create novel, unseen styles. By aligning brushstroke styles with the texture and contours of colored images, particularly in late decoder layers handling local textures, MixSA addresses the common…
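The key/value substitution described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes single-head attention over flattened feature tokens with no learned projections, and it assumes that style interpolation is a convex combination of the attention outputs computed against each reference; the abstract does not specify the mixing rule. The names `attention`, `mixed_self_attention`, and `mix_weights` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q is (n_q, d), k and v are (n_kv, d).
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax((q @ k.T) * scale, axis=-1)
    return weights @ v


def mixed_self_attention(q_outline, refs, mix_weights):
    """Queries come from the outline image; keys and values come from
    reference sketches. ASSUMPTION: styles are interpolated by a convex
    combination of the per-reference attention outputs.

    refs: list of (keys, values) pairs, one per reference sketch.
    mix_weights: non-negative weights, normalized here to sum to 1.
    """
    w = np.asarray(mix_weights, dtype=float)
    w = w / w.sum()
    outputs = [attention(q_outline, k, v) for k, v in refs]
    return sum(wi * oi for wi, oi in zip(w, outputs))


# Toy data standing in for flattened decoder features (hypothetical sizes).
rng = np.random.default_rng(0)
n_tokens, n_ref_tokens, dim = 16, 24, 8
q = rng.normal(size=(n_tokens, dim))
ref_a = (rng.normal(size=(n_ref_tokens, dim)), rng.normal(size=(n_ref_tokens, dim)))
ref_b = (rng.normal(size=(n_ref_tokens, dim)), rng.normal(size=(n_ref_tokens, dim)))

# 70% style A, 30% style B.
out = mixed_self_attention(q, [ref_a, ref_b], [0.7, 0.3])
```

With weights `[1.0, 0.0]` the result reduces to plain key/value substitution from a single reference, which is the non-interpolated case the abstract describes.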
Taxonomy
Methods: Diffusion
