Text to Sketch Generation with Multi-Styles
Tengjie Li, Shikui Tu, Lei Xu

TL;DR
This paper introduces a training-free diffusion-based framework for sketch generation that allows explicit style control through textual prompts and reference sketches, improving style accuracy and flexibility.
Contribution
The paper presents a style guidance mechanism that reduces content leakage from reference sketches, and it supports multi-style generation through a joint AdaIN module, giving finer style control in sketch synthesis.
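For readers unfamiliar with AdaIN (adaptive instance normalization), the following is a minimal NumPy sketch of the standard operation: content features are normalized per channel and then rescaled to the per-channel mean and standard deviation of the style features. The `multi_style_adain` function is a hypothetical illustration of how several styles could be combined by blending their statistics with convex weights; the summary does not describe how the paper's joint AdaIN module works, so this should not be read as the authors' design. All function names and the `weights` parameter are assumptions made for this example.

```python
import numpy as np


def channel_stats(feat, eps=1e-5):
    """Per-channel mean and std of a (C, H, W) feature map, shaped (C, 1, 1)."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)
    return mean, std


def adain(content, style, eps=1e-5):
    """Standard AdaIN: give `content` the channel statistics of `style`."""
    c_mean, c_std = channel_stats(content, eps)
    s_mean, s_std = channel_stats(style, eps)
    return s_std * (content - c_mean) / c_std + s_mean


def multi_style_adain(content, styles, weights, eps=1e-5):
    """Hypothetical multi-style variant (an assumption, not the paper's module):
    blend the style statistics with normalized weights, then apply AdaIN."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # make the weights sum to 1
    c_mean, c_std = channel_stats(content, eps)
    stats = [channel_stats(s, eps) for s in styles]
    s_mean = sum(w * m for w, (m, _) in zip(weights, stats))
    s_std = sum(w * sd for w, (_, sd) in zip(weights, stats))
    return s_std * (content - c_mean) / c_std + s_mean
```

With weights such as `[0.7, 0.3]`, the output statistics sit between those of the two style feature maps; with `[1, 0]` the function reduces to single-style `adain`.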
Findings
High-quality style-aligned sketch generation
Effective reduction of content leakage from references
Supports multi-style generation with improved flexibility
Abstract
Recent advances in vision-language models have facilitated progress in sketch generation. However, existing specialized methods primarily focus on generic synthesis and lack mechanisms for precise control over sketch styles. In this work, we propose a training-free framework based on diffusion models that enables explicit style guidance via textual prompts and referenced style sketches. Unlike previous style transfer methods that overwrite key and value matrices in self-attention, we incorporate the reference features as auxiliary information with linear smoothing and leverage a style-content guidance mechanism. This design effectively reduces content leakage from reference sketches and enhances synthesis quality, especially in cases with low structural similarity between reference and target sketches. Furthermore, we extend our framework to support controllable multi-style generation…
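The contrast the abstract draws between overwriting self-attention keys and values and treating reference features as auxiliary information can be illustrated with a small NumPy sketch. The abstract does not give the exact formulation, so the linear blend below (a weight `lam` between the target's own attention output and attention over the reference's keys and values) is only one plausible reading of "linear smoothing", not the paper's equation. The function names, `lam`, and the array shapes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for q: (N, d), k: (M, d), v: (M, d_v)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def overwrite_attention(q, k_tgt, v_tgt, k_ref, v_ref):
    """Baseline described in the abstract: the target's keys and values are
    replaced by the reference's, so k_tgt and v_tgt are ignored."""
    return attention(q, k_ref, v_ref)


def blended_attention(q, k_tgt, v_tgt, k_ref, v_ref, lam=0.5):
    """Assumed form of 'auxiliary information with linear smoothing':
    keep the target's own attention and mix in the reference linearly."""
    own = attention(q, k_tgt, v_tgt)
    ref = attention(q, k_ref, v_ref)
    return (1.0 - lam) * own + lam * ref
```

In this sketch `lam = 1` recovers the overwrite baseline and `lam = 0` ignores the reference entirely; intermediate values let the target keep its own structure while borrowing from the reference, which is the intuition behind reduced content leakage.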
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis
