StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback
Jiho Park, Sieun Choi, Jaeyoon Seo, Jihie Kim

TL;DR
StableSketcher is a framework that enhances diffusion models to generate high-fidelity, prompt-aligned pixel-based sketches by fine-tuning the variational autoencoder and using visual question answering feedback as a reinforcement learning reward.
Contribution
It introduces a new reward function based on visual question answering and a dataset, SketchDUO, of sketches paired with captions and Q&A pairs, advancing sketch generation quality and evaluation.
Findings
StableSketcher produces sketches with better stylistic fidelity.
It achieves improved prompt alignment over the Stable Diffusion baseline.
The new dataset SketchDUO addresses limitations of existing sketch datasets.
Abstract
Although recent advancements in diffusion models have significantly enriched the quality of generated images, challenges remain in synthesizing pixel-based human-drawn sketches, a representative example of abstract expression. To combat these challenges, we propose StableSketcher, a novel framework that empowers diffusion models to generate hand-drawn sketches with high prompt fidelity. Within this framework, we fine-tune the variational autoencoder to optimize latent decoding, enabling it to better capture the characteristics of sketches. In parallel, we integrate a new reward function for reinforcement learning based on visual question answering, which improves text-image alignment and semantic consistency. Extensive experiments demonstrate that StableSketcher generates sketches with improved stylistic fidelity, achieving better alignment with prompts compared to the Stable Diffusion…
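The abstract does not specify the form of the visual question answering reward, so the following is only a minimal sketch of one plausible shape, not the authors' method. It assumes the reward is the fraction of question-answer pairs for which a VQA model's answer matches the expected answer. The names `vqa_reward`, `VQAModel`, and `stub_vqa` are hypothetical, and the stub stands in for a real VQA model so that the example runs on its own.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface (an assumption, not from the paper):
# a VQA model maps (image, question) to an answer string.
VQAModel = Callable[[object, str], str]


def vqa_reward(
    image: object,
    qa_pairs: Sequence[Tuple[str, str]],
    vqa_model: VQAModel,
) -> float:
    """Return the fraction of questions the VQA model answers as expected.

    Assumed reward form: exact match after lowercasing and trimming.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        1
        for question, expected in qa_pairs
        if vqa_model(image, question).strip().lower() == expected.strip().lower()
    )
    return correct / len(qa_pairs)


def stub_vqa(image: object, question: str) -> str:
    """Stand-in for a real VQA model; ignores the image entirely."""
    canned = {
        "Is there a cat?": "yes",
        "How many wheels are drawn?": "two",
    }
    return canned.get(question, "unknown")


# Illustrative Q&A pairs (made up for this sketch).
qa = [
    ("Is there a cat?", "yes"),
    ("How many wheels are drawn?", "four"),
]
reward = vqa_reward(None, qa, stub_vqa)
print(reward)  # 0.5: one of the two answers matches
```

In a reinforcement learning fine-tuning loop, a scalar like this would be computed per generated sketch and used as the reward signal for the diffusion model. How the paper weights or combines it with other terms is not stated in the text above.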
