StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback

Jiho Park; Sieun Choi; Jaeyoon Seo; Jihie Kim

arXiv:2510.20093 · cs.CV · April 15, 2026



TL;DR

StableSketcher is a framework that adapts diffusion models to generate high-fidelity, prompt-aligned pixel-based sketches. It fine-tunes the variational autoencoder to better decode sketch latents and uses visual question answering feedback as a reinforcement learning reward.

Contribution

The paper introduces a reinforcement learning reward function based on visual question answering, together with a dataset of sketches paired with captions and question-answer pairs. Both are aimed at improving the quality and the evaluation of sketch generation.
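One simple way to read a VQA-based reward is as the fraction of questions about a generated image that a VQA model answers as expected. The sketch below illustrates that reading only; it is not the paper's formulation. The names `vqa_reward`, `answer_fn`, and `always_yes`, the example questions, and the exact-match scoring rule are all hypothetical, and the stand-in answerer takes the place of a real VQA model.

```python
from typing import Any, Callable, Sequence, Tuple

# Hypothetical types and names for illustration; not taken from the paper.
QAPair = Tuple[str, str]  # (question, expected answer)


def vqa_reward(
    image: Any,
    qa_pairs: Sequence[QAPair],
    answer_fn: Callable[[Any, str], str],
) -> float:
    """Return the fraction of questions answered as expected (0.0 to 1.0).

    Assumption: an answer counts as correct on an exact match after
    lowercasing and stripping surrounding whitespace.
    """
    if not qa_pairs:
        return 0.0
    correct = sum(
        answer_fn(image, question).strip().lower() == expected.strip().lower()
        for question, expected in qa_pairs
    )
    return correct / len(qa_pairs)


def always_yes(image: Any, question: str) -> str:
    """Stand-in for a real VQA model: answers "yes" to every question."""
    return "yes"


# Made-up Q&A pairs; the image argument is a placeholder the stub ignores.
qa = [("Is there a cat?", "yes"), ("Is the cat wearing a hat?", "no")]
reward = vqa_reward("sketch.png", qa, always_yes)  # one of two answers matches
```

In a reinforcement learning setup, a scalar of this kind would be computed per generated image and passed to the policy update. How StableSketcher actually scores answers and applies the reward is not described in this summary.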

Findings

1. StableSketcher produces sketches with better stylistic fidelity.
2. It achieves improved prompt alignment over the Stable Diffusion baseline.
3. The new dataset SketchDUO addresses limitations of existing sketch datasets.

Abstract

Although recent advancements in diffusion models have significantly enriched the quality of generated images, challenges remain in synthesizing pixel-based human-drawn sketches, a representative example of abstract expression. To combat these challenges, we propose StableSketcher, a novel framework that empowers diffusion models to generate hand-drawn sketches with high prompt fidelity. Within this framework, we fine-tune the variational autoencoder to optimize latent decoding, enabling it to better capture the characteristics of sketches. In parallel, we integrate a new reward function for reinforcement learning based on visual question answering, which improves text-image alignment and semantic consistency. Extensive experiments demonstrate that StableSketcher generates sketches with improved stylistic fidelity, achieving better alignment with prompts compared to the Stable Diffusion…

Code & Models

Repositories

https://zihos.github.io/StableSketcher
github

Datasets

ziiio/SketchDUO
dataset · 107 dl
