Training-Free Sketch-Guided Diffusion with Latent Optimization

Sandra Zhang Ding; Jiafeng Mao; Kiyoharu Aizawa

arXiv:2409.00313·cs.CV·May 8, 2025

Open Access

TL;DR

This paper introduces a training-free method for sketch-guided image generation. It applies latent optimization driven by cross-attention maps so that generated images follow the layout and structure of the user's sketch, giving users finer control over the result.

Contribution

The paper presents a training-free pipeline that adds sketch guidance to existing text-to-image diffusion models through latent optimization based on cross-attention maps, so that generated images follow the sketch's layout and structure more closely.

Findings

01. Enhanced image accuracy with sketch guidance

02. Training-free approach simplifies integration

03. Latent optimization refines structure adherence

Abstract

Building on recent advances in diffusion models, text-to-image (T2I) generation models have demonstrated the ability to generate diverse, high-quality images. However, harnessing them for real-world content creation, particularly giving users precise control over the generated result, remains a significant challenge. In this paper, we propose a training-free pipeline that extends existing text-to-image generation models to accept a sketch as an additional condition. To generate new images whose layout and structure closely resemble the input sketch, we observe that these core features of a sketch can be tracked with the cross-attention maps of diffusion models. We introduce latent optimization, a method that refines the noisy latent at each intermediate step of the generation process using cross-attention maps to ensure that the generated…
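The latent optimization idea described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: `toy_attention`, `sketch_loss`, `latent_optimization_step`, the single-token softmax "attention map", the binary sketch mask, and the step size are all hypothetical stand-ins chosen so the example runs with NumPy alone. It only shows the general pattern of nudging a latent by gradient descent so that an attention map concentrates inside a sketch region.

```python
import numpy as np

def toy_attention(latent, key):
    """Toy stand-in for one token's cross-attention map (softmax over positions)."""
    scores = latent @ key                      # one score per spatial position
    weights = np.exp(scores - scores.max())    # subtract max for numerical stability
    return weights / weights.sum()

def sketch_loss(attn, mask):
    """Attention mass that falls outside the sketch mask (0 is best)."""
    return 1.0 - float((attn * mask).sum())

def latent_optimization_step(latent, key, mask, step_size=0.5):
    """One gradient-descent step on the latent that lowers sketch_loss."""
    attn = toy_attention(latent, key)
    inside = (attn * mask).sum()
    # Analytic gradient of the loss w.r.t. the scores, via the softmax Jacobian.
    grad_scores = -attn * (mask - inside)
    # scores = latent @ key, so the gradient w.r.t. the latent is an outer product.
    grad_latent = np.outer(grad_scores, key)
    return latent - step_size * grad_latent

rng = np.random.default_rng(0)
height, width, channels = 8, 8, 4              # assumed toy sizes
latent = rng.normal(size=(height * width, channels))  # toy noisy latent
key = rng.normal(size=channels)
key /= np.linalg.norm(key)                     # unit-norm toy token key

mask = np.zeros((height, width))
mask[2:6, 2:6] = 1.0                           # toy "sketch" region
mask = mask.ravel()

initial_loss = sketch_loss(toy_attention(latent, key), mask)
# In a real sampler, a denoising step would run between these updates.
for _ in range(20):
    latent = latent_optimization_step(latent, key, mask)
final_loss = sketch_loss(toy_attention(latent, key), mask)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In an actual diffusion pipeline the attention maps would come from the denoising network and the gradient would be obtained by automatic differentiation; the analytic gradient here is only possible because the toy attention is a single softmax.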

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition

Methods: Diffusion