Text-Guided Scene Sketch-to-Photo Synthesis
AprilPyone MaungMaung, Makoto Shing, Kentaro Mitsui, Kei Sawada, Fumio Okura

TL;DR
This paper introduces a text-guided, scene-level sketch-to-photo synthesis method that leverages large-scale pre-trained models and self-supervised learning to generate realistic photos from sketches without reference images.
Contribution
It presents a novel approach combining large pre-trained diffusion models and self-supervised training for scene-level sketch-to-photo synthesis guided by text.
Findings
Produces high-quality, realistic photos from scene sketches.
Does not require reference images for synthesis.
Utilizes self-supervised learning with a pre-trained edge detector.
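The self-supervised setup in the last finding can be sketched as follows: each training photograph is paired with its own edge map, and a hand-drawn sketch passes through the same edge mapping at inference. This is a minimal sketch under stated assumptions. A Sobel-gradient-plus-threshold filter is a hypothetical stand-in for the paper's pre-trained edge detector, and `to_edge_domain`, `make_training_pair`, and `prepare_inference_input` are illustrative names, not the authors' code.

```python
from typing import Dict, List

Image = List[List[float]]   # grayscale image, values in [0, 1]
EdgeMap = List[List[int]]   # binary edge map (1 = edge)


def to_edge_domain(img: Image, threshold: float = 0.5) -> EdgeMap:
    """Hypothetical stand-in for the pre-trained edge detector.

    Computes the Sobel gradient magnitude and thresholds it. Border
    pixels are left as 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges


def make_training_pair(photo: Image, caption: str) -> Dict[str, object]:
    """Self-supervised pair: the condition is the photo's own edge map,
    the reconstruction target is the photo itself (no sketch needed)."""
    return {"sketch": to_edge_domain(photo), "text": caption, "target": photo}


def prepare_inference_input(sketch: Image, caption: str) -> Dict[str, object]:
    """A hand-drawn sketch goes through the same edge mapping, so the
    model's condition lives in the same domain as during training."""
    return {"sketch": to_edge_domain(sketch), "text": caption}


if __name__ == "__main__":
    # A 4x4 "photo" with a vertical dark/bright boundary.
    photo = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    pair = make_training_pair(photo, "a wall next to a window")
    for row in pair["sketch"]:
        print(row)
```

Running the file prints the edge map `[0, 0, 0, 0]`, `[0, 1, 1, 0]`, `[0, 1, 1, 0]`, `[0, 0, 0, 0]`, marking the two interior columns next to the boundary. The design point is that the condition is always an edge map, so in this sketch the model never receives raw photo pixels or raw pen strokes; a learned detector would perform this normalization far more robustly than a fixed Sobel threshold.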
Abstract
We propose a method for scene-level sketch-to-photo synthesis with text guidance. Although object-level sketch-to-photo synthesis has been widely studied, whole-scene synthesis is still challenging without reference photos that adequately reflect the target style. To this end, we leverage knowledge from recent large-scale pre-trained generative models, resulting in text-guided sketch-to-photo synthesis without the need for reference images. To train our model, we use self-supervised learning from a set of photographs. Specifically, we use a pre-trained edge detector that maps both color and sketch images into a standardized edge domain, which reduces the gap between photograph-based edge images (during training) and hand-drawn sketch images (during inference). We implement our method by fine-tuning a latent diffusion model (i.e., Stable Diffusion) with sketch and text conditions.…
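The fine-tuning step can be written out by assuming the standard latent diffusion (Stable Diffusion) noise-prediction objective with the edge map added as a second condition. The abstract does not say how the sketch condition is injected, so the notation below is assumed: $\mathcal{E}$ is the VAE encoder, $\tau_\theta$ the text encoder applied to caption $y$, $D$ the pre-trained edge detector, and $s$ the sketch condition.

```latex
z_0 = \mathcal{E}(x), \qquad s = D(x), \qquad
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L} = \mathbb{E}_{x,\, y,\, \epsilon,\, t}\Big[ \big\lVert \epsilon - \epsilon_\theta\big(z_t,\, t,\, \tau_\theta(y),\, s\big) \big\rVert_2^2 \Big]
```

At inference, $s = D(x_{\text{sketch}})$ replaces $D(x)$; because $D$ is applied in both cases, the condition seen at test time stays in the same edge domain as during training.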
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Latent Diffusion Model
