Text-Guided Scene Sketch-to-Photo Synthesis

AprilPyone MaungMaung; Makoto Shing; Kentaro Mitsui; Kei Sawada; Fumio Okura

arXiv:2302.06883 · cs.CV · February 15, 2023

TL;DR

This paper introduces a text-guided, scene-level sketch-to-photo synthesis method that leverages large-scale pre-trained models and self-supervised learning to generate realistic photos from sketches without reference images.

Contribution

It presents a novel approach combining large pre-trained diffusion models and self-supervised training for scene-level sketch-to-photo synthesis guided by text.

Findings

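01 Produces high-quality, realistic photos from scene-level sketches.

02 Requires no reference images at synthesis time; text prompts guide the output instead.

03 Trains by self-supervision on photographs, using a pre-trained edge detector to map both photos and sketches into a shared edge domain.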

Abstract

We propose a method for scene-level sketch-to-photo synthesis with text guidance. Although object-level sketch-to-photo synthesis has been widely studied, whole-scene synthesis is still challenging without reference photos that adequately reflect the target style. To this end, we leverage knowledge from recent large-scale pre-trained generative models, resulting in text-guided sketch-to-photo synthesis without the need for reference images. To train our model, we use self-supervised learning from a set of photographs. Specifically, we use a pre-trained edge detector that maps both color and sketch images into a standardized edge domain, which reduces the gap between photograph-based edge images (during training) and hand-drawn sketch images (during inference). We implement our method by fine-tuning a latent diffusion model (i.e., Stable Diffusion) with sketch and text conditions.…
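The self-supervised setup described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the paper uses a pre-trained edge detector and fine-tunes Stable Diffusion, whereas this example substitutes a hand-written binarized Sobel filter as a stand-in and stops before any diffusion training. The names `sobel_edges`, `make_training_pair`, and `prepare_inference_condition` are hypothetical. The point it shows is only that the same edge function is applied to photographs during training and to hand-drawn sketches during inference, so both conditions live in one standardized edge domain.

```python
from typing import Callable, List, Tuple

# Grayscale image as nested lists, values in [0, 1] (assumed representation).
Image = List[List[float]]


def sobel_edges(img: Image, threshold: float = 0.5) -> Image:
    """Stand-in edge detector: binarized Sobel gradient magnitude.

    ASSUMPTION: the paper uses a pre-trained (learned) edge detector;
    Sobel is used here only so the example runs without model weights.
    Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            magnitude = (gx * gx + gy * gy) ** 0.5
            out[y][x] = 1.0 if magnitude >= threshold else 0.0
    return out


def make_training_pair(
    photo: Image,
    caption: str,
    edge_fn: Callable[[Image], Image] = sobel_edges,
) -> Tuple[Image, str, Image]:
    """Build one self-supervised sample: (edge condition, text, target photo).

    No hand-drawn sketch is needed; the condition is derived from the photo.
    """
    return edge_fn(photo), caption, photo


def prepare_inference_condition(
    sketch: Image,
    prompt: str,
    edge_fn: Callable[[Image], Image] = sobel_edges,
) -> Tuple[Image, str]:
    """Map a hand-drawn sketch into the same edge domain used in training."""
    return edge_fn(sketch), prompt
```

In a full pipeline, the triples from `make_training_pair` would feed the fine-tuning of a conditional latent diffusion model, and the pair from `prepare_inference_condition` would be passed to the fine-tuned model at sampling time. Passing the sketch through the edge function, rather than feeding it directly, is what narrows the gap between photo-derived edges and hand-drawn strokes.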

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Latent Diffusion Model