Style-Content Disentanglement in Language-Image Pretraining Representations for Zero-Shot Sketch-to-Image Synthesis

Jan Zuiderveld

arXiv:2206.01661·cs.CV·June 6, 2022

TL;DR

This paper introduces a training-free, zero-shot method for sketch-to-image synthesis. It disentangles style and content in language-image pretraining representations and uses the result to guide pretrained image generators.

Contribution

The paper presents a simple approach, based on elementary arithmetic, for disentangling style and content in pretrained representations. This enables competitive zero-shot sketch-to-image synthesis without retraining any model.

Findings

01 Competitive with state-of-the-art instance-level open-domain sketch-to-image models

02 Depends only on pretrained off-the-shelf models and a fraction of the data

03 Zero-shot synthesis works without (re-)training any parameters

Abstract

In this work, we propose and validate a framework to leverage language-image pretraining representations for training-free zero-shot sketch-to-image synthesis. We show that disentangled content and style representations can be utilized to guide image generators to employ them as sketch-to-image generators without (re-)training any parameters. Our approach for disentangling style and content entails a simple method consisting of elementary arithmetic assuming compositionality of information in representations of input sketches. Our results demonstrate that this approach is competitive with state-of-the-art instance-level open-domain sketch-to-image models, while only depending on pretrained off-the-shelf models and a fraction of the data.
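
The abstract does not spell out the arithmetic, so the following is only a minimal sketch of what "elementary arithmetic assuming compositionality" could look like, not the author's exact procedure. It assumes that an embedding is roughly the sum of a content part and a style part. The embeddings are synthetic NumPy vectors standing in for the outputs of a language-image pretraining model, and the names `normalize`, `style_direction`, and `swap_style` are hypothetical.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def style_direction(sketch_embs, photo_embs):
    """Estimate the sketch-vs-photo style offset as a difference of means.

    Assumption: each embedding is content + style, and the average content
    of both sets is comparable, so the content terms cancel in the difference.
    """
    return sketch_embs.mean(axis=0) - photo_embs.mean(axis=0)


def swap_style(sketch_emb, direction):
    """Remove the sketch style and add the photo style in one subtraction."""
    return normalize(sketch_emb - direction)


# Toy data in which the additive assumption holds exactly (an idealization).
rng = np.random.default_rng(0)
dim = 8
content = rng.normal(size=(50, dim))   # shared content vectors
sketch_style = rng.normal(size=dim)    # hypothetical "sketch" style vector
photo_style = rng.normal(size=dim)     # hypothetical "photo" style vector

sketch_embs = content + sketch_style   # the same content drawn as sketches
photo_embs = content + photo_style     # the same content as photos
direction = style_direction(sketch_embs, photo_embs)

# A new sketch whose content was not used to estimate the direction.
query_content = rng.normal(size=dim)
query_sketch = query_content + sketch_style

# Target embedding that could guide an image generator toward a photo
# with the content of the query sketch.
target = swap_style(query_sketch, direction)
ideal = normalize(query_content + photo_style)
```

In this idealized setting `target` matches `ideal` up to floating-point error. With real embeddings the additive assumption holds only approximately, so the recovered target would be an estimate whose quality depends on how well the two reference sets cover comparable content.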

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques