Style-Content Disentanglement in Language-Image Pretraining Representations for Zero-Shot Sketch-to-Image Synthesis
Jan Zuiderveld

TL;DR
This paper introduces a training-free method for zero-shot sketch-to-image synthesis. It disentangles style and content features in language-image pretraining representations and uses them to guide off-the-shelf image generators, with no additional training.
Contribution
The paper presents a simple approach, based on elementary arithmetic, for disentangling style and content in pretrained representations. This enables competitive zero-shot sketch-to-image synthesis without retraining any models.
Findings
Competitive with state-of-the-art instance-level open-domain sketch-to-image models
Requires only pretrained off-the-shelf models and a fraction of the data
Effective zero-shot synthesis without retraining
Abstract
In this work, we propose and validate a framework that leverages language-image pretraining representations for training-free zero-shot sketch-to-image synthesis. We show that disentangled content and style representations can be used to guide image generators, turning them into sketch-to-image generators without (re-)training any parameters. Our approach to disentangling style and content is a simple method based on elementary arithmetic, which assumes that information in the representations of input sketches is compositional. Our results demonstrate that this approach is competitive with state-of-the-art instance-level open-domain sketch-to-image models, while depending only on pretrained off-the-shelf models and a fraction of the data.
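The compositionality assumption can be illustrated with a small sketch. It is not the paper's procedure. It assumes that an embedding is roughly "content + style", that a domain's style vector can be estimated by averaging the unit-normalized embeddings of many instances from that domain (so that instance-specific content averages out), and that restyling a sketch means subtracting the sketch style and adding the photo style. The helper names `l2_normalize`, `domain_style`, and `restyle` are hypothetical. Random vectors stand in for the output of a pretrained language-image encoder, so no real model or library API is involved.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def domain_style(embeddings):
    """Hypothetical style estimate: the mean of unit-normalized embeddings
    from one domain. Averaging is assumed to cancel instance-specific content."""
    return l2_normalize(embeddings).mean(axis=0)

def restyle(sketch_embs, sketch_style, photo_style):
    """Assume embedding ~ content + style: remove the sketch style vector,
    keep the remaining content, then add the photo style vector."""
    content = l2_normalize(sketch_embs) - sketch_style
    return l2_normalize(content + photo_style)

# Synthetic stand-ins for encoder outputs (assumption: no real encoder is used).
rng = np.random.default_rng(0)
d, n = 512, 500
contents = l2_normalize(rng.normal(size=(n, d)))   # per-instance content
s_true = l2_normalize(rng.normal(size=d))          # hidden "sketch" style
p_true = l2_normalize(rng.normal(size=d))          # hidden "photo" style
sketch_embs = l2_normalize(contents + s_true)
photo_embs = l2_normalize(contents + p_true)

targets = restyle(sketch_embs, domain_style(sketch_embs), domain_style(photo_embs))

# Mean cosine similarity to the matching photo embedding (all vectors are unit length).
raw_sim = np.sum(sketch_embs * photo_embs, axis=1).mean()
restyled_sim = np.sum(targets * photo_embs, axis=1).mean()
print(f"sketch vs. photo: {raw_sim:.2f}, restyled vs. photo: {restyled_sim:.2f}")
```

On this toy data, which satisfies the additive assumption by construction, the restyled vectors land much closer to the matching photo embeddings than the raw sketch embeddings do. Real encoder embeddings are only approximately compositional, and the sketch does not model how such a target would be used to guide an image generator.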
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
