StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators

Rinon Gal; Or Patashnik; Haggai Maron; Gal Chechik; Daniel Cohen-Or

arXiv:2108.00946 · cs.CV · December 17, 2021 · 65 cites

TL;DR

This paper introduces a text-guided domain adaptation method for image generators. It uses CLIP to shift a generator toward a new domain, changing both style and shape, from natural language prompts alone, with no image data and only a few minutes of training.

Contribution

The paper presents an approach for adapting generative models to new domains solely via text prompts, eliminating the need to collect images or retrain from scratch.

Findings

01. Effective domain adaptation with natural language prompts

02. Maintains latent-space properties for downstream tasks

03. Outperforms existing methods in style and shape modifications

Abstract

Can a generative model be trained to produce images from a specific domain, guided by a text prompt only, without seeing any image? In other words: can an image generator be trained "blindly"? Leveraging the semantic power of large scale Contrastive-Language-Image-Pre-training (CLIP) models, we present a text-driven method that allows shifting a generative model to new domains, without having to collect even a single image. We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains characterized by diverse styles and shapes. Notably, many of these modifications would be difficult or outright impossible to reach with existing methods. We conduct an extensive set of experiments and comparisons across a wide range of domains. These demonstrate the effectiveness of our approach and show that our shifted models…
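The abstract does not spell out the training objective, so the following is only a minimal sketch of one way text-only guidance of this kind can be set up: align the direction an image embedding moves (frozen generator output to adapted generator output) with the direction between a source and a target text embedding. The function names are hypothetical, and the short lists stand in for real CLIP embeddings, which this sketch does not compute.

```python
import math


def _subtract(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x - y for x, y in zip(a, b)]


def _cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("direction vectors must be non-zero")
    return dot / (norm_a * norm_b)


def directional_loss(img_source, img_adapted, txt_source, txt_target):
    """Hypothetical direction-alignment loss in a shared embedding space.

    img_source / img_adapted: embeddings of the same latent code rendered
        by the frozen generator and by the generator being adapted.
    txt_source / txt_target: embeddings of the source- and target-domain
        text prompts.
    Returns 1 - cos(image direction, text direction): 0 when the image
    moved exactly along the text direction, 2 when it moved the opposite way.
    """
    delta_img = _subtract(img_adapted, img_source)
    delta_txt = _subtract(txt_target, txt_source)
    return 1.0 - _cosine(delta_img, delta_txt)


# Toy 2-D stand-ins for embeddings (assumed values, not CLIP outputs).
aligned = directional_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [2.0, 0.0])
opposed = directional_loss([0.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [2.0, 0.0])
```

In an actual training loop the embeddings would come from a CLIP image and text encoder, and the loss would be minimized with respect to the adapted generator's weights while the source generator stays frozen. No training images enter the computation, which matches the "blind" setting the abstract describes.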

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications