TL;DR
StyleCLIP introduces a novel text-based interface for manipulating StyleGAN-generated images by leveraging CLIP models, enabling intuitive and automatic semantic edits without manual latent space exploration.
Contribution
The paper develops a CLIP-based optimization scheme, a latent mapper for faster manipulation, and a method for input-agnostic style space directions, advancing text-driven image editing.
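To make the first of these contributions concrete, here is a minimal sketch of text-guided latent optimization. It is an illustration under stated assumptions, not the paper's implementation: `toy_generator` is a hypothetical stand-in for StyleGAN followed by a CLIP image encoder, `text_emb` is a made-up vector standing in for a CLIP text embedding, and the L2 term that keeps the edited latent near the source latent is assumed here for the sake of the example. Gradients are taken by finite differences so the sketch needs only the standard library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def toy_generator(w):
    # Hypothetical stand-in for "StyleGAN + CLIP image encoder":
    # maps a 3-d latent to a 3-d "image embedding" with a fixed linear map.
    return [w[0] + 0.5 * w[1], w[1] - 0.5 * w[2], w[2] + 0.5 * w[0]]

def loss(w, w_source, text_emb, l2_weight):
    # CLIP-style term: cosine distance between image and text embeddings.
    clip_term = 1.0 - cosine(toy_generator(w), text_emb)
    # Assumed regularizer: squared distance to the source latent.
    l2_term = sum((a - b) ** 2 for a, b in zip(w, w_source))
    return clip_term + l2_weight * l2_term

def numerical_grad(f, w, eps=1e-5):
    """Central finite-difference gradient of f at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def optimize_latent(w_source, text_emb, l2_weight=0.1, lr=0.1, steps=200):
    """Plain gradient descent on the latent, starting from w_source."""
    w = list(w_source)
    for _ in range(steps):
        g = numerical_grad(lambda x: loss(x, w_source, text_emb, l2_weight), w)
        w = [a - lr * b for a, b in zip(w, g)]
    return w

w_source = [1.0, 0.0, 0.0]   # latent of the image being edited (toy value)
text_emb = [0.0, 1.0, 0.0]   # stand-in for the encoded text prompt
w_edited = optimize_latent(w_source, text_emb)

before = cosine(toy_generator(w_source), text_emb)
after = cosine(toy_generator(w_edited), text_emb)
print(f"similarity to prompt: before={before:.3f}, after={after:.3f}")
```

In a real setting the generator and both encoders would be pretrained networks and the gradient would come from automatic differentiation. The structure of the loop, which updates only the latent while the models stay fixed, is the point of the sketch.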
Findings
Effective text-guided image manipulation demonstrated
Faster and more stable edits with the latent mapper
Comparable or superior results to manual methods
Abstract
Inspired by the ability of StyleGAN to generate highly realistic images in a variety of domains, much recent work has focused on understanding how to use the latent spaces of StyleGAN to manipulate generated and real images. However, discovering semantically meaningful latent manipulations typically involves painstaking human examination of the many degrees of freedom, or an annotated collection of images for each desired manipulation. In this work, we explore leveraging the power of recently introduced Contrastive Language-Image Pre-training (CLIP) models in order to develop a text-based interface for StyleGAN image manipulation that does not require such manual effort. We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt. Next, we describe a latent mapper that infers a text-guided latent…
Videos
This AI Made Me Look Like Obi-Wan Kenobi! 🧔 · YouTube
Taxonomy
Methods: R1 Regularization · Adaptive Instance Normalization · Dense Connections · Convolution · Feedforward Network · StyleGAN
