One-Shot Adaptation of GAN in Just One CLIP
Gihyun Kwon, Jong Chul Ye

TL;DR
This paper introduces a novel single-shot GAN adaptation method using CLIP space manipulations, enabling effective domain transfer from a single image while maintaining diversity and spatial consistency.
Contribution
It proposes a two-step training strategy with CLIP-guided latent optimization and a new loss function, along with contrastive regularization for improved spatial consistency.
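The first step, CLIP-guided latent optimization, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_generator` and `toy_encoder` are hypothetical stand-ins for the frozen source generator and the CLIP image encoder, and the finite-difference gradient ascent replaces whatever optimizer the authors actually use. The sketch only shows the idea of searching the source generator's latent space for the sample whose embedding best matches the target image's embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_generator(latent):
    """Hypothetical stand-in for a frozen source generator: a fixed linear map."""
    weights = [[1.0, 0.5], [-0.5, 1.0], [0.25, 0.75]]
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]


def toy_encoder(image):
    """Hypothetical stand-in for a CLIP image encoder: identity on the toy image."""
    return list(image)


def search_reference_latent(target_embedding, latent, steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on the cosine similarity between the embedding of the
    generated sample and the target embedding (central finite differences)."""

    def score(z):
        return cosine(toy_encoder(toy_generator(z)), target_embedding)

    latent = list(latent)
    for _ in range(steps):
        grad = []
        for k in range(len(latent)):
            plus = list(latent)
            minus = list(latent)
            plus[k] += eps
            minus[k] -= eps
            grad.append((score(plus) - score(minus)) / (2 * eps))
        latent = [z + lr * g for z, g in zip(latent, grad)]
    return latent, score(latent)
```

Calling `search_reference_latent(target_embedding, initial_latent)` returns the optimized latent and its final similarity score. In a real setting the generator would be a pre-trained GAN and the encoder a CLIP model, with gradients computed by automatic differentiation rather than finite differences.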
Findings
Generates diverse images with the target texture after adaptation from a single target image.
Outperforms baseline models both qualitatively and quantitatively.
Enables more effective attribute editing through CLIP space manipulation.
Abstract
There are many recent research efforts to fine-tune a pre-trained generator with a few target images to generate images of a novel domain. Unfortunately, these methods often suffer from overfitting or under-fitting when fine-tuned with a single target image. To address this, here we present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model employs a two-step training strategy: reference image search in the source generator using a CLIP-guided latent optimization, followed by generator fine-tuning with a novel loss function that imposes CLIP space consistency between the source and adapted generators. To further improve the adapted model to produce spatially consistent samples with respect to the source generator, we also propose contrastive regularization for patchwise relationships in the CLIP space. Experimental results show…
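The patchwise contrastive regularization mentioned in the abstract can be illustrated with a generic InfoNCE-style loss. This is a sketch under assumptions, not the authors' formulation: the abstract does not give the loss, so the function name `patch_contrastive_loss`, the temperature value, and the choice of same-location patches as positives are all hypothetical. Patch embeddings are assumed to have already been extracted (for example by a CLIP image encoder) and are passed in as plain lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patch_contrastive_loss(source_patches, adapted_patches, temperature=0.07):
    """InfoNCE-style loss over patch embeddings (hypothetical formulation).

    For each location i, the adapted patch i is pulled toward the source
    patch at the same location (positive) and pushed away from source
    patches at all other locations (negatives).
    """
    n = len(source_patches)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(adapted_patches[i], source_patches[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n
```

The loss is small when each adapted patch is most similar to the source patch at the same location and grows when the spatial correspondence is scrambled, which is the intuition behind keeping adapted samples spatially consistent with the source generator.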
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training
