Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

Federico A. Galatolo; Mario G.C.A. Cimino; Gigliola Vaglini

arXiv:2102.01645 · cs.NE · October 4, 2021

TL;DR

This paper introduces CLIP-GLaSS, a zero-shot framework that generates an image from a given caption, or a caption from a given image, by searching the latent space of a generative model with a genetic algorithm guided by CLIP embeddings.

Contribution

It presents a novel zero-shot approach combining CLIP, generative models, and genetic algorithms for cross-modal image-caption generation.

Findings

01. Effective generation of images from captions and vice versa.

02. Utilizes BigGAN, StyleGAN2, and GPT-2 for high-quality outputs.

03. Demonstrates promising results in cross-modal generation tasks.

Abstract

In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). CLIP-GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. In contrast, CLIP-GLaSS takes a caption (or an image) as input and generates the image (or the caption) whose CLIP embedding is most similar to that of the input. This optimal image (or caption) is produced by a generative network after a genetic algorithm has explored the network's latent space. Promising results are shown, based on experiments with the image generators BigGAN and StyleGAN2 and the text generator GPT-2.
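The search loop described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: `toy_generator` and `toy_encoder` are hypothetical stand-ins for a real generator (such as BigGAN) and for the CLIP encoder, and the genetic algorithm is a plain elitist one with uniform crossover and Gaussian mutation, chosen here for brevity. The population size, mutation scale, and all function names are assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def toy_generator(z):
    """Hypothetical stand-in for a generator: latent vector -> 'image'."""
    return [math.tanh(2.0 * x) for x in z]


def toy_encoder(x):
    """Hypothetical stand-in for the CLIP encoder: 'image' -> embedding."""
    n = len(x)
    return [x[i] + 0.5 * x[(i + 1) % n] for i in range(n)]


def fitness(z, target):
    """Similarity between the embedding of the generated output and the target."""
    return cosine(toy_encoder(toy_generator(z)), target)


def genetic_search(target, dim=8, pop_size=40, generations=60, sigma=0.3, seed=0):
    """Elitist genetic search over latent vectors (an assumed, simplified scheme)."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda z: fitness(z, target), reverse=True)
        history.append(fitness(pop[0], target))
        elite = pop[: pop_size // 4]  # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [
                (x if rng.random() < 0.5 else y) + rng.gauss(0, sigma)
                for x, y in zip(a, b)
            ]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda z: fitness(z, target))
    return best, fitness(best, target), history


if __name__ == "__main__":
    rng = random.Random(42)
    # In the real setting the target would be the CLIP embedding of the input
    # caption (or image); here it is the embedding of a hidden latent vector.
    hidden = [rng.gauss(0, 1) for _ in range(8)]
    target = toy_encoder(toy_generator(hidden))
    best, score, history = genetic_search(target)
    print(f"initial best similarity: {history[0]:.3f}")
    print(f"final best similarity:   {score:.3f}")
```

Because the elite is carried over unchanged, the best similarity never decreases from one generation to the next. Swapping the two stand-ins for a real generator and a real CLIP encoder would leave the search loop itself unchanged.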

Taxonomy

Methods: Softmax · Dense Connections · Adam · Linear Layer · 1x1 Convolution · Feedforward Network · Off-Diagonal Orthogonal Regularization · Projection Discriminator