Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport

Yingtao Tian; Marco Cuturi; David Ha

arXiv:2204.08472·cs.CV·April 20, 2022

TL;DR

This paper introduces a novel method using differentiable optimal transport to improve text-to-image synthesis, enhancing diversity and fidelity in generated images guided by multiple prompts.

Contribution

It proposes a differentiable optimal transport-based approach for multi-prompt image generation, addressing mode collapse and improving diversity over traditional mean-distance methods.

Findings

01 OT-based method produces more diverse images

02 Improved fidelity to multiple prompts

03 Qualitative and quantitative performance gains

Abstract

Recent advances in deep learning, such as powerful generative models and joint text-image embeddings, have provided the computational creativity community with new tools, opening new perspectives for artistic pursuits. Text-to-image synthesis approaches that operate by generating images from text cues provide a case in point. These images are generated with a latent vector that is progressively refined to agree with text cues. To do so, patches are sampled within the generated image, and compared with the text prompts in the common text-image embedding space; the latent vector is then updated, using gradient descent, to reduce the mean (average) distance between these patches and text cues. While this approach provides artists with ample freedom to customize the overall appearance of images, through their choice in generative models, the reliance on a simple criterion (mean of…
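
To make the contrast between the two criteria concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the cosine-distance cost, the uniform weights on patches and prompts, the entropic (Sinkhorn) solver, and all names and parameter values (`cosine_cost`, `sinkhorn_plan`, `epsilon`, `n_iters`) are assumptions made for illustration, and the random vectors merely stand in for patch and prompt embeddings. The sketch only evaluates the two losses; in an actual pipeline they would be computed in an autodiff framework so the latent vector can be updated by gradient descent.

```python
import numpy as np

def cosine_cost(patches, prompts):
    """Pairwise cosine distance, shape (n_patches, n_prompts)."""
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    t = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    return 1.0 - p @ t.T

def sinkhorn_plan(cost, epsilon=0.5, n_iters=200):
    """Entropic OT plan between uniform weights on patches and prompts.

    epsilon and n_iters are illustrative values, not taken from the paper.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform mass on patches (assumption)
    b = np.full(m, 1.0 / m)      # uniform mass on prompts (assumption)
    K = np.exp(-cost / epsilon)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows to match a
        v = b / (K.T @ u)        # rescale columns to match b
    return u[:, None] * K * v[None, :]

# Stand-in embeddings: 8 patches and 3 prompts in a 16-dimensional space.
rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 16))
prompts = rng.normal(size=(3, 16))

cost = cosine_cost(patches, prompts)

# Mean criterion: every patch is compared with every prompt equally.
mean_value = float(cost.mean())

# OT criterion: cost weighted by a transport plan matching patches to prompts.
plan = sinkhorn_plan(cost)
ot_value = float((plan * cost).sum())
```

The mean criterion is the cost of the plan that spreads each patch evenly over all prompts, whereas the transport plan is free to assign different patches to different prompts, subject to every prompt receiving the same total mass.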

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation