Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport
Yingtao Tian, Marco Cuturi, David Ha

TL;DR
This paper introduces a novel method using differentiable optimal transport to improve text-to-image synthesis, enhancing diversity and fidelity in generated images guided by multiple prompts.
Contribution
It proposes a differentiable optimal transport-based approach for multi-prompt image generation, addressing mode collapse and improving diversity over traditional mean-distance methods.
Findings
The OT-based method produces more diverse images
It improves fidelity to multiple prompts
It shows qualitative and quantitative performance gains
Abstract
Recent advances in deep learning, such as powerful generative models and joint text-image embeddings, have provided the computational creativity community with new tools, opening new perspectives for artistic pursuits. Text-to-image synthesis approaches that operate by generating images from text cues provide a case in point. These images are generated with a latent vector that is progressively refined to agree with text cues. To do so, patches are sampled within the generated image, and compared with the text prompts in the common text-image embedding space; the latent vector is then updated, using gradient descent, to reduce the mean (average) distance between these patches and text cues. While this approach provides artists with ample freedom to customize the overall appearance of images, through their choice in generative models, the reliance on a simple criterion (mean of…
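The difference between the mean-distance criterion described in the abstract and a transport-based one can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the embeddings are random stand-ins for patch and prompt embeddings, cosine distance is assumed as the cost, uniform weights are assumed on both sides, and entropy-regularized Sinkhorn iterations are used as one common way to make optimal transport differentiable. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def cosine_cost(patches, prompts):
    """Pairwise cosine distance, shape (n_patches, n_prompts)."""
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    t = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    return 1.0 - p @ t.T


def mean_distance_loss(cost):
    """Baseline criterion: average distance over all patch-prompt pairs."""
    return cost.mean()


def sinkhorn_loss(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularized OT cost between uniform patch and prompt weights.

    Assumption: uniform marginals and plain Sinkhorn scaling updates.
    Returns the transport cost and the transport plan.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass on each patch
    b = np.full(m, 1.0 / m)  # mass on each prompt
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows toward marginal a
        v = b / (kernel.T @ u)  # rescale columns toward marginal b
    plan = u[:, None] * kernel * v[None, :]
    return (plan * cost).sum(), plan


# Random stand-ins for embeddings of 8 image patches and 3 text prompts.
rng = np.random.default_rng(0)
patches = rng.normal(size=(8, 16))
prompts = rng.normal(size=(3, 16))

cost = cosine_cost(patches, prompts)
ot_loss, plan = sinkhorn_loss(cost)
print("mean-distance loss:", mean_distance_loss(cost))
print("Sinkhorn OT loss:  ", ot_loss)
print("mass received by each prompt:", plan.sum(axis=0))
```

In this sketch the transport plan is constrained to deliver equal mass to every prompt, so no prompt can be ignored, whereas the mean criterion only averages over all pairs. The NumPy version computes the loss only; to update a latent vector by gradient descent, the same iterations would need to run in an automatic differentiation framework.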
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
