TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling

Jun Li; Zedong Zhang; Jian Yang

arXiv:2310.01819 · cs.CV · July 19, 2024

Open Access

TL;DR

This paper introduces TP2O, a method for creative text-to-image generation that uses balance swap-sampling to produce high-quality, diverse combinatorial objects. It randomly exchanges elements of two text embeddings to generate a set of candidate images, then selects among those candidates based on CLIP distances.
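The swapping step can be illustrated with a minimal sketch. It assumes, as a simplification, that both prompts have already been encoded into same-shaped text embeddings (for example by a CLIP text encoder) and that "randomly exchanging intrinsic elements" means an independent per-element coin flip with a hypothetical swap probability `p`. The paper's exact swap granularity and sampling scheme may differ, and the function names `swap_embeddings` and `swap_set` are illustrative, not taken from the authors' code.

```python
import numpy as np


def swap_embeddings(emb_a, emb_b, p=0.5, rng=None):
    """Hypothetical element-wise swap of two text embeddings.

    Each element of the result comes from emb_b with probability p and
    from emb_a otherwise. Both inputs must share a shape, e.g.
    (num_tokens, hidden_dim) as produced by a text encoder.
    """
    if emb_a.shape != emb_b.shape:
        raise ValueError("embeddings must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    # True where the element should be taken from emb_b.
    mask = rng.random(emb_a.shape) < p
    return np.where(mask, emb_b, emb_a)


def swap_set(emb_a, emb_b, n, p=0.5, seed=0):
    """Draw n independent swapped embeddings to form the candidate set."""
    rng = np.random.default_rng(seed)
    return [swap_embeddings(emb_a, emb_b, p, rng) for _ in range(n)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Random stand-ins for the two prompts' encoded embeddings.
    emb_a = rng.normal(size=(77, 768))
    emb_b = rng.normal(size=(77, 768))
    candidates = swap_set(emb_a, emb_b, n=4)
    print(len(candidates), candidates[0].shape)
```

Each swapped embedding would then be passed to a diffusion model as its text conditioning to render one candidate image; that generation step is not shown here.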

Contribution

The paper presents a new balance swap-sampling technique that enhances creative combinatorial object generation in text-to-image synthesis, outperforming recent state-of-the-art methods.

Findings

01. Outperforms recent state-of-the-art (SOTA) text-to-image (T2I) methods in experiments.

02. Achieves results comparable to those of human artists.

03. Generates diverse, high-quality combinatorial objects.

Abstract

Generating creative combinatorial objects from two seemingly unrelated object texts is a challenging task in text-to-image synthesis, often hindered by a focus on emulating existing data distributions. In this paper, we develop a straightforward yet highly effective method, called balance swap-sampling. First, we propose a swapping mechanism that generates a novel combinatorial object image set by randomly exchanging intrinsic elements of two text embeddings through a cutting-edge diffusion model. Second, we introduce a balance swapping region to efficiently sample a small subset from the newly generated image set by balancing CLIP distances between the new images and their original generations, increasing the likelihood of accepting the high-quality combinations. Last, we employ a segmentation method to compare CLIP distances among the segmented components, ultimately…
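The balance swapping region can likewise be sketched. This sketch assumes that each candidate image and the two original generations have already been embedded with a CLIP image encoder, that "CLIP distance" means cosine distance between those embeddings, and that a candidate falls inside the balance region when its distances to the two originals differ by at most a hypothetical threshold `tau`. These are illustrative assumptions: the abstract does not state the exact criterion, and `balance_sample` is a made-up name.

```python
import numpy as np


def cosine_distance(x, y):
    """Cosine distance between each row of x (N, D) and a vector y (D,)."""
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    y = y / np.linalg.norm(y)
    return 1.0 - x @ y


def balance_sample(new_feats, feat_a, feat_b, tau=0.05, k=None):
    """Keep candidates whose distances to both original generations are balanced.

    new_feats: (N, D) image embeddings of the swapped generations.
    feat_a, feat_b: (D,) image embeddings of the original images for each text.
    tau: hypothetical half-width of the balance region, |d_a - d_b| <= tau.
    k: optionally keep only the k most balanced candidates.
    Returns indices into new_feats, most balanced first.
    """
    d_a = cosine_distance(new_feats, feat_a)
    d_b = cosine_distance(new_feats, feat_b)
    gap = np.abs(d_a - d_b)
    idx = np.flatnonzero(gap <= tau)
    # Stable sort so ties keep their original generation order.
    idx = idx[np.argsort(gap[idx], kind="stable")]
    return idx if k is None else idx[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for CLIP image embeddings.
    feat_a, feat_b = rng.normal(size=(2, 512))
    new_feats = rng.normal(size=(100, 512))
    print(balance_sample(new_feats, feat_a, feat_b, tau=0.02, k=5))
```

Ranking by the gap |d_a - d_b| favors images that sit roughly equally far from both source objects, that is, candidates that blend the two concepts instead of collapsing onto one of them. This sketch does not reject candidates that are far from both originals; a fuller selection rule might add such a check.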

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training · Diffusion · Focus