TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
Jun Li, Zedong Zhang, Jian Yang

TL;DR
This paper introduces TP2O, a novel method for creative text-to-image generation that uses balance swap-sampling to produce high-quality, diverse combinatorial objects by exchanging and selecting image components based on CLIP distances.
Contribution
The paper presents a new balance swap-sampling technique that enhances creative combinatorial object generation in text-to-image synthesis, outperforming recent state-of-the-art methods.
Findings
Outperforms recent state-of-the-art (SOTA) text-to-image (T2I) methods in the paper's experiments.
Produces results the authors describe as comparable to those of human artists.
Effectively generates diverse and high-quality combinatorial objects.
Abstract
Generating creative combinatorial objects from two seemingly unrelated object texts is a challenging task in text-to-image synthesis, often hindered by a focus on emulating existing data distributions. In this paper, we develop a straightforward yet highly effective method, called balance swap-sampling. First, we propose a swapping mechanism that generates a novel combinatorial object image set by randomly exchanging intrinsic elements of two text embeddings through a cutting-edge diffusion model. Second, we introduce a balance swapping region to efficiently sample a small subset from the newly generated image set by balancing CLIP distances between the new images and their original generations, increasing the likelihood of accepting the high-quality combinations. Last, we employ a segmentation method to compare CLIP distances among the segmented components, ultimately…
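The first two steps of the abstract (random swapping of embedding elements, then acceptance based on balanced CLIP distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `swap_embeddings`, `cosine_distance`, and `in_balance_region`, the per-dimension swap with probability `swap_prob`, and the acceptance rule `|d1 - d2| <= tol` are all assumptions chosen for illustration, and plain NumPy arrays stand in for real text and CLIP image embeddings. The diffusion model and the segmentation step are omitted.

```python
import numpy as np


def swap_embeddings(e1, e2, rng, swap_prob=0.5):
    """Randomly exchange feature dimensions between two text embeddings.

    e1, e2: arrays of shape (tokens, dim). Assumption: each feature
    dimension is swapped independently with probability swap_prob.
    Returns the mixed embedding and the boolean swap mask of shape (dim,).
    """
    mask = rng.random(e1.shape[-1]) < swap_prob
    # Where mask is True take the value from e2, otherwise keep e1.
    mixed = np.where(mask, e2, e1)
    return mixed, mask


def cosine_distance(a, b):
    """Cosine distance between two 1-D feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def in_balance_region(d1, d2, tol=0.1):
    """Hypothetical acceptance rule: keep a candidate whose distances to
    the two original generations differ by at most tol."""
    return abs(d1 - d2) <= tol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the two prompts' text embeddings.
    text_a = rng.normal(size=(4, 8))
    text_b = rng.normal(size=(4, 8))
    mixed, mask = swap_embeddings(text_a, text_b, rng)
    print("swapped dimensions:", int(mask.sum()), "of", mask.size)

    # Stand-ins for CLIP image features of the two original generations
    # and of one candidate generated from the mixed embedding.
    img_a = rng.normal(size=16)
    img_b = rng.normal(size=16)
    candidate = 0.5 * (img_a + img_b)
    d1 = cosine_distance(candidate, img_a)
    d2 = cosine_distance(candidate, img_b)
    print("accepted:", in_balance_region(d1, d2))
```

In a real pipeline the mixed embedding would condition a diffusion model, and the distances would be computed on CLIP features of the generated images; the tolerance `tol` here is an arbitrary placeholder rather than a value from the paper.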
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training · Diffusion · Focus
