Minority-Focused Text-to-Image Generation via Prompt Optimization

Soobin Um; Jong Chul Ye

arXiv:2410.07838 · cs.CV · April 7, 2025

TL;DR

This paper introduces a prompt optimization framework for text-to-image models that enhances the generation of minority samples, which are low-density, rare instances valuable for data augmentation and creative AI.

Contribution

The authors develop an online prompt optimizer and a specialized solver that promote minority feature generation without sacrificing semantic content.

Findings

01. Significantly improves minority sample generation quality.
02. Outperforms existing samplers in producing diverse low-density instances.
03. Demonstrates effectiveness across various T2I models.

Abstract

We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as those lying in low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (like CFG) that are essential for high-quality generation. To address this, we present a novel framework to counter the high-density focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that encourages the emergence of desired properties during inference while preserving the semantic content of user-provided prompts. We subsequently tailor…
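The abstract attributes the high-density focus to guided samplers such as CFG (classifier-free guidance). As background only, here is a minimal sketch of the standard CFG combination step, applied to toy lists instead of real model outputs. The function name `cfg_noise` and the toy values are illustrative assumptions; nothing here is taken from the paper's code, and it does not implement the paper's prompt optimizer.

```python
def cfg_noise(eps_uncond, eps_cond, guidance_scale):
    """Combine noise predictions with classifier-free guidance.

    Standard form: eps = eps_uncond + w * (eps_cond - eps_uncond).
    w = 0 returns the unconditional prediction, w = 1 returns the
    conditional one, and w > 1 extrapolates further toward the prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for the two denoiser outputs (hypothetical values).
eps_uncond = [0.0, 1.0, -2.0]
eps_cond = [1.0, 1.0, 0.0]

plain = cfg_noise(eps_uncond, eps_cond, 1.0)   # no extra guidance
guided = cfg_noise(eps_uncond, eps_cond, 7.5)  # a commonly used large scale
```

With a scale above 1, every coordinate where the two predictions differ is pushed further in the conditional direction, which is the amplification the abstract associates with a bias toward high-density regions.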

Code & Models

Repositories

anonymous5293/minorityprompt
PyTorch · Official

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Focus · Diffusion