ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization

Haosheng Gan; Berk Tinaz; Mohammad Shahab Sepehri; Zalan Fabian; Mahdi Soltanolkotabi

arXiv:2507.03275·cs.CV·July 8, 2025

TL;DR

ConceptMix++ is an iterative prompt optimization framework for fairer benchmarking of text-to-image (T2I) models. It refines benchmark prompts systematically using vision-language model feedback, revealing model capabilities that rigid prompts can hide.

Contribution

The paper presents a multimodal optimization pipeline that disentangles prompt phrasing from visual generation capabilities, enabling fairer comparisons of T2I models.

Findings

01

Optimized prompts improve compositional generation performance.

02

Certain visual concepts, such as spatial relationships and shapes, benefit more from prompt optimization than others.

03

Optimized prompts transfer effectively across different models.

Abstract

Current text-to-image (T2I) benchmarks evaluate models on rigid prompts, potentially underestimating true generative capabilities due to prompt sensitivity and creating biases that favor certain models while disadvantaging others. We introduce ConceptMix++, a framework that disentangles prompt phrasing from visual generation capabilities by applying iterative prompt optimization. Building on ConceptMix, our approach incorporates a multimodal optimization pipeline that leverages vision-language model feedback to refine prompts systematically. Through extensive experiments across multiple diffusion models, we show that optimized prompts significantly improve compositional generation performance, revealing previously hidden model capabilities and enabling fairer comparisons across T2I models. Our analysis reveals that certain visual concepts -- such as spatial relationships and shapes --…
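
The loop the abstract describes (generate an image, collect vision-language model feedback, refine the prompt, repeat) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function `optimize_prompt`, its parameters, the stopping rule, and the toy stand-ins for the generator, the scorer, and the rewriter are all hypothetical, since this page does not specify how ConceptMix++ scores images or rewrites prompts.

```python
from typing import Callable, List, Tuple


def optimize_prompt(
    initial_prompt: str,
    generate: Callable[[str], object],
    score: Callable[[object], Tuple[float, str]],
    rewrite: Callable[[str, str], str],
    max_iters: int = 5,
    target: float = 1.0,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Hypothetical refine loop: keep the best-scoring prompt seen so far."""
    best_prompt, best_score = initial_prompt, float("-inf")
    history: List[Tuple[str, float]] = []
    prompt = initial_prompt
    for _ in range(max_iters):
        image = generate(prompt)            # stand-in for a T2I model call
        value, feedback = score(image)      # stand-in for VLM-based grading
        history.append((prompt, value))
        if value > best_score:
            best_prompt, best_score = prompt, value
        if value >= target:                 # assumed stopping rule
            break
        prompt = rewrite(prompt, feedback)  # stand-in for a prompt rewriter
    return best_prompt, best_score, history


# Toy stand-ins so the sketch runs without any model (assumptions only).
REQUIRED = ["red", "cube", "left of"]


def toy_generate(prompt: str) -> str:
    # Pretend the "image" is just the lower-cased prompt text.
    return prompt.lower()


def toy_score(image: str) -> Tuple[float, str]:
    # Fraction of required concepts present; feedback names one missing concept.
    missing = [c for c in REQUIRED if c not in image]
    return 1 - len(missing) / len(REQUIRED), (missing[0] if missing else "")


def toy_rewrite(prompt: str, feedback: str) -> str:
    # Append the missing concept reported by the scorer.
    return f"{prompt}, {feedback}" if feedback else prompt


best_prompt, best_score, history = optimize_prompt(
    "a cube", toy_generate, toy_score, toy_rewrite
)
print(best_prompt, best_score, len(history))
```

In a real setup the three callables would wrap a diffusion model, a vision-language model grader, and a language model that rewrites the prompt. Keeping them as injected functions means the same loop could be reused across different T2I models.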
