Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

TL;DR
This paper introduces Commonsense-T2I, a benchmark to evaluate text-to-image models' ability to generate images consistent with real-world commonsense, revealing significant gaps even in state-of-the-art models.
Contribution
It presents a new adversarial benchmark dataset for assessing commonsense reasoning in T2I models and provides a comprehensive evaluation of current models' performance.
Findings
State-of-the-art models achieve less than 50% accuracy
GPT-enriched prompts do not significantly improve results
There is a large gap between generated images and real photos
Abstract
We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that align with commonsense in real life, which we call Commonsense-T2I. Given two adversarial text prompts containing an identical set of action words with minor differences, such as "a lightbulb without electricity" vs. "a lightbulb with electricity", we evaluate whether T2I models can conduct visual-commonsense reasoning, e.g., produce images that fit "the lightbulb is unlit" vs. "the lightbulb is lit", respectively. Commonsense-T2I presents an adversarial challenge, providing pairwise text prompts along with expected outputs. The dataset is carefully hand-curated by experts and annotated with fine-grained labels, such as commonsense type and likelihood of the expected outputs, to assist in analyzing model behavior. We benchmark a variety of state-of-the-art…
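The pairwise setup described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the `PromptPair` class, its field names, and the `pairwise_accuracy` function are hypothetical, and the scoring rule (a pair counts as correct only when both generated images match their expected outputs) is an assumption made here for illustration. How each image is judged against its expected output is left outside the sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PromptPair:
    """One benchmark sample: two near-identical prompts and their expected outputs.

    Field names are hypothetical; the source only says each sample provides
    pairwise text prompts along with expected outputs.
    """
    prompt_a: str
    expected_a: str
    prompt_b: str
    expected_b: str


def pairwise_accuracy(judgments: List[Tuple[bool, bool]]) -> float:
    """Fraction of pairs where BOTH images were judged to match expectations.

    Assumption: a pair is correct only if both sides are correct. Each tuple
    holds the (prompt_a_ok, prompt_b_ok) verdicts for one pair.
    """
    if not judgments:
        return 0.0
    correct = sum(1 for a_ok, b_ok in judgments if a_ok and b_ok)
    return correct / len(judgments)


# The example pair quoted in the abstract.
example = PromptPair(
    prompt_a="a lightbulb without electricity",
    expected_a="the lightbulb is unlit",
    prompt_b="a lightbulb with electricity",
    expected_b="the lightbulb is lit",
)

# Made-up verdicts for four pairs, purely to exercise the function.
judgments = [(True, True), (True, False), (False, False), (True, True)]
score = pairwise_accuracy(judgments)
```

Under this assumed rule, a model that always draws a lit lightbulb gets one side of the example pair right but receives no credit for the pair, which is what makes the paired prompts adversarial.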
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Mathematics, Computing, and Information Processing
Methods: Sparse Evolutionary Training · ALIGN · Diffusion
