DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho, Abhay Zala, Mohit Bansal

TL;DR
This paper evaluates the visual reasoning skills and social biases of state-of-the-art text-to-image models, revealing a large gap between model performance and upper-bound accuracy on object counting and spatial reasoning, as well as gender and skin tone biases learned from web data.
Contribution
It introduces PaintSkills, a compositional diagnostic dataset that measures object recognition, object counting, and spatial relation understanding, and it analyzes the gender and skin tone biases of recent models.
Findings
Models perform poorly on object counting and spatial reasoning tasks.
Recent models exhibit gender and skin tone biases learned from web data.
The study highlights the need for improved reasoning and bias mitigation in text-to-image models.
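The bias findings imply some way to measure how far an attribute in generated images drifts from a balanced split. The sketch below shows one simple option. It assumes that each image generated from a neutral prompt (for example, one naming a profession) has already been given an attribute label by a classifier or a human annotator. The label values, category names, and the mean-absolute-deviation score are illustrative assumptions, not necessarily the metric used in the paper.

```python
from collections import Counter


def attribute_distribution(labels: list[str], categories: list[str]) -> dict[str, float]:
    """Share of images assigned to each category (labels outside `categories` are ignored)."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return {c: counts[c] / total for c in categories}


def skew_from_uniform(dist: dict[str, float]) -> float:
    """Mean absolute deviation from a uniform distribution (illustrative metric).

    0.0 means the attribute is perfectly balanced across the categories.
    """
    k = len(dist)
    return sum(abs(p - 1 / k) for p in dist.values()) / k


if __name__ == "__main__":
    # Hypothetical attribute labels for 10 images generated from one neutral prompt.
    labels = ["male"] * 8 + ["female"] * 2
    dist = attribute_distribution(labels, ["male", "female"])
    print(dist, round(skew_from_uniform(dist), 3))  # {'male': 0.8, 'female': 0.2} 0.3
```

With two categories the score ranges from 0 (an even split) to 0.5 (every image falls into a single category), so larger values indicate stronger skew for that prompt.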
Abstract
Recently, DALL-E, a multimodal transformer language model, and its variants, including diffusion models, have shown high-quality text-to-image generation capabilities. However, despite the realistic image generation results, there has not been a detailed analysis of how to evaluate such models. In this work, we investigate the visual reasoning capabilities and social biases of different text-to-image models, covering both multimodal transformer language models and diffusion models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. For this, we propose PaintSkills, a compositional diagnostic evaluation dataset that measures these skills. Despite the high-fidelity image generation capability, a large gap exists between the performance of recent models and the upper bound accuracy in object counting and spatial…
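To make the three skills concrete, here is a minimal sketch of how one generated image could be scored once an object detector has produced labeled bounding boxes for it. The `Detection` type, the center-point rule for spatial relations, and the relation names are hypothetical choices for illustration, not the paper's evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object in a hypothetical detector output format."""
    label: str
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def center(box: tuple[float, float, float, float]) -> tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def check_object(dets: list[Detection], obj: str) -> bool:
    """Object recognition: the prompted object is detected at least once."""
    return any(d.label == obj for d in dets)


def check_count(dets: list[Detection], obj: str, n: int) -> bool:
    """Object counting: exactly n instances of the prompted object are detected."""
    return sum(d.label == obj for d in dets) == n


def check_spatial(dets: list[Detection], obj_a: str, relation: str, obj_b: str) -> bool:
    """Spatial relation: compare the centers of the first detected instance of each object."""
    a = [d for d in dets if d.label == obj_a]
    b = [d for d in dets if d.label == obj_b]
    if not a or not b:
        return False  # a missing object counts as a failure
    (ax, ay), (bx, by) = center(a[0].box), center(b[0].box)
    # Image coordinates: y grows downward, so "above" means a smaller y.
    if relation == "left of":
        return ax < bx
    if relation == "right of":
        return ax > bx
    if relation == "above":
        return ay < by
    if relation == "below":
        return ay > by
    raise ValueError(f"unknown relation: {relation}")


def accuracy(results: list[bool]) -> float:
    """Fraction of prompts whose generated image passed the skill check."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Hypothetical detections for one generated image.
    dets = [
        Detection("dog", (10, 50, 60, 100)),
        Detection("cat", (120, 40, 180, 110)),
        Detection("cat", (200, 40, 250, 90)),
    ]
    results = [
        check_object(dets, "dog"),
        check_count(dets, "cat", 2),
        check_spatial(dets, "dog", "left of", "cat"),
        check_spatial(dets, "dog", "above", "cat"),
    ]
    print(results, accuracy(results))  # [True, True, True, False] 0.75
```

Averaging these pass/fail checks over every prompt for a skill gives a per-skill accuracy. Running the same checks on ground-truth images of the same scenes would give the kind of upper bound that model accuracy is compared against.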
Taxonomy
Topics: Multimodal Machine Learning Applications
Methods: Diffusion
