Evaluating Numerical Reasoning in Text-to-Image Models
Ivana Kajić, Olivia Wiles, Isabela Albuquerque, Matthias Bauer, Su Wang, Jordi Pont-Tuset, Aida Nematzadeh

TL;DR
This paper introduces a new benchmark, GeckoNum, and uses it to evaluate the numerical reasoning abilities of text-to-image models. The evaluation reveals significant limitations in how these models handle quantities, linguistic quantifiers, and more complex numerical concepts.
Contribution
The paper introduces GeckoNum, a comprehensive benchmark for assessing numerical reasoning in text-to-image models, and uses it to expose their deficiencies across a range of numerical concepts.
Findings
Models struggle with exact numbers beyond small quantities.
Performance also drops on the concept of zero and on more advanced concepts such as fractional representations.
Models poorly interpret linguistic quantifiers and partial quantities.
Abstract
Text-to-image generative models are capable of producing high-quality images that often faithfully depict concepts described using natural language. In this work, we comprehensively evaluate a range of text-to-image models on numerical reasoning tasks of varying difficulty, and show that even the most advanced models have only rudimentary numerical skills. Specifically, their ability to correctly generate an exact number of objects in an image is limited to small numbers, it is highly dependent on the context the number term appears in, and it deteriorates quickly with each successive number. We also demonstrate that models have poor understanding of linguistic quantifiers (such as "a few" or "as many as"), the concept of zero, and struggle with more advanced concepts such as partial quantities and fractional representations. We bundle prompts, generated images and human annotations…
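The headline finding, that exact-count accuracy falls as the requested number grows, comes from comparing the number named in a prompt with the number of objects annotators count in the generated image. The following is a minimal sketch of that per-number scoring, assuming a hypothetical `Judgement` record for each annotated image and a hypothetical `exact_count_accuracy` helper. It is not the GeckoNum evaluation code, and the toy data is invented for illustration rather than taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgement:
    """Hypothetical annotation record: the number requested in the prompt
    and the number of objects an annotator counted in the generated image."""
    requested: int
    counted: int


def exact_count_accuracy(judgements):
    """Return {requested_number: fraction of images showing exactly that many objects}."""
    hits = defaultdict(int)    # images whose count matches the prompt
    totals = defaultdict(int)  # all images generated for that number
    for j in judgements:
        totals[j.requested] += 1
        if j.counted == j.requested:
            hits[j.requested] += 1
    # Sort by requested number so accuracy can be read as a curve over numbers.
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Toy annotations for illustration only; these are NOT GeckoNum results.
toy = [
    Judgement(1, 1), Judgement(1, 1),
    Judgement(3, 3), Judgement(3, 4),
    Judgement(7, 5), Judgement(7, 8),
]
print(exact_count_accuracy(toy))  # {1: 1.0, 3: 0.5, 7: 0.0}
```

Grouping by the requested number, rather than pooling all prompts into one score, is what makes a decline "with each successive number" visible.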
Taxonomy
Topics: Data Visualization and Analytics · Multimodal Machine Learning Applications · Topic Modeling
