Relations, Negations, and Numbers: Looking for Logic in Generative Text-to-Image Models
Colin Conwell, Rupert Tawiah-Quashie, and Tomer Ullman

TL;DR
This paper shows that state-of-the-art text-to-image models such as DALL-E 3 do not reliably generate images that respect logical operators (relations, negations, and numbers), highlighting significant gaps in compositional reasoning.
Contribution
It provides a comprehensive evaluation of logical reasoning failures in modern generative models and proposes minimal modifications to improve their compositional capabilities.
Findings
Negation prompts and prompts with numbers beyond 3 fail most frequently.
A grounded diffusion pipeline is judged worse than DALL-E 3 across logical prompts.
Relational prompt frequency correlates with image match quality.
Abstract
Despite remarkable progress in multi-modal AI research, there is a salient domain in which modern AI continues to lag considerably behind even human children: the reliable deployment of logical operators. Here, we examine three forms of logical operators: relations, negations, and discrete numbers. We asked human respondents (N=178 in total) to evaluate images generated by a state-of-the-art image-generating AI (DALL-E 3) prompted with these 'logical probes', and find that none reliably produce human agreement scores greater than 50%. The negation probes and numbers (beyond 3) fail most frequently. In a 4th experiment, we assess a 'grounded diffusion' pipeline that leverages targeted prompt engineering and structured intermediate representations for greater compositional control, but find its performance is judged even worse than that of DALL-E 3 across prompts. To provide further…
Taxonomy
Topics: Digital Humanities and Scholarship
