A very preliminary analysis of DALL-E 2
Gary Marcus, Ernest Davis, Scott Aaronson

TL;DR
This paper provides a preliminary evaluation of DALL-E 2's ability to generate images from complex prompts, revealing limited success in meeting challenging requests and highlighting areas for improvement.
Contribution
It offers an initial assessment of DALL-E 2's common sense, reasoning, and ability to understand complex texts, based on fourteen intentionally difficult prompts.
Findings
For 5 of the 14 prompts, at least one of the ten generated images fully satisfied the request
For none of the 14 prompts did all ten images satisfy the request
DALL-E 2 struggles with complex, challenging prompts
Abstract
The DALL-E 2 system generates original synthetic images corresponding to an input text as caption. We report here on the outcome of fourteen tests of this system designed to assess its common sense, reasoning and ability to understand complex texts. All of our prompts were intentionally much more challenging than the typical ones that have been showcased in recent weeks. Nevertheless, for 5 out of the 14 prompts, at least one of the ten images fully satisfied our requests. On the other hand, on no prompt did all of the ten images satisfy our requests.
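The two counts reported in the abstract are per-prompt tallies over ten images each. A minimal Python sketch of that tally follows, assuming every image has been hand-labeled as satisfying the request or not. The function name `summarize`, the dictionary layout, and the toy labels are illustrative assumptions, not the authors' code or data.

```python
from typing import Dict, List


def summarize(judgments: Dict[str, List[bool]]) -> Dict[str, int]:
    """Count prompts with at least one, and with all, satisfying images."""
    # A prompt counts toward "any_satisfied" if one or more images pass.
    any_ok = sum(1 for imgs in judgments.values() if any(imgs))
    # A prompt counts toward "all_satisfied" only if every image passes
    # (an empty image list is not treated as a success).
    all_ok = sum(1 for imgs in judgments.values() if imgs and all(imgs))
    return {
        "prompts": len(judgments),
        "any_satisfied": any_ok,
        "all_satisfied": all_ok,
    }


# Hypothetical toy labels, NOT the paper's results: 3 prompts, 10 images each.
toy = {
    "prompt_a": [False] * 10,
    "prompt_b": [True] + [False] * 9,
    "prompt_c": [True] * 4 + [False] * 6,
}
result = summarize(toy)
print(result)
```

Applied to the study's own labels, the same tally would correspond to the figures quoted above: 14 prompts, 5 with at least one satisfying image, and 0 with all ten satisfying.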
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
