A very preliminary analysis of DALL-E 2

Gary Marcus; Ernest Davis; Scott Aaronson

arXiv:2204.13807 · cs.CV · May 4, 2022 · 88 cites

Open Access

TL;DR

This paper provides a preliminary evaluation of DALL-E 2's ability to generate images from complex prompts. Across fourteen intentionally difficult prompts, with ten images generated per prompt, at least one image fully satisfied the request for 5 prompts, and no prompt produced ten satisfying images.

Contribution

It offers an initial assessment of DALL-E 2's common sense, reasoning, and ability to understand complex texts, using fourteen intentionally difficult prompts.

Findings

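01. For 5 of the 14 prompts, at least one of the ten generated images fully satisfied the request.

02. For none of the prompts did all ten images satisfy the request.

03. DALL-E 2 struggles with prompts designed to test common sense, reasoning, and understanding of complex text.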

Abstract

The DALL-E 2 system generates original synthetic images corresponding to an input text as caption. We report here on the outcome of fourteen tests of this system designed to assess its common sense, reasoning and ability to understand complex texts. All of our prompts were intentionally much more challenging than the typical ones that have been showcased in recent weeks. Nevertheless, for 5 out of the 14 prompts, at least one of the ten images fully satisfied our requests. On the other hand, on no prompt did all of the ten images satisfy our requests.
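The abstract scores each prompt under two criteria over its ten generated images: whether at least one image fully satisfied the request, and whether all ten did. A minimal Python sketch of that tally follows, assuming each prompt's human judgments have already been reduced to a count of satisfying images. The function name `summarize_prompt_results` and its input format are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, Sequence


def summarize_prompt_results(
    satisfying_counts: Sequence[int], images_per_prompt: int = 10
) -> Dict[str, int]:
    """Hypothetical helper: tally the two success criteria from the abstract.

    satisfying_counts[i] is assumed to be the number of images generated for
    prompt i that a human judge rated as fully satisfying the request.
    """
    for count in satisfying_counts:
        if not 0 <= count <= images_per_prompt:
            raise ValueError(f"count {count} is outside 0..{images_per_prompt}")

    # Lenient criterion: at least one image for the prompt satisfied the request.
    at_least_one = sum(1 for c in satisfying_counts if c >= 1)
    # Strict criterion: every image generated for the prompt satisfied the request.
    all_satisfying = sum(1 for c in satisfying_counts if c == images_per_prompt)

    return {
        "prompts": len(satisfying_counts),
        "at_least_one": at_least_one,
        "all": all_satisfying,
    }
```

Applied to the outcome reported in the abstract, the lenient tally would be 5 of 14 and the strict tally 0 of 14; the per-prompt counts behind those totals are not listed here.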

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Mathematics, Computing, and Information Processing