A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning

Zhisheng Tang; Mayank Kejriwal

arXiv:2302.09068·cs.AI·February 21, 2023·6 cites

Open Access

TL;DR

This pilot study qualitatively evaluates ChatGPT on decision making and DALL-E 2 on spatial reasoning, revealing both strengths and limitations in their rationality and spatial understanding.

Contribution

The paper provides a pilot assessment of the cognitive abilities of recent generative transformer models, using prompts constructed under neutral a priori guidelines rather than adversarial intent, followed by post hoc qualitative analysis.

Findings

01

DALL-E 2 generates at least one correct image for each spatial reasoning prompt, but most of the images it generates are incorrect.

02

ChatGPT shows some rational decision-making, but many of its decisions violate at least one of the Von Neumann-Morgenstern rationality axioms.

03

Models exhibit unpredictable outputs, making systematic evaluation challenging.

Abstract

We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences,…
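
To make the notion of an axiom violation concrete, the following is a minimal sketch of how two of the Von Neumann-Morgenstern axioms, completeness and transitivity, could be checked against a set of stated pairwise preferences. It is not the authors' evaluation procedure: the function names, the encoding of a strict preference as an ordered pair, and the example options `A`, `B`, `C` are all assumptions made for illustration.

```python
from itertools import combinations, permutations


def completeness_gaps(items, prefers):
    """Return pairs (a, b) with no stated preference in either direction."""
    return [
        (a, b)
        for a, b in combinations(items, 2)
        if (a, b) not in prefers and (b, a) not in prefers
    ]


def transitivity_violations(items, prefers):
    """Return triples (a, b, c) where a > b and b > c are stated but a > c is not."""
    return [
        (a, b, c)
        for a, b, c in permutations(items, 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]


# Hypothetical stated preferences: (x, y) means "x is strictly preferred to y".
items = ["A", "B", "C"]
prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # a preference cycle

gaps = completeness_gaps(items, prefers)
violations = transitivity_violations(items, prefers)
print("completeness gaps:", gaps)
print("transitivity violations:", violations)
```

In this example every pair of options is ranked, so completeness holds, but the cycle A > B > C > A breaks transitivity. The remaining two axioms, continuity and independence, concern preferences over lotteries and are not covered by this sketch.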

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning

Methods: High-Order Consensuses