A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning
Zhisheng Tang, Mayank Kejriwal

TL;DR
This study evaluates ChatGPT's decision making and DALL-E 2's spatial reasoning, using qualitative analysis to reveal strengths and limitations in the models' understanding and rationality.
Contribution
It provides a novel pilot assessment of recent generative transformer models' cognitive abilities using neutral prompts and qualitative analysis.
Findings
DALL-E 2 generates at least one correct image for each spatial reasoning prompt, but most of the images it generates are incorrect.
ChatGPT shows some rational decision making, but many of its decisions violate at least one of the classical Von Neumann-Morgenstern rationality axioms.
Models exhibit unpredictable outputs, making systematic evaluation challenging.
Abstract
We conduct a pilot study selectively evaluating the cognitive abilities (decision making and spatial reasoning) of two recently released generative transformer models, ChatGPT and DALL-E 2. Input prompts were constructed following neutral a priori guidelines, rather than adversarial intent. Post hoc qualitative analysis of the outputs shows that DALL-E 2 is able to generate at least one correct image for each spatial reasoning prompt, but most images generated are incorrect (even though the model seems to have a clear understanding of the objects mentioned in the prompt). Similarly, in evaluating ChatGPT on the rationality axioms developed under the classical Von Neumann-Morgenstern utility theorem, we find that, although it demonstrates some level of rational decision-making, many of its decisions violate at least one of the axioms even under reasonable constructions of preferences,…
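The abstract does not describe a mechanical procedure for detecting axiom violations (the analysis was qualitative). Purely as an illustration, the sketch below checks two of the Von Neumann-Morgenstern axioms, completeness and transitivity, over a hypothetical set of strict pairwise preferences recorded from a model's answers. The function names, option labels, and the set-of-pairs representation are assumptions, not the authors' method. Indifference, continuity, and independence are not handled.

```python
from itertools import combinations, permutations

# Hypothetical representation: a set of (x, y) pairs, each meaning
# "x is strictly preferred to y" according to the model's answers.

def find_incomplete_pairs(options, prefers):
    """Completeness check: return unordered pairs with no recorded preference in either direction."""
    return [
        (x, y)
        for x, y in combinations(options, 2)
        if (x, y) not in prefers and (y, x) not in prefers
    ]

def find_intransitive_triples(options, prefers):
    """Transitivity check: return ordered triples where x > y and y > z but not x > z."""
    return [
        (x, y, z)
        for x, y, z in permutations(options, 3)
        if (x, y) in prefers and (y, z) in prefers and (x, z) not in prefers
    ]

# Usage with an illustrative cyclic preference A > B > C > A.
options = ["A", "B", "C"]
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
print(find_incomplete_pairs(options, cyclic))      # → []
print(find_intransitive_triples(options, cyclic))  # → [('A', 'B', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B')]
```

A cycle such as this one is complete (every pair is ranked) yet fails transitivity in each of its three rotations, which is the kind of inconsistency the rationality axioms rule out.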
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
Methods: High-Order Consensuses
