Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
Melanie Mitchell, Alessandro B. Palmarini, Arseny Moskvichev

TL;DR
This study compares the abstract reasoning abilities of humans, GPT-4, and GPT-4V on the ConceptARC benchmark and finds that neither version of GPT-4 reaches human-level abstraction under any of the prompting methods tested.
Contribution
It extends prior work by evaluating GPT-4 with more detailed, one-shot prompts on text versions of ConceptARC tasks, and by assessing GPT-4V's multimodal reasoning on image versions of the simplest tasks, giving a fuller picture of both models' abstraction capabilities.
Findings
GPT-4 does not exhibit human-level robust abstraction abilities.
GPT-4V's multimodal reasoning also falls short of human performance.
More detailed, one-shot prompting does not bring GPT-4's abstraction abilities to humanlike levels.
Abstract
We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
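The difference between the zero-shot and one-shot text prompts mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the grid encoding (rows of space-separated integers), the section labels, and the helper names `grid_to_text`, `format_task`, and `build_prompt` are hypothetical and are not the authors' actual prompt format.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]    # assumed encoding: each cell is an integer color index
Pair = Tuple[Grid, Grid]  # (input grid, output grid) demonstration


def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as one line per row, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def format_task(demos: List[Pair], test_input: Grid,
                answer: Optional[Grid] = None) -> str:
    """Render a task's demonstrations and test input as text.

    If `answer` is given, the test output is filled in (a solved task);
    otherwise the prompt ends at "Test output:" for the model to complete.
    """
    parts = []
    for i, (inp, out) in enumerate(demos, start=1):
        parts.append(f"Demonstration {i} input:\n{grid_to_text(inp)}")
        parts.append(f"Demonstration {i} output:\n{grid_to_text(out)}")
    parts.append(f"Test input:\n{grid_to_text(test_input)}")
    if answer is not None:
        parts.append(f"Test output:\n{grid_to_text(answer)}")
    else:
        parts.append("Test output:")
    return "\n\n".join(parts)


def build_prompt(instructions: str, demos: List[Pair], test_input: Grid,
                 solved_example: Optional[Tuple[List[Pair], Grid, Grid]] = None) -> str:
    """Zero-shot when `solved_example` is None; one-shot when a solved task is included."""
    sections = [instructions]
    if solved_example is not None:
        ex_demos, ex_test, ex_answer = solved_example
        sections.append("Example task (solved):\n\n"
                        + format_task(ex_demos, ex_test, ex_answer))
    sections.append("Your task:\n\n" + format_task(demos, test_input))
    return "\n\n".join(sections)
```

Calling `build_prompt(instructions, demos, test_input)` yields a zero-shot prompt, while passing `solved_example=(ex_demos, ex_test, ex_answer)` prepends one fully worked task, which is the sense of "one-shot" assumed in this sketch.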
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Intelligent Tutoring Systems and Adaptive Learning
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Residual Connection
