Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks

Melanie Mitchell; Alessandro B. Palmarini; Arseny Moskvichev

arXiv:2311.09247 · cs.AI · December 25, 2023 · 20 cites

Open Access

TL;DR

This study compares the abstract reasoning abilities of humans, GPT-4, and GPT-4V on the ConceptARC benchmark and finds that neither the text-only nor the multimodal version of GPT-4 reaches human-level abstraction across the prompting strategies tested.

Contribution

It extends prior work by evaluating text-only GPT-4 with more detailed, one-shot prompts and by assessing GPT-4V's multimodal reasoning on image versions of the simplest ConceptARC tasks, providing new evidence about both models' abstraction capabilities.

Findings

1. GPT-4 does not exhibit robust, human-level abstraction abilities.

2. GPT-4V's multimodal reasoning also falls short of human performance.

3. Even with more detailed, one-shot prompting, GPT-4 does not reach human-level performance.

Abstract

We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed, one-shot prompting (rather than simple, zero-shot prompts) with text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, on zero- and one-shot prompts using image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
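To make "text versions of ConceptARC tasks" and the zero- vs one-shot distinction concrete, here is a minimal sketch. It assumes the public ARC JSON layout (a task holds "train" and "test" lists of input/output grids of integers 0-9) and assumes exact-match scoring of the predicted output grid. The grid serialization, the prompt wording, and the helper names (`grid_to_text`, `task_body`, `build_prompt`, `is_correct`) are hypothetical illustrations, not the prompts the authors actually used.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]
# Assumed ARC-style layout: {"train": [{"input": Grid, "output": Grid}, ...], "test": [...]}
Task = Dict[str, List[Dict[str, Grid]]]


def grid_to_text(grid: Grid) -> str:
    """Hypothetical serialization: one line per row, space-separated color indices."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def task_body(task: Task, with_answer: bool) -> str:
    """Demonstration pairs, then the first test input (and optionally its output)."""
    lines = []
    for i, pair in enumerate(task["train"], start=1):
        lines.append(f"Example {i} input:\n{grid_to_text(pair['input'])}")
        lines.append(f"Example {i} output:\n{grid_to_text(pair['output'])}")
    test = task["test"][0]
    lines.append(f"Test input:\n{grid_to_text(test['input'])}")
    if with_answer:
        lines.append(f"Test output:\n{grid_to_text(test['output'])}")
    return "\n\n".join(lines)


def build_prompt(task: Task, instructions: str, solved: Optional[Task] = None) -> str:
    """Zero-shot when `solved` is None; one-shot when a fully solved task is prepended."""
    parts = [instructions]
    if solved is not None:
        parts.append("Solved example task:\n\n" + task_body(solved, with_answer=True))
    parts.append("Now solve this task:\n\n" + task_body(task, with_answer=False))
    parts.append("Test output:")  # the model is expected to continue with a grid
    return "\n\n".join(parts)


def is_correct(predicted: Grid, expected: Grid) -> bool:
    """Assumed exact-match scoring: dimensions and every cell must agree."""
    return predicted == expected
```

For example, `build_prompt(task, "Solve the puzzle.")` yields a zero-shot prompt, while passing `solved=other_task` prepends one worked task with its answer, which is the one-shot setting in this sketch.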
