LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations

Yudong Xu; Wenhao Li; Pashootan Vaezipoor; Scott Sanner; Elias B. Khalil

arXiv:2305.18354·cs.CL·February 16, 2024·5 cites

TL;DR

This paper investigates GPT-4's ability to solve abstract reasoning tasks from the ARC benchmark. It finds that GPT-4's failures are tied to how objects appear in sequential text encodings of the grids, and shows that an external object-based representation improves its reasoning performance.

Contribution

The study introduces a new 1D-ARC benchmark and demonstrates that object-based representations significantly enhance GPT-4's reasoning on abstract tasks.

Findings

01. GPT-4 solves only 13/50 of the most straightforward ARC tasks when their two-dimensional grids are encoded as text.

02. Object-based external representations nearly double GPT-4's performance on ARC tasks.

03. Object-based representations lead to near-perfect scores on the easier 1D-ARC benchmark.
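To make "object-based representation" concrete, here is a minimal sketch of one way to turn a grid into a list of objects. It is not the paper's tool: the function name `extract_objects`, the choice of 4-connectivity, and the treatment of color 0 as background are all assumptions made for illustration.

```python
from collections import deque


def extract_objects(grid, background=0):
    """Group same-colored, 4-connected, non-background cells into objects.

    Hypothetical helper for illustration; the object abstraction used in
    the paper may differ (e.g. in connectivity or color handling).
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == background or (r, c) in seen:
                continue
            color = grid[r][c]
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            # Breadth-first flood fill over neighbors of the same color.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append({"color": color, "cells": sorted(cells)})
    return objects


# Toy grid (made up for this example): a vertical bar of 3s and a short bar of 5s.
grid = [
    [0, 3, 0, 0],
    [0, 3, 0, 5],
    [0, 3, 0, 5],
]
objects = extract_objects(grid)
for obj in objects:
    print(obj)
```

Each object is described by its color and cell coordinates, so a prompt built from this list names the objects explicitly instead of leaving the model to recover them from raw grid text.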

Abstract

Can a Large Language Model (LLM) solve simple abstract reasoning problems? We explore this broad question through a systematic analysis of GPT on the Abstraction and Reasoning Corpus (ARC), a representative benchmark of abstract reasoning ability from limited examples in which solutions require some "core knowledge" of concepts such as objects, goal states, counting, and basic geometry. GPT-4 solves only 13/50 of the most straightforward ARC tasks when using textual encodings for their two-dimensional input-output grids. Our failure analysis reveals that GPT-4's capacity to identify objects and reason about them is significantly influenced by the sequential nature of the text that represents an object within a text encoding of a task. To test this hypothesis, we design a new benchmark, the 1D-ARC, which consists of one-dimensional (array-like) tasks that are more conducive to GPT-based…
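The point about the sequential nature of text encodings can be illustrated with a small sketch. The encoding below (space-separated digits, one row per line) and the helper names `encode_grid` and `token_positions` are assumptions for illustration, not necessarily the encodings evaluated in the paper.

```python
def encode_grid(grid):
    """Serialize a 2D grid row by row, one text line per row (assumed format)."""
    return "\n".join(" ".join(str(v) for v in row) for row in grid)


def token_positions(grid, color):
    """Row-major positions of cells with the given color in the serialized grid."""
    cols = len(grid[0])
    return [r * cols + c
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if v == color]


# Two toy grids (made up): the same 3-cell bar, laid out horizontally and vertically.
horizontal = [
    [0, 0, 0, 0],
    [3, 3, 3, 0],
    [0, 0, 0, 0],
]
vertical = [
    [0, 3, 0, 0],
    [0, 3, 0, 0],
    [0, 3, 0, 0],
]

print(encode_grid(vertical))
print(token_positions(horizontal, 3))  # adjacent positions in the text
print(token_positions(vertical, 3))    # positions separated by a full row
```

In the horizontal case the object's cells are adjacent in the text, while in the vertical case they are a full row apart. This is the kind of gap that a one-dimensional task layout avoids.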

Code & Models

Repositories

khalil-research/1d-arc

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · Cosine Annealing · Layer Normalization