LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations
Yudong Xu, Wenhao Li, Pashootan Vaezipoor, Scott Sanner, Elias B. Khalil

TL;DR
This paper investigates GPT-4's ability to solve abstract reasoning tasks from the ARC benchmark, finds that its failures are tied to how objects are represented in sequential text, and shows that object-based representations supplied by an external tool improve its reasoning performance.
Contribution
The study introduces a new 1D-ARC benchmark and demonstrates that object-based representations significantly enhance GPT-4's reasoning on abstract tasks.
Findings
GPT-4 solves only 13/50 simple ARC tasks with textual encodings.
Object-based external representations nearly double GPT-4's performance on ARC tasks.
Object representations lead to near-perfect scores on the easier 1D-ARC.
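To make the contrast between the two input styles concrete, here is a minimal sketch in Python. It is not the paper's tool or prompt format: the function names, the space-separated cell encoding, the `(color, start, length)` object tuple, and the choice of 0 as background color are all assumptions made for illustration. The first function flattens a grid into row-by-row text, and the second turns a 1D array into an explicit list of objects, here taken to be maximal runs of one non-background color.

```python
from typing import List, Tuple

Grid = List[List[int]]
# Hypothetical object record for a 1D task: (color, start_index, length).
Object1D = Tuple[int, int, int]


def encode_grid_as_text(grid: Grid) -> str:
    """Textual encoding: one line per row, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def extract_objects_1d(row: List[int], background: int = 0) -> List[Object1D]:
    """Return one (color, start, length) tuple per maximal same-color run."""
    objects: List[Object1D] = []
    i = 0
    while i < len(row):
        color = row[i]
        if color == background:
            i += 1
            continue
        start = i
        # Advance to the end of the current run of this color.
        while i < len(row) and row[i] == color:
            i += 1
        objects.append((color, start, i - start))
    return objects


def describe_objects(objects: List[Object1D]) -> str:
    """Render the object list as text that could be placed in a prompt."""
    return "\n".join(
        f"object {k}: color={color}, start={start}, length={length}"
        for k, (color, start, length) in enumerate(objects, 1)
    )


if __name__ == "__main__":
    row = [0, 2, 2, 2, 0, 0, 3, 0]
    print(encode_grid_as_text([row]))
    print(describe_objects(extract_objects_1d(row)))
```

In the plain text encoding the model has to infer from a sequence of digits that the three adjacent 2s form one object, while the object-based description states that grouping directly.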
Abstract
Can a Large Language Model (LLM) solve simple abstract reasoning problems? We explore this broad question through a systematic analysis of GPT on the Abstraction and Reasoning Corpus (ARC), a representative benchmark of abstract reasoning ability from limited examples in which solutions require some "core knowledge" of concepts such as objects, goal states, counting, and basic geometry. GPT-4 solves only 13/50 of the most straightforward ARC tasks when using textual encodings for their two-dimensional input-output grids. Our failure analysis reveals that GPT-4's capacity to identify objects and reason about them is significantly influenced by the sequential nature of the text that represents an object within a text encoding of a task. To test this hypothesis, we design a new benchmark, the 1D-ARC, which consists of one-dimensional (array-like) tasks that are more conducive to GPT-based…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Test · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · Cosine Annealing · Layer Normalization
