From Reasoning to Generalization: Knowledge-Augmented LLMs for ARC Benchmark

Chao Lei; Nir Lipovetzky; Krista A. Ehinger; Yanchuan Chang

arXiv:2505.17482·cs.AI·May 26, 2025

TL;DR

This paper evaluates reasoning-oriented large language models on the ARC benchmark and introduces KAAR, a knowledge augmentation method that improves reasoning and generalization, with around 5% absolute gains over non-augmented methods.

Contribution

The paper proposes KAAR, a novel knowledge augmentation framework that enhances LLM reasoning by incorporating hierarchical priors, leading to improved performance on the ARC benchmark.

Findings

1. KAAR outperforms non-augmented methods with around 5% absolute gains.
2. Repeated-sampling planning-aided code generation (RSPC) achieves high accuracy and generalization.
3. ARC remains a challenging benchmark for reasoning-oriented LLMs.

Abstract

Recent reasoning-oriented LLMs have demonstrated strong performance on challenging tasks such as mathematics and science examinations. However, core cognitive faculties of human intelligence, such as abstract reasoning and generalization, remain underexplored. To address this, we evaluate recent reasoning-oriented LLMs on the Abstraction and Reasoning Corpus (ARC) benchmark, which explicitly demands both faculties. We formulate ARC as a program synthesis task and propose nine candidate solvers. Experimental results show that repeated-sampling planning-aided code generation (RSPC) achieves the highest test accuracy and demonstrates consistent generalization across most LLMs. To further improve performance, we introduce an ARC solver, Knowledge Augmentation for Abstract Reasoning (KAAR), which encodes core knowledge priors within an ontology that classifies priors into three hierarchical…
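
To make the program-synthesis framing concrete, the following is a minimal sketch of a generic repeated-sampling loop: candidate programs are drawn one at a time, each is checked against the task's training input/output pairs, and the first one that reproduces all of them is applied to the test input. It is not the paper's RSPC or KAAR implementation. Every name here (`Grid`, `sample_candidates`, `passes_training`, `solve`) is hypothetical, and the LLM sampling and planning steps are replaced by a fixed list of hand-written grid transforms, purely as an assumption for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Grid = List[List[int]]            # an ARC-style grid of colour indices
Program = Callable[[Grid], Grid]  # a candidate solution: grid in, grid out


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left-to-right."""
    return [list(reversed(row)) for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


def sample_candidates() -> Iterable[Program]:
    """Hypothetical stand-in for repeated LLM sampling.

    A real solver would prompt a model for a new program on each draw;
    here we simply yield a few fixed transforms.
    """
    yield from (flip_vertical, transpose, flip_horizontal)


def passes_training(program: Program, train_pairs: List[Tuple[Grid, Grid]]) -> bool:
    """Return True only if the program reproduces every training output."""
    for grid_in, grid_out in train_pairs:
        try:
            if program(grid_in) != grid_out:
                return False
        except Exception:
            # A generated program may crash; treat that as a failed sample.
            return False
    return True


def solve(
    train_pairs: List[Tuple[Grid, Grid]],
    test_input: Grid,
    candidates: Iterable[Program],
) -> Optional[Grid]:
    """Apply the first candidate that fits all training pairs, else None."""
    for program in candidates:
        if passes_training(program, train_pairs):
            return program(test_input)
    return None


if __name__ == "__main__":
    train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
    print(solve(train, [[5, 6, 7]], sample_candidates()))
```

In this toy task only the horizontal flip matches the training pair, so the first two samples are rejected before one is accepted. Passing the training pairs does not guarantee the program is correct on the test input, which is why generalization is reported separately from training fit.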
