Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Seungpil Lee; Woochang Sim; Donghyeon Shin; Wongyu Seo; Jiwon Park; Seokki Lee; Sanha Hwang; Sejin Kim; Sundong Kim

arXiv:2403.11793 · cs.CL · November 26, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper introduces a process-centric evaluation method for LLM reasoning abilities using the ARC benchmark, focusing on logical coherence, compositionality, and productivity, revealing gaps compared to human reasoning.

Contribution

It presents a novel approach based on the Language of Thought Hypothesis to assess reasoning processes, not just results, in LLMs.

Findings

01. LLMs show some inference ability but lag behind humans in reasoning.

02. The LoTH perspective offers new insights into AI reasoning development.

03. Evaluation reveals significant gaps in LLM reasoning capabilities.

Abstract

The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using the Abstraction and Reasoning Corpus (ARC) benchmark to evaluate the inference and contextual understanding abilities of LLMs in a process-centric manner, focusing on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity. Our carefully designed experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects. The main contribution of this paper lies in introducing the LoTH perspective, which provides a method for evaluating the reasoning process that conventional results-oriented approaches…

Code & Models

Repositories

GIST-DSLab/ARC_Prompt
Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling