What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models

Tian Yun; Chen Sun; Ellie Pavlick

arXiv:2507.22457 · cs.CL · July 31, 2025

TL;DR

This paper revisits the claim that large language models are not true abstract reasoners. LLMs do perform poorly zero-shot, but tuning only a small subset of their input-encoding parameters lets them reach near-perfect performance. This tuning does not necessarily transfer across datasets, which prompts a reexamination of what counts as abstract reasoning.

Contribution

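It shows that tuning a small subset of parameters (those used for input encoding) can substantially improve LLMs' performance on reasoning tasks where they fail zero-shot. These gains transfer across datasets only to a limited extent, which adds nuance to earlier claims that LLMs are not abstract reasoners.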

Findings

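01. Tuning a small subset of input-encoding parameters enables near-perfect performance on tasks where LLMs perform poorly zero-shot.
02. The gains from this tuning do not necessarily transfer across datasets.
03. Together, these results invite a reconsideration of what defines an "abstract reasoner".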

Abstract

Recent work has argued that large language models (LLMs) are not "abstract reasoners", citing their poor zero-shot performance on a variety of challenging tasks as evidence. We revisit these experiments in order to add nuance to the claim. First, we show that while LLMs indeed perform poorly in a zero-shot setting, even tuning a small subset of parameters for input encoding can enable near-perfect performance. However, we also show that this finetuning does not necessarily transfer across datasets. We take this collection of empirical results as an invitation to (re-)open the discussion of what it means to be an "abstract reasoner", and why it matters whether LLMs fit the bill.

Videos

What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models (Underline)