Your Reasoning Benchmark May Not Test Reasoning: Revealing Perception Bottleneck in Abstract Reasoning Benchmarks

Xinhe Wang; Jin Huang; Xingjian Zhang; Tianhao Wang; Jiaqi W. Ma

arXiv:2512.21329 · cs.CL · January 12, 2026


TL;DR

This paper demonstrates that performance gaps in abstract reasoning benchmarks are primarily due to perception limitations, not reasoning deficiencies, highlighting the need to separate perception and reasoning in evaluations.

Contribution

It introduces a two-stage pipeline that isolates perception from reasoning, revealing perception as the main bottleneck in current benchmarks.

Findings

01. Perception is the dominant factor in performance gaps.

02. Approximately 80% of model failures are due to perception errors.

03. Benchmarks conflate perception and reasoning challenges.

Abstract

Reasoning benchmarks such as the Abstraction and Reasoning Corpus (ARC) and ARC-AGI are widely used to assess progress in artificial intelligence and are often interpreted as probes of core, so-called "fluid" reasoning abilities. Despite their apparent simplicity for humans, these tasks remain challenging for frontier vision-language models (VLMs), a gap commonly attributed to deficiencies in machine reasoning. We challenge this interpretation and hypothesize that the gap arises primarily from limitations in visual perception rather than from shortcomings in inductive reasoning. To verify this hypothesis, we introduce a two-stage experimental pipeline that explicitly separates perception and reasoning. In the perception stage, each image is independently converted into a natural-language description, while in the reasoning stage a model induces and applies rules using these…
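The control flow of the two-stage pipeline described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: ARC "images" are stood in for by small integer grids, the perception stage (`describe_grid`) is a hypothetical serializer rather than a VLM, and the reasoning stage (`induce_and_apply`) is a hypothetical search over four hand-picked grid transformations rather than a language model. All names and the text format are invented for illustration. What the sketch does mirror is the separation itself: each grid is described independently, and the reasoner only ever sees the resulting text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]


@dataclass
class Task:
    train: List[Tuple[Grid, Grid]]  # (input, output) demonstration pairs
    test_input: Grid


def describe_grid(grid: Grid) -> str:
    """Hypothetical perception stage: turn one 'image' (a toy grid) into text.
    It is called on each grid independently, with no access to other examples."""
    rows = [" ".join(str(c) for c in row) for row in grid]
    return f"{len(grid)}x{len(grid[0])} grid; rows: " + " | ".join(rows)


def parse_description(text: str) -> Grid:
    """Recover a grid from the toy description format produced above."""
    body = text.split("rows: ", 1)[1]
    return [[int(c) for c in row.split()] for row in body.split(" | ")]


# Hypothetical hypothesis space for the toy reasoner (an assumption, not the paper's).
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def induce_and_apply(
    train_texts: List[Tuple[str, str]], test_text: str
) -> Tuple[Optional[str], Optional[Grid]]:
    """Hypothetical reasoning stage: it sees only descriptions, never the grids.
    Returns the first candidate rule consistent with every demonstration pair,
    applied to the test input, or (None, None) if no candidate fits."""
    pairs = [(parse_description(i), parse_description(o)) for i, o in train_texts]
    test_grid = parse_description(test_text)
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(i) == o for i, o in pairs):
            return name, rule(test_grid)
    return None, None


def two_stage_pipeline(
    task: Task,
    perceive: Callable[[Grid], str] = describe_grid,
    reason=induce_and_apply,
):
    # Stage 1: perception, applied to every grid independently.
    train_texts = [(perceive(i), perceive(o)) for i, o in task.train]
    test_text = perceive(task.test_input)
    # Stage 2: rule induction and application over the text alone.
    return reason(train_texts, test_text)
```

For example, with one demonstration pair `([[1, 2], [3, 4]], [[2, 1], [4, 3]])` and test input `[[5, 6, 7]]`, the pipeline returns `("flip_horizontal", [[7, 6, 5]])`. Because `perceive` is a parameter, a degraded perceiver can be swapped in while `reason` stays fixed; in this toy setting, a perceiver that reports every cell as 0 leads the same reasoner to pick `"identity"` and answer `[[0, 0, 0]]`, which shows how a perception error could look like a reasoning failure when only end-to-end accuracy is measured.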


Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI