MixRea: Benchmarking Explicit-Implicit Reasoning in Large Language Models
Yuanqing Cai, Ziyi Huang, Minhao Liu, Lixin Duan, Wen Li, Yanru Zhang

TL;DR
This paper introduces MixRea, a benchmark for explicit-implicit reasoning in large language models. It shows that LLMs exhibit inattentional blindness and proposes a prompting method, PRCP, to improve reasoning performance.
Contribution
The paper presents MixRea, a benchmark of 2,246 multiple-choice questions spanning 9 reasoning types, and introduces PRCP, a prompting technique that addresses LLMs' inattentional blindness.
Findings
Even the best-performing of the 21 evaluated LLMs, Gemini 2.5 Pro, achieves only 42.8% consistency on MixRea.
LLMs exhibit widespread inattentional blindness across reasoning types.
PRCP improves reasoning performance by recovering overlooked causal relations.
Abstract
Large language models (LLMs) are increasingly integrated into high-stakes decision-making. Inspired by the theory of inattentional blindness in human cognition, we investigate whether LLMs, trained on human-preferred corpora that embed attentional biases, exhibit a similar limitation: failing to attend to subtle yet important contextual cues under explicit task instructions. To evaluate this, we introduce the task of explicit-implicit reasoning and present MixRea, a benchmark of 2,246 multiple-choice questions across 9 reasoning types with varying distributions of explicit and implicit information. Evaluation of 21 advanced LLMs shows that even the best-performing reasoning model (Gemini 2.5 Pro) achieves only 42.8% consistency, revealing widespread inattentional blindness. To mitigate this, we propose Potential Relation Completion Prompting…
