Reasoning Large Language Model Errors Arise from Hallucinating Critical Problem Features

Alex Heyman; Joel Zylberberg

arXiv:2505.12151·cs.LG·October 13, 2025

TL;DR

This paper shows that reasoning large language models (RLLMs) often hallucinate problem features that were never specified, such as graph edges, and that these hallucinations lead to reasoning errors across multiple models and problem types.

Contribution

The paper identifies a common RLLM failure mode, inventing problem features that are absent from the prompt, and demonstrates its prevalence across multiple models and problem types.

Findings

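01 RLLMs commonly hallucinate graph edges that the prompt does not specify, and this occurs across all tested models.

02 These hallucinated edges appear to account for a significant fraction of each model's incorrect answers.

03 The phenomenon generalizes to other problem types, such as stable matching.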

Abstract

Large language models have recently made great strides in reasoning task performance through chain-of-thought (CoT) strategies trained via reinforcement learning; however, these "reasoning large language models" (RLLMs) remain imperfect reasoners, and understanding the frequencies and causes of their failure modes is important for both users and developers. We test o1-mini, o3-mini, DeepSeek-R1, Claude 3.7 Sonnet, Gemini 2.5 Pro Preview, and Grok 3 Mini Beta on graph coloring as a variable-complexity constraint-satisfaction logic problem, and find evidence from both error rate comparisons and CoT/explanation text analysis that RLLMs are prone to hallucinate graph edges not specified in the prompt. This phenomenon persists across multiple problem complexity levels and semantic frames, and it appears to account for a significant fraction of the incorrect answers from every tested model,…
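To make the failure mode in the abstract concrete, the sketch below checks a proposed coloring against the edges given in a prompt and flags any edges a model refers to that the prompt never specified. This is an illustration only, not the authors' evaluation code: the function names, the edge and coloring representations, and the assumption of a simple undirected graph are all hypothetical choices made for this example.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple, FrozenSet

Vertex = Hashable
EdgePair = Tuple[Vertex, Vertex]


def normalize(edges: Iterable[EdgePair]) -> Set[FrozenSet[Vertex]]:
    """Treat edges as undirected: (u, v) and (v, u) become the same edge."""
    return {frozenset(edge) for edge in edges}


def coloring_violations(
    edges: Iterable[EdgePair], coloring: Dict[Vertex, str]
) -> List[EdgePair]:
    """Return every edge whose two endpoints were assigned the same color."""
    return [(u, v) for u, v in edges if coloring[u] == coloring[v]]


def hallucinated_edges(
    prompt_edges: Iterable[EdgePair], mentioned_edges: Iterable[EdgePair]
) -> List[EdgePair]:
    """Return edges the model mentioned that the prompt never specified.

    `mentioned_edges` is assumed to have been extracted from the model's
    chain-of-thought or explanation text by some separate (hypothetical) step.
    """
    extra = normalize(mentioned_edges) - normalize(prompt_edges)
    return sorted(tuple(sorted(edge)) for edge in extra)


# Hypothetical example: a 4-vertex path graph.
prompt_edges = [(0, 1), (1, 2), (2, 3)]
coloring = {0: "red", 1: "blue", 2: "red", 3: "blue"}
mentioned = [(1, 0), (0, 2)]  # (0, 2) is not in the prompt

violations = coloring_violations(prompt_edges, coloring)
invented = hallucinated_edges(prompt_edges, mentioned)
```

In this example the coloring is valid for the real graph, but a model that believed in the edge (0, 2) would wrongly reject it, since vertices 0 and 2 share a color. That is one way a hallucinated edge can turn into an incorrect answer.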

Code & Models

Repositories

alexheyman/rllmgraphcoloring (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)