From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models

Farima Fatahi Bayat; Pouya Pezeshkpour; Estevam Hruschka

arXiv:2511.10899 · cs.CL · April 22, 2026

TL;DR

This paper investigates how tool use in large language models can lead to reasoning failures, introducing the concept of Tool-Induced Myopia (TIM) and proposing a framework to improve reasoning with tools.

Contribution

The paper characterizes TIM as a new failure mode in tool-augmented language models and develops a framework to mitigate the reasoning degradation caused by tool use.

Findings

01

Tool use increases answer accuracy but worsens reasoning coherence.

02

As tool use increases, model errors shift from arithmetic mistakes to reasoning failures.

03

The proposed framework improves both accuracy and reasoning depth.

Abstract

Tool-augmented Language Models (TaLMs) can invoke external tools to solve problems beyond their parametric capacity. However, it remains unclear whether these tool-enabled gains reflect trustworthy reasoning. Focusing on the Code Interpreter tool, we show that even when tools are selected and executed correctly, TaLMs treat tool outputs as substitutes for reasoning, producing solutions that appear correct but lack coherent justification. We term this failure mode Tool-Induced Myopia (TIM), and study it using PYMATH, a benchmark of 1,679 competition-level mathematical problems for which Python code is helpful but not sufficient. We further develop a multi-dimensional evaluation suite to quantify reasoning degradation in TaLMs relative to their non-tool counterparts. Our findings reveal that while TaLMs achieve up to a 19.3 percentage point gain in final-answer accuracy, their reasoning…
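To make "helpful but not sufficient" concrete, here is a minimal hypothetical sketch. The problem is not drawn from PYMATH, and the function name `counterexamples` is illustrative only. A Code Interpreter call can gather strong evidence for a claim about all integers, but a solution that stops at the tool output, rather than supplying the argument, is the kind of answer that "appears correct but lacks coherent justification."

```python
# Hypothetical illustration (not from PYMATH):
# "Show that n**3 - n is divisible by 6 for every positive integer n."


def counterexamples(limit: int) -> list[int]:
    """Return every n in [1, limit] for which n**3 - n is NOT divisible by 6."""
    return [n for n in range(1, limit + 1) if (n**3 - n) % 6 != 0]


# Finding no counterexample up to 10,000 is evidence, not a proof:
# the claim covers infinitely many n.
print(counterexamples(10_000))  # prints []

# A coherent solution still needs an argument, e.g. the factorization
# n**3 - n = (n - 1) * n * (n + 1), a product of three consecutive integers,
# which always contains a multiple of 2 and a multiple of 3.
# The identity can be spot-checked by code, but again only on finitely many n.
assert all(n**3 - n == (n - 1) * n * (n + 1) for n in range(-100, 101))
```

Here the code is genuinely helpful for building confidence and catching a false conjecture early, but the justification has to come from the factorization argument, not from the empty list.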

Code & Models

Repositories

megagonlabs/TIM (GitHub)