When to Trust Tools? Adaptive Tool Trust Calibration For Tool-Integrated Math Reasoning
Ruotao Xu, Yixin Ji, Yu Luo, Jinpeng Li, Dong Li, Peifeng Li, Juntao Li, Min Zhang

TL;DR
This paper introduces Adaptive Tool Trust Calibration (ATTC), a framework that helps reasoning models decide when to trust or ignore tool results, improving accuracy in tool-integrated math reasoning tasks.
Contribution
The paper proposes ATTC, a novel method for calibrating tool trust in reasoning models, addressing the 'Tool Ignored' problem and enhancing model performance.
Findings
ATTC reduces 'Tool Ignored' cases in open-source TIR models.
Performance improves by 4.1% to 7.5% across multiple datasets.
ATTC effectively guides models to trust or ignore tools based on confidence scores.
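To make the last finding concrete, here is a minimal sketch of a confidence-gated trust decision. It is not the authors' implementation: this summary does not say how ATTC computes confidence. The sketch assumes the score is the geometric-mean token probability of the model's generated tool call. The function names and the 0.9 threshold are hypothetical.

```python
import math
from typing import Sequence


def mean_token_confidence(logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated span, in (0, 1].

    ASSUMPTION: confidence is derived from token log-probabilities;
    the summary does not specify how ATTC computes its score.
    """
    if not logprobs:
        raise ValueError("logprobs must be non-empty")
    return math.exp(sum(logprobs) / len(logprobs))


def decide_tool_trust(code_logprobs: Sequence[float], threshold: float = 0.9) -> str:
    """Return "trust" or "ignore" for a tool result.

    ASSUMPTION: a high-confidence tool call means the tool result should
    be trusted; the threshold value is a hypothetical placeholder.
    """
    confidence = mean_token_confidence(code_logprobs)
    return "trust" if confidence >= threshold else "ignore"


if __name__ == "__main__":
    # Hypothetical log-probabilities for two generated tool calls.
    print(decide_tool_trust([-0.01, -0.02, -0.03]))
    print(decide_tool_trust([-1.0, -0.5]))
```

In a real pipeline the returned label would determine whether the model continues from the tool output or falls back on its own reasoning. That wiring is omitted because the summary does not describe it.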
Abstract
Large reasoning models (LRMs) have achieved strong performance gains by scaling test-time computation, but owing to the inherent limitations of the underlying language models, they still fall short on tasks that require precise computation and extensive knowledge. Tool-Integrated Reasoning (TIR) has emerged as a promising paradigm that incorporates tool calls and their execution into the reasoning trajectory. Although recent work has released some powerful open-source TIR models, our analysis reveals that these models still suffer from critical deficiencies. We find that when the model's reasoning conflicts with the tool results, the model tends to believe its own reasoning. In some cases the tool results are correct but the model ignores them, producing incorrect answers; we define this failure as "Tool Ignored". This indicates that the model…
