The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents

Weihao Xuan; Qingcheng Zeng; Heli Qi; Yunze Xiao; Junjue Wang; Naoto Yokoya

arXiv:2601.07264 · cs.CL · January 13, 2026

Open Access

TL;DR

This paper investigates calibration issues in tool-using language agents, revealing a confidence dichotomy based on tool type and proposing RL fine-tuning to improve trustworthiness and generalization.

Contribution

The paper uncovers a fundamental confidence dichotomy in tool-use agents, driven by tool type, and introduces a reinforcement learning framework to improve calibration across tool types.

Findings

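01 Evidence tools (e.g., web search) induce severe overconfidence because the information they retrieve is inherently noisy.

02 Verification tools (e.g., code interpreters) ground reasoning through deterministic feedback, which reduces miscalibration.

03 Agents trained with the proposed RL fine-tuning method generalize well to noisy and diverse domains.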

Abstract

Autonomous agents based on large language models (LLMs) are rapidly evolving to handle multi-turn tasks, but ensuring their trustworthiness remains a critical challenge. A fundamental pillar of this trustworthiness is calibration, which refers to an agent's ability to express confidence that reliably reflects its actual performance. While calibration is well-established for static models, its dynamics in tool-integrated agentic workflows remain underexplored. In this work, we systematically investigate verbalized calibration in tool-use agents, revealing a fundamental confidence dichotomy driven by tool type. Specifically, our pilot study identifies that evidence tools (e.g., web search) systematically induce severe overconfidence due to inherent noise in retrieved information, while verification tools (e.g., code interpreters) can ground reasoning through deterministic feedback and…
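
To make the notion of calibration concrete, the sketch below computes Expected Calibration Error (ECE), a standard metric that compares stated confidence with observed accuracy, together with a simple signed overconfidence gap. This is an illustration only: the excerpt above does not say which metrics the paper uses, and the function names, the equal-width binning, and the sample values are all assumptions made here.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Equal-width-bin ECE: weighted mean of |avg confidence - accuracy| per bin.

    `confidences` are verbalized confidences in [0, 1]; `correct` marks whether
    each answer was right. Both names are hypothetical, chosen for this sketch.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins
    hit_sum = [0.0] * n_bins
    count = [0] * n_bins
    for c, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        ece += (count[b] / n) * abs(avg_conf - accuracy)
    return ece


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean confidence minus accuracy; positive values indicate overconfidence."""
    n = len(confidences)
    return sum(confidences) / n - sum(1 for ok in correct if ok) / n


# Made-up example: an agent that always says 90% but is right 1 time in 4.
stated = [0.9, 0.9, 0.9, 0.9]
was_right = [True, False, False, False]
ece_value = expected_calibration_error(stated, was_right)
gap_value = overconfidence_gap(stated, was_right)
```

In this made-up case both quantities come out at roughly 0.65, the kind of large positive gap the abstract describes as overconfidence. A well-calibrated agent would score near zero on both.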

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education