Knowing What You Know Is Not Enough: Large Language Model Confidences Don't Align With Their Actions

Arka Pal; Teo Kitanovski; Arthur Liang; Akilesh Potti; Micah Goldblum

arXiv:2511.13240·cs.LG·February 10, 2026



TL;DR

This paper shows that confidence estimates elicited from large language models in static evaluations do not reliably predict how the models behave in interactive settings, exposing a gap in current evaluation methods.

Contribution

The paper identifies an action-belief gap in LLMs: confidence that looks well calibrated on static benchmarks does not ensure rational actions in interactive settings.

Findings

01. Models often bet against their high-confidence predictions in prediction markets.

02. Models fail to use information-seeking tools reliably when their confidence is low.

03. High confidence does not always lead to consistent or rational actions.

Abstract

Large language models (LLMs) are increasingly deployed in agentic and multi-turn workflows where they are tasked to perform actions of significant consequence. In order to deploy them reliably and manage risky outcomes in these settings, it is helpful to access model uncertainty estimates. However, confidence elicitation methods for LLMs are typically not evaluated directly in agentic settings; instead, they are evaluated on static datasets, such as Q&A benchmarks. In this work we investigate the relationship between confidence estimates elicited in static settings and the behavior of LLMs in interactive settings. We uncover a significant action-belief gap -- LLMs frequently take actions that contradict their elicited confidences. In a prediction market setting, we find that models often bet against their own high-confidence predictions; in a tool-use setting, models fail to reliably…
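
As a rough illustration of what "betting against one's own high-confidence prediction" could mean, the Python sketch below flags a bet as inconsistent when it has negative expected value under the model's stated confidence. This is not the paper's protocol: the `Trial` fields, the binary-contract setup, and the `is_consistent` and `gap_rate` helpers are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trial:
    # Hypothetical record of one interaction (not the paper's data format).
    confidence: float        # elicited probability that the model's answer is correct
    market_price: float      # price of a contract paying 1 if that answer is correct
    bet_on_own_answer: bool  # True if the model bought the contract, False if it bet against


def is_consistent(trial: Trial) -> bool:
    """Return True when the bet has non-negative expected value under the stated confidence."""
    # Buying one contract has expected profit (confidence - price);
    # betting against it has the opposite sign.
    edge = trial.confidence - trial.market_price
    if edge == 0:
        return True  # either side is break-even
    return trial.bet_on_own_answer == (edge > 0)


def gap_rate(trials: Sequence[Trial]) -> float:
    """Fraction of trials whose bet contradicts the stated confidence."""
    if not trials:
        return 0.0
    inconsistent = sum(1 for t in trials if not is_consistent(t))
    return inconsistent / len(trials)


if __name__ == "__main__":
    example = [
        Trial(confidence=0.9, market_price=0.5, bet_on_own_answer=False),  # contradicts belief
        Trial(confidence=0.9, market_price=0.5, bet_on_own_answer=True),   # matches belief
    ]
    print(gap_rate(example))
```

Under these assumptions, a model that reports 0.9 confidence but bets against its answer at a price of 0.5 counts as one instance of the gap. A real evaluation would also need to account for stake sizes and risk preferences, which this sketch ignores.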


Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling