Interpreting and Mitigating Unwanted Uncertainty in LLMs

Tiasa Singha Roy; Ayush Rajesh Jhaveri; Ilias Triantafyllopoulos

arXiv:2510.22866 · cs.CL · October 28, 2025

TL;DR

This paper investigates the causes of unwanted answer-flipping in Large Language Models and proposes a masking technique targeting specific attention heads to reduce this uncertainty-driven failure mode.

Contribution

The paper identifies a small set of non-retrieval attention heads that disproportionately attend to misleading tokens in uncertain contexts, and shows that masking these heads reduces answer-flipping by up to 15%.
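As a rough illustration of what masking attention heads means, the sketch below zeroes the output of selected heads before the per-head outputs are concatenated. It is a toy example and not the authors' implementation: the function name `mask_heads`, the list-of-vectors representation, and the choice to zero a head's output (rather than, say, its attention weights) are all assumptions made here for illustration.

```python
from typing import List, Set


def mask_heads(head_outputs: List[List[float]], masked: Set[int]) -> List[float]:
    """Concatenate per-head output vectors, zeroing every head whose index is in `masked`.

    `head_outputs[i]` is the (hypothetical) output vector of attention head i
    for a single token position.
    """
    combined: List[float] = []
    for idx, vec in enumerate(head_outputs):
        if idx in masked:
            # A masked head contributes nothing to the concatenated output.
            combined.extend([0.0] * len(vec))
        else:
            combined.extend(vec)
    return combined


# Toy data: three heads with two output dimensions each (values are made up).
heads = [[0.2, 0.1], [0.9, -0.4], [0.3, 0.3]]

# Suppose head 1 was found to attend to misleading tokens; mask it.
masked_output = mask_heads(heads, masked={1})
unmasked_output = mask_heads(heads, masked=set())
```

In a real transformer the same idea is usually applied inside the attention layer, per layer and per head, so the set of masked heads would be a collection of (layer, head) pairs rather than a flat index set.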

Findings

01

Masking a small set of non-retrieval attention heads reduces flip behavior by up to 15%.

02

Retrieval heads are not the main cause of uncertainty.

03

Trade-offs exist between flip mitigation and downstream performance.

Abstract

Despite their impressive capabilities, Large Language Models (LLMs) exhibit unwanted uncertainty, a phenomenon where a model changes a previously correct answer into an incorrect one when re-prompted. This behavior undermines trust and poses serious risks in high-stakes domains. In this work, we investigate the mechanisms that drive this phenomenon. We adapt the Needle-in-a-Haystack retrieval framework and integrate a Flip-style re-evaluation prompt to simulate realistic answer-flipping scenarios. We find that retrieval heads are not primarily responsible for avoiding uncertainty. Instead, we identify a small set of non-retrieval attention heads that disproportionately attend to misleading tokens in uncertain contexts. Masking these heads yields significant improvements, reducing flip behavior by up to 15% without introducing incoherence or overcorrection. However, when tested for…
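The abstract defines a flip as a previously correct answer becoming incorrect after re-prompting. A minimal sketch of how a flip rate could be computed from that definition follows. The function name `flip_rate`, the boolean-list inputs, and the toy data are assumptions for illustration; the paper's exact metric may differ, and the listing does not say whether the reported 15% is an absolute or a relative reduction.

```python
from typing import List


def flip_rate(first_correct: List[bool], second_correct: List[bool]) -> float:
    """Fraction of initially correct answers that become incorrect after re-prompting.

    first_correct[i]  -- whether the model's first answer to question i was correct
    second_correct[i] -- whether its answer after the re-evaluation prompt was correct
    """
    if len(first_correct) != len(second_correct):
        raise ValueError("both lists must cover the same questions")

    # Only questions answered correctly the first time can flip in this sense.
    followups = [second for first, second in zip(first_correct, second_correct) if first]
    if not followups:
        return 0.0

    flips = sum(1 for second in followups if not second)
    return flips / len(followups)


# Toy data (made up): five questions, four answered correctly at first.
first = [True, True, True, False, True]
after_reprompt_baseline = [True, False, False, True, True]
after_reprompt_masked = [True, True, False, True, True]

baseline_rate = flip_rate(first, after_reprompt_baseline)
masked_rate = flip_rate(first, after_reprompt_masked)
```

Comparing `baseline_rate` with `masked_rate` on the same question set gives one simple way to quantify how much a head-masking intervention changes flip behavior.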
