Anchored Confabulation: Partial Evidence Non-Monotonically Amplifies Confident Hallucination in LLMs
Ashish Balkishan Lathkar

TL;DR
This paper identifies a calibration property of large language models in which confident hallucination responds non-monotonically to evidence: partial evidence increases it before full evidence reduces it. The paper also proposes methods to mitigate the effect.
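The "non-monotonic" shape can be checked on the causal injection numbers the abstract reports (PHC 0.613, 0.656, 0.595, 0.536). This is a minimal sketch that assumes the four values are ordered by increasing amount of supplied evidence; the summary does not state what each step corresponds to. The helpers `peak_index` and `is_rise_then_fall` are hypothetical names used only for illustration.

```python
def peak_index(curve):
    """Return the index of the largest value in a non-empty sequence."""
    return max(range(len(curve)), key=lambda i: curve[i])


def is_rise_then_fall(curve):
    """True if the curve strictly rises to one interior peak, then strictly falls."""
    p = peak_index(curve)
    if p == 0 or p == len(curve) - 1:
        return False  # a peak at either endpoint is monotone, not rise-then-fall
    rising = all(curve[i] < curve[i + 1] for i in range(p))
    falling = all(curve[i] > curve[i + 1] for i in range(p, len(curve) - 1))
    return rising and falling


# PHC values from the causal injection experiment, in the order the abstract lists them.
phc = [0.613, 0.656, 0.595, 0.536]
print(peak_index(phc))         # 1
print(is_rise_then_fall(phc))  # True
```

Under the stated ordering assumption, the peak falls at the second point, which matches the abstract's claim that a single anchor raises PHC before further evidence lowers it.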
Contribution
The paper formalizes anchored confabulation as a metric, Parametric Hallucination Confidence (PHC), demonstrates the effect across models, and proposes a learned routing method that reduces hallucinations without fine-tuning.
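The abstract describes PHC in terms of the model's confident-wrong-answer rate. The following is a minimal sketch of such a rate, not the paper's definition. It assumes each answer carries a correctness flag and a confidence score in [0, 1], uses a hypothetical threshold of 0.7, and divides by all answers rather than only the wrong ones. `Answer` and `confident_wrong_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    correct: bool      # did the final answer match the gold answer?
    confidence: float  # confidence score in [0, 1] (how it is elicited is an assumption)


def confident_wrong_rate(answers, threshold=0.7):
    """Fraction of all answers that are wrong AND at or above `threshold` confidence.

    Both the threshold and the choice of denominator are hypothetical;
    the paper's PHC definition may differ.
    """
    if not answers:
        raise ValueError("need at least one answer")
    hits = sum(1 for a in answers if not a.correct and a.confidence >= threshold)
    return hits / len(answers)


answers = [Answer(False, 0.9), Answer(False, 0.4), Answer(True, 0.95), Answer(False, 0.8)]
print(confident_wrong_rate(answers))  # 0.5
```

Only wrong answers at or above the threshold count, so a confident correct answer and a hesitant wrong answer both leave the rate unchanged.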
Findings
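PHC rises when partial evidence is provided and falls once full evidence is available; the amplification grows with hop depth.
A learned router exploiting PHC improves retrieval-augmented generation (RAG) routing, closing 81.1% of the oracle performance gap.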
Epistemic humility prompts reduce PHC spikes and improve confidence calibration.
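The abstract states the Anchoring Threshold Law as k*(n) = floor(n/3). This sketch tabulates the formula. It assumes n is the hop depth of the reasoning chain, since the abstract ties the law to hop depth, and it treats k*(n) only as the integer the formula returns. What k counts is not spelled out in this summary. `anchoring_threshold` is a hypothetical name.

```python
def anchoring_threshold(n_hops: int) -> int:
    """k*(n) = floor(n / 3), with n assumed to be the hop depth."""
    if n_hops < 1:
        raise ValueError("hop depth must be a positive integer")
    # For non-negative integers, Python's // operator is floor division.
    return n_hops // 3


for n in range(1, 10):
    print(n, anchoring_threshold(n))
```

Under this reading, chains of one or two hops get k* = 0, and the threshold increments by one every three hops.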
Abstract
We identify a previously unknown calibration property of large language models: providing one confirmed intermediate fact of a multi-step reasoning chain increases the model's confident-wrong-answer rate before full evidence eliminates it. We call this anchored confabulation: a partial anchor commits the model to confidently completing the remaining reasoning steps from parametric memory. We formalize it as Parametric Hallucination Confidence (PHC) and establish it across six lines of evidence, including a causal injection experiment (PHC 0.613 to 0.656 to 0.595 to 0.536, N=160) and capability scaling across five model families (Spearman rho=0.900, p=0.037). The Anchoring Threshold Law, k*(n) = floor(n/3), predicts PHC amplification as a function of hop depth, and four of its predictions are confirmed. Applied to RAG routing, a LearnedRouter exploiting PHC closes 81.1% of the oracle performance gap (macro F1=0.426, p<1e-6) on…
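The reported p=0.037 for Spearman rho=0.900 over five model families can be reproduced with the usual t-distribution approximation for Spearman's coefficient. The approximation computes t = rho * sqrt((n - 2) / (1 - rho^2)) with n - 2 degrees of freedom, and it is the default in common libraries such as SciPy's `spearmanr`. We assume this is how the value was computed; the summary does not say. The sketch uses only the standard library and integrates the Student t density with Simpson's rule.

The sketch also includes `gap_closed`, a hypothetical helper that encodes the usual reading of "closes X% of the oracle gap" as (router - baseline) / (oracle - baseline). The summary does not report the baseline or oracle scores, so no values are filled in.

```python
import math


def student_t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def student_t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|), integrating the density on [0, |t|] with Simpson's rule."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    b = abs(t)
    h = b / steps
    total = student_t_pdf(0.0, df) + student_t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 <= T <= |t|)
    return 2 * (0.5 - central_half)


def spearman_p_t_approx(rho: float, n: int) -> float:
    """Two-sided p-value for Spearman's rho via the t approximation (n - 2 df)."""
    if abs(rho) >= 1:
        return 0.0  # perfect rank agreement: the t statistic is unbounded
    df = n - 2
    t = rho * math.sqrt(df / (1 - rho * rho))
    return student_t_two_sided_p(t, df)


def gap_closed(baseline: float, router: float, oracle: float) -> float:
    """Hypothetical helper: fraction of the baseline-to-oracle gap a router recovers."""
    if oracle == baseline:
        raise ValueError("oracle and baseline scores must differ")
    return (router - baseline) / (oracle - baseline)


p = spearman_p_t_approx(0.900, 5)
print(round(p, 3))  # 0.037
```

With only five points, the approximation reduces to a t statistic of about 3.58 on 3 degrees of freedom, which rounds to the reported p-value.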
