TL;DR
This paper introduces Neighbor-Consistency Belief (NCB), a structural measure to evaluate and improve the robustness of LLMs' factual beliefs under contextual perturbations, addressing limitations of point-wise confidence metrics.
Contribution
It makes three contributions: NCB, a new metric of belief robustness; a stress-testing protocol that probes LLM output stability under contextual interference; and Structure-Aware Training (SAT), a method for improving belief stability.
Findings
Performance on high-NCB data is relatively more resistant to contextual interference.
NCB correlates with response stability across models.
SAT reduces knowledge brittleness by approximately 30%.
Abstract
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that the performance of high-NCB data is relatively more resistant to interference. Finally, we…
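The contrast the abstract draws between point-wise confidence and a neighborhood-level measure can be illustrated with a rough sketch. This is not the paper's definition of NCB, which the abstract does not spell out. It assumes, purely for illustration, that a "conceptual neighborhood" is a set of related questions with known answers, and that coherence is the fraction of those the model answers correctly. The names `self_consistency`, `neighbor_consistency`, and `toy_ask`, and the example questions, are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List


def self_consistency(answers: List[str]) -> float:
    """Point-wise confidence: share of sampled answers matching the majority answer."""
    if not answers:
        return 0.0
    _, count = Counter(answers).most_common(1)[0]
    return count / len(answers)


def neighbor_consistency(ask: Callable[[str], str], neighbors: Dict[str, str]) -> float:
    """Illustrative proxy (an assumption, not the paper's formula):
    share of neighboring questions that the model answers correctly."""
    if not neighbors:
        return 0.0
    correct = sum(
        ask(question).strip().lower() == gold.strip().lower()
        for question, gold in neighbors.items()
    )
    return correct / len(neighbors)


# Hypothetical stand-in for an LLM call: a fixed lookup table.
TOY_ANSWERS = {
    "What is the capital of France?": "Paris",
    "Which country is Paris the capital of?": "France",
    "Which river runs through the capital of France?": "Thames",  # wrong on purpose
    "On which continent is the capital of France?": "Asia",       # wrong on purpose
}


def toy_ask(question: str) -> str:
    return TOY_ANSWERS.get(question, "unknown")


# Five repeated samples of the target question all agree.
target_samples = [toy_ask("What is the capital of France?") for _ in range(5)]

# Gold answers for the neighboring questions.
neighborhood = {
    "Which country is Paris the capital of?": "France",
    "Which river runs through the capital of France?": "Seine",
    "On which continent is the capital of France?": "Europe",
    "What is the capital of France?": "Paris",
}

sc_score = self_consistency(target_samples)
nc_score = neighbor_consistency(toy_ask, neighborhood)
print(f"self-consistency: {sc_score:.2f}, neighbor consistency: {nc_score:.2f}")
```

In this toy setup the target fact is answered with perfect self-consistency while only half of the neighboring questions are answered correctly, which is the kind of gap a point-wise metric alone would not surface.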
