InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States
Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang

TL;DR
InternalInspector is a new framework that improves confidence estimation in LLMs by analyzing all internal states across layers, leading to better detection of hallucinations and inaccuracies.
Contribution
It introduces a comprehensive internal state analysis using contrastive learning, surpassing existing methods in confidence calibration and hallucination detection.
Findings
Higher accuracy in confidence estimation across tasks
Lower calibration error compared to existing methods
Superior hallucination detection performance
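The "calibration error" finding above can be made concrete with a small sketch. This summary does not say which metric the paper reports, so the example assumes the widely used expected calibration error (ECE) with equal-width bins. The function name, the bin count, and the toy inputs are illustrative choices, not taken from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error (ECE) with equal-width confidence bins.

    confidences: predicted confidence per answer, each in [0, 1]
    correct:     1 (or True) if the answer was right, else 0 (or False)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # Group predictions by confidence bin; a confidence of 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, float(hit)))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy usage: four answers at 0.9 confidence, three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A lower value means the stated confidence tracks the observed accuracy more closely; a model that says 0.9 and is right 90% of the time in that bin contributes nothing to the sum.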
Abstract
Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention states, feed-forward states, and activation states of all layers. Unlike existing methods that primarily focus on the final activation state, InternalInspector conducts a comprehensive analysis across all internal states of every layer to accurately identify both correct and incorrect prediction processes. By benchmarking InternalInspector against existing confidence estimation methods across various natural language understanding and generation tasks, including factual question answering,…
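To illustrate the idea of contrastive learning over internal states, the sketch below computes a supervised contrastive loss over a batch of embeddings. It is an assumption-laden toy, not the paper's implementation: the abstract does not specify the loss, the encoder, or the temperature, so the supervised contrastive form, the function names, and the default `temperature=0.1` are all hypothetical. Each embedding stands in for an encoded summary of one generation's internal states, and the label marks whether that generation was correct (1) or incorrect (0).

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch (illustrative sketch).

    embeddings: list of non-zero vectors, one per generation (assumed to be
                encoder outputs over the model's internal states)
    labels:     1 for a correct generation, 0 for an incorrect one
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives share the anchor's label; the anchor itself is excluded.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage: two "correct" and two "incorrect" generations in 2-D.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(batch, [1, 1, 0, 0]))
```

Minimizing a loss of this shape pulls embeddings with the same correctness label together and pushes the two groups apart, which is what would let a downstream classifier read a confidence score off the embedding.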
Taxonomy
Topics: Statistical Methods and Inference · Reservoir Engineering and Simulation Methods
Methods: Softmax · Attention Is All You Need · Contrastive Learning · Focus
