InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

Mohammad Beigi; Ying Shen; Runing Yang; Zihao Lin; Qifan Wang; Ankith Mohan; Jianfeng He; Ming Jin; Chang-Tien Lu; Lifu Huang

arXiv:2406.12053·cs.CL·June 19, 2024

TL;DR

InternalInspector is a new framework that improves confidence estimation in LLMs by analyzing all internal states across layers, leading to better detection of hallucinations and inaccuracies.

Contribution

The paper introduces a contrastive-learning analysis of internal states across all layers, which it reports outperforms existing methods in confidence calibration and hallucination detection.

Findings

01. Higher accuracy in confidence estimation across tasks

02. Lower calibration error compared to existing methods

03. Superior hallucination detection performance
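For readers unfamiliar with calibration error, the sketch below shows one common way to measure it: the binned expected calibration error (ECE). This page does not say which calibration metric the paper reports, so treating it as ECE is an assumption, and the function name `expected_calibration_error` and its parameters are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the weighted mean of |accuracy - mean confidence| per bin.

    confidences: floats in [0, 1], one per prediction.
    correct: truthy/falsy flags, one per prediction.
    Assumption: equal-width bins; the paper's exact metric is not given here.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, bool(is_correct)))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated estimator scores 0; an estimator that reports 0.9 confidence while being right 75% of the time scores about 0.15 on those predictions.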

Abstract

Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention states, feed-forward states, and activation states of all layers. Unlike existing methods that primarily focus on the final activation state, InternalInspector conducts a comprehensive analysis across all internal states of every layer to accurately identify both correct and incorrect prediction processes. By benchmarking InternalInspector against existing confidence estimation methods across various natural language understanding and generation tasks, including factual question answering,…
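The abstract's core idea, contrastive learning over internal states from every layer, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the loss, the encoder, or how the states are combined, so the code uses a generic supervised contrastive loss in which examples that share a label (correct or incorrect prediction) are pulled together. The helper names `flatten_internal_states` and `supervised_contrastive_loss` are hypothetical.

```python
import math


def flatten_internal_states(attention, feed_forward, activation):
    """Concatenate per-layer attention, feed-forward, and activation vectors.

    Each argument is a list with one vector (list of floats) per layer.
    Assumption: plain concatenation stands in for whatever encoder the paper uses.
    """
    features = []
    for attn, ffn, act in zip(attention, feed_forward, activation):
        features.extend(attn)
        features.extend(ffn)
        features.extend(act)
    return features


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of embeddings.

    labels mark whether each example came from a correct (1) or incorrect (0)
    prediction. Anchors with no same-label partner are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarity between the anchor and every other example.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

The loss is small when embeddings of correct predictions cluster apart from embeddings of incorrect ones, and large when the two groups are mixed, which is the property a downstream confidence estimator would rely on.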

Taxonomy

Topics: Statistical Methods and Inference · Reservoir Engineering and Simulation Methods

Methods: Softmax · Attention Is All You Need · Contrastive Learning · Focus