Know When You're Wrong: Aligning Confidence with Correctness for LLM Error Detection

Xie Xiaohu; Liu Xiaohu; Yao Benjamin

arXiv:2603.06604 · cs.LG · March 10, 2026

Open Access

TL;DR

This paper introduces a normalized confidence score for LLMs to reliably detect errors and hallucinations, improving trustworthiness and enabling efficient retrieval-augmented generation.

Contribution

It proposes a confidence scoring method, analyzes calibration effects of training techniques, and demonstrates practical error detection and correction in LLMs.
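To make "calibration" concrete, here is a minimal sketch of expected calibration error (ECE), a common way to measure how far stated confidence drifts from observed accuracy. This listing does not say which calibration metric the paper reports, so ECE is an assumption here, and the function name and the default of 10 equal-width bins are illustrative choices.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin.

    Assumption: equal-width bins over [0, 1]; this is one standard ECE
    variant, not necessarily the metric used in the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Per bin: [sample count, sum of confidences, number correct]
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(ok)

    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count:
            gap = abs(n_correct / count - conf_sum / count)
            ece += (count / n) * gap
    return ece
```

A well-calibrated model scores near 0. An overconfident one, for example a model that reports 0.9 confidence while being right a quarter of the time, scores high.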

Findings

01. Supervised fine-tuning improves confidence calibration.

02. RL methods like PPO and DPO cause overconfidence.

03. Adaptive retrieval with confidence scores enhances accuracy with fewer retrievals.
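The adaptive-retrieval finding can be pictured as a confidence gate: answer directly first, and retrieve only when the confidence score falls below a threshold. The sketch below is one plausible reading, not the paper's pipeline. The `generate` and `retrieve` callables, the function name, and the 0.8 threshold are all hypothetical placeholders.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   generate(question, context) -> (answer, confidence in [0, 1])
#   retrieve(question) -> context string
Generate = Callable[[str, Optional[str]], Tuple[str, float]]
Retrieve = Callable[[str], str]


def answer_with_adaptive_retrieval(
    question: str,
    generate: Generate,
    retrieve: Retrieve,
    threshold: float = 0.8,  # illustrative value, would need tuning
) -> Tuple[str, float, bool]:
    """Return (answer, confidence, retrieved_flag).

    Retrieval runs only when the first, context-free answer is
    below the confidence threshold.
    """
    answer, confidence = generate(question, None)
    if confidence >= threshold:
        return answer, confidence, False

    # Low confidence: fetch context and answer again.
    context = retrieve(question)
    answer, confidence = generate(question, context)
    return answer, confidence, True
```

Raising the threshold trades more retrieval calls for fewer unchecked low-confidence answers, and lowering it does the reverse.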

Abstract

As large language models (LLMs) are increasingly deployed in critical decision-making systems, the lack of reliable methods to measure their uncertainty presents a fundamental trustworthiness risk. We introduce a normalized confidence score based on output anchor token probabilities: classification labels for structured tasks and self-evaluation responses (Yes/No) for open-ended generation. This enables direct detection of errors and hallucinations with minimal overhead and without external validation. We make three key contributions. First, we propose a normalized confidence score and self-evaluation framework that exposes reliable confidence estimates for error detection across seven diverse benchmark tasks and five LLMs of varying architectures and sizes. Second, our theoretical analysis reveals that supervised fine-tuning (SFT) yields well-calibrated confidence through…
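As a rough illustration of a confidence score built from anchor token probabilities, the sketch below renormalizes the model's probability mass over a fixed anchor set (class labels, or Yes/No for self-evaluation). The abstract shown here is truncated and does not give the exact formula, so the renormalization, the function name, and the example log-probabilities are assumptions.

```python
import math
from typing import Dict, Mapping


def normalized_anchor_probs(anchor_logprobs: Mapping[str, float]) -> Dict[str, float]:
    """Renormalize log-probabilities of the anchor tokens so they sum to 1.

    Assumption: the input holds the model's log-probability for each anchor
    token at the answer position; mass on non-anchor tokens is discarded.
    """
    if not anchor_logprobs:
        raise ValueError("at least one anchor token is required")
    # Subtract the max before exponentiating for numerical stability.
    peak = max(anchor_logprobs.values())
    weights = {tok: math.exp(lp - peak) for tok, lp in anchor_logprobs.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Self-evaluation case with made-up numbers: the model is asked whether its
# own answer is correct and we read the probabilities of "Yes" and "No".
probs = normalized_anchor_probs({"Yes": math.log(0.6), "No": math.log(0.2)})
confidence_correct = probs["Yes"]  # share of anchor mass on "Yes"
```

For a classification task the same function would take one entry per label, with the predicted label being the highest-scoring anchor and its normalized share serving as the confidence.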

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques