A Context-Aware Dual-Metric Framework for Confidence Estimation in Large Language Models
Mingruo Yuan, Shuyi Zhang, Ben Kao

TL;DR
This paper introduces CRUX, a novel framework for confidence estimation in large language models. It accounts for the relevance of responses to the provided context through two new metrics, which improves trustworthiness, especially in safety-critical applications.
Contribution
CRUX is the first framework to integrate context faithfulness and consistency for confidence estimation in LLMs. It does so through contrastive sampling with and without context, combined with a global answer-consistency metric.
Findings
CRUX achieves the highest AUROC on multiple benchmark datasets.
It effectively captures data uncertainty and model confidence.
It outperforms existing confidence estimation methods.
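The summary does not describe the evaluation protocol. In the usual confidence-estimation setup, AUROC treats each confidence score as a predictor of whether the corresponding answer is correct. The sketch below assumes that setup. It computes AUROC as the probability that a randomly chosen correct answer receives higher confidence than a randomly chosen incorrect one, with ties counted as one half. The function name `auroc` is hypothetical, and the O(n²) pairwise loop is written for clarity rather than speed.

```python
def auroc(confidences: list[float], correct: list[bool]) -> float:
    """AUROC of confidence scores for separating correct from incorrect answers.

    Equals the probability that a random correct answer is scored above a
    random incorrect one (Mann-Whitney formulation); ties contribute 0.5.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and labels must have the same length")
    pos = [c for c, y in zip(confidences, correct) if y]      # scores of correct answers
    neg = [c for c, y in zip(confidences, correct) if not y]  # scores of incorrect answers
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four (correct, incorrect) pairs are ordered correctly.
print(auroc([0.9, 0.4, 0.6, 0.2], [True, True, False, False]))  # 0.75
```

A perfect estimator scores 1.0, and random scores average about 0.5. A higher AUROC therefore means the confidence values better separate trustworthy outputs from untrustworthy ones.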
Abstract
Accurate confidence estimation is essential for trustworthy large language model (LLM) systems, as it lets users decide when to trust outputs and enables reliable deployment in safety-critical applications. Current confidence estimation methods for LLMs neglect the relevance of responses to contextual information, a crucial factor in evaluating output quality, particularly when background knowledge is provided. To bridge this gap, we propose CRUX (Context-aware entropy Reduction and Unified consistency eXamination), the first framework that integrates context faithfulness and consistency for confidence estimation via two novel metrics. First, contextual entropy reduction represents data uncertainty as the information gain obtained through contrastive sampling with and without context. Second, unified consistency examination captures potential model…
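The contextual entropy reduction metric can be illustrated with a minimal sketch. It rests on several assumptions. The model is sampled several times on the same question, once without and once with the supporting context. Answers are grouped by exact match after lowercasing and trimming. Uncertainty is measured with Shannon entropy over that empirical distribution. The information gain is then the entropy without context minus the entropy with context. The function names `answer_entropy` and `contextual_entropy_reduction` are hypothetical, and CRUX's actual answer grouping and estimator may differ.

```python
import math
from collections import Counter


def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy (nats) of the empirical distribution over sampled answers.

    Assumption: answers are grouped by exact match after lowercasing and
    trimming; a real system might instead cluster semantically equivalent answers.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(s.strip().lower() for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def contextual_entropy_reduction(without_ctx: list[str], with_ctx: list[str]) -> float:
    """Hypothetical information-gain score: H(answers | no context) - H(answers | context).

    A larger positive value means the context made the sampled answers more concentrated.
    """
    return answer_entropy(without_ctx) - answer_entropy(with_ctx)


# Without context the answers are spread out; with context they agree.
no_ctx = ["Paris", "Lyon", "Paris", "Marseille"]
ctx = ["Paris", "Paris", "Paris", "Paris"]
print(round(contextual_entropy_reduction(no_ctx, ctx), 4))  # 1.0397
```

In this toy case the context removes all of the spread among answers, so the reduction equals the full no-context entropy. If the context did not change the answer distribution, the score would be zero. A negative score would mean the context made the model less certain.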
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
