Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs
Duygu Nur Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Chenyang Tao, Anil Ramakrishna, Dimitrios Dimitriadis, Jieyu Zhao, Salman Avestimehr

TL;DR
This paper introduces Learnable Response Scoring (LARS), a trainable scoring function that improves uncertainty estimation in large language models by capturing complex token dependencies, leading to more reliable confidence assessments.
Contribution
The paper proposes LARS, a novel supervised scoring function that outperforms existing methods in uncertainty estimation for LLMs across multiple tasks.
Findings
LARS achieves up to 16% AUROC improvement over existing methods.
LARS effectively captures complex token dependencies for better uncertainty calibration.
Experimental results demonstrate LARS's superior performance in QA and reasoning tasks.
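For readers unfamiliar with the AUROC metric cited in the findings, the following is a minimal sketch of how it can be computed for an uncertainty estimator. It assumes each response has a confidence score (higher means more likely correct) and a binary correctness label; the function name `auroc` and the toy numbers are illustrative and are not taken from the paper.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen correct response receives a
    higher confidence score than a randomly chosen incorrect one.
    Ties count as half a win. Labels: 1 = correct, 0 = incorrect."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): confidence scores and correctness labels.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An AUROC of 1.0 means the score ranks every correct response above every incorrect one, while 0.5 corresponds to random ranking.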
Abstract
Uncertainty estimation (UE) of generative large language models (LLMs) is crucial for evaluating the reliability of generated sequences. A significant subset of UE methods utilize token probabilities to assess uncertainty, aggregating multiple token probabilities into a single UE score using a scoring function. Existing scoring functions for probability-based UE, such as length-normalized scoring and semantic contribution-based weighting, are designed to solve certain aspects of the problem but exhibit limitations, including the inability to handle biased probabilities and complex semantic dependencies between tokens. To address these issues, in this work, we propose the Learnable Response Scoring (LARS) function, a novel scoring function that leverages supervised data to capture complex dependencies between tokens and probabilities, thereby producing more reliable and calibrated response…
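To make the idea of a scoring function concrete, here is a minimal sketch of length-normalized scoring, one of the hand-designed baselines the abstract mentions. It assumes the per-token probabilities of a generated response are already available as a list of floats; the function names and toy values are illustrative, and this is not the LARS model itself, which is learned from supervised data.

```python
import math


def sequence_probability(token_probs):
    """Unnormalized score: the product of token probabilities.
    Longer responses are penalized simply for having more tokens."""
    return math.exp(sum(math.log(p) for p in token_probs))


def length_normalized_score(token_probs):
    """Length-normalized score: the geometric mean of token probabilities,
    i.e. exp of the average log-probability."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_log_prob)


# Toy token probabilities (hypothetical): same per-token confidence,
# different response lengths.
short_answer = [0.9, 0.8]
long_answer = [0.9, 0.8, 0.9, 0.8, 0.9, 0.8]

raw_short = sequence_probability(short_answer)
raw_long = sequence_probability(long_answer)
norm_short = length_normalized_score(short_answer)
norm_long = length_normalized_score(long_answer)
```

In this toy case the raw product is lower for the longer response even though each token is equally confident, while the length-normalized scores coincide. Such a fixed rule still treats every token alike, which is the kind of limitation the abstract attributes to designed scoring functions.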
Taxonomy
Topics: Data Stream Mining Techniques · Simulation Techniques and Applications · Semantic Web and Ontologies
Methods: LARS
