MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs
Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr

TL;DR
This paper introduces MARS, a meaning-aware scoring function for uncertainty estimation in generative LLMs. It improves reliability by accounting for the semantic contribution of each token, and it is validated across multiple datasets and models.
Contribution
Proposes MARS, a semantic-aware scoring function that enhances uncertainty estimation in generative LLMs over traditional length-normalized methods.
Findings
Integrating MARS into existing UE methods significantly improves their uncertainty estimation performance.
MARS outperforms length-normalized scoring across multiple datasets.
Its effectiveness is also validated on medical question-answering data.
Abstract
Generative Large Language Models (LLMs) are widely utilized for their excellence in various tasks. However, their tendency to produce inaccurate or misleading outputs poses a potential risk, particularly in high-stakes environments. Therefore, estimating the correctness of generative LLM outputs is an important task for enhanced reliability. Uncertainty Estimation (UE) in generative LLMs is an evolving domain, where SOTA probability-based methods commonly employ length-normalized scoring. In this work, we propose Meaning-Aware Response Scoring (MARS) as an alternative to length-normalized scoring for UE methods. MARS is a novel scoring function that considers the semantic contribution of each token in the generated sequence in the context of the question. We demonstrate that integrating MARS into UE methods results in a universal and significant improvement in UE performance. We conduct…
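To make the contrast in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It compares length-normalized scoring (the geometric mean of token probabilities) with a generic importance-weighted score in which each token's log-probability is scaled by a weight. The function names, the example probabilities, and the weights are all hypothetical; this summary does not say how MARS derives its per-token semantic weights, so they are simply passed in as given.

```python
import math


def length_normalized_score(token_probs):
    """Geometric mean of token probabilities: exp(mean(log p_i))."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def weighted_score(token_probs, weights):
    """Product of p_i ** w_i, with weights normalized to sum to 1.

    ASSUMPTION: `weights` stands in for per-token importance values.
    How MARS actually computes them is not described in this summary.
    """
    total = sum(weights)
    log_sum = sum((w / total) * math.log(p)
                  for p, w in zip(token_probs, weights))
    return math.exp(log_sum)


# Hypothetical three-token answer: the middle token carries most of the
# meaning and is also the one the model is least sure about.
probs = [0.9, 0.2, 0.9]
importance = [0.1, 0.8, 0.1]

uniform = length_normalized_score(probs)
weighted = weighted_score(probs, importance)
print(f"length-normalized: {uniform:.3f}, importance-weighted: {weighted:.3f}")
```

With equal weights the two functions agree, so length normalization is the special case in which every token counts the same. In this toy example, emphasizing the low-confidence, high-importance token pulls the score below the length-normalized value.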
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Software Engineering Research
