MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs

Yavuz Faruk Bakman; Duygu Nur Yaldiz; Baturalp Buyukates; Chenyang Tao; Dimitrios Dimitriadis; Salman Avestimehr

arXiv:2402.11756 · cs.CL · February 14, 2025 · 1 citation

TL;DR

This paper introduces MARS, a meaning-aware scoring function for uncertainty estimation in generative LLMs. MARS accounts for the semantic contribution of each generated token instead of relying on length normalization alone, and it is validated across multiple datasets and models.

Contribution

Proposes MARS, a semantic-aware scoring function that enhances uncertainty estimation in generative LLMs over traditional length-normalized methods.

Findings

01. Integrating MARS into existing UE methods significantly improves uncertainty estimation performance.

02. MARS outperforms length-normalized scoring across multiple datasets.

03. The effectiveness of MARS is also validated on medical question-answering data.

Abstract

Generative Large Language Models (LLMs) are widely utilized for their excellence in various tasks. However, their tendency to produce inaccurate or misleading outputs poses a potential risk, particularly in high-stakes environments. Therefore, estimating the correctness of generative LLM outputs is an important task for enhanced reliability. Uncertainty Estimation (UE) in generative LLMs is an evolving domain, where SOTA probability-based methods commonly employ length-normalized scoring. In this work, we propose Meaning-Aware Response Scoring (MARS) as an alternative to length-normalized scoring for UE methods. MARS is a novel scoring function that considers the semantic contribution of each token in the generated sequence in the context of the question. We demonstrate that integrating MARS into UE methods results in a universal and significant improvement in UE performance. We conduct…
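The contrast the abstract draws between length-normalized scoring and a meaning-aware alternative can be illustrated with a small sketch. Length-normalized scoring is the geometric mean of the token probabilities, so every token counts equally. The second function below is only an assumption about what a meaning-aware score could look like: a weighted geometric mean in which each token receives a hypothetical importance weight. The abstract does not give the MARS formula or say how the weights are obtained, so the function names, the weights, and the probabilities here are illustrative and not the authors' implementation.

```python
import math


def length_normalized_score(token_probs):
    """Geometric mean of token probabilities: each token has weight 1/L."""
    length = len(token_probs)
    return math.exp(sum(math.log(p) for p in token_probs) / length)


def weighted_score(token_probs, weights):
    """Weighted geometric mean of token probabilities.

    `weights` are hypothetical per-token importance values (an assumption,
    not taken from the paper); they are normalized to sum to 1.
    """
    if len(token_probs) != len(weights):
        raise ValueError("token_probs and weights must have the same length")
    total = sum(weights)
    return math.exp(
        sum((w / total) * math.log(p) for p, w in zip(token_probs, weights))
    )


# Made-up example: the middle token carries the answer and is the uncertain one.
probs = [0.9, 0.2, 0.9]
importance = [0.1, 0.8, 0.1]  # hypothetical semantic weights

print(length_normalized_score(probs))
print(weighted_score(probs, importance))
```

With uniform weights the weighted score reduces to the length-normalized one. When the low-probability token is given most of the weight, the weighted score drops below the length-normalized score, which is the kind of sensitivity to meaning-bearing tokens that the abstract motivates.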

Code & Models

Repositories

ybakman/llm_uncertainity (PyTorch, official)

Videos

MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs · underline

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Software Engineering Research