# Uncertainty Under the Curve: A Sequence-Level Entropy Area Metric for Reasoning LLM

**Authors:** Yongfu Zhu, Lin Sun, Guangxiang Zhao, Weihong Lin, Xiangzheng Zhang

arXiv: 2508.20384 · 2025-08-29

## TL;DR

This paper introduces the Entropy Area Score (EAS), a metric for quantifying uncertainty in reasoning large language models. EAS is efficient and interpretable, and it improves training data selection and student model accuracy.

## Contribution

The paper presents EAS, a sequence-level metric that integrates the model's own token-level predictive entropy over the course of generation. It requires neither external models nor repeated sampling.

## Key findings

- EAS correlates strongly with answer entropy across models and datasets.
- EAS consistently outperforms Pass Rate filtering for training data selection under equal sample budgets.
- Selecting training data with EAS improves student model accuracy on math benchmarks.
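
This summary does not define the two comparison quantities named in the findings, so the following is only a minimal sketch under common assumptions: answer entropy is taken as the Shannon entropy of the final answers from repeated sampling of one prompt, and Pass Rate as the fraction of those answers that match a reference. The function names and the toy `samples` list are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (nats) of the empirical distribution of sampled answers.

    Assumption: this is the usual sampling-based definition; the paper's
    exact formulation may differ.
    """
    n = len(answers)
    counts = Counter(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def pass_rate(answers, reference):
    """Fraction of sampled answers that equal the reference answer."""
    return sum(a == reference for a in answers) / len(answers)


# Hypothetical final answers from four samples of the same prompt.
samples = ["42", "42", "41", "42"]
h = answer_entropy(samples)
p = pass_rate(samples, "42")
```

Both quantities need several generations per prompt, which is the cost that a single-pass metric such as EAS is meant to avoid.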

## Abstract

In this work, we introduce Entropy Area Score (EAS), a simple yet effective metric to quantify uncertainty in the answer generation process of reasoning large language models (LLMs). EAS requires neither external models nor repeated sampling; instead, it integrates token-level predictive entropy from the model itself to capture the evolution of uncertainty during generation. Empirical results show that EAS is strongly correlated with answer entropy across models and datasets. In training data selection, EAS identifies high-potential samples and consistently outperforms Pass Rate filtering under equal sample budgets, improving student model accuracy on math benchmarks. EAS is both efficient and interpretable, offering a practical tool for uncertainty modeling and data quality assessment in LLM training.
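
To make "integrating token-level predictive entropy" concrete, here is a minimal sketch that assumes the area is a unit-width sum of per-token Shannon entropies over the generated sequence. The abstract does not give the exact formula, so any normalization, truncation, or windowing used in the paper is not reflected here. The names `token_entropy` and `entropy_area_score` and the toy distributions are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_area_score(step_distributions):
    """Area under the per-token entropy curve of a single generation.

    Assumption: the area is a plain sum with unit spacing between token
    positions; the paper's definition may normalize or truncate differently.
    """
    return sum(token_entropy(p) for p in step_distributions)


# Toy next-token distributions over a 4-token vocabulary, 5 generation steps.
confident = [[0.97, 0.01, 0.01, 0.01]] * 5
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 5

eas_confident = entropy_area_score(confident)
eas_uncertain = entropy_area_score(uncertain)
```

In practice the per-step distributions would come from the softmax over the model's logits during one decoding pass, so the score needs no extra samples and no external model.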

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20384/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20384/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/2508.20384/full.md

---
Source: https://tomesphere.com/paper/2508.20384