Entropy Sentinel: Continuous LLM Accuracy Monitoring from Decoding Entropy Traces in STEM

Pedro Memoli Buffa; Luciano Del Corro

arXiv:2601.09001·cs.CL·March 4, 2026

TL;DR

This paper introduces Entropy Sentinel, a method that uses decoding entropy traces from LLMs to monitor and improve model accuracy during deployment, especially under domain shifts.

Contribution

It demonstrates that entropy-based signals can effectively estimate LLM accuracy at slice-level and domain-level, aiding scalable monitoring and targeted data collection.

Findings

01. Entropy profiles often correlate with actual accuracy across benchmarks.

02. The method generalizes across multiple LLM architectures and sizes.

03. Entropy-based monitoring can guide data acquisition to improve performance.

Abstract

Deploying LLMs raises two coupled challenges: (1) monitoring, that is, estimating where a model underperforms as traffic and domains drift, and (2) improvement, that is, prioritizing data acquisition to close the largest performance gaps. We test whether an inference-time signal can estimate slice-level accuracy under domain shift. For each response, we compute an output-entropy profile from final-layer next-token probabilities (from top-$k$ logprobs) and summarize it with different statistics. A lightweight classifier predicts instance correctness, and averaging predicted probabilities yields a domain-level accuracy estimate. We evaluate on ten STEM reasoning benchmarks with exhaustive train/test compositions ($k \in \{1, 2, 3, 4\}$; all $\binom{10}{k}$ combinations), on different classifier models and features across nine LLMs from six families (3B-20B). Estimates often track held-out benchmark accuracy,…
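The pipeline described in the abstract (entropy profile, summary statistics, instance-level classifier, averaged probabilities) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that probabilities are renormalized over the top-k tokens, that the summary statistics are mean, max, and standard deviation, and that the classifier is a plain logistic model. The function names, weights, and bias are hypothetical.

```python
import math
from statistics import mean, pstdev


def token_entropy(top_logprobs):
    """Shannon entropy (in nats) of one decoding step.

    Assumption: the top-k probabilities are renormalized to sum to 1,
    since the tail of the distribution is not available.
    """
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_features(response_logprobs):
    """Summarize a response's entropy profile (one entropy per token).

    Assumption: mean, max, and population std stand in for the
    "different statistics" mentioned in the abstract.
    """
    profile = [token_entropy(step) for step in response_logprobs]
    return [mean(profile), max(profile), pstdev(profile)]


def predict_correct(features, weights, bias):
    """Hypothetical logistic classifier: probability the response is correct."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def estimate_domain_accuracy(responses, weights, bias):
    """Average the predicted correctness probabilities over a slice or domain."""
    return mean(
        predict_correct(entropy_features(r), weights, bias) for r in responses
    )
```

In practice the weights and bias would be fit on benchmarks with known correctness labels and then applied to unlabeled traffic from a new domain; the values passed in here are placeholders for that fitted model.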
