# Entropy-Guided Loop: Achieving Reasoning through Uncertainty-Aware Generation

**Authors:** Andrew G. A. Correa, Ana C. H de Matos

arXiv: 2509.00079 · 2025-09-03

## TL;DR

This paper introduces an entropy-guided refinement loop in which token-level uncertainty triggers a single, targeted refinement pass. With the loop, a small model approaches 95% of a reference reasoning model's quality at approximately one-third of the cost.

## Contribution

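The paper presents a lightweight test-time refinement method for small models. Rather than using entropy only for measurement or decoding, it passes a compact uncertainty report (tokens, confidences, alternatives, context) back to the model to guide corrective edits, and it does so only for responses that trip an uncertainty trigger.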

## Key findings

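- A small model with the loop approaches 95% of a reference reasoning model's quality at approximately one-third of the cost.
- The loop refines about 31% of responses and improves accuracy by 16 percentage points over single-pass inference.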
- Provides a practical middle ground between single-pass inference and costly reasoning chains.

## Abstract

Reasoning models often outperform smaller models but at 3–5× higher cost and added latency. We present entropy-guided refinement: a lightweight, test-time loop that uses token-level uncertainty to trigger a single, targeted refinement pass. We extract logprobs, compute Shannon entropy on top-k alternatives, and apply a simple OR-logic trigger over perplexity, maximum token entropy, and low-confidence-token count. Unlike approaches that use entropy only for measurement or decoding, we pass a compact uncertainty report (tokens, confidences, alternatives, context) back to the model to guide corrective edits. On representative technical queries across reasoning, mathematics, and code generation tasks, a small model with our loop approaches 95% of a reference reasoning model's quality at approximately one-third of the cost. The method achieves selective refinement on ~31% of responses while improving accuracy by 16 percentage points over single-pass inference. We demonstrate that this uncertainty-aware loop provides an effective middle ground between single-pass inference and expensive reasoning chains, making it practical for production deployments where both quality and cost matter.
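
The trigger described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `TokenInfo` container, the function names, and every threshold value are hypothetical placeholders, and renormalizing the top-k probabilities before computing entropy is an assumption the abstract does not confirm.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenInfo:
    """Hypothetical container for one generated token and its logprobs."""
    token: str
    logprob: float                   # log-probability of the emitted token
    top_logprobs: dict[str, float]   # top-k alternatives -> log-probability


def token_entropy(top_logprobs: dict[str, float]) -> float:
    """Shannon entropy (in nats) over the top-k alternatives.

    Assumption: probabilities are renormalized to sum to 1 over the top-k set.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def perplexity(tokens: list[TokenInfo]) -> float:
    """exp of the mean negative log-probability of the emitted tokens."""
    return math.exp(-sum(t.logprob for t in tokens) / len(tokens))


def should_refine(tokens, ppl_max=1.5, entropy_max=1.0,
                  low_conf_p=0.5, low_conf_count=3) -> bool:
    """OR-logic trigger: any single signal is enough to request refinement.

    All four thresholds are illustrative values, not taken from the paper.
    """
    max_entropy = max(token_entropy(t.top_logprobs) for t in tokens)
    n_low = sum(1 for t in tokens if math.exp(t.logprob) < low_conf_p)
    return (perplexity(tokens) > ppl_max
            or max_entropy > entropy_max
            or n_low >= low_conf_count)


def uncertainty_report(tokens, low_conf_p=0.5, window=2) -> list[dict]:
    """Compact report of low-confidence tokens: token, confidence,
    alternatives (most likely first), and a small window of context."""
    report = []
    for i, t in enumerate(tokens):
        p = math.exp(t.logprob)
        if p < low_conf_p:
            nearby = tokens[max(0, i - window): i + window + 1]
            report.append({
                "token": t.token,
                "confidence": round(p, 3),
                "alternatives": sorted(t.top_logprobs,
                                       key=t.top_logprobs.get, reverse=True),
                "context": "".join(x.token for x in nearby),
            })
    return report
```

In a loop built on this sketch, a draft response would be generated once, `should_refine` would decide whether a second pass is needed, and the output of `uncertainty_report` would be appended to the refinement prompt. Because the trigger is an OR over three signals, a single highly ambiguous token can request refinement even when overall perplexity looks acceptable.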

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00079/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00079/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/2509.00079/full.md

---
Source: https://tomesphere.com/paper/2509.00079