TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

Tunyu Zhang; Haizhou Shi; Yibin Wang; Hengyi Wang; Xiaoxiao He; Zhuowei Li; Haoxian Chen; Ligong Han; Kai Xu; Huan Zhang; Dimitris Metaxas; Hao Wang

arXiv:2505.11737 · cs.LG · April 14, 2026

TL;DR

TokUR is a token-level uncertainty estimation framework that lets LLMs self-assess and self-improve their responses on mathematical reasoning tasks, with the aim of making their reasoning more reliable and interpretable.

Contribution

The paper applies low-rank random weight perturbation during LLM decoding to obtain predictive distributions for token-level uncertainty estimation, then aggregates the token-level quantities to capture the semantic uncertainty of a generated response.
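
As a rough illustration of the idea of low-rank random weight perturbation for token-level uncertainty, the sketch below perturbs a toy output projection with random low-rank noise, averages the resulting next-token distributions, and splits the predictive entropy into an expected-entropy part and a mutual-information part. This is a minimal sketch under stated assumptions, not the paper's implementation: the Gaussian noise model, the single-layer toy setup, the entropy-based decomposition, the mean aggregation over tokens, and every name and default (`token_uncertainty`, `response_uncertainty`, `rank`, `sigma`, `n_samples`) are hypothetical choices made here for illustration.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(p, eps=1e-12):
    # Shannon entropy (in nats) over the last axis.
    return -(p * np.log(p + eps)).sum(axis=-1)


def token_uncertainty(W, h, rank=2, sigma=0.05, n_samples=16, rng=None):
    """Uncertainty for one decoding step of a toy linear output layer.

    W: (vocab, d) output projection; h: (d,) hidden state.
    ASSUMPTION: the perturbation is W + sigma * A @ B / sqrt(rank) with
    Gaussian A, B. The paper's actual noise model may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    vocab, d = W.shape
    probs = []
    for _ in range(n_samples):
        A = rng.standard_normal((vocab, rank))
        B = rng.standard_normal((rank, d))
        W_pert = W + sigma * (A @ B) / np.sqrt(rank)  # low-rank perturbation
        probs.append(softmax(W_pert @ h))
    probs = np.stack(probs)            # (n_samples, vocab)
    mean_p = probs.mean(axis=0)        # approximate predictive distribution
    total = entropy(mean_p)            # entropy of the averaged distribution
    aleatoric = entropy(probs).mean()  # average entropy of each sample
    epistemic = total - aleatoric      # mutual-information-style gap
    return total, aleatoric, epistemic


def response_uncertainty(W, hidden_states, **kwargs):
    # ASSUMPTION: aggregate token-level values by a plain mean over tokens.
    per_token = [token_uncertainty(W, h, **kwargs) for h in hidden_states]
    return np.mean(per_token, axis=0)  # (total, aleatoric, epistemic)
```

Calling `response_uncertainty(W, hidden_states, rng=np.random.default_rng(0))` on a `(vocab, d)` matrix and a `(num_tokens, d)` array returns three averaged values. With `sigma=0.0` every sample is identical, so the third value collapses to roughly zero, which is a quick sanity check that the gap term only reflects disagreement between perturbed models.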

Findings

01. TokUR's uncertainty estimates correlate strongly with answer correctness.

02. Uncertainty signals improve model robustness in reasoning tasks.

03. The method is scalable and effective across datasets of varying difficulty.

Abstract

While Large Language Models (LLMs) have demonstrated impressive capabilities, their output quality remains inconsistent across various application scenarios, making it difficult to identify trustworthy responses, especially in complex tasks requiring multi-step reasoning. In this paper, we propose a Token-level Uncertainty estimation framework for Reasoning (TokUR) that enables LLMs to self-assess and self-improve their responses in mathematical reasoning. Specifically, we introduce low-rank random weight perturbation during LLM decoding to generate predictive distributions for token-level uncertainty estimation, and we aggregate these uncertainty quantities to capture the semantic uncertainty of generated responses. Experiments on mathematical reasoning datasets of varying difficulty demonstrate that TokUR exhibits a strong correlation with answer correctness and model robustness, and…
