CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought
Boxuan Zhang, Ruqi Zhang

TL;DR
CoT-UQ introduces a response-wise uncertainty quantification framework for LLMs that leverages Chain-of-Thought reasoning to improve the accuracy of uncertainty estimates, outperforming existing methods.
Contribution
This work presents a novel response-wise UQ method that integrates Chain-of-Thought reasoning to enhance uncertainty estimation in large language models.
Findings
CoT-UQ achieves 5.9% higher AUROC on average compared to existing UQ methods.
It effectively captures critical reasoning information for better uncertainty assessment.
The method is validated on Llama models across logical and mathematical tasks.
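AUROC, the metric cited in the findings above, measures how well an uncertainty method ranks correct responses above incorrect ones. The following is a minimal sketch of that metric in plain Python, not the paper's evaluation code; the function name `auroc` and the example scores are illustrative assumptions.

```python
def auroc(confidences, correct):
    """Probability that a randomly chosen correct response receives a higher
    confidence than a randomly chosen incorrect one (ties count as 0.5)."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) confidence scores and correctness labels.
perfect = auroc([0.9, 0.8, 0.3, 0.2], [True, True, False, False])
chance = auroc([0.9, 0.2, 0.8, 0.3], [True, True, False, False])
```

A value of 1.0 means every correct response was scored above every incorrect one, while 0.5 is what a random ranking would yield.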
Abstract
Large language models (LLMs) excel in many tasks but struggle to accurately quantify uncertainty in their generated responses. This limitation makes it challenging to detect misinformation and ensure reliable decision-making. Existing uncertainty quantification (UQ) methods for LLMs are primarily prompt-wise rather than response-wise, often requiring multiple response samples, which incurs high computational costs. Moreover, LLMs have been shown to be overconfident, particularly when using reasoning steps to derive their answers. In this work, we propose CoT-UQ, a response-wise UQ framework that integrates LLMs' inherent reasoning capabilities through Chain-of-Thought (CoT) into the UQ process. CoT-UQ captures critical information during inference by extracting keywords from each reasoning step and assessing their importance to the final answer. This key reasoning information is then…
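To make the idea of keywords per reasoning step with an importance score more concrete, here is a small sketch of one way such information could be represented and combined. The abstract is cut off before it describes how the extracted information is used, so everything below is an assumption for illustration only: the `Keyword` fields, the function `weighted_keyword_confidence`, and the importance-weighted mean are hypothetical and should not be read as the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    prob: float        # assumed: model probability assigned to the keyword
    importance: float  # assumed: relevance of the keyword to the final answer


def weighted_keyword_confidence(steps):
    """Importance-weighted mean of keyword probabilities over all steps.

    `steps` is a list of reasoning steps, each a list of Keyword objects.
    This aggregation rule is a placeholder, not the paper's method.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for step in steps:
        for kw in step:
            weighted_sum += kw.importance * kw.prob
            total_weight += kw.importance
    if total_weight == 0:
        raise ValueError("no keywords with positive importance")
    return weighted_sum / total_weight


# Made-up example: two reasoning steps, one keyword each.
steps = [
    [Keyword("12 apples", prob=0.9, importance=1.0)],
    [Keyword("half", prob=0.5, importance=3.0)],
]
score = weighted_keyword_confidence(steps)
```

In this toy example the low-probability keyword dominates the score because it carries more importance, which is the intuition behind weighting reasoning content by its relevance to the answer.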
Taxonomy
Topics: Scientific Computing and Data Management
Methods: LLaMA
