Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study

Dylan Bouchard; Mohit Singh Chauhan; Viren Bajaj; David Skarbrevik

arXiv:2602.17431 · cs.CL · February 20, 2026

TL;DR

This paper presents a comprehensive taxonomy and framework for fine-grained uncertainty quantification in long-form language model outputs, improving factuality detection and enabling better comparison of methods.

Contribution

It introduces a taxonomy and formalization of uncertainty quantification methods tailored for long-form LLM outputs, with experimental validation across multiple models and datasets.

Findings

01. Claim-response entailment performs better than or on par with more complex claim-level scorers.

02. Claim-level scoring is generally more effective than sentence-level scoring.

03. Uncertainty-aware decoding is highly effective at improving factuality.
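The third finding concerns uncertainty-aware decoding. The sketch below is a minimal illustration, not the paper's implementation. It assumes each claim already carries a confidence score in [0, 1], and it treats "decoding" as dropping claims below a threshold before the response is reassembled. The `ScoredClaim` type, the `filter_claims` and `rebuild_response` helpers, the example claims and scores, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredClaim:
    text: str
    confidence: float  # hypothetical claim-level confidence in [0, 1]


def filter_claims(claims: list[ScoredClaim], threshold: float = 0.5) -> list[ScoredClaim]:
    """Keep only claims whose confidence meets the (hypothetical) threshold."""
    return [c for c in claims if c.confidence >= threshold]


def rebuild_response(claims: list[ScoredClaim]) -> str:
    # Naive reconstruction: join the surviving claims in order.
    # A real system would more likely ask an LLM to rewrite them fluently.
    return " ".join(c.text for c in claims)


claims = [
    ScoredClaim("Marie Curie was born in Warsaw.", 0.92),
    ScoredClaim("She won three Nobel Prizes.", 0.21),  # low confidence (and false)
    ScoredClaim("She studied at the University of Paris.", 0.78),
]
print(rebuild_response(filter_claims(claims)))
```

Raising the threshold trades completeness for factual precision. With `threshold=0.95`, every claim in this toy list is dropped.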

Abstract

Uncertainty quantification has emerged as an effective approach to closed-book hallucination detection for LLMs, but existing methods are largely designed for short-form outputs and do not generalize well to long-form generation. We introduce a taxonomy for fine-grained uncertainty quantification in long-form LLM outputs that distinguishes methods by design choices at three stages: response decomposition, unit-level scoring, and response-level aggregation. We formalize several families of consistency-based black-box scorers, providing generalizations and extensions of existing methods. In our experiments across multiple LLMs and datasets, we find 1) claim-response entailment consistently performs better than or on par with more complex claim-level scorers, 2) claim-level scoring generally yields better results than sentence-level scoring, and 3) uncertainty-aware decoding is highly effective…
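The three stages named in the abstract (decomposition, unit-level scoring, aggregation) can be illustrated with a toy black-box pipeline. This is a minimal sketch under stated assumptions, not the authors' code or any library's API. Stage 1 uses naive sentence splitting as a stand-in for claim extraction. Stage 2 scores a unit by the fraction of sampled responses that entail it, in the spirit of claim-response entailment. Entailment is approximated by a hypothetical `toy_entails` word-containment check; a real system would use an NLI model. Stage 3 aggregates unit scores with the mean. All function names and example texts are illustrative.

```python
import re
from statistics import mean

# Hypothetical stopword list used by the toy entailment check.
STOPWORDS = {"the", "a", "an", "is", "was", "in", "of", "and", "to", "at", "it"}


def decompose(response: str) -> list[str]:
    """Stage 1: split a response into units (naive sentence splitting here)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def toy_entails(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI entailment probability:
    1.0 if every content word of the hypothesis appears in the premise, else 0.0."""
    return 1.0 if content_words(hypothesis) <= content_words(premise) else 0.0


def claim_response_entailment(unit: str, samples: list[str]) -> float:
    """Stage 2: fraction of sampled responses that entail the unit."""
    return mean(toy_entails(s, unit) for s in samples)


def aggregate(scores: list[float]) -> float:
    """Stage 3: response-level confidence as the mean of unit-level scores."""
    return mean(scores)


original = "Paris is the capital of France. It has a population of 90 million."
samples = [
    "Paris is the capital of France and has about 2 million residents.",
    "The capital of France is Paris. Its population is roughly 2 million.",
    "France's capital, Paris, is home to around 2 million people.",
]
units = decompose(original)
unit_scores = [claim_response_entailment(u, samples) for u in units]
print(units, unit_scores, aggregate(unit_scores))
```

The unsupported second sentence receives a score of 0.0 because no sampled response corroborates it. This per-unit signal is what fine-grained methods expose and a single response-level score would blur.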
