Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study
Dylan Bouchard, Mohit Singh Chauhan, Viren Bajaj, David Skarbrevik

TL;DR
This paper presents a taxonomy and framework for fine-grained uncertainty quantification in long-form language model outputs, organized around three stages (response decomposition, unit-level scoring, and response-level aggregation), to support hallucination detection and a clearer comparison of methods.
Contribution
It introduces a taxonomy and formalization of uncertainty quantification methods tailored for long-form LLM outputs, with experimental validation across multiple models and datasets.
Findings
Claim-response entailment performs better than or on par with more complex claim-level scorers.
Claim-level scoring generally yields better results than sentence-level scoring.
Uncertainty-aware decoding is highly effective at improving factuality.
Abstract
Uncertainty quantification has emerged as an effective approach to closed-book hallucination detection for LLMs, but existing methods are largely designed for short-form outputs and do not generalize well to long-form generation. We introduce a taxonomy for fine-grained uncertainty quantification in long-form LLM outputs that distinguishes methods by design choices at three stages: response decomposition, unit-level scoring, and response-level aggregation. We formalize several families of consistency-based black-box scorers, providing generalizations and extensions of existing methods. In our experiments across multiple LLMs and datasets, we find 1) claim-response entailment consistently performs better than or on par with more complex claim-level scorers, 2) claim-level scoring generally yields better results than sentence-level scoring, and 3) uncertainty-aware decoding is highly effective…
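The three stages named in the abstract (response decomposition, unit-level scoring, response-level aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation. Every function name here is hypothetical, the decomposition is a naive sentence splitter rather than claim extraction, and `toy_entails` is a crude token-overlap placeholder standing in for a real entailment (NLI) model. The unit score is the fraction of sampled responses that entail the unit, which is one plausible reading of a consistency-based black-box scorer.

```python
import re
from statistics import mean
from typing import Callable, List, Tuple


def decompose(response: str) -> List[str]:
    """Stage 1 (assumed): split a response into sentence-level units."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def toy_entails(premise: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Placeholder for an NLI model: True when at least `threshold` of the
    hypothesis tokens also appear in the premise. Illustration only."""
    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    hyp = tokens(hypothesis)
    if not hyp:
        return False
    return len(hyp & tokens(premise)) / len(hyp) >= threshold


def score_unit(unit: str, samples: List[str],
               entails: Callable[[str, str], bool] = toy_entails) -> float:
    """Stage 2 (assumed): fraction of sampled responses that entail the unit."""
    if not samples:
        raise ValueError("at least one sampled response is required")
    return sum(entails(sample, unit) for sample in samples) / len(samples)


def score_response(response: str, samples: List[str],
                   aggregate: Callable[[List[float]], float] = mean
                   ) -> Tuple[List[str], List[float], float]:
    """Stage 3 (assumed): aggregate unit scores into one response-level score."""
    units = decompose(response)
    unit_scores = [score_unit(u, samples) for u in units]
    return units, unit_scores, aggregate(unit_scores)


# Made-up example data: one response to score and three sampled responses.
response = "Marie Curie was born in Warsaw. She won three Nobel Prizes."
samples = [
    "Marie Curie was born in Warsaw and won two Nobel Prizes.",
    "Born in Warsaw, Marie Curie was a physicist and chemist.",
    "Marie Curie was born in Warsaw in 1867.",
]
units, unit_scores, overall = score_response(response, samples)
for unit, score in zip(units, unit_scores):
    print(f"{score:.2f}  {unit}")
print(f"response-level confidence: {overall:.2f}")
```

In this toy run the first sentence is supported by all three samples and the second by none, so the low-confidence unit is isolated instead of being averaged away at the response level. Swapping `mean` for `min` in `aggregate` would make the response-level score track its least supported unit.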
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
