Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration

Xin Liu; Lu Wang

arXiv:2604.12046 · cs.CL · April 15, 2026

TL;DR

This paper introduces CURE, a framework that enhances long-form generation factuality by enabling LLMs to reason about uncertainty at the claim level, leading to more accurate and calibrated outputs.

Contribution

CURE is the first approach to teach LLMs claim-level reasoning about uncertainty, improving factual accuracy and calibration in long-form generation.

Findings

01. CURE improves claim-level accuracy by up to 39.9% on Biography generation.

02. CURE increases AUROC by 16.0% on FactBench, indicating better calibration.

03. CURE maintains factual recall while enhancing factuality.
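
To make the AUROC finding concrete, here is a minimal sketch of how a claim-level AUROC can be computed from per-claim confidence scores and correctness labels. It is not the paper's evaluation code: the function name `claim_auroc`, the `(confidence, is_correct)` pair format, and the example numbers are all assumptions made for illustration. AUROC here is the probability that a randomly chosen correct claim receives a higher confidence than a randomly chosen incorrect one, with ties counted as half.

```python
from typing import List, Tuple


def claim_auroc(claims: List[Tuple[float, bool]]) -> float:
    """AUROC over (confidence, is_correct) pairs for individual claims.

    Hypothetical helper for illustration only; the input format is an
    assumption, not something specified by the paper.
    """
    correct = [conf for conf, ok in claims if ok]
    incorrect = [conf for conf, ok in claims if not ok]
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect claim")

    # Count pairs where the correct claim is ranked above the incorrect one.
    wins = 0.0
    for c in correct:
        for i in incorrect:
            if c > i:
                wins += 1.0
            elif c == i:
                wins += 0.5  # ties contribute half a win
    return wins / (len(correct) * len(incorrect))


# Made-up per-claim confidences for one long-form response.
per_claim = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]

# The same response when every claim shares one response-level confidence.
single_scalar = [(0.6, ok) for _, ok in per_claim]

print(claim_auroc(per_claim))
print(claim_auroc(single_scalar))
```

In this toy example the per-claim confidences rank five of the six correct/incorrect pairs properly, while the single shared confidence produces only ties and therefore an AUROC of 0.5, the chance level. That is one way to see why a single scalar confidence cannot tell reliable claims from unreliable ones within a response.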

Abstract

Large language models (LLMs) often hallucinate in long-form generation. Existing approaches mainly improve factuality through post-hoc revision or reinforcement learning (RL) with correctness-based rewards, but they do not teach the model to estimate which parts of its generation are reliable. As a result, models may still state incorrect claims confidently in their responses. Recent advances in reasoning have significantly improved LLM performance, and have been leveraged to estimate confidence by incorporating calibration into RL objectives. However, existing approaches remain limited to a single scalar confidence for the entire response, which is insufficient for long-form generation where uncertainty varies across individual claims. To mitigate this problem, we propose CURE, a framework that improves long-form factuality by teaching LLMs to reason about uncertainty at the claim…
