Multi-group Uncertainty Quantification for Long-form Text Generation
Terrance Liu, Zhiwei Steven Wu

TL;DR
This paper investigates the reliability of uncertainty quantification methods in long-form text generated by large language models, especially within specific subgroups, and proposes group-conditional approaches to improve subgroup guarantees.
Contribution
The paper applies multicalibration and multivalid conformal prediction to long-form text generation and demonstrates that both improve subgroup-level calibration and conformal prediction guarantees.
Findings
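Canonical calibration and conformal prediction methods perform well when measured across the entire dataset, but their guarantees break down within particular subgroups.
Group-conditional methods (multicalibration and multivalid conformal prediction) improve calibration and prediction guarantees within subgroups.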
The study establishes a benchmark for uncertainty quantification in long-form text generation.
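The first finding can be illustrated with a small sketch. It is not the paper's evaluation code. The toy scores, labels, and group names are invented for illustration, and the single-bin calibration gap (mean confidence minus empirical accuracy) is a simplifying assumption rather than the metric the authors report. It shows how a scorer can look calibrated over the whole dataset while being miscalibrated in each subgroup.

```python
from collections import defaultdict


def calibration_gap(scores, labels):
    """Absolute difference between mean confidence and empirical accuracy.

    This is a single-bin simplification of calibration error, used here
    only for illustration.
    """
    if not scores:
        raise ValueError("need at least one claim")
    return abs(sum(scores) / len(scores) - sum(labels) / len(labels))


def gap_by_group(scores, labels, groups):
    """Compute the calibration gap separately for each subgroup."""
    buckets = defaultdict(lambda: ([], []))
    for score, label, group in zip(scores, labels, groups):
        buckets[group][0].append(score)
        buckets[group][1].append(label)
    return {g: calibration_gap(s, y) for g, (s, y) in buckets.items()}


# Hypothetical toy data: every claim gets confidence 0.7.
# Group "A" claims are correct 90% of the time, group "B" claims 50%.
scores = [0.7] * 20
labels = [1] * 9 + [0] * 1 + [1] * 5 + [0] * 5
groups = ["A"] * 10 + ["B"] * 10

overall = calibration_gap(scores, labels)
per_group = gap_by_group(scores, labels, groups)

print(f"overall gap: {overall:.3f}")
for name, gap in sorted(per_group.items()):
    print(f"group {name} gap: {gap:.3f}")
```

In this toy setup the overall accuracy (14 of 20) matches the mean confidence of 0.7, so the global gap is essentially zero, while each subgroup is off by about 0.2 in opposite directions. Group-conditional methods aim to close gaps of this kind.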
Abstract
While past works have shown how uncertainty quantification can be applied to large language model (LLM) outputs, the question of whether resulting uncertainty guarantees still hold within sub-groupings of data remains open. In our work, given some long-form text generated by an LLM, we study uncertainty at both the level of individual claims contained within the output (via calibration) and across the entire output itself (via conformal prediction). Using biography generation as a testbed for this study, we derive a set of (demographic) attributes (e.g., whether some text describes a man or woman) for each generation to form such "subgroups" of data. We find that although canonical methods for both types of uncertainty quantification perform well when measuring across the entire dataset, such guarantees break down when examining particular subgroups. Having established this issue, we…
Taxonomy
Topics: Natural Language Processing Techniques
