Sampling More, Getting Less: Calibration is the Diversity Bottleneck in LLMs

Amin Banayeeanzade; Qingchuan Yang; Dhruv Tarsadiya; Fatemeh Bahrani; Leonardo Blas; Alfy Samuel; Robin Jia; Meisam Razaviyayn; Sai Praneeth Karimireddy

arXiv:2605.11128 · cs.CL · May 13, 2026

TL;DR

This paper investigates how miscalibrated probability distributions during decoding cause diversity collapse in large language models, and it proposes diagnostic tools for analyzing the problem.

Contribution

The paper introduces a validity-diversity framework that decomposes diversity collapse into order miscalibration and shape miscalibration, supported by formal analysis and empirical diagnostics.

Findings

01 Diversity collapse stems from order and shape miscalibration in LLMs.

02 Local calibration failures compound over decoding steps, reducing diversity.

03 Across 14 models, diversity issues are linked to distribution miscalibration, not sampling heuristics.

Abstract

Diversity is essential for language-model applications ranging from creative generation to scientific discovery, yet modern LLMs often collapse into a narrow subset of plausible outputs. While prior work has developed benchmarks for measuring this lack of diversity, less is known about how the step-by-step probability distributions at inference time cause the problem. We introduce a validity-diversity framework that attributes diversity collapse to how an LLM allocates probability mass across valid and invalid continuations during decoding. This framework decomposes the bottleneck into two complementary forms of miscalibration. First, order calibration: valid tokens are not reliably ranked above invalid tokens, so rank-based cutoff rules must trade off between recovering valid continuations and admitting invalid ones. Second, shape calibration: probability mass is overly concentrated…
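The order-calibration trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's diagnostic: the function name topk_tradeoff, the toy probabilities, and the validity labels are all hypothetical, chosen only to show what happens when an invalid token outranks some valid ones under a top-k cutoff.

```python
def topk_tradeoff(probs, valid, k):
    """Return (valid_recall, invalid_admitted) for a top-k cutoff.

    probs: next-token probabilities (hypothetical toy values)
    valid: parallel list of booleans marking which tokens are valid
    k:     number of highest-probability tokens the cutoff keeps
    """
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = ranked[:k]
    valid_kept = sum(1 for i in kept if valid[i])
    total_valid = sum(valid)
    return valid_kept / total_valid, k - valid_kept


# Toy distribution (assumed, not from the paper): the invalid token at
# index 1 is ranked above the valid tokens at indices 2 and 4.
probs = [0.50, 0.20, 0.15, 0.10, 0.05]
valid = [True, False, True, False, True]

for k in range(1, len(probs) + 1):
    recall, invalid = topk_tradeoff(probs, valid, k)
    print(f"k={k}: valid recall={recall:.2f}, invalid admitted={invalid}")
```

In this toy case no value of k recovers every valid token without also admitting an invalid one: k=1 admits nothing invalid but keeps only one of three valid tokens, while k=5 keeps all three and admits both invalid tokens.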
