Robust Length Prediction: A Perspective from Heavy-Tailed Prompt-Conditioned Distributions

Jing Wang; Yu-Yang Qian; Ke Xue; Chao Qian; Peng Zhao; Zhi-Hua Zhou

arXiv:2604.07931 · cs.LG · April 10, 2026

TL;DR

This paper introduces prompt-conditioned length distribution (ProD) methods, which make output-length prediction for large language models more robust by treating each prompt's output length as a heavy-tailed distribution rather than a single target value.

Contribution

It recasts prompt-only length prediction as robust estimation from heavy-tailed prompt-conditioned length distributions, building training targets from multiple independent generations of each prompt instead of a single sampled length.
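
To make the contrast between a one-shot label and a robust multi-generation target concrete, here is a minimal sketch in Python. The lengths are made-up illustrative values, not data from the paper, and the estimators shown (median and median-of-means) are standard robust choices that may or may not match what ProD actually uses; `median_of_means` is a hypothetical helper.

```python
from statistics import mean, median


def median_of_means(lengths: list[int], num_groups: int) -> float:
    """Split the samples into equal-size groups, average each group,
    and return the median of the group means (leftover samples are dropped)."""
    if not 1 <= num_groups <= len(lengths):
        raise ValueError("num_groups must be between 1 and len(lengths)")
    size = len(lengths) // num_groups
    group_means = [mean(lengths[i * size:(i + 1) * size]) for i in range(num_groups)]
    return median(group_means)


# Hypothetical output lengths (tokens) from 8 independent generations of one
# prompt; the last run is a long outlier, as a heavy tail would produce.
LENGTHS = [210, 195, 230, 205, 220, 190, 215, 4096]

one_shot = LENGTHS[-1]  # a single sampled label can land in the tail
print("one-shot:", one_shot)
print("mean:", mean(LENGTHS))
print("median:", median(LENGTHS))
print("median-of-means:", median_of_means(LENGTHS, num_groups=4))
```

On these toy lengths the mean is pulled up to 695.125 tokens by a single long run, while the median (212.5) and median-of-means (211.25) stay close to the bulk of the samples; a one-shot label could just as easily have been the 4096-token outlier.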

Findings

1. ProD methods outperform existing length prediction approaches.
2. Using multiple generations improves prediction accuracy.
3. ProD-M and ProD-D provide robust point and distributional predictions.
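
This summary does not describe how ProD-D represents a distribution, so the following is only an assumed illustration: the sampled lengths for one prompt are turned into a normalized histogram over fixed length bins, which could serve as a distributional training target, and a quantile of that histogram gives a conservative reservation size. The bin edges, the toy lengths, and both helper functions (`length_histogram`, `reservation_bound`) are hypothetical.

```python
import math
from bisect import bisect_right


def length_histogram(lengths: list[int], edges: list[int]) -> list[float]:
    """Normalized counts over bins [edges[i], edges[i+1]); the last bin is open-ended."""
    if not lengths:
        raise ValueError("need at least one sampled length")
    counts = [0] * len(edges)
    for n in lengths:
        idx = bisect_right(edges, n) - 1  # index of the bin whose left edge <= n
        if idx < 0:
            raise ValueError(f"length {n} is below the first edge {edges[0]}")
        counts[idx] += 1
    return [c / len(lengths) for c in counts]


def reservation_bound(probs: list[float], edges: list[int], q: float) -> float:
    """Upper edge of the first bin where cumulative probability reaches q
    (math.inf if that bin is the open-ended last one)."""
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if cumulative >= q:
            return edges[i + 1] if i + 1 < len(edges) else math.inf
    return math.inf


# Hypothetical lengths from 8 generations of one prompt, and hypothetical bins.
LENGTHS = [210, 195, 230, 205, 220, 190, 215, 4096]
EDGES = [0, 128, 256, 512, 1024]

probs = length_histogram(LENGTHS, EDGES)
print(probs)
print(reservation_bound(probs, EDGES, 0.8))
print(reservation_bound(probs, EDGES, 0.9))
```

With one long run out of eight, the 0.8 quantile falls in the 128 to 256 token bin, but reaching 0.9 requires the open-ended tail bin, which is the kind of tail risk a single point estimate hides.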

Abstract

Output-length prediction is important for efficient LLM serving, as it directly affects batching, memory reservation, and scheduling. For prompt-only length prediction, most existing methods use a one-shot sampled length as the label, implicitly treating each prompt as if it had one true target length. We show that this is unreliable: even under a fixed model and decoding setup, the same prompt induces a prompt-conditioned output length distribution, not a deterministic scalar, and this distribution is consistent with heavy-tailed behavior. Motivated by this, we cast length prediction as robust estimation from heavy-tailed prompt-conditioned length distributions. We propose prompt-conditioned length distribution (ProD) methods, which construct training targets from multiple independent generations of the same prompt. Two variants are developed to reuse the served LLM's…
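
The abstract states that prompt-conditioned length distributions are consistent with heavy-tailed behavior but does not say here which diagnostic supports that claim. One standard tool for gauging tail heaviness is the Hill estimator of the tail index α, sketched below as an assumption rather than the authors' procedure; a smaller α means a heavier tail. `hill_tail_index` is a hypothetical helper, and with only a handful of generations per prompt its estimate would be very noisy.

```python
import math


def hill_tail_index(samples: list[float], k: int) -> float:
    """Hill estimate of the tail index alpha from the k largest samples:
    alpha = 1 / mean(log(X_(i) / X_(k+1))) for i = 1..k (descending order)."""
    if not 1 <= k < len(samples):
        raise ValueError("k must satisfy 1 <= k < len(samples)")
    xs = sorted(samples, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the Hill estimator requires positive samples")
    h = sum(math.log(x / threshold) for x in xs[:k]) / k
    return math.inf if h == 0 else 1.0 / h


print(hill_tail_index([8, 4, 2, 1], k=2))  # about 0.962
```

For the toy input, the two largest values give log(8/2) + log(4/2) = 3 log 2, so the estimate is 1 / (1.5 log 2). In practice k trades bias against variance and is usually chosen by inspecting the estimate over a range of k.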
