CASTILLO: Characterizing Response Length Distributions of Large Language Models
Daniel F. Perez-Ramirez, Dejan Kostic, and Magnus Boman

TL;DR
CASTILLO provides a dataset and analysis of response length distributions across 13 open-source large language models. It reveals substantial length variability that can inform resource management for LLM inference.
Contribution
It introduces a dataset characterizing response lengths for 13 LLMs on prompts drawn from seven instruction-following corpora, supporting response-length prediction and resource allocation.
Findings
Significant variability in response lengths within and across models
Model-specific behaviors and partial text degeneration observed
Dataset and analysis tools publicly released for research use
Abstract
Efficiently managing compute resources for Large Language Model (LLM) inference remains challenging because autoregressive text generation produces outputs of inherently stochastic and variable length. Accurately estimating response lengths in advance enables proactive resource allocation, yet existing approaches either bias text generation towards certain lengths or rely on assumptions that ignore model- and prompt-specific variability. We introduce CASTILLO, a dataset characterizing response length distributions across 13 widely used open-source LLMs evaluated on seven distinct instruction-following corpora. For each ⟨prompt, model⟩ sample pair, we generate 10 independent completions using fixed decoding hyper-parameters, record the token length of each response, and publish summary statistics (mean, standard deviation, percentiles), along with the shortest and longest completions, and the…
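The per-pair summary step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released tooling: the function names `percentile` and `summarize_completions` are hypothetical, and the choice of percentiles (25th, 50th, 75th, 99th), the sample standard deviation, and the linear-interpolation percentile rule are all assumptions. Token counts are assumed to have been computed upstream with each model's own tokenizer.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def percentile(sorted_vals: Sequence[int], q: float) -> float:
    """Linear-interpolation percentile over an already sorted sequence.

    Assumption: the same interpolation scheme as NumPy's default ("linear").
    """
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def summarize_completions(
    completions: List[Tuple[str, int]],
    percentiles: Sequence[float] = (25, 50, 75, 99),  # hypothetical choice
) -> Dict[str, object]:
    """Summarize N completions for one (prompt, model) pair.

    `completions` holds (text, token_length) tuples.
    """
    lengths = sorted(n for _, n in completions)
    shortest = min(completions, key=lambda c: c[1])
    longest = max(completions, key=lambda c: c[1])
    summary: Dict[str, object] = {
        "mean": statistics.mean(lengths),
        # Sample standard deviation (n - 1 denominator); an assumption.
        "std": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        "shortest_text": shortest[0],
        "longest_text": longest[0],
    }
    for q in percentiles:
        summary[f"p{q:g}"] = percentile(lengths, q)
    return summary


if __name__ == "__main__":
    # Hypothetical token lengths for 10 completions of a single prompt;
    # the 2048-token sample stands in for a runaway generation.
    lengths = [120, 95, 300, 110, 105, 98, 130, 2048, 115, 100]
    samples = [(f"completion {i}", n) for i, n in enumerate(lengths)]
    stats = summarize_completions(samples)
    print(stats["mean"], stats["p50"], stats["longest_text"])
    # prints: 322.1 112.5 completion 7
```

In this made-up example, a single very long completion pulls the mean (322.1 tokens) far above the median (112.5 tokens). This is one reason to report percentiles and the extreme completions alongside the mean when characterizing length distributions.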
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
