Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions

Minda Zhao; Yilun Du; Mengyu Wang

arXiv:2601.05414·cs.CL·April 27, 2026

TL;DR

This study shows that large language models struggle to sample faithfully from specified probability distributions, particularly when each sample comes from a separate, independent request, which limits their reliability in stochastic applications.

Contribution

The paper presents the first large-scale, statistically powered audit of native probabilistic sampling in frontier LLMs, benchmarking 11 models across 15 distributions.

Findings

01

Batch generation reaches only a 7% median pass rate

02

Independent requests pass nearly none of the distributions

03

Sampling fidelity worsens with distribution complexity and sample size

Abstract

As large language models (LLMs) transition from chat interfaces to integral components of stochastic pipelines and systems approaching general intelligence, the ability to faithfully sample from specified probability distributions has become a functional requirement rather than a theoretical curiosity. We present the first large-scale, statistically powered audit of native probabilistic sampling in frontier LLMs, benchmarking 11 models across 15 distributions. To disentangle failure modes, we employ a dual-protocol design: Batch Generation, where a model produces $N = 1000$ samples within one response, and Independent Requests, comprising $N = 1000$ stateless calls. We observe a sharp protocol asymmetry: batch generation achieves only modest statistical validity, with a 7% median pass rate, while independent requests collapse almost entirely, with 10 of 11 models passing none of the…
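To make the notion of a "pass" concrete, here is a minimal sketch of how a set of $N = 1000$ samples could be checked against a target distribution. The abstract does not say which statistical test the authors use, so the one-sample Kolmogorov-Smirnov test below is an assumption, as are the Uniform(0, 1) target, the 5% significance level, and every function name. The three sample lists are synthetic stand-ins, not model outputs or results from the paper.

```python
import math
import random


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n against a continuous CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the target CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def passes_ks(samples, cdf, crit_coeff=1.358):
    """Asymptotic KS test at alpha = 0.05: pass when D_n <= 1.358 / sqrt(n)."""
    n = len(samples)
    return ks_statistic(samples, cdf) <= crit_coeff / math.sqrt(n)


def uniform_cdf(x):
    """CDF of Uniform(0, 1), the assumed target distribution."""
    return min(1.0, max(0.0, x))


N = 1000
rng = random.Random(0)

# Hypothetical stand-ins for 1000 values returned by a model:
ideal = [(i + 0.5) / N for i in range(N)]        # evenly spaced quantiles
skewed = [rng.random() ** 2 for _ in range(N)]   # mass pushed toward 0
collapsed = [0.5] * N                            # the same value every time

for name, samples in [("ideal", ideal), ("skewed", skewed), ("collapsed", collapsed)]:
    d = ks_statistic(samples, uniform_cdf)
    print(f"{name:9s} D_n = {d:.4f}  pass = {passes_ks(samples, uniform_cdf)}")
```

Under this sketch, a per-model pass rate would be the fraction of target distributions whose samples pass the test, and the reported median would be taken across models. The same check applies to either protocol, since it only needs the final list of samples, whether they came from one batch response or from separate stateless calls.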
