Beyond Reproducibility: Token Probabilities Expose Large Language Model Nondeterminism

Tairan Fu; Gonzalo Martínez; Javier Conde; Carlos Arriaga; Pedro Reviriego; Xiuyuan Qi; Shanshan Liu

arXiv:2601.06118 · cs.AI · January 13, 2026

TL;DR

This paper investigates how nondeterminism in Large Language Models (LLMs) affects token probabilities rather than the generated text. It finds significant variations for probabilities in the intermediate range (0.1 to 0.9), with implications for model reliability and evaluation.

Contribution

The paper provides a detailed analysis of the token probability variations caused by nondeterminism, highlights their impact on model outputs, and proposes a way to estimate the effect of nondeterminism from token probabilities without multiple runs.

Findings

01 Nondeterminism significantly affects token probabilities in the range 0.1 to 0.9.

02 Different models show similar patterns of nondeterminism, in both the trends and the actual values of the variations.

03 Analyzing token probabilities can estimate the impact of nondeterminism without multiple runs.
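
One possible intuition for the first finding, offered here as an illustration and not as the paper's own analysis: if all other logits are held fixed, perturbing a token's logit by a small amount changes its probability by roughly delta * p * (1 - p), which peaks at p = 0.5 and vanishes near 0 and 1. The sketch below assumes a hypothetical perturbation size `DELTA` and uses illustrative helper names that do not come from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of sigmoid: maps a probability in (0, 1) to a logit."""
    return math.log(p / (1.0 - p))


def probability_shift(p: float, delta: float) -> float:
    """Change in a token's probability when its own logit moves by delta.

    With the other logits held fixed, the softmax probability of one token
    behaves like a sigmoid of its logit, so this expression is exact under
    that assumption.
    """
    return sigmoid(logit(p) + delta) - p


# Hypothetical perturbation size, chosen only for illustration.
DELTA = 1e-3

shifts = {p: probability_shift(p, DELTA) for p in (0.001, 0.1, 0.5, 0.9, 0.999)}

for p, shift in shifts.items():
    print(f"p = {p:<6} shift = {shift:.3e}")
```

Under these assumptions the shift at p = 0.5 is a couple of orders of magnitude larger than the shift at p = 0.001 or p = 0.999, which is at least consistent with the range reported in the first finding.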

Abstract

The execution of Large Language Models (LLMs) has been shown to produce nondeterministic results when run on Graphics Processing Units (GPUs), even when they are configured to produce deterministic results. This is due to the finite precision effects of the arithmetic operations, which depend on the order in which they are executed. This order, in turn, depends on the processes that are running concurrently on the GPU. Previous studies have focused on the impact of nondeterminism on the text generated by the LLMs or on proposing mechanisms to achieve deterministic execution. This work takes a closer look at nondeterminism by analyzing the variations on the token probabilities, not on the generated text. Interestingly, all the models evaluated have similar results in both the trends and the actual values of the variations of the probabilities. In particular, the results show that the…
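
The order dependence described in the abstract can be reproduced on a CPU with plain floating-point addition, which is not associative. The following is a minimal sketch, not the paper's experimental setup; the function name `sum_in_order` and the sample values are chosen here for illustration only.

```python
def sum_in_order(values, order):
    """Accumulate values left to right, visiting them in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]

# Same three numbers, two different accumulation orders.
forward = sum_in_order(values, [0, 1, 2])    # (0.1 + 0.2) + 0.3
reordered = sum_in_order(values, [1, 2, 0])  # (0.2 + 0.3) + 0.1

print(forward)               # 0.6000000000000001
print(reordered)             # 0.6
print(forward == reordered)  # False
```

The two results differ only in the last bit. On a GPU, the order of a parallel reduction can change from run to run, so discrepancies of this kind can enter the logits and, through them, the token probabilities.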

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software System Performance and Reliability