Beyond Reproducibility: Token Probabilities Expose Large Language Model Nondeterminism
Tairan Fu, Gonzalo Martínez, Javier Conde, Carlos Arriaga, Pedro Reviriego, Xiuyuan Qi, Shanshan Liu

TL;DR
This paper investigates how nondeterminism in Large Language Models affects token probability distributions, revealing significant variations in intermediate probabilities and implications for model reliability and evaluation.
Contribution
It provides a detailed analysis of token probability variations caused by nondeterminism, highlighting their impact on model outputs and proposing a new way to estimate nondeterminism effects.
Findings
Nondeterminism significantly affects token probabilities between 0.1 and 0.9.
Similar nondeterministic patterns are observed across different models.
Token probability analysis can estimate nondeterminism impact without multiple runs.
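One possible intuition for why mid-range probabilities move the most, offered here as an illustration and not as the paper's method: in a two-way softmax, a small change in a logit shifts the probability p by roughly p(1 - p) times that change, which peaks at p = 0.5 and vanishes near 0 and 1. The sketch below assumes a hypothetical logit perturbation `DELTA`; the value is not taken from the paper, and `probability_shift` is an illustrative helper.

```python
import math

def sigmoid(z: float) -> float:
    """Two-way softmax: probability of a token given its logit margin z."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of sigmoid: the logit margin that yields probability p."""
    return math.log(p / (1.0 - p))

def probability_shift(p: float, delta: float) -> float:
    """Change in probability when the logit margin is perturbed by delta."""
    return sigmoid(logit(p) + delta) - p

# Hypothetical perturbation size, chosen only for illustration.
DELTA = 1e-3

# Same logit perturbation applied at different starting probabilities.
shifts = {p: probability_shift(p, DELTA) for p in (0.001, 0.1, 0.5, 0.9, 0.999)}

for p, shift in shifts.items():
    print(f"p = {p:<6} shift = {shift:.3e}  p(1-p)*delta = {p * (1 - p) * DELTA:.3e}")
```

Under this simplified model, the same numerical noise in the logits leaves near-certain tokens almost untouched while visibly moving tokens whose probability is far from 0 and 1.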
Abstract
The execution of Large Language Models (LLMs) has been shown to produce nondeterministic results when run on Graphics Processing Units (GPUs), even when they are configured to produce deterministic results. This is due to the finite-precision effects of the arithmetic operations, whose results depend on the order in which they are executed. This order, in turn, depends on the processes that are running concurrently on the GPU. Previous studies have focused on the impact of nondeterminism on the text generated by the LLMs or on proposing mechanisms to achieve deterministic execution. This work takes a closer look at nondeterminism by analyzing the variations in the token probabilities rather than in the generated text. Interestingly, all the models evaluated show similar results in both the trends and the actual values of the probability variations. In particular, the results show that the…
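The order dependence described in the abstract comes from the fact that floating-point addition is not associative. The sketch below shows this on the CPU with plain Python floats. It does not reproduce GPU scheduling, and `sum_in_order` is an illustrative helper, not something from the paper.

```python
def sum_in_order(values, order):
    """Accumulate values left to right, following the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Regrouping the same three numbers changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

# With large magnitude differences the effect is much bigger:
# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost in the first order.
values = [1e16, 1.0, -1e16]
sum_a = sum_in_order(values, [0, 1, 2])  # (1e16 + 1.0) - 1e16
sum_b = sum_in_order(values, [0, 2, 1])  # (1e16 - 1e16) + 1.0

print(left == right, left - right)
print(sum_a, sum_b)
```

A parallel reduction that accumulates partial sums in a different order from run to run can therefore return slightly different values for the same inputs.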
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software System Performance and Reliability
