Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models
Alberto Messina, Stefano Scotta

TL;DR
This paper introduces the concept of background temperature to quantify implementation-induced randomness in large language models, a source of variability that affects reproducibility and evaluation.
Contribution
It formalizes background temperature, relates it to inference environment perturbations, and proposes an empirical method to estimate it in practice.
Findings
Background temperature can be empirically estimated from LLM outputs.
Implementation-level nondeterminism impacts model reproducibility.
Pilot experiments demonstrate the concept across major LLM providers.
Abstract
Even when decoding with temperature T = 0, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of background temperature, the effective temperature induced by an implementation-dependent perturbation process that is observed even when the nominal temperature is T = 0. We provide clean definitions, show how background temperature relates to a stochastic perturbation governed by the inference environment, and propose an empirical protocol to estimate it via the equivalent temperature of an ideal reference system. We conclude with a set of pilot experiments run on a representative pool from…
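The "equivalent temperature" idea can be pictured with a minimal sketch. This is not the authors' protocol, which the abstract does not spell out. It assumes a single next-token distribution from an ideal, noise-free reference system, and it uses the probability that two independent runs disagree as the matching statistic. The function names, the choice of statistic, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature):
    """Softmax over logits at a strictly positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def disagreement_rate(logits, temperature):
    """Probability that two independent samples pick different tokens."""
    probs = softmax(logits, temperature)
    return 1.0 - sum(p * p for p in probs)


def equivalent_temperature(logits, observed_rate, lo=1e-3, hi=10.0, iters=60):
    """Bisect for the reference temperature whose disagreement rate
    matches the rate observed on the real system at nominal T = 0.

    Assumption: the disagreement rate is non-decreasing in temperature,
    which holds for a softmax over fixed logits.
    """
    if observed_rate <= disagreement_rate(logits, lo):
        return lo
    if observed_rate >= disagreement_rate(logits, hi):
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if disagreement_rate(logits, mid) < observed_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy usage: pretend the deployed system disagreed with itself as often
# as an ideal sampler at temperature 0.7 would on these hypothetical logits.
toy_logits = [2.0, 1.0, 0.5]
observed = disagreement_rate(toy_logits, 0.7)
estimate = equivalent_temperature(toy_logits, observed)
```

In a real measurement the observed rate would come from repeated calls to a deployed model with identical inputs, and the comparison would span whole generations rather than one token, so this sketch only conveys the inversion step.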
