LLM-42: Enabling Determinism in LLM Inference with Verified Speculation
Raja Gond, Aditya K Kamath, Ramachandran Ramjee, and Ashish Panwar

TL;DR
LLM-42 is a scheduling-based method that makes LLM inference deterministic by verifying emitted tokens, adding minimal overhead while preserving throughput and flexibility.
Contribution
This work presents LLM-42, an approach that enforces determinism in LLM inference through verification and rollback rather than kernel redesign, so that overhead scales with the share of traffic that requires determinism.
Findings
Achieves deterministic outputs with minimal performance impact.
Mostly reuses existing GPU kernels without modification.
Overhead is proportional to the amount of traffic requiring determinism.
Abstract
In LLM inference, the same prompt may yield different outputs across different runs. At the system level, this non-determinism arises from floating-point non-associativity combined with dynamic batching and GPU kernels whose reduction orders vary with batch size. A straightforward way to eliminate non-determinism is to disable dynamic batching during inference, but doing so severely degrades throughput. Another approach is to make kernels batch-invariant; however, this tightly couples determinism to kernel design, requiring new implementations. This coupling also imposes fixed runtime overheads, regardless of how much of the workload actually requires determinism. Inspired by ideas from speculative decoding, we present LLM-42, a scheduling-based approach to enable determinism in LLM inference. Our key observation is that if a sequence is in a consistent state, the next emitted token…
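The root cause named in the abstract, floating-point non-associativity combined with a reduction order that changes with batch size, can be illustrated with a small toy in plain Python. This is only a sketch under stated assumptions: `reduce_in_chunks` and its `chunk_size` parameter are hypothetical stand-ins for a kernel whose reduction tiling varies with batch size, and they are not taken from LLM-42 or from any GPU library.

```python
def reduce_in_chunks(values, chunk_size):
    """Add up `values` by reducing each chunk first, then adding the partial sums.

    `chunk_size` is a hypothetical stand-in for a reduction order that
    depends on batch size; different chunk sizes group the additions
    differently.
    """
    total = 0.0
    for start in range(0, len(values), chunk_size):
        partial = 0.0
        for v in values[start:start + chunk_size]:
            partial += v  # explicit loop: plain left-to-right float addition
        total += partial
    return total


# Floating-point addition is not associative.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # False

# Same inputs, different grouping of the reduction, different result.
values = [1e16, 1.0, -1e16, 1.0]
one_chunk = reduce_in_chunks(values, chunk_size=4)
two_chunks = reduce_in_chunks(values, chunk_size=2)
print(one_chunk, two_chunks)  # 1.0 0.0
```

The explicit loops are deliberate: Python's built-in `sum` may apply compensated summation to floats in recent versions, which would hide the effect. In a real model the discrepancies are far smaller than in this contrived input, but a tiny difference in a logit can still change which token is selected, after which the outputs diverge.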
Taxonomy
Topics: Numerical Methods and Algorithms · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
