Deterministic Inference across Tensor Parallel Sizes That Eliminates Training-Inference Mismatch
Ziyang Zhang, Xinheng Ding, Jiayi Yuan, Rixin Liu, Huizi Mao, Jiarong Xing, Zirui Liu

TL;DR
This paper introduces Tree-Based Invariant Kernels (TBIK), enabling deterministic inference across different tensor parallel sizes in large language models, ensuring bit-wise identical results and eliminating training-inference mismatch.
Contribution
The paper presents TBIK, a novel set of TP-invariant matrix multiplication and reduction primitives that guarantee reproducibility across varying tensor parallel configurations.
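To make the idea of a TP-invariant reduction concrete, here is a toy sketch in plain Python. It is not the authors' kernel code. It assumes the reduction dimension is cut into a fixed number of blocks (`num_leaves`, a hypothetical parameter) and that every device reduces its blocks along the same binary tree that is then continued across devices. The helper names `tree_reduce`, `leaf_partials`, and `tp_dot` are illustrative only.

```python
import random

def tree_reduce(values):
    """Sum a power-of-two-length list pairwise, in a fixed binary-tree order."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    while len(values) > 1:
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]

def sequential_sum(values):
    """Left-to-right accumulation, used only inside one fixed block."""
    total = 0.0
    for v in values:
        total += v
    return total

def leaf_partials(a, b, num_leaves):
    """Partial dot products over fixed K-blocks, independent of the TP size."""
    block = len(a) // num_leaves
    return [
        sequential_sum(a[i] * b[i] for i in range(j * block, (j + 1) * block))
        for j in range(num_leaves)
    ]

def tp_dot(a, b, tp_size, num_leaves=8):
    """Simulated TP dot product: a local tree per device, then a tree across devices."""
    leaves = leaf_partials(a, b, num_leaves)
    per_device = num_leaves // tp_size
    device_sums = [
        tree_reduce(leaves[d * per_device:(d + 1) * per_device])
        for d in range(tp_size)
    ]
    return tree_reduce(device_sums)

random.seed(0)
K = 1024
a = [random.uniform(-1.0, 1.0) for _ in range(K)]
b = [random.uniform(-1.0, 1.0) for _ in range(K)]

# Every TP size performs the same additions in the same order.
results = {tp: tp_dot(a, b, tp) for tp in (1, 2, 4, 8)}
print(len(set(results.values())) == 1)  # True by construction
```

Because each device owns an aligned, power-of-two group of blocks, its local tree is a subtree of the global one, so changing `tp_size` only changes where each addition runs, not which additions happen or in what order.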
Findings
Achieves zero divergence and bit-wise reproducibility across TP sizes.
Ensures identical results between vLLM and FSDP in RL training pipelines.
Code implementation demonstrates practical integration and effectiveness.
Abstract
Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However, existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs. While prior work has addressed batch-size-related nondeterminism through batch-invariant kernels, determinism across different TP sizes remains an open problem, particularly in RL settings, where the training engine typically uses Fully Sharded Data Parallel (i.e., TP = 1) while the rollout engine relies on multi-GPU TP to maximize the inference throughput, creating a…
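The root cause named in the abstract, non-associative floating-point addition combined with a reduction order that depends on sharding, can be seen in a few lines of Python. This is a minimal illustration using float64 scalars rather than GPU tensors, and the two-way split below is an assumed stand-in for a TP = 2 layout.

```python
def sequential_sum(values):
    """Left-to-right accumulation, as a single device might do it."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1, 0.2, 0.3]

# One device: (0.1 + 0.2) + 0.3
one_device = sequential_sum(values)

# Two devices: each sums its shard, then the partial sums are combined,
# which evaluates 0.1 + (0.2 + 0.3).
shards = [values[:1], values[1:]]
two_devices = sequential_sum([sequential_sum(s) for s in shards])

print(one_device)                 # 0.6000000000000001
print(two_devices)                # 0.6
print(one_device == two_devices)  # False
```

The inputs are identical in both cases; only the grouping of the additions changes, and that alone is enough to change the last bit of the result.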
Taxonomy
Topics: Topic Modeling · Tensor decomposition and applications · Machine Learning in Materials Science
