Dataset-Level Metrics Attenuate Non-Determinism: A Fine-Grained Non-Determinism Evaluation in Diffusion Language Models

Zhengyu Fang; Zhimeng Jiang; Huiyuan Chen; Xiaoge Zhang; Tianyi Li; Kaiyu Tang; Xiao Li; Jing Li

arXiv:2604.13413 · cs.LG · April 16, 2026

TL;DR

This paper reveals that dataset-level metrics hide the true extent of non-determinism in diffusion language models, and proposes a fine-grained, factor-aware evaluation method to better understand model variability.

Contribution

The paper introduces a fine-grained evaluation framework and Factor Variance Attribution (FVA) to analyze non-determinism in diffusion language models and attribute it to its sources.
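This page does not define FVA. As a rough illustration of what attributing variance to factors can look like, here is a generic main-effect (first-order, ANOVA-style) variance decomposition over a balanced full-factorial grid of runs. It is a swapped-in standard technique, not the paper's FVA; the function name `main_effect_shares`, the factor names, the levels, and the scores are all hypothetical.

```python
from itertools import product
from statistics import mean, pvariance


def main_effect_shares(scores, factor_levels):
    """Share of total score variance explained by each factor's main effect.

    Hypothetical helper (not the paper's FVA). `scores` maps a tuple of factor
    levels, ordered like `factor_levels`, to a score. Assumes a balanced
    full-factorial grid, so every level of a factor appears equally often.
    """
    total_var = pvariance(list(scores.values()))
    shares = {}
    for k, name in enumerate(factor_levels):
        # Mean score at each level of this factor, averaged over all other factors.
        level_means = [
            mean(v for key, v in scores.items() if key[k] == level)
            for level in factor_levels[name]
        ]
        # Variance of those marginal means relative to the total variance.
        shares[name] = pvariance(level_means) / total_var if total_var else 0.0
    return shares


# Hypothetical grid: guidance scale and step count change the score; the seed does not.
levels = {"guidance": [1.0, 2.0], "steps": [64, 128], "seed": [0, 1]}
base = {1.0: 50, 2.0: 70}
step_bonus = {64: 0, 128: 10}
scores = {
    (g, s, seed): base[g] + step_bonus[s]
    for g, s, seed in product(*levels.values())
}

print(main_effect_shares(scores, levels))
# {'guidance': 0.8, 'steps': 0.2, 'seed': 0.0}
```

Because the toy scores are purely additive, the main-effect shares sum to 1; with real runs, the remainder would reflect interactions between factors and residual run-to-run noise.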

Findings

1. Dataset-level metrics attenuate non-determinism, masking sample-level variability.

2. Non-determinism is pervasive and varies with factors such as guidance scale and the number of diffusion steps.

3. Code generation tasks are more sensitive to non-determinism than question answering.

Abstract

Diffusion language models (DLMs) have emerged as a promising paradigm for large language models (LLMs), yet the non-deterministic behavior of DLMs remains poorly understood. Existing non-determinism evaluations for LLMs predominantly rely on dataset-level metrics under fixed inference configurations, providing limited insight into how model behavior varies across runs and evaluation conditions. In this work, we show that dataset-level metrics systematically attenuate non-determinism in diffusion language models by aggregating sample-level prediction quality across different runs. As a result, configurations with similar aggregate performance can exhibit substantially different behaviors on individual inputs, leaving fine-grained instability and distinct error patterns uncharacterized. To address this limitation, we conduct a fine-grained evaluation of non-determinism based on…
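A toy illustration of the attenuation effect the abstract describes: two runs of the same configuration can have identical accuracy while disagreeing on half of the individual inputs. The per-sample correctness vectors are invented, and `flip_rate` is a hypothetical sample-level measure (the fraction of inputs whose correctness is not identical across runs), not necessarily one of the paper's metrics.

```python
def accuracy(run):
    """Dataset-level metric: fraction of samples answered correctly in one run."""
    return sum(run) / len(run)


def flip_rate(runs):
    """Hypothetical sample-level metric: fraction of samples whose
    correctness differs in at least one run."""
    n_samples = len(runs[0])
    unstable = sum(1 for i in range(n_samples) if len({run[i] for run in runs}) > 1)
    return unstable / n_samples


# Invented per-sample correctness (1 = correct) for two runs of one configuration.
run_a = [1, 1, 0, 0, 1, 0, 1, 0]
run_b = [0, 0, 1, 1, 1, 0, 1, 0]

print(accuracy(run_a), accuracy(run_b))  # 0.5 0.5
print(flip_rate([run_a, run_b]))         # 0.5
```

Accuracy alone reports the two runs as equivalent, while the sample-level view shows that four of the eight inputs changed outcome between runs.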
