Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding

Daisuke Oba; Danushka Bollegala; Masahiro Kaneko; Naoaki Okazaki

arXiv:2602.06412 · cs.CL · May 13, 2026

TL;DR

SureLock detects when tokens in masked diffusion language models have stabilized and stops recomputing them, cutting FLOPs by 30-50% on LLaDA-8B while keeping generation quality comparable to standard decoding.

Contribution

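The paper introduces SureLock, which locks unmasked token positions once their posterior has stabilized across decoding steps. A locked position skips its query projection and feed-forward sublayers, while its cached attention keys and values stay available so other positions can still attend to it.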

Findings

01. SureLock reduces FLOPs by 30-50% on LLaDA-8B.

02. It maintains generation quality comparable to standard methods.

03. A theoretical analysis justifies monitoring the local KL divergence to make locking decisions.
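
To make the idea of local KL monitoring concrete, here is a minimal sketch of a per-position stability check. It is not the authors' implementation: the KL direction, the threshold value, and the helper names `local_kl` and `update_locks` are all assumptions made for illustration.

```python
import numpy as np

def local_kl(p_prev, p_curr, eps=1e-12):
    """Per-position KL(p_curr || p_prev) over the vocabulary axis.

    The direction of the divergence is an assumption of this sketch.
    """
    p_prev = np.clip(p_prev, eps, 1.0)
    p_curr = np.clip(p_curr, eps, 1.0)
    return np.sum(p_curr * (np.log(p_curr) - np.log(p_prev)), axis=-1)

def update_locks(p_prev, p_curr, unmasked, locked, threshold=1e-3):
    """Lock positions that are already unmasked and whose posterior barely moved.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    kl = local_kl(p_prev, p_curr)
    newly_sure = unmasked & (kl < threshold)
    return locked | newly_sure

# Toy example: 3 positions, vocabulary of 4 tokens.
p_prev = np.array([[0.97, 0.01, 0.01, 0.01],   # position 0: stable
                   [0.40, 0.30, 0.20, 0.10],   # position 1: still changing
                   [0.25, 0.25, 0.25, 0.25]])  # position 2: still masked
p_curr = np.array([[0.97, 0.01, 0.01, 0.01],
                   [0.10, 0.20, 0.30, 0.40],
                   [0.25, 0.25, 0.25, 0.25]])
unmasked = np.array([True, True, False])
locked = np.zeros(3, dtype=bool)

locked = update_locks(p_prev, p_curr, unmasked, locked)
print(locked)  # [ True False False]
```

Only position 0 is locked here: position 1 is unmasked but its posterior still shifts between steps, and position 2 is unchanged but still masked, so it is not eligible.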

Abstract

Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and feed-forward blocks for every token position at every step -- even when many unmasked tokens are essentially fixed, resulting in substantial waste in compute. We propose SureLock: when the posterior at an unmasked position has stabilized across steps (our sure condition), we lock that position -- thereafter skipping its query projection and feed-forward sublayers -- while caching its attention keys and values so other positions can continue to attend to it. This reduces the dominant per-iteration computational cost from $O(N^{2} d)$ to $O(M N d)$, where $N$ is the sequence length, $M$ is the number of unlocked token positions, and $d$ is the model dimension. In practice, $M$ decreases as the iteration progresses, yielding substantial…
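
The cost reduction described in the abstract can be illustrated with a toy single-head, single-layer attention step. This is a sketch under stated assumptions rather than the paper's code: the function `locked_attention_step`, the cache layout, and the random weights are hypothetical, and the feed-forward sublayer is omitted. Queries are computed only for the $M$ unlocked positions, while keys and values for locked positions are read from a cache, so the score matrix has shape $M \times N$ instead of $N \times N$.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def locked_attention_step(x, locked, k_cache, v_cache, out_cache, Wq, Wk, Wv):
    """One attention step that recomputes only unlocked (active) positions.

    Locked rows of k_cache, v_cache, and out_cache are left untouched.
    Returns the shape of the score matrix, which is (M, N).
    """
    active = np.flatnonzero(~locked)          # indices of the M unlocked positions
    xa = x[active]
    q = xa @ Wq                               # queries for active positions only
    k_cache[active] = xa @ Wk                 # refresh keys/values of active positions
    v_cache[active] = xa @ Wv
    scores = q @ k_cache.T / np.sqrt(x.shape[1])   # (M, N): active rows attend to all N
    out_cache[active] = softmax(scores) @ v_cache
    return scores.shape

rng = np.random.default_rng(0)
N, d = 6, 4
x = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# Full (unlocked) pass to populate the caches.
k_cache = x @ Wk
v_cache = x @ Wv
out_cache = softmax((x @ Wq) @ k_cache.T / np.sqrt(d)) @ v_cache
full_out = out_cache.copy()

# Lock three of the six positions and run one locked step.
locked = np.array([True, True, False, False, True, False])
shape = locked_attention_step(x, locked, k_cache, v_cache, out_cache, Wq, Wk, Wv)
print(shape)  # (3, 6)
```

Because the inputs are unchanged in this toy run, the locked step reproduces the full-attention output while forming only a 3 x 6 score matrix instead of 6 x 6. In actual decoding the unlocked inputs change between steps, so locked outputs are approximations whose error the sure condition is meant to keep small.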

Code & Models

Repositories

https://daioba.github.io/surelock

Videos

Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding · slideslive