LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection
Haohui Zhang, Zhiye Wang, Xiaoying Gan, Xinbing Wang, Bo Jiang

TL;DR
LEAP detects early-converging tokens in diffusion language models by leveraging future context and multi-sequence superposition, enabling faster parallel decoding without sacrificing accuracy.
Contribution
LEAP introduces a training-free, plug-and-play technique for detecting early-converging tokens, significantly reducing the number of inference steps in dLLMs.
Findings
Reduces the average number of denoising steps by about 30%.
Accelerates decoding to 7.2 tokens per step on GSM8K.
Enables reliable early decoding of tokens without requiring high confidence thresholds.
Abstract
Diffusion Language Models (dLLMs) have garnered significant attention for their potential in highly parallel processing. The parallel capabilities of existing dLLMs stem from the assumption of conditional independence at high confidence levels, which ensures negligible discrepancy between the marginal and joint distributions. However, the stringent confidence thresholds required to preserve accuracy severely constrain the scalability of parallelism. Through systematic token-level statistical analysis, we reveal that a substantial proportion of tokens converge to their correct predictions early in the denoising process yet fail to reach standard confidence thresholds, confirming that current confidence-based criteria are overly conservative. In response, we introduce LEAP (Lookahead Early-Convergence Token Detection for Accelerated Parallel Decoding). LEAP is a training-free,…
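The abstract contrasts two quantities for each token: the step at which its prediction settles on the correct answer, and the step at which its confidence first clears the decoding threshold. The sketch below illustrates both under stated assumptions. It models the baseline confidence-threshold criterion the abstract calls overly conservative, not LEAP's lookahead detector, whose details are not given here. All function names are hypothetical. The fallback of unmasking the single most confident token when nothing clears the threshold is an assumed convention to guarantee progress. `final_token` is an assumed stand-in for the "correct prediction" (for example, the token produced by full-step decoding).

```python
from typing import List, Optional, Sequence


def select_tokens_to_unmask(
    confidences: Sequence[float], masked: Sequence[bool], threshold: float = 0.9
) -> List[int]:
    """Baseline parallel-decoding rule (hypothetical sketch): unmask every
    still-masked position whose top-1 confidence meets the threshold."""
    candidates = [i for i, (c, m) in enumerate(zip(confidences, masked)) if m and c >= threshold]
    if not candidates:
        # Assumed fallback: commit the single most confident masked token
        # so that each denoising step makes progress.
        masked_positions = [i for i, m in enumerate(masked) if m]
        if not masked_positions:
            return []
        candidates = [max(masked_positions, key=lambda i: confidences[i])]
    return candidates


def convergence_step(predictions: Sequence[int], final_token: int) -> Optional[int]:
    """Earliest step from which the per-step argmax equals final_token at
    every later step; None if the last prediction is not final_token."""
    step = None
    for t in range(len(predictions) - 1, -1, -1):
        if predictions[t] != final_token:
            break
        step = t
    return step


def threshold_step(confidences: Sequence[float], threshold: float) -> Optional[int]:
    """First step at which the token's confidence reaches the threshold."""
    for t, c in enumerate(confidences):
        if c >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Illustrative (made-up) trajectory of one token across five denoising steps.
    preds = [5, 7, 7, 7, 7]
    confs = [0.20, 0.45, 0.60, 0.80, 0.95]
    early = convergence_step(preds, final_token=7)
    late = threshold_step(confs, threshold=0.9)
    print(f"converged at step {early}, crossed threshold at step {late}")
```

In this toy trajectory the prediction is already correct at step 1, but the threshold rule waits until step 4. A gap of this kind, measured over many tokens, is the sort of evidence the abstract cites for calling confidence-based criteria overly conservative.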
