Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, Guokan Shang

TL;DR
This paper introduces SchED, a training-free early-exit algorithm for diffusion language models that substantially accelerates decoding with little loss in quality and outperforms previous confidence-based early-exit methods, especially on long-form generation.
Contribution
SchED is a novel, model-agnostic early-exit method that uses smooth, progress-dependent confidence thresholds to speed up diffusion LLM decoding without retraining, delivering large speedups across two model families (Dream and LLaDA) and ten benchmarks.
Findings
Achieves 3.8-4.0x speedup on instruction-tuned models with minimal performance loss.
Maintains 99.1-100% of baseline performance on base models with up to 2.34x speedup.
Outperforms prior confidence-based early-exit methods, especially on long-form generation.
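The two headline metrics can be read as simple ratios. The sketch below assumes speedup is the baseline decoding cost (denoising steps or wall-clock time) divided by SchED's cost, and retention is SchED's score as a percentage of the baseline score. The exact cost measure used in the paper is not stated here, and the numbers in the usage line are hypothetical.

```python
def speedup(baseline_cost: float, sched_cost: float) -> float:
    """Ratio of full-schedule decoding cost to early-exit cost (assumed definition)."""
    if sched_cost <= 0:
        raise ValueError("sched_cost must be positive")
    return baseline_cost / sched_cost


def retention(sched_score: float, baseline_score: float) -> float:
    """SchED score as a percentage of the baseline score (assumed definition)."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be nonzero")
    return 100.0 * sched_score / baseline_score


# Hypothetical run: 256 full denoising steps vs. 64 with early exit.
print(speedup(256, 64), retention(49.6, 50.0))
```

Under these definitions, a 4.0x speedup means the early-exit run used a quarter of the baseline cost, and 100% retention means no score was lost.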
Abstract
Diffusion large language models (dLLMs) offer a promising alternative to autoregressive models, but their practical utility is severely hampered by slow, iterative sampling. We present SchED, a training-free, model-agnostic early-exit algorithm that aggregates full-span logit margins and halts decoding once a smooth, progress-dependent confidence threshold is met. We evaluate SchED on two dLLM families (Dream and LLaDA), in both base and instruction-tuned variants, across ten benchmarks spanning multiple-choice question answering (MCQ), math, long-form QA/summarization, and translation. SchED delivers large, stable accelerations: on instruction-tuned models, it achieves 3.8-4.0x speedups while retaining - of the baseline score on average. On base models, SchED yields consistent speedup gains with 99.1-100% performance retention, with up…
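The exit rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-position margin is assumed to be the top-1 minus top-2 logit, averaged over the whole generation span, and the hypothetical `threshold` function decays from `tau_high` to `tau_low` with a linear, cosine, or exponential shape chosen here for illustration. The values `tau_high=8.0` and `tau_low=2.0` are arbitrary. The toy `simulate` loop uses synthetic logits whose correct token gains confidence at a fixed rate per step; a real dLLM sampler would also unmask tokens between checks.

```python
import math

import numpy as np


def span_margin(logits: np.ndarray) -> float:
    """Mean (top-1 minus top-2) logit margin over every position in the span.

    logits: array of shape (span_len, vocab_size).
    """
    top2 = np.sort(logits, axis=-1)[:, -2:]  # two largest logits, ascending
    margins = top2[:, 1] - top2[:, 0]        # per-position margin, always >= 0
    return float(margins.mean())


def threshold(progress: float, tau_high: float, tau_low: float,
              kind: str = "cosine", k: float = 5.0) -> float:
    """Hypothetical smooth schedule: tau_high at progress 0, tau_low at progress 1."""
    p = min(max(progress, 0.0), 1.0)
    if kind == "linear":
        w = 1.0 - p
    elif kind == "cosine":
        w = 0.5 * (1.0 + math.cos(math.pi * p))
    elif kind == "exp":
        # Exponential decay rescaled so that w(0) = 1 and w(1) = 0.
        w = (math.exp(-k * p) - math.exp(-k)) / (1.0 - math.exp(-k))
    else:
        raise ValueError(f"unknown schedule: {kind}")
    return tau_low + (tau_high - tau_low) * w


def should_exit(logits: np.ndarray, step: int, total_steps: int,
                tau_high: float = 8.0, tau_low: float = 2.0,
                kind: str = "cosine") -> bool:
    """Exit once the aggregated span margin clears the progress-dependent threshold."""
    progress = step / total_steps
    return span_margin(logits) >= threshold(progress, tau_high, tau_low, kind)


def simulate(total_steps: int = 64, span_len: int = 32, vocab: int = 100,
             seed: int = 0) -> int:
    """Toy loop with synthetic logits; returns the step at which decoding stops."""
    rng = np.random.default_rng(seed)
    base = rng.normal(size=(span_len, vocab))
    target = rng.integers(0, vocab, size=span_len)
    rows = np.arange(span_len)
    for step in range(1, total_steps + 1):
        logits = base.copy()
        logits[rows, target] += 0.25 * step  # assumed: confidence grows linearly
        if should_exit(logits, step, total_steps):
            return step
    return total_steps


if __name__ == "__main__":
    print("exited at step", simulate(), "of 64")
```

A threshold that starts high and relaxes with progress makes early exits require overwhelming confidence, while late in the schedule a modest margin suffices; averaging over the full span keeps a few confident tokens from triggering a premature stop.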
