TL;DR
This paper introduces WINO and WINO+, two methods that let diffusion LLMs improve both decoding efficiency and output quality by discovering and then following reliable denoising orders, in effect making the model its own teacher.
Contribution
The paper proposes revokable parallel decoding at inference time, plus a training-time extension that distills the denoising order this decoding exposes. Together they align training with inference and improve the efficiency and quality of diffusion LLMs.
Findings
WINO improves GSM8K accuracy from 73.24% to 75.82% while reducing decoding steps by 6.10x.
WINO+ reaches 76.58% accuracy on GSM8K with a 6.83x step reduction.
On Flickr30K, WINO+ reaches a 16.22x step reduction with better CIDEr scores.
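To put the reported figures in perspective, the short calculation below converts them into an accuracy gain in percentage points and into the share of decoding steps that remains. It assumes that an "N x step reduction" means baseline steps divided by the method's steps; the summary does not state this definition, and the helper name `remaining_step_share` is illustrative only.

```python
# Figures copied from the Findings above.
BASELINE_GSM8K_ACC = 73.24   # accuracy (%) before WINO
WINO_GSM8K_ACC = 75.82       # accuracy (%) with WINO

STEP_REDUCTION = {
    "WINO / GSM8K": 6.10,
    "WINO+ / GSM8K": 6.83,
    "WINO+ / Flickr30K": 16.22,
}


def remaining_step_share(reduction_factor: float) -> float:
    """Percentage of baseline decoding steps still needed.

    Assumption: reduction_factor = baseline_steps / method_steps.
    """
    return 100.0 / reduction_factor


# Accuracy gain of WINO over the baseline, in percentage points.
wino_gain_points = round(WINO_GSM8K_ACC - BASELINE_GSM8K_ACC, 2)

if __name__ == "__main__":
    print(f"WINO accuracy gain on GSM8K: {wino_gain_points} points")
    for setting, factor in STEP_REDUCTION.items():
        share = remaining_step_share(factor)
        print(f"{setting}: about {share:.1f}% of baseline steps")
```

Under that assumption, a 6.10x reduction corresponds to roughly one sixth of the baseline decoding steps.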
Abstract
Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We attribute this dilemma to a train-inference mismatch amplified by irreversible decoding. While training reconstructs tokens from randomly corrupted states, efficient inference requires an adaptive denoising order, where easier tokens are revealed earlier and context-dependent ones are deferred. This view motivates two complementary methods: an inference-time method that makes parallel decoding revokable, and a training-time extension that distills the reliable order exposed by this revokable process. Accordingly, we first propose Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable parallel generation. WINO…
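The idea of revokable parallel decoding can be illustrated with a toy sketch. This is not the WINO algorithm, whose details are cut off in the abstract above; it is a generic draft-then-verify loop under invented assumptions. The "model" (`toy_model`), the target sequence, the set of context-dependent positions, and both thresholds are hypothetical stand-ins chosen so that the behavior is easy to trace.

```python
from typing import List, Optional, Tuple

TARGET = ["a", "b", "c", "d", "e"]  # hypothetical ground-truth sequence
HARD = {1, 4}  # hypothetical positions that need a correct left neighbor


def toy_model(seq: List[Optional[str]]) -> List[Tuple[str, float]]:
    """Stand-in for a DLLM: (best_token, confidence) for every position."""
    preds = []
    for i, tok in enumerate(TARGET):
        supported = i not in HARD or seq[i - 1] == TARGET[i - 1]
        # Without supporting context, the toy model guesses "?" with low confidence.
        preds.append((tok, 0.95) if supported else ("?", 0.60))
    return preds


def decode(
    revokable: bool,
    tau_draft: float = 0.5,    # loose threshold for revealing tokens (assumed)
    tau_verify: float = 0.9,   # strict threshold for keeping tokens (assumed)
    max_steps: int = 20,       # guard against endless reveal/re-mask cycles
) -> Tuple[List[Optional[str]], int]:
    seq: List[Optional[str]] = [None] * len(TARGET)  # None means masked
    steps = 0
    while None in seq and steps < max_steps:
        steps += 1
        # Draft: reveal every masked position that clears the loose threshold.
        for i, (tok, conf) in enumerate(toy_model(seq)):
            if seq[i] is None and conf >= tau_draft:
                seq[i] = tok
        if revokable:
            # Verify: re-score revealed tokens in the updated context and
            # re-mask those the model no longer supports.
            for i, (tok, conf) in enumerate(toy_model(seq)):
                if seq[i] is not None and (seq[i] != tok or conf < tau_verify):
                    seq[i] = None
    return seq, steps


if __name__ == "__main__":
    print(decode(revokable=True))
    print(decode(revokable=False))
```

In this toy setting, aggressive drafting without revocation commits the two context-dependent positions to wrong tokens, while the revokable variant re-masks them and fills them correctly on the next pass, where one-token-per-step decoding would take five steps. The easy positions are revealed first and the context-dependent ones are deferred, which mirrors the adaptive denoising order the abstract describes.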
