Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

Fanqin Zeng; Feng Hong; Geng Yu; Huangjie Zheng; Xiaofeng Cao; Ya Zhang; Bo Han; Yanfeng Wang; Jiangchao Yao

arXiv:2605.16941·cs.CL·May 19, 2026

TL;DR

This paper introduces WINO and WINO+, two methods that let diffusion LLMs decode in fewer steps with better quality by discovering and following a reliable denoising order, so that the model in effect serves as its own teacher.

Contribution

The paper proposes a revokable parallel decoding method and a complementary training-time extension that narrow the train-inference mismatch, improving both the efficiency and the quality of diffusion LLMs.

Findings

01. WINO improves GSM8K accuracy from 73.24% to 75.82% with a 6.10x step reduction.

02. WINO+ reaches 76.58% accuracy on GSM8K with a 6.83x step reduction.

03. On Flickr30K, WINO+ reaches a 16.22x step reduction with better CIDEr scores.

Abstract

Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We attribute this dilemma to a train-inference mismatch amplified by irreversible decoding. While training reconstructs tokens from randomly corrupted states, efficient inference requires an adaptive denoising order, where easier tokens are revealed earlier and context-dependent ones are deferred. This view motivates two complementary methods: an inference-time method that makes parallel decoding revokable, and a training-time extension that distills the reliable order exposed by this revokable process. Accordingly, we first propose Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable parallel generation. WINO…
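The abstract is cut off before it describes how WINO works, so the following is only a toy sketch of the general idea of revokable parallel decoding, not the authors' algorithm. Everything in it is an assumption made for illustration: the stand-in model `toy_predict`, the `HARD` positions, the `decode` function, and the two thresholds `accept_thr` and `revoke_thr`. It contrasts irreversible decoding, where a token revealed too early is kept forever, with a variant that re-scores revealed tokens on each pass and re-masks the ones the model no longer supports.

```python
from typing import Callable, List, Optional, Tuple

MASK = None                    # placeholder for a still-masked position
Pred = Tuple[str, float]       # (best token, confidence)

TARGET = list("abcdefgh")      # toy ground truth, assumed for illustration
HARD = {3, 6}                  # toy "context-dependent" positions (assumed)


def toy_predict(seq: List[Optional[str]]) -> List[Pred]:
    """Hypothetical stand-in for one forward pass of a diffusion LLM.

    Easy positions are always predicted correctly with high confidence.
    Hard positions are only correct once their left neighbour is correct;
    otherwise the model emits a wrong guess with middling confidence.
    """
    preds = []
    for i in range(len(seq)):
        left_ok = i == 0 or seq[i - 1] == TARGET[i - 1]
        if i in HARD and not left_ok:
            preds.append(("?", 0.60))
        else:
            preds.append((TARGET[i], 0.95))
    return preds


def decode(
    predict: Callable[[List[Optional[str]]], List[Pred]],
    length: int,
    accept_thr: float,
    revoke_thr: Optional[float] = None,
    max_steps: int = 50,
) -> Tuple[List[Optional[str]], int]:
    """Parallel decoding; revokable when revoke_thr is given.

    Each step runs one forward pass. Masked positions are revealed when
    confidence >= accept_thr. If revoke_thr is set, revealed tokens whose
    support falls below it are re-masked so they can be decoded again.
    Returns the final sequence and the number of forward passes used.
    """
    seq: List[Optional[str]] = [MASK] * length
    for step in range(1, max_steps + 1):
        preds = predict(seq)
        new_seq = list(seq)
        changed = False
        for i, (tok, conf) in enumerate(preds):
            if seq[i] is MASK:
                if conf >= accept_thr:
                    new_seq[i] = tok          # reveal
                    changed = True
            elif revoke_thr is not None:
                # Crude proxy for the model's support of the current token.
                support = conf if tok == seq[i] else 1.0 - conf
                if support < revoke_thr:
                    new_seq[i] = MASK         # revoke an earlier decision
                    changed = True
        seq = new_seq
        done = MASK not in seq
        # Irreversible decoding stops once everything is revealed; the
        # revokable variant needs one extra pass with no changes.
        if not changed or (done and revoke_thr is None):
            return seq, step
    return seq, max_steps


if __name__ == "__main__":
    out, steps = decode(toy_predict, len(TARGET), accept_thr=0.5)
    print("irreversible:", "".join(out), steps)   # abc?ef?h 1
    out, steps = decode(toy_predict, len(TARGET), accept_thr=0.5, revoke_thr=0.9)
    print("revokable:   ", "".join(out), steps)   # abcdefgh 4
```

In this toy setting the lenient irreversible run finishes in one pass but keeps the two wrong guesses, while the revokable run spends extra passes to re-mask and correct them. The step counts here say nothing about the speedups reported in the paper.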

Code & Models

Repositories

Feng-Hong/WINO-DLLM/tree/WINO-plus (GitHub)