Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs

Feng Hong; Geng Yu; Yushi Ye; Haicheng Huang; Huangjie Zheng; Ya Zhang; Yanfeng Wang; Jiangchao Yao

arXiv:2507.18578 · cs.CL · September 29, 2025

TL;DR

The paper introduces WINO, a decoding algorithm for DLLMs that enhances speed and quality by enabling revokable, parallel token generation and verification, addressing the irreversibility issue in standard decoding.

Contribution

WINO is a training-free, parallel decoding method that improves DLLM performance by allowing token verification and refinement during inference.

Findings

1. Accelerates inference by 6× on GSM8K with a 2.58% accuracy gain
2. Achieves a 10× speedup on Flickr30K with improved performance
3. Outperforms existing decoding methods

Abstract

Diffusion Large Language Models (DLLMs) have emerged as a compelling alternative to autoregressive models, designed for fast parallel generation. However, existing DLLMs suffer from a severe quality-speed trade-off: faster parallel decoding leads to significant performance degradation. We attribute this to the irreversibility of standard DLLM decoding, which is easily polarized toward the wrong decoding direction as errors in the early context accumulate. To resolve this, we introduce Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable decoding in DLLMs. WINO employs a parallel draft-and-verify mechanism, aggressively drafting multiple tokens while simultaneously using the model's bidirectional context to verify and re-mask suspicious ones for refinement. Verified on open-source DLLMs such as LLaDA and MMaDA, WINO is shown to decisively…
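The draft-and-verify idea described in the abstract can be illustrated with a minimal NumPy sketch of a single update step. It is not the authors' implementation: the mask id `MASK`, the function `wino_step`, the thresholds `tau_in`/`tau_out` and their values, and the assumption that verification probabilities arrive as a separate array (for example from a pass in which a position cannot see its own token) are all hypothetical choices. Masked positions are drafted generously ("wide in"), while already decoded positions are kept only if their verification probability clears a stricter bar ("narrow out"); anything below that bar is re-masked for later refinement.

```python
import numpy as np

MASK = -1  # hypothetical id for the [MASK] token


def wino_step(tokens, draft_probs, verify_probs, tau_in=0.7, tau_out=0.9):
    """One illustrative draft-and-verify update (a sketch, not the authors' code).

    tokens:       (L,) int array; MASK marks positions that are still masked
    draft_probs:  (L, V) probabilities used to draft masked positions
    verify_probs: (L, V) probabilities used to re-score decoded positions
    tau_in:       drafting threshold ("wide in"); value is an assumption
    tau_out:      verification threshold ("narrow out"); value is an assumption
    """
    tokens = tokens.copy()
    positions = np.arange(len(tokens))
    masked = tokens == MASK
    decoded = ~masked

    # Narrow out: re-mask decoded tokens whose verification probability is low.
    safe_ids = np.where(decoded, tokens, 0)  # placeholder id where masked
    verify_conf = verify_probs[positions, safe_ids]
    remask = decoded & (verify_conf < tau_out)

    # Wide in: draft every masked position whose top probability clears tau_in.
    draft_conf = draft_probs.max(axis=1)
    draft = masked & (draft_conf >= tau_in)
    if masked.any() and not draft.any():
        # Safeguard (assumption): always draft the most confident masked slot.
        best = np.flatnonzero(masked)[np.argmax(draft_conf[masked])]
        draft[best] = True

    tokens[draft] = draft_probs[draft].argmax(axis=1)
    tokens[remask] = MASK
    return tokens


# Toy example with a 4-token vocabulary.
tokens = np.array([3, MASK, MASK, 1])
draft_probs = np.full((4, 4), 0.25)
draft_probs[1] = [0.05, 0.05, 0.85, 0.05]  # confident: drafted as token 2
draft_probs[2] = [0.40, 0.30, 0.20, 0.10]  # unsure: stays masked
verify_probs = np.full((4, 4), 0.25)
verify_probs[0] = [0.02, 0.02, 0.01, 0.95]  # token 3 is confirmed
verify_probs[3] = [0.50, 0.30, 0.10, 0.10]  # token 1 scores 0.30: re-masked
print(wino_step(tokens, draft_probs, verify_probs).tolist())  # → [3, 2, -1, -1]
```

The fallback that always drafts the single most confident masked position is a common safeguard in threshold-based samplers; it is included only so that a step cannot stall and is an assumption of this sketch.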

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating: 6 · Confidence: 3

Strengths

1. The proposed method is training-free, allowing it to function as a plug-and-play module for existing pre-trained DLLMs.
2. The algorithm demonstrates strong performance gains while also accelerating inference.

Weaknesses

1. The proposed solution to prevent information leakage during verification seems insufficient. The paper states that a token in Y_shad cannot attend to its corresponding token in Y_cur. However, it can attend to the other tokens in Y_cur. Since full self-attention is applied within Y_cur, those other tokens already carry contextual information about the token being verified, so information can still leak indirectly. This casts doubt on the mechanism's claimed effectiveness, as the model…
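The leakage argument above can be checked mechanically by tracing which inputs can reach a query through stacked attention layers. The sketch assumes a hypothetical layout in which `Y_cur` and `Y_shad` each hold `n` positions, `Y_cur` uses full self-attention and does not attend to the shadow block, and each shadow position sees all of `Y_cur` except its own counterpart, plus itself. The helper names `wino_like_mask` and `reachable` are invented for illustration and do not come from the paper.

```python
import numpy as np


def wino_like_mask(n):
    """Boolean attention mask (row = query, column = key) for a hypothetical
    layout [Y_cur (n positions) | Y_shad (n positions)]."""
    allowed = np.zeros((2 * n, 2 * n), dtype=bool)
    shadow = np.arange(n, 2 * n)
    allowed[:n, :n] = True                 # full self-attention in Y_cur
    allowed[n:, :n] = True                 # shadow rows see Y_cur ...
    allowed[shadow, np.arange(n)] = False  # ... minus their own counterpart
    allowed[shadow, shadow] = True         # and each shadow row sees itself
    return allowed


def reachable(allowed, layers):
    """reach[q, k] is True if input k can influence query q after `layers`
    attention layers (each layer follows one allowed edge)."""
    hop = allowed.astype(int)
    reach = np.eye(len(allowed), dtype=int)
    for _ in range(layers):
        reach = (reach @ hop > 0).astype(int)
    return reach.astype(bool)


n = 3
mask = wino_like_mask(n)
shadow_0, current_0 = n, 0  # shadow position 0 and its counterpart in Y_cur
print(reachable(mask, 1)[shadow_0, current_0])  # → False (direct edge blocked)
print(reachable(mask, 2)[shadow_0, current_0])  # → True (path via another Y_cur token)
```

With one layer the counterpart is blocked, but with two layers shadow position 0 can reach current token 0 through any other current token, which is the indirect path the reviewer describes. The sketch only shows that such a path exists; how much usable information flows along it depends on the trained model.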

Reviewer 02 · Rating: 6 · Confidence: 3

Strengths

- **Clear and Intuitive Motivation:** The identification of "irreversibility" as a core problem is a strong conceptual contribution. The proposed draft-and-verify solution is a logical and elegant response to this problem. However, it seems to align with much recent work, which might be worth including in the related work section, and the idea is also not especially novel in the AR LLM community. Acknowledging this will not hurt the paper's novelty but will establish a strong baseline.
- **Novel Mech…

Weaknesses

- **Missing Key Baseline Comparisons:** This is the most significant weakness. The paper compares WINO to generating 1 token/step or a fixed M tokens/step. However, the related work section itself mentions other advanced samplers like Fast-dLLM-parallel and the entropy-bounded (EB) sampler, which also perform dynamic, confidence-based parallel decoding. To convincingly demonstrate the superiority of the WINO mechanism (specifically, the benefit of *revocation*), it is crucial to compare against…
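To make the contrast the reviewer asks for concrete, here is a simplified confidence-threshold sampler in which committed tokens are never revisited. It is only a stand-in in the spirit of confidence-based parallel decoding such as Fast-dLLM-parallel, not a reproduction of that method or of the EB sampler's entropy-based rule; `threshold_commit_step`, `MASK`, and the threshold `tau` are hypothetical names and values.

```python
import numpy as np

MASK = -1  # hypothetical id for the [MASK] token


def threshold_commit_step(tokens, probs, tau=0.9):
    """Confidence-threshold parallel unmasking with no revocation: once a
    position is filled it is never re-masked (simplified stand-in)."""
    tokens = tokens.copy()
    masked = tokens == MASK
    if not masked.any():
        return tokens
    conf = probs.max(axis=1)
    commit = masked & (conf >= tau)
    if not commit.any():
        # Always commit the most confident masked position so decoding advances.
        best = np.flatnonzero(masked)[np.argmax(conf[masked])]
        commit[best] = True
    tokens[commit] = probs[commit].argmax(axis=1)
    return tokens


tokens = np.array([MASK, MASK, MASK])
step1 = np.array([[0.95, 0.03, 0.02],   # confident: commit token 0
                  [0.50, 0.30, 0.20],
                  [0.20, 0.30, 0.50]])
tokens = threshold_commit_step(tokens, step1)
step2 = np.array([[0.10, 0.80, 0.10],   # model now prefers token 1 here ...
                  [0.04, 0.92, 0.04],
                  [0.40, 0.30, 0.30]])
tokens = threshold_commit_step(tokens, step2)
print(tokens.tolist())  # → [0, 1, -1]; position 0 keeps its early commit
```

In the second step the model prefers token 1 at position 0, yet the committed token 0 stays in place. A revokable decoder could re-mask it, which is the behavior an ablation against these baselines would need to isolate.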

Reviewer 03 · Rating: 6 · Confidence: 4

Strengths

1. The work introduces a novel draft-verify procedure that replaces the naive write-many-at-once update used in prior DLLM decoding. Compared with earlier approaches, WINO maintains high speedups without sacrificing performance and even improves accuracy on many benchmarks.
2. The study evaluates both text-only and vision-language settings across 14 tasks, which strengthens the credibility of the conclusions.
3. WINO does not require retraining and is easy to deploy as a plug-and-play component…

Weaknesses

1. As described in Section 4.2, during verification each position in y_shad conditions on the corresponding position in y_curr from the previous step. It is unclear whether this one-step lag negatively affects performance.
2. The paper lacks detailed hardware information for the experiments, especially which GPU models, and how many, were used to obtain the main TPS results.
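On the reporting point, a minimal sketch of how a TPS figure could be measured and logged together with environment details follows. `measure_tps`, the `generate` callable, and `fake_generate` are hypothetical; only Python's standard `time` and `platform` modules are used, so the GPU model and count would still have to be recorded separately from the serving stack (for example with `nvidia-smi`).

```python
import platform
import time


def measure_tps(generate, prompts):
    """Tokens per second across `prompts` for a hypothetical `generate`
    callable that returns the list of newly generated token ids."""
    start = time.perf_counter()
    total_tokens = sum(len(generate(p)) for p in prompts)
    elapsed = time.perf_counter() - start
    return {
        "tokens": total_tokens,
        "seconds": elapsed,
        "tps": total_tokens / elapsed if elapsed > 0 else float("inf"),
        # Host details only; GPU model and count must be logged separately.
        "platform": platform.platform(),
        "python": platform.python_version(),
    }


def fake_generate(prompt):
    """Stand-in decoder: sleeps briefly and returns 5 dummy token ids."""
    time.sleep(0.01)
    return [0] * 5


report = measure_tps(fake_generate, ["a", "b"])
print(report["tokens"], round(report["tps"]))
```

Timing the whole batch with a single wall-clock interval keeps the metric comparable across decoders, since WINO and a baseline would be measured with the same harness on the same hardware.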