Multi-Token Residual Prediction

Yufeng Xu; Zishuo Bao; Qian Wang; Zeshen Zhang; Haoqi Zhang; Bowen Peng; Ang Li; Rahul Chalamala; Yucheng Lu

arXiv:2605.18817·cs.LG·May 20, 2026

TL;DR

The paper introduces Multi-token Residual Prediction (MRP), a lightweight module that lets diffusion language models denoise multiple tokens in a dependency-aware way within a single backbone forward pass, with a reported lossless inference speedup of up to 1.42x.

Contribution

MRP exploits the similarity of logit distributions at adjacent denoising steps: it predicts the residual between steps from the backbone's hidden states instead of running the backbone a second time, which speeds up inference without sacrificing quality.

Findings

01. Achieves up to 1.42x lossless speedup in inference.

02. Effective across multiple model scales and benchmarks.

03. Enables tunable quality-speed tradeoffs in decoding.

Abstract

Diffusion Language Models (DLMs) generate text by iteratively denoising masked token sequences, offering a tradeoff between parallelism and quality compared to autoregressive models. In current practice, the number of tokens decoded per step is controlled by a confidence threshold, and quality degrades monotonically as more tokens are denoised per step. We introduce Multi-token Residual Prediction (MRP), a lightweight module that enables dependency-aware multi-token denoising within a single backbone forward pass. MRP exploits a key property of the denoising process: the logit distributions at adjacent denoising steps are remarkably similar. Rather than running the backbone a second time to obtain the next-step logits, MRP predicts the residual between steps from the backbone's hidden states, effectively denoising more tokens per backbone forward at a fraction of the cost. We deploy MRP…
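
The two mechanisms the abstract describes, confidence-threshold unmasking and residual-based approximation of the next step's logits, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the single linear map used as the residual head, and the tiny hand-picked numbers are hypothetical and are not taken from the paper. In particular, this toy head does not condition on the tokens just unmasked, whereas the abstract describes MRP as dependency-aware; how that conditioning is realized is not stated in the text above.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def confident_positions(logits, masked, threshold):
    """Return {position: token_id} for masked positions whose top
    probability reaches the confidence threshold."""
    picked = {}
    for pos in sorted(masked):
        probs = softmax(logits[pos])
        token = max(range(len(probs)), key=probs.__getitem__)
        if probs[token] >= threshold:
            picked[pos] = token
    return picked

def linear_residual_head(hidden, weight):
    """Hypothetical residual head: one linear map (hidden_dim x vocab)
    from each hidden state to a logit delta. An assumption, not the
    paper's architecture."""
    return [[sum(h * w for h, w in zip(row, col)) for col in zip(*weight)]
            for row in hidden]

def denoise_step(logits, hidden, masked, threshold, weight):
    """One backbone forward's worth of outputs, used for two rounds of unmasking."""
    # Round 1: ordinary confidence-threshold unmasking on the backbone logits.
    first = confident_positions(logits, masked, threshold)
    remaining = set(masked) - set(first)
    # Round 2: approximate the next step's logits as current logits plus a
    # predicted residual, without calling the backbone again.
    delta = linear_residual_head(hidden, weight)
    approx_next = [[l + d for l, d in zip(l_row, d_row)]
                   for l_row, d_row in zip(logits, delta)]
    second = confident_positions(approx_next, remaining, threshold)
    return {**first, **second}

# Toy inputs: 3 masked positions, vocabulary of 3 tokens, hidden size 2.
logits = [[4.0, 0.0, 0.0], [1.0, 0.8, 0.0], [0.0, 0.0, 0.0]]
hidden = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
weight = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # made-up residual-head weights
masked = {0, 1, 2}

baseline = confident_positions(logits, masked, 0.9)
with_residual = denoise_step(logits, hidden, masked, 0.9, weight)
print(len(baseline), len(with_residual))
```

With these made-up numbers, the threshold alone unmasks only position 0, while adding the predicted residual pushes position 1 over the threshold as well, so two tokens are decoded from a single set of backbone outputs. Position 2 stays masked in both cases because its distribution remains uniform.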
