Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model

Yihong Dong; Zhaoyu Ma; Xue Jiang; Zhiyuan Fan; Jiaru Qian; Yongmin Li; Jianha Xiao; Zhi Jin; Rongyu Cao; Binhua Li; Fei Huang; Yongbin Li; Ge Li

arXiv:2510.18165·cs.AI·April 30, 2026


TL;DR

Saber is a novel, training-free sampling algorithm for diffusion language models that adaptively accelerates inference and employs backtracking to improve code generation quality and speed.

Contribution

The paper introduces Saber, a training-free sampling method for DLMs that adapts token unmasking during decoding and backtracks by remasking earlier predictions, improving performance without additional training.

Findings

01. Boosts Pass@1 accuracy by 1.9% on code benchmarks.

02. Achieves 251.4% average inference speedup.

03. Narrows the performance gap with autoregressive models.

Abstract

Diffusion language models (DLMs) are emerging as a compelling alternative to the dominant autoregressive paradigm, offering inherent advantages in parallel generation and bidirectional context modeling. However, for tasks with strict structural constraints, such as code generation, DLMs face a critical trade-off between inference speed and output quality: accelerating generation by reducing sampling steps often leads to catastrophic performance collapse. We find that the fundamental reasons are: 1) generation difficulty is non-uniform across the decoding steps of a structured sequence, making a DLM's static acceleration strategy suboptimal; 2) the context of tokens generated by a DLM evolves continuously, causing early high-confidence predictions to turn into irreversible errors. In this paper, we introduce efficient Sampling with Adaptive acceleration and Backtracking Enhanced…
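
The two observations in the abstract can be illustrated with a toy masked-decoding loop. The sketch below is not the paper's algorithm; its rules are assumptions chosen for illustration. It commits every masked position whose confidence clears a hypothetical `accept_threshold` (so easy steps unmask many tokens and hard steps unmask few), and it remasks any committed token that the model no longer predicts once more context is available. The names `sample`, `toy_predict`, `accept_threshold`, and `max_steps` are all hypothetical, and `toy_predict` stands in for a real DLM forward pass.

```python
from typing import Callable, List, Optional, Tuple

MASK = None                      # placeholder for a masked position
Prediction = Tuple[int, float]   # (best token id, confidence in [0, 1])
Sequence = List[Optional[int]]


def sample(
    predict: Callable[[Sequence], List[Prediction]],
    length: int,
    accept_threshold: float = 0.9,  # assumed rule: commit at or above this confidence
    max_steps: int = 100,           # hard cap so remasking cannot loop forever
) -> Tuple[Sequence, int, int]:
    """Return (sequence, steps used, number of remasked tokens)."""
    seq: Sequence = [MASK] * length
    steps = 0
    remasked = 0
    while MASK in seq and steps < max_steps:
        steps += 1
        preds = predict(seq)

        # Backtracking (assumed rule): revoke a committed token when the model,
        # seeing the current context, now prefers a different token there.
        for i, tok in enumerate(seq):
            if tok is not MASK and preds[i][0] != tok:
                seq[i] = MASK
                remasked += 1

        # Adaptive acceleration (assumed rule): commit all confident positions
        # at once; if none qualifies, commit the single most confident one so
        # that every step still makes progress.
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        confident = [i for i in masked if preds[i][1] >= accept_threshold]
        if not confident:
            confident = [max(masked, key=lambda i: preds[i][1])]
        for i in confident:
            seq[i] = preds[i][0]
    return seq, steps, remasked


def toy_predict(seq: Sequence) -> List[Prediction]:
    """Hypothetical stand-in for a DLM: confident only with left context."""
    target = [10, 11, 12, 13, 14, 15]
    preds: List[Prediction] = []
    for i in range(len(seq)):
        if i == 3 and seq[0] is MASK:
            # With no context the toy model is confidently wrong at position 3.
            preds.append((99, 0.95))
            continue
        has_left_context = i == 0 or seq[i - 1] is not MASK
        preds.append((target[i], 0.95 if has_left_context else 0.5))
    return preds


print(sample(toy_predict, 6))  # ([10, 11, 12, 13, 14, 15], 4, 1)
```

In this toy run the early wrong token at position 3 is revoked once position 0 is known, and the six tokens are decoded in four steps instead of the six that one-token-per-step decoding would need. A real sampler would also need a policy for oscillating remasks; the `max_steps` cap here is only a crude guard.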
