Prefix-Adaptive Block Diffusion for Efficient Document Recognition

Mingxu Chai, Ziyu Shen, Chenyu Liu, Kaidi Zhang, Jiazheng Zhang, Dingwei Zhu, Zhiheng Xi, Ruoyu Chen, Jun Long, Jihua Kang, Tao Gui, Qi Zhang

arXiv:2605.16861·cs.CV·May 19, 2026

TL;DR

The paper introduces PA-BDM, a block diffusion model for document recognition that denoises causally from prefix to suffix and commits prefixes adaptively rather than at fixed block boundaries, aiming to improve both efficiency and accuracy.

Contribution

It proposes a prefix-adaptive block diffusion approach that combines a Confidence-gated Structural Loss (CSL) with progressive prefix commitment, improving parallel decoding and recognition performance.

Findings

01

PA-BDM achieves higher recognition scores on benchmarks.

02

Inference throughput improves by 71.6% over previous models.

03

PA-BDM replaces intra-block bidirectional denoising with causal denoising from prefix to suffix.

Abstract

Block Diffusion Models (BDMs) support parallel generation, flexible-length output, and KV caching, making them promising for efficient document parsing. However, existing BDMs bind denoising and cache commitment to fixed block boundaries: parallelism shrinks during intra-block denoising, while generated tokens cannot be cached until the whole block is completed. Moreover, intra-block bidirectional denoising conflicts with inter-block autoregression, creating inconsistent information flow that can challenge structure-sensitive recognition. We propose the Prefix-Adaptive Block Diffusion Model (PA-BDM), which replaces intra-block bidirectional denoising with causal denoising from prefix to suffix and treats the block size as a maximum candidate range rather than a fixed commitment unit. PA-BDM uses Confidence-gated Structural Loss (CSL) to build low-entropy prefixes before extending…
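To make the idea of treating the block size as a maximum candidate range more concrete, here is a minimal Python sketch. It is not the paper's algorithm: the threshold rule, the function names `committed_prefix_length` and `decode`, and the toy proposer `toy_propose` are all hypothetical and chosen only for illustration. The sketch assumes that each step proposes up to `max_block` tokens with per-position confidences, and that only the leading run of confident tokens is committed (and would become cacheable), instead of waiting for the whole block.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical proposer signature: (committed tokens, width) -> (tokens, confidences)
Proposer = Callable[[List[str], int], Tuple[List[str], List[float]]]


def committed_prefix_length(confidences: Sequence[float], threshold: float = 0.9) -> int:
    """Length of the leading run with confidence >= threshold.

    Assumption: at least one token is always committed so decoding advances.
    """
    if not confidences:
        return 0
    n = 0
    for c in confidences:
        if c < threshold:
            break
        n += 1
    return max(n, 1)


def decode(propose: Proposer, total_len: int, max_block: int = 4,
           threshold: float = 0.9) -> Tuple[List[str], int]:
    """Commit a variable-length prefix per step; max_block only caps the candidates."""
    committed: List[str] = []
    steps = 0
    while len(committed) < total_len:
        width = min(max_block, total_len - len(committed))
        tokens, confs = propose(committed, width)
        k = committed_prefix_length(confs, threshold)
        committed.extend(tokens[:k])  # these tokens could now be cached
        steps += 1
    return committed, steps


TARGET = list("abcdefgh")


def toy_propose(committed: List[str], width: int) -> Tuple[List[str], List[float]]:
    """Toy stand-in for a model: correct tokens, confidence decaying with offset."""
    start = len(committed)
    return TARGET[start:start + width], [0.99, 0.95, 0.8, 0.6][:width]


output, num_steps = decode(toy_propose, total_len=len(TARGET))
print("".join(output), num_steps)
```

In this toy setting two tokens pass the threshold at every step, so the commitment unit is smaller than the four-token candidate range. A real model would supply its own confidences, and the paper's actual commitment criterion may differ.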

Code & Models

Models

MingxuChai/PA-BDM (Hugging Face model, 108 downloads)
