TL;DR
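The paper introduces PA-BDM (Prefix-Adaptive Block Diffusion Model), a block diffusion model for document recognition. It improves efficiency and accuracy by committing low-entropy prefixes adaptively instead of waiting for fixed block boundaries, and by denoising causally from prefix to suffix.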
Contribution
It proposes prefix-adaptive block diffusion, which combines a Confidence-gated Structural Loss (CSL) with progressive prefix commitment to improve both parallel decoding and recognition performance.
Findings
PA-BDM achieves higher recognition scores on benchmarks.
Inference throughput improves by 71.6% over previous models.
Intra-block bidirectional denoising is replaced with causal denoising from prefix to suffix.
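The difference between intra-block bidirectional and causal information flow can be illustrated with attention masks. This is a minimal sketch, assuming the usual block-diffusion formulation in which tokens attend bidirectionally within their own block and causally across blocks. Whether PA-BDM realizes causal denoising through an attention mask or only through a prefix-to-suffix decoding order is not stated here, so `causal_mask` is an illustrative assumption, and both function names are hypothetical.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Block-diffusion style mask: mask[i][j] is True if position i may attend to j.

    A position attends to every position in its own block (bidirectional)
    and to all earlier blocks (causal across blocks).
    """
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def causal_mask(seq_len: int) -> list[list[bool]]:
    """Token-level causal mask: position i attends only to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Print each mask row as a string: "1" = may attend, "." = masked out.
    for name, mask in [("block-causal", block_causal_mask(4, 2)), ("causal", causal_mask(4))]:
        print(name)
        for row in mask:
            print("".join("1" if allowed else "." for allowed in row))
```

With `block_size=2`, position 0 can see position 1 under the block-causal mask (row `11..`) but not under the causal mask (row `1...`). That within-block look-ahead is the bidirectional flow that the abstract says conflicts with inter-block autoregression.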
Abstract
Block Diffusion Models (BDMs) support parallel generation, flexible-length output, and KV caching, making them promising for efficient document parsing. However, existing BDMs bind denoising and cache commitment to fixed block boundaries: parallelism shrinks during intra-block denoising, while generated tokens cannot be cached until the whole block is completed. Moreover, intra-block bidirectional denoising conflicts with inter-block autoregression, creating inconsistent information flow that can challenge structure-sensitive recognition. We propose the Prefix-Adaptive Block Diffusion Model (PA-BDM), which replaces intra-block bidirectional denoising with causal denoising from prefix to suffix and treats the block size as a maximum candidate range rather than a fixed commitment unit. PA-BDM uses Confidence-gated Structural Loss (CSL) to build low-entropy prefixes before extending…
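The contrast between fixed-block and prefix-adaptive commitment can be sketched as a commit rule. This is a minimal sketch under stated assumptions: "low-entropy" is measured as the Shannon entropy of each position's predicted token distribution, and inference commits the longest prefix whose entropy stays under a fixed threshold. The abstract describes CSL as a training loss and does not specify the commit criterion, so `commit_prefix_length`, `max_entropy`, and the at-least-one-token rule are hypothetical, not the paper's procedure. A fixed-block BDM would instead commit nothing until the whole block is denoised and then commit all of it.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy (nats) of one position's predicted token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def commit_prefix_length(block_probs: list[list[float]], max_entropy: float) -> int:
    """Number of leading positions to commit (and cache) in this step.

    block_probs holds one distribution per candidate position in prefix-to-suffix
    order; its length plays the role of the maximum candidate range (the block size).
    The scan stops at the first uncertain position, so a confident token after it
    is NOT committed. At least one token is committed so decoding always advances
    (an assumption of this sketch).
    """
    if not block_probs:
        return 0
    n = 0
    for probs in block_probs:
        if entropy(probs) > max_entropy:
            break
        n += 1
    return max(n, 1)


if __name__ == "__main__":
    candidates = [
        [0.97, 0.02, 0.01],    # confident
        [0.90, 0.05, 0.05],    # confident
        [0.40, 0.35, 0.25],    # uncertain: the committed prefix stops here
        [0.99, 0.005, 0.005],  # confident, but after the uncertain position
    ]
    print(commit_prefix_length(candidates, max_entropy=0.5))  # 2
```

Here two tokens can be cached immediately, whereas a fixed-block commit rule would hold all four until the uncertain third position is resolved.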
