Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models

Linye Wei; Wenjue Chen; Pingzhi Tang; Xiaotian Guo; Le Ye; Runsheng Wang; Meng Li

arXiv:2511.21759 · cs.CL · December 1, 2025

Open Access

TL;DR

This paper introduces ODB-dLLM, an acceleration framework for diffusion language models that cuts inference cost in both the prefill and decoding phases, reporting 46-162x speedup over the baseline dLLM and 2.63-6.30x over Fast-dLLM while mitigating the accuracy degradation of existing acceleration methods.

Contribution

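The paper presents a dual-boundary orchestration framework: in the prefill phase, an adaptive length prediction mechanism reduces the overhead of a fixed, predefined response length; in the decoding phase, a jump-share speculative decoding method makes inference more efficient.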

Findings

01. Achieves 46-162x speedup over the baseline dLLM

02. Achieves 2.63-6.30x speedup over Fast-dLLM

03. Mitigates the accuracy degradation seen in existing acceleration methods

Abstract

Diffusion-based large language models (dLLMs) have recently gained significant attention for their exceptional performance and inherent potential for parallel decoding. Existing frameworks further enhance their inference efficiency by enabling KV caching. However, their bidirectional attention mechanism necessitates periodic cache refreshes that interleave prefill and decoding phases, both of which contribute substantial inference cost and constrain the achievable speedup. Inspired by the heterogeneous arithmetic intensity of the prefill and decoding phases, we propose ODB-dLLM, a framework that orchestrates dual-boundaries to accelerate dLLM inference. In the prefill phase, we find that the predefined fixed response length introduces heavy yet redundant computational overhead, which hurts efficiency. To alleviate this, ODB-dLLM incorporates an adaptive length prediction mechanism that…
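The "heterogeneous arithmetic intensity" that motivates the framework can be illustrated with a rough roofline-style estimate. The sketch below is not taken from the paper: it assumes a single dense linear layer in 16-bit precision, counts only weight and activation traffic, and uses hypothetical token counts and hidden size (1024 tokens for a prefill pass, 32 tokens for a cached decoding step, hidden size 4096). The function name `linear_arithmetic_intensity` is introduced here for illustration only.

```python
def linear_arithmetic_intensity(num_tokens: int, d_in: int, d_out: int,
                                bytes_per_elem: int = 2) -> float:
    """Estimate FLOPs per byte for y = x @ W.

    x has shape (num_tokens, d_in) and W has shape (d_in, d_out).
    Assumption: each operand is read or written exactly once.
    """
    # One multiply and one add per (token, input, output) triple.
    flops = 2 * num_tokens * d_in * d_out
    # Weights are read once; input and output activations are moved once each.
    elems_moved = d_in * d_out + num_tokens * d_in + num_tokens * d_out
    return flops / (bytes_per_elem * elems_moved)


# Hypothetical sizes, chosen only to show the contrast between phases.
HIDDEN = 4096
prefill_ai = linear_arithmetic_intensity(num_tokens=1024, d_in=HIDDEN, d_out=HIDDEN)
decode_ai = linear_arithmetic_intensity(num_tokens=32, d_in=HIDDEN, d_out=HIDDEN)

print(f"prefill intensity: {prefill_ai:.1f} FLOPs/byte")
print(f"decode intensity:  {decode_ai:.1f} FLOPs/byte")
```

Under these assumptions, the many-token prefill pass amortizes each weight read over far more computation than the few-token decoding step does, so the two phases tend to sit on opposite sides of a hardware roofline: prefill leans compute-bound and decoding leans memory-bound.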

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis