DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone
Vaibhav Singh, Oleksiy Ostapenko, Pierre-André Noël, Eugene Belilovsky, Torsten Scholak

TL;DR
DiffuMamba is a masked diffusion language model built on a bidirectional Mamba backbone. It matches Transformer-based diffusion models on downstream tasks while delivering up to 8.2x higher inference throughput on long sequences.
Contribution
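The paper presents DiffuMamba, a masked diffusion language model whose bidirectional Mamba backbone combines the diffusion objective with linear-time sequence modeling, and DiffuMamba-H, a hybrid variant with interleaved attention. It also gives a systematic analysis of inference efficiency across modern DLM variants, combining asymptotic complexity with empirical measurements.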
Findings
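DiffuMamba achieves up to 8.2x higher inference throughput on long sequences, and the hybrid DiffuMamba-H up to 4.3x.
At scales up to 1.3B parameters, both models match Transformer-based diffusion models on downstream tasks.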
Cache-efficient block diffusion with Mamba mixers scales linearly with sequence length.
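To make the scaling claim concrete, here is a minimal back-of-the-envelope sketch of how mixer cost grows with sequence length. It is a toy FLOP count, not the paper's analysis: the function names, the default `d_model` and `d_state` values, and the constant factors are all assumptions chosen for illustration, and the model ignores projections, memory bandwidth, and kernel efficiency, so it does not reproduce the reported 8.2x figure.

```python
def attention_mixer_flops(seq_len: int, d_model: int) -> int:
    """Rough FLOPs for one full self-attention pass (assumed cost model).

    Counts the QK^T product and the attention-weighted sum over V,
    each about 2 * seq_len^2 * d_model multiply-adds.
    """
    return 2 * (2 * seq_len * seq_len * d_model)


def ssm_mixer_flops(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough FLOPs for one linear-time state-space scan (assumed cost model).

    Each position updates a state of size d_state per channel.
    """
    return 2 * seq_len * d_model * d_state


def cost_ratio(seq_len: int, d_model: int = 1024, d_state: int = 16) -> float:
    """Attention cost divided by SSM cost under the toy model above."""
    return attention_mixer_flops(seq_len, d_model) / ssm_mixer_flops(
        seq_len, d_model, d_state
    )


if __name__ == "__main__":
    # Hypothetical sequence lengths; the ratio grows linearly with length.
    for n in (1024, 4096, 16384):
        print(f"seq_len={n:6d}  attention/ssm cost ratio={cost_ratio(n):.1f}")
```

Under this model, doubling the sequence length doubles the state-space cost but quadruples the attention cost, which is why the gap widens on long sequences.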
Abstract
Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention or KV-cache overhead. We introduce DiffuMamba, a masked diffusion language model built on a bidirectional Mamba backbone that combines the diffusion objective with linear-time sequence modeling, and DiffuMamba-H, a hybrid variant with interleaved attention. Across scales up to 1.3B parameters, our models match Transformer-based diffusion in downstream performance while achieving up to 8.2x and 4.3x higher inference throughput, respectively, on long sequences. We further present a systematic analysis of inference efficiency across modern DLM variants combining asymptotic complexity with empirical measurements. Notably, cache-efficient block diffusion with Mamba mixers emerges as the…
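As a rough illustration of what "masked diffusion" means here, the sketch below shows the forward corruption step under the common absorbing-state formulation, where each token is independently replaced by a mask symbol with probability `t`. This is an assumption about the general family of methods, not the paper's exact noise schedule or training objective; `MASK_ID`, `mask_tokens`, and `masked_positions` are hypothetical names introduced for the example.

```python
import random
from typing import List

MASK_ID = -1  # hypothetical mask token id, chosen only for this sketch


def mask_tokens(tokens: List[int], t: float, rng: random.Random) -> List[int]:
    """Replace each token with MASK_ID independently with probability t.

    t = 0.0 leaves the sequence untouched; t = 1.0 masks every position.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def masked_positions(noisy: List[int]) -> List[int]:
    """Indices a denoiser would be asked to predict."""
    return [i for i, tok in enumerate(noisy) if tok == MASK_ID]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [11, 42, 7, 99, 5, 23]  # made-up token ids
    noisy = mask_tokens(clean, t=0.5, rng=rng)
    print("noisy sequence:  ", noisy)
    print("positions to fill:", masked_positions(noisy))
```

A denoising backbone, whether Transformer or Mamba, sees the whole noisy sequence at once and predicts the original tokens at the masked positions, which is why a bidirectional mixer is a natural fit.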
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Machine Learning in Healthcare
