DiffuMamba: High-Throughput Diffusion LMs with Mamba Backbone

Vaibhav Singh; Oleksiy Ostapenko; Pierre-André Noël; Eugene Belilovsky; Torsten Scholak

arXiv:2511.15927 · cs.LG · March 2, 2026

Open Access

TL;DR

DiffuMamba is a masked diffusion language model built on a bidirectional Mamba backbone. At scales up to 1.3B parameters it matches Transformer-based diffusion models on downstream tasks while achieving up to 8.2x higher inference throughput on long sequences.

Contribution

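The paper presents DiffuMamba, a masked diffusion language model that pairs the diffusion objective with a bidirectional, linear-time Mamba backbone, and DiffuMamba-H, a hybrid variant with interleaved attention. It also provides a systematic analysis of inference efficiency across modern DLM variants, combining asymptotic complexity with empirical measurements.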

Findings

01

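DiffuMamba achieves up to 8.2x higher inference throughput than Transformer-based diffusion models on long sequences; the hybrid DiffuMamba-H achieves up to 4.3x.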

02

At scales up to 1.3B parameters, the models match Transformer-based diffusion models on downstream tasks.

03

Cache-efficient block diffusion with Mamba mixers scales linearly with sequence length.

Abstract

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) generation, yet their reliance on Transformer backbones limits inference efficiency due to quadratic attention or KV-cache overhead. We introduce DiffuMamba, a masked diffusion language model built on a bidirectional Mamba backbone that combines the diffusion objective with linear-time sequence modeling, and DiffuMamba-H, a hybrid variant with interleaved attention. Across scales up to 1.3B parameters, our models match Transformer-based diffusion in downstream performance while achieving up to 8.2x and 4.3x higher inference throughput, respectively, on long sequences. We further present a systematic analysis of inference efficiency across modern DLM variants combining asymptotic complexity with empirical measurements. Notably, cache-efficient block diffusion with Mamba mixers emerges as the…
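The phrase "bidirectional ... linear-time sequence modeling" in the abstract can be illustrated with a toy example. The sketch below is not the paper's architecture: it assumes a fixed scalar decay in place of Mamba's input-dependent state-space parameters, and it assumes the two directions are combined by simple addition, which the abstract does not specify. The names `linear_scan` and `bidirectional_scan` are hypothetical. It only shows how two O(n) recurrent passes give every position context from both sides without a quadratic attention matrix.

```python
def linear_scan(xs, decay):
    """One left-to-right pass of h_i = decay * h_{i-1} + x_i; O(n) time, O(1) state."""
    h = 0.0
    out = []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.5):
    """Combine a forward and a backward scan (assumed combination: addition).

    Each scan already includes x_i at position i, so x_i is subtracted once
    to avoid counting the current input twice.
    """
    fwd = linear_scan(xs, decay)
    bwd = linear_scan(xs[::-1], decay)[::-1]
    return [f + b - x for f, b, x in zip(fwd, bwd, xs)]


if __name__ == "__main__":
    # A single impulse in the middle is visible to positions on both sides.
    print(bidirectional_scan([0.0, 1.0, 0.0], decay=0.5))
```

The cost of both passes grows linearly with the sequence length, whereas full self-attention compares every pair of positions.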

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Machine Learning in Healthcare