MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts

Alexandros Christoforos; Chadbourne Davis

arXiv:2512.20604·cs.CL·January 8, 2026

Open Access

TL;DR

MoE-DiffuSeq is a diffusion framework for long-document text generation that combines sparse attention, a Mixture-of-Experts (MoE) architecture, and a soft absorbing state in the diffusion process, speeding up training and sampling without sacrificing generation quality.

Contribution

The paper presents a diffusion-based architecture that integrates sparse attention with MoE expert routing and adds a soft absorbing state to the diffusion process, targeting more efficient and more coherent long-form text generation.

Findings

01. Outperforms prior models in training efficiency and inference speed.

02. Maintains high generation quality on long-document benchmarks.

03. Effective for scientific, code, and dialogue generation tasks.

Abstract

We propose MoE-DiffuSeq, a diffusion-based framework for efficient long-form text generation that integrates sparse attention with a Mixture-of-Experts (MoE) architecture. Existing sequence diffusion models suffer from prohibitive computational and memory costs when scaling to long documents, largely due to dense attention and slow iterative reconstruction. MoE-DiffuSeq addresses these limitations by combining expert routing with a tailored sparse attention mechanism, substantially reducing attention complexity while preserving global coherence and textual fidelity. In addition, we introduce a soft absorbing state within the diffusion process that reshapes attention dynamics during denoising, enabling faster sequence reconstruction and more precise token refinement. This design accelerates both training and sampling without sacrificing generation quality. Extensive…
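
To make the two efficiency ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract excerpt does not say which sparsity pattern or routing scheme MoE-DiffuSeq uses, so the sketch assumes a sliding-window attention mask and top-k softmax gating, two common choices. All function names and parameters (`sliding_window_mask`, `top_k_route`, `window`, `k`) are hypothetical and chosen only for illustration.

```python
import math


def sliding_window_mask(seq_len, window):
    """Boolean mask: entry [i][j] is True when position i may attend to j.

    Assumption: a local window of +/- `window` tokens. The paper's actual
    sparse pattern is not given in the abstract excerpt.
    """
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def count_attended_pairs(mask):
    """Number of (query, key) pairs the mask keeps."""
    return sum(sum(row) for row in mask)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and renormalise their gates.

    Assumption: standard top-k gating. Returns {expert_index: gate_weight},
    with the gate weights summing to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[e] for e in chosen)
    return {e: probs[e] / norm for e in chosen}


if __name__ == "__main__":
    seq_len, window = 8, 1
    mask = sliding_window_mask(seq_len, window)
    print("dense pairs: ", seq_len * seq_len)
    print("sparse pairs:", count_attended_pairs(mask))
    print("gates:", top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

With a fixed window, the number of attended pairs grows linearly with sequence length rather than quadratically, and top-k routing means each token activates only `k` experts regardless of how many exist. Both properties are the generic reason such components reduce cost on long documents; how MoE-DiffuSeq combines them with the soft absorbing state is described in the full paper, not in this excerpt.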

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Advanced Text Analysis Techniques