Discrete Diffusion Language Model for Efficient Text Summarization
Do Huu Dat, Do Duc Anh, Anh Tuan Luu, Wray Buntine

TL;DR
This paper introduces a novel discrete diffusion model with semantic-aware noising and a CrossMamba architecture, enabling efficient and high-quality long-text summarization that outperforms existing models on key benchmarks.
Contribution
It presents a new semantic-aware noising process and an adapted CrossMamba model for improved long-text summarization with faster inference.
Findings
Achieved state-of-the-art ROUGE scores on benchmark datasets
Demonstrated faster inference speeds than autoregressive models
Successfully handled long sequences with Transformer backbones
Abstract
While diffusion models excel at conditionally generating high-quality images, prior work on discrete diffusion models has not been evaluated on conditional long-text generation. In this work, we address the limitations of prior discrete diffusion models for conditional long-text generation, particularly in long sequence-to-sequence tasks such as abstractive summarization. Despite fast decoding speeds compared to autoregressive methods, previous diffusion models failed on the abstractive summarization task due to the incompatibility between the backbone architectures and the random noising process. To overcome these challenges, we introduce a novel semantic-aware noising process that enables Transformer backbones to handle long sequences effectively. Additionally, we propose CrossMamba, an adaptation of the Mamba model to the encoder-decoder paradigm, which integrates seamlessly with the…
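The abstract contrasts the proposed semantic-aware noising with a "random noising process" but does not define either on this page. As a rough illustration of the random baseline only, the sketch below assumes the common absorbing-state (masking) form of discrete diffusion with a linear schedule, where each token is independently replaced by a mask token with probability t / num_steps. The names MASK and random_mask_noise and the linear schedule are assumptions for illustration; this is not the paper's semantic-aware process, which would choose which tokens to corrupt using more than uniform chance.

```python
import random

MASK = "[MASK]"  # hypothetical absorbing-state token (assumption, not from the paper)


def random_mask_noise(tokens, t, num_steps, rng):
    """Forward noising sketch: mask each token independently with prob t / num_steps.

    Assumes a linear schedule; t = 0 leaves the text intact and
    t = num_steps masks every token.
    """
    p = t / num_steps
    return [MASK if rng.random() < p else tok for tok in tokens]


rng = random.Random(0)  # seeded so runs are repeatable
summary = "the model summarizes long documents quickly".split()

# Show the sequence at increasing noise levels.
for t in (0, 5, 10):
    print(t, random_mask_noise(summary, t, 10, rng))
```

Because every position is corrupted with the same probability regardless of the token's meaning, content words and function words are erased at the same rate, which is the property a semantic-aware schedule would presumably change.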
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Attention Is All You Need · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Diffusion · Softmax
