TL;DR
This paper introduces an encoder-decoder architecture for discrete diffusion language models that speeds up inference and training while maintaining output quality on summarization, translation, and reasoning tasks.
Contribution
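It proposes an encoder-decoder framework for discrete diffusion models in which an encoder represents clean tokens and a lightweight decoder iteratively denoises corrupted tokens. Together with the accompanying algorithms, this enables faster inference and training.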
Findings
Achieves superior quality-throughput trade-offs in summarization, translation, and reasoning.
Enables faster training of block diffusion models, which partition sequences into blocks for better quality.
Provides open-source code and models for practical adoption.
Abstract
Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network at every denoising step and incur high computational cost. Our key insight is that discrete diffusion models perform two types of computation: 1) representing clean tokens and 2) denoising corrupted tokens, which enables us to use separate modules for each task. We propose an encoder-decoder architecture to accelerate discrete diffusion inference, which relies on an encoder to represent clean tokens and a lightweight decoder to iteratively refine a noised sequence. We also show that this architecture enables faster training of block diffusion models, which partition sequences into blocks for better quality and are commonly used in diffusion language…
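The compute split described in the abstract can be illustrated with a toy sketch. It is not the paper's sampler: the networks are replaced by call counters, the per-call costs are made-up numbers, and running the encoder once per block (rather than on some other schedule) is an assumption made for illustration. All names (`CountingModule`, `encode`, `denoise_step`, `sample`, `MASK`) are hypothetical.

```python
import random

MASK = -1  # hypothetical mask-token id, not taken from the paper


class CountingModule:
    """Stand-in for a network; it only records how often it is called."""

    def __init__(self, cost_per_call):
        self.cost_per_call = cost_per_call  # made-up relative cost
        self.calls = 0

    @property
    def total_cost(self):
        return self.calls * self.cost_per_call


def encode(encoder, clean_tokens):
    """Toy 'encoder': represent the clean tokens (here, just copy them)."""
    encoder.calls += 1
    return list(clean_tokens)


def denoise_step(decoder, context, block, n_unmask, rng):
    """Toy 'decoder' step: fill up to n_unmask masked positions of the block."""
    decoder.calls += 1
    masked = [i for i, tok in enumerate(block) if tok == MASK]
    for i in rng.sample(masked, min(n_unmask, len(masked))):
        block[i] = rng.randrange(100)  # random id in place of a real prediction
    return block


def sample(prompt, num_blocks, block_size, steps_per_block, seed=0):
    rng = random.Random(seed)
    encoder = CountingModule(cost_per_call=10)  # assumed heavy module
    decoder = CountingModule(cost_per_call=1)   # assumed lightweight module
    tokens = list(prompt)
    per_step = -(-block_size // steps_per_block)  # ceiling division
    for _ in range(num_blocks):
        # Assumption: the encoder runs once per block, on clean tokens only.
        context = encode(encoder, tokens)
        block = [MASK] * block_size
        # The decoder alone is invoked at every denoising step.
        for _ in range(steps_per_block):
            block = denoise_step(decoder, context, block, per_step, rng)
        tokens.extend(block)
    return tokens, encoder, decoder


if __name__ == "__main__":
    tokens, enc, dec = sample([1, 2, 3], num_blocks=2, block_size=4, steps_per_block=4)
    print("encoder calls:", enc.calls, "decoder calls:", dec.calls)
    print("toy total cost:", enc.total_cost + dec.total_cost)
```

In this toy setup the heavy module is called once per block and the light module once per denoising step, whereas a decoder-only design would pay the full-network cost at every step. The actual savings depend on the real module sizes and schedule, which the sketch does not model.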
