Transfer Learning for Text Diffusion Models
Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant

TL;DR
This paper investigates whether text diffusion can replace autoregressive (AR) decoding in large language models, reporting promising results on code synthesis and extractive QA along with potential speed advantages.
Contribution
The paper introduces AR2Diff, a lightweight adaptation procedure for transforming pretrained AR models into text diffusion models, and compares architectures and pretraining objectives across several tasks.
Findings
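Diffusion models trained from scratch outperform AR models on code synthesis and extractive QA; on machine translation, diffusion underperforms AR.
AR2Diff, which adapts pretrained AR models to use diffusion decoding, yields quality gains.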
Diffusion decoding can be faster than autoregressive decoding for long texts.
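The speed claim in the last finding can be illustrated with a rough count of decoder forward passes. The sketch below assumes that an AR decoder needs one pass per generated token, while a diffusion decoder refines all positions in parallel over a fixed number of denoising steps. The function names and the step count of 10 are hypothetical and not taken from the paper, and real latency also depends on the cost of each pass (for example, AR decoding benefits from key/value caching).

```python
def ar_forward_passes(num_tokens: int) -> int:
    """AR decoding: one forward pass per generated token (assumption)."""
    return num_tokens


def diffusion_forward_passes(num_tokens: int, num_steps: int) -> int:
    """Diffusion decoding: one forward pass per denoising step (assumption).

    Every step updates all positions in parallel, so in this simplified
    model the count does not depend on num_tokens.
    """
    return num_steps


def pass_ratio(num_tokens: int, num_steps: int) -> float:
    """How many times more passes AR needs than diffusion in this model."""
    return ar_forward_passes(num_tokens) / diffusion_forward_passes(
        num_tokens, num_steps
    )


if __name__ == "__main__":
    NUM_STEPS = 10  # hypothetical number of denoising steps
    for length in (8, 64, 512):
        print(length, pass_ratio(length, NUM_STEPS))
```

Under these assumptions the ratio is below 1 for short outputs and grows linearly with output length, which is why any advantage would show up mainly on long texts.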
Abstract
In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff". We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR…
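For readers unfamiliar with the prefix LM setup mentioned in the abstract: as commonly defined for AR training, prefix (input) positions attend to each other bidirectionally, while target positions attend to the full prefix and causally to earlier targets. The following is a minimal sketch of such an attention mask under that common definition. The helper name is hypothetical, this is not the paper's code, and the paper's diffusion setup may mask target positions differently.

```python
from typing import List


def prefix_lm_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """Build a prefix-LM attention mask (common definition, an assumption).

    mask[i][j] is True when query position i may attend to key position j:
    every position sees the whole prefix, and positions also see
    themselves and anything earlier (causal attention over targets).
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must be between 0 and total_len")
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


if __name__ == "__main__":
    # Two prefix tokens followed by two target tokens.
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print("".join("x" if allowed else "." for allowed in row))
```

With `prefix_len=0` this reduces to an ordinary causal mask, and with `prefix_len == total_len` it becomes fully bidirectional.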
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Diffusion
