Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

Fred Zhangzhi Peng; Alexis Fox; Anru R. Zhang; Alexander Tong

arXiv:2605.06885·cs.LG·May 11, 2026

TL;DR

This paper introduces REPR-ALIGN, a method for converting autoregressive language models into diffusion language models by aligning internal representations, which speeds up training and transfers linguistic knowledge effectively.

Contribution

The paper proposes a representation alignment technique that enables direct conversion of autoregressive (AR) models to diffusion language models (DLMs) without extensive retraining or architectural changes.

Findings

01 Up to 4x training acceleration achieved.

02 Representation alignment effectively transfers linguistic knowledge.

03 Method is especially beneficial in low-data regimes.

Abstract

Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We hypothesize that much of the semantic structure learned by AR pretraining can transfer across generation orders, and thus DLM training should be viewed as relearning the decoding path rather than relearning language representations. To investigate this, we introduce REPR-ALIGN, a…
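The abstract is cut off before it describes the REPR-ALIGN objective, so the following is only an illustrative sketch of what "explicitly preserving representation geometry" could look like, not the authors' method. It assumes a frozen AR teacher and a DLM student that each expose one hidden-state vector per token position, and it assumes a cosine-based alignment term added to the denoising loss. The names `alignment_loss`, `combined_loss`, and `align_weight` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_loss(
    teacher_states: Sequence[Vector], student_states: Sequence[Vector]
) -> float:
    """Mean (1 - cosine similarity) over token positions.

    ASSUMPTION: teacher_states come from a frozen AR model and
    student_states from the DLM being trained, one vector per token.
    The source does not specify the actual alignment objective.
    """
    if len(teacher_states) != len(student_states):
        raise ValueError("teacher and student must cover the same positions")
    if not teacher_states:
        raise ValueError("need at least one token position")
    sims = [cosine_similarity(t, s) for t, s in zip(teacher_states, student_states)]
    return 1.0 - sum(sims) / len(sims)


def combined_loss(
    denoising_loss: float,
    teacher_states: Sequence[Vector],
    student_states: Sequence[Vector],
    align_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Denoising loss plus a weighted alignment penalty (illustrative only)."""
    return denoising_loss + align_weight * alignment_loss(teacher_states, student_states)
```

Under these assumptions, student states that point in the same direction as the teacher's contribute no penalty regardless of scale, while orthogonal states contribute the maximum per-position penalty of 1. A cosine term is used here because it is scale-invariant; other distance measures would be equally plausible stand-ins.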

Code & Models

Repositories

pengzhangzhi/Open-dLLM (GitHub)
