TL;DR
This paper introduces REPR-ALIGN, a method that converts autoregressive (AR) language models into diffusion language models (DLMs) by aligning their internal representations, which speeds up training and transfers linguistic knowledge learned during AR pretraining.
Contribution
The paper proposes a representation alignment technique that converts AR models directly into DLMs without extensive retraining or architectural changes.
Findings
Training is accelerated by up to 4x.
Representation alignment effectively transfers linguistic knowledge.
The method is especially beneficial in low-data regimes.
Abstract
Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We hypothesize that much of the semantic structure learned by AR pretraining can transfer across generation orders, and thus DLM training should be viewed as relearning the decoding path rather than relearning language representations. To investigate this, we introduce REPR-ALIGN, a…
