IDLM: Inverse-distilled Diffusion Language Models
David Li, Nikita Gushchin, Dmitry Abulkhanov, Eric Moulines, Ivan Oseledets, Maxim Panov, Alexander Korotin

TL;DR
This paper introduces IDLM, which extends inverse distillation to discrete diffusion language models. It accelerates inference while keeping generation quality close to the teacher's, and it addresses the theoretical and practical challenges that the extension raises.
Contribution
The paper develops a theoretically sound and practically stable inverse distillation technique for discrete diffusion language models, enabling 4x-64x faster inference.
Findings
Reduces inference steps by up to 64 times
Maintains entropy and perplexity comparable to teacher models
Provides theoretical guarantees for unique solutions
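To make the step-reduction figures concrete, here is a minimal arithmetic sketch. The teacher step count of 1024 is a hypothetical value chosen for illustration, not a number from the paper, and `student_steps` is an illustrative helper name. The sketch also assumes the reported 4x-64x range corresponds directly to a reduction in sampling steps.

```python
def student_steps(teacher_steps: int, reduction: int) -> int:
    """Sampling steps left after reducing the step count by an integer factor."""
    if reduction <= 0 or teacher_steps % reduction != 0:
        raise ValueError("reduction must be a positive divisor of teacher_steps")
    return teacher_steps // reduction


# Hypothetical teacher budget; the summary does not state the real one.
TEACHER_STEPS = 1024

for factor in (4, 16, 64):
    # Each line prints the reduction factor and the resulting step count.
    print(f"{factor}x fewer steps -> {student_steps(TEACHER_STEPS, factor)} steps")
```

Under this assumption, a 64x reduction turns a 1024-step sampler into a 16-step one; the actual wall-clock speedup would also depend on per-step cost.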
Abstract
Diffusion Language Models (DLMs) have recently achieved strong results in text generation. However, their multi-step sampling leads to slow inference, limiting practical use. To address this, we extend Inverse Distillation, a technique originally developed to accelerate continuous diffusion models, to the discrete setting. Nonetheless, this extension introduces both theoretical and practical challenges. From a theoretical perspective, the inverse distillation objective lacks uniqueness guarantees, which may lead to suboptimal solutions. From a practical standpoint, backpropagation in the discrete space is non-trivial and often unstable. To overcome these challenges, we first provide a theoretical result demonstrating that our inverse formulation admits a unique solution, thereby ensuring valid optimization. We then introduce gradient-stable relaxations to support effective training. As…
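The abstract does not say which gradient-stable relaxations the authors use. As a rough illustration of why backpropagation through discrete tokens is hard, and of what a relaxation can look like in general, the sketch below contrasts a hard one-hot choice (piecewise constant in the logits, so it carries no useful gradient) with a temperature-scaled softmax mixture of token embeddings (smooth in the logits). This is a generic softmax relaxation assumed for illustration, not the paper's method; all function names and the toy embeddings are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_one_hot(logits):
    """Discrete choice: one-hot vector at the argmax (flat almost everywhere)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def relaxed_embedding(logits, embeddings, temperature=1.0):
    """Probability-weighted mix of token embeddings, smooth in the logits."""
    probs = softmax(logits, temperature)
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]


# Toy vocabulary of three tokens with 2-dimensional embeddings (illustrative only).
logits = [2.0, 1.0, 0.1]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(hard_one_hot(logits))
print(relaxed_embedding(logits, embeddings, temperature=1.0))
print(relaxed_embedding(logits, embeddings, temperature=0.01))
```

Lowering the temperature moves the relaxed output toward the embedding of the argmax token, at the cost of sharper and potentially less stable gradients; that trade-off is one reason stable relaxations need care.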
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Generative Adversarial Networks and Image Synthesis
