CRoCoDiL: Continuous and Robust Conditioned Diffusion for Language
Roy Uziel, Omer Belhasin, Itay Levy, Akhiad Bercovich, Ran El-Yaniv, Ran Zilberstein, and Michael Elad

TL;DR
CRoCoDiL moves masked diffusion language models into a continuous sentence-level semantic space. It improves generation quality and speed by jointly training an encoder-demasker architecture, and it proposes new hybrid and multi-diffusion sampling algorithms.
Contribution
It presents a unified fine-tuning approach for diffusion models in language, enabling continuous latent representations and faster, higher-quality text synthesis.
Findings
Achieves more than 10x faster sampling in unconditional generation.
Demonstrates superior generation quality in experiments with LLaDA.
Introduces two novel diffusion algorithms: ConThenDisc and ConWithinDisc.
Abstract
Masked Diffusion Models (MDMs) provide an efficient non-causal alternative to autoregressive generation but often struggle with token dependencies and semantic incoherence due to their reliance on discrete marginal distributions. We address these limitations by shifting the diffusion process into a continuous sentence-level semantic space. We propose CRoCoDiL (Continuous and Robust Conditioned Diffusion for Language), a unified fine-tuning approach that jointly trains an encoder-demasker architecture, grounding the MDM demasking in continuous latent representations. This leads to the formation of a novel autoencoder in which decoding is obtained by an MDM algorithm. Relying on the same framework, we introduce two unconditional text synthesis algorithms: Continuous-Then-Discrete (ConThenDisc), a hybrid-diffusion approach that first generates latent representations in continuous space and…
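The abstract describes ConThenDisc only at a high level: it first generates a latent representation in continuous space. The sketch below shows that two-stage shape under stated assumptions. It assumes that the second stage decodes the latent with the latent-conditioned demasker, as the encoder-demasker autoencoder framing suggests. It is not the authors' implementation. The denoiser, the demasker, the simple interpolation update in stage 1, and the confidence-ordered unmasking in stage 2 are hypothetical stand-ins. Confidence-ordered unmasking is a common MDM decoding heuristic, not necessarily the paper's schedule.

```python
import random
from typing import Callable, List, Tuple

MASK = -1  # hypothetical sentinel id for a masked position

Latent = List[float]
Denoiser = Callable[[Latent, float], Latent]  # hypothetical: (noisy latent, noise level) -> clean estimate
Demasker = Callable[[List[int], Latent], List[Tuple[int, float]]]  # hypothetical: -> (token, confidence)


def sample_latent(dim: int, steps: int, denoiser: Denoiser, rng: random.Random) -> Latent:
    """Stage 1 (continuous): start from Gaussian noise and move toward the
    denoiser's clean-latent estimate; the last step (t = 1) returns it exactly."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(steps, 0, -1):
        z_hat = denoiser(z, t / steps)  # predicted clean latent at noise level t/steps
        alpha = 1.0 / t  # fraction of the remaining gap closed at this step
        z = [(1.0 - alpha) * zi + alpha * hi for zi, hi in zip(z, z_hat)]
    return z


def demask(length: int, latent: Latent, demasker: Demasker, steps: int) -> List[int]:
    """Stage 2 (discrete): start fully masked and, at each step, commit the
    most confident predictions among the still-masked positions."""
    tokens = [MASK] * length
    per_step = max(1, -(-length // steps))  # ceil(length / steps), at least 1
    while MASK in tokens:
        preds = demasker(tokens, latent)  # (token, confidence) for every position
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = preds[i][0]
    return tokens


def con_then_disc(length: int, dim: int, denoiser: Denoiser, demasker: Demasker,
                  rng: random.Random, latent_steps: int = 10,
                  mdm_steps: int = 3) -> Tuple[Latent, List[int]]:
    """Hypothetical ConThenDisc-style pipeline: continuous latent first, then discrete tokens."""
    latent = sample_latent(dim, latent_steps, denoiser, rng)
    return latent, demask(length, latent, demasker, mdm_steps)


# Toy stand-ins for trained networks (purely illustrative).
def toy_denoiser(z: Latent, noise_level: float) -> Latent:
    return [1.0, -1.0, 0.5]


def toy_demasker(tokens: List[int], latent: Latent) -> List[Tuple[int, float]]:
    offset = 1 if latent[0] > 0 else 0  # the latent steers which tokens are produced
    return [(2 * i + offset, 1.0 / (i + 1)) for i in range(len(tokens))]


latent, tokens = con_then_disc(5, 3, toy_denoiser, toy_demasker, random.Random(0))
print(tokens)  # [1, 3, 5, 7, 9]
```

The point of the split is that every discrete decision in stage 2 is conditioned on a single sentence-level latent fixed in stage 1, rather than on per-token marginals alone. The paper's actual networks, noise schedule, and unmasking order are not described in this summary.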
