CtrlDiff: Boosting Large Diffusion Language Models with Dynamic Block Prediction and Controllable Generation

Chihan Huang; Hao Tang

arXiv:2505.14455·cs.CL·October 23, 2025

TL;DR

CtrlDiff is a semi-autoregressive diffusion model that adaptively segments the generated text into variable-length blocks, improving controllability and efficiency while narrowing the performance gap with autoregressive models.

Contribution

It introduces a dynamic block prediction mechanism and a classifier-guided control method for diffusion-based language models, addressing two limitations of existing block-wise diffusion models: fixed-length generation and limited controllability.
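The page does not spell out CtrlDiff's exact guidance rule. A common way to steer a discrete denoiser after training, without retraining it, is to add a scaled classifier log-probability to the denoiser's logits before sampling a masked token. The sketch below assumes that generic rule; `guided_distribution`, `guidance_scale`, and the toy sentiment classifier are hypothetical illustrations, not the authors' code. It also simplifies by scoring each candidate token on its own, whereas a practical classifier would score the whole partially denoised sequence.

```python
import math
from typing import Callable, Dict


def guided_distribution(denoiser_logits: Dict[str, float],
                        classifier_logprob: Callable[[str], float],
                        guidance_scale: float = 1.0) -> Dict[str, float]:
    """Classifier-guided distribution over candidate tokens at one masked position.

    Computes softmax(logit(token) + guidance_scale * log p(c | token)). The
    denoiser's normalising constant cancels inside the softmax, so raw logits suffice.
    """
    scores = {tok: logit + guidance_scale * classifier_logprob(tok)
              for tok, logit in denoiser_logits.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Toy attribute classifier (hypothetical values): log p(positive | token).
POSITIVE = {"great": math.log(0.9), "okay": math.log(0.5), "awful": math.log(0.05)}

logits = {"great": 0.0, "okay": 0.0, "awful": 0.0}  # denoiser is indifferent
probs = guided_distribution(logits, POSITIVE.__getitem__, guidance_scale=2.0)
best = max(probs, key=probs.get)
print(best)  # great
```

Setting `guidance_scale=0.0` recovers the unguided denoiser distribution, and larger values push sampling harder toward the target attribute. This is why guidance of this kind can be applied post hoc, with only a separately trained classifier.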

Findings

1. Outperforms previous diffusion models in quality and controllability
2. Enables efficient post-hoc conditional generation without retraining
3. Reduces computational overhead compared to existing methods

Abstract

Although autoregressive models have dominated language modeling in recent years, there has been growing interest in alternatives to the conventional next-token prediction framework. Diffusion-based language models have emerged as a compelling alternative due to their powerful parallel generation capabilities and inherent editability. However, these models are often constrained by fixed-length generation. A promising direction is to combine the strengths of both paradigms: segment sequences into blocks, model autoregressive dependencies across blocks, and use discrete diffusion to estimate the conditional distribution within each block given the preceding context. Nevertheless, the practical application of such hybrid models is often hindered by two key limitations: rigid fixed-length outputs and a lack of flexible control mechanisms. In this work, we address the critical…
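The block-wise scheme described in the abstract (autoregressive across blocks, discrete diffusion inside each block) can be sketched as a decoding loop. This sketch assumes a masked (absorbing-state) diffusion denoiser that proposes a token for every position and unmasks a share of the remaining masked positions at each step. `predict_block_len` stands in for a dynamic block-size predictor. All function names, the unmasking schedule, and the toy denoiser are hypothetical and are not taken from the CtrlDiff paper.

```python
import math
import random
from typing import Callable, List, Sequence

MASK = "[MASK]"
# A denoiser maps (clean prefix, partially masked block) to one proposed token per block position.
Denoiser = Callable[[Sequence[str], List[str]], List[str]]


def decode_block(prefix: Sequence[str], block_len: int, denoise_step: Denoiser,
                 num_steps: int, rng: random.Random) -> List[str]:
    """Fill one block by iterative unmasking, conditioned on the already-generated prefix."""
    block = [MASK] * block_len
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        # Reveal an equal share of the remaining masked positions; the last step reveals all.
        k = math.ceil(len(masked) / (num_steps - step))
        proposal = denoise_step(prefix, block)
        for i in rng.sample(masked, k):
            block[i] = proposal[i]
    return block


def generate(prompt: List[str], predict_block_len: Callable[[Sequence[str]], int],
             denoise_step: Denoiser, max_tokens: int, num_steps: int = 4,
             eos: str = "<eos>", seed: int = 0) -> List[str]:
    """Semi-autoregressive loop: blocks are produced left to right, each by diffusion."""
    rng = random.Random(seed)
    seq = list(prompt)
    while len(seq) - len(prompt) < max_tokens:
        remaining = max_tokens - (len(seq) - len(prompt))
        # Hypothetical dynamic block size, clipped to the remaining token budget.
        block_len = max(1, min(predict_block_len(seq), remaining))
        block = decode_block(seq, block_len, denoise_step, num_steps, rng)
        seq.extend(block)
        if eos in block:  # stop once the model emits an end-of-sequence token
            break
    return seq


# Toy stand-ins (hypothetical): the denoiser writes position-indexed tokens,
# and the block-length "predictor" varies the block size with the prefix length.
def toy_denoise(prefix: Sequence[str], block: List[str]) -> List[str]:
    return [f"t{len(prefix) + i}" for i in range(len(block))]


out = generate(["<bos>"], lambda seq: 2 + len(seq) % 3, toy_denoise, max_tokens=7)
print(out)  # ['<bos>', 't1', 't2', 't3', 't4', 't5', 't6', 't7']
```

In this toy run, the blocks have lengths 3, 3 and 1: the last block is clipped to the remaining budget. Each block is generated in parallel across a few denoising steps, while later blocks condition on earlier ones. A learned length predictor would replace the lambda and decide where each block should end.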

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Diffusion