Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly

Hengchang Liu; Zhao Yang; Bing Su

arXiv:2602.00476·cs.LG·February 3, 2026

TL;DR

Diffusion language models can implicitly determine optimal infilling lengths, and a calibration method called CAL enhances their performance in code and text infilling tasks without additional training.

Contribution

This paper introduces CAL, a training-free calibration method that enables diffusion language models to approximate optimal infilling lengths by exploiting statistical signals in denoising confidence.

Findings

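01 CAL improves Pass@1 by up to 47.7% over fixed-length baselines in code infilling.

02 CAL boosts BLEU-2 and ROUGE-L scores by up to 8.5% and 9.9%, respectively.

03 DLMs have an inherent ability to discover the correct infilling length: their first-step denoising confidence shows a local Oracle Peak near the ground-truth length.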

Abstract

Diffusion language models (DLMs) provide a bidirectional generation framework naturally suited for infilling, yet their performance is constrained by the pre-specified infilling length. In this paper, we reveal that DLMs possess an inherent ability to discover the correct infilling length. We identify two key statistical phenomena in the first-step denoising confidence: a local Oracle Peak that emerges near the ground-truth length and a systematic Length Bias that often obscures this signal. By leveraging this signal and calibrating the bias, our training-free method CAL (Calibrated Adaptive Length) enables DLMs to approximate the optimal length through an efficient search before formal decoding. Empirical evaluations demonstrate that CAL improves Pass@1 by up to 47.7% over fixed-length baselines and 40.5% over chat-based adaptive…
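To make the interplay between the Oracle Peak and the Length Bias concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the abstract does not say how CAL calibrates the bias or how its search proceeds. The sketch assumes the bias can be approximated by a linear trend in length, uses a synthetic confidence curve in place of a real DLM, and the names `calibrate` and `pick_length` are hypothetical.

```python
import numpy as np

def calibrate(lengths, confidences):
    """Subtract a fitted linear trend from the confidence curve.

    Assumption for illustration only: the length-dependent bias is
    roughly linear. The paper's actual calibration may differ.
    """
    lengths = np.asarray(lengths, dtype=float)
    confidences = np.asarray(confidences, dtype=float)
    slope, intercept = np.polyfit(lengths, confidences, deg=1)
    return confidences - (slope * lengths + intercept)

def pick_length(lengths, confidences):
    """Return the candidate length with the largest calibrated confidence."""
    residual = calibrate(lengths, confidences)
    return int(np.asarray(lengths)[int(np.argmax(residual))])

# Synthetic stand-in for first-step denoising confidence per candidate length.
# A real system would obtain one score per length from the DLM itself.
lengths = np.arange(1, 33)
length_bias = 0.9 - 0.01 * lengths                                  # favors short fills
oracle_peak = 0.08 * np.exp(-0.5 * ((lengths - 12) / 1.5) ** 2)      # bump at length 12
confidences = length_bias + oracle_peak

raw_choice = int(lengths[int(np.argmax(confidences))])
calibrated_choice = pick_length(lengths, confidences)
print("raw argmax:", raw_choice)                # 1: the bias hides the peak
print("calibrated argmax:", calibrated_choice)  # 12: the peak is recovered
```

In this toy setting the uncalibrated maximum falls on the shortest candidate because the bias dominates, while removing the trend exposes the local peak at the planted length of 12.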

Code & Models

Models

🤗 Hengchang-Liu/CAL (model)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis