TL;DR
This paper shows that extending supervised fine-tuning from response-only masking to full-sequence masking lets diffusion language models infill prompts. The infilled prompts match or surpass manually designed templates and transfer across models.
Contribution
The authors introduce a training modification enabling diffusion language models to perform prompt infilling, revealing that training practices, not architecture, limit this capability.
Findings
Infilled prompts match or outperform manual templates.
Model-infilled prompts transfer effectively across models.
Full-sequence masking unlocks prompt infilling in diffusion language models.
Abstract
Masked diffusion language models (dLMs) generate text through bidirectional denoising, yet this capability remains locked for infilling prompts. This limitation is an artifact of the current supervised fine-tuning (SFT) convention of applying response-only masking. To unlock this capability, we extend SFT to full-sequence masking, where both prompts and responses are masked jointly. Once unlocked, the model infills masked portions of a prompt template conditioned on few-shot examples. We show that such model-infilled prompts match or surpass manually designed templates, transfer effectively across models, and are complementary to existing prompt optimization methods. Our results suggest that training practices, not architectural limitations, are the primary bottleneck preventing masked diffusion language models from infilling effective prompts.
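The difference between response-only and full-sequence masking can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `corrupt`, the `[MASK]` token string, and the choice to mask each eligible token independently with probability `ratio` are all assumptions made for illustration. The only point it shows is which positions are eligible for masking under each convention.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, for illustration only


def corrupt(prompt, response, mode, ratio, rng):
    """Noise one SFT example; return (noised_tokens, masked_positions).

    mode="response_only": only response tokens may be masked (the usual convention).
    mode="full_sequence": prompt and response tokens may both be masked.
    Each eligible token is masked independently with probability `ratio` (assumed).
    """
    if mode == "response_only":
        start = len(prompt)
    elif mode == "full_sequence":
        start = 0
    else:
        raise ValueError(f"unknown mode: {mode}")

    noised = list(prompt) + list(response)
    masked_positions = []
    for i in range(start, len(noised)):
        if rng.random() < ratio:
            noised[i] = MASK
            masked_positions.append(i)  # the denoising loss would be taken here
    return noised, masked_positions


if __name__ == "__main__":
    rng = random.Random(0)
    prompt = ["Classify", "the", "sentiment", ":"]
    response = ["positive"]
    for mode in ("response_only", "full_sequence"):
        print(mode, corrupt(prompt, response, mode, 0.5, rng))
```

Under response-only masking the model never sees a masked prompt token during training, so it has no reason to learn to denoise prompt positions. Under full-sequence masking those positions can be masked as well, which is the condition the abstract identifies as unlocking prompt infilling.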
