Thinking Inside the Mask: In-Place Prompting in Diffusion LLMs
Xiangqi Jin, Yuxuan Wang, Yifeng Gao, Zichen Wen, Biqing Qi, Dongrui Liu, Linfeng Zhang

TL;DR
This paper introduces ICE, a novel in-place prompting framework for diffusion large language models that enhances flexibility and efficiency through masked token prompts and early exit strategies, leading to significant accuracy and speed improvements.
Contribution
ICE transforms prefix-only prompting into in-place prompting for dLLMs, enabling bidirectional information flow and reducing computational costs with an early exit mechanism.
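To make the contrast concrete, here is a minimal Python sketch of the two sequence layouts. It is an illustration under assumptions, not the authors' code: the `[MASK]` placeholder string, the function names, and the cue tokens are all hypothetical, and tokens are plain strings rather than model vocabulary IDs.

```python
from typing import List

MASK = "[MASK]"  # hypothetical placeholder for a not-yet-decoded position


def prefix_only_layout(question: List[str], gen_len: int) -> List[str]:
    """Conventional layout: the prompt comes first, then a fully masked response."""
    return question + [MASK] * gen_len


def in_place_layout(
    question: List[str],
    thinking_len: int,
    answer_cue: List[str],
    answer_len: int,
) -> List[str]:
    """Assumed in-place layout: cue tokens are written inside the response region,
    so masked positions on both sides of the cue can attend to it."""
    thinking = [MASK] * thinking_len
    answer = [MASK] * answer_len
    return question + thinking + answer_cue + answer


def masked_positions(seq: List[str]) -> List[int]:
    """Indices that the iterative refinement still has to fill in."""
    return [i for i, tok in enumerate(seq) if tok == MASK]


seq = in_place_layout(["Q:", "2+3=?"], 3, ["The", "answer", "is"], 1)
print(seq)
print(masked_positions(seq))
```

In this toy layout the cue "The answer is" sits between a masked reasoning span and a masked answer slot, which is only useful when the model attends bidirectionally, as the summary says dLLMs do.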
Findings
Achieves up to 17.29% accuracy improvement on GSM8K.
Realizes 4.12× speedup on GSM8K.
Attains 276.67× acceleration on MMLU.
Abstract
Although large language models (LLMs) have achieved remarkable success, their prefix-only prompting paradigm and sequential generation process offer limited flexibility for bidirectional information. Diffusion large language models (dLLMs) present new opportunities through their bidirectional attention mechanisms and iterative refinement processes, enabling more flexible in-place prompting strategies. We introduce ICE (In-Place Chain-of-Thought Prompting with Early Exit), a novel framework that transforms prefix-only prompting into in-place prompting specifically designed for dLLMs. ICE integrates in-place prompts directly within masked token positions during iterative refinement and employs a confidence-aware early exit mechanism to significantly reduce computational overhead. Extensive experiments demonstrate ICE's effectiveness, achieving up to 17.29% accuracy improvement with…
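The confidence-aware early exit can be pictured with a toy refinement loop. This is a sketch under stated assumptions rather than the paper's algorithm: the exit rule (commit every remaining position once all still-masked answer positions reach a threshold), the one-token-per-step unmasking schedule, and the names `refine_with_early_exit` and `toy_predict` are hypothetical, and the predictor is a deterministic stand-in for a real dLLM.

```python
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"  # hypothetical placeholder for a not-yet-decoded position
Prediction = Tuple[str, float]  # (proposed token, confidence in [0, 1])


def refine_with_early_exit(
    seq: List[str],
    answer_positions: List[int],
    predict: Callable[[List[str]], Dict[int, Prediction]],
    threshold: float = 0.9,
    max_steps: int = 64,
) -> Tuple[List[str], int]:
    """Unmask one position per step; stop early once the answer is confident.

    Returns the decoded sequence and the number of model calls used.
    """
    seq = list(seq)
    for step in range(1, max_steps + 1):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            return seq, step - 1
        preds = predict(seq)  # one (token, confidence) per masked position
        pending = [i for i in answer_positions if seq[i] == MASK]
        if pending and all(preds[i][1] >= threshold for i in pending):
            # Assumed exit rule: commit every remaining position in one shot.
            for i in masked:
                seq[i] = preds[i][0]
            return seq, step
        # Otherwise commit only the single most confident position.
        best = max(masked, key=lambda i: preds[i][1])
        seq[best] = preds[best][0]
    return seq, max_steps


def toy_predict(seq: List[str]) -> Dict[int, Prediction]:
    """Stand-in model: confidence grows as more context is revealed."""
    revealed = sum(tok != MASK for tok in seq)
    conf = min(1.0, revealed / 4)
    return {i: (f"tok{i}", conf) for i, tok in enumerate(seq) if tok == MASK}


start = ["Q", MASK, MASK, MASK, "ans:", MASK]
decoded, calls = refine_with_early_exit(start, [5], toy_predict, threshold=0.7)
_, full_calls = refine_with_early_exit(start, [5], toy_predict, threshold=1.1)
print(decoded, calls, full_calls)
```

With the toy predictor, the early-exit run finishes in 2 model calls while the run with an unreachable threshold needs 4, which mirrors in miniature how skipping refinement steps could reduce computation.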
