DiffuMask: Diffusion Language Model for Token-level Prompt Pruning

Caleb Zheng; Jyotika Singh; Fang Tu; Weiyi Sun; Sujeeth Bharadwaj; Yassine Benajiba; Sujith Ravi; Eli Shlizerman; Dan Roth

arXiv:2604.06627·cs.CL·April 9, 2026

TL;DR

DiffuMask is a diffusion-based prompt pruning framework that compresses prompts by masking multiple tokens per denoising step, reducing prompt length by up to 80% while maintaining reasoning accuracy.

Contribution

DiffuMask introduces a diffusion-based method for parallel prompt pruning that combines shot-level and token-level pruning signals, speeding up compression relative to sequential token removal and giving tunable control over retained content.

Findings

01. Achieves up to 80% prompt length reduction.

02. Maintains or improves reasoning accuracy across various settings.

03. Accelerates prompt compression via parallel token masking.

Abstract

In-Context Learning and Chain-of-Thought prompting improve reasoning in large language models (LLMs). These typically come at the cost of longer, more expensive prompts that may contain redundant information. Prompt compression based on pruning offers a practical solution, yet existing methods rely on sequential token removal which is computationally intensive. We present DiffuMask, a diffusion-based framework integrating hierarchical shot-level and token-level pruning signals, that enables rapid and parallel prompt pruning via iterative mask prediction. DiffuMask substantially accelerates the compression process via masking multiple tokens in each denoising step. It offers tunable control over retained content, preserving essential reasoning context and achieving up to 80% prompt length reduction. Meanwhile, it maintains or improves accuracy across in-domain, out-of-domain, and…
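
The contrast the abstract draws between sequential token removal and masking several tokens per step can be illustrated with a toy loop. This is only a sketch under stated assumptions, not the authors' implementation: `prune_parallel`, `toy_score`, `keep_ratio`, `steps`, and the linear drop schedule are all hypothetical names and choices, and the length-based scorer is a stand-in for whatever learned mask predictor the paper uses.

```python
import math
from typing import Callable, List

# Hypothetical scorer type: given the tokens and the current keep-mask,
# return one importance score per token (higher = more worth keeping).
Scorer = Callable[[List[str], List[bool]], List[float]]


def toy_score(tokens: List[str], keep: List[bool]) -> List[float]:
    """Stand-in scorer (assumption): longer tokens count as more important."""
    return [float(len(tok)) for tok in tokens]


def prune_parallel(tokens: List[str], score: Scorer,
                   keep_ratio: float, steps: int) -> List[str]:
    """Drop the lowest-scoring tokens in a few steps, several tokens per step."""
    n = len(tokens)
    target_keep = max(1, math.ceil(n * keep_ratio))
    keep = [True] * n
    for step in range(1, steps + 1):
        kept_now = sum(keep)
        if kept_now <= target_keep:
            break
        # Linear schedule (assumption): how many tokens should survive this step.
        goal = max(target_keep, n - math.ceil((n - target_keep) * step / steps))
        scores = score(tokens, keep)  # one scoring pass per step, not per token
        candidates = sorted((i for i in range(n) if keep[i]),
                            key=lambda i: scores[i])
        for i in candidates[:kept_now - goal]:
            keep[i] = False  # mask several tokens at once
    return [tok for tok, k in zip(tokens, keep) if k]


prompt = "the answer is 4 because 2 plus 2 equals 4".split()
pruned = prune_parallel(prompt, toy_score, keep_ratio=0.5, steps=2)
```

In this sketch the scorer is called once per step rather than once per removed token, which is the source of the speed-up that parallel masking is meant to provide. Under these assumptions, a `keep_ratio` of 0.2 would correspond to the 80% reduction figure quoted in the abstract.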
