Sink-Aware Pruning for Diffusion Language Models

Aidar Myrzakhan; Tianyi Li; Bowei Guo; Shengkun Tang; Zhiqiang Shen

arXiv:2602.17664·cs.CL·February 20, 2026

TL;DR

This paper introduces Sink-Aware Pruning, a method that reduces the inference cost of Diffusion Language Models (DLMs) by identifying and pruning unstable attention sinks, achieving a better quality-efficiency trade-off without retraining.

Contribution

The paper shows that attention sinks in DLMs are often transient, unlike the stable sinks of autoregressive (AR) models, and proposes a pruning method that identifies and prunes unstable sinks instead of preserving them.

Findings

01

Outperforms prior pruning baselines in quality-efficiency trade-off

02

Pruning unstable sinks improves inference efficiency without retraining

03

Attention-sink position variance over the generation trajectory is higher in DLMs than in AR models

Abstract

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full generation trajectory (measured by how the dominant sink locations shift across timesteps), indicating that sinks are often transient and less structurally essential than in AR models. Based on this observation, we propose Sink-Aware Pruning, which automatically identifies and prunes unstable sinks in DLMs (whereas prior studies usually keep sinks for AR LLMs). Without retraining, our method achieves a better quality-efficiency trade-off and outperforms strong prior…
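
The abstract describes sink stability as how much the dominant sink location shifts across timesteps. The following is a minimal sketch of one way such a measurement could be computed, not the authors' implementation. It assumes attention is available as a nested list attn[t][q][k] (timestep, query, key), takes the key with the largest total incoming attention as the dominant sink at each timestep, and reports the population variance of those positions. The function names and the toy attention generator are hypothetical.

```python
from statistics import pvariance


def dominant_sink_positions(attn):
    """Return, for each timestep, the key index receiving the most attention.

    attn[t][q][k] is the attention weight from query q to key k at timestep t.
    This layout is an assumption made for illustration.
    """
    positions = []
    for step in attn:
        num_keys = len(step[0])
        # Total attention each key receives, summed over all queries.
        incoming = [sum(row[k] for row in step) for k in range(num_keys)]
        positions.append(max(range(num_keys), key=incoming.__getitem__))
    return positions


def sink_position_variance(attn):
    """Population variance of the dominant sink index across timesteps."""
    return float(pvariance(dominant_sink_positions(attn)))


def make_attention(sink_positions, num_queries=4, num_keys=8, sink_mass=0.9):
    """Toy data: at each timestep, every query puts sink_mass on one key."""
    rest = (1.0 - sink_mass) / (num_keys - 1)
    return [
        [[sink_mass if k == pos else rest for k in range(num_keys)]
         for _ in range(num_queries)]
        for pos in sink_positions
    ]


# A sink fixed at position 0 (AR-like) versus one that moves (DLM-like).
stable = make_attention([0, 0, 0, 0])
drifting = make_attention([0, 5, 2, 7])

stable_var = sink_position_variance(stable)
drifting_var = sink_position_variance(drifting)
print(stable_var, drifting_var)
```

In this toy setup the fixed sink yields zero variance while the moving sink yields a positive value, which mirrors the qualitative contrast the abstract draws between AR and DLM sinks. How the paper aggregates over heads and layers, and how it turns the measurement into a pruning decision, is not stated in the text above.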

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques