DARE: Diffusion Language Model Activation Reuse for Efficient Inference

Natalia Frumkin; Bokun Wang; Hung-Yueh Chiang; Chi-Chih Chang; Mohamed S. Abdelfattah; Diana Marculescu

arXiv:2605.08134 · cs.LG · May 12, 2026

TL;DR

DARE introduces token-wise activation reuse for diffusion language models, achieving up to a 1.20x per-layer latency reduction with negligible degradation on reasoning and code-generation benchmarks. It can be combined with existing methods for further efficiency gains.

Contribution

The paper proposes two token-wise reuse mechanisms for diffusion LLMs, DARE-KV and DARE-O, that improve inference efficiency without retraining and with minimal performance loss.

Findings

01. Up to 1.20x per-layer latency reduction.

02. Reuses up to 87% of attention activations.

03. Negligible degradation on reasoning and code-generation benchmarks.

Abstract

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to auto-regressive (AR) models, offering greater expressive capacity and potential for parallel generation and faster inference. However, open-source dLLMs remain immature, lagging behind AR models in both efficiency and quality. We identify an underexplored property of dLLMs: *token-wise redundancy* in bi-directional self-attention. Self-attention activations are highly correlated across tokens, and temporal changes in query representations can predict redundancy in corresponding key, value, and output activations. We introduce DARE, with two complementary mechanisms: DARE-KV, which reuses cached key-value (KV) activations, and DARE-O, which reuses output activations to reduce redundant computation while preserving quality. DARE achieves up to 1.20x per-layer latency reduction and reuses up to 87% of…
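The idea that a token's query change between denoising steps can signal whether its cached key and value are still usable can be illustrated with a toy sketch. This is not the authors' implementation: the cosine-similarity test, the `threshold` value, and the names `tokens_to_recompute`, `refresh_kv_cache`, and `project_kv` are all assumptions made for illustration, and the abstract does not specify the actual criterion DARE uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tokens_to_recompute(prev_queries, curr_queries, threshold=0.95):
    """Indices of tokens whose query moved enough to treat their cache as stale.

    Assumption: similarity below `threshold` means the cached activations
    for that token should not be reused. The threshold is hypothetical.
    """
    return [
        i
        for i, (prev_q, curr_q) in enumerate(zip(prev_queries, curr_queries))
        if cosine_similarity(prev_q, curr_q) < threshold
    ]


def refresh_kv_cache(kv_cache, hidden_states, stale, project_kv):
    """Return a new cache: stale tokens get fresh (key, value), others are reused.

    `project_kv` is a hypothetical callable mapping one hidden state to a
    (key, value) pair; in a real model it would be the K/V projections.
    """
    stale_set = set(stale)
    return [
        project_kv(h) if i in stale_set else kv_cache[i]
        for i, h in enumerate(hidden_states)
    ]


# Toy example: three tokens with 2-dimensional queries at two consecutive steps.
prev_queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
curr_queries = [[1.0, 0.01], [1.0, 0.0], [1.0, 1.0]]

stale = tokens_to_recompute(prev_queries, curr_queries)
reuse_fraction = 1 - len(stale) / len(curr_queries)
print("recompute:", stale, "reuse fraction:", round(reuse_fraction, 2))
```

In this toy input only the second token's query changes direction, so only its key and value would be recomputed and the other two thirds of the cache are reused. The same gating could in principle be applied to attention outputs rather than keys and values, which is the role the abstract assigns to DARE-O.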

Code & Models

Repositories

enyac-group/DARE (GitHub)
