Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

Sterling Huang; Abigayle Brown; Jiyoo Noh; Jiakang Xu; Wantong Huo; Kaung Myat Kyaw; Jonathan Chan

arXiv:2605.17932·cs.CL·May 19, 2026

TL;DR

This paper evaluates how prompt compression affects diffusion large language models such as LLaDA. It finds that summarization remains relatively robust under compression while mathematical reasoning degrades, which points to the need for diffusion-specific compression methods.

Contribution

It provides the first systematic evaluation of prompt compression transferability from autoregressive to diffusion LLMs, demonstrating the limitations and challenges involved.

Findings

1. Summarization tasks are relatively robust under prompt compression.

2. Mathematical reasoning performance degrades significantly despite high semantic similarity.

3. Compression failures are mainly due to information omission rather than semantic drift.
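
To make the distinction between information omission and semantic drift concrete, the sketch below flags numeric values that appear in an original prompt but not in its compressed form. This is an illustrative check only, not the analysis used in the paper; the function name `dropped_numbers` and the example prompts are hypothetical.

```python
import re
from typing import List

# Matches plain integers and decimals; an assumption that suits simple word problems.
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def dropped_numbers(original: str, compressed: str) -> List[str]:
    """Return numbers found in `original` but missing from `compressed`,
    in order of first appearance and without duplicates."""
    kept = set(NUMBER.findall(compressed))
    dropped: List[str] = []
    for value in NUMBER.findall(original):
        if value not in kept and value not in dropped:
            dropped.append(value)
    return dropped

# Hypothetical prompts: the compressed version reads similarly but loses one operand.
original = "Tom has 12 apples and buys 5 more, then gives away 3."
compressed = "Tom 12 apples buys more gives away 3"
print(dropped_numbers(original, compressed))
```

A compressed prompt like the one above can stay semantically close to the original while losing a quantity the answer depends on, which is the kind of omission the third finding describes.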

Abstract

Prompt compression reduces inference cost and context length in large language models, but prior evaluations focus primarily on autoregressive architectures. This study investigates whether prompt compression with LLMLingua-2 transfers effectively to diffusion large language models (DLLMs), specifically the 8B-parameter LLaDA. We evaluate compression on GSM8K, DUC2004, and ShareGPT, using 250 prompts per dataset at an approximate $2\times$ compression ratio, across mathematical reasoning, prompt reconstruction, and summarization tasks. Outputs generated from original prompts, compressed prompts, reconstructed prompts, and reconstructed-prompt reasoning were compared using exact-match accuracy, BLEU, ROUGE, and BERTScore. Results show that semantic preservation does not necessarily imply stable downstream behavior in diffusion models. Summarization tasks remained…
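
Two of the quantities named in the abstract, the compression ratio and exact-match accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's evaluation code: the ratio is computed over whitespace-separated tokens (the paper may count model tokens instead), and an answer is taken to be the last number in a string, which fits GSM8K-style references ending in `#### <number>`. All function names are hypothetical.

```python
import re
from typing import List, Optional

def compression_ratio(original: str, compressed: str) -> float:
    """Original length divided by compressed length, in whitespace tokens.
    A value of 2.0 means the compressed prompt is half as long."""
    n_compressed = len(compressed.split())
    if n_compressed == 0:
        raise ValueError("compressed prompt is empty")
    return len(original.split()) / n_compressed

def final_number(text: str) -> Optional[float]:
    """Return the last number in `text` as a float, or None if there is none.
    Thousands separators such as '1,234' are stripped (an assumption)."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final number equals the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    hits = 0
    for pred, ref in zip(predictions, references):
        p, r = final_number(pred), final_number(ref)
        if p is not None and p == r:
            hits += 1
    return hits / len(references)

# Hypothetical usage with made-up model outputs.
print(compression_ratio("a b c d", "a c"))
print(exact_match_accuracy(["So the answer is 18.", "She has 7 apples"],
                           ["#### 18", "#### 9"]))
```

Comparing exact-match accuracy between original and compressed prompts at a fixed ratio is what separates downstream stability from surface similarity scores such as BLEU, ROUGE, or BERTScore.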
