Layer Collapse in Diffusion Language Models

Alexander Conzelmann; Albert Catalan-Tatjer; Shiwei Liu

arXiv:2605.06366 · cs.LG · May 12, 2026

TL;DR

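This paper identifies a layer-collapse phenomenon in diffusion language models: a few early layers share highly similar activation patterns dominated by a single large outlier. The outlier is critical to output quality despite its apparent redundancy, and the resulting redundancy profile affects how these models should be compressed.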

Contribution

The paper attributes layer collapse in DLMs to overtraining rather than undertraining, and draws out the implications for model compression and deployment.

Findings

01

Layer collapse involves a dominant outlier critical for model output.

02

DLMs are more robust to quantization and pruning than autoregressive models.

03

Optimal sparsity allocation varies significantly between DLMs and AR models.
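
To make "sparsity allocation" concrete, here is a minimal sketch of magnitude pruning with a per-layer sparsity budget. It is not the paper's method: the random weight matrices, the `magnitude_prune` helper, and both allocations (`uniform`, `early_heavy`) are hypothetical, chosen only to show two budgets with the same average sparsity distributed differently across layers.

```python
import numpy as np

def magnitude_prune(weight, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(round(sparsity * weight.size))
    if k == 0:
        return weight.copy()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    pruned = weight.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def achieved_sparsity(weights, allocation):
    """Fraction of zeroed entries per layer after pruning with `allocation`."""
    return [float(np.mean(magnitude_prune(w, s) == 0.0))
            for w, s in zip(weights, allocation)]

# Synthetic stand-in weights (assumption: 4 layers of 64x64, not a real model).
rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)) for _ in range(4)]

# Two hypothetical allocations with the same 50% average sparsity.
uniform = [0.5, 0.5, 0.5, 0.5]
early_heavy = [0.7, 0.6, 0.4, 0.3]  # prunes earlier layers harder

uniform_result = achieved_sparsity(layers, uniform)
early_heavy_result = achieved_sparsity(layers, early_heavy)
```

Comparing a downstream quality metric under each allocation is how one would test which layers tolerate pruning best. That evaluation is left out here because it needs a real model.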

Abstract

Diffusion language models (DLMs) have recently emerged as competitive alternatives to autoregressive (AR) language models, yet differences in their activation dynamics remain poorly understood. We characterize these dynamics in LLaDA-8B and identify a striking layer-collapse property: a few early layers exhibit highly similar, collapsed activation patterns dominated by a single large super-outlier persisting over a long token range. Despite its apparent redundancy, this outlier is critical: pruning it causes outputs to degrade into repetitive random token loops. Paradoxically, layers in LLaDA contain more redundant representations overall, with redundancy most pronounced in earlier layers -- the reverse of AR models, where deeper layers grow redundant due to undertraining. Our analysis indicates that layer collapse in DLMs is not driven by undertraining but by overtraining: a dominant…
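
The two quantities the abstract describes, similarity between layers and dominance of a single activation, can be probed roughly as follows. This is a sketch under stated assumptions and not the authors' analysis code: the activations are synthetic (a large value is planted by hand in two early layers), and the function names `layer_cosine_similarity` and `outlier_ratio`, as well as the max-over-median ratio itself, are illustrative choices.

```python
import numpy as np

def layer_cosine_similarity(hidden_states):
    """Mean per-token cosine similarity between every pair of layers.

    hidden_states: array of shape (num_layers, num_tokens, hidden_dim).
    Returns an array of shape (num_layers, num_layers).
    """
    norms = np.linalg.norm(hidden_states, axis=-1, keepdims=True)
    unit = hidden_states / np.maximum(norms, 1e-12)
    # Dot product per token for each layer pair, then average over tokens.
    return np.einsum("itd,jtd->ij", unit, unit) / hidden_states.shape[1]

def outlier_ratio(hidden_states):
    """Per layer: largest absolute activation over the median absolute activation."""
    mags = np.abs(hidden_states).reshape(hidden_states.shape[0], -1)
    return mags.max(axis=1) / np.median(mags, axis=1)

# Synthetic stand-in data (assumption: NOT real LLaDA-8B activations).
rng = np.random.default_rng(0)
num_layers, num_tokens, hidden_dim = 6, 16, 32
states = rng.normal(size=(num_layers, num_tokens, hidden_dim))

# Plant one very large channel in layers 1 and 2 across all tokens,
# mimicking a single outlier that persists over a token range.
states[1:3, :, 5] = 500.0

sim = layer_cosine_similarity(states)
ratios = outlier_ratio(states)
```

In this toy setup the two layers carrying the planted value become nearly identical under cosine similarity, because the single large channel dominates every token vector. That is one simple way a dominant outlier can make layers look collapsed.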

Code & Models

Repositories

Conzel/super-outlier-dlm (GitHub)
