Demystifying When Pruning Works via Representation Hierarchies

Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, and Ang Li

arXiv:2603.24652 · cs.CL · May 13, 2026

TL;DR

This paper investigates why network pruning preserves performance on non-generative language tasks but often fails on generative ones, by analyzing how pruning perturbs three internal representation spaces of a language model: embedding, logit, and probability.

Contribution

The paper introduces a representation-hierarchy perspective that explains why the impact of pruning varies across language tasks, and it offers practical guidance for applying pruning.

Findings

01

Representations in embedding and logit spaces are robust to pruning.

02

The nonlinear transformation from logits to probabilities amplifies pruning-induced deviations, which accumulate across time steps and degrade generation.

03

Pruning is effective for non-generative tasks due to stable probability subspaces.
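
The second finding can be illustrated with a small numeric sketch. The Python below is not the paper's code: it draws random logits, adds Gaussian noise as a hypothetical stand-in for a pruning-induced perturbation, and compares the two vectors before and after softmax. The function names, vocabulary size, and noise scale are all assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def total_variation(p, q):
    """Total variation distance, a value between 0 and 1."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def compare_spaces(vocab_size=1000, logit_scale=5.0, noise_scale=0.5, seed=0):
    """Perturb synthetic logits and measure the change in both spaces.

    The Gaussian noise is an assumed proxy for pruning error, not a
    measurement taken from any pruned model.
    """
    rng = random.Random(seed)
    logits = [rng.gauss(0.0, logit_scale) for _ in range(vocab_size)]
    perturbed = [x + rng.gauss(0.0, noise_scale) for x in logits]
    p, q = softmax(logits), softmax(perturbed)
    return {
        "logit_cosine": cosine(logits, perturbed),
        "kl": kl_divergence(p, q),
        "total_variation": total_variation(p, q),
    }


if __name__ == "__main__":
    for name, value in compare_spaces().items():
        print(f"{name}: {value:.4f}")
```

Calling compare_spaces() returns the cosine similarity of the logit vectors alongside the KL divergence and total variation distance of the resulting distributions, so the two spaces can be inspected side by side. Because softmax exponentiates its inputs, an additive shift in a logit becomes a multiplicative change in the corresponding probability, which is one plausible reading of the amplification the paper describes.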

Abstract

Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To understand this discrepancy, we analyze network pruning from a representation-hierarchy perspective, decomposing the internal computation of language models into three sequential spaces: embedding (hidden representations), logit (pre-softmax outputs), and probability (post-softmax distributions). We find that representations in the embedding and logit spaces are largely robust to pruning-induced perturbations. However, the nonlinear transformation from logits to probabilities amplifies these deviations, which accumulate across time steps and lead to substantial…
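
The claim that deviations accumulate across time steps can be sketched with a toy autoregressive sampler. This is an illustration under strong assumptions, not the paper's experimental setup: the "model" is a hypothetical table in which next-token logits depend only on the previous token, pruning is imitated by Gaussian noise added to that table, and both models sample with the same uniform random numbers so that any difference in output comes from the perturbed probabilities alone.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_uniform(probs, u):
    """Inverse-CDF sampling: return the first index whose cumulative mass exceeds u."""
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against rounding when u is very close to 1


def decode(table, start, uniforms):
    """Generate one token per uniform draw; each step conditions on the last token."""
    seq = [start]
    for u in uniforms:
        probs = softmax(table[seq[-1]])
        seq.append(sample_with_uniform(probs, u))
    return seq


def first_divergence(a, b):
    """Index of the first position where the sequences differ, or None."""
    for t, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return t
    return None


def run(vocab_size=50, steps=40, noise_scale=0.3, seed=0):
    """Decode with a dense table and a noise-perturbed copy (assumed pruning proxy)."""
    rng = random.Random(seed)
    dense = [[rng.gauss(0.0, 2.0) for _ in range(vocab_size)]
             for _ in range(vocab_size)]
    perturbed = [[x + rng.gauss(0.0, noise_scale) for x in row] for row in dense]
    uniforms = [rng.random() for _ in range(steps)]
    a = decode(dense, 0, uniforms)
    b = decode(perturbed, 0, uniforms)
    return a, b, first_divergence(a, b)


if __name__ == "__main__":
    dense_seq, perturbed_seq, t = run()
    print("first divergence at step:", t)
```

Setting noise_scale=0.0 reproduces the dense sequence exactly. With a nonzero value, the returned index marks the first step at which the two models pick different tokens; from that point on they condition on different contexts, so a single shifted probability can change the rest of the sequence.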

Code & Models

Repositories

CASE-Lab-UMD/Pruning-on-Representations (GitHub)
