2SSP: A Two-Stage Framework for Structured Pruning of LLMs

Fabrizio Sandri; Elia Cunegatti; Giovanni Iacca

arXiv:2501.17771 · cs.CL · August 19, 2025

TL;DR

This paper introduces 2SSP, a two-stage structured pruning framework for large language models that combines width and depth pruning to efficiently reduce model size while maintaining performance.

Contribution

The 2SSP framework integrates width and depth pruning with a mechanism that balances the sparsity rate of the two stages against the desired global sparsity, outperforming existing methods in both pruning time and accuracy.
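This summary does not spell out the balancing rule, so the following is only a hypothetical parameter-accounting sketch of how a global sparsity target could be split between the two stages. It assumes a LLaMA-style block (multi-head attention with four bias-free d×d projections and a gated FFN with gate, up and down projections), counts only those weights, and spreads the budget left after depth pruning evenly over all FFNs. `neurons_to_prune_per_block` is an illustrative name, not part of the official repository.

```python
def neurons_to_prune_per_block(
    sparsity: float, n_layers: int, d_model: int, d_ff: int, n_attn_removed: int
) -> int:
    """Hypothetical split of a global sparsity target between depth pruning
    (whole attention submodules) and width pruning (FFN neurons).
    Counts only Transformer-block linear weights; embeddings and norms are ignored."""
    attn_params = 4 * d_model * d_model  # q, k, v, o projections (MHA, no bias)
    ffn_params = 3 * d_model * d_ff      # gate, up, down projections
    total = n_layers * (attn_params + ffn_params)
    # Parameters still to remove after the attention submodules are dropped.
    budget = sparsity * total - n_attn_removed * attn_params
    if budget < 0:
        raise ValueError("removing these attention submodules overshoots the target")
    per_neuron = 3 * d_model  # one row of gate/up plus one column of down
    n = round(budget / (n_layers * per_neuron))
    if n > d_ff:
        raise ValueError("target sparsity cannot be reached by width pruning alone")
    return n
```

With LLaMA-2-7B-like dimensions (32 blocks, d_model = 4096, d_ff = 11008) and a 25% target, the sketch removes 4117 neurons per FFN when no attention submodule is dropped, and 3435 when four are dropped first; each dropped attention submodule spares about 171 neurons in every FFN.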

Findings

1. Outperforms five state-of-the-art pruning methods.
2. Achieves up to 50% sparsity with minimal perplexity increase.
3. Reduces pruning time by up to two orders of magnitude.

Abstract

We propose a novel Two-Stage framework for Structured Pruning (2SSP) for pruning Large Language Models (LLMs), which combines two different pruning strategies, namely Width and Depth Pruning. The first stage (Width Pruning) removes entire neurons, i.e., their corresponding rows and columns, aiming to preserve the connectivity among the pruned structures in the intermediate state of the Feed-Forward Networks in each Transformer block. This is done based on an importance score measuring the impact of each neuron on the output magnitude. The second stage (Depth Pruning), instead, removes entire Attention submodules. This is done by applying an iterative process that removes the Attention submodule with the minimum impact on a given metric of interest (in our case, perplexity). We also propose a novel mechanism to balance the sparsity rate of the two stages w.r.t. the desired global…
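A minimal NumPy sketch of the two stages, under stated assumptions: a LLaMA-style gated FFN with SiLU; a neuron score equal to the norm of that neuron's contribution to the FFN output on calibration tokens (one possible reading of "impact on the output magnitude", not necessarily the paper's exact score); and a caller-supplied `metric` callable (hypothetical) that returns, for example, calibration perplexity with a given set of attention submodules bypassed. All function names are illustrative.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def ffn_forward(x, w_gate, w_up, w_down):
    """LLaMA-style gated FFN. x: (tokens, d); w_gate, w_up: (d_ff, d); w_down: (d, d_ff)."""
    h = silu(x @ w_gate.T) * (x @ w_up.T)  # intermediate state, (tokens, d_ff)
    return h @ w_down.T, h


def width_prune(x_calib, w_gate, w_up, w_down, n_keep: int):
    """Stage 1 sketch: keep the n_keep intermediate neurons whose contribution
    to the FFN output on calibration tokens has the largest norm."""
    _, h = ffn_forward(x_calib, w_gate, w_up, w_down)
    # Frobenius norm of neuron j's output term h[:, j] w_down[:, j]^T
    # factorizes as ||h[:, j]|| * ||w_down[:, j]|| (assumed importance score).
    scores = np.linalg.norm(h, axis=0) * np.linalg.norm(w_down, axis=0)
    keep = np.sort(np.argsort(scores)[-n_keep:])
    # Dropping neuron j removes row j of gate/up and column j of down.
    return w_gate[keep], w_up[keep], w_down[:, keep]


def depth_prune(n_blocks: int, n_remove: int, metric) -> set[int]:
    """Stage 2 sketch: repeatedly drop the attention submodule whose removal gives
    the lowest metric(removed_set), e.g. calibration perplexity (hypothetical callable)."""
    removed: set[int] = set()
    for _ in range(n_remove):
        candidates = [i for i in range(n_blocks) if i not in removed]
        best = min(candidates, key=lambda i: metric(removed | {i}))
        removed.add(best)
    return removed
```

The greedy loop evaluates the metric once per remaining candidate at every step, so removing k of L attention submodules costs at most k·L metric evaluations.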


Code & Models

Repositories

fabriziosandri/2ssp (official)


Taxonomy

Topics: Semantic Web and Ontologies · Digital Rights Management and Security

Methods: Attention Is All You Need · Softmax · Adam · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer