SlimLLM: Accurate Structured Pruning for Large Language Models

Jialong Guo; Xinghao Chen; Yehui Tang; Yunhe Wang

arXiv:2505.22689·cs.LG·May 30, 2025

TL;DR

SlimLLM is a fast structured pruning method for large language models that scores each sub-module (a channel or an attention head) as a whole, enabling significant compression with minimal performance loss.

Contribution

The paper presents a novel holistic importance evaluation and a linear regression-based recovery strategy for structured pruning of LLMs, achieving state-of-the-art results.
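
The recovery idea can be illustrated with a minimal sketch. It assumes the simplest possible setting: one output channel, a handful of calibration samples, and an ordinary least-squares fit of a scale and a bias that map the pruned layer's output back toward the original output. The function names and toy values are hypothetical, and this is a generic regression-based compensation, not necessarily the exact formulation used in the paper.

```python
# Generic sketch of regression-based output recovery after pruning.
# Assumption: a per-channel scale and bias are fitted by ordinary least squares;
# the paper's exact recovery strategy may differ.

def fit_scale_and_bias(pruned, original):
    """Fit original ~= scale * pruned + bias by simple linear regression."""
    n = len(pruned)
    mean_x = sum(pruned) / n
    mean_y = sum(original) / n
    var_x = sum((x - mean_x) ** 2 for x in pruned)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pruned, original))
    # Fall back to the identity scale if the pruned output is constant.
    scale = cov_xy / var_x if var_x > 0 else 1.0
    bias = mean_y - scale * mean_x
    return scale, bias


def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Toy calibration data for one output channel (hypothetical values).
original_out = [2.5, 4.5, 6.5, 8.5]   # output of the unpruned layer
pruned_out = [1.0, 2.0, 3.0, 4.0]     # output of the same layer after pruning

scale, bias = fit_scale_and_bias(pruned_out, original_out)
recovered_out = [scale * x + bias for x in pruned_out]

error_before = mean_squared_error(pruned_out, original_out)
error_after = mean_squared_error(recovered_out, original_out)
print(scale, bias, error_before, error_after)
```

Because the fit has a closed-form solution, this kind of recovery needs only a forward pass over calibration data and no gradient-based fine-tuning, which is consistent with the emphasis on speed.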

Findings

01 Outperforms existing pruning methods on the LLaMA benchmark

02 Maintains high performance under significant model compression

03 Demonstrates effectiveness across different LLM architectures

Abstract

Large language models (LLMs) have garnered significant attention and demonstrated impressive capabilities in a wide range of applications. However, due to their enormous computational costs, the deployment and application of LLMs are often severely limited. To address this issue, structured pruning is an effective solution to compress the parameters of LLMs. Determining the importance of each sub-module in LLMs and minimizing performance loss are critical issues that need to be carefully addressed in structured pruning. In this paper, we propose an effective and fast structured pruning method named SlimLLM for large language models. For channel and attention head pruning, we evaluate the importance based on the entire channel or head, rather than merely aggregating the importance of individual elements within a sub-module. This approach enables a more holistic consideration of the…
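
The contrast between element-wise aggregation and whole-channel evaluation can be made concrete with a toy example. The sketch below assumes a single linear layer and uses two hypothetical scoring rules: summing weight magnitudes per channel, and measuring the squared change in the layer output when an entire channel is removed on calibration data. These rules are illustrative stand-ins, not the specific criteria defined in the paper.

```python
# Toy comparison of two ways to score channels for structured pruning.
# Assumption: both scoring rules are illustrative, not SlimLLM's exact criteria.

def aggregated_importance(weights):
    """Score each channel by summing the magnitudes of its individual weights."""
    return [sum(abs(w) for w in row) for row in weights]


def holistic_importance(weights, activations):
    """Score each channel by the squared output change when it is removed whole.

    weights[c] is the output-weight row of channel c; activations[n][c] is the
    input reaching channel c for calibration sample n. For a linear layer,
    dropping channel c changes the output of sample n by activations[n][c] * weights[c].
    """
    scores = []
    for c, row in enumerate(weights):
        row_energy = sum(w * w for w in row)
        act_energy = sum(sample[c] ** 2 for sample in activations)
        scores.append(row_energy * act_energy)
    return scores


def channels_to_prune(scores, num_prune):
    """Return the indices of the num_prune lowest-scoring channels."""
    order = sorted(range(len(scores)), key=lambda c: scores[c])
    return sorted(order[:num_prune])


# Three channels, two output dimensions (hypothetical values).
weights = [[2.0, -2.0], [0.5, 0.5], [1.0, 1.0]]
# Two calibration samples; channel 0 is almost never active.
activations = [[0.01, 3.0, 1.0], [0.02, -2.0, 1.0]]

agg_scores = aggregated_importance(weights)
hol_scores = holistic_importance(weights, activations)
print(channels_to_prune(agg_scores, 1))
print(channels_to_prune(hol_scores, 1))
```

In this toy setup the two rules disagree: the magnitude sum would prune channel 1 because its weights are small, while the whole-channel score prunes channel 0 because it barely affects the output despite its large weights.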

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Pruning · LLaMA · Linear Regression