SDMPrune: Self-Distillation MLP Pruning for Efficient Large Language Models
Hourun Zhu, Chengchao Shen

TL;DR
SDMPrune is a self-distillation-based pruning method that targets the MLP modules of large language models. It substantially reduces parameter count with minimal performance degradation and outperforms existing pruning techniques on zero-shot benchmarks.
Contribution
The paper proposes a self-distillation loss that is applied during the pruning phase (rather than post-training) and restricts pruning to MLP modules, so that large language models can be compressed with minimal performance degradation.
Findings
Outperforms existing pruning methods on zero-shot benchmarks.
Achieves significant parameter reduction in LLMs with minimal performance degradation.
Achieves competitive results among 1B-scale open-source LLMs.
Abstract
In spite of the strong performance achieved by LLMs, the costs of their deployment are unaffordable. For the compression of LLMs, gradient-based pruning methods present promising effectiveness. However, in these methods, the gradient computation with one-hot labels ignores the potential predictions on other words, thus missing key information for the generative capability of the original model. To address this issue, we introduce a self-distillation loss during the pruning phase (rather than post-training) to fully exploit the predictions of the original model, thereby obtaining more accurate gradient information for pruning. Moreover, we find that, compared to attention modules, the predictions of LLMs are less sensitive to multilayer perceptron (MLP) modules, which take up more than parameters (LLaMA3.2-1.2B). To this end, we focus on the pruning of MLP modules, to significantly…
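The contrast the abstract draws between one-hot labels and self-distillation can be illustrated with a small sketch. It is not the authors' implementation: the function names, the toy logits, and the first-order Taylor importance score (a criterion common in gradient-based pruning) are all assumptions made for illustration. It relies only on the standard identities that the gradient of cross-entropy with respect to the logits is softmax(z) minus the one-hot label, and that the gradient of KL(teacher || student) with respect to the student logits is softmax(z_student) minus softmax(z_teacher).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_one_hot(student_logits, label):
    """dL/dz for cross-entropy with a one-hot label: softmax(z) - one_hot."""
    probs = softmax(student_logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def grad_self_distill(student_logits, teacher_logits):
    """dL/dz_s for KL(teacher || student): softmax(z_s) - softmax(z_t).

    The teacher is the original (unpruned) model, so its full distribution
    over the vocabulary shapes the gradient, not just the labeled word.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    return [s - t for s, t in zip(p_student, p_teacher)]


def unit_importance(weights, hidden, grad_logits):
    """Hypothetical first-order Taylor score per hidden unit.

    Assumes logits = W @ hidden, so dL/dW[i][j] = grad_logits[i] * hidden[j].
    The score for unit j sums |W[i][j] * dL/dW[i][j]| over output rows i.
    """
    n_units = len(hidden)
    return [
        sum(abs(row[j] * g * hidden[j]) for row, g in zip(weights, grad_logits))
        for j in range(n_units)
    ]


if __name__ == "__main__":
    # Toy values (assumed): 3-word vocabulary, 2 hidden units.
    teacher_logits = [2.0, 1.5, -1.0]   # original model
    student_logits = [1.0, 1.2, -0.5]   # partially pruned model
    weights = [[0.8, -0.3], [0.1, 0.9], [-0.5, 0.2]]
    hidden = [1.0, 0.5]

    g_hard = grad_one_hot(student_logits, label=0)
    g_soft = grad_self_distill(student_logits, teacher_logits)
    print("one-hot importance:", unit_importance(weights, hidden, g_hard))
    print("distill importance:", unit_importance(weights, hidden, g_soft))
```

With one-hot labels every non-target word is pushed toward probability zero, whereas the distillation gradient vanishes wherever the pruned model already matches the original model's prediction, which is one way to read the abstract's claim about "more accurate gradient information".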
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Softmax · Attention Is All You Need · Focus · Pruning
