Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models
Rocktim Jyoti Das, Mingjie Sun, Liqun Ma, Zhiqiang Shen

TL;DR
This paper introduces GBLM-Pruner, a training-free pruning method for large language models that uses gradients from a few calibration samples to decide which weights to remove, and that outperforms SparseGPT and Wanda on multiple benchmarks.
Contribution
The paper presents GBLM-Pruner, a pruning technique whose metric is built from the first-order term of the Taylor expansion and properly normalized gradients. The result is a simple, effective approach to pruning LLMs that requires no retraining.
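The first-order Taylor term mentioned above can be made concrete with the standard single-weight argument below. It is the generic derivation behind first-order (gradient times weight) pruning scores; how GBLM-Pruner normalizes and aggregates its gradients is not reproduced here, and the multi-sample line at the end is an illustrative assumption, not the paper's formula.

```latex
\[
\begin{aligned}
\mathcal{L}(\mathbf{w} + \boldsymbol{\delta}) &\approx \mathcal{L}(\mathbf{w}) + \mathbf{g}^{\top}\boldsymbol{\delta},
  \qquad \mathbf{g} = \nabla_{\mathbf{w}}\mathcal{L}(\mathbf{w}) \\
\boldsymbol{\delta} = -w_k\,\mathbf{e}_k \;&\Rightarrow\;
  \Delta\mathcal{L}_k = \mathcal{L}(\mathbf{w} - w_k\mathbf{e}_k) - \mathcal{L}(\mathbf{w}) \approx -g_k\,w_k,
  \qquad S_k = \lvert g_k\,w_k \rvert \\
S_k^{(N)} &= \lvert w_k \rvert \Big(\textstyle\sum_{n=1}^{N} \big(g_k^{(n)}\big)^2\Big)^{1/2}
  \quad \text{(assumed: $\ell_2$ aggregate of $w_k g_k^{(n)}$ over $N$ calibration samples)}
\end{aligned}
\]
```

Weights with the smallest score are removed first, since zeroing them changes the loss least to first order; second-order terms are ignored by construction.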
Findings
GBLM-Pruner outperforms SparseGPT and Wanda on multiple benchmarks.
Incorporating gradients reveals structural patterns in unstructured pruning.
The method maintains performance without retraining or weight updates.
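The last finding, that performance holds without retraining or weight updates, means pruning reduces to applying a binary mask: the surviving weights keep their pretrained values (unlike SparseGPT, which also updates the remaining weights). The sketch below is a minimal NumPy illustration under stated assumptions: the helper name `prune_rows_unstructured` is hypothetical, weights are compared within each output row (the per-output grouping Wanda uses; whether GBLM-Pruner uses exactly this grouping is assumed here), and `score` can be any importance metric with the same shape as the weight matrix.

```python
import numpy as np


def prune_rows_unstructured(weight: np.ndarray, score: np.ndarray, sparsity: float) -> np.ndarray:
    """Hypothetical helper: zero the lowest-scoring fraction of weights in each row.

    Only a binary mask is applied, so surviving weights keep their original
    values: no retraining and no weight reconstruction takes place.
    """
    if weight.shape != score.shape:
        raise ValueError("weight and score must have the same shape")
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")

    _, in_features = weight.shape
    k = int(in_features * sparsity)  # number of weights removed per output row

    mask = np.ones_like(weight, dtype=bool)
    if k > 0:
        # Column indices of the k smallest scores in every row.
        drop = np.argsort(score, axis=1)[:, :k]
        np.put_along_axis(mask, drop, False, axis=1)
    return weight * mask


if __name__ == "__main__":
    w = np.array([[0.5, -2.0, 0.1, 1.0],
                  [3.0, 0.2, -0.4, 0.05]])
    # Magnitude as the score, 50% unstructured sparsity per row.
    print(prune_rows_unstructured(w, np.abs(w), 0.5))
```

With magnitude as the score and a 50% target, each row of this 2×4 matrix keeps its two largest-magnitude entries untouched, giving [[0, -2, 0, 1], [3, 0, -0.4, 0]]. Swapping in a gradient-aware score changes which entries survive, not how the mask is applied.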
Abstract
Large Language Models (LLMs) with billions of parameters are prime targets for network pruning, which aims to remove some model weights without hurting performance. Prior approaches such as magnitude pruning, SparseGPT, and Wanda either concentrated solely on weights or integrated weights with activations for sparsity. However, they overlooked the informative gradients derived from pretrained LLMs. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion and operates in a training-free manner, harnessing properly normalized gradients from a few calibration samples to determine the pruning metric. It substantially outperforms competitive counterparts like SparseGPT and Wanda on multiple benchmarks. Intriguingly, by incorporating gradients,…
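The abstract contrasts metrics built from weights alone (magnitude), from weights and activations (Wanda), and from weights plus gradients of a few calibration samples. The NumPy sketch below computes all three for one linear layer so they can be compared side by side. Magnitude |W_ij| and the Wanda score |W_ij|·‖X_j‖₂ follow their standard definitions. The gradient score is an assumption, not GBLM-Pruner's published formula: it aggregates the per-sample first-order Taylor terms |W_ij·G_ij| with an ℓ2 norm over calibration samples, adds the activation term, and weights the gradient part by a hypothetical `alpha`. Because real LLM gradients need an autodiff framework, the hypothetical helper `toy_per_sample_grads` substitutes exact gradients of a squared-error loss on a single linear layer.

```python
import numpy as np


def magnitude_score(weight: np.ndarray) -> np.ndarray:
    """Magnitude pruning: importance is |W_ij|."""
    return np.abs(weight)


def wanda_score(weight: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """Wanda-style score |W_ij| * ||X_j||_2; inputs has shape (num_tokens, in_features)."""
    return np.abs(weight) * np.linalg.norm(inputs, axis=0)[None, :]


def toy_per_sample_grads(weight: np.ndarray, inputs: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for LLM gradients.

    Per-sample gradients of 0.5 * ||W x_n - t_n||^2 w.r.t. W, i.e. the outer
    product (W x_n - t_n) x_n^T, stacked to shape (N, out_features, in_features).
    """
    residual = inputs @ weight.T - targets            # (N, out)
    return residual[:, :, None] * inputs[:, None, :]  # (N, out, in)


def gradient_score(weight: np.ndarray, inputs: np.ndarray,
                   per_sample_grads: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """ASSUMED form: |W_ij| * (alpha * ||G_ij||_2 over samples + ||X_j||_2).

    |W_ij| * ||G_ij||_2 is the l2 norm, across calibration samples, of the
    first-order Taylor terms W_ij * G_ij^(n).
    """
    grad_norm = np.linalg.norm(per_sample_grads, axis=0)  # (out, in)
    act_norm = np.linalg.norm(inputs, axis=0)[None, :]    # (1, in)
    return np.abs(weight) * (alpha * grad_norm + act_norm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))    # one linear layer: 4 outputs, 8 inputs
    X = rng.normal(size=(16, 8))   # 16 calibration "tokens"
    T = rng.normal(size=(16, 4))   # toy regression targets
    G = toy_per_sample_grads(W, X, T)
    for name, s in [("magnitude", magnitude_score(W)),
                    ("wanda", wanda_score(W, X)),
                    ("gradient", gradient_score(W, X, G))]:
        print(name, s.shape)
```

Setting `alpha=0.0` recovers the Wanda-style score exactly, so the gradient term can be read as a correction on top of the activation-aware metric. Any of these score matrices can then be turned into a mask by keeping the top-scoring weights in each row.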
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Vision Transformer · Residual Connection · Dropout · Multi-Head Attention · Adam · Softmax
