Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

Rocktim Jyoti Das; Mingjie Sun; Liqun Ma; Zhiqiang Shen

arXiv:2311.04902·cs.CL·April 10, 2024·1 cites

Open Access · 2 Repos

TL;DR

This paper introduces GBLM-Pruner, a training-free pruning method for large language models that uses gradients from a few calibration samples to decide which weights to remove, and outperforms SparseGPT and Wanda on multiple benchmarks.

Contribution

The paper presents GBLM-Pruner, a gradient-based pruning technique built on the first-order term of the Taylor expansion and properly normalized gradients, offering a simple and effective way to prune LLMs without retraining.
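
The first-order reasoning can be written out as a short derivation. This is a sketch under assumptions: the symbols $\mathcal{L}$ (calibration loss), $W$ (a weight matrix), $g^{(n)}_{ij}$ (gradient of weight $W_{ij}$ on calibration sample $n$), and the choice of an $\ell_2$ norm over $N$ samples are illustrative, since this summary does not state the exact normalization used in the paper.

```latex
% First-order Taylor expansion of the loss around the pretrained weights W
\mathcal{L}(W + \delta W) \approx \mathcal{L}(W)
  + \sum_{i,j} \frac{\partial \mathcal{L}}{\partial W_{ij}} \, \delta W_{ij}

% Pruning a single weight sets \delta W_{ij} = -W_{ij}, so the estimated loss change is
\left| \Delta \mathcal{L}_{ij} \right| \approx
  \left| W_{ij} \, \frac{\partial \mathcal{L}}{\partial W_{ij}} \right|

% Assumed aggregation over N calibration samples (illustrative, not confirmed by this summary)
S_{ij} = \left| W_{ij} \right| \cdot \left\| G_{ij} \right\|_2,
\qquad G_{ij} = \left( g^{(1)}_{ij}, \dots, g^{(N)}_{ij} \right)
```

Under this reading, weights with the smallest score $S_{ij}$ are the ones whose removal is estimated to change the loss least.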

Findings

01 GBLM-Pruner outperforms SparseGPT and Wanda on multiple benchmarks.

02 Incorporating gradients reveals structural patterns in unstructured pruning.

03 The method maintains performance without retraining or weight updates.

Abstract

Large Language Models (LLMs) with billions of parameters are prime targets for network pruning, which removes some model weights without hurting performance. Prior approaches such as magnitude pruning, SparseGPT, and Wanda either concentrated solely on weights or integrated weights with activations for sparsity. However, they overlooked the informative gradients derived from pretrained LLMs. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion, operating in a training-free manner by harnessing properly normalized gradients from a few calibration samples to determine the pruning metric, and substantially outperforms competitive counterparts like SparseGPT and Wanda in multiple benchmarks. Intriguingly, by incorporating gradients,…
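
A gradient-weighted pruning metric of the kind the abstract describes can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the function names `gradient_scores` and `prune_rows`, the l2 aggregation of per-sample gradients, the per-row comparison group, and all numbers are assumptions made for illustration.

```python
import math

def gradient_scores(weights, sample_grads):
    """Score each weight as |w| times the l2 norm of its gradient across
    calibration samples (the l2 aggregation is an assumption)."""
    scores = []
    for i, w_row in enumerate(weights):
        row = []
        for j, w in enumerate(w_row):
            g_norm = math.sqrt(sum(g[i][j] ** 2 for g in sample_grads))
            row.append(abs(w) * g_norm)
        scores.append(row)
    return scores

def prune_rows(weights, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in each row.
    Surviving weights are left unchanged (no retraining, no weight update).
    Comparing within a row is an assumption of this sketch."""
    pruned = []
    for w_row, s_row in zip(weights, scores):
        k = int(len(w_row) * sparsity)
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned

# Hypothetical 2x4 weight matrix and gradients from two calibration samples.
weights = [[0.9, -0.1, 0.4, 0.05],
           [-0.3, 0.8, 0.02, -0.6]]
sample_grads = [
    [[0.01, 0.50, 0.20, 0.10], [0.30, 0.02, 0.40, 0.10]],
    [[0.02, 0.40, 0.10, 0.20], [0.20, 0.01, 0.50, 0.20]],
]

scores = gradient_scores(weights, sample_grads)
pruned = prune_rows(weights, scores, sparsity=0.5)
print(pruned)
```

In this toy input the largest weight in the first row (0.9) is removed because its gradients are tiny, while the much smaller weight -0.1 survives. Plain magnitude pruning would have made the opposite choice, which is the sense in which the decision goes beyond size.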

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Vision Transformer · Residual Connection · Dropout · Multi-Head Attention · Adam · Softmax