AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Haiquan Lu; Yefan Zhou; Shiwei Liu; Zhangyang Wang; Michael W. Mahoney; Yaoqing Yang

arXiv:2410.10912 · cs.LG · October 16, 2024

TL;DR

AlphaPruning is a theoretically grounded layerwise pruning method for large language models. It uses the shape of each layer's empirical spectral density to set per-layer sparsity ratios, reaching high sparsity levels without significant performance loss.

Contribution

AlphaPruning applies Heavy-Tailed Self-Regularization (HT-SR) theory to determine layerwise pruning ratios, improving on the heuristic allocation methods previously used for large language model pruning.
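To make the idea of a spectral "shape metric" concrete, here is a minimal sketch of one common choice: a Hill estimate of the power-law exponent of a weight matrix's empirical spectral density. The function names `esd` and `hill_alpha`, and the default of fitting the upper half of the spectrum, are illustrative assumptions and are not taken from the official repository.

```python
from typing import Optional

import numpy as np


def esd(weight: np.ndarray) -> np.ndarray:
    """Return the eigenvalues of W^T W (squared singular values), ascending."""
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return np.sort(singular_values ** 2)


def hill_alpha(eigenvalues: np.ndarray, k: Optional[int] = None) -> float:
    """Hill estimate of alpha, assuming the tail density decays as lambda^(-alpha).

    Only the k largest eigenvalues are used. The default k (upper half of the
    spectrum) is an assumption made for this sketch.
    """
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    n = lam.size
    if k is None:
        k = n // 2
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of eigenvalues")
    tail = lam[n - k:]           # the k largest eigenvalues
    threshold = lam[n - k - 1]   # largest eigenvalue below the tail
    if threshold <= 0:
        raise ValueError("threshold eigenvalue must be positive")
    return 1.0 + k / float(np.sum(np.log(tail / threshold)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer_weight = rng.standard_normal((256, 128))  # stand-in for a real layer
    print(hill_alpha(esd(layer_weight)))
```

Under HT-SR theory, a heavier tail (smaller estimated exponent) is usually read as a sign of a better-trained layer, so a per-layer value like this can serve as the input to a sparsity allocation rule.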

Findings

01 Prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity.

02 Allocates layerwise pruning ratios using shape metrics of each layer's empirical spectral density.

03 Demonstrates that the method is effective when combined with existing pruning techniques.
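The allocation step in the findings above can be sketched as a linear map from per-layer metric values to sparsity ratios, rescaled so that the parameter-weighted average matches a global target. This is a hedged illustration only: the function name `allocate_sparsity`, the `tau` spread parameter, and the direction of the mapping (a lower metric value means a better-trained layer that is pruned less) are assumptions of this sketch, not details confirmed by the text on this page.

```python
import numpy as np


def allocate_sparsity(metrics, layer_sizes, target_sparsity, tau=0.3):
    """Map per-layer metric values to per-layer sparsity ratios.

    metrics:         one shape-metric value per layer (hypothetical input)
    layer_sizes:     number of parameters in each layer
    target_sparsity: desired overall fraction of pruned parameters
    tau:             spread of the ratios around the target (assumed knob)
    """
    m = np.asarray(metrics, dtype=float)
    d = np.asarray(layer_sizes, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        # All layers look alike, so fall back to uniform sparsity.
        return np.full_like(m, target_sparsity)
    # Relative score in [1 - tau, 1 + tau], increasing with the metric.
    raw = (1.0 - tau) + 2.0 * tau * (m - m.min()) / span
    # Rescale so the parameter-weighted mean equals the global target.
    eta = target_sparsity * d.sum() / np.sum(raw * d)
    ratios = eta * raw
    if ratios.max() > 1.0:
        raise ValueError("tau is too large for this target sparsity")
    return ratios


if __name__ == "__main__":
    ratios = allocate_sparsity([2.0, 3.0, 4.0, 6.0], [100, 100, 200, 100], 0.7)
    print(ratios)
```

The resulting ratios could then be handed to any per-layer pruning criterion, which keeps the allocation rule independent of the pruning technique it is paired with.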

Abstract

Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning strategies typically assign uniform pruning ratios across layers, limiting overall pruning ability; and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance. In this paper, we leverage Heavy-Tailed Self-Regularization (HT-SR) Theory, in particular the shape of empirical spectral densities (ESDs) of weight matrices, to design improved layerwise pruning ratios for LLMs. Our analysis reveals a wide variability in how well-trained, and thus relatedly how prunable, different layers of an LLM are. Based on this, we propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a…

Code & Models

Repositories

haiquanlu/alphapruning (PyTorch, official)

Videos

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling

Methods: Pruning