Fast and Effective Weight Update for Pruned Large Language Models

Vladimír Boža

arXiv:2401.02938 · cs.CL · July 23, 2024 · 2 cites

TL;DR

This paper introduces a fast layer-wise weight update for pruned large language models, based on the Alternating Direction Method of Multipliers (ADMM), that recovers performance lost to pruning at low computational cost.
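For orientation, one standard way to pose a layer-wise weight update for a pruned linear layer and solve it with ADMM is written out here. The notation is an assumption and may differ from the paper's: X ∈ R^{n×d_in} holds calibration inputs to the layer, W_0 ∈ R^{d_in×d_out} are the original weights, M is a binary keep-mask of the same shape, ρ > 0 is a fixed penalty, and the constraint is handled by splitting W = Z with Z restricted to the mask and U as the scaled dual variable.

```latex
\begin{aligned}
&\min_{W}\ \tfrac{1}{2}\,\lVert XW - XW_0 \rVert_F^2
  \quad \text{subject to} \quad W \odot (\mathbf{1} - M) = 0, \\[4pt]
&W^{k+1} = \bigl(X^\top X + \rho I\bigr)^{-1}\bigl(X^\top X\, W_0 + \rho\,(Z^{k} - U^{k})\bigr), \\
&Z^{k+1} = M \odot \bigl(W^{k+1} + U^{k}\bigr), \\
&U^{k+1} = U^{k} + W^{k+1} - Z^{k+1}.
\end{aligned}
```

Because ρ stays fixed, the d_in × d_in matrix X^⊤X + ρI only needs to be factorized once; every later iteration costs a few matrix multiplications and an elementwise mask.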

Contribution

We propose a novel ADMM-based weight update algorithm combined with gradual pruning, achieving state-of-the-art results in LLM pruning efficiency and effectiveness.
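The ADMM-based weight update named in the contribution can be sketched for a single linear layer with a fixed mask. This is a minimal NumPy illustration under assumptions: a layer-wise least-squares objective, a (d_in, d_out) weight layout, a boolean keep-mask, and the hypothetical function name `admm_weight_update` with illustrative defaults `rho=1.0` and `iters=50`. It is not the code from the official repository.

```python
import numpy as np


def admm_weight_update(X, W0, mask, rho=1.0, iters=50):
    """Hypothetical sketch of an ADMM weight update for one pruned linear layer.

    Assumed problem:  min_W 0.5 * ||X @ W - X @ W0||_F^2  s.t.  W[~mask] == 0
    X:    (n_samples, d_in) calibration inputs to the layer
    W0:   (d_in, d_out) original dense weights
    mask: (d_in, d_out) boolean array, True where a weight is kept
    """
    G = X.T @ X                      # Gram matrix X^T X, shape (d_in, d_in)
    GW0 = G @ W0                     # constant term X^T X W0
    # rho is fixed, so (G + rho I) is inverted once and reused in every iteration.
    # A Cholesky solve would be numerically preferable; inv keeps the sketch short.
    A_inv = np.linalg.inv(G + rho * np.eye(G.shape[0]))
    Z = W0 * mask                    # feasible start: plain masked weights
    U = np.zeros_like(W0)            # scaled dual variable
    for _ in range(iters):
        W = A_inv @ (GW0 + rho * (Z - U))   # least-squares step
        Z = (W + U) * mask                  # projection onto the sparsity pattern
        U = U + W - Z                       # dual update
    return Z                         # Z satisfies the mask exactly


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16)) / 16.0
W0 = rng.standard_normal((16, 8))
mask = np.abs(W0) >= np.median(np.abs(W0))   # keep the larger half by magnitude
W_pruned = admm_weight_update(X, W0, mask, rho=1.0, iters=200)
```

In this formulation each output column is an independent masked least-squares problem, so on small inputs the result can be checked against a direct per-column solve of (X_S^⊤ X_S) w_S = X_S^⊤ X w_0, where S is the set of kept rows for that column.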

Findings

01. Achieved state-of-the-art pruning performance across a wide range of LLMs.
02. Reduced the computational cost of recovering performance after pruning by using a fast layer-wise weight update.
03. Demonstrated the effectiveness of a simple gradual pruning mask selection.
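The gradual mask selection from finding 03 could look roughly like the sketch below, which interleaves ADMM updates with a slowly rising sparsity target. Several details are assumptions chosen for illustration rather than taken from the paper: the cubic sparsity schedule, the Wanda-style score |W + U| · ‖X[:, i]‖ used to rank weights, the global (whole-matrix) top-k selection, and all function names and default values.

```python
import numpy as np


def cubic_sparsity(step, total_steps, final_sparsity):
    """Assumed cubic ramp of the sparsity target from 0 up to final_sparsity."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


def gradual_admm_prune(X, W0, final_sparsity=0.5, rho=1.0,
                       prune_steps=20, iters_per_step=5, final_iters=50):
    """Hypothetical sketch: shrink the mask gradually while running ADMM updates.

    X: (n_samples, d_in) calibration inputs; W0: (d_in, d_out) dense weights.
    Requires 0 <= final_sparsity < 1. Returns (pruned weights, keep-mask).
    """
    G = X.T @ X
    GW0 = G @ W0
    A_inv = np.linalg.inv(G + rho * np.eye(G.shape[0]))  # reused, rho is fixed
    col_norm = np.sqrt(np.diag(G))[:, None]               # ||X[:, i]|| per input feature
    W, Z, U = W0.copy(), W0.copy(), np.zeros_like(W0)
    mask = np.ones_like(W0, dtype=bool)

    def admm_step(W, Z, U, mask):
        W = A_inv @ (GW0 + rho * (Z - U))   # least-squares step
        Z = (W + U) * mask                  # project onto the current mask
        U = U + W - Z                       # dual update
        return W, Z, U

    for step in range(1, prune_steps + 1):
        n_prune = int(round(cubic_sparsity(step, prune_steps, final_sparsity) * W0.size))
        # Re-select the mask from scratch: drop the n_prune lowest-scoring weights.
        score = (np.abs(W + U) * col_norm).ravel()
        flat = np.ones(W0.size, dtype=bool)
        flat[np.argpartition(score, n_prune)[:n_prune]] = False
        mask = flat.reshape(W0.shape)
        for _ in range(iters_per_step):
            W, Z, U = admm_step(W, Z, U, mask)
    for _ in range(final_iters):            # settle the weights on the final mask
        W, Z, U = admm_step(W, Z, U, mask)
    return Z, mask
```

Recomputing the mask at every step lets a weight pruned early return later if its score recovers; a monotone variant would instead only ever remove weights from the previous mask.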

Abstract

Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the lost performance caused by dropping weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or attempted layer-wise weight updates, preserving the behavior of each layer. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations. In our paper, we propose a fast and effective weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). We further extend it with a simple gradual pruning mask selection and achieve state-of-the-art pruning performance across a wide range of LLMs. Code is available at https://github.com/fmfi-compbio/admm-pruning.

Code & Models

Repositories

fmfi-compbio/admm-pruning (PyTorch, official implementation)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Pruning