Fast and Effective Weight Update for Pruned Large Language Models
Vladimír Boža

TL;DR
This paper introduces a fast ADMM-based weight update method for pruning large language models, significantly improving performance recovery after pruning with minimal computational cost.
Contribution
We propose a novel ADMM-based weight update algorithm combined with gradual pruning, achieving state-of-the-art results in LLM pruning efficiency and effectiveness.
Findings
Achieved state-of-the-art pruning performance across various LLMs.
Reduced computational cost of fine-tuning after pruning.
Demonstrated effectiveness of gradual pruning mask selection.
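To make "gradual pruning mask selection" concrete, the following is a minimal sketch of one common way to do it: raise the target sparsity over several steps and re-select the mask at each step. The cubic schedule, the magnitude criterion, and the names `sparsity_at_step` and `magnitude_mask` are assumptions for illustration; this page does not specify the paper's exact schedule or criterion. In an actual pipeline the surviving weights would be updated between steps, whereas here they are held fixed to keep the example short.

```python
def sparsity_at_step(step, total_steps, final_sparsity):
    """Cubic ramp from 0 to final_sparsity (an assumed, commonly used schedule)."""
    frac = step / total_steps
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


def magnitude_mask(weights, sparsity):
    """Return a keep-mask (True = keep) that drops the smallest-magnitude entries."""
    n_prune = int(sparsity * len(weights))  # floor, so we never over-prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]


# Toy weight vector; a real layer would be a matrix, pruned per row or per layer.
weights = [0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.2, 0.6]
total_steps = 4

masks = []
for step in range(1, total_steps + 1):
    target = sparsity_at_step(step, total_steps, final_sparsity=0.5)
    masks.append(magnitude_mask(weights, target))
    # A weight update on the surviving entries would go here in a real pipeline.

kept_counts = [sum(m) for m in masks]
```

The design point is that early steps remove only the least important weights, so each intermediate model stays close to the dense one and later mask choices are made on already-adjusted weights.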
Abstract
Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the lost performance caused by dropping weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or attempted layer-wise weight updates, preserving the behavior of each layer. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations. In our paper, we propose a fast and effective weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). We further extend it with a simple gradual pruning mask selection and achieve state-of-the-art pruning performance across a wide range of LLMs. Code is available at https://github.com/fmfi-compbio/admm-pruning.
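The layer-wise weight update described in the abstract can be illustrated with a generic ADMM solver for a masked least-squares problem: given calibration inputs `X`, dense weights `W0`, and a fixed keep-mask, find sparse weights whose layer output `X @ W` stays close to `X @ W0`. This is a minimal sketch under stated assumptions, not the repository's implementation: the objective, the scaling of the penalty `rho` by the mean diagonal of `X.T @ X`, the iteration count, and all function names are illustrative choices.

```python
import numpy as np


def admm_masked_update(X, W0, mask, rho_rel=1.0, iters=200):
    """Approximately solve min_W ||X W - X W0||_F^2 s.t. W is zero where mask is False.

    Standard scaled-form ADMM with the splitting W = Z, where Z carries the
    sparsity constraint. rho_rel is a hypothetical knob that scales the
    penalty relative to the mean diagonal of X^T X.
    """
    H = X.T @ X                                  # input Gram matrix
    d = H.shape[0]
    rho = rho_rel * float(np.mean(np.diag(H)))
    A_inv = np.linalg.inv(H + rho * np.eye(d))   # computed once, reused every iteration
    HW0 = H @ W0
    Z = W0 * mask                                # start from the naively pruned weights
    U = np.zeros_like(W0)                        # scaled dual variable
    for _ in range(iters):
        W = A_inv @ (HW0 + rho * (Z - U))        # unconstrained quadratic step
        Z = (W + U) * mask                       # projection onto the sparsity pattern
        U = U + W - Z                            # dual update
    return Z                                     # Z satisfies the mask exactly


def recon_error(X, W0, W):
    """Squared Frobenius error between dense and pruned layer outputs."""
    return float(np.linalg.norm(X @ W0 - X @ W) ** 2)


rng = np.random.default_rng(0)
# Synthetic inputs with a shared component, so features are correlated
# and re-fitting the surviving weights has something to compensate for.
X = rng.standard_normal((256, 16)) + rng.standard_normal((256, 1))
W0 = rng.standard_normal((16, 8))
mask = np.abs(W0) >= np.median(np.abs(W0))       # keep the larger half by magnitude

W_naive = W0 * mask
W_admm = admm_masked_update(X, W0, mask)
err_naive = recon_error(X, W0, W_naive)
err_admm = recon_error(X, W0, W_admm)
```

In this sketch the only matrix inversion happens once per layer; each iteration afterwards is a matrix product, an elementwise masking, and an addition, which is what makes this kind of update cheap compared with gradient-based fine-tuning.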
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Pruning
