Symmetric Pruning of Large Language Models
Kai Yi, Peter Richtárik

TL;DR
This paper provides new theoretical insights into large language model pruning, introduces strategies that consider activations and weight importance, and proposes a training-free fine-tuning method that achieves state-of-the-art results.
Contribution
It offers a theoretical foundation for pruning methods, proposes complementary strategies, and introduces a novel training-free fine-tuning approach for improved pruning performance.
Findings
Substantial performance improvements over existing methods.
Introduction of a training-free fine-tuning approach $R^2$-DSnoT.
Establishment of a new state-of-the-art in model pruning.
Abstract
Popular post-training pruning methods such as Wanda and RIA are known for their simple, yet effective, designs that have shown exceptional empirical performance. Wanda optimizes performance through calibrated activations during pruning, while RIA emphasizes the relative, rather than absolute, importance of weight elements. Despite their practical success, a thorough theoretical foundation explaining these outcomes has been lacking. This paper introduces new theoretical insights that redefine the standard minimization objective for pruning, offering a deeper understanding of the factors contributing to their success. Our study extends beyond these insights by proposing complementary strategies that consider both input activations and weight significance. We validate these approaches through rigorous experiments, demonstrating substantial enhancements over existing methods. Furthermore,…
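As a rough illustration of the two scoring rules the abstract contrasts, the sketch below computes a Wanda-style score (weight magnitude scaled by the norm of the matching input activation) and an RIA-style score (weight magnitude relative to its row and column sums, scaled by a power of the activation norm). The formulas follow the commonly cited forms of Wanda and RIA rather than anything stated in this summary, so treat them as assumptions. The function names, the exponent default `a=0.5`, and the per-row pruning helper are all hypothetical choices made for this example, and none of this is the paper's own proposed method.

```python
import numpy as np

def wanda_scores(W, X):
    """Wanda-style score: |W_ij| * ||X_j||_2 (assumed form).

    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations.
    """
    act_norm = np.linalg.norm(X, axis=0)          # one norm per input feature
    return np.abs(W) * act_norm[None, :]

def ria_scores(W, X, a=0.5):
    """RIA-style score (assumed form): relative importance of each weight
    within its row and column, times the activation norm raised to `a`."""
    abs_w = np.abs(W)
    row_sum = abs_w.sum(axis=1, keepdims=True)    # sum over each output row
    col_sum = abs_w.sum(axis=0, keepdims=True)    # sum over each input column
    relative = abs_w / row_sum + abs_w / col_sum
    act_norm = np.linalg.norm(X, axis=0)
    return relative * act_norm[None, :] ** a

def prune_rows(W, scores, sparsity=0.5):
    """Zero out the lowest-scoring fraction of weights in every row."""
    k = int(W.shape[1] * sparsity)                # weights to drop per row
    drop = np.argsort(scores, axis=1)[:, :k]      # indices of lowest scores
    mask = np.ones(W.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return W * mask

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))
    X = rng.normal(size=(16, 8))
    print(prune_rows(W, wanda_scores(W, X)))
    print(prune_rows(W, ria_scores(W, X)))
```

Both scores need only a forward pass over calibration data, with no gradients or retraining, which is what makes this family of methods post-training.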
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
Methods: Pruning
