ROSE: Reordered SparseGPT for More Accurate One-Shot Large Language Models Pruning
Mingluo Su, Huan Wang

TL;DR
ROSE introduces a reordering strategy for SparseGPT that improves the accuracy of one-shot large language model pruning by pruning weights with larger potential pruning errors earlier, instead of following SparseGPT's fixed left-to-right order.
Contribution
The paper proposes ROSE, a novel reordering method for SparseGPT that improves pruning accuracy by adaptively prioritizing weights based on estimated pruning loss.
Findings
ROSE outperforms SparseGPT on multiple LLM benchmarks.
Reordering weights based on loss estimates improves pruning effectiveness.
Empirical results show significant accuracy gains across various models.
Abstract
Pruning is widely recognized as an effective method for reducing the parameters of large language models (LLMs), potentially leading to more efficient deployment and inference. One classic and prominent path of LLM one-shot pruning is to leverage second-order gradients (i.e., Hessian), represented by the pioneering work SparseGPT. However, the predefined left-to-right pruning order in SparseGPT leads to suboptimal performance when the weights exhibit columnar patterns. This paper studies the effect of pruning order under the SparseGPT framework. The analyses lead us to propose ROSE, a reordered SparseGPT method that prioritizes weights with larger potential pruning errors to be pruned earlier. ROSE first performs pre-pruning to identify candidate weights for removal, and estimates both column and block pruning loss. Subsequently, two-level reordering is performed: columns within each…
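The abstract describes the pipeline only at a high level and is cut off before the reordering rule is fully stated. The following is a minimal NumPy sketch of how pre-pruning, column and block loss estimation, and a two-level reordering could fit together. It rests on several assumptions that are not taken from the paper: the per-weight loss proxy is the classic Optimal Brain Surgeon saliency w²/[H⁻¹]_jj (SparseGPT's own code uses a Cholesky-based variant), pre-pruning uses a single layer-wide threshold, and both columns within a block and whole blocks are sorted by descending loss. The function names `estimate_pruning_loss`, `pre_prune_mask` and `reorder_columns` are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def estimate_pruning_loss(W, H_inv_diag):
    """Assumed OBS-style proxy for the error of zeroing each weight:
    loss_ij = W_ij**2 / [H^-1]_jj (constant factors dropped)."""
    return W ** 2 / H_inv_diag[None, :]


def pre_prune_mask(loss, sparsity):
    """Hypothetical pre-pruning: mark the `sparsity` fraction of all weights
    with the smallest estimated loss as removal candidates (layer-wide)."""
    k = int(loss.size * sparsity)
    smallest = np.argsort(loss, axis=None, kind="stable")[:k]
    mask = np.zeros(loss.size, dtype=bool)
    mask[smallest] = True
    return mask.reshape(loss.shape)


def reorder_columns(W, H, sparsity, block_size):
    """Return a column permutation in which contiguous blocks of `block_size`
    columns are sorted by descending block loss and, inside each block,
    columns are sorted by descending column loss (assumed ordering rule)."""
    H_inv_diag = np.diag(np.linalg.inv(H))
    loss = estimate_pruning_loss(W, H_inv_diag)
    mask = pre_prune_mask(loss, sparsity)
    # Column loss: summed estimated loss of the candidate weights in each column.
    col_loss = np.where(mask, loss, 0.0).sum(axis=0)

    n_cols = W.shape[1]
    blocks = [np.arange(s, min(s + block_size, n_cols))
              for s in range(0, n_cols, block_size)]
    # Level 1: reorder columns inside each block, highest loss first.
    blocks = [b[np.argsort(-col_loss[b], kind="stable")] for b in blocks]
    # Level 2: reorder the blocks themselves, highest block loss first.
    block_loss = np.array([col_loss[b].sum() for b in blocks])
    block_order = np.argsort(-block_loss, kind="stable")
    perm = np.concatenate([blocks[i] for i in block_order])
    return perm, col_loss


# Toy layer: W is (out_features, in_features); X holds calibration inputs.
rng = np.random.default_rng(0)
d_out, d_in, n_samples = 8, 16, 64
W = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, n_samples))
H = 2 * X @ X.T + 1e-2 * np.eye(d_in)  # layer-wise Hessian with damping

perm, col_loss = reorder_columns(W, H, sparsity=0.5, block_size=4)
W_perm = W[:, perm]              # permute weight columns
H_perm = H[np.ix_(perm, perm)]   # permute Hessian rows and columns consistently
# ... SparseGPT's left-to-right sweep would run on (W_perm, H_perm) here ...
inv = np.argsort(perm)           # inverse permutation restores the original layout
print(perm)
```

Because the same permutation is applied to the columns of W and to both axes of H, the layer-wise reconstruction problem is unchanged, so SparseGPT's usual left-to-right sweep can run as-is on the permuted inputs, and `W_perm[:, inv]` maps the result back to the original column positions. The sweep itself is omitted here.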
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
