TL;DR
DenoiseRotator makes large language models more robust to pruning by redistributing parameter importance through learnable orthogonal transformations, which substantially reduces performance degradation under various sparsity constraints.
Contribution
It introduces a novel importance redistribution approach using orthogonal transformations to improve pruning robustness, compatible with existing pruning methods.
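The compatibility claim rests on a general property of orthogonal matrices: rotating a weight matrix and counter-rotating its input leaves the dense output unchanged. The following is a minimal NumPy sketch of that property only. The layer shape, the random `Q`, and the placement of the rotation on the input side are illustrative assumptions, not the paper's learned transformations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear layer: y = W @ x, with W of shape (out_features, in_features).
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)

# Random orthogonal matrix from a QR decomposition, so Q.T @ Q == I.
# (Assumption: in the paper Q would be learned; here it is random.)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))

# Rotate the weights and counter-rotate the input.
W_rot = W @ Q
x_rot = Q.T @ x

# (W @ Q) @ (Q.T @ x) == W @ x, up to floating-point error.
y_dense = W @ x
y_rot = W_rot @ x_rot
max_diff = float(np.max(np.abs(y_dense - y_rot)))
```

Because the dense function is preserved, the individual entries of `W_rot` can differ freely from those of `W`, which is what leaves room to reshape how importance is spread before a pruning method is applied.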
Findings
Reduces the perplexity gap to the dense model by 58% on LLaMA3-70B pruned with SparseGPT at 2:4 sparsity.
Improves zero-shot accuracy across multiple models and pruning techniques.
Seamlessly integrates with existing pruning methods to enhance their performance.
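For readers unfamiliar with the 2:4 pattern mentioned above, it keeps exactly two nonzero weights in every contiguous group of four. The sketch below builds such a mask from a score matrix. The helper name `prune_2_of_4` is hypothetical, and weight magnitude is used as a stand-in score; SparseGPT and the other methods referenced here use their own importance criteria.

```python
import numpy as np

def prune_2_of_4(weights, scores):
    """Keep the 2 highest-scoring weights in every group of 4 along the last axis."""
    rows, cols = weights.shape
    if cols % 4 != 0:
        raise ValueError("number of columns must be divisible by 4")
    grouped = scores.reshape(rows, cols // 4, 4)
    # Indices of the two lowest scores in each group of four.
    drop = np.argsort(grouped, axis=-1)[..., :2]
    mask = np.ones_like(grouped, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return weights * mask.reshape(rows, cols)

# Toy weights; magnitude is an assumed placeholder for the importance score.
W = np.array([[0.1, -2.0, 0.3, 4.0, -0.5, 0.6, 0.7, -0.05]])
pruned = prune_2_of_4(W, np.abs(W))
```

In this toy case the first group keeps `-2.0` and `4.0`, and the second keeps `0.6` and `0.7`; every other entry is zeroed.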
Abstract
Pruning is a widely used technique to compress large language models (LLMs) by removing unimportant weights, but it often suffers from significant performance degradation, especially under semi-structured sparsity constraints. Existing pruning methods primarily focus on estimating the importance of individual weights, which limits their ability to preserve critical capabilities of the model. In this work, we propose a new perspective: rather than merely selecting which weights to prune, we first redistribute parameter importance to make the model inherently more amenable to pruning. By minimizing the information entropy of normalized importance scores, our approach concentrates importance onto a smaller subset of weights, thereby enhancing pruning robustness. We instantiate this idea through DenoiseRotator, which applies learnable orthogonal transformations to the model's weight…
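The entropy objective described in the abstract can be illustrated with a small sketch: normalize non-negative importance scores so they sum to one, then compute their Shannon entropy. The function name `importance_entropy` and the toy score vectors are assumptions for illustration; the abstract does not specify how the importance scores themselves are computed.

```python
import numpy as np

def importance_entropy(scores):
    """Shannon entropy (in nats) of non-negative scores normalized to sum to 1."""
    p = np.asarray(scores, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# Importance spread evenly over 8 weights: the maximum-entropy case, log(8).
uniform = np.ones(8)

# Importance concentrated on 2 of the 8 weights: lower entropy.
concentrated = np.array([10.0, 10.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])

h_uniform = importance_entropy(uniform)
h_conc = importance_entropy(concentrated)
```

Lower entropy means most of the total importance sits on a few weights, so removing the remaining low-score weights discards less of it. That is the intuition behind using entropy as the quantity to minimize.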
Taxonomy
Methods: Pruning · Focus
