GradMAP: Faster Layer Pruning with Gradient Metric and Projection Compensation
Hao Liu, Guangyan Li, Wensheng Zhang, and Yongqiang Tang

TL;DR
GradMAP introduces a fast, efficient layer pruning method for large language models that uses gradient metrics and projection compensation to maintain performance while significantly speeding up pruning.
Contribution
The paper presents a novel layer importance metric based on gradient magnitudes and a projection compensation technique, enabling faster and more effective layer pruning.
Findings
Achieves an average 4x speedup in the pruning process.
Outperforms previous methods in maintaining model performance.
Reduces performance degradation through projection compensation.
Abstract
Large Language Models (LLMs) exhibit strong reasoning abilities, but their high computational costs limit their practical deployment. Recent studies reveal significant redundancy across LLM layers, making layer pruning an active research topic. Layer pruning research primarily focuses on two aspects: measuring layer importance and recovering performance after pruning. Unfortunately, existing works fail to maintain pruning performance and efficiency simultaneously. In this study, we propose GradMAP, a faster layer pruning method with Gradient Metric And Projection compensation, which consists of two stages. In the first stage, we introduce a novel metric based on gradient magnitudes, enabling a global assessment of layer importance. Notably, it requires only a single backward propagation step per pruning decision, substantially enhancing pruning…
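The first stage described in the abstract, scoring every layer from one backward pass, can be illustrated with a small sketch. This is not the GradMAP metric itself, whose exact form is not given in this summary. It assumes the score of a layer is the L1 norm of the loss gradient with respect to that layer's weights, and it uses a toy residual stack (h <- h + w * tanh(h), elementwise) with a squared-error loss so that the backward pass can be written by hand. All function names, weights, and inputs here are hypothetical.

```python
import math

def forward(weights, x):
    """Run a toy residual stack and return the input plus every layer's output."""
    acts = [x]
    h = x
    for w in weights:
        # Residual update: h <- h + w * tanh(h), applied elementwise.
        h = [hi + wi * math.tanh(hi) for hi, wi in zip(h, w)]
        acts.append(h)
    return acts

def loss_fn(weights, x, y):
    """Squared-error loss between the stack output and a target y."""
    out = forward(weights, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

def layer_scores(weights, x, y):
    """One forward and one backward pass; score = L1 norm of each layer's weight gradient.

    Assumption: this scoring rule stands in for the paper's gradient metric.
    """
    acts = forward(weights, x)
    g = [o - t for o, t in zip(acts[-1], y)]  # dLoss/d(output)
    scores = [0.0] * len(weights)
    for l in reversed(range(len(weights))):
        t = [math.tanh(hi) for hi in acts[l]]  # acts[l] is the input to layer l
        grad_w = [gi * ti for gi, ti in zip(g, t)]  # dLoss/dw for layer l
        scores[l] = sum(abs(v) for v in grad_w)
        # Propagate the gradient to the layer input through the residual branch.
        g = [gi * (1.0 + wi * (1.0 - ti * ti))
             for gi, wi, ti in zip(g, weights[l], t)]
    return scores

def least_important_layer(scores):
    """Index of the layer with the smallest score (the pruning candidate)."""
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical toy data: three layers acting on a 2-dimensional hidden state.
weights = [[0.5, -0.3], [0.1, 0.2], [-0.4, 0.6]]
x, y = [1.0, -2.0], [0.5, 0.5]
scores = layer_scores(weights, x, y)
prune_idx = least_important_layer(scores)
print(scores, prune_idx)
```

Because all layer scores come out of the same backward sweep, the cost of one pruning decision in this sketch is one forward and one backward pass regardless of depth, which is the efficiency property the abstract highlights.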
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
