ACE: Exploring Activation Cosine Similarity and Variance for Accurate and Calibration-Efficient LLM Pruning
Zhendong Mi, Zhenglun Kong, Geng Yuan, Shaoyi Huang

TL;DR
This paper introduces an LLM pruning method that combines activation cosine similarity and activation variance metrics to improve pruning accuracy and calibration efficiency, reducing perplexity by up to 18% and pruning time by up to 63%.
Contribution
It proposes a pruning approach built on two metrics, activation cosine similarity and activation variance, that improves both the accuracy of the pruned model and the speed of the pruning process.
Findings
Achieves up to 18% perplexity reduction.
Reduces pruning time by up to 63%.
Effective on models like LLaMA, LLaMA-2, and OPT.
Abstract
With the rapid expansion of large language models (LLMs), the demand for memory and computational resources has grown significantly. Recent advances in LLM pruning aim to reduce the size and computational cost of these models. However, existing methods often suffer from either suboptimal pruning performance or low time efficiency during the pruning process. In this work, we propose an efficient and effective pruning method that simultaneously achieves high pruning performance and fast pruning speed with improved calibration efficiency. Our approach introduces two key innovations: (1) An activation cosine similarity loss-guided pruning metric, which considers the angular deviation of the output activation between the dense and pruned models. (2) An activation variance-guided pruning metric, which helps preserve semantic distinctions in output activations after pruning, enabling effective…
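The two quantities named in the abstract can be illustrated on a single linear layer. The following is a minimal NumPy sketch, not the authors' implementation: the helper names (`cosine_similarity_loss`, `activation_variance`, `magnitude_mask`), the toy shapes, and the use of plain magnitude pruning as a stand-in mask are all assumptions made for illustration. It only shows how the angular deviation between dense and pruned output activations, and the variance of those activations, could be measured on calibration data.

```python
import numpy as np

def cosine_similarity_loss(dense_out, pruned_out, eps=1e-8):
    """Mean of (1 - cosine similarity) between matching rows (tokens)."""
    dot = np.sum(dense_out * pruned_out, axis=1)
    norms = np.linalg.norm(dense_out, axis=1) * np.linalg.norm(pruned_out, axis=1)
    return float(np.mean(1.0 - dot / (norms + eps)))

def activation_variance(out):
    """Variance of each output channel across the calibration tokens."""
    return out.var(axis=0)

def magnitude_mask(weight, sparsity):
    """Stand-in pruning rule (plain magnitude); NOT the paper's metric."""
    k = int(weight.size * sparsity)  # number of weights to remove
    if k == 0:
        return np.ones_like(weight, dtype=bool)
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    return np.abs(weight) > threshold

# Hypothetical toy data: 64 calibration tokens, 16 inputs, 8 outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))
W = rng.standard_normal((8, 16))

dense_out = X @ W.T
mask = magnitude_mask(W, sparsity=0.5)
pruned_out = X @ (W * mask).T

loss = cosine_similarity_loss(dense_out, pruned_out)
var_dense = activation_variance(dense_out)
var_pruned = activation_variance(pruned_out)
print(f"cosine loss: {loss:.4f}")
print(f"mean channel variance, dense vs pruned: {var_dense.mean():.3f} vs {var_pruned.mean():.3f}")
```

A loss near 0 means the pruned layer's outputs point in nearly the same direction as the dense layer's outputs; comparing `var_dense` with `var_pruned` indicates how much of the spread across tokens survives pruning. How the paper turns these quantities into per-weight pruning scores is not stated in the excerpt above, so the sketch does not attempt it.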
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques
