Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective
Weizhong Huang, Yuxin Zhang, Xiawu Zheng, Fei Chao, Rongrong Ji

TL;DR
This paper presents a theoretically grounded method for determining layer-wise sparsity in large language models, addressing the reconstruction error explosion issue and achieving significant performance improvements across various models and tasks.
Contribution
We propose a simple, effective sparsity allocation scheme based on a monotonically increasing arithmetic progression, which reduces layer-wise sparsity determination to tuning a single hyperparameter.
Findings
Reduces perplexity by 52.10 on sparse LLaMA2-7B.
Improves zero-shot accuracy by 10.50%.
Achieves 2.63x and 2.23x speedups on CPU and GPU.
Abstract
In this paper, we address the challenge of determining the layer-wise sparsity rates of large language models (LLMs) through a theoretical perspective. Specifically, we identify a critical issue of "reconstruction error explosion" in existing LLM sparsification methods. This refers to the cumulative effect of reconstruction errors throughout the sparsification process, where errors from earlier layers propagate and amplify in subsequent layers. As a result, the overall reconstruction error increases significantly, leading to a substantial degradation in model performance. Through theoretical analysis, we derive a simple yet effective approach to layer-wise sparsity allocation that mitigates this issue. Our method uses a monotonically increasing arithmetic progression, reducing the process of determining sparsity rates for multiple layers to the determination of a single…
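As a rough illustration of the allocation idea described in the abstract, the sketch below assigns each layer a sparsity rate from a monotonically increasing arithmetic progression. The abstract is truncated before it names the single parameter, so the parameterization here is an assumption: the function name `layerwise_sparsity`, the `step` argument (taken to be the common difference), and the choice to center the progression on the target average sparsity are all hypothetical and may differ from the paper's exact formulation.

```python
def layerwise_sparsity(num_layers, avg_sparsity, step):
    """Return per-layer sparsity rates forming an increasing arithmetic progression.

    Assumption (not taken from the paper): the progression is centered so that
    its mean equals `avg_sparsity`, and `step` is the common difference between
    consecutive layers.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if step < 0:
        raise ValueError("step must be non-negative for an increasing progression")

    # Shift the first term down so the mean of the progression is avg_sparsity.
    offset = step * (num_layers - 1) / 2
    rates = [avg_sparsity - offset + step * i for i in range(num_layers)]

    # Sparsity rates are fractions of pruned weights, so they must lie in [0, 1].
    if rates[0] < 0 or rates[-1] > 1:
        raise ValueError("step is too large for this average sparsity")
    return rates


if __name__ == "__main__":
    # Hypothetical settings: 32 layers, 70% average sparsity, small step.
    rates = layerwise_sparsity(num_layers=32, avg_sparsity=0.7, step=0.005)
    print(f"first layer: {rates[0]:.4f}, last layer: {rates[-1]:.4f}")
    print(f"mean sparsity: {sum(rates) / len(rates):.4f}")
```

Under this assumed form, earlier layers are pruned less than later ones, which matches the abstract's motivation that errors introduced early propagate and amplify downstream. Only `step` needs tuning once the target average sparsity is fixed.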
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
