Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective

Weizhong Huang; Yuxin Zhang; Xiawu Zheng; Fei Chao; Rongrong Ji

arXiv:2502.14770 · cs.LG · February 21, 2025

Open Access

TL;DR

This paper presents a theoretically grounded method for determining layer-wise sparsity in large language models, addressing the reconstruction error explosion issue and achieving significant performance improvements across various models and tasks.

Contribution

We propose a simple, effective sparsity allocation scheme based on a monotonically increasing arithmetic progression, which reduces layer-wise sparsity determination to tuning a single hyperparameter.
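A minimal sketch of what such a schedule could look like follows. It assumes the progression is centred so that the mean of the per-layer rates equals the target overall sparsity, with the common difference `step` as the single hyperparameter; the function name `arithmetic_sparsity_schedule` and the example values (0.7 target, a step of 0.005) are hypothetical illustrations, not the paper's published settings. The 32 layers match the depth of LLaMA2-7B.

```python
def arithmetic_sparsity_schedule(target: float, num_layers: int, step: float) -> list[float]:
    """Per-layer sparsity rates forming a monotonically increasing arithmetic progression.

    Hypothetical sketch: the progression is centred on `target`, so the mean
    rate equals the target sparsity, and `step` (the common difference) is the
    only free hyperparameter.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    if step < 0:
        raise ValueError("step must be non-negative for an increasing schedule")
    center = (num_layers - 1) / 2
    # Layer i gets target + step * (i - center); early layers are pruned less.
    rates = [target + step * (i - center) for i in range(num_layers)]
    # Every rate must remain a valid sparsity in [0, 1].
    if rates[0] < 0.0 or rates[-1] > 1.0:
        raise ValueError("step too large: rates leave [0, 1]")
    return rates


# Illustrative values: 32 layers at 70% average sparsity.
rates = arithmetic_sparsity_schedule(target=0.7, num_layers=32, step=0.005)
print(round(rates[0], 4), round(rates[-1], 4), round(sum(rates) / len(rates), 4))
# → 0.6225 0.7775 0.7
```

Setting `step=0` recovers uniform sparsity, so a search over this one value covers the uniform baseline as well as increasingly steep schedules.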

Findings

01. Reduces perplexity by 52.10 on sparse LLaMA2-7B.
02. Improves zero-shot accuracy by 10.50%.
03. Achieves 2.63x and 2.23x speedups on CPU and GPU, respectively.

Abstract

In this paper, we address the challenge of determining the layer-wise sparsity rates of large language models (LLMs) from a theoretical perspective. Specifically, we identify a critical issue of "reconstruction error explosion" in existing LLM sparsification methods. This refers to the cumulative effect of reconstruction errors throughout the sparsification process, where errors from earlier layers propagate and amplify in subsequent layers. As a result, the overall reconstruction error increases significantly, leading to a substantial degradation in model performance. Through theoretical analysis, we derive a simple yet effective approach to layer-wise sparsity allocation that mitigates this issue. Our method uses a monotonically increasing arithmetic progression, reducing the process of determining sparsity rates for multiple layers to the determination of a single…
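The error accumulation described in the abstract can be illustrated with a deliberately simplified scalar model. This is not the paper's theoretical analysis. It assumes, hypothetically, that pruning a layer injects error proportional to that layer's sparsity rate and that every later layer multiplies incoming error by a constant `gain`. The function `output_error` and all numbers are illustrative.

```python
def output_error(rates: list[float], gain: float, noise_per_unit_sparsity: float = 1.0) -> float:
    """Toy model of reconstruction error accumulating through stacked layers.

    Hypothetical assumptions: layer l adds error noise_per_unit_sparsity * rates[l],
    and each subsequent layer scales the error it receives by `gain`.
    """
    err = 0.0
    for s in rates:
        # Error from earlier layers is amplified, then this layer adds its own.
        err = gain * err + noise_per_unit_sparsity * s
    return err


num_layers, target = 32, 0.7
uniform = [target] * num_layers
# Same mean sparsity, but increasing with depth (illustrative step of 0.005).
increasing = [target + 0.005 * (i - (num_layers - 1) / 2) for i in range(num_layers)]

for gain in (1.0, 1.1):
    print(gain, round(output_error(uniform, gain), 3), round(output_error(increasing, gain), 3))
```

With `gain=1.0` the two totals match, because each is just the sum of the rates. With `gain > 1`, error injected early is multiplied by more downstream layers, so moving sparsity from early to late layers at a fixed mean lowers the final error in this toy model. That is consistent with the intuition behind a monotonically increasing schedule.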

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques