TL;DR
The paper introduces TRSP, a two-stage regularization-based structured pruning method for large language models that preserves knowledge and performance without extensive retraining.
Contribution
It proposes a novel two-stage regularization approach for structured pruning, improving knowledge retention and model performance in LLMs.
Findings
TRSP outperforms existing layer-wise structured pruning methods.
It achieves significant end-to-end acceleration.
No retraining is required after pruning.
Abstract
The deployment of large language models (LLMs) is largely hindered by their large number of parameters. Structural pruning has emerged as a promising solution. Prior structured pruning methods directly remove unimportant parameters based on certain metrics, which often causes knowledge loss and necessitates extensive retraining. To overcome this, we introduce a novel pruning method TRSP: Two-Stage Regularization-Based Structured Pruning for LLMs. Specifically, we multiply the output of each transformer layer by an initial learnable weight and iteratively learn these weights by adding their -norm as a regularization term to the loss function, serving as the first-stage regularization. Subsequently, we apply additional regularization to the difference between the output and input of layers with smaller weights, encouraging the shift of knowledge to the preserved layers. This…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
