Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen

TL;DR
This paper introduces Adaptive Sparse Trainer (AST), a novel retraining framework for semi-structured sparse large language models that maintains high performance with minimal computational cost, enabling efficient deployment of compressed models.
Contribution
The paper proposes AST, a new method for training semi-structured sparse LLMs that learns optimal masks during training and incorporates knowledge distillation for improved efficiency and performance.
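As a rough illustration of what a semi-structured mask looks like, the sketch below builds an N:M mask by keeping the N largest-magnitude weights in every group of M consecutive weights. The 2:4 default, the magnitude criterion, and the helper names `nm_mask` and `apply_mask` are assumptions made for this example; the summary does not state how AST selects or updates its masks, so this should not be read as the authors' procedure.

```python
from typing import List


def nm_mask(weights: List[float], n: int = 2, m: int = 4) -> List[int]:
    """Return a 0/1 mask keeping the n largest-magnitude weights in each group of m.

    Hypothetical helper: magnitude-based selection is a common heuristic,
    not a rule confirmed by the paper summary.
    """
    if len(weights) % m != 0:
        raise ValueError("length of weights must be a multiple of m")
    mask: List[int] = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest |w| in this group (ties broken by position).
        keep = sorted(range(m), key=lambda i: -abs(group[i]))[:n]
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Zero out the pruned weights."""
    return [w * k for w, k in zip(weights, mask)]


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
mask = nm_mask(row)            # two survivors per group of four
sparse_row = apply_mask(row, mask)
```

A training loop that recomputes such a mask from the current weights at intervals would let the sparsity pattern change as the weights are updated, which is one plausible reading of "learning masks during training".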
Findings
AST achieves state-of-the-art performance with less than 0.4% of pretraining tokens and GPU hours.
On LLaMA2-7B, AST significantly narrows the gap in perplexity and zero-shot accuracy between the dense model and its semi-structured sparse counterpart.
The method makes deployment of semi-structured sparse LLMs practical, with minimal performance loss.
Abstract
The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have attempted to compress LLMs using one-shot pruning methods. However, these methods often suffer from considerable performance degradation on complex language understanding tasks, raising concerns about the feasibility of pruning in LLMs. To address this issue, we propose Adaptive Sparse Trainer (AST), a novel and efficient retraining framework tailored for semi-structured sparse models. AST enables models to learn optimal masks during the weight update process without incurring additional computational overhead. Furthermore, we demonstrate that incorporating knowledge distillation significantly improves retraining efficiency and enhances model performance…
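To make the knowledge distillation component concrete, here is a minimal sketch of a standard distillation loss: the KL divergence between the dense teacher's and the sparse student's next-token distributions. The function names, the temperature parameter, and the choice of forward KL are assumptions for illustration; the abstract does not specify the exact loss AST uses.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax over one token's logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: List[float],
            teacher_logits: List[float],
            temperature: float = 1.0) -> float:
    """KL(teacher || student) for a single token position.

    Hypothetical helper: a generic distillation loss, not necessarily
    the formulation used in the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 1.0, 0.1]   # logits from the dense model (made-up values)
student = [1.5, 1.2, 0.3]   # logits from the sparse model (made-up values)
loss = kd_loss(student, teacher)
```

In a retraining setup of this kind, the loss would typically be averaged over all token positions and possibly combined with the usual language-modeling loss; the weighting between the two is another choice this summary leaves open.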
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training · Knowledge Distillation · Pruning
