Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

Weiyu Huang; Yuezhou Hu; Guohao Jian; Jun Zhu; Jianfei Chen

arXiv:2407.20584 · cs.CL · December 19, 2024

TL;DR

This paper introduces Adaptive Sparse Trainer (AST), a novel retraining framework for semi-structured sparse large language models that maintains high performance with minimal computational cost, enabling efficient deployment of compressed models.

Contribution

The paper proposes AST, a new method for training semi-structured sparse LLMs that learns optimal masks during training and incorporates knowledge distillation for improved efficiency and performance.
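To make "semi-structured sparse" concrete, here is a minimal sketch of a 2:4 mask, where each group of four consecutive weights keeps only its two largest-magnitude entries. The 2:4 pattern is an assumption taken from the released model name (LLaMA2-7B_2-by-4_Sparse), and the magnitude criterion is a common static baseline, not AST's learned-mask procedure. The function names `two_four_mask` and `apply_mask` are hypothetical.

```python
def two_four_mask(weights):
    """Return a 0/1 mask keeping the 2 largest-magnitude entries in each group of 4.

    Illustrative magnitude-based rule only; AST is described as learning its
    masks during training, which this static sketch does not do.
    """
    if len(weights) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    mask = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Rank positions by magnitude, largest first; ties go to the earlier index.
        keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
        mask.extend(1 if i in keep else 0 for i in range(4))
    return mask


def apply_mask(weights, mask):
    """Zero out the pruned weights."""
    return [w * m for w, m in zip(weights, mask)]


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
mask = two_four_mask(row)
sparse_row = apply_mask(row, mask)
print(mask)
```

Every group of four ends up with exactly two nonzero weights, which is the regularity that lets hardware with 2:4 sparsity support skip the zeroed entries.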

Findings

01. AST achieves state-of-the-art performance using less than 0.4% of the pretraining tokens and GPU hours.

02. AST significantly narrows the perplexity and zero-shot accuracy gaps between the dense and semi-structured sparse LLaMA2-7B models.

03. The method makes deployment of semi-structured sparse LLMs feasible with minimal performance loss.

Abstract

The remarkable success of Large Language Models (LLMs) relies heavily on their substantial scale, which poses significant challenges during model deployment in terms of latency and memory consumption. Recently, numerous studies have attempted to compress LLMs using one-shot pruning methods. However, these methods often suffer from considerable performance degradation on complex language understanding tasks, raising concerns about the feasibility of pruning in LLMs. To address this issue, we propose Adaptive Sparse Trainer (AST), a novel and efficient retraining framework tailored for semi-structured sparse models. AST enables models to learn optimal masks during the weight update process without incurring additional computational overhead. Furthermore, we demonstrate that incorporating knowledge distillation significantly improves retraining efficiency and enhances model performance…
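The abstract's mention of knowledge distillation can be illustrated with a generic distillation loss: the sparse student is trained to match the output distribution of the dense teacher. The sketch below computes KL(teacher || student) over temperature-softened softmax outputs. It is a textbook formulation under assumed values (the logits and `temperature=2.0` are made up for illustration), not necessarily the objective used in the paper, and `kd_loss` is a hypothetical name.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened output distributions.

    Generic distillation loss for illustration; the paper's exact
    objective may differ.
    """
    p = softmax(teacher_logits, temperature)  # dense teacher
    q = softmax(student_logits, temperature)  # sparse student
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 0.5, -1.0]   # assumed example logits
student = [1.5, 0.7, -0.8]   # assumed example logits
loss = kd_loss(student, teacher)
same = kd_loss(teacher, teacher)
print(loss, same)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so it gives the pruned model a denser training signal than hard labels alone.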

Code & Models

Repositories

thu-ml/adaptive-sparse-trainer (PyTorch, official)

Models

Yellowtree/LLaMA2-7B_2-by-4_Sparse (Hugging Face model · 2 downloads · 2 likes)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · Knowledge Distillation · Pruning