Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training
Rui Pan, Shivanshu Shekhar, Boyao Wang, Shizhe Diao, Jipeng Zhang, Xingyuan Pan, Renjie Pi, Tong Zhang

TL;DR
Adapt-Pruner introduces an effective adaptive structured pruning method for small language models that enhances performance, reduces training costs, and enables the discovery of competitive compact models.
Contribution
The paper proposes layer-wise adaptive pruning combined with incremental training, significantly improving small language model efficiency and performance compared to existing pruning techniques.
Findings
Outperforms existing pruning methods by 1%-7% in accuracy.
Restores MobileLLM-125M performance to 600M levels with fewer tokens.
Discovers a 1B model surpassing LLaMA-3.2-1B in benchmarks.
Abstract
Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications in edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train the models from scratch, which incurs substantial computational costs, or compress/prune existing large language models (LLMs), which results in performance drops and falls short in comparison to pre-training. In this paper, we investigate the family of acceleration methods that involve both structured pruning and model training. We found that 1) layer-wise adaptive pruning (Adapt-Pruner) is extremely effective in LLMs and yields significant improvements over existing pruning techniques, 2) adaptive pruning equipped with further training leads to models comparable to those pre-trained from scratch, 3) incremental pruning brings non-trivial performance gain…
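The abstract names two ideas, layer-wise adaptive pruning and incremental pruning, without giving the formulas here. The following is a minimal sketch of what such a scheme could look like, not the authors' method: it assumes per-layer importance scores are already available, and it uses a simple linear mapping from importance to pruning ratio. The function names `allocate_layer_sparsity` and `incremental_targets`, the `amplitude` parameter, and the linear rule itself are all hypothetical choices made for illustration.

```python
from typing import List


def allocate_layer_sparsity(
    importance: List[float], target: float, amplitude: float
) -> List[float]:
    """Assign a pruning ratio per layer; less important layers are pruned more.

    Hypothetical linear rule (an assumption, not taken from the paper):
    ratios deviate from `target` by at most `amplitude`, and their mean
    equals `target`.
    """
    if not importance:
        raise ValueError("importance must not be empty")
    if target - amplitude < 0.0 or target + amplitude > 1.0:
        raise ValueError("target +/- amplitude must stay within [0, 1]")

    mean = sum(importance) / len(importance)
    centered = [score - mean for score in importance]
    scale = max(abs(c) for c in centered)
    if scale == 0.0:
        # All layers look equally important: fall back to uniform pruning.
        return [target] * len(importance)

    # centered / scale lies in [-1, 1] and sums to zero,
    # so the average ratio stays at `target`.
    return [target - amplitude * c / scale for c in centered]


def incremental_targets(final_sparsity: float, num_rounds: int) -> List[float]:
    """Cumulative sparsity to reach after each prune-then-train round.

    Assumes equal-sized pruning steps, which is an illustrative choice.
    """
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")
    return [final_sparsity * r / num_rounds for r in range(1, num_rounds + 1)]


if __name__ == "__main__":
    # Made-up importance scores for a three-layer toy model.
    ratios = allocate_layer_sparsity([0.1, 0.5, 0.9], target=0.5, amplitude=0.1)
    print("per-layer pruning ratios:", ratios)
    print("incremental schedule:", incremental_targets(0.5, num_rounds=5))
```

In this sketch the overall compression level is fixed by `target`, while `amplitude` controls how strongly the per-layer ratios are allowed to differ. Splitting the final sparsity into several rounds reflects the incremental idea of interleaving small pruning steps with training rather than pruning once.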
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Softmax · Attention Is All You Need · Pruning
