Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

Rui Pan; Shivanshu Shekhar; Boyao Wang; Shizhe Diao; Jipeng Zhang; Xingyuan Pan; Renjie Pi; Tong Zhang

arXiv:2502.03460 · cs.LG · November 17, 2025

TL;DR

Adapt-Pruner introduces an effective adaptive structured pruning method for small language models that enhances performance, reduces training costs, and enables the discovery of competitive compact models.

Contribution

The paper proposes layer-wise adaptive pruning combined with incremental training, significantly improving small language model efficiency and performance compared to existing pruning techniques.

Findings

01. Outperforms existing pruning methods by 1%-7% in accuracy.

02. Restores the performance of MobileLLM-125M to the level of the 600M model while using fewer tokens.

03. Discovers a 1B model that surpasses LLaMA-3.2-1B on benchmarks.

Abstract

Small language models (SLMs) have attracted considerable attention from both academia and industry due to their broad range of applications on edge devices. To obtain SLMs with strong performance, conventional approaches either pre-train the models from scratch, which incurs substantial computational costs, or compress/prune existing large language models (LLMs), which results in performance drops and falls short of pre-training. In this paper, we investigate the family of acceleration methods that involve both structured pruning and model training. We found that 1) layer-wise adaptive pruning (Adapt-Pruner) is extremely effective for LLMs and yields significant improvements over existing pruning techniques, 2) adaptive pruning combined with further training leads to models comparable to those pre-trained from scratch, 3) incremental pruning brings non-trivial performance gain…
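The two ideas named in the abstract, layer-wise adaptive pruning and incremental pruning, can be illustrated with a small sketch. This is not the paper's algorithm: the importance scores, the `amplitude` parameter, the linear schedule, and both function names are assumptions made only for illustration. The sketch shows one plausible way to give less important layers a higher pruning ratio while keeping the average ratio at a target, and one way to reach a final ratio in small steps between which training could be interleaved.

```python
def layerwise_sparsity(importance, target, amplitude=0.02):
    """Hypothetical allocation: per-layer pruning ratios whose mean equals `target`.

    Layers with lower importance scores receive a higher ratio. The scores
    themselves are assumed to be given; how the paper computes them is not
    stated in the abstract above.
    """
    n = len(importance)
    mean = sum(importance) / n
    # Largest deviation from the mean; fall back to 1.0 if all scores are equal.
    spread = max(abs(s - mean) for s in importance) or 1.0
    # Offsets are centered (they sum to zero), so the mean ratio stays at `target`.
    ratios = [target - amplitude * (s - mean) / spread for s in importance]
    assert all(0.0 <= r <= 1.0 for r in ratios), "target +/- amplitude must stay in [0, 1]"
    return ratios


def incremental_schedule(final_sparsity, steps):
    """Hypothetical linear schedule: overall pruning ratio after each of `steps` rounds."""
    return [final_sparsity * (k + 1) / steps for k in range(steps)]


if __name__ == "__main__":
    # Toy importance scores for three layers (made-up numbers).
    scores = [0.9, 0.5, 0.1]
    for overall in incremental_schedule(final_sparsity=0.5, steps=5):
        per_layer = layerwise_sparsity(scores, target=overall, amplitude=0.05)
        # A real pipeline would prune to `per_layer` here and then train briefly.
        print(round(overall, 2), [round(r, 3) for r in per_layer])
```

In this sketch the most important layer always receives the smallest ratio, and each round raises the overall ratio by the same amount; both choices are simplifications rather than claims about Adapt-Pruner itself.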

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Softmax · Attention Is All You Need · Pruning