P$^2$ Law: Scaling Law for Post-Training After Model Pruning

Xiaodong Chen; Yuxuan Hu; Xiaokang Zhang; Yanling Wang; Cuiping Li; Hong Chen; Jing Zhang

arXiv:2411.10272 · cs.AI · May 27, 2025

TL;DR

This paper introduces the P$^2$ Law, a scaling law that predicts post-training loss for pruned large language models based on model size, pruning rate, dataset size, and initial loss, aiding efficient model fine-tuning.

Contribution

The paper proposes the P$^2$ Law, a scaling law that predicts the post-training loss of pruned LLMs from four key factors, and demonstrates that the law generalizes across a range of settings.

Findings

01. The P$^2$ Law predicts post-training loss effectively.

02. The law generalizes to different dataset sizes, model sizes, and pruning rates.

03. The law provides insights for optimizing the post-training of pruned LLMs.

Abstract

Pruning has become a widely adopted technique for reducing the hardware requirements of large language models (LLMs). To recover model performance after pruning, post-training is commonly employed to mitigate the resulting performance degradation. While post-training benefits from larger datasets, once the dataset size is already substantial, increasing the training data provides only limited performance gains. To balance post-training cost and model performance, it is necessary to explore the optimal amount of post-training data. Through extensive experiments on the Llama-3 and Qwen-2.5 series models, pruned using various common pruning methods, we uncover the scaling Law for Post-training after model Pruning, referred to as the P$^2$ Law. This law identifies four key factors for predicting the pruned model's post-training loss: the model size before pruning,…
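The diminishing returns described in the abstract can be illustrated with a toy curve. The sketch below is not the P$^2$ Law, whose functional form is not given on this page; it assumes a generic saturating power law in the number of post-training tokens, and the function names, constants (`floor`, `scale`, `exponent`), and stopping threshold are all hypothetical choices made for illustration.

```python
def toy_loss(tokens, floor=2.0, scale=50.0, exponent=0.3):
    """Hypothetical saturating power law: floor + scale / tokens**exponent.

    This is an illustrative stand-in, NOT the P^2 Law from the paper.
    All constants are made up.
    """
    return floor + scale / tokens ** exponent


def gain_from_doubling(tokens, **params):
    """Loss reduction obtained by doubling the post-training data."""
    return toy_loss(tokens, **params) - toy_loss(2 * tokens, **params)


def smallest_budget(threshold, start=1e6, **params):
    """Double the token budget until one more doubling would reduce
    the loss by less than `threshold`."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    tokens = start
    while gain_from_doubling(tokens, **params) >= threshold:
        tokens *= 2
    return tokens


if __name__ == "__main__":
    for tokens in (1e6, 1e7, 1e8, 1e9):
        print(
            f"{tokens:.0e} tokens: loss={toy_loss(tokens):.3f}, "
            f"gain from doubling={gain_from_doubling(tokens):.4f}"
        )
    print(f"stop near {smallest_budget(0.01):.3e} tokens")
```

Under these assumed constants, each doubling of the data buys a smaller loss reduction than the previous one, which is the trade-off between post-training cost and model performance that motivates predicting the loss in advance.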

Taxonomy

Topics: Model-Driven Software Engineering Techniques

Methods: Attention Is All You Need · Adam · Residual Connection · Pruning · Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dense Connections · Softmax · Position-Wise Feed-Forward Layer