Pruning Foundation Models for High Accuracy without Retraining

Pu Zhao; Fei Sun; Xuan Shen; Pinrui Yu; Zhenglun Kong; Yanzhi Wang; Xue Lin

arXiv:2410.15567·cs.LG·October 22, 2024

TL;DR

This paper introduces a post-training pruning method for large language models that prunes in one shot, without retraining, while keeping accuracy high at a reduced model size.

Contribution

The paper formulates a layer-wise pruning problem for LLMs, provides an optimal solution, and designs a pruning algorithm for both unstructured and semi-structured sparsity that outperforms state-of-the-art methods.
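
To make the two sparsity patterns named above concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: it uses simple magnitude ranking as a stand-in for the paper's optimal solution, which this page does not spell out. The function names, the 2:4 choice for the semi-structured pattern, and the squared-error measure of a layer's output change are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]  # one row per output unit, one column per input unit


def unstructured_mask(weights: Matrix, sparsity: float) -> Matrix:
    """Zero out the smallest-magnitude fraction of weights anywhere in the matrix."""
    coords = [(r, c) for r, row in enumerate(weights) for c in range(len(row))]
    num_pruned = int(sparsity * len(coords))
    # Smallest magnitudes first; these are the entries to prune.
    coords.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]))
    mask = [[1.0] * len(row) for row in weights]
    for r, c in coords[:num_pruned]:
        mask[r][c] = 0.0
    return mask


def n_m_mask(weights: Matrix, n: int = 2, m: int = 4) -> Matrix:
    """Semi-structured N:M pattern: keep n weights in every group of m along a row."""
    mask = []
    for row in weights:
        if len(row) % m != 0:
            raise ValueError("row length must be a multiple of m")
        row_mask = [0.0] * len(row)
        for start in range(0, len(row), m):
            group = range(start, start + m)
            # Keep the n largest-magnitude entries of this group.
            for c in sorted(group, key=lambda c: abs(row[c]), reverse=True)[:n]:
                row_mask[c] = 1.0
        mask.append(row_mask)
    return mask


def apply_mask(weights: Matrix, mask: Matrix) -> Matrix:
    """Element-wise product of weights and a 0/1 mask."""
    return [[w * k for w, k in zip(w_row, k_row)]
            for w_row, k_row in zip(weights, mask)]


def reconstruction_error(weights: Matrix, pruned: Matrix, inputs: Matrix) -> float:
    """Squared difference between dense and pruned layer outputs.

    `inputs` has one row per input unit and one column per calibration sample.
    """
    num_samples = len(inputs[0])
    error = 0.0
    for w_row, p_row in zip(weights, pruned):
        for s in range(num_samples):
            diff = sum((w - p) * inputs[c][s]
                       for c, (w, p) in enumerate(zip(w_row, p_row)))
            error += diff * diff
    return error
```

Under these assumptions, a layer would be pruned with `apply_mask(W, n_m_mask(W))` and checked with `reconstruction_error(W, pruned, X)` on a small batch of calibration inputs `X`. A method of the kind the paper describes would choose the mask, and possibly adjust the remaining weights, to reduce that error rather than rank by magnitude alone.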

Findings

01 Superior performance over SOTA baselines across various LLMs

02 Effective one-shot pruning without retraining

03 Maintains high accuracy with reduced model size

Abstract

Despite the superior performance, it is challenging to deploy foundation models or large language models (LLMs) due to their massive parameters and computations. While pruning is a promising technique to reduce model size and accelerate the inference, the traditional pruning techniques can hardly be applied for LLMs as they need to finetune the model on the full dataset with multiple epochs consuming massive data and hardware resources. To deal with this problem, post-training pruning methods are proposed to prune LLMs in one-shot without retraining. However, their accuracy after pruning may suffer from certain performance degradation due to the lack of retraining with massive data. To address this issue, in this paper, we first formulate the post-training problem for layer-wise LLM compression to simultaneously prune multiple weights in LLMs. Next, we provide an optimal solution for…

Code & Models

Repositories

piuzha/apt (Official)

Taxonomy

Topics: Structural Health Monitoring Techniques

Methods: Pruning