A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

Pengxiang Zhao; Hanyu Hu; Ping Li; Yi Zheng; Zhefeng Wang; Xiaoming Yuan

arXiv:2408.03728·cs.LG·August 8, 2024

TL;DR

This paper presents FISTAPruner, a novel convex-optimization-based post-training pruning method for large language models that achieves high sparsity and efficiency without retraining, outperforming existing techniques.

Contribution

Introducing FISTAPruner, the first post-training pruner based on convex optimization with error correction and parallel support for large language models.

Findings

1. Outperforms state-of-the-art pruning methods on various LLMs.
2. Achieves high sparsity with minimal performance loss.
3. Supports both unstructured and semi-structured sparsity.

Abstract

Pruning is a critical strategy for compressing trained large language models (LLMs), aiming at substantial memory conservation and computational acceleration without compromising performance. However, existing pruning methods often necessitate inefficient retraining for billion-scale LLMs or rely on heuristic methods such as the optimal brain surgeon framework, which degrade performance. In this paper, we introduce FISTAPruner, the first post-training pruner based on convex optimization models and algorithms. Specifically, we propose a convex optimization model incorporating the $\ell_{1}$ norm to induce sparsity and utilize the FISTA solver for optimization. FISTAPruner incorporates an intra-layer cumulative error correction mechanism and supports parallel pruning. We comprehensively evaluate FISTAPruner on models such as OPT, LLaMA, LLaMA-2, and LLaMA-3 with 125M to 70B parameters under…
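
To make the idea of an $\ell_{1}$-regularized layer-wise model solved with FISTA concrete, here is a minimal NumPy sketch. It assumes a generic lasso-style objective, $\min_W \tfrac{1}{2}\|WX - W_0X\|_F^2 + \lambda\|W\|_1$, where $W_0$ is the dense weight matrix and $X$ holds calibration activations. This objective, the function names `soft_threshold` and `fista_prune`, and the parameters `lam` and `n_iter` are illustrative assumptions, not the paper's exact formulation. The sketch leaves out the error correction mechanism, parallel pruning, and semi-structured sparsity patterns.

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1: shrinks entries toward zero
    # and sets those with magnitude below tau exactly to zero.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def fista_prune(w_dense, x, lam, n_iter=200):
    """Approximately solve min_W 0.5*||W x - w_dense x||_F^2 + lam*||W||_1.

    Assumed shapes: w_dense is (out_features, in_features),
    x is (in_features, n_samples). Hypothetical helper for illustration.
    """
    gram = x @ x.T                         # (in, in) Gram matrix of activations
    target = w_dense @ gram                # constant part of the gradient
    step = 1.0 / np.linalg.norm(gram, 2)   # 1/L, L = largest eigenvalue of gram
    w = w_dense.copy()
    y = w.copy()                           # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = y @ gram - target           # gradient of the smooth term at y
        w_next = soft_threshold(y - step * grad, step * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = w_next + ((t - 1.0) / t_next) * (w_next - w)
        w, t = w_next, t_next
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w0 = rng.standard_normal((4, 8))       # toy "layer" weights
    acts = rng.standard_normal((8, 64))    # toy calibration activations
    w_sparse = fista_prune(w0, acts, lam=40.0)
    print("fraction of zero weights:", np.mean(w_sparse == 0))
```

In this sketch a larger `lam` yields more zeros at the cost of a larger output reconstruction error, so hitting a specific sparsity target would require tuning `lam` or a final thresholding step; how the paper handles that is not covered by the text above.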

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Pruning · LLaMA · OPT