SPAP: Structured Pruning via Alternating Optimization and Penalty Methods

Hanyu Hu; Xiaoming Yuan

arXiv:2505.03373·cs.LG·May 7, 2025

Open Access

TL;DR

This paper introduces SPAP, an optimization-based structured pruning method for large language models. It reports a 1.29× inference speedup at 30% sparsity with proportional memory savings while maintaining performance, and it avoids the heuristic metrics and expensive finetuning that earlier methods rely on.

Contribution

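SPAP formulates structured pruning of LLMs as a mixed-integer optimization problem. A penalty method makes the pruning decisions so as to minimize pruning error, and an alternating minimization algorithm, tailored to the splittable structure of the problem, updates the remaining weights to recover performance.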
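
As a rough illustration of what a mixed-integer pruning model can look like for a single linear layer, consider the generic layer-wise problem below. It is an assumption made for exposition, not the formulation from the paper: the symbols $X$ (calibration inputs), $W_0$ (dense weights), $W$ (pruned weights), $m$ (binary keep mask over $d$ input channels), and $k$ (number of channels kept) are all introduced here.

```latex
\begin{aligned}
\min_{W,\; m}\quad & \bigl\| X W_0 - X \operatorname{diag}(m)\, W \bigr\|_F^2 \\
\text{s.t.}\quad & m \in \{0,1\}^d, \qquad \mathbf{1}^\top m = k .
\end{aligned}
```

One common way to handle the binary constraint is to relax it to $m \in [0,1]^d$ and add a penalty such as $\rho \sum_i m_i (1 - m_i)$ with a hypothetical weight $\rho > 0$, which vanishes exactly at binary points. For a fixed mask, the problem in $W$ is an ordinary least-squares problem, which is why alternating between $m$ and $W$ is a natural fit for this kind of model.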

Findings

01. Achieves a 1.29× inference speedup at 30% sparsity

02. Reduces memory usage in proportion to the pruning ratio

03. Outperforms state-of-the-art pruning methods

Abstract

The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer from performance degradation, reliance on heuristic metrics, or expensive finetuning. To address these challenges, we propose SPAP (Structured Pruning via Alternating Optimization and Penalty Methods), a novel and efficient structured pruning framework for LLMs grounded in optimization theory. SPAP formulates the pruning problem through a mixed-integer optimization model, employs a penalty method that effectively makes pruning decisions to minimize pruning errors, and introduces an alternating minimization algorithm tailored to the splittable problem structure for efficient weight updates and performance recovery. Extensive experiments on OPT,…
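
The weight-update idea mentioned in the abstract can be sketched on a toy linear layer. The snippet below is a minimal illustration under stated assumptions, not the SPAP algorithm: the function name `prune_and_recover`, the magnitude-based channel score, and the random calibration data are all hypothetical. It picks channels with a simple heuristic, then refits the kept weights by least squares so that the pruned layer reproduces the dense output as closely as possible.

```python
import numpy as np

def prune_and_recover(X, W0, keep):
    """Prune input channels of a linear layer, then refit the kept weights.

    X:    (n, d) calibration inputs (hypothetical data)
    W0:   (d, p) dense weights
    keep: number of input channels to retain
    """
    Y = X @ W0  # dense layer output that the pruned layer should match

    # Heuristic channel score (an assumption, not the paper's penalty method):
    # input-column norm times weight-row norm.
    scores = np.linalg.norm(X, axis=0) * np.linalg.norm(W0, axis=1)
    kept = np.sort(np.argsort(scores)[-keep:])

    # Naive pruning: zero the dropped rows and leave the rest untouched.
    W_naive = np.zeros_like(W0)
    W_naive[kept] = W0[kept]

    # Recovery step: least-squares refit of the kept rows against Y.
    solution = np.linalg.lstsq(X[:, kept], Y, rcond=None)[0]
    W_ls = np.zeros_like(W0)
    W_ls[kept] = solution
    return kept, W_naive, W_ls

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 32))
# Make two channels correlated so the refit has something to compensate for.
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(256)
W0 = rng.standard_normal((32, 8))

# Keep 22 of 32 channels, i.e. roughly 30% structured sparsity.
kept, W_naive, W_ls = prune_and_recover(X, W0, keep=22)
err_naive = np.linalg.norm(X @ W0 - X @ W_naive)
err_ls = np.linalg.norm(X @ W0 - X @ W_ls)
print(f"naive masking error: {err_naive:.3f}")
print(f"after least-squares update: {err_ls:.3f}")
```

Because the naively masked weights are a feasible point of the least-squares problem, the refit error can never exceed the naive error. In an alternating scheme, a step like this would be interleaved with a step that revises which channels are kept.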

Taxonomy

Topics: Advanced Surface Polishing Techniques · Fluid Dynamics Simulations and Interactions

Methods: Pruning · OPT