SPAP: Structured Pruning via Alternating Optimization and Penalty Methods
Hanyu Hu, Xiaoming Yuan

TL;DR
This paper introduces SPAP, an optimization-based structured pruning method for large language models that achieves significant speedups and memory savings while maintaining performance, addressing limitations of previous heuristic and costly approaches.
Contribution
SPAP formulates structured pruning as a mixed-integer optimization problem, employing an alternating minimization algorithm and penalty methods for efficient and effective pruning of LLMs.
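As a rough illustration of how a penalty method and alternating minimization can fit together in structured pruning, the Python sketch below prunes whole rows of a single linear layer's weight matrix. It is not the SPAP algorithm from the paper. The objective, the function name prune_rows_penalty, the parameters rho and growth, and the random calibration data are all assumptions made for this example. The binary keep/drop decision is handled by projecting onto matrices with at most `keep` nonzero rows, and a quadratic penalty ties that row-sparse copy to a dense copy that has a closed-form update.

```python
import numpy as np


def prune_rows_penalty(X, W0, keep, rho=1e-2, growth=2.0, iters=30):
    """Toy row-structured pruning of one linear layer (illustrative only).

    Approximately minimizes ||X W0 - X W||_F^2 + rho * ||W - Z||_F^2,
    where Z may have at most `keep` nonzero rows, by alternating between
    Z and W while the penalty weight rho grows.
    """
    G = X.T @ X              # Gram matrix of the calibration inputs
    target = G @ W0          # equals X^T (X W0)
    d = W0.shape[0]
    W = W0.copy()
    for _ in range(iters):
        # Z-step (pruning decision): keep the `keep` rows of W with largest norm.
        norms = np.linalg.norm(W, axis=1)
        support = np.sort(np.argsort(norms)[-keep:])
        Z = np.zeros_like(W)
        Z[support] = W[support]
        # W-step (weight update): closed-form minimizer for fixed Z.
        W = np.linalg.solve(G + rho * np.eye(d), target + rho * Z)
        rho *= growth        # tighten the penalty so W is pushed toward Z
    # Final decision, then a least-squares refit restricted to the kept rows.
    norms = np.linalg.norm(W, axis=1)
    support = np.sort(np.argsort(norms)[-keep:])
    W_kept, *_ = np.linalg.lstsq(X[:, support], X @ W0, rcond=None)
    return support, W_kept


# Hypothetical calibration data: 200 samples, 16 input features, 8 outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
W0 = rng.standard_normal((16, 8))
support, W_kept = prune_rows_penalty(X, W0, keep=11)  # roughly 30% of rows removed
rel_err = np.linalg.norm(X @ W0 - X[:, support] @ W_kept) / np.linalg.norm(X @ W0)
print("kept rows:", support, "relative output error:", rel_err)
```

Because the final refit is a least-squares solve on the kept rows, its output error can never exceed that of simply zeroing the dropped rows and leaving the remaining weights untouched.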
Findings
Achieves 1.29× inference speedup at 30% sparsity
Reduces memory proportionally with pruning
Outperforms state-of-the-art pruning methods
Abstract
The deployment of large language models (LLMs) is often constrained by their substantial computational and memory demands. While structured pruning presents a viable approach by eliminating entire network components, existing methods suffer from performance degradation, reliance on heuristic metrics, or expensive finetuning. To address these challenges, we propose SPAP (Structured Pruning via Alternating Optimization and Penalty Methods), a novel and efficient structured pruning framework for LLMs grounded in optimization theory. SPAP formulates the pruning problem through a mixed-integer optimization model, employs a penalty method that effectively makes pruning decisions to minimize pruning errors, and introduces an alternating minimization algorithm tailored to the splittable problem structure for efficient weight updates and performance recovery. Extensive experiments on OPT,…
Taxonomy
Topics: Advanced Surface Polishing Techniques · Fluid Dynamics Simulations and Interactions
Methods: Pruning · OPT
