ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

Xiang Meng; Kayhan Behdin; Haoyue Wang; Rahul Mazumder

arXiv:2406.07831 · cs.LG · September 9, 2025 · 1 citation

TL;DR

ALPS is an optimization-based one-shot pruning framework for large language models. It combines operator splitting with a preconditioned conjugate gradient post-processing step, uses vectorization and GPU parallelism for efficiency, and improves on existing methods most clearly at high sparsity levels.

Contribution

ALPS replaces the heuristics that current one-shot pruning methods for large language models rely on with an optimization-based approach, which yields better model performance than those heuristic methods, particularly at high sparsity.

Findings

01 At 70% sparsity, reduces perplexity on WikiText by 13% relative to existing methods

02 Outperforms state-of-the-art methods on zero-shot benchmarks

03 Provides theoretical convergence guarantees for its optimization procedure

Abstract

The impressive performance of Large Language Models (LLMs) across various natural language processing tasks comes at the cost of vast computational resources and storage requirements. One-shot pruning techniques offer a way to alleviate these burdens by removing redundant weights without the need for retraining. Yet, the massive scale of LLMs often forces current pruning approaches to rely on heuristics instead of optimization-based techniques, potentially resulting in suboptimal compression. In this paper, we introduce ALPS, an optimization-based framework that tackles the pruning problem using the operator splitting technique and a preconditioned conjugate gradient-based post-processing step. Our approach incorporates novel techniques to accelerate and theoretically guarantee convergence while leveraging vectorization and GPU parallelism for efficiency. ALPS substantially outperforms…
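The operator splitting idea mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the common layer-wise formulation, which the abstract does not spell out: given calibration inputs X and dense weights W_dense, find a W with at most k nonzeros that keeps X @ W close to X @ W_dense. The function names (`project_topk`, `admm_prune`, `refine_on_support`), the fixed penalty `rho`, and the iteration count are all hypothetical choices. A direct least-squares solve on the selected support stands in for the preconditioned conjugate gradient post-processing step.

```python
import numpy as np


def project_topk(M, k):
    """Keep the k largest-magnitude entries of M and zero out the rest."""
    out = np.zeros_like(M)
    if k <= 0:
        return out
    if k >= M.size:
        return M.copy()
    flat_abs = np.abs(M).ravel()
    idx = np.argpartition(flat_abs, -k)[-k:]  # indices of the k largest magnitudes
    out.flat[idx] = M.flat[idx]
    return out


def admm_prune(X, W_dense, k, rho=1.0, iters=100):
    """Operator-splitting (ADMM) sketch for
        min_W 0.5 * ||X W - X W_dense||_F^2  subject to  nnz(W) <= k.
    W is split into a smooth copy W and a sparse copy D, with dual variable V.
    Assumption: rho is held fixed here for simplicity.
    """
    H = X.T @ X                       # Gram matrix of the calibration inputs
    G = H @ W_dense
    evals, Q = np.linalg.eigh(H)      # factor once; reused for every W-update
    D = project_topk(W_dense, k)      # start from magnitude pruning
    V = np.zeros_like(W_dense)
    for _ in range(iters):
        # W-update: solve (H + rho I) W = H W_dense - V + rho D
        rhs = G - V + rho * D
        W = Q @ ((Q.T @ rhs) / (evals + rho)[:, None])
        # D-update: project W + V / rho onto the set of k-sparse matrices
        D = project_topk(W + V / rho, k)
        # Dual update
        V = V + rho * (W - D)
    return D


def refine_on_support(X, W_dense, D):
    """Re-fit the nonzero weights with the support of D held fixed.
    Stand-in for a PCG-based refinement: each column is solved exactly
    by least squares restricted to its selected rows.
    """
    Y = X @ W_dense
    W = np.zeros_like(D)
    for j in range(D.shape[1]):
        s = np.flatnonzero(D[:, j])
        if s.size:
            W[s, j] = np.linalg.lstsq(X[:, s], Y[:, j], rcond=None)[0]
    return W
```

A typical call would be `D = admm_prune(X, W_dense, k)` followed by `W = refine_on_support(X, W_dense, D)`. Because the refinement only re-fits the weights on a support that is already chosen, it cannot increase the reconstruction error relative to `D`.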

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Pruning