Scalable iterative pruning of large language and vision models using block coordinate descent
Gili Rosenberg, J. Kyle Brubaker, Martin J. A. Schuetz, Elton Yechao Zhu, Serdar Kadıoğlu, Sima E. Borujeni, Helmut G. Katzgraber

TL;DR
This paper introduces an iterative, block-wise pruning method called iCBS that scales to large models, improves performance at high sparsity, and offers a tradeoff between quality and computational cost.
Contribution
The paper presents a scalable, block coordinate descent-based pruning technique for large neural networks, extending the Combinatorial Brain Surgeon approach to large models like LLMs.
Findings
iCBS outperforms Wanda at the same density levels on large models
The method is amenable to hardware acceleration, potentially including quantum optimization
It provides a quality-cost tradeoff not available with one-shot pruning
Abstract
Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that builds upon the Combinatorial Brain Surgeon, but solves an optimization problem over a subset of the network weights in an iterative, block-wise manner using block coordinate descent. The iterative, block-based nature of this pruning technique, which we dub "iterative Combinatorial Brain Surgeon" (iCBS), allows for scalability to very large models, including large language models (LLMs), that may not be feasible with a one-shot combinatorial optimization approach. When applied to large models like Mistral and DeiT, iCBS achieves higher performance metrics at the same density levels compared to existing pruning methods such as Wanda. This…
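To make the block coordinate descent idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. The helper names (`pruning_loss`, `magnitude_mask`, `block_coordinate_prune`) are hypothetical, and the following are all assumptions chosen for illustration: the quadratic loss model (a second-order estimate 0.5·dᵀHd of the loss increase from zeroing weights, assuming a near-zero gradient and no update of the surviving weights), the random choice of blocks, the magnitude-pruning starting mask, and the brute-force solver for each block. Each step re-optimizes which weights to keep inside one small block while the rest of the mask stays fixed, and keeps the block's count of surviving weights unchanged so the overall density does not drift.

```python
import itertools

import numpy as np


def pruning_loss(w, H, mask):
    """Assumed second-order estimate of the loss increase from zeroing the
    weights where mask == 0 (gradient ~ 0, no weight update): 0.5 * d^T H d."""
    d = w * (1 - mask)  # the pruned part of the weight vector
    return 0.5 * d @ H @ d


def magnitude_mask(w, density):
    """One-shot baseline used here as the starting point: keep the
    largest-magnitude weights."""
    n_keep = int(round(density * w.size))
    mask = np.zeros(w.size, dtype=int)
    mask[np.argsort(-np.abs(w))[:n_keep]] = 1
    return mask


def block_coordinate_prune(w, H, density, block_size=6, n_sweeps=3, seed=0):
    """Hypothetical iCBS-style loop: repeatedly pick a block of weights and
    re-solve which of them to keep, holding the rest of the mask fixed."""
    rng = np.random.default_rng(seed)
    mask = magnitude_mask(w, density)
    n = w.size
    for _ in range(n_sweeps):
        perm = rng.permutation(n)  # random partition of weights into blocks
        for start in range(0, n, block_size):
            block = perm[start:start + block_size]
            k = int(mask[block].sum())  # kept weights in this block, held constant
            best_mask, best_loss = mask.copy(), pruning_loss(w, H, mask)
            # Exhaustive search over all C(len(block), k) sub-masks; only
            # feasible for small blocks.
            for keep in itertools.combinations(block, k):
                trial = mask.copy()
                trial[block] = 0
                trial[np.array(keep, dtype=int)] = 1
                loss = pruning_loss(w, H, trial)
                if loss < best_loss:
                    best_mask, best_loss = trial, loss
            mask = best_mask
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 12
    w = rng.normal(size=n)                 # toy weight vector
    A = rng.normal(size=(n, n))
    H = A @ A.T / n + 0.1 * np.eye(n)      # symmetric positive-definite Hessian proxy
    m0 = magnitude_mask(w, density=0.5)
    m = block_coordinate_prune(w, H, density=0.5)
    print("one-shot magnitude loss:", pruning_loss(w, H, m0))
    print("block-refined loss:     ", pruning_loss(w, H, m))
```

Because each block step searches a set that includes the current mask, the estimated loss in this toy can only stay the same or decrease. The cost of a block step grows combinatorially with `block_size`, so in this sketch larger blocks and more sweeps buy potentially better masks at a higher compute cost, which loosely mirrors the quality-cost tradeoff described above.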
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization
Methods: Attention Is All You Need · Dense Connections · Feedforward Network · Linear Layer · Attention Dropout · Softmax · Multi-Head Attention · Dropout · Data-efficient Image Transformer · Pruning
