OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization
Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia, Ponomareva, Rahul Mazumder

TL;DR
This paper introduces OSSCAR, a scalable combinatorial optimization framework for one-shot structured pruning of large vision and language models, significantly reducing inference costs without retraining.
Contribution
It presents a novel optimization method for one-shot structured pruning that handles very large models efficiently, outperforming existing methods in speed and accuracy.
Findings
Achieves up to 125x lower perplexity on language models.
Provides 2x inference speedup with improved pruning.
Handles models with tens of billions of parameters.
Abstract
Structured pruning is a promising approach for reducing the inference costs of large vision and language models. By removing carefully chosen structures, e.g., neurons or attention heads, the improvements from this approach can be realized on standard deep learning hardware. In this work, we focus on structured pruning in the one-shot (post-training) setting, which does not require model retraining after pruning. We propose a novel combinatorial optimization framework for this problem, based on a layer-wise reconstruction objective and a careful reformulation that allows for scalable optimization. Moreover, we design a new local combinatorial optimization algorithm, which exploits low-rank updates for efficient local search. Our framework is time and memory-efficient and considerably improves upon state-of-the-art one-shot methods on vision models (e.g., ResNet50, MobileNet) and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsMultimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
