OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder

arXiv:2403.12983·cs.CV·March 21, 2024·2 cites

TL;DR

This paper introduces OSSCAR, a scalable combinatorial optimization framework for one-shot structured pruning of large vision and language models, significantly reducing inference costs without retraining.

Contribution

It presents a novel optimization method for one-shot structured pruning that handles very large models efficiently, outperforming existing methods in speed and accuracy.

Findings

1. Achieves up to 125x lower perplexity on language models than prior state-of-the-art one-shot structured pruning methods.
2. Provides a 2x inference speedup with improved pruning.
3. Handles models with tens of billions of parameters.

Abstract

Structured pruning is a promising approach for reducing the inference costs of large vision and language models. By removing carefully chosen structures, e.g., neurons or attention heads, the improvements from this approach can be realized on standard deep learning hardware. In this work, we focus on structured pruning in the one-shot (post-training) setting, which does not require model retraining after pruning. We propose a novel combinatorial optimization framework for this problem, based on a layer-wise reconstruction objective and a careful reformulation that allows for scalable optimization. Moreover, we design a new local combinatorial optimization algorithm, which exploits low-rank updates for efficient local search. Our framework is time and memory-efficient and considerably improves upon state-of-the-art one-shot methods on vision models (e.g., ResNet50, MobileNet) and…
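
The abstract describes the method only at a high level. The following is a minimal NumPy sketch of what a layer-wise reconstruction objective and a local combinatorial search over kept structures can look like. It rests on several assumptions. It prunes input neurons, meaning rows of a layer's weight matrix W, given calibration inputs X. It uses a ridge-regularized loss, ||XW - XŴ||_F^2 + λ||W - Ŵ||_F^2, with Ŵ restricted to at most k nonzero rows; this loss is assumed here for illustration. It then runs a plain best-improvement swap search. This is not OSSCAR itself. The names recon_error and swap_local_search, the magnitude-based starting point, and the value of λ are all hypothetical. Each candidate swap also re-solves a small linear system from scratch, whereas the paper uses low-rank updates.

```python
import itertools

import numpy as np


def recon_error(X, W, support, lam):
    """Best reconstruction error when only rows in `support` of W are kept.

    Assumed objective (ridge-regularized layer-wise reconstruction):
        ||X @ W - X @ W_hat||_F^2 + lam * ||W - W_hat||_F^2,
    with every row of W_hat outside `support` fixed at zero.
    Returns (error, W_hat), where the kept rows are refit in closed form.
    """
    S = np.array(sorted(support), dtype=int)
    H = X.T @ X  # Gram matrix of the layer's calibration inputs
    W_hat = np.zeros_like(W, dtype=float)
    if S.size > 0:
        # Normal equations for the kept rows B = W_hat[S]:
        #   (H[S, S] + lam * I) B = H[S, :] @ W + lam * W[S]
        A = H[np.ix_(S, S)] + lam * np.eye(S.size)
        rhs = H[S, :] @ W + lam * W[S, :]
        W_hat[S, :] = np.linalg.solve(A, rhs)
    D = W - W_hat
    err = np.sum((X @ D) ** 2) + lam * np.sum(D ** 2)
    return float(err), W_hat


def swap_local_search(X, W, k, lam=1e-2, max_iters=50):
    """Keep k input rows of W using a best-improvement 1-swap local search.

    Hypothetical helper: each candidate swap is scored by re-solving the
    refit from scratch (no low-rank updates).
    """
    p = W.shape[0]
    # Start from the k rows with the largest ||X[:, j]|| * ||W[j, :]||.
    scores = np.linalg.norm(X, axis=0) * np.linalg.norm(W, axis=1)
    support = set(np.argsort(-scores)[:k].tolist())
    best_err, _ = recon_error(X, W, support, lam)
    for _ in range(max_iters):
        best_swap = None
        # Try removing kept row i and adding pruned row j; remember the best.
        for i, j in itertools.product(sorted(support), range(p)):
            if j in support:
                continue
            cand = (support - {i}) | {j}
            err, _ = recon_error(X, W, cand, lam)
            if err < best_err - 1e-12:
                best_err, best_swap = err, cand
        if best_swap is None:  # no improving swap: a local optimum
            break
        support = best_swap
    err, W_hat = recon_error(X, W, support, lam)
    return sorted(support), W_hat, err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((64, 8))  # 64 calibration samples, 8 input neurons
    W = rng.standard_normal((8, 5))   # weights: 8 inputs -> 5 outputs
    kept, W_hat, err = swap_local_search(X, W, k=4)
    print("kept input neurons:", kept, "reconstruction error:", round(err, 3))
```

In this sketch, refitting the kept rows by least squares is what separates reconstruction-based pruning from simply zeroing rows by magnitude. The cost of re-solving a k-by-k system for every (kept, pruned) pair is the kind of overhead that low-rank updates, as mentioned in the abstract, are presumably meant to reduce.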

Videos

OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques