Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

Ajay Jaiswal; Shiwei Liu; Tianlong Chen; Ying Ding; Zhangyang Wang

arXiv:2306.10460 · cs.LG · June 21, 2023 · 1 citation

TL;DR

This paper introduces Instant Soup Pruning, a fast and efficient method to generate high-quality subnetworks from large pre-trained models by aggregating multiple weakly trained masks, reducing pruning costs significantly.

Contribution

It proposes a novel pruning approach that replaces iterative magnitude pruning with a quick mask aggregation technique inspired by model soups, enabling efficient lottery ticket subnetwork extraction.
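The mask-aggregation idea can be illustrated with a toy sketch. This is not the authors' implementation: the aggregation rule (average the binary masks, then keep the highest-scoring positions), the noise-based stand-in for weak training, and every function name here are assumptions made for illustration only.

```python
import random


def magnitude_mask(weights, sparsity):
    """Binary mask that prunes the smallest-magnitude weights (1 = keep)."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def aggregate_masks(masks, sparsity):
    """Assumed aggregation rule: average the masks, keep the top-scoring positions."""
    n = len(masks[0])
    scores = [sum(m[i] for m in masks) / len(masks) for i in range(n)]
    n_keep = n - int(n * sparsity)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(n)]


def noisy_copy(weights, rng, scale=0.05):
    """Hypothetical stand-in for a short, cheap training run: small random perturbation."""
    return [w + rng.gauss(0.0, scale) for w in weights]


rng = random.Random(0)
pretrained = [rng.gauss(0.0, 1.0) for _ in range(20)]

# Several cheap masks, each from a slightly different "weakly trained" copy.
masks = [magnitude_mask(noisy_copy(pretrained, rng), 0.5) for _ in range(4)]

# One aggregated mask applied to the pre-trained weights.
final_mask = aggregate_masks(masks, 0.5)
sparse_weights = [w * m for w, m in zip(pretrained, final_mask)]
```

In this sketch, a single aggregation step takes the place of the repeated train-then-prune rounds that IMP would need to reach the same sparsity.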

Findings

01. ISP achieves comparable performance to traditional IMP-based pruning.
02. It significantly reduces computational costs in large model pruning.
03. Validated on CLIP and BERT across multiple datasets.

Abstract

Large pre-trained transformers have received explosive attention in the past few years because fine-tuning adapts them to numerous downstream applications, but their exponentially increasing parameter counts are becoming a primary hurdle to even fine-tuning them without industry-standard hardware. Recently, the Lottery Ticket Hypothesis (LTH) and its variants have been exploited to prune these large pre-trained models, generating subnetworks that can achieve performance similar to that of their dense counterparts. The practicality of LTH, however, is severely limited by the repeated full training and pruning routine of iterative magnitude pruning (IMP), whose cost grows with model size. Motivated by recent observations on model soups, which suggest that the fine-tuned weights of multiple models can be merged into a better minimum, we propose Instant Soup Pruning (ISP) to generate…

Code & Models

Repositories

vita-group/instant_soup (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Advanced Neural Network Applications · Video Analysis and Summarization

Methods: Attention Is All You Need · Pruning · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Weight Decay · Softmax · Residual Connection