TL;DR
This paper introduces Variance-Based Pruning, a one-shot structured pruning method that efficiently compresses trained networks with minimal fine-tuning, maintaining high accuracy and significantly reducing computational costs.
Contribution
It proposes a novel pruning technique that uses activation statistics to select neurons, preserving performance with little retraining, and demonstrates its effectiveness on ImageNet-1k.
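The contribution says neurons are selected using activation statistics. The sketch below shows one plausible reading on a toy two-layer MLP in pure Python. It rests on assumptions that this summary does not confirm: the statistic is per-neuron activation variance over a small calibration set, the lowest-variance hidden neurons are removed in one shot, and each removed neuron's mean activation is folded into the next layer's bias. The helper names (`variance_prune`, `hidden_acts`, `mlp_forward`) are hypothetical and are not the authors' code.

```python
import random
import statistics


def relu(v):
    """ReLU of a scalar."""
    return max(0.0, v)


def hidden_acts(x, w1, b1):
    """Hidden activations h = ReLU(W1 x + b1) for one input vector x."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]


def mlp_forward(x, w1, b1, w2, b2):
    """Two-layer MLP: y = W2 h + b2."""
    h = hidden_acts(x, w1, b1)
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]


def variance_prune(w1, b1, w2, b2, calib, keep):
    """One-shot structured pruning of hidden neurons (hypothetical helper).

    1. Gather activation statistics (mean, variance) over calibration inputs.
    2. Keep the `keep` neurons with the highest activation variance.
    3. Treat each pruned neuron as constant at its mean and fold
       W2[:, j] * mean_j into the output bias (assumed compensation step).
    """
    acts = [hidden_acts(x, w1, b1) for x in calib]
    n_hidden = len(b1)
    columns = [[a[j] for a in acts] for j in range(n_hidden)]
    means = [statistics.fmean(col) for col in columns]
    variances = [statistics.pvariance(col) for col in columns]

    ranked = sorted(range(n_hidden), key=lambda j: variances[j], reverse=True)
    kept = sorted(ranked[:keep])
    pruned = ranked[keep:]

    # Slice rows of the first layer and the matching columns of the second layer.
    new_w1 = [list(w1[j]) for j in kept]
    new_b1 = [b1[j] for j in kept]
    new_w2 = [[row[j] for j in kept] for row in w2]
    # Mean compensation: add each pruned neuron's average contribution to b2.
    new_b2 = [b + sum(row[j] * means[j] for j in pruned) for row, b in zip(w2, b2)]
    return new_w1, new_b1, new_w2, new_b2, kept


if __name__ == "__main__":
    random.seed(0)
    d_in, d_hidden, d_out = 4, 6, 3
    w1 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [random.gauss(0, 0.1) for _ in range(d_hidden)]
    w1[2], b1[2] = [0.0] * d_in, 0.5  # neuron 2 always outputs 0.5 (zero variance)
    w2 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
    b2 = [0.0] * d_out
    calib = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(64)]

    pw1, pb1, pw2, pb2, kept = variance_prune(w1, b1, w2, b2, calib, keep=5)
    x = [0.3, -1.2, 0.7, 2.0]
    full = mlp_forward(x, w1, b1, w2, b2)
    small = mlp_forward(x, pw1, pb1, pw2, pb2)
    print("kept neurons:", kept)
    print("max |diff|:", max(abs(a - b) for a, b in zip(full, small)))
```

Because neuron 2 is constant on every input, removing it and folding its mean into `b2` leaves the outputs unchanged up to floating-point rounding. For neurons with small but nonzero variance, the mean substitution is only an approximation. A short fine-tuning phase would be expected to close that remaining gap.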
Findings
Retains over 70% of DeiT-Base performance immediately after pruning.
Requires only 10 epochs of fine-tuning to recover 99% of original accuracy.
Reduces MACs by 35% and model size by 36%, speeding up inference by 1.44x.
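The MAC reduction and the speedup can be related with a quick back-of-the-envelope check. For illustration only, assume that latency is proportional to MACs. A 35% MAC reduction then implies an idealized upper figure of:

```latex
\text{speedup}_{\text{ideal}} = \frac{1}{1 - 0.35} = \frac{1}{0.65} \approx 1.54\times
```

The reported 1.44× falls below this idealized value. That is plausible, because real inference time also includes memory traffic and operations that pruning does not shrink proportionally.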
Abstract
Increasingly expensive training of ever larger models such as Vision Transformers motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational costs, and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these factors, they often require costly retraining, sometimes for up to hundreds of epochs, or even training from scratch to recover the accuracy lost through the structural modifications. Maintaining the performance of trained models after structured pruning, and thereby avoiding extensive retraining, remains a challenge. To solve this, we introduce Variance-Based Pruning, a simple and structured one-shot pruning technique for efficiently compressing networks with minimal fine-tuning. Our approach first gathers…
