How much pre-training is enough to discover a good subnetwork?

Cameron R. Wolfe; Fangshuo Liao; Qihan Wang; Junhyung Lyle Kim; Anastasios Kyrillidis

arXiv:2108.00259 · stat.ML · August 24, 2023 · 1 cites

Open Access

TL;DR

This paper provides a theoretical analysis of the amount of pre-training needed for neural network pruning to produce high-performing subnetworks, supported by empirical validation on MNIST.

Contribution

It introduces a theoretical bound on the number of pre-training iterations needed for pruning to be effective, and links the required pre-training duration to dataset size.

Findings

01. A logarithmic relationship between dataset size and pre-training threshold.

02. A theoretical bound on pre-training iterations for pruning effectiveness.

03. Empirical validation on MNIST confirms theoretical predictions.

Abstract

Neural network pruning is useful for discovering efficient, high-performing subnetworks within pre-trained, dense network architectures. More often than not, it involves a three-step process -- pre-training, pruning, and re-training -- that is computationally expensive, as the dense model must be fully pre-trained. While previous work has revealed through experiments the relationship between the amount of pre-training and the performance of the pruned network, a theoretical characterization of such dependency is still missing. Aiming to mathematically analyze the amount of dense network pre-training needed for a pruned network to perform well, we discover a simple theoretical bound in the number of gradient descent pre-training iterations on a two-layer, fully-connected network, beyond which pruning via greedy forward selection [61] yields a subnetwork that achieves good training error.…
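The pruning step named in the abstract, greedy forward selection on a two-layer network, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a squared-error training loss, a tanh activation, and a subnetwork whose output is the average of its selected neurons. Random weights stand in for a pre-trained dense model, and the names `neuron_outputs` and `greedy_forward_selection` are hypothetical.

```python
import numpy as np

def neuron_outputs(X, W, a):
    """Per-neuron outputs of a two-layer network.

    Column i holds a[i] * tanh(X @ W[:, i]); tanh is an assumed activation.
    """
    return np.tanh(X @ W) * a  # shape (n_samples, n_neurons)

def greedy_forward_selection(H, y, k):
    """Select k neurons (repeats allowed) one at a time.

    At step t, add the neuron whose inclusion minimizes the squared error
    of the averaged subnetwork output on the training set.
    """
    selected = []
    running_sum = np.zeros_like(y, dtype=float)
    for t in range(1, k + 1):
        # Prediction of the size-t subnetwork if neuron j were added.
        candidates = (running_sum[:, None] + H) / t
        losses = 0.5 * np.mean((candidates - y[:, None]) ** 2, axis=0)
        best = int(np.argmin(losses))
        selected.append(best)
        running_sum += H[:, best]
    return selected, running_sum / k

# Toy usage: random weights stand in for a pre-trained dense network.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = np.sin(X[:, 0])
W = rng.normal(size=(5, 200))
a = rng.normal(size=200)

H = neuron_outputs(X, W, a)
selected, pred = greedy_forward_selection(H, y, k=20)
print("selected neurons:", selected)
print("subnetwork training loss:", 0.5 * np.mean((pred - y) ** 2))
```

In this sketch, pre-training would only change the matrix `H` of neuron outputs that the selection loop draws from. The paper's question is how many gradient descent iterations the dense network needs before this kind of selection reaches good training error.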

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning

Methods: Pruning · Stochastic Gradient Descent