Most Activation Functions Can Win the Lottery Without Excessive Depth
Rebekka Burkholz

TL;DR
This paper shows that lottery tickets exist at practical depths with only logarithmic overparameterization, and that this holds for a broad class of activation functions, not just ReLUs.
Contribution
It introduces a new construction showing that a randomly initialized network of depth L+1 suffices to approximate a target network of depth L by pruning, extending lottery ticket theory to a broad class of activation functions.
Findings
A randomly initialized network of depth L+1 suffices to approximate a target network of depth L.
Lottery tickets exist at realistic depths for a large class of activation functions, not only ReLUs.
Logarithmic overparameterization is enough for a pruned subnetwork to approximate the target.
Abstract
The strong lottery ticket hypothesis has highlighted the potential for training deep neural networks by pruning, which has inspired interesting practical and theoretical insights into how neural networks can represent functions. For networks with ReLU activation functions, it has been proven that a target network with depth L can be approximated by the subnetwork of a randomly initialized neural network that has double the target's depth (2L) and is wider by a logarithmic factor. We show that a depth L+1 network is sufficient. This result indicates that we can expect to find lottery tickets at realistic, commonly used depths while only requiring logarithmic overparametrization. Our novel construction approach applies to a large class of activation functions and is not limited to ReLUs.
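To make the setting in the abstract concrete, here is a minimal sketch of a "subnetwork of a randomly initialized neural network": the random weights stay fixed, and a binary mask decides which of them survive pruning. The sketch does not reproduce the paper's construction or its guarantees. The name `masked_forward`, the layer widths, the choice of tanh as activation, and the random masks are all hypothetical choices for illustration. A real lottery ticket would use masks chosen so that the pruned network approximates a given target.

```python
import numpy as np

def masked_forward(x, weights, masks, activation=np.tanh):
    """Forward pass through a random network pruned by binary masks.

    weights: list of fixed random weight matrices (never trained).
    masks:   list of 0/1 arrays with the same shapes as the weights.
    """
    h = x
    for i, (W, M) in enumerate(zip(weights, masks)):
        h = (W * M) @ h               # weights with mask 0 are pruned away
        if i < len(weights) - 1:      # no activation on the output layer
            h = activation(h)
    return h

rng = np.random.default_rng(0)

target_depth = 2                      # L (hypothetical value)
source_depth = target_depth + 1       # L + 1 layers in the random network
widths = [3, 8, 8, 2]                 # input, two hidden layers, output (illustrative)

# Randomly initialized source network; biases are omitted for brevity.
weights = [
    rng.uniform(-1.0, 1.0, size=(widths[i + 1], widths[i]))
    for i in range(source_depth)
]

# Random masks as placeholders for masks found by a pruning procedure.
masks = [rng.integers(0, 2, size=W.shape) for W in weights]

x = rng.uniform(-1.0, 1.0, size=widths[0])
y = masked_forward(x, weights, masks)
```

Passing a different callable as `activation` (for example a sigmoid or a leaky ReLU) leaves the masking logic unchanged, which fits the abstract's point that the result is not tied to ReLUs.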
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Machine Learning and Algorithms
