How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs
Emily Dent, Jared Tanner

TL;DR
This paper demonstrates that controlling the variance in Gaussian process initializations enhances training stability and expressivity in sparsely activated deep neural networks, enabling high sparsity levels and potential energy savings.
Contribution
It identifies the chosen variance of the Gaussian process at initialization as an under-utilised design choice for sparsely activated networks, and shows that controlling it improves training stability and expressivity.
Findings
Larger fixed Gaussian process variances improve expressivity.
Training stability is enhanced with variance control.
Networks with activation sparsity as high as 90% in the hidden layers still reach full, or near full, accuracy.
Abstract
The intermediate layers of deep networks can be characterised as a Gaussian process; in particular, the Edge-of-Chaos (EoC) initialisation strategy prescribes the limiting covariance matrix of the Gaussian process. Here we show that the under-utilised chosen variance of the Gaussian process is important in the training of deep networks with sparsity-inducing activation, such as a shifted and clipped ReLU. Specifically, initialisations leading to larger fixed Gaussian process variances allow for improved expressivity with activation sparsity as large as 90% in DNNs and CNNs, and generally improve the stability of the training process. Enabling full, or near full, accuracy at such high levels of sparsity in the hidden layers suggests a promising mechanism to reduce the energy consumption of machine learning models involving fully…
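To make the sparsity-inducing activation in the abstract concrete, here is a minimal sketch. It assumes the shifted and clipped ReLU has the form min(max(x - tau, 0), m); that form, and the names `tau`, `m`, `shifted_clipped_relu`, `activation_sparsity`, and `shift_for_sparsity`, are illustrative assumptions and are not taken from the paper. The sketch estimates the fraction of exactly-zero activations when pre-activations are drawn from a zero-mean Gaussian with variance `q`, and computes the shift that gives a target sparsity for that variance.

```python
import numpy as np
from statistics import NormalDist


def shifted_clipped_relu(x, tau, m):
    """Assumed form: shift the input by tau, rectify at 0, clip at m."""
    return np.minimum(np.maximum(x - tau, 0.0), m)


def activation_sparsity(q, tau, m, n=200_000, seed=0):
    """Monte Carlo estimate of the fraction of zero outputs
    when pre-activations are z ~ N(0, q)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, np.sqrt(q), size=n)
    return float(np.mean(shifted_clipped_relu(z, tau, m) == 0.0))


def shift_for_sparsity(q, s):
    """Shift tau such that P(z <= tau) = s for z ~ N(0, q),
    i.e. tau = sqrt(q) * Phi^{-1}(s)."""
    return np.sqrt(q) * NormalDist().inv_cdf(s)


if __name__ == "__main__":
    for q in (1.0, 4.0):
        tau = shift_for_sparsity(q, 0.9)
        print(q, tau, activation_sparsity(q, tau, m=1.0))
```

Under these assumptions the shift required for a fixed sparsity level grows in proportion to the standard deviation sqrt(q) of the pre-activations, which is one simple way to see that the Gaussian process variance and the activation's parameters have to be chosen together.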
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning in Materials Science · Neural Networks and Reservoir Computing
