Concurrent Training and Layer Pruning of Deep Neural Networks
Valentin Frank Ingmar Guenter, Athanasios Sideris

TL;DR
This paper introduces a novel algorithm for concurrent training and layer pruning in deep neural networks, leveraging variational inference and residual connections to reduce computational costs while maintaining performance.
Contribution
The paper presents a new layer pruning method based on variational inference with Gaussian scale mixture priors on the network weights, enabling simultaneous training and pruning with theoretical guarantees.
Findings
Achieves state-of-the-art layer pruning performance.
Reduces training and inference costs significantly.
Demonstrates effectiveness on MNIST, CIFAR-10, and ImageNet.
Abstract
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training. In contrast to weight- or filter-level pruning, layer pruning reduces the harder-to-parallelize sequential computation of a neural network. We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned. Our approach is based on variational inference principles using Gaussian scale mixture priors on the neural network weights and allows for substantial cost savings during both training and inference. More specifically, the variational posterior distribution of scalar Bernoulli random variables multiplying a layer weight matrix of the network's nonlinear sections is learned, similarly to adaptive layer-wise dropout. To overcome challenges of concurrent…
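The gating structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the ReLU nonlinearity, the use of the expected gate value at inference, and the pruning threshold are all assumptions made for illustration. The sketch shows a residual block whose weight matrix is multiplied by a scalar Bernoulli variable with learned parameter theta, so that the block reduces to the identity map once the section is pruned.

```python
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_residual_block(x, W, theta, rng=None, training=True):
    """Compute y = x + relu((xi * W) x) with xi ~ Bernoulli(theta).

    theta is the (hypothetical) learned posterior parameter of the gate.
    If theta is exactly 0 the nonlinear section is treated as pruned and
    only the residual connection is evaluated, skipping the matrix product.
    """
    if theta == 0.0:
        return list(x)  # pruned: information flows through the skip path
    if training:
        rng = rng or random.Random(0)
        xi = 1.0 if rng.random() < theta else 0.0  # sample the gate
    else:
        xi = theta  # assumption: use the expected gate value at inference
    scaled_W = [[xi * w for w in row] for row in W]
    h = relu(matvec(scaled_W, x))
    return [a + b for a, b in zip(x, h)]


def layers_to_keep(thetas, threshold=0.05):
    """Return indices of layers whose gate parameter exceeds a threshold.

    The threshold is an illustrative choice, not a value from the paper.
    """
    return [i for i, t in enumerate(thetas) if t > threshold]
```

Because a pruned block costs only a copy of its input, removing it shortens the sequential depth of the network, which is the saving the abstract contrasts with weight- or filter-level pruning.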
Taxonomy
Topics: Neural Networks and Applications
Methods: Convolution · Kaiming Initialization · Average Pooling · Global Average Pooling · Max Pooling · Stochastic Gradient Descent · Pruning · Variational Inference
