Deep Control - a simple automatic gain control for memory efficient and high performance training of deep convolutional neural networks
Brendan Ruff

TL;DR
This paper introduces a simple automatic gain control method for convolutional neurons that improves training efficiency and performance, matches or surpasses batch normalization, and works with both single-sample and minibatch gradient descent.
Contribution
It proposes a novel automatic gain control technique integrated into each convolutional neuron, improving training stability and efficiency without requiring separate normalization layers.
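This summary does not give the paper's actual formulation, so the following is only a generic sketch of what a gain control built into a convolutional neuron could look like. It is not the author's method. The helper `agc_channel`, the `target_rms` parameter, and the choice of RMS as the signal measure are all assumptions made for illustration. The gain is computed from the spatial positions of one sample's feature map, which is one way such a scheme could avoid depending on minibatch statistics.

```python
import math

def agc_channel(feature_map, target_rms=1.0, eps=1e-8):
    """Rescale one channel's feature map so its RMS is close to target_rms.

    Hypothetical illustration only, not the paper's formulation: the gain
    is derived from a single sample's spatial positions, so no minibatch
    statistics are required.
    """
    values = [v for row in feature_map for v in row]
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    gain = target_rms / (rms + eps)  # eps guards against an all-zero map
    return [[v * gain for v in row] for row in feature_map], gain

# A weak 2x2 feature map, as might follow a small random initialisation.
weak = [[0.01, -0.02], [0.03, 0.00]]
scaled, gain = agc_channel(weak)
scaled_rms = math.sqrt(sum(v * v for row in scaled for v in row) / 4)
```

Here `gain` is greater than 1 because the input signal is weak, and `scaled_rms` ends up close to the target. A practical version would also need to decide how the gain interacts with back-propagation, which this sketch leaves out.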
Findings
Achieves comparable or better performance than batch normalization.
Compatible with single-sample and minibatch training.
Reduces covariate shift and improves training speed.
Abstract
Training a deep convolutional neural net typically starts with a random initialisation of all filters in all layers, which severely reduces the forward signal and the back-propagated error and leads to slow and sub-optimal training. Techniques that counter this focus on adaptively increasing either the signal or the gradients, but the model behaves very differently at the beginning of training than it does later, when stable pathways through the net have been established. To compound the problem, the effective minibatch size varies greatly between layers at different depths and between individual filters, as activation sparsity typically increases with depth. This reduces the effective learning rate, since gradients may superpose rather than add, and it further compounds the covariate shift problem, as deeper neurons are less able to adapt to upstream shift. Proposed here is…
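The abstract's first claim, that random initialisation reduces the forward signal, can be illustrated with a small experiment. The sketch below is not taken from the paper: it uses fully connected ReLU layers rather than convolutions for brevity, and the width, depth, and weight standard deviation are arbitrary values chosen so the effect is easy to see.

```python
import math
import random

def rms(xs):
    """Root-mean-square magnitude of a list of activations."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def random_relu_layer(xs, width, weight_std, rng):
    """One dense layer with freshly drawn Gaussian weights and a ReLU."""
    out = []
    for _ in range(width):
        pre = sum(rng.gauss(0.0, weight_std) * x for x in xs)
        out.append(max(0.0, pre))
    return out

rng = random.Random(0)  # fixed seed so the run is repeatable
width, depth, weight_std = 64, 5, 0.05  # illustrative values only

signal = [rng.gauss(0.0, 1.0) for _ in range(width)]
rms_per_layer = [rms(signal)]
for _ in range(depth):
    signal = random_relu_layer(signal, width, weight_std, rng)
    rms_per_layer.append(rms(signal))

for layer, value in enumerate(rms_per_layer):
    print(f"layer {layer}: rms = {value:.6f}")
```

With these settings each layer is expected to scale the signal variance by roughly `width * weight_std**2 / 2`, which is 0.08 here, so the RMS shrinks quickly with depth. The same shrinkage applies to the back-propagated error, which is the slow-training regime the abstract describes.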
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
