Weight and Gradient Centralization in Deep Neural Networks
Wolfgang Fuhl, Enkelejda Kasneci

TL;DR
This paper explores weight and gradient normalization techniques in deep neural networks, demonstrating that combining these methods enhances generalization during training without impacting inference speed.
Contribution
It introduces a combined approach of weight and gradient normalization that improves network generalization and, unlike batch normalization, is applied only during training.
Findings
Enhanced generalization performance in neural networks
Methods do not affect inference time
Combined normalization techniques generalize better than batch normalization
Abstract
Batch normalization is currently the most widely used variant of internal normalization for deep neural networks. Further work has shown that normalizing the weights, applying additional conditioning, and normalizing the gradients further improve generalization. In this work, we combine several of these methods and thereby increase the generalization of the networks. The advantage of the newer methods over batch normalization is not only increased generalization, but also that they only have to be applied during training and therefore do not affect running time at inference. CUDA code: https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/
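The abstract does not spell out the operations themselves. As a rough illustration only, the sketch below assumes that "centralization" means subtracting the mean over each output unit's incoming weights (or their gradients), which is one common formulation and not necessarily the authors' exact method. The function names, the plain SGD update, and the toy values are all hypothetical. It is meant to show why such a step adds no inference cost: the centering happens inside the training step, and the forward pass stays an ordinary matrix-vector product.

```python
# Hypothetical sketch: mean-centering of gradients and weights, applied during
# training only. Names and the update rule are assumptions for illustration,
# not taken from the paper or its CUDA code.

def centralize_rows(matrix):
    """Subtract each row's mean so every row sums to (approximately) zero."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def training_step(weights, grads, lr=0.1):
    """One plain SGD step on centralized gradients, then re-center the weights."""
    grads = centralize_rows(grads)
    updated = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    return centralize_rows(updated)


def forward(weights, x):
    """Inference: an ordinary matrix-vector product with no normalization op."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Toy layer with two output units and three inputs (rows = output units).
weights = [[0.5, -0.2, 0.3], [1.0, 0.0, -1.0]]
grads = [[0.1, 0.4, 0.1], [0.2, 0.2, 0.2]]

weights = training_step(weights, grads)
row_sums = [sum(row) for row in weights]
outputs = forward(weights, [1.0, 2.0, 3.0])
```

After the step, each row of `weights` sums to zero up to floating-point error. The second row is unchanged, because its gradient is constant across inputs and centering removes it entirely.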
Taxonomy
Methods: Batch Normalization
