Weight and Gradient Centralization in Deep Neural Networks

Wolfgang Fuhl; Enkelejda Kasneci

arXiv:2010.00866 · cs.CV · January 19, 2021

TL;DR

This paper explores weight and gradient normalization techniques in deep neural networks, demonstrating that combining these methods enhances generalization during training without impacting inference speed.

Contribution

The paper combines several weight and gradient normalization methods. The combination improves network generalization and, unlike batch normalization, is applied only during training.

Findings

01 Enhanced generalization performance in neural networks

02 Methods do not affect inference time

03 Combined normalization techniques outperform traditional batch normalization

Abstract

Batch normalization is currently the most widely used variant of internal normalization for deep neural networks. Additional work has shown that the normalization of weights, additional conditioning, and the normalization of gradients further improve generalization. In this work, we combine several of these methods and thereby increase the generalization of the networks. The advantage of the newer methods over batch normalization is not only increased generalization, but also that they only have to be applied during training and therefore do not influence the running time during use. CUDA code: https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/
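To make the training-only property concrete, here is a minimal sketch in plain Python. It assumes that "centralization" means subtracting the mean from each output unit's weight vector (and from the matching gradient vector), and that it is applied around a plain SGD update. The helper names `centralize_rows` and `sgd_step`, the learning rate, and the toy values are hypothetical and are not taken from the paper or its CUDA code, whose exact formulation may differ.

```python
def centralize_rows(matrix):
    """Return a copy of `matrix` with each row shifted to zero mean.

    Assumption: one row holds the weights (or gradients) of one output unit.
    """
    out = []
    for row in matrix:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out


def sgd_step(weights, grads, lr=0.1):
    """One plain SGD step with gradient and weight centralization.

    Both centralization calls happen only here, in the training update.
    """
    # Gradient centralization: remove the per-row mean from the gradient.
    grads = centralize_rows(grads)
    # Ordinary SGD update.
    updated = [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]
    # Weight centralization: re-center each row of the updated weights.
    return centralize_rows(updated)


# Toy 2x3 weight matrix and gradient (illustrative values only).
weights = [[0.5, -0.2, 0.3], [1.0, 2.0, 3.0]]
grads = [[0.1, 0.4, 0.1], [-0.3, 0.0, 0.3]]
new_weights = sgd_step(weights, grads)
```

In this sketch the extra work sits entirely inside the update step. The forward pass uses the stored weights unchanged, which is consistent with the abstract's statement that these methods do not influence running time during use.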

Taxonomy

Methods: Batch Normalization