Gradient Centralization: A New Optimization Technique for Deep Neural Networks

Hongwei Yong; Jianqiang Huang; Xiansheng Hua; Lei Zhang

arXiv:2004.01461 · cs.CV · April 9, 2020 · 32 citations

TL;DR

Gradient Centralization (GC) is an optimization technique that operates directly on gradients, centralizing each gradient vector to have zero mean. The authors report improved training stability, regularization, and generalization in deep neural networks across a range of tasks.

Contribution

This paper introduces Gradient Centralization, a simple yet effective gradient operation that enhances DNN training by improving regularization and stability, and can be integrated into existing optimizers.

Findings

01 GC improves training stability and convergence.

02 GC enhances generalization performance across tasks.

03 GC is easy to implement and compatible with existing optimizers.
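
As a rough illustration of the compatibility claim in finding 03, the sketch below inserts a centralization step into a plain SGD-with-momentum update written in NumPy. The function names (`centralize`, `sgd_momentum_step`), the hyperparameter values, the random stand-in gradients, and the convention that axis 0 indexes output units are all assumptions made for this example, not details taken from this page.

```python
import numpy as np

def centralize(grad):
    """Subtract each output unit's mean gradient (assumed layout: axis 0 = output units)."""
    # Assumption: 1-D tensors such as biases are left unchanged.
    if grad.ndim < 2:
        return grad
    axes = tuple(range(1, grad.ndim))
    return grad - grad.mean(axis=axes, keepdims=True)

def sgd_momentum_step(weight, grad, velocity, lr=0.1, momentum=0.9, use_gc=True):
    """One SGD-with-momentum update; the only GC-specific line is the call to centralize."""
    if use_gc:
        grad = centralize(grad)
    velocity = momentum * velocity + grad
    weight = weight - lr * velocity
    return weight, velocity

rng = np.random.default_rng(0)
w0 = rng.normal(size=(3, 5))            # hypothetical layer: 3 output units, 5 inputs
w, v = w0.copy(), np.zeros_like(w0)
for _ in range(10):
    fake_grad = rng.normal(size=w.shape)  # stand-in for a real backprop gradient
    w, v = sgd_momentum_step(w, fake_grad, v)

# Largest change in any row sum of the weights after ten updates.
row_sum_drift = np.abs(w.sum(axis=1) - w0.sum(axis=1)).max()
print(row_sum_drift)
```

Because every centralized gradient sums to zero along each row, the row sums of the weights stay at their initial values up to floating-point error, while the weights themselves still move. The optimizer logic is otherwise untouched, which is the sense in which the operation can be dropped into an existing update rule.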

Abstract

Optimization techniques are of great importance to effectively and efficiently train a deep neural network (DNN). It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance. Different from these existing methods that mostly operate on activations or weights, we present a new optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC can regularize both the weight space and output feature space so that it can boost the generalization performance of DNNs. Moreover, GC improves the…
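
The centralization step and its projection reading can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the name `centralize_gradient`, the tensor shape, and the choice to average over every axis except the first (treating axis 0 as the output-unit axis) are illustrative and not specified in the abstract. The matrix `P` below is the standard orthogonal projector onto zero-mean vectors, used here only to show that subtracting the mean is a linear projection.

```python
import numpy as np

def centralize_gradient(grad):
    """Return grad with the mean of each output unit's gradient vector removed."""
    # Assumption: axis 0 indexes output units; 1-D tensors (e.g. biases) are skipped.
    if grad.ndim < 2:
        return grad
    axes = tuple(range(1, grad.ndim))
    return grad - grad.mean(axis=axes, keepdims=True)

rng = np.random.default_rng(1)
grad = rng.normal(size=(4, 3, 2, 2))    # hypothetical conv-layer gradient
gc = centralize_gradient(grad)

m = grad[0].size                         # length of each flattened gradient vector
P = np.eye(m) - np.ones((m, m)) / m      # projector onto vectors with zero mean
flat = grad.reshape(4, m)
projected = flat @ P                     # P is symmetric, so this applies P to each row

# Largest absolute mean over the centralized gradient vectors.
max_mean = np.abs(gc.reshape(4, m).mean(axis=1)).max()
print(max_mean)
```

Here `projected` matches the centralized gradient row for row, and `P @ P` equals `P`, so applying the operation twice changes nothing. That idempotence is what makes the "projected gradient descent" description in the abstract natural: each update direction is confined to the hyperplane of zero-mean vectors.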

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Weight Standardization · Batch Normalization