Gradient Centralization: A New Optimization Technique for Deep Neural Networks
Hongwei Yong, Jianqiang Huang, Xiansheng Hua, Lei Zhang

TL;DR
Gradient Centralization (GC) is an optimization technique that operates directly on gradients, centralizing each gradient vector to have zero mean. The authors report improved training stability, regularization, and generalization in deep neural networks across various tasks.
Contribution
This paper introduces Gradient Centralization, a simple yet effective gradient operation that enhances DNN training by improving regularization and stability, and can be integrated into existing optimizers.
Findings
GC improves training stability and convergence.
GC enhances generalization performance across tasks.
GC is easy to implement and compatible with existing optimizers.
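To make the "easy to implement" finding concrete, here is a minimal sketch of how centralization could sit inside a plain SGD step. It is an illustration under stated assumptions, not the authors' implementation: the function names `centralize_gradient` and `sgd_step_with_gc` are hypothetical, the gradient is stored as a list of rows, and each row is assumed to hold the gradient of one weight vector.

```python
def centralize_gradient(grad):
    """Subtract each row's mean from that row, so every row sums to zero.

    Assumption: each row is the gradient of one weight vector.
    """
    centered = []
    for row in grad:
        mean = sum(row) / len(row)
        centered.append([g - mean for g in row])
    return centered


def sgd_step_with_gc(weights, grad, lr=0.1):
    """One plain SGD update that uses the centralized gradient.

    Hypothetical helper: any optimizer that consumes a gradient could
    call centralize_gradient first in the same way.
    """
    centered = centralize_gradient(grad)
    return [
        [w - lr * g for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, centered)
    ]


# Toy values chosen only for illustration.
weights = [[0.5, -0.2, 0.1], [1.0, 0.0, -1.0]]
grad = [[0.3, 0.6, 0.9], [-1.0, 2.0, 5.0]]
new_weights = sgd_step_with_gc(weights, grad, lr=0.1)
```

Because every centralized row sums to zero, the update leaves the sum of each weight vector unchanged. The centralization step is independent of the update rule, which is why it can be placed in front of other optimizers as well.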
Abstract
Optimization techniques are of great importance to effectively and efficiently train a deep neural network (DNN). It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance. Different from these existing methods that mostly operate on activations or weights, we present a new optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC can regularize both the weight space and output feature space so that it can boost the generalization performance of DNNs. Moreover, GC improves the…
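The projected-gradient view mentioned in the abstract can be checked numerically. The sketch below assumes the projection is the standard centering matrix P = I - (1/M) e e^T, with e the all-ones vector of length M; this explicit form and the helper names (`center`, `projection_matrix`, `matvec`) are assumptions made for illustration, since the abstract does not spell them out.

```python
def center(v):
    """Remove the mean from a gradient vector."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def projection_matrix(m):
    """Build P = I - (1/m) * e e^T, where e is the all-ones vector (assumed form)."""
    return [
        [(1.0 if i == j else 0.0) - 1.0 / m for j in range(m)]
        for i in range(m)
    ]


def matvec(mat, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in mat]


# Toy gradient vector, chosen only for illustration.
g = [0.3, -1.2, 2.4, 0.5]
P = projection_matrix(len(g))

projected = matvec(P, g)      # P applied to g
centered = center(g)          # mean subtraction
twice = matvec(P, projected)  # P applied a second time
```

Under this assumption, `projected` matches `centered` up to floating-point error, and `twice` matches `projected`, so P behaves as a projection. Every step taken along a projected gradient keeps the sum of the weight vector fixed, which is one way to read the "constrained loss function" in the abstract.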
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Weight Standardization · Batch Normalization
