Moment Centralization based Gradient Descent Optimizers for Convolutional Neural Networks
Sumanth Sadu, Shiv Ram Dubey, SR Sreeja

TL;DR
This paper introduces a moment centralization technique for SGD-based optimizers used to train CNNs. It explicitly enforces a zero mean on the first-order moment, which improves optimization performance when combined with three adaptive optimizers (Adam, Radam, and Adabelief) on the CIFAR10, CIFAR100, and TinyImageNet benchmarks.
Contribution
The paper proposes a generic moment centralization method that can be integrated with existing adaptive momentum-based optimizers to make them more effective for CNN training.
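One way to write the centralization step, as a sketch in our own notation rather than the authors': reshape a layer's first-order moment $m_t$ into one row per output channel $i$, each with $N$ entries, and subtract each row's mean. The per-output-channel mean is borrowed from gradient centralization and is an assumption here; the paper may average over a different axis.

```latex
\tilde{m}_{t,i,j} = m_{t,i,j} - \frac{1}{N}\sum_{k=1}^{N} m_{t,i,k},
\qquad\text{so that}\qquad
\sum_{j=1}^{N} \tilde{m}_{t,i,j} = 0 \quad \text{for every channel } i.
```

Because the operator only reads $m_t$ and not how it was accumulated, it can be applied to any optimizer that maintains a first-order moment, which is what makes the approach generic.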
Findings
Improved accuracy with Adam, Radam, and Adabelief on the CIFAR10, CIFAR100, and TinyImageNet benchmarks.
Shorter and smoother optimization trajectories.
Outperforms existing gradient centralization methods.
Abstract
Convolutional neural networks (CNNs) have shown very appealing performance in many computer vision applications. CNNs are generally trained with stochastic gradient descent (SGD) based optimization techniques, and adaptive momentum-based SGD optimizers are a recent trend. However, existing optimizers do not maintain a zero mean in the first-order moment and struggle with optimization. In this paper, we propose a moment centralization-based SGD optimizer for CNNs. Specifically, we explicitly impose a zero-mean constraint on the first-order moment. The proposed moment centralization is generic and can be integrated with any existing adaptive momentum-based optimizer. The proposed idea is tested with three state-of-the-art optimization techniques, Adam, Radam, and Adabelief, on the benchmark CIFAR10, CIFAR100, and TinyImageNet…
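The abstract says the zero-mean constraint is imposed explicitly on the first-order moment and can be plugged into adaptive optimizers such as Adam. The sketch below shows one way that could look for Adam: the usual moment updates, then centralization of m before the bias-corrected step. The function names `centralize` and `adam_mc_step`, the list-of-rows parameter layout, the per-output-channel mean, and the choice to store the centralized moment for the next step are all assumptions for illustration, not the authors' code. Hyperparameter defaults are Adam's usual ones.

```python
import math
from typing import List, Tuple

# Hypothetical layout: each parameter/moment is a list of rows, one row per
# output channel, holding that channel's flattened weights.
Matrix = List[List[float]]


def centralize(rows: Matrix) -> Matrix:
    """Subtract each row's mean so every row has zero mean.

    Rows with a single element (e.g. a bias entry) are returned unchanged,
    since centering them would force them to zero.
    """
    out: Matrix = []
    for row in rows:
        if len(row) < 2:
            out.append(list(row))
            continue
        mean = sum(row) / len(row)
        out.append([x - mean for x in row])
    return out


def adam_mc_step(
    param: Matrix,
    grad: Matrix,
    m: Matrix,
    v: Matrix,
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[Matrix, Matrix, Matrix]:
    """One Adam step with moment centralization (illustrative sketch).

    t is the 1-based step counter. Returns the new (param, m, v).
    """
    # Standard Adam exponential moving averages of g and g^2.
    m = [[beta1 * mi + (1 - beta1) * gi for mi, gi in zip(mr, gr)]
         for mr, gr in zip(m, grad)]
    v = [[beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(vr, gr)]
         for vr, gr in zip(v, grad)]
    # Moment centralization: enforce zero mean on the first-order moment.
    # Storing the centralized m for the next step is an assumption.
    m = centralize(m)
    # Bias corrections, as in Adam.
    bc1 = 1 - beta1 ** t
    bc2 = 1 - beta2 ** t
    new_param = [
        [p - lr * (mi / bc1) / (math.sqrt(vi / bc2) + eps)
         for p, mi, vi in zip(pr, mr, vr)]
        for pr, mr, vr in zip(param, m, v)
    ]
    return new_param, m, v


if __name__ == "__main__":
    param = [[0.5, -0.2, 0.1], [0.3, 0.0, -0.4]]
    m = [[0.0] * 3 for _ in param]
    v = [[0.0] * 3 for _ in param]
    grad = [[1.0, 2.0, 3.0], [0.5, -0.5, 1.0]]
    param, m, v = adam_mc_step(param, grad, m, v, t=1)
    # Each channel mean of m is ~0 up to floating-point error.
    print([sum(r) / len(r) for r in m])
```

The same `centralize` call could be dropped into a Radam or Adabelief step right after their first-moment update; only the second-moment and step-size logic would differ.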
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
Methods: Adam · Stochastic Gradient Descent · Adabelief
