Gradient Correction beyond Gradient Descent
Zefan Li, Bingbing Ni, Teng Li, WenJun Zhang, Wen Gao

TL;DR
This paper introduces GCGD, a framework for gradient correction that enhances gradient quality beyond standard gradient descent, leading to faster training and improved neural network performance.
Contribution
The paper proposes a gradient correction framework built from two plug-in modules: GC-W, which corrects weight gradients, and GC-ODE, which corrects hidden-state gradients.
Findings
Reduces training epochs by approximately 20%
Improves neural network performance
Effectively enhances gradient quality
Abstract
The great success neural networks have achieved is inseparable from the application of gradient-descent (GD) algorithms. Based on GD, many variant algorithms have emerged to improve the GD optimization process. The gradient for back-propagation is apparently the most crucial aspect for the training of a neural network. The quality of the calculated gradient can be affected by multiple aspects, e.g., noisy data, calculation error, algorithm limitation, and so on. To reveal gradient information beyond gradient descent, we introduce a framework (GCGD) to perform gradient correction. GCGD consists of two plug-in modules: 1) inspired by the idea of gradient prediction, we propose a GC-W module for weight gradient correction; 2) based on Neural ODE, we propose a GC-ODE module for hidden states gradient correction. Experiment results show that our gradient correction…
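To make the "plug-in" idea concrete, here is a minimal sketch of a gradient-descent loop with an optional correction step applied to the raw gradient before the weight update. It is not the paper's GC-W or GC-ODE module: the abstract does not give their details, so the corrector below (`EmaCorrector`, an exponential moving average of past gradients) is a hypothetical stand-in, and the toy objective, function names, and parameters are all assumptions made for illustration.

```python
import random


def noisy_grad(w, rng, noise):
    # Gradient of the toy objective f(w) = 0.5 * w**2 is w.
    # Gaussian noise is added to mimic a gradient degraded by noisy data.
    return w + rng.gauss(0.0, noise)


class EmaCorrector:
    """Hypothetical stand-in corrector (NOT the paper's GC-W module).

    Replaces each raw gradient with an exponential moving average of the
    gradients seen so far.
    """

    def __init__(self, beta=0.9):
        self.beta = beta
        self.avg = None

    def __call__(self, g):
        if self.avg is None:
            self.avg = g
        else:
            self.avg = self.beta * self.avg + (1.0 - self.beta) * g
        return self.avg


def gradient_descent(w0, steps, lr, noise=0.5, corrector=None, seed=0):
    # Plain gradient descent; the corrector, if given, is applied to the
    # raw gradient right before the update, which is the plug-in point.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        g = noisy_grad(w, rng, noise)
        if corrector is not None:
            g = corrector(g)
        w -= lr * g
    return w


if __name__ == "__main__":
    plain = gradient_descent(5.0, steps=200, lr=0.1)
    corrected = gradient_descent(5.0, steps=200, lr=0.1, corrector=EmaCorrector())
    print("plain GD final w:    ", plain)
    print("corrected GD final w:", corrected)
```

The point of the sketch is only the structure: the optimizer loop is unchanged, and the correction is a callable that can be added or removed without touching the rest of the training code.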
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and ELM
