
TL;DR
This paper introduces a novel 'power gradient' modification to existing gradient descent algorithms, aiming to improve convergence speed in flat regions and stability in steep directions, with empirical validation on popular methods.
Contribution
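It proposes a power gradient that can replace the standard gradient in existing gradient descent methods. In tests it gives significantly better performance for the Nesterov accelerated gradient and AMSGrad, and it is the basis of a new variant of ADAM that uses a varying exponent.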
Findings
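Significantly better performance for the Nesterov accelerated gradient and AMSGrad when fed power gradients instead of standard gradients.
An effective new take on ADAM that includes power gradients with a varying exponent.
Empirical comparison of three modern gradient descent methods, each run with both power gradients and standard gradients.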
Abstract
The development of machine learning is promoting the search for fast and stable minimization algorithms. To this end, we suggest a change in the current gradient descent methods that should speed up the motion in flat regions and slow it down in steep directions of the function to minimize. It is based on a "power gradient", in which each component of the gradient is replaced by its versus-preserving H-th power, with 0 < H < 1. We test three modern gradient descent methods fed by such variant and by standard gradients, finding the new version to achieve significantly better performances for the Nesterov accelerated gradient and AMSGrad. We also propose an effective new take on the ADAM algorithm, which includes power gradients with varying H.
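The power gradient described in the abstract can be illustrated with a minimal Python sketch. It assumes that "versus-preserving power" means sign(g) * |g|^h applied to each component g, with the exponent h between 0 and 1. The function name power_gradient, the parameter name h, and the sample values are hypothetical and are not taken from the paper.

```python
import math

def power_gradient(grad, h):
    """Return the component-wise sign-preserving h-th power of grad.

    Assumption: each component g becomes sign(g) * |g|**h.
    Assumption: h lies in (0, 1], where h = 1 gives back the plain gradient.
    """
    if not 0.0 < h <= 1.0:
        raise ValueError("h is assumed to lie in (0, 1]")
    return [math.copysign(abs(g) ** h, g) for g in grad]

# One steep component, one flat component, one zero component.
grad = [4.0, -0.25, 0.0]
powered = power_gradient(grad, 0.5)
# The steep component shrinks in magnitude (4.0 to about 2.0), the flat one
# grows (0.25 to about 0.5), and every sign is kept.
```

With h below 1, components with magnitude above 1 are reduced and components with magnitude below 1 are enlarged, which matches the stated aim of moving faster in flat regions and slower in steep directions. The transformed vector would then be passed to an optimizer in place of the ordinary gradient.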
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · AMSGrad · Nesterov Accelerated Gradient · Adam
