ELRA: Exponential learning rate adaption gradient descent optimization method
Alexander Kleinsorge, Stefan Kupper, Alexander Fauck, Felix Rothe

TL;DR
ELRA is a new gradient descent optimizer that adaptively adjusts the learning rate exponentially based on gradient orthogonality, offering fast convergence, universality, and rotation invariance, demonstrated on MNIST.
Contribution
Introduces ELRA, a hyper-parameter-free, rotation-invariant optimizer with exponential learning rate adaptation, scalable to high dimensions and effective on convex and non-convex problems.
Findings
High success rate and fast convergence demonstrated on MNIST.
Does not rely on hand-tuned parameters, increasing universality.
Scales linearly with problem dimension, suitable for large-scale problems.
Abstract
We present a novel, fast (exponential rate adaption), ab initio (hyper-parameter-free) gradient-based optimizer algorithm. The main idea of the method is to adapt the learning rate by situational awareness, mainly striving for orthogonal neighboring gradients. The method has a high success rate and fast convergence, and it does not rely on hand-tuned parameters, which gives it greater universality. It can be applied to problems of any dimension n and scales only linearly (of order O(n)) with the dimension of the problem. It optimizes convex and non-convex continuous landscapes that provide some kind of gradient. In contrast to the Ada-family (AdaGrad, AdaMax, AdaDelta, Adam, etc.), the method is rotation invariant: optimization path and performance are independent of coordinate choices. The impressive performance is demonstrated by extensive experiments on the MNIST benchmark data-set…
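The idea of steering the learning rate toward orthogonal neighboring gradients can be illustrated with a small sketch. This is not the ELRA update rule, which the abstract does not spell out. It is a minimal stand-in that assumes a multiplicative rule lr <- lr * exp(kappa * cos), where cos is the cosine between consecutive gradients. The names `cosine` and `adaptive_gd`, the gain `kappa`, and the starting `lr` are hypothetical choices made for this example; ELRA itself is described as hyper-parameter-free. Because the rule uses only dot products and norms, each step costs O(n), and the path does not depend on the coordinate system, which the second part of the example checks on a rotated quadratic.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.clip(np.dot(u, v) / denom, -1.0, 1.0))


def adaptive_gd(grad, x0, lr=1e-3, kappa=0.5, steps=100):
    """Gradient descent with a multiplicatively rescaled step size.

    Illustrative rule (an assumption, NOT the published ELRA update):
        lr <- lr * exp(kappa * cos(consecutive gradients))
    Aligned gradients (cos > 0) suggest the step was too short, so lr grows.
    Opposed gradients (cos < 0) suggest an overshoot, so lr shrinks.
    Orthogonal gradients (cos = 0) leave lr unchanged.
    """
    x = np.asarray(x0, dtype=float)
    g_prev = grad(x)
    for _ in range(steps):
        x = x - lr * g_prev
        g = grad(x)
        lr = lr * float(np.exp(kappa * cosine(g, g_prev)))
        g_prev = g
    return x, lr


# Isotropic quadratic f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x_final, lr_final = adaptive_gd(lambda x: x, x0=[3.0, -4.0])
print(np.linalg.norm(x_final), lr_final)

# Rotation check on f(x) = 0.5 * x^T A x: rotating the problem by R
# should rotate the whole optimization path by R as well.
A = np.diag([1.0, 10.0])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
x0 = np.array([1.0, 1.0])
x_plain, _ = adaptive_gd(lambda x: A @ x, x0, steps=10)
y_rot, _ = adaptive_gd(lambda y: R @ A @ R.T @ y, R @ x0, steps=10)
print(np.allclose(R @ x_plain, y_rot))
```

A per-coordinate scheme such as Adam rescales each axis separately, so the same rotation check would generally fail for it. That is the contrast with the Ada-family that the abstract draws.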
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: AdaMax · Adam · AdaDelta
