ELRA: Exponential learning rate adaption gradient descent optimization method

Alexander Kleinsorge; Stefan Kupper; Alexander Fauck; Felix Rothe

arXiv:2309.06274 · cs.LG · September 13, 2023 · 1 cites

Open Access

TL;DR

ELRA is a new gradient descent optimizer that adaptively adjusts the learning rate exponentially based on gradient orthogonality, offering fast convergence, universality, and rotation invariance, demonstrated on MNIST.

Contribution

Introduces ELRA, a hyper-parameter-free, rotation-invariant optimizer with exponential learning rate adaptation, scalable to high dimensions and effective on convex and non-convex problems.

Findings

01  High success rate and fast convergence demonstrated on MNIST.

02  Does not rely on hand-tuned parameters, increasing universality.

03  Scales linearly with problem dimension, suitable for large-scale problems.

Abstract

We present a novel, fast (exponential rate adaption), ab initio (hyper-parameter-free) gradient-based optimizer algorithm. The main idea of the method is to adapt the learning rate $α$ by situational awareness, mainly striving for orthogonal neighboring gradients. The method has a high success rate and fast convergence, and it does not rely on hand-tuned parameters, which gives it greater universality. It can be applied to problems of any dimension $n$ and scales only linearly (of order $O(n)$) with the dimension of the problem. It optimizes convex and non-convex continuous landscapes that provide some kind of gradient. In contrast to the Ada-family (AdaGrad, AdaMax, AdaDelta, Adam, etc.), the method is rotation invariant: optimization path and performance are independent of coordinate choices. The impressive performance is demonstrated by extensive experiments on the MNIST benchmark data-set…
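
The idea of adapting $α$ from the angle between neighboring gradients can be illustrated with a minimal sketch. This is not the authors' update rule, which the abstract does not state. The function name `cosine_adaptive_gd`, the parameter `gain`, and the rule "multiply $α$ by exp(gain · cos)" are assumptions chosen only for illustration: $α$ grows while consecutive gradients point the same way, shrinks when they point in opposite directions, and stays unchanged when they are orthogonal.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine_adaptive_gd(grad, x0, alpha=1e-3, steps=200, gain=1.0):
    """Illustrative gradient descent with exponential step-size adaptation.

    ASSUMPTION: the rule alpha *= exp(gain * cos) is a hypothetical
    stand-in, not the ELRA update from the paper. Here cos is the cosine
    of the angle between the previous and the current gradient.
    """
    x = list(x0)
    g_prev = grad(x)
    for _ in range(steps):
        # Plain gradient step with the current learning rate.
        x = [xi - alpha * gi for xi, gi in zip(x, g_prev)]
        g = grad(x)
        norm = math.sqrt(dot(g, g) * dot(g_prev, g_prev))
        if norm == 0.0:
            break  # a gradient vanished: stationary point reached
        cos = dot(g, g_prev) / norm
        # cos > 0: step was too short, grow alpha; cos < 0: overshoot, shrink.
        alpha *= math.exp(gain * cos)
        g_prev = g
    return x, alpha


# Usage: minimize f(x) = 0.5 * (x[0]**2 + 4 * x[1]**2).
x_final, alpha_final = cosine_adaptive_gd(
    lambda x: [x[0], 4.0 * x[1]], [1.0, 1.0], alpha=0.1, steps=10, gain=0.5
)
```

Every quantity in this sketch is built from inner products and vector additions, so each step costs $O(n)$, and in exact arithmetic the iterates are unchanged when the coordinate system is rotated. That mirrors the two properties the abstract claims, without implying that the sketch reproduces the paper's method or its performance.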

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques

Methods: AdaMax · Adam · AdaDelta