We Don't Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond

Afshin Khadangi

arXiv:2308.10740·cs.LG·August 22, 2023

TL;DR

This paper introduces EVE, a novel optimization method that applies different learning rates to gradient components, improving convergence speed and stability in deep neural network training.

Contribution

The paper presents EVE, an optimization technique that combines dual learning rates with an adaptive momentum term, aiming to improve on traditional single-learning-rate methods.

Findings

01

EVE outperforms existing optimizers on multiple benchmarks.

02

EVE achieves faster convergence and improved stability.

03

EVE adapts effectively to complex loss landscapes.

Abstract

In the rapidly advancing field of deep learning, optimising deep neural networks is paramount. This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct components of the gradients. By bifurcating the learning rate, EVE enables more nuanced control and faster convergence, addressing the challenges associated with traditional single learning rate approaches. Utilising a momentum term that adapts to the learning landscape, the method achieves a more efficient navigation of the complex loss surface, resulting in enhanced performance and stability. Extensive experiments demonstrate that EVE significantly outperforms existing optimisation techniques across various benchmark datasets and architectures.
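To make the idea of a bifurcated learning rate concrete, here is a minimal sketch in plain Python. It is not the EVE update rule: the abstract does not say how the gradient is split into components, so this sketch assumes a split into a momentum-smoothed part and the remaining residual, each scaled by its own learning rate. The names `dual_lr_step`, `lr_smooth`, `lr_resid`, and `beta` are hypothetical and chosen only for illustration.

```python
def dual_lr_step(params, grads, velocity, lr_smooth=0.1, lr_resid=0.01, beta=0.9):
    """One update with two learning rates (illustrative; NOT the paper's EVE rule)."""
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        # Assumed component 1: exponential moving average of the gradient.
        v = beta * v + (1.0 - beta) * g
        # Assumed component 2: the part of the gradient the average does not capture.
        resid = g - v
        # Each component is scaled by its own learning rate.
        new_params.append(p - lr_smooth * v - lr_resid * resid)
        new_velocity.append(v)
    return new_params, new_velocity


def minimise_quadratic(steps=200):
    """Minimise f(x, y) = x^2 + 10*y^2 starting from (1, 1)."""
    params, velocity = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        grads = [2.0 * params[0], 20.0 * params[1]]  # analytic gradient of f
        params, velocity = dual_lr_step(params, grads, velocity)
    return params


if __name__ == "__main__":
    print(minimise_quadratic())
```

On this toy quadratic, both coordinates shrink towards zero with the default settings. For the method as actually proposed, refer to the paper and the akhadangi/EVE repository.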

Code & Models

Repositories

akhadangi/EVE
tf · Official

Taxonomy

Topics: Model Reduction and Neural Networks · Image and Signal Denoising Methods · Anomaly Detection Techniques and Applications