AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio

Jun Lu

arXiv:2204.00825·cs.LG·April 5, 2022

TL;DR

AdaSmooth is a new adaptive learning rate method for gradient descent that reduces the need for hyper-parameter tuning and performs well across various neural network architectures and tasks.

Contribution

It introduces a hyper-parameter insensitive, per-dimension learning rate method called AdaSmooth for stochastic optimization.

Findings

01

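AdaSmooth shows promising results compared to other methods on convolutional neural networks, multi-layer perceptrons, and alternative machine learning tasks.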

02

It requires no manual hyper-parameter tuning.

03

Empirical results show that AdaSmooth works well in practice and compares favorably to other stochastic optimization methods in neural networks.

Abstract

It is well known that hyper-parameters must be chosen for Momentum, AdaGrad, AdaDelta, and other alternative stochastic optimizers. In many cases, these hyper-parameters are tuned tediously based on experience, making the process more of an art than a science. We present a novel per-dimension learning rate method for gradient descent called AdaSmooth. The method is insensitive to hyper-parameters, so it requires no manual tuning of hyper-parameters, unlike the Momentum, AdaGrad, and AdaDelta methods. We show promising results compared to other methods on different convolutional neural networks, multi-layer perceptrons, and alternative machine learning tasks. Empirical results demonstrate that AdaSmooth works well in practice and compares favorably to other stochastic optimization methods in neural networks.
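
To make the phrase "per-dimension learning rate" concrete, here is a minimal sketch of AdaGrad, one of the baseline methods named in the abstract. It is not AdaSmooth, whose update rule is not given on this page. The function names, the toy quadratic objective, and the base learning rate `lr=0.5` are illustrative assumptions, chosen only to show each coordinate receiving its own step size.

```python
import math

def adagrad_step(params, grads, accum, lr=0.5, eps=1e-8):
    """One AdaGrad update (baseline method, not AdaSmooth).

    Each coordinate is scaled by its own accumulated squared gradient,
    so every dimension gets its own effective learning rate.
    """
    new_params, new_accum = [], []
    for p, g, a in zip(params, grads, accum):
        a = a + g * g                     # running sum of squared gradients
        step = lr / (math.sqrt(a) + eps)  # per-dimension learning rate
        new_params.append(p - step * g)
        new_accum.append(a)
    return new_params, new_accum

def quadratic_grad(params, scales=(100.0, 1.0)):
    """Gradient of the toy objective f(x) = 0.5 * sum(s_i * x_i**2).

    The scales are hypothetical; they make one coordinate 100x steeper.
    """
    return [s * p for s, p in zip(scales, params)]

def run(steps=200):
    """Minimize the toy objective from (1, 1) and return the final point."""
    params, accum = [1.0, 1.0], [0.0, 0.0]
    for _ in range(steps):
        grads = quadratic_grad(params)
        params, accum = adagrad_step(params, grads, accum)
    return params

if __name__ == "__main__":
    print(run())
```

Even in this sketch the base rate `lr` still has to be chosen by hand, which is the kind of manual tuning the abstract says AdaSmooth is designed to avoid.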

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques

Methods: Adaptive Smooth Optimizer · AdaGrad · AdaDelta