A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range
Guoqiang Zhang, Kenta Niwa, W. Bastiaan Kleijn

TL;DR
This paper introduces Aida, an improved optimizer based on AdaBelief that suppresses the adaptive stepsize range, leading to better performance across various neural network training tasks.
Contribution
The paper proposes a new optimizer, Aida, which extends AdaBelief by further suppressing the range of the adaptive stepsizes. It does so by performing mutual layerwise vector projections between the gradient and its first momentum before estimating the second momentum.
Findings
Aida outperforms nine optimizers on NLP and image classification tasks.
Aida matches top performance on image generation models.
Aida yields higher validation accuracy than AdaBelief on ImageNet.
Abstract
We make contributions towards improving adaptive-optimizer performance. Our improvements are based on suppression of the range of adaptive stepsizes in the AdaBelief optimizer. Firstly, we show that the particular placement of the parameter epsilon within the update expressions of AdaBelief reduces the range of the adaptive stepsizes, making AdaBelief closer to SGD with momentum. Secondly, we extend AdaBelief by further suppressing the range of the adaptive stepsizes. To achieve the above goal, we perform mutual layerwise vector projections between the gradient g_t and its first momentum m_t before using them to estimate the second momentum. The new optimization method is referred to as Aida. Thirdly, extensive experimental results show that Aida outperforms nine optimizers when training transformers and LSTMs for NLP, and VGG and ResNet for image classification over CIFAR10 and CIFAR100…
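The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the iteration count `num_iters`, the stabilizing constant `xi`, and the exact form of the projection formulas are assumptions made for illustration. The second-momentum update follows the commonly stated AdaBelief form, in which epsilon is added inside the running average of (g_t - m_t)^2. Vectors are plain Python lists standing in for one layer's flattened parameters.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def mutual_projection(m, g, num_iters=2, xi=1e-20):
    """Mutually project one layer's momentum m and gradient g.

    Assumed form (hypothetical, for illustration): at each iteration,
    m is projected onto the direction of g, and g onto the direction of m.
    xi only guards against division by zero.
    """
    m_k, g_k = list(m), list(g)
    for _ in range(num_iters):
        mg = dot(m_k, g_k)
        m_scale = mg / (dot(g_k, g_k) + xi)  # coefficient for projecting m onto g
        g_scale = mg / (dot(m_k, m_k) + xi)  # coefficient for projecting g onto m
        # Both right-hand sides use the previous iterates (tuple assignment).
        m_k, g_k = [m_scale * x for x in g_k], [g_scale * x for x in m_k]
    return m_k, g_k


def second_momentum_update(s_prev, m, g, beta2=0.999, eps=1e-8, num_iters=0):
    """AdaBelief-style second-momentum estimate for one layer.

    With num_iters=0 this is the plain AdaBelief-style update; with
    num_iters>0 the vectors are mutually projected first (Aida-style sketch).
    """
    if num_iters > 0:
        m, g = mutual_projection(m, g, num_iters=num_iters)
    return [
        beta2 * s + (1.0 - beta2) * (mi - gi) ** 2 + eps
        for s, mi, gi in zip(s_prev, m, g)
    ]


if __name__ == "__main__":
    m_t = [1.0, 0.0]   # first momentum of a toy two-parameter layer
    g_t = [1.0, 1.0]   # current gradient of the same layer
    print(mutual_projection(m_t, g_t, num_iters=1))
    print(second_momentum_update([0.0, 0.0], m_t, g_t, num_iters=2))
```

In this toy case the projected pair differs by less than the original pair (in Euclidean norm), so the quantity fed into the second-momentum estimate shrinks with each projection round.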
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Medical Image Segmentation Techniques
Methods: Stochastic Gradient Descent · Batch Normalization · Average Pooling · 1x1 Convolution · Residual Connection · Softmax · Residual Block · Kaiming Initialization · Dropout
