On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance
Guoqiang Zhang

TL;DR
This paper introduces SET-Adam, an adaptive optimizer that suppresses the range of adaptive stepsizes using layerwise gradient statistics, leading to improved generalization in training deep neural networks across various tasks.
Contribution
We propose SET-Adam, a novel adaptive optimizer that applies three consecutive operations (down-scaling, epsilon-embedding, and down-translating) to the second momentum v_t to enhance generalization performance.
Findings
SET-Adam outperforms eight adaptive optimizers on NLP and image classification tasks.
SET-Adam matches the best performance of existing adaptive methods on image generation.
SET-Adam achieves higher validation accuracy than Adam and AdaBelief on ImageNet.
Abstract
A number of recent adaptive optimizers improve the generalisation performance of Adam by essentially reducing the variance of the adaptive stepsizes, bringing them closer to SGD with momentum. Following this motivation, we suppress the range of the adaptive stepsizes of Adam by exploiting layerwise gradient statistics. In particular, at each iteration, we propose to perform three consecutive operations on the second momentum v_t before using it to update a DNN model: (1) down-scaling, (2) epsilon-embedding, and (3) down-translating. The resulting algorithm is referred to as SET-Adam, where SET abbreviates the three operations. The down-scaling operation on v_t is performed layerwise, making use of the angles between the layerwise subvectors of v_t and the corresponding all-one subvectors. Extensive experimental results show that SET-Adam outperforms eight adaptive…
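The two quantities the abstract refers to can be illustrated with a short Python sketch. It is not the SET-Adam update rule, which the abstract does not spell out; it only computes (a) the cosine of the angle between one layer's subvector of v_t and the all-one vector of the same length, and (b) the range of standard Adam stepsizes lr / (sqrt(v_t) + eps) within that layer. The function names, the default lr and eps values, and the example vectors are assumptions made for illustration.

```python
import math


def cos_angle_to_ones(v_layer):
    """Cosine of the angle between a layer's subvector of v_t and the
    all-one vector of the same length (hypothetical helper)."""
    n = len(v_layer)
    norm = math.sqrt(sum(x * x for x in v_layer))
    if norm == 0.0:
        raise ValueError("v_layer must contain at least one non-zero entry")
    # <v, 1> = sum(v) and ||1|| = sqrt(n)
    return sum(v_layer) / (math.sqrt(n) * norm)


def adam_stepsize_range(v_layer, lr=1e-3, eps=1e-8):
    """Smallest and largest standard Adam stepsize lr / (sqrt(v) + eps)
    over the entries of one layer (hypothetical helper)."""
    steps = [lr / (math.sqrt(x) + eps) for x in v_layer]
    return min(steps), max(steps)


if __name__ == "__main__":
    # Assumed example values: one layer with uniform entries of v_t,
    # and one with widely spread entries.
    uniform_layer = [2.0, 2.0, 2.0]
    spread_layer = [1e-6, 1e-2, 1.0]
    for name, layer in [("uniform", uniform_layer), ("spread", spread_layer)]:
        lo, hi = adam_stepsize_range(layer)
        print(name, cos_angle_to_ones(layer), hi / lo)
```

A layer whose entries of v_t are all equal has cosine 1 and a stepsize ratio of 1; the more the entries spread out, the smaller the cosine and the wider the stepsize range, which is the range the paper aims to suppress.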
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Metaheuristic Optimization Algorithms Research
Methods: Batch Normalization · 1x1 Convolution · Average Pooling · Residual Connection · Dense Connections · Bottleneck Residual Block · Residual Block · Dropout · Kaiming Initialization
