Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen, Zuchao Li, Lefei Zhang, Bo Du, Hai Zhao

TL;DR
This paper introduces Admeta, a novel optimizer combining a double exponential moving average with a dynamic lookahead strategy, demonstrating improved convergence and performance over existing optimizers in deep learning tasks.
Contribution
The paper proposes a new optimizer framework, Admeta, integrating a DEMA-based backward-looking component and a dynamic lookahead forward-looking strategy, with implementations based on RAdam and SGDM.
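For intuition about the DEMA-based backward-looking component, the classic double exponential moving average from stock-market technical analysis is DEMA(x) = 2*EMA(x) - EMA(EMA(x)). Subtracting the doubly smoothed series cancels much of the lag of a single EMA. The sketch below applies that classic formula to a scalar sequence. It assumes the EMA is initialized at the first sample and uses no bias correction. It is not Admeta's own DEMA variant, whose exact recurrence is not reproduced on this page, and the function names are hypothetical.

```python
from typing import List


def ema(values: List[float], beta: float) -> List[float]:
    """Exponential moving average m_t = beta * m_{t-1} + (1 - beta) * x_t.

    Assumption: m_0 = x_0 (no bias correction), chosen for illustration only.
    """
    out: List[float] = []
    m = None
    for x in values:
        m = x if m is None else beta * m + (1.0 - beta) * x
        out.append(m)
    return out


def dema(values: List[float], beta: float) -> List[float]:
    """Classic stock-market DEMA: 2 * EMA(x) - EMA(EMA(x)).

    This is the textbook formula, not the Admeta paper's DEMA variant.
    """
    e1 = ema(values, beta)   # single smoothing, lags behind a trend
    e2 = ema(e1, beta)       # smoothing of the smoothed series, lags further
    return [2.0 * a - b for a, b in zip(e1, e2)]


if __name__ == "__main__":
    ramp = [float(t) for t in range(50)]  # a steadily increasing signal
    print("EMA  last:", ema(ramp, 0.9)[-1])
    print("DEMA last:", dema(ramp, 0.9)[-1])
```

On a linear ramp with beta = 0.9, a single EMA trails the true value by roughly beta / (1 - beta) = 9 steps, while the DEMA output stays much closer to the current value. This reduced lag is the property that motivates using a DEMA-style average in place of a plain EMA.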
Findings
Admeta outperforms baseline optimizers in diverse tasks.
A theoretical proof confirms the convergence of the Admeta algorithms.
Experimental results show advantages over recent competitive optimizers.
Abstract
The optimizer is an essential component of successful deep learning: it guides the neural network in updating its parameters according to the loss on the training set. SGD and Adam are two classical and effective optimizers, on which researchers have built many variants, such as SGDM and RAdam. In this paper, we combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) optimizer framework. For the backward-looking part, we propose a DEMA variant scheme, motivated by a metric used in the stock market, to replace the common exponential moving average scheme. In the forward-looking part, we present a dynamic lookahead strategy which asymptotically approaches a set value,…
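To illustrate the forward-looking idea, the sketch below uses the standard lookahead update. A "fast" copy of the parameters takes k inner optimizer steps. The "slow" parameters then move part of the way toward the fast copy: slow <- slow + alpha * (fast - slow). To mimic a lookahead coefficient that "asymptotically approaches a set value", alpha follows a hypothetical geometric schedule from alpha_start toward alpha_target. That schedule, every parameter value, and the use of plain SGD as the inner optimizer (the paper builds on SGDM and RAdam) are illustrative assumptions, not Admeta's published rule.

```python
from typing import Callable, List, Tuple


def alpha_schedule(step: int, alpha_start: float = 0.5,
                   alpha_target: float = 0.8, rate: float = 0.9) -> float:
    """Hypothetical schedule: alpha approaches alpha_target geometrically."""
    return alpha_target + (alpha_start - alpha_target) * rate ** step


def lookahead_sgd(grad: Callable[[float], float], x0: float, lr: float = 0.1,
                  k: int = 5, outer_steps: int = 20) -> Tuple[float, List[float]]:
    """Lookahead wrapped around plain SGD on a scalar parameter (illustrative)."""
    slow = x0
    alphas: List[float] = []
    for t in range(outer_steps):
        fast = slow
        for _ in range(k):
            # Inner "fast" steps; plain SGD stands in for SGDM or RAdam here.
            fast -= lr * grad(fast)
        alpha = alpha_schedule(t)
        alphas.append(alpha)
        # Slow weights interpolate toward the fast weights with a dynamic alpha.
        slow = slow + alpha * (fast - slow)
    return slow, alphas


if __name__ == "__main__":
    # Minimize f(x) = 0.5 * x**2, whose gradient is x.
    x_final, alphas = lookahead_sgd(lambda x: x, x0=10.0)
    print("final x:", x_final)
    print("first/last alpha:", alphas[0], alphas[-1])
```

On the quadratic f(x) = 0.5 * x**2 the slow parameter shrinks toward the minimum at 0, while alpha rises from 0.5 toward its set value of 0.8 without ever reaching it. Early on, a smaller alpha keeps the slow weights conservative; later, the coefficient settles near a fixed value.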
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stock Market Forecasting Methods · Metaheuristic Optimization Algorithms Research
Methods: Adam · Balanced Selection · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent · RAdam
