Dynamic Momentum Recalibration in Online Gradient Learning
Zhipeng Yao, Rui Yu, Guisong Chang, Ying Li, Yu Zhang, Dazhou Li

TL;DR
This paper introduces SGDF, a novel optimizer inspired by signal processing, which adaptively recalibrates momentum in SGD to improve gradient estimation and overall training performance.
Contribution
The paper proposes SGDF, an optimizer that dynamically adjusts gradient updates using optimal linear filtering principles, addressing bias-variance trade-offs in momentum-based methods.
Findings
SGDF outperforms traditional momentum methods across various architectures.
SGDF achieves comparable or superior results to state-of-the-art optimizers.
The approach generalizes to other optimization algorithms.
Abstract
Stochastic Gradient Descent (SGD) and its momentum variants form the backbone of deep learning optimization, yet the underlying dynamics of their gradient behavior remain insufficiently understood. In this work, we reinterpret gradient updates through the lens of signal processing and reveal that fixed momentum coefficients inherently distort the balance between bias and variance, leading to skewed or suboptimal parameter updates. To address this, we propose SGDF (SGD with Filter), an optimizer inspired by the principles of Optimal Linear Filtering. SGDF computes an online, time-varying gain to dynamically refine gradient estimation by minimizing the mean-squared error, thereby achieving an optimal trade-off between noise suppression and signal preservation. Furthermore, our approach could extend to other optimizers, showcasing its broad applicability to optimization frameworks.…
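To make the idea of a time-varying gain concrete, here is a minimal sketch of generic minimum-MSE linear fusion. It is not the SGDF update rule, which this summary does not spell out. The function names `fuse` and `running_estimate`, the scalar setting, and the assumption of a known observation variance are all hypothetical choices made for illustration. For comparison, a fixed momentum coefficient β gives the update m_t = m_{t-1} + (1 - β)(g_t - m_{t-1}), which uses the same constant gain 1 - β at every step.

```python
def fuse(prior, prior_var, obs, obs_var):
    """Minimum-MSE linear combination of two unbiased, independent scalar estimates.

    Illustrative helper (hypothetical name); requires prior_var + obs_var > 0.
    """
    gain = prior_var / (prior_var + obs_var)   # weight given to the new observation
    estimate = prior + gain * (obs - prior)    # correct the prior by the residual
    fused_var = (1.0 - gain) * prior_var       # variance of the fused estimate
    return estimate, gain, fused_var


def running_estimate(observations, init=0.0, init_var=1.0, obs_var=1.0):
    """Apply `fuse` recursively, so the gain is recomputed at every step.

    Assumes a constant underlying signal and a known observation variance,
    which is a simplification for illustration only.
    """
    estimate, var = init, init_var
    gains = []
    for obs in observations:
        estimate, gain, var = fuse(estimate, var, obs, obs_var)
        gains.append(gain)
    return estimate, gains


if __name__ == "__main__":
    final, gains = running_estimate([2.0, 2.0, 2.0])
    print(final, gains)
```

In this sketch the gain shrinks as the estimate's variance falls, so early observations move the estimate more than later ones. A fixed coefficient cannot do this, and that is the bias-variance trade-off the abstract refers to.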
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
