On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Xianliang Li, Jun Luo, Zhiwei Zheng, Hanxiao Wang, Li Luo, Lingkun Wen, Linlong Wu, Sheng Xu

TL;DR
This paper introduces a frequency domain analysis framework for momentum-based optimizers, revealing how adjusting momentum affects gradient filtering and proposing a new dynamic optimizer that improves training performance.
Contribution
It provides a novel frequency domain perspective on momentum methods, offering insights into optimal momentum coefficient selection and introducing the FSGDM optimizer.
Findings
High-frequency gradients are undesirable in late training stages.
Preserving initial gradients and amplifying low-frequency components improves performance.
FSGDM outperforms traditional momentum optimizers in experiments.
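The first two findings can be illustrated with a small numerical sketch. It assumes a momentum update of the form m_t = u_t * m_{t-1} + v * g_t and a hypothetical increasing schedule `u_schedule(t, mu) = t / (t + mu)`. The schedule, the value of `mu`, and the frozen-coefficient approximation (treating the filter as time-invariant at each step) are assumptions made for illustration, not the paper's stated FSGDM definition.

```python
import numpy as np

def u_schedule(t, mu):
    """Hypothetical increasing momentum schedule: 0 at t=0, approaching 1 as t grows."""
    return t / (t + mu)

total_steps = 1000          # assumed training length
mu = 0.1 * total_steps      # assumed schedule constant
v = 1.0                     # assumed fixed gradient coefficient

steps = np.array([0, 100, 500, 1000])
u_t = u_schedule(steps, mu)

# Frozen-coefficient gains of H(z) = v / (1 - u z^{-1}) at the two band edges.
low_freq_gain = v / (1.0 - u_t)    # omega = 0  (slowly varying gradient components)
high_freq_gain = v / (1.0 + u_t)   # omega = pi (fastest-oscillating components)

for t, u, lo, hi in zip(steps, u_t, low_freq_gain, high_freq_gain):
    print(f"t={t:4d}  u={u:.3f}  gain(0)={lo:6.2f}  gain(pi)={hi:.3f}")
```

Under these assumptions the filter starts as all-pass (the original gradient is preserved at t = 0), the low-frequency gain grows over training, and the high-frequency gain shrinks toward the end.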
Abstract
Momentum-based optimizers are widely adopted for training neural networks. However, the optimal selection of momentum coefficients remains elusive. This uncertainty impedes a clear understanding of the role of momentum in stochastic gradient methods. In this paper, we present a frequency domain analysis framework that interprets the momentum method as a time-variant filter for gradients, where adjustments to momentum coefficients modify the filter characteristics. Our experiments support this perspective and provide a deeper understanding of the mechanism involved. Moreover, our analysis reveals the following significant findings: high-frequency gradient components are undesired in the late stages of training; preserving the original gradient in the early stages, and gradually amplifying low-frequency gradient components during training both enhance performance. Based on these insights,…
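To make the filter interpretation concrete, here is a minimal sketch for the simplest case of fixed coefficients. It assumes the update m_t = u * m_{t-1} + v * g_t, whose Z-transform gives the transfer function H(z) = v / (1 - u z^{-1}). The function name `momentum_magnitude_response` and the coefficient values are illustrative, not taken from the paper.

```python
import numpy as np

def momentum_magnitude_response(u, v, omega):
    """|H(e^{j*omega})| for m_t = u * m_{t-1} + v * g_t with fixed u and v."""
    z_inv = np.exp(-1j * omega)          # z^{-1} evaluated on the unit circle
    return np.abs(v / (1.0 - u * z_inv))

omega = np.linspace(0.0, np.pi, 5)       # normalized frequencies from 0 to pi
for u in (0.9, -0.5):                    # assumed example coefficients
    gain = momentum_magnitude_response(u, 1.0, omega)
    kind = "low-pass" if gain[0] > gain[-1] else "high-pass"
    print(f"u={u:+.1f}: gain(0)={gain[0]:.3f}, gain(pi)={gain[-1]:.3f} ({kind})")
```

With v = 1, a positive u amplifies slowly varying gradient components relative to rapidly oscillating ones, while a negative u does the opposite. Changing u or v therefore changes the filter characteristics, which is the sense in which the abstract describes momentum as a filter.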
Peer Reviews
Decision: ICLR 2025 Poster
**Originality:** The paper brings the signal processing perspective to the understanding of SGD, which is invaluable. Seeing that a lot of the progress in ML is purely empirical, the systematic and thorough approach the paper takes is both refreshing and useful. Viewing momentum term contributions as corresponding to low/high-pass filters opens the door to apply various insights from signal processing directly to neural network training. Thinking of momentum as attenuating and amplifying various
An important work on how momentum works was published in 2017 in the Distill web-based journal: https://distill.pub/2017/momentum/ I believe that this paper needs to feature in the related literature. Also, in case the authors are not aware of it, I believe they will enjoy reading it! As mentioned above, the experiments conducted are truly comprehensive and span a wide range of problems and architectures. However, what is missing from the experiments is the presence of the (arguably) state of t
1. The paper is clear and straightforward, making it easy to follow. The core idea is both simple and effective, showcasing an elegant solution that achieves strong results without unnecessary complexity.
1. In traditional signal processing, many derivations provided in the paper are well-established and widely recognized.
2. Quantitative comparisons within this field remain relatively limited, often lacking comprehensive metrics or benchmarks to evaluate performance across different methods.
Originality:
* The paper explores the effect of momentum on gradients in the frequency domain. This work is very original, and the topic has not been well explored before. The exploration shows that the momentum update usually acts as a low-pass filter that becomes progressively narrower in the later part of training.
Quality:
* The derivations based on signal processing principles, particularly the application of Z-transforms, provide a good theoretical foundation. The analysis effectively distingui
Significance:
* While the frequency perspective on momentum-based weight updates is very impressive, the proposed FSGDM (Frequency Stochastic Gradient Descent with Momentum) does not appear to be very significant, for the following two reasons:
1. Setting schedules for the momentum coefficient and learning rate during training is a very common practice in NN training. In the proposed FSGDM method, we still need to set a bunch of hyperparameters, such as the scaling factor c and the momentum coefficient v. There
Taxonomy
Topics: Precipitation Measurement and Analysis · Radio Wave Propagation Studies · Soil Moisture and Remote Sensing
