The Epochal Sawtooth Phenomenon: Unveiling Training Loss Oscillations in Adam and Other Optimizers
Qi Liu, Wanjing Ma

TL;DR
This paper uncovers and analyzes the Epochal Sawtooth Phenomenon (ESP), a recurring loss pattern during training with Adam and similar optimizers, caused by adaptive learning rate adjustments and data shuffling effects.
Contribution
It is the first comprehensive analysis of ESP, linking it to optimizer parameters, data shuffling, and model capacity, supported by empirical and simplified quadratic minimization experiments.
Findings
ESP is most pronounced with the Adam optimizer.
Smaller $\beta_2$ values increase ESP severity.
ESP can occur in simple quadratic optimization tasks.
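For reference, the role of $\beta_2$ can be read off the standard Adam update rule, written out below. The symbols ($g_t$ for the minibatch gradient, $\theta_t$ for the parameters, $\alpha$ for the base learning rate, $\epsilon$ for the stability constant) follow the usual textbook notation and are assumptions here, since this page does not define them.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}, \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}, \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}.
\end{aligned}
```

Because the effective step size is $\alpha / (\sqrt{\hat{v}_t} + \epsilon)$, a smaller $\beta_2$ makes $v_t$ track recent squared gradients more closely, so the step size reacts faster to changes in gradient magnitude. This is consistent with the finding that smaller $\beta_2$ values increase ESP severity.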
Abstract
In this paper, we identify and analyze a recurring training loss pattern, which we term the Epochal Sawtooth Phenomenon (ESP), commonly observed during training with adaptive gradient-based optimizers, particularly the Adam optimizer. This pattern is characterized by a sharp drop in loss at the beginning of each epoch, followed by a gradual increase, resulting in a sawtooth-shaped loss curve. Through empirical observations, we demonstrate that while this effect is most pronounced with Adam, it persists, although less severely, with other optimizers such as RMSProp. We empirically analyze the mechanisms underlying ESP, focusing on key factors such as Adam's parameters, batch size, data shuffling, and sample replacement. Our analysis shows that ESP arises from adaptive learning rate adjustments controlled by the second moment estimate. Additionally, we identify the…
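The kind of simplified quadratic experiment mentioned above can be sketched as follows. This is a minimal illustration, not the authors' setup: the function name `adam_quadratic`, the synthetic least-squares data, and all default hyperparameters (including `beta2=0.9`) are assumptions chosen only to make the example small and runnable. It runs Adam with per-epoch data shuffling and records the minibatch loss before each update, so the per-step loss within an epoch can be inspected for a sawtooth shape.

```python
import numpy as np


def adam_quadratic(n_samples=256, dim=10, batch_size=16, epochs=5,
                   lr=1e-2, beta1=0.9, beta2=0.9, eps=1e-8, seed=0):
    """Run Adam on a synthetic least-squares problem, reshuffling each epoch.

    Returns an array of shape (epochs, steps_per_epoch) holding the
    minibatch loss measured before each parameter update.
    All names and defaults are hypothetical, for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, dim))
    # Noisy linear targets, so individual samples cannot all be fit exactly.
    y = X @ rng.normal(size=dim) + 0.5 * rng.normal(size=n_samples)

    w = np.zeros(dim)   # parameters
    m = np.zeros(dim)   # first-moment estimate
    v = np.zeros(dim)   # second-moment estimate
    steps_per_epoch = n_samples // batch_size
    losses = np.empty((epochs, steps_per_epoch))
    t = 0

    for epoch in range(epochs):
        perm = rng.permutation(n_samples)  # new sample order every epoch
        for step in range(steps_per_epoch):
            idx = perm[step * batch_size:(step + 1) * batch_size]
            residual = X[idx] @ w - y[idx]
            losses[epoch, step] = 0.5 * np.mean(residual ** 2)
            grad = X[idx].T @ residual / batch_size

            # Standard Adam update with bias correction.
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)

    return losses


if __name__ == "__main__":
    losses = adam_quadratic()
    # Average loss at each within-epoch step position, over all epochs.
    print(losses.mean(axis=0))
```

Plotting `losses.ravel()` against the step index, and rerunning with different `beta2`, `batch_size`, or number of epochs, is one way to explore how these factors change the within-epoch loss profile. Whether a clear sawtooth appears depends on these settings.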
Taxonomy
Topics: Hospital Admissions and Outcomes · Religion, Spirituality, and Psychology
Methods: RMSProp · Adam
