On Multi-Stage Loss Dynamics in Neural Networks: Mechanisms of Plateau and Descent Stages
Zheng-An Chen, Tao Luo, GuiHong Wang

TL;DR
This paper studies the multi-stage training loss dynamics of neural networks, with emphasis on the small initialization regime, and analyzes the mechanisms behind the plateau and descent stages of the loss curve.
Contribution
It offers a detailed theoretical analysis of the initial descent and secondary plateau stages, extending previous work on the initial plateau in neural network training.
Findings
Identified three distinct training stages: initial plateau, initial descent, secondary plateau.
Provided rigorous proofs for the initial plateau and analyzed the descent stage dynamics.
Used Wasserstein distance to connect global training trends with local parameter changes.
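As a rough illustration of the last finding, the Wasserstein distance can measure how far a whole collection of parameters has moved between two training times. The sketch below is not the paper's construction: it assumes one-dimensional parameter samples of equal size, for which the 1-Wasserstein distance reduces to the mean absolute difference of the sorted values. The function name `wasserstein1_1d` and the two parameter snapshots are hypothetical.

```python
import numpy as np

def wasserstein1_1d(u, v):
    """1-Wasserstein distance between two equal-size 1D empirical distributions.

    For equal-size samples on the real line, the optimal transport plan
    matches sorted values, so W1 is the mean absolute difference of the
    sorted samples.
    """
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(u - v)))

# Hypothetical parameter snapshots at two training times (not from the paper).
rng = np.random.default_rng(0)
params_early = 1e-3 * rng.standard_normal(1000)
params_late = params_early + 0.5  # every parameter shifted by the same amount

# A uniform shift of 0.5 moves the distribution by 0.5 in W1.
shift_distance = wasserstein1_1d(params_early, params_late)
```

Because the distance compares distributions rather than individual coordinates, permuting the parameters leaves it unchanged, which is one reason it suits networks whose neurons are interchangeable.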
Abstract
The multi-stage phenomenon in the training loss curves of neural networks has been widely observed, reflecting the non-linearity and complexity inherent in the training process. In this work, we investigate the training dynamics of neural networks (NNs), with particular emphasis on the small initialization regime, identifying three distinct stages observed in the loss curve during training: the initial plateau stage, the initial descent stage, and the secondary plateau stage. Through rigorous analysis, we reveal the underlying challenges contributing to slow training during the plateau stages. While the proof and estimate for the emergence of the initial plateau were established in our previous work, the behaviors of the initial descent and secondary plateau stages had not been explored before. Here, we provide a more detailed proof for the initial plateau, followed by a comprehensive…
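The small initialization regime described in the abstract can be explored with a toy experiment. The following is a minimal sketch, not the authors' setup: it assumes a two-layer tanh network without biases, a one-dimensional regression target, and full-batch gradient descent, and every name and hyperparameter (`train_small_init`, `init_scale`, `width`, `steps`, `lr`) is hypothetical. With a very small `init_scale`, the network output starts near zero and the loss barely moves over the first steps, which is the initial plateau.

```python
import numpy as np

def train_small_init(init_scale=1e-4, width=20, steps=200, lr=0.1, seed=0):
    """Full-batch gradient descent on f(x) = sum_k a_k * tanh(w_k * x).

    Returns the loss recorded before each update. All settings are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 32)
    y = np.sin(np.pi * x)

    # Small initialization: both layers start close to zero.
    w = init_scale * rng.standard_normal(width)
    a = init_scale * rng.standard_normal(width)

    losses = []
    for _ in range(steps):
        h = np.tanh(np.outer(x, w))          # hidden activations, shape (n, width)
        residual = h @ a - y                 # prediction error, shape (n,)
        losses.append(0.5 * np.mean(residual ** 2))

        # Gradients of the mean squared loss with respect to a and w.
        grad_a = h.T @ residual / x.size
        grad_w = (residual[:, None] * (1.0 - h ** 2) * a[None, :] * x[:, None]).sum(axis=0) / x.size

        a -= lr * grad_a
        w -= lr * grad_w
    return np.array(losses)

losses = train_small_init()
```

Plotting `losses` against the step index on a log-scaled horizontal axis makes the flat early segment easy to see. Raising `steps` lets one look for later stages, though whether and when they appear in this toy setting depends on the chosen hyperparameters.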
Taxonomy
Topics: Neural Networks and Applications
