Multiple Descents in Deep Learning as a Sequence of Order-Chaos Transitions
Wenbo Wei, Nicholas Chong Jia Le, Choy Heng Lai, Ling Feng

TL;DR
This paper reports a 'multiple-descent' phenomenon in LSTM training: after the model is overtrained, the test loss rises and falls in repeated long cycles. It links these cycles to phase transitions between order and chaos and finds that the locally optimal epochs sit at the critical transition points.
Contribution
It introduces the concept of multiple descent cycles during training and connects these to phase transitions between order and chaos in neural networks.
Findings
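Test loss goes through multiple long up-and-down cycles after the model is overtrained.
Locally optimal epochs consistently fall at the critical transition point between the ordered and chaotic phases.
The globally optimal epoch occurs at the first transition from order to chaos, where the 'edge of chaos' is widest.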
Abstract
We observe a novel 'multiple-descent' phenomenon during the training of LSTMs, in which the test loss goes through repeated long cycles of rising and falling after the model is overtrained. By carrying out asymptotic stability analysis of the models, we find that the cycles in test loss are closely associated with the phase transition process between order and chaos, and that the locally optimal epochs are consistently at the critical transition point between the two phases. More importantly, the globally optimal epoch occurs at the first transition from order to chaos, where the 'width' of the 'edge of chaos' is the widest, allowing the best exploration of better weight configurations for learning.
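To make the order/chaos distinction concrete, the sketch below estimates the largest Lyapunov exponent of a small recurrent map by tracking how fast two nearby hidden states separate. It is only an illustration under stated assumptions, not the authors' procedure: it uses a plain random tanh recurrent map rather than a trained LSTM, and the names `make_weights`, `step`, `largest_lyapunov`, and the `gain` parameter are hypothetical. A negative estimate indicates the ordered phase (perturbations die out); a positive one indicates the chaotic phase (perturbations grow).

```python
import math
import random


def make_weights(n, gain, seed=0):
    """Random n x n recurrent matrix with entries ~ N(0, gain^2 / n).

    Assumption: `gain` stands in for whatever quantity training changes;
    it is not a parameter taken from the paper.
    """
    rng = random.Random(seed)
    scale = gain / math.sqrt(n)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(n)]


def step(W, h):
    """One update of the toy map h -> tanh(W h) (not an LSTM cell)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]


def largest_lyapunov(W, steps=200, burn_in=100, eps=1e-8, seed=1):
    """Two-trajectory estimate of the largest Lyapunov exponent.

    A reference state and a copy displaced by `eps` are iterated together;
    after every step the log growth of their distance is accumulated and
    the copy is pulled back to distance `eps` along the current direction.
    """
    rng = random.Random(seed)
    n = len(W)
    h = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(burn_in):  # discard the initial transient
        h = step(W, h)

    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in direction))
    hp = [a + eps * d / norm for a, d in zip(h, direction)]

    total = 0.0
    for _ in range(steps):
        h, hp = step(W, h), step(W, hp)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, hp)))
        if dist == 0.0:  # trajectories merged exactly: fully contracting
            return float("-inf")
        total += math.log(dist / eps)
        hp = [a + eps * (b - a) / dist for a, b in zip(h, hp)]
    return total / steps


if __name__ == "__main__":
    for gain in (0.5, 3.0):
        lam = largest_lyapunov(make_weights(40, gain))
        phase = "ordered" if lam < 0 else "chaotic"
        print(f"gain={gain}: exponent={lam:.3f} ({phase})")
```

In this toy setting a small `gain` gives a contracting map and a large `gain` would be expected to give a chaotic one; applying the same estimate to model checkpoints saved at successive epochs is one way such a transition could be tracked over training.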
Taxonomy
Topics: Neural Networks and Applications
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
