Phases of learning dynamics in artificial neural networks: with or without mislabeled data
Yu Feng, Yuhai Tu

TL;DR
This paper uses a statistical physics framework to analyze the learning dynamics of neural networks trained with SGD, revealing distinct learning phases with and without mislabeled data, and their impact on generalization.
Contribution
It introduces a novel physics-inspired framework to characterize SGD dynamics and identifies phase transitions in learning, especially in the presence of mislabeled data.
Findings
Without mislabeled data, SGD dynamics transition from a fast learning phase to a slow exploration phase.
With mislabeled data, four distinct learning phases are identified.
Sample loss separation during phase II aids in eliminating mislabeled samples.
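The summary does not state how the separated losses are used to pick out mislabeled samples. The sketch below assumes that during phase II the per-sample losses of mislabeled samples sit well above those of correctly labeled ones, and flags every sample above the widest gap in the sorted losses. The largest-gap rule, the function name `flag_high_loss_samples`, and the example loss values are hypothetical illustrations, not the paper's method or data.

```python
def flag_high_loss_samples(losses):
    """Return indices of samples above the widest gap in the sorted
    per-sample losses (a hypothetical split rule, not the paper's)."""
    if len(losses) < 2:
        raise ValueError("need at least two per-sample losses")
    ordered = sorted(losses)
    # Find the pair of consecutive sorted losses with the widest gap.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, split = max(gaps)
    # Put the threshold midway across that gap.
    threshold = (ordered[split] + ordered[split + 1]) / 2
    return [i for i, loss in enumerate(losses) if loss > threshold]


# Hypothetical per-sample losses: two samples stand well above the rest.
losses = [0.05, 0.10, 0.08, 2.30, 0.07, 2.90]
print(flag_high_loss_samples(losses))  # → [3, 5]
```

A fixed or percentile threshold would work as well; the largest-gap rule is used here only because it needs no tuning when the two loss groups are clearly separated.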
Abstract
Despite the tremendous success of deep neural networks in machine learning, the underlying reason for their superior learning capability remains unclear. Here, we present a framework based on statistical physics to study the dynamics of stochastic gradient descent (SGD) that drives learning in neural networks. By using the minibatch gradient ensemble, we construct order parameters to characterize the dynamics of weight updates in SGD. Without mislabeled data, we find that the SGD learning dynamics transitions from a fast learning phase to a slow exploration phase, which is associated with large changes in the order parameters that characterize the alignment of SGD gradients and their mean amplitude. In the case with randomly mislabeled samples, SGD learning dynamics falls into four distinct phases. The system first finds solutions for the correctly labeled samples in phase I; it then wanders around these…
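The abstract does not give the exact definitions of the order parameters. As a hedged sketch, assume that gradient alignment is the average pairwise cosine similarity between minibatch gradients and that amplitude is their mean Euclidean norm; both definitions, and the helper names `norm`, `cosine`, and `order_parameters`, are our own assumptions rather than the paper's formulas.

```python
import math
from itertools import combinations


def norm(v):
    """Euclidean norm of a gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is the zero vector."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def order_parameters(grads):
    """Hypothetical order parameters from an ensemble of minibatch gradients.

    Returns (alignment, amplitude): the mean pairwise cosine similarity
    between minibatch gradients and their mean Euclidean norm.
    """
    if len(grads) < 2:
        raise ValueError("need at least two minibatch gradients")
    amplitude = sum(norm(g) for g in grads) / len(grads)
    pairs = list(combinations(grads, 2))
    alignment = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return alignment, amplitude


# Toy ensemble: two identical gradients and one orthogonal to both.
grads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alignment, amplitude = order_parameters(grads)
print(alignment, amplitude)
```

Under this assumed definition, alignment near 1 means the minibatch gradients point in a common direction, while alignment near 0 means they are largely uncorrelated; in the toy ensemble it is 1/3 with an amplitude of 1.0.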
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms · Face and Expression Recognition
Methods: Stochastic Gradient Descent
