$\epsilon$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics
Jiang Yang, Yuxiang Zhao, Quanhui Zhu

TL;DR
This paper introduces the $\epsilon$-rank metric to analyze neural network training dynamics, revealing a staircase pattern during loss reduction and proposing a pre-training method to enhance training efficiency and accuracy.
Contribution
The paper defines the $\epsilon$-rank as a new metric, proves a negative correlation between the loss lower bound and the $\epsilon$-rank, and develops a pre-training strategy to improve training outcomes.
Findings
The $\epsilon$-rank increases during training and correlates with loss reduction.
A staircase pattern is observed in the training process.
Pre-training the initial layer elevates the $\epsilon$-rank and accelerates training.
Abstract
Understanding the training dynamics of deep neural networks (DNNs), particularly how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the concept of $\epsilon$-rank, a novel metric quantifying the effective feature of neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during training with standard stochastic gradient descent methods, the decline of the loss function is accompanied by an increase in the $\epsilon$-rank and exhibits a staircase pattern. Theoretically, we rigorously prove a negative correlation between the loss lower bound and the $\epsilon$-rank, demonstrating that a high $\epsilon$-rank is essential for significant loss reduction. Moreover, numerical evidence shows…
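As a rough illustration of the idea, the sketch below counts how many singular values of a feature matrix exceed a threshold. This is only one plausible reading of an $\epsilon$-rank: the abstract excerpt does not give the formal definition, so the function name `epsilon_rank`, the relative threshold `eps`, and the use of a sampled feature matrix (rows as sample points, columns as neuron functions of the terminal hidden layer) are all assumptions made here, not the authors' construction.

```python
import numpy as np


def epsilon_rank(features: np.ndarray, eps: float = 1e-3) -> int:
    """Hypothetical epsilon-rank: number of singular values above eps * largest.

    `features` has one row per sample point and one column per neuron function.
    The relative threshold is an assumption; the paper may normalize differently.
    """
    s = np.linalg.svd(features, compute_uv=False)  # sorted in descending order
    if s.size == 0 or s[0] == 0.0:
        return 0
    return int(np.sum(s / s[0] > eps))


# Toy "neuron functions" sampled on [0, 1]: five columns spanning only {1, x}.
x = np.linspace(0.0, 1.0, 200)
low = np.stack([np.ones_like(x), x, 1.0 + x, 2.0 * x, 1.0 - x], axis=1)

# Adding a genuinely new direction (x**2) raises the count by one.
richer = np.concatenate([low, (x ** 2)[:, None]], axis=1)

print(epsilon_rank(low), epsilon_rank(richer))
```

Under this reading, many neurons that are near-linear combinations of each other contribute little to the count, which matches the intuition that a higher $\epsilon$-rank means more linearly independent features are available for reducing the loss.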
Taxonomy
Topics: Neural Networks and Applications
