$\epsilon$-rank and the Staircase Phenomenon: New Insights into Neural Network Training Dynamics

Jiang Yang; Yuxiang Zhao; Quanhui Zhu

arXiv:2412.05144·cs.LG·July 21, 2025

TL;DR

This paper introduces the $\epsilon$-rank metric to analyze neural network training dynamics, revealing a staircase pattern during loss reduction and proposing a pre-training method to enhance training efficiency and accuracy.

Contribution

The paper defines the $\epsilon$-rank as a new metric, proves a negative correlation between the lower bound of the loss and the $\epsilon$-rank, and develops a pre-training strategy to improve training efficiency and accuracy.

Findings

01

The $\epsilon$-rank increases during training and correlates with loss reduction.

02

During training, the loss declines and the $\epsilon$-rank rises in a staircase pattern.

03

Pre-training the initial layer elevates the $\epsilon$-rank and accelerates training.

Abstract

Understanding the training dynamics of deep neural networks (DNNs), particularly how they evolve low-dimensional features from high-dimensional data, remains a central challenge in deep learning theory. In this work, we introduce the concept of $\epsilon$-rank, a novel metric quantifying the effective feature of neuron functions in the terminal hidden layer. Through extensive experiments across diverse tasks, we observe a universal staircase phenomenon: during the training process implemented by standard stochastic gradient descent methods, the decline of the loss function is accompanied by an increase in the $\epsilon$-rank and exhibits a staircase pattern. Theoretically, we rigorously prove a negative correlation between the loss lower bound and the $\epsilon$-rank, demonstrating that a high $\epsilon$-rank is essential for significant loss reduction. Moreover, numerical evidence shows…
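The abstract does not state the formula for the $\epsilon$-rank, so the following is only a minimal sketch under an assumed definition: the number of eigenvalues of the Gram matrix of last-hidden-layer neuron outputs that exceed a threshold $\epsilon$. The function name `epsilon_rank`, the sample-average normalization, and the default threshold are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def epsilon_rank(features, eps=1e-3):
    """Count eigenvalues of the feature Gram matrix that exceed eps.

    ASSUMPTION: this thresholded-eigenvalue definition is a guess at what
    the paper means by epsilon-rank; the abstract gives no formula.

    features: array of shape (n_samples, n_neurons), where column j holds
              the j-th last-hidden-layer neuron evaluated on the samples.
    """
    features = np.asarray(features, dtype=float)
    n_samples = features.shape[0]
    # Empirical Gram matrix: entry (i, j) averages neuron_i * neuron_j
    # over the sample points.
    gram = features.T @ features / n_samples
    # The Gram matrix is symmetric, so eigvalsh applies.
    eigvals = np.linalg.eigvalsh(gram)
    return int(np.sum(eigvals > eps))


# Three neurons, two of which are identical: only two independent features.
demo = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
print(epsilon_rank(demo, eps=1e-3))  # 2
```

Tracking a quantity like this at regular intervals during training, next to the loss, is one way to look for the staircase pattern the abstract describes. The count depends on the choice of `eps` and on how the features are scaled.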

Taxonomy

Topics: Neural Networks and Applications