Precise gradient descent training dynamics for finite-width multi-layer neural networks
Qiyang Han, Masaaki Imaizumi

TL;DR
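This paper gives a non-asymptotic distributional characterization of gradient descent iterates for finite-width multi-layer neural networks. It captures Gaussian fluctuations in first-layer weights and supports estimating generalization error during training, which existing infinite-width theories do not cover.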
Contribution
It introduces the first finite-width, non-asymptotic state evolution theory for multi-layer neural networks, extending understanding beyond the neural tangent kernel (NTK), mean-field (MF), and tensor program (TP) frameworks.
Findings
Captures Gaussian fluctuations in first-layer weights.
Allows weights to evolve from individual initializations.
Enables estimation of generalization error during training.
Abstract
In this paper, we provide the first precise distributional characterization of gradient descent iterates for general multi-layer neural networks under the canonical single-index regression model, in the "finite-width proportional regime" where the sample size and feature dimension grow proportionally while the network width and depth remain bounded. Our non-asymptotic state evolution theory captures Gaussian fluctuations in first-layer weights and concentration in deeper-layer weights, and remains valid for non-Gaussian features. Our theory differs from existing neural tangent kernel (NTK), mean-field (MF), and tensor program (TP) theories in several key aspects. First, our theory operates in the finite-width regime whereas these existing theories are fundamentally infinite-width. Second, our theory allows weights to evolve from individual initializations beyond the lazy training…
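To make the setting concrete, here is a minimal simulation sketch of the regime the abstract describes: a single-index regression model, a network of small fixed width, and a sample size n proportional to the feature dimension d. It is not the paper's state evolution theory and does not reproduce its estimator. The tanh link function, the tanh activation, the two-layer architecture, the squared loss, the step size, and every function name below are assumptions chosen for illustration, and the "test error" is simply an empirical average over freshly drawn samples.

```python
import math
import random


def draw_samples(n, mu, rng, noise=0.1):
    """Draw n pairs (x, y) with y = tanh(<x, mu>) + Gaussian noise (assumed link)."""
    d = len(mu)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [math.tanh(sum(xj * mj for xj, mj in zip(x, mu))) + noise * rng.gauss(0.0, 1.0)
         for x in X]
    return X, y


def hidden(W, x):
    """First-layer activations tanh(W x) for one input x."""
    return [math.tanh(sum(wj * xj for wj, xj in zip(w, x))) for w in W]


def mse(W, a, X, y):
    """Mean squared error of the two-layer network a^T tanh(W x)."""
    total = 0.0
    for x, yi in zip(X, y):
        pred = sum(ak * hk for ak, hk in zip(a, hidden(W, x)))
        total += (pred - yi) ** 2
    return total / len(y)


def gd_step(W, a, X, y, lr):
    """One full-batch gradient descent step on the mean squared error."""
    n, width, d = len(y), len(a), len(W[0])
    grad_W = [[0.0] * d for _ in range(width)]
    grad_a = [0.0] * width
    for x, yi in zip(X, y):
        h = hidden(W, x)
        resid = sum(ak * hk for ak, hk in zip(a, h)) - yi
        for k in range(width):
            grad_a[k] += 2.0 * resid * h[k] / n
            # Chain rule through tanh: d tanh(u)/du = 1 - tanh(u)^2.
            coef = 2.0 * resid * a[k] * (1.0 - h[k] ** 2) / n
            row = grad_W[k]
            for j in range(d):
                row[j] += coef * x[j]
    new_W = [[wj - lr * gj for wj, gj in zip(w, g)] for w, g in zip(W, grad_W)]
    new_a = [ak - lr * gk for ak, gk in zip(a, grad_a)]
    return new_W, new_a


def run_demo(n=120, d=60, width=3, steps=25, lr=0.05, seed=0):
    """Return [(train_mse, test_mse), ...] before training and after each step."""
    rng = random.Random(seed)
    # Hidden index direction mu, scaled so that <x, mu> has order-one variance.
    mu = [rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)]
    X_train, y_train = draw_samples(n, mu, rng)
    X_test, y_test = draw_samples(n, mu, rng)
    # Random individual initialization; width stays fixed while n and d scale together.
    W = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(d)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) / math.sqrt(width) for _ in range(width)]
    history = [(mse(W, a, X_train, y_train), mse(W, a, X_test, y_test))]
    for _ in range(steps):
        W, a = gd_step(W, a, X_train, y_train, lr)
        history.append((mse(W, a, X_train, y_train), mse(W, a, X_test, y_test)))
    return history
```

Calling run_demo() returns the training and held-out error at each iteration, so the two curves can be compared along the gradient descent trajectory. Scaling n and d together (for example n=240, d=120) while keeping width fixed keeps the simulation in the proportional regime, with n/d held constant.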
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Model Reduction and Neural Networks
Methods: Neural Tangent Kernel · Early Stopping
