Feature Learning Dynamics in Infinite-Depth Neural Networks
Zihan Yao, Ruoyu Wu, Tianxiang Gao

TL;DR
This paper rigorously analyzes the feature learning dynamics in infinite-depth ResNets, revealing how weight reuse affects training and proposing a simplified Neural Feature Dynamics model.
Contribution
It introduces a novel analysis of weight reuse effects in deep ResNets and derives a Neural Feature Dynamics model capturing feature-gradient interactions.
Findings
Coupling effects at initialization vanish at rate O(n^{-1})
SGD induces nontrivial correlations that survive the infinite-width limit
Depth scaling suppresses coupling effects, enabling a simplified feature dynamics model
Abstract
Deep neural networks have achieved remarkable success in practice, yet a mechanistic understanding of how features evolve during training remains incomplete, especially in the large-depth limit. For ResNets under depth-μP scaling, prior work treats the layer index as a continuous time, yielding SDE descriptions of the training dynamics. A key unresolved issue is that backpropagation reuses each forward weight matrix through its transpose, creating correlations between forward features and backward gradients whose behavior and role in feature learning remain unclear. We study this reused-weight forward-backward coupling in one-layer ResNets under depth-μP. Using conditional Gaussian representations, we explicitly separate the coupling terms induced by weight reuse from decoupled Gaussian fluctuations before taking any network…
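The weight reuse described in the abstract can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the paper's construction: the 1/sqrt(L) branch multiplier, the tanh activation, the Gaussian initialization with variance 1/n, and the linear readout `v` are all choices made here for the example. The point it shows is that the backward recursion multiplies by the transpose of the same matrix `W` used in the forward recursion, which is the source of the forward-backward coupling.

```python
import numpy as np

def forward(Ws, x, v):
    """Residual forward pass: h_{l+1} = h_l + tanh(W_l h_l) / sqrt(L).

    The 1/sqrt(L) multiplier and the readout v are assumptions of this sketch.
    """
    L = len(Ws)
    hs = [x]
    for W in Ws:
        hs.append(hs[-1] + np.tanh(W @ hs[-1]) / np.sqrt(L))
    return hs, float(v @ hs[-1])

def backward(Ws, hs, v):
    """Backward pass: g_l = g_{l+1} + W_l^T (tanh'(W_l h_l) * g_{l+1}) / sqrt(L).

    Each step reuses the forward matrix W_l through its transpose.
    """
    L = len(Ws)
    g = v.copy()
    gs = [g]
    for W, h in zip(reversed(Ws), reversed(hs[:-1])):
        d = 1.0 - np.tanh(W @ h) ** 2      # derivative of tanh at the pre-activation
        g = g + W.T @ (d * g) / np.sqrt(L)  # same W as in the forward pass
        gs.append(g)
    return gs[::-1]  # gs[l] is the gradient of the output with respect to h_l

# Hypothetical sizes and initialization, chosen only for illustration.
rng = np.random.default_rng(0)
n, L = 8, 16
Ws = [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(L)]
x = rng.standard_normal(n)
v = rng.standard_normal(n) / np.sqrt(n)

hs, out = forward(Ws, x, v)
gs = backward(Ws, hs, v)
```

Here `hs[l]` plays the role of the forward feature at layer `l` and `gs[l]` the backward gradient at the same layer. Both depend on the same random matrices, so they are not independent even at initialization.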
