Some Theoretical Results on Layerwise Effective Dimension Oscillations in Finite Width ReLU Networks
Darshan Makwana

TL;DR
This paper provides a detailed theoretical analysis of how the effective dimension, measured by rank, oscillates across layers in finite-width ReLU networks with Gaussian weights, revealing geometric decay and revival patterns.
Contribution
It derives explicit formulas and bounds for the expected rank of layer activations, highlighting the finite-width effects and oscillatory behavior absent in infinite-width models.
Findings
The expected rank deficit decays geometrically with depth
Rank revival peaks occur at specific depths related to network parameters
Oscillatory rank behavior is specific to finite-width networks
Abstract
We analyze the layerwise effective dimension (rank of the feature matrix) in fully-connected ReLU networks of finite width. Specifically, for a fixed batch of inputs and random Gaussian weights, we derive closed-form expressions for the expected rank of the $m\times n$ hidden activation matrices. Our main result shows that so that the rank deficit decays geometrically with ratio . We also prove a sub-Gaussian concentration bound, and identify the "revival" depths at which the expected rank attains local maxima. In particular, these peaks occur at depths with height . We further show that this oscillatory rank behavior is a finite-width phenomenon: under orthogonal weight initialization or strong negative-slope…
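The setting in the abstract (a fixed input batch, random Gaussian weights, and the rank of each hidden activation matrix) can be probed numerically. The sketch below is a minimal Monte Carlo illustration, not the paper's method. The function names `layerwise_ranks` and `mean_rank_by_layer`, the He-style weight scale, the equal layer widths, the Gaussian input batch, and the convention that rows index inputs and columns index hidden units are all assumptions made for illustration.

```python
import numpy as np


def layerwise_ranks(batch=32, width=32, depth=20, seed=0):
    """Return the numerical rank of the activation matrix after each ReLU layer.

    Assumptions (not taken from the paper): the input batch is standard
    Gaussian, every layer has the same width, and weights are i.i.d.
    Gaussian with variance 2 / width (He-style scaling).
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((batch, width))  # fixed batch: one row per input
    ranks = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        h = np.maximum(h @ w, 0.0)  # ReLU activation
        # matrix_rank counts singular values above a default tolerance
        ranks.append(int(np.linalg.matrix_rank(h)))
    return ranks


def mean_rank_by_layer(trials=50, **kwargs):
    """Average the per-layer rank over independent weight draws (one seed per trial)."""
    runs = [layerwise_ranks(seed=s, **kwargs) for s in range(trials)]
    return np.mean(runs, axis=0)
```

Calling `mean_rank_by_layer(trials=50, batch=32, width=32, depth=20)` gives an empirical estimate of the expected rank per layer, which can be compared against the closed-form expressions in the paper. The numerical rank depends on the tolerance used by `np.linalg.matrix_rank`, so the estimate is only a rough proxy for exact rank.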
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Complex Network Analysis Techniques · Interconnection Networks and Systems
