Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini, Sejun Park, Manuela Girotti, Ioannis Mitliagkas, Murat A. Erdogdu

TL;DR
This paper proves that two-layer neural networks trained with SGD learn low-dimensional structure in the data, which yields improved generalization bounds and sample efficiency when the target depends on the input only through a low-dimensional subspace.
Contribution
It establishes that SGD-trained neural networks recover the principal subspace of the true model and provides new generalization bounds and sample complexity results for low-dimensional learning.
Findings
Weights converge to the true subspace spanned by the underlying features.
Generalization error bound of O(√(kd/T)) after T iterations of SGD, independent of network width.
SGD-trained NNs can efficiently learn single-index models with sample complexity linear in the input dimension d.
Abstract
We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input x ∈ ℝ^d is Gaussian and the target y ∈ ℝ follows a multiple-index model, i.e., y = g(⟨u_1, x⟩, ..., ⟨u_k, x⟩) with a noisy link function g. We prove that the first-layer weights of the NN converge to the k-dimensional principal subspace spanned by the vectors u_1, ..., u_k of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when k ≪ d. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of O(√(kd/T)) after T iterations of SGD, which is independent of the width of the…
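The setting in the abstract can be illustrated with a small NumPy sketch. It is not the paper's experiment or code: the link function `g`, the hyperparameters, the use of fresh mini-batches instead of single samples, and the frozen second layer `a` are all assumptions chosen to keep the example short. The helper names `subspace_alignment` and `train_first_layer` are hypothetical. The sketch trains only the first-layer weights of a two-layer ReLU network with online SGD and weight decay on Gaussian inputs, and reports what fraction of the weight norm lies in the span of the index vectors before and after training.

```python
import numpy as np


def subspace_alignment(W, U):
    """Fraction of the squared Frobenius norm of W (rows = neurons)
    that lies in the span of the orthonormal columns of U."""
    return float(np.sum((W @ U) ** 2) / np.sum(W ** 2))


def train_first_layer(d=10, k=2, m=16, steps=3000, batch=32,
                      lr=0.05, weight_decay=0.1, noise=0.1, seed=0):
    """Online SGD with weight decay on the first layer of a two-layer
    ReLU network. All default values are illustrative assumptions."""
    rng = np.random.default_rng(seed)

    # Orthonormal index vectors u_1, ..., u_k as the columns of U (d x k).
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))

    # Hypothetical link function (an assumption, not taken from the paper).
    def g(Z):
        return Z[:, 0] + np.tanh(Z[:, 1:]).sum(axis=1)

    # First-layer weights W (m x d); second layer a is fixed for simplicity.
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / np.sqrt(m)

    init_align = subspace_alignment(W, U)
    for _ in range(steps):
        # Fresh Gaussian samples at every step (online SGD).
        X = rng.standard_normal((batch, d))
        y = g(X @ U) + noise * rng.standard_normal(batch)

        pre = X @ W.T                          # pre-activations, (batch, m)
        resid = np.maximum(pre, 0.0) @ a - y   # prediction error, (batch,)

        # Gradient of 0.5 * mean(resid**2) with respect to W, shape (m, d).
        grad = ((resid[:, None] * (pre > 0)) * a).T @ X / batch

        # SGD step with weight decay on the first layer.
        W -= lr * (grad + weight_decay * W)

    return init_align, subspace_alignment(W, U)


if __name__ == "__main__":
    before, after = train_first_layer()
    print(f"alignment before: {before:.3f}, after: {after:.3f}")
```

Under these assumptions, the second returned value should be noticeably larger than the first. Weight decay shrinks the component of W orthogonal to span(u_1, ..., u_k), while the gradient signal keeps replenishing the component inside it.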
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Weight Decay · Stochastic Gradient Descent
