Neural Networks Efficiently Learn Low-Dimensional Representations with SGD

Alireza Mousavi-Hosseini; Sejun Park; Manuela Girotti; Ioannis Mitliagkas; Murat A. Erdogdu

arXiv:2209.14863 · stat.ML · March 17, 2023 · 6 cites

TL;DR

This paper proves that two-layer neural networks trained with online SGD and weight decay learn the low-dimensional structure of the target, leading to improved generalization bounds and sample efficiency, especially when the target depends on the input only through a low-dimensional subspace.

Contribution

It establishes that SGD-trained neural networks recover the principal subspace of the true model and provides new generalization bounds and sample complexity results for low-dimensional learning.

Findings

01

First-layer weights converge to the $k$-dimensional principal subspace spanned by the vectors $u_{1}, ..., u_{k}$ of the true model.

02

Generalization error bound of O(√(kd/T)) independent of network width.

03

SGD-trained NNs can efficiently learn single-index models with sample complexity linear in the input dimension $d$.
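
As a rough illustration of how the rate in finding 02 scales, the arithmetic below plugs in hypothetical values $k = 2$, $d = 100$, $T = 10^{6}$ (chosen only for this example, not taken from the paper) and ignores the constants hidden by the $O(\cdot)$ notation:

$$
\sqrt{\frac{kd}{T}} = \sqrt{\frac{2 \cdot 100}{10^{6}}} = \sqrt{2 \times 10^{-4}} \approx 0.014,
\qquad
\sqrt{\frac{kd}{4T}} = \frac{1}{2} \sqrt{\frac{kd}{T}} \approx 0.007 .
$$

Under these assumptions, halving the rate takes four times as many SGD iterations, and the network width does not enter the expression at all.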

Abstract

We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $x \in \mathbb{R}^{d}$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y = g(\langle u_{1}, x \rangle, \ldots, \langle u_{k}, x \rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $u_{1}, \ldots, u_{k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $O(\sqrt{kd/T})$ after $T$ iterations of SGD, which is independent of the width of the…
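
The setting described in the abstract can be explored numerically with a minimal sketch like the one below. It is not the authors' code or experimental setup. It assumes a ReLU activation without biases, a squared loss, a second layer frozen at random signs, one fresh Gaussian sample per step, and a hypothetical link function `z[0] + tanh(2 z[1])` with additive noise. The helper names (`subspace_alignment`, `run_sketch`) and all hyperparameter values are illustrative choices. The sketch tracks the fraction of the first-layer weights' squared norm that lies in the span of $u_{1}, \ldots, u_{k}$.

```python
import numpy as np


def subspace_alignment(W, U):
    """Fraction of ||W||_F^2 lying in span(U); U must have orthonormal columns."""
    return float(np.sum((W @ U) ** 2) / np.sum(W ** 2))


def run_sketch(d=20, k=2, width=32, steps=4000, lr=0.02, weight_decay=0.1, seed=0):
    """Online SGD with weight decay on the first layer of a two-layer ReLU net.

    Everything here (link function, step size, frozen second layer) is an
    assumption made for illustration. The link uses two indices, so k >= 2.
    """
    rng = np.random.default_rng(seed)

    # Orthonormal directions u_1, ..., u_k stored as the columns of U (d x k).
    U, _ = np.linalg.qr(rng.standard_normal((d, k)))

    # First-layer weights (trained) and second-layer weights (kept fixed).
    W = rng.standard_normal((width, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

    before = subspace_alignment(W, U)
    for _ in range(steps):
        # Online SGD: draw one fresh Gaussian input per iteration.
        x = rng.standard_normal(d)
        z = U.T @ x
        # Hypothetical noisy multi-index target.
        y = z[0] + np.tanh(2.0 * z[1]) + 0.1 * rng.standard_normal()

        pre = W @ x
        residual = a @ np.maximum(pre, 0.0) - y
        # Gradient of 0.5 * residual^2 with respect to W.
        grad_W = residual * np.outer(a * (pre > 0), x)
        W -= lr * (grad_W + weight_decay * W)

    return before, subspace_alignment(W, U)


if __name__ == "__main__":
    before, after = run_sketch()
    print(f"alignment before training: {before:.3f}")
    print(f"alignment after training:  {after:.3f}")
```

At a random initialization the alignment is roughly $k/d$. If the weights concentrate on the principal subspace, as the abstract states for its setting, the value should move toward 1. How close it gets in this toy version depends on the assumed step size, weight decay, and number of steps.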

Videos

Neural Networks Efficiently Learn Low-Dimensional Representations with SGD · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM

Methods: Weight Decay · Stochastic Gradient Descent