A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

Behrad Moniri; Donghwan Lee; Hamed Hassani; Edgar Dobriban

arXiv:2310.07891·stat.ML·April 11, 2025·2 cites

Open Access

TL;DR

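This paper shows that, in two-layer neural networks, one gradient step on the first layer with a learning rate that grows with the sample size produces multiple spectral spikes in the feature matrix. Each spike corresponds to a polynomial feature, which enables the network to learn non-linear components of the target and improves training and test performance.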

Contribution

It introduces a theoretical framework showing that a learning rate growing with the sample size induces multiple rank-one spectral spikes, each corresponding to a specific polynomial feature, thus enabling non-linear feature learning.

Findings

01

Multiple spectral spikes correspond to polynomial features.

02

Larger learning rates reduce training and test errors.

03

Non-linear features enhance learning performance.

Abstract

Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer can lead to feature learning, characterized by the appearance of a separated rank-one component (a spike) in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function, and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural…
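The one-step setup described in the abstract can be illustrated numerically. The following is a minimal NumPy sketch under assumptions that are not taken from the paper: Gaussian inputs, a `tanh` activation, a toy target with one linear and one quadratic component, squared loss, and the hypothetical function name `one_step_features`. It takes a single gradient step on the first-layer weights and returns singular values of the gradient and of the feature matrix before and after the step, so that the spectra can be compared for different values of `eta`.

```python
import numpy as np

def one_step_features(n=2000, d=200, N=300, eta=1.0, seed=0):
    """Illustrative sketch; all modelling choices here are assumptions."""
    rng = np.random.default_rng(seed)

    # Gaussian inputs and a toy target with linear + quadratic parts (assumed).
    X = rng.standard_normal((n, d))
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)
    z = X @ beta
    y = z + 0.5 * (z**2 - 1.0) + 0.1 * rng.standard_normal(n)

    # Random initialization: unit-norm first-layer rows, small second layer.
    W0 = rng.standard_normal((N, d))
    W0 /= np.linalg.norm(W0, axis=1, keepdims=True)
    a = rng.standard_normal(N) / np.sqrt(N)

    # Network f(x) = a^T tanh(W x) and residuals at initialization.
    pre = X @ W0.T                     # shape (n, N)
    resid = y - np.tanh(pre) @ a       # shape (n,)

    # Negative gradient of (1/2n) * sum (y - f(x))^2 with respect to W.
    dact = 1.0 - np.tanh(pre) ** 2     # tanh'(pre)
    G = ((resid[:, None] * dact) * a[None, :]).T @ X / n   # shape (N, d)

    # One gradient descent step on the first layer only.
    W1 = W0 + eta * G

    # Feature matrices on a fresh batch, before and after the step.
    X_new = rng.standard_normal((n, d))
    F0 = np.tanh(X_new @ W0.T)
    F1 = np.tanh(X_new @ W1.T)

    def sv(M):
        return np.linalg.svd(M, compute_uv=False)

    return sv(G), sv(F0), sv(F1)

if __name__ == "__main__":
    sG, sF0, sF1 = one_step_features(eta=1.0)
    print("gradient, top singular values:", sG[:3])
    print("features before step, top singular values:", sF0[:3])
    print("features after step, top singular values:", sF1[:3])
```

Rerunning with a larger `eta` (for example a value that scales as a power of `n`) lets one inspect how the leading singular values of the updated feature matrix change. The specific scaling used in the paper is not reproduced here.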

Taxonomy

Topics: Neural Networks and Applications · Face and Expression Recognition · Machine Learning and ELM