How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

Mo Zhou; Rong Ge

arXiv:2406.01766·cs.LG·November 6, 2024

TL;DR

This paper provides a local convergence analysis of gradient descent in regularized two-layer neural networks, showing how features are learned both early and late in training, beyond the neural tangent kernel regime.

Contribution

It introduces a local convergence framework demonstrating feature learning at different training stages in regularized neural networks, extending beyond NTK limitations.

Findings

01 Once the loss falls below a certain threshold, gradient descent on a carefully regularized objective captures the ground-truth directions

02 Feature learning occurs both early and late in training

03 Regularization facilitates feature learning in neural networks

Abstract

The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work has highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent, through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. We further strengthen this local convergence analysis by incorporating early-stage feature learning analysis. Our results demonstrate that feature learning not…
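To make the setting concrete, the following is a minimal sketch of gradient descent on a regularized two-layer ReLU network, together with a simple check of how well the learned neurons align with ground-truth directions. It is an illustration under stated assumptions, not the paper's method: plain L2 weight decay stands in for the "carefully regularized objective" (whose exact form is not given on this page), and the names `train_two_layer` and `alignment`, the teacher network, and all hyperparameters are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def train_two_layer(X, y, m=32, lam=1e-3, lr=0.05, steps=300, seed=0):
    """Full-batch gradient descent on a two-layer ReLU network.

    Objective (an assumption: plain L2 weight decay as the regularizer):
        0.5 * mean((f(x) - y)^2) + 0.5 * lam * (||a||^2 + ||W||_F^2)
    where f(x) = sum_i a_i * relu(w_i . x).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(m, d)) / np.sqrt(d)   # first-layer weights, one row per neuron
    a = rng.normal(size=m) / np.sqrt(m)        # second-layer weights
    losses = []
    for _ in range(steps):
        H = relu(X @ W.T)                      # hidden activations, shape (n, m)
        resid = H @ a - y                      # prediction error, shape (n,)
        loss = 0.5 * np.mean(resid ** 2) + 0.5 * lam * (np.sum(a ** 2) + np.sum(W ** 2))
        losses.append(loss)
        # Gradients of the regularized objective with respect to a and W.
        grad_a = H.T @ resid / n + lam * a
        G = resid[:, None] * (H > 0) * a[None, :]   # shape (n, m)
        grad_W = G.T @ X / n + lam * W
        a = a - lr * grad_a
        W = W - lr * grad_W
    return W, a, losses


def alignment(W, W_star):
    """For each ground-truth direction, the best cosine similarity over learned neurons."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    Sn = W_star / np.linalg.norm(W_star, axis=1, keepdims=True)
    return (Sn @ Wn.T).max(axis=1)


if __name__ == "__main__":
    # Hypothetical teacher: two orthonormal ground-truth directions in 10 dimensions.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 10))
    W_star = np.eye(10)[:2]
    y = relu(X @ W_star.T).sum(axis=1)

    W, a, losses = train_two_layer(X, y)
    print("initial loss:", losses[0], "final loss:", losses[-1])
    print("best cosine alignment per teacher direction:", alignment(W, W_star))
```

Tracking the alignment score alongside the loss is one simple way to observe whether learned neurons move toward the ground-truth directions as training progresses; the sketch makes no claim about the threshold or rates established in the paper.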

Videos

How does Gradient Descent Learn Features --- A Local Analysis for Regularized Two-Layer Neural Networks · SlidesLive

Taxonomy

Topics: Neural Networks and Applications

Methods: Neural Tangent Kernel