Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

Arvind Mahankali; Jeff Z. Haochen; Kefan Dong; Margalit Glasgow; Tengyu Ma

arXiv:2306.16361 · cs.LG · October 10, 2023 · 1 citation

TL;DR

Using a mean-field analysis, this paper shows that projected gradient flow on polynomial-width two-layer neural networks, with no unnatural modifications to the optimizer, achieves better sample complexity than kernel methods.

Contribution

It provides a mean-field analysis showing that unmodified gradient descent achieves better sample complexity than kernel methods, with convergence guaranteed in polynomial time.

Findings

01

Projected gradient flow converges in $\mathrm{poly}(d)$ time with $n = O(d^{3.1})$ samples

02

The trained network reaches an error that kernel methods cannot achieve with $n \ll d^{4}$ samples

03

Projected gradient descent converges to low error in polynomially many iterations
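
To give a feel for the size of the gap between the two sample-size exponents in the findings above, the following minimal sketch compares $d^{3.1}$ with $d^{4}$ for a given input dimension. The function name `sample_gap` is hypothetical, and the constants hidden by the $O(\cdot)$ and $\ll$ notation are ignored, so the numbers are only indicative of scaling.

```python
def sample_gap(d: int) -> dict:
    """Compare the two sample-size scalings for input dimension d.

    Assumption: constants hidden by O(.) and << are set to 1, so these
    values illustrate scaling only, not actual sample requirements.
    """
    n_network = d ** 3.1   # scaling sufficient for the trained network
    n_kernel = d ** 4      # scaling below which kernel methods fall short
    return {
        "d": d,
        "n_network": n_network,
        "n_kernel": n_kernel,
        "ratio": n_kernel / n_network,  # equals d ** 0.9 up to rounding
    }


if __name__ == "__main__":
    for d in (10, 100, 1000):
        print(sample_gap(d))
```

The ratio grows as $d^{0.9}$, so the separation widens nearly linearly in the input dimension.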

Abstract

Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from prior works, our analysis does not require unnatural modifications of the optimization algorithm. We prove that with sample size $n = O(d^{3.1})$ where $d$ is the dimension of the inputs, the network trained with projected gradient flow converges in $\mathrm{poly}(d)$ time to a non-trivial error that is not achievable by kernel methods using $n \ll d^{4}$ samples, hence demonstrating a clear separation between unmodified gradient descent and NTK. As a corollary, we show that projected gradient descent…
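
As a rough illustration of what one step of projected gradient descent on a two-layer network can look like, here is a minimal pure-Python sketch. It is not the paper's exact setup: the projection onto the unit sphere, the $1/m$ mean-field output scaling, the squared loss, and the activation passed in by the caller are all assumptions made for illustration, and the names `normalize`, `predict`, and `projected_gd_step` are hypothetical.

```python
import math
import random


def normalize(w):
    """Project a nonzero vector onto the unit sphere (assumed constraint set)."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def predict(weights, x, act):
    """Two-layer network with mean-field scaling: average of act(<w_j, x>)."""
    m = len(weights)
    return sum(act(sum(wi * xi for wi, xi in zip(w, x))) for w in weights) / m


def projected_gd_step(weights, data, lr, act, act_grad):
    """One gradient step on the squared loss, then projection of each neuron.

    Assumption: the per-neuron gradient omits the 1/m factor, which
    corresponds to rescaling time by the width m.
    """
    n = len(data)
    residuals = [predict(weights, x, act) - y for x, y in data]
    new_weights = []
    for w in weights:
        grad = [0.0] * len(w)
        for (x, _), r in zip(data, residuals):
            pre = sum(wi * xi for wi, xi in zip(w, x))
            coeff = r * act_grad(pre) / n
            for k, xk in enumerate(x):
                grad[k] += coeff * xk
        stepped = [wi - lr * gi for wi, gi in zip(w, grad)]
        new_weights.append(normalize(stepped))  # projection step
    return new_weights


if __name__ == "__main__":
    random.seed(0)
    d, m = 3, 4
    weights = [normalize([random.gauss(0, 1) for _ in range(d)]) for _ in range(m)]
    data = [([1.0, 0.0, 0.0], 1.0), ([0.0, 1.0, 0.0], 0.0)]
    # Placeholder activation: a simple quadratic and its derivative.
    weights = projected_gd_step(weights, data, 0.1, lambda z: z * z, lambda z: 2 * z)
    print(weights)
```

The only difference from a plain gradient step is the final `normalize` call, which keeps every neuron in the constraint set after each update.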

Videos

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Model Reduction and Neural Networks

Methods: Neural Tangent Kernel