Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

Yuanzhi Li; Tengyu Ma; Hongyang R. Zhang

arXiv:2007.04596 · cs.LG · July 10, 2020 · 6 cites

TL;DR

This paper shows that an over-parametrized two-layer ReLU neural network trained with gradient descent provably learns targets of the form $f^{\star}(x) = a^{\top} |W^{\star} x|$ to population loss $o(1/d)$, a level that no kernel method, including the Neural Tangent Kernel (NTK), can reach with polynomially many samples.

Contribution

It gives a theoretical proof that an over-parametrized neural network can learn beyond the NTK regime in polynomial time with polynomially many samples, together with a matching lower bound for kernel methods.

Findings

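01. An over-parametrized two-layer ReLU network trained by gradient descent reaches population loss at most $o(1/d)$.

02. Any kernel method, including the Neural Tangent Kernel, with a polynomial number of samples in $d$ has population loss at least $\Omega(1/d)$.

03. Gradient descent from random initialization learns the target function in polynomial time with polynomially many samples.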

Abstract

We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x \in \mathbb{R}^{d}$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top} |W^{\star} x|$, where $a \in \mathbb{R}^{d}$ is a nonnegative vector and $W^{\star} \in \mathbb{R}^{d \times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomial samples. On the other hand, we prove that any kernel method, including Neural Tangent Kernel, with a polynomial number of samples in $d$, has population loss at least $\Omega(1/d)$.
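To make the setting concrete, the following is a minimal NumPy sketch of the data model and of plain gradient descent on an over-parametrized two-layer ReLU network. It is an illustration only, not the paper's algorithm or analysis: the normalization of $a$, the student parameterization $\sum_j c_j \,\mathrm{ReLU}(w_j^{\top} x)$, the width `m`, the step size, the initialization scale, and the use of an empirical (rather than population) squared loss are all assumptions chosen for the example. Since $|z| = \mathrm{ReLU}(z) + \mathrm{ReLU}(-z)$, a ReLU network of width $2d$ can already express the target exactly, so any larger width is over-parametrized.

```python
import numpy as np


def make_target(d, rng):
    """Draw a target f*(x) = a^T |W* x| with orthonormal W* and nonnegative a."""
    # Orthonormal W*: the Q factor of a Gaussian matrix.
    w_star, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Nonnegative a; scaling the entries to sum to 1 is an assumption of this sketch.
    a = np.abs(rng.standard_normal(d))
    a /= a.sum()
    return a, w_star


def target_fn(x, a, w_star):
    """Evaluate f*(x) = a^T |W* x| on a batch x of shape (n, d)."""
    return np.abs(x @ w_star.T) @ a


def train(x, y, m, steps, lr, rng):
    """Full-batch gradient descent on 0.5 * mean squared error.

    Student (hypothetical parameterization): f(x) = sum_j c_j * relu(w_j . x),
    with both layers trained from random initialization.
    """
    n, d = x.shape
    w = rng.standard_normal((m, d)) / np.sqrt(d)  # first layer, rows have norm ~1
    c = rng.standard_normal(m) / m                # small second layer
    losses = []
    for _ in range(steps):
        pre = x @ w.T                    # pre-activations, shape (n, m)
        act = np.maximum(pre, 0.0)       # ReLU
        err = act @ c - y                # residuals, shape (n,)
        losses.append(0.5 * np.mean(err ** 2))
        grad_c = act.T @ err / n                              # shape (m,)
        grad_w = ((err[:, None] * (pre > 0)) * c).T @ x / n   # shape (m, d)
        c -= lr * grad_c
        w -= lr * grad_w
    return w, c, losses


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, n = 5, 50, 500
    a, w_star = make_target(d, rng)
    x = rng.standard_normal((n, d))      # Gaussian inputs
    y = target_fn(x, a, w_star)
    _, _, losses = train(x, y, m=m, steps=300, lr=0.02, rng=rng)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

The sketch only shows the objects involved; it does not reproduce the $o(1/d)$ guarantee or the $\Omega(1/d)$ kernel lower bound, both of which concern population loss as $d$ grows.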

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
