Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

Atsushi Nitanda; Geoffrey Chinot; Taiji Suzuki

arXiv:1905.09870 · stat.ML · March 19, 2020 · 22 cites

Open Access

TL;DR

This paper demonstrates that gradient descent can effectively learn less over-parameterized two-layer neural networks for classification tasks using logistic loss, with improved convergence and generalization bounds based on a neural tangent model.

Contribution

It introduces a refined convergence analysis for two-layer networks with smooth activations, emphasizing the separability assumption over the positivity of the neural tangent kernel, and shows better bounds for less over-parameterized networks.

Findings

01. Gradient descent converges under the separability assumption.

02. Better dependence on network width compared to prior work.

03. Provides generalization guarantees for smaller networks.
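
The setting summarized above (full-batch gradient descent on a two-layer network with a smooth activation and the logistic loss) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `tanh` as the smooth activation, the `1/sqrt(m)` output scaling, fixed random output signs, the step size, and the toy data. The function names `toy_data` and `train_two_layer` are hypothetical.

```python
import numpy as np


def toy_data(n=100, seed=1):
    """Two well-separated Gaussian blobs with labels in {-1, +1} (illustrative only)."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    X = 0.3 * rng.standard_normal((n, 2)) + 1.5 * y[:, None]
    return X, y


def train_two_layer(X, y, m=64, lr=0.2, steps=300, seed=0):
    """Full-batch gradient descent on the hidden layer of a two-layer tanh network.

    Model (an assumption, not the paper's exact parameterization):
        f(x) = (1 / sqrt(m)) * sum_r a_r * tanh(w_r . x)
    Only the hidden weights W are trained; the output signs a are fixed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((m, d))      # hidden-layer weights (trained)
    a = rng.choice([-1.0, 1.0], size=m)  # output-layer signs (fixed)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W.T)             # hidden activations, shape (n, m)
        f = H @ a / np.sqrt(m)           # network outputs, shape (n,)
        margins = y * f
        # Logistic loss: mean of log(1 + exp(-y * f)), computed stably.
        losses.append(np.mean(np.logaddexp(0.0, -margins)))
        # dloss/df = -y * sigmoid(-margin); sigmoid(-z) = 0.5 * (1 - tanh(z / 2)).
        g = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
        # Chain rule through tanh: df/dw_r = a_r / sqrt(m) * (1 - H_r^2) * x.
        G = ((g[:, None] * (1.0 - H ** 2)) * a[None, :]).T @ X / (n * np.sqrt(m))
        W -= lr * G
    return W, a, losses


if __name__ == "__main__":
    X, y = toy_data()
    W, a, losses = train_two_layer(X, y)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

The width `m` is left as a parameter so that the effect of using fewer hidden units can be explored empirically; the sketch makes no claim about the width the paper's bounds require.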

Abstract

Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. Most studies especially focused on the regression problems with the squared loss function, except for a few, and the importance of the positivity of the neural tangent kernel has been pointed out. On the other hand, the performance of gradient descent on classification problems using the logistic loss function has not been well studied, and further investigation of this problem structure is possible. In this work, we demonstrate that the separability assumption using a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel and provide a refined convergence analysis of the gradient descent for two-layer networks with smooth activations. A remarkable point of our result is that our convergence and…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM