Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems
Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki

TL;DR
This paper demonstrates that gradient descent can effectively learn less over-parameterized two-layer neural networks for classification tasks using logistic loss, with improved convergence and generalization bounds based on a neural tangent model.
Contribution
It introduces a refined convergence analysis for two-layer networks with smooth activations, emphasizing the separability assumption over the positivity of the neural tangent kernel, and shows better bounds for less over-parameterized networks.
Findings
Gradient descent converges under the separability assumption.
The convergence and generalization bounds depend more favorably on network width than those in prior work.
Provides generalization guarantees for smaller networks.
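The setting behind these findings can be sketched in a few lines: full-batch gradient descent on the logistic loss for a narrow two-layer network with a smooth activation. This is only an illustrative toy and not the authors' experimental setup. The tanh activation, the 1/sqrt(m) output scaling, the fixed ±1 output weights, the width m=16, the step size, and the synthetic separable data are all assumptions chosen for the example, and the helper names (make_data, forward, logistic_loss, gradient, train) are hypothetical.

```python
import numpy as np

def make_data(n=40, d=2, seed=0):
    """Toy separable data (assumption): unit-norm inputs, labels from a linear rule."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = np.where(X[:, 0] >= 0, 1.0, -1.0)
    return X, y

def forward(W, a, X):
    """Two-layer network f(x) = (1/sqrt(m)) * sum_r a_r * tanh(w_r . x)."""
    m = W.shape[0]
    return np.tanh(X @ W.T) @ a / np.sqrt(m)

def logistic_loss(W, a, X, y):
    """Mean of log(1 + exp(-y * f(x))), computed stably with logaddexp."""
    return np.mean(np.logaddexp(0.0, -y * forward(W, a, X)))

def gradient(W, a, X, y):
    """Gradient of the logistic loss with respect to the hidden-layer weights W."""
    n, m = X.shape[0], W.shape[0]
    H = np.tanh(X @ W.T)                      # hidden activations, shape (n, m)
    margins = y * (H @ a) / np.sqrt(m)
    coef = -y / (1.0 + np.exp(margins))       # derivative of the loss w.r.t. f(x_i)
    # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    M = coef[:, None] * (1.0 - H**2) * a[None, :] / np.sqrt(m)
    return M.T @ X / n                        # shape (m, d)

def train(m=16, steps=500, lr=1.0, seed=0):
    """Plain gradient descent on W only; the output weights a stay fixed."""
    X, y = make_data(seed=seed)
    rng = np.random.default_rng(seed + 1)
    W = rng.normal(size=(m, X.shape[1]))
    a = rng.choice([-1.0, 1.0], size=m)
    losses = [logistic_loss(W, a, X, y)]
    for _ in range(steps):
        W -= lr * gradient(W, a, X, y)
        losses.append(logistic_loss(W, a, X, y))
    return losses, W, a, X, y

if __name__ == "__main__":
    losses, W, a, X, y = train()
    acc = np.mean(np.sign(forward(W, a, X)) == y)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}, train acc {acc:.2f}")
```

Varying m in train gives a rough feel for how width affects optimization on separable data, though a toy run like this says nothing about the paper's actual bounds.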
Abstract
Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. Most studies especially focused on the regression problems with the squared loss function, except for a few, and the importance of the positivity of the neural tangent kernel has been pointed out. On the other hand, the performance of gradient descent on classification problems using the logistic loss function has not been well studied, and further investigation of this problem structure is possible. In this work, we demonstrate that the separability assumption using a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel and provide a refined convergence analysis of the gradient descent for two-layer networks with smooth activations. A remarkable point of our result is that our convergence and…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
