Stochastic Gradient Descent for Two-layer Neural Networks
Dinghao Cao, Zheng-Chu Guo, Lei Shi

TL;DR
This paper analyzes the convergence rates of stochastic gradient descent on overparameterized two-layer neural networks, combining NTK approximation with RKHS analysis to provide new theoretical insights and relax network size constraints.
Contribution
The paper introduces a framework that combines the NTK approximation with convergence analysis in the RKHS generated by the NTK to study SGD, and it relaxes the requirement on the number of neurons from exponential to polynomial dependence.
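To make the NTK object concrete, here is a minimal NumPy sketch of the empirical NTK Gram matrix of a two-layer ReLU network and its infinite-width limit. The parametrization is an assumption for illustration and is not taken from the paper: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x), with fixed signs a_r in {-1, +1}, Gaussian first-layer weights, and only the first layer trained. The function names `empirical_ntk` and `limiting_ntk` are hypothetical.

```python
import numpy as np

def empirical_ntk(X, W):
    """Empirical NTK Gram matrix under the assumed parametrization
    f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x), a_r in {-1, +1} fixed,
    where only the first-layer weights W (shape (m, d)) are trained."""
    m = W.shape[0]
    active = (X @ W.T > 0).astype(float)       # (n, m) ReLU activation patterns
    # <grad_W f(x), grad_W f(x')> = <x, x'> * (1/m) * #{r : both units active}
    return (X @ X.T) * (active @ active.T) / m

def limiting_ntk(X):
    """Infinite-width limit for w_r ~ N(0, I):
    K(x, x') = <x, x'> * (pi - theta) / (2 * pi), theta = angle between x and x'."""
    norms = np.linalg.norm(X, axis=1)
    cos = np.clip((X @ X.T) / np.outer(norms, norms), -1.0, 1.0)
    return (X @ X.T) * (np.pi - np.arccos(cos)) / (2 * np.pi)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs (assumption)
W = rng.standard_normal((20000, 3))            # wide hidden layer

K_emp = empirical_ntk(X, W)
K_inf = limiting_ntk(X)
max_gap = np.abs(K_emp - K_inf).max()          # shrinks as the width m grows
```

The Gram matrix of the limiting kernel is what defines the RKHS in this toy setting; the gap between `K_emp` and `K_inf` is the kind of approximation error that a width requirement is meant to control.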
Findings
Established sharp convergence rates for the last iterate of SGD
Reduced the required number of neurons from exponential to polynomial dependence
Enhanced understanding of optimization dynamics in overparameterized networks
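The "last iterate" in the findings above refers to the final weights produced by SGD, as opposed to an average over iterates. The sketch below illustrates that notion with one-pass SGD on the squared loss for a two-layer ReLU network. Everything specific here is an assumption for illustration rather than the paper's setup: the 1/sqrt(m) scaling, the fixed random output signs, the step-size schedule eta_t = eta0 * t^(-decay), the synthetic data, and the helper names.

```python
import numpy as np

def init_two_layer(d, m, rng):
    """Gaussian first layer, fixed random signs in the output layer (assumed setup)."""
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a

def predict(X, W, a):
    """f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) for each row x of X."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])

def sgd_last_iterate(X, y, W, a, eta0=0.5, decay=0.5):
    """One pass of SGD over (X, y), one sample per step; returns the final W."""
    W = W.copy()
    m = W.shape[0]
    for t, (x_t, y_t) in enumerate(zip(X, y), start=1):
        pre = W @ x_t                                   # pre-activations, shape (m,)
        residual = a @ np.maximum(pre, 0.0) / np.sqrt(m) - y_t
        # Gradient of 0.5 * residual**2 with respect to W.
        grad = residual * (a * (pre > 0))[:, None] * x_t[None, :] / np.sqrt(m)
        W -= eta0 * t ** (-decay) * grad                # polynomially decaying step
    return W

rng = np.random.default_rng(0)

def sample(n):
    """Synthetic data: unit-norm inputs in R^3, noise-free linear target."""
    X = rng.standard_normal((n, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, X[:, 0]

X_train, y_train = sample(2000)
X_test, y_test = sample(500)
W0, a = init_two_layer(3, 256, rng)
W_last = sgd_last_iterate(X_train, y_train, W0, a)

mse_init = np.mean((predict(X_test, W0, a) - y_test) ** 2)
mse_last = np.mean((predict(X_test, W_last, a) - y_test) ** 2)
print(f"test MSE at initialization: {mse_init:.4f}, at last iterate: {mse_last:.4f}")
```

Comparing `mse_last` with `mse_init` on held-out data gives a rough empirical view of how the last iterate behaves; it does not reproduce the rates established in the paper.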
Abstract
This paper presents a comprehensive study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks. Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by NTK, aiming to provide a deep understanding of the convergence behavior of SGD in overparameterized two-layer neural networks. Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the optimization dynamics and convergence properties of neural networks. In this study, we establish sharp convergence rates for the last iterate of the SGD algorithm in overparameterized two-layer neural networks. Additionally, we have made significant advancements in relaxing the constraints on the…
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
Methods: Stochastic Gradient Descent · Neural Tangent Kernel
