Stochastic Gradient Descent for Two-layer Neural Networks

Dinghao Cao; Zheng-Chu Guo; Lei Shi

arXiv:2407.07670·stat.ML·July 11, 2024·1 cites

Open Access

TL;DR

This paper analyzes the convergence rates of stochastic gradient descent on overparameterized two-layer neural networks, combining NTK approximation with RKHS analysis to provide new theoretical insights and relax network size constraints.

Contribution

The paper introduces a framework that combines NTK approximation with RKHS analysis to study SGD convergence, relaxing the constraint on the number of neurons from exponential to polynomial dependence.

Findings

01 Established sharp convergence rates for the last iterate of SGD

02 Reduced the constraint on the number of neurons from exponential to polynomial dependence

03 Enhanced understanding of optimization dynamics in overparameterized networks

Abstract

This paper presents a comprehensive study on the convergence rates of the stochastic gradient descent (SGD) algorithm when applied to overparameterized two-layer neural networks. Our approach combines the Neural Tangent Kernel (NTK) approximation with convergence analysis in the Reproducing Kernel Hilbert Space (RKHS) generated by NTK, aiming to provide a deep understanding of the convergence behavior of SGD in overparameterized two-layer neural networks. Our research framework enables us to explore the intricate interplay between kernel methods and optimization processes, shedding light on the optimization dynamics and convergence properties of neural networks. In this study, we establish sharp convergence rates for the last iterate of the SGD algorithm in overparameterized two-layer neural networks. Additionally, we have made significant advancements in relaxing the constraints on the…
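To make the setting in the abstract concrete, here is a minimal NumPy sketch of one-pass SGD on a two-layer network, together with the empirical NTK of that network. It is an illustration under stated assumptions, not the paper's construction: the ReLU activation, the fixed ±1 outer weights, the 1/sqrt(m) scaling, the square loss, and the polynomially decaying step size `eta0 * t**(-theta)` are all choices made for this example, and the function names are hypothetical.

```python
import numpy as np


def init_params(m, d, rng):
    """Random initialization (assumed): Gaussian inner weights, fixed +/-1 outer weights."""
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a


def predict(W, a, X):
    """Two-layer network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x), for rows of X."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)


def empirical_ntk(W, X1, X2):
    """Empirical NTK with respect to the inner weights W only.

    The gradient of f in w_r is (a_r / sqrt(m)) * 1[w_r . x > 0] * x, and a_r**2 = 1,
    so K(x, x') = (x . x') * (fraction of neurons active on both x and x').
    """
    m = W.shape[0]
    act1 = (X1 @ W.T > 0).astype(float)
    act2 = (X2 @ W.T > 0).astype(float)
    return (X1 @ X2.T) * (act1 @ act2.T) / m


def sgd_last_iterate(X, y, m=512, eta0=0.5, theta=0.5, seed=0):
    """One pass of SGD on the square loss, one sample per step; returns the last iterate.

    Only the inner weights W are trained; the outer weights a stay fixed (assumption).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W, a = init_params(m, d, rng)
    for t in range(n):
        x, target = X[t], y[t]
        pre = W @ x                                   # pre-activations, shape (m,)
        residual = np.maximum(pre, 0.0) @ a / np.sqrt(m) - target
        # Gradient of 0.5 * residual**2 with respect to W, shape (m, d).
        grad = residual * ((a * (pre > 0))[:, None] * x[None, :]) / np.sqrt(m)
        W -= eta0 * (t + 1) ** (-theta) * grad        # decaying step size (assumed schedule)
    return W, a
```

Calling `sgd_last_iterate(X, y)` on inputs normalized to the unit sphere and then comparing `predict(W, a, X_test)` against a kernel regressor built from `empirical_ntk` at initialization is one way to see how closely the wide network tracks its NTK approximation; the width `m` controls how tight that match is.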

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Brain Tumor Detection and Classification

Methods: Stochastic Gradient Descent · Neural Tangent Kernel