Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

Mingze Wang; Chao Ma

arXiv:2206.02139 · cs.LG · May 30, 2023 · 1 citation

TL;DR

This paper analyzes the convergence of gradient descent (GD) and stochastic gradient descent (SGD) when training mildly parameterized neural networks. It shows a fast, significant loss decrease in the early stage of training and gives conditions under which GD converges globally.

Contribution

It introduces a microscopic analysis of neuron activation patterns that yields early-stage and global convergence results without relying on extreme over-parameterization.
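The analysis tracks which neurons are active on which training points. The sketch below is illustrative only, and the paper's exact "neuron partition" definition may differ. It assumes a bias-free ReLU layer and defines a neuron's activation pattern as the set of samples with positive pre-activation. It then groups neurons that share a pattern. The helpers `activation_patterns` and `group_neurons_by_pattern` are hypothetical names, not functions from the authors' repository.

```python
import numpy as np
from collections import defaultdict


def activation_patterns(W, X):
    """Boolean matrix P with P[k, i] True when neuron k is active on sample i.

    Assumption: bias-free ReLU layer, so neuron k is active on x_i iff w_k . x_i > 0.
    W has shape (m, d) with one row per neuron; X has shape (n, d).
    """
    return (W @ X.T) > 0


def group_neurons_by_pattern(P):
    """Map each distinct activation pattern to the list of neurons sharing it."""
    groups = defaultdict(list)
    for k, row in enumerate(P):
        groups[tuple(bool(v) for v in row)].append(k)
    return dict(groups)


# Tiny deterministic example: two samples in 2-D, four neurons.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
W = np.array([[1.0, 1.0], [2.0, 3.0], [-1.0, 0.5], [-1.0, -1.0]])
P = activation_patterns(W, X)
print(group_neurons_by_pattern(P))
# {(True, True): [0, 1], (False, True): [2], (False, False): [3]}
```

Grouping by identical patterns is the simplest possible choice. A finer partition could also split each group by the sign of the neuron's output weight.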

Findings

01 Significant loss decrease in the early stage of training

02 Global convergence of GD for exponential-type losses under assumptions on the training data

03 Neuron partition analysis offers new insights into training dynamics

Abstract

The convergence of GD and SGD when training mildly parameterized neural networks starting from random initialization is studied. For a broad range of models and loss functions, including the most commonly used square loss and cross-entropy loss, we prove an "early stage convergence" result. We show that the loss is decreased by a significant amount in the early stage of the training, and this decrease is fast. Furthermore, for exponential-type loss functions, and under some assumptions on the training data, we show global convergence of GD. Instead of relying on extreme over-parameterization, our study is based on a microscopic analysis of the activation patterns for the neurons, which helps us derive more powerful lower bounds for the gradient. The results on activation patterns, which we call "neuron partition", help build intuitions for understanding the behavior of neural…
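The following toy sketch makes the abstract's setting concrete. It is not the authors' code and does not reproduce their theory or experiments. The setup is entirely an assumption chosen for illustration:

- a bias-free two-layer ReLU network f(x) = sum_k a_k relu(w_k · x);
- fixed output weights a_k = ±1/m, with only the first layer trained;
- the exponential loss exp(-y f(x));
- full-batch GD from Gaussian initialization;
- a small width m = 20 as a stand-in for "mild" parameterization;
- synthetic Gaussian inputs labeled by a random linear teacher.

The function names and hyperparameters are hypothetical.

```python
import numpy as np


def forward(W, a, X):
    """Two-layer bias-free ReLU network f(x) = sum_k a_k * relu(w_k . x)."""
    Z = X @ W.T                                  # pre-activations, shape (n, m)
    return np.maximum(Z, 0.0) @ a, Z


def exp_loss_and_grad(W, a, X, y):
    """Exponential loss L = mean_i exp(-y_i f(x_i)) and its gradient w.r.t. W."""
    f, Z = forward(W, a, X)
    e = np.exp(-y * f)                           # per-sample loss, shape (n,)
    g = -y * e / X.shape[0]                      # dL/df_i, shape (n,)
    active = (Z > 0).astype(float)               # ReLU derivative = activation pattern
    grad = ((g[:, None] * active) * a).T @ X     # dL/dW, shape (m, d)
    return e.mean(), grad


def train_gd(X, y, m=20, lr=0.5, steps=200, seed=0):
    """Full-batch GD on the first layer; the output weights a stay fixed."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((m, d)) / np.sqrt(d)          # random initialization
    a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / m    # fixed +-1/m output layer
    losses = []
    for _ in range(steps):
        loss, grad = exp_loss_and_grad(W, a, X, y)
        losses.append(loss)
        W -= lr * grad
    return W, a, losses


# Synthetic data labeled by a random linear "teacher" (illustrative assumption).
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
y = np.where(X @ rng.standard_normal(5) >= 0, 1.0, -1.0)
W, a, losses = train_gd(X, y)
print(f"loss at step 0: {losses[0]:.4f}, step 20: {losses[20]:.4f}, step 199: {losses[-1]:.4f}")
```

Printing the loss at a few steps lets you see the early-stage decrease in this toy problem. No particular rate is implied. Replacing the full-batch gradient with a gradient computed on a random minibatch of rows of `X` turns the loop into SGD.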

Code & Models

Repositories

wmz9/early_stage_convergence_neurips2022 (official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science · Neural Networks and Applications

Methods: Stochastic Gradient Descent