The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

Chao Ma; Lei Wu; Weinan E

arXiv:2006.14450 · cs.LG · June 26, 2020 · 6 citations

TL;DR

This paper studies the gradient descent (GD) dynamics of two-layer neural networks across different parameter regimes and identifies a quenching-activation process, qualitatively different from mean-field behavior, which it proposes as a mechanism for implicit regularization.

Contribution

The paper provides a detailed numerical and phenomenological analysis of GD dynamics, highlighting the quenching-activation transition and its implications for how neural networks behave during training.

Findings

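01. Identifies two phases in the GD dynamics under Xavier-like initialization: an early phase in which the neurons are effectively quenched and GD closely follows the corresponding random feature model, and a late phase in which a few activated neurons dominate while the remaining background (quenched) neurons support the continued activation and deactivation process.

02. Shows that, as over-parametrization increases, this neural network-like behavior gives way to random feature-like behavior.

03. Suggests the quenching-activation process as a possible mechanism for implicit regularization.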
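The quenched/activated split described in the findings above can be probed with a small experiment. Below is a minimal pure-Python sketch, not the authors' code (their PyTorch repository is listed under Code & Models): full-batch GD on a two-layer ReLU network under an assumed Xavier-like initialization, fitting a hypothetical single-neuron teacher. As a proxy of our own choosing (not the paper's definition), it labels a neuron "activated" when its inner weight vector has moved from initialization by more than a hypothetical relative threshold `tau`, and "quenched" otherwise. Setting `freeze_inner=True` trains only the outer weights, which gives the corresponding random feature model.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def train_two_layer(m=10, d=2, n=32, lr=0.05, steps=500, seed=0,
                    freeze_inner=False, tau=0.5):
    """Full-batch GD on f(x) = sum_j a_j * relu(w_j . x) with squared loss.

    freeze_inner=True trains only the outer weights a (random feature model).
    tau is a hypothetical threshold on relative inner-weight displacement.
    """
    rng = random.Random(seed)
    # Inputs normalized to the unit sphere; hypothetical teacher y = relu(x_1).
    xs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        xs.append([c / norm for c in v])
    ys = [relu(x[0]) for x in xs]

    # Assumed Xavier-like initialization: a_j ~ N(0, 1/m), w_j ~ N(0, I/d).
    a = [rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)]
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(m)]
    W0 = [row[:] for row in W]

    losses = []
    for _ in range(steps):
        grad_a = [0.0] * m
        grad_W = [[0.0] * d for _ in range(m)]
        loss = 0.0
        for x, y in zip(xs, ys):
            pre = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]
            r = sum(aj * relu(p) for aj, p in zip(a, pre)) - y  # residual
            loss += 0.5 * r * r / n
            for j in range(m):
                grad_a[j] += r * relu(pre[j]) / n
                if pre[j] > 0.0:  # ReLU derivative is 1 on the active side
                    for k in range(d):
                        grad_W[j][k] += r * a[j] * x[k] / n
        losses.append(loss)  # loss before this step's update
        for j in range(m):
            a[j] -= lr * grad_a[j]
            if not freeze_inner:
                for k in range(d):
                    W[j][k] -= lr * grad_W[j][k]

    # Relative displacement ||w_j - w_j(0)|| / ||w_j(0)|| for each neuron.
    disp = []
    for w, w0 in zip(W, W0):
        num = math.sqrt(sum((p - q) ** 2 for p, q in zip(w, w0)))
        den = math.sqrt(sum(q * q for q in w0))
        disp.append(num / den)
    activated = [j for j in range(m) if disp[j] > tau]
    quenched = [j for j in range(m) if disp[j] <= tau]
    return {"losses": losses, "disp": disp,
            "activated": activated, "quenched": quenched}


if __name__ == "__main__":
    nn = train_two_layer()
    rf = train_two_layer(freeze_inner=True)
    print("NN final loss:", nn["losses"][-1], "activated:", nn["activated"])
    print("RF final loss:", rf["losses"][-1])
```

The network size, step count, teacher, and `tau` are illustrative choices. Whether a clear split into a few activated neurons emerges depends on them, and reproducing the paper's phase behavior would require its actual settings.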

Abstract

A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons. It is found that for Xavier-like initialization, there are two distinctive phases in the dynamic behavior of GD in the under-parametrized regime: An early phase in which the GD dynamics follows closely that of the corresponding random feature model and the neurons are effectively quenched, followed by a late phase in which the neurons are divided into two groups: a group of a few "activated" neurons that dominate the dynamics and a group of background (or "quenched") neurons that support the continued activation and deactivation process. This neural network-like behavior is continued into the mildly over-parametrized regime,…
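The abstract compares GD on the two-layer network with GD on the corresponding random feature model, and the TL;DR contrasts both with the mean-field scaling. A generic way to write these objects is sketched below. The notation (m neurons, activation σ, n samples (x_i, y_i), step size η) and the reading of "Xavier-like" as a_j(0) ~ N(0, 1/m) are our own assumptions and may not match the paper's exact setup.

```latex
% Two-layer network, conventional scaling ("Xavier-like": a_j(0) ~ N(0, 1/m), assumed)
f_m(x; a, W) = \sum_{j=1}^{m} a_j \, \sigma(w_j^\top x)

% Mean-field scaling: explicit 1/m prefactor with O(1) outer weights
f_m^{\mathrm{MF}}(x; a, W) = \frac{1}{m} \sum_{j=1}^{m} a_j \, \sigma(w_j^\top x)

% Empirical risk with squared loss
\hat{R}(a, W) = \frac{1}{2n} \sum_{i=1}^{n} \bigl( f_m(x_i; a, W) - y_i \bigr)^2

% Neural network: GD on both layers
a^{t+1} = a^{t} - \eta \, \nabla_a \hat{R}(a^{t}, W^{t}), \qquad
W^{t+1} = W^{t} - \eta \, \nabla_W \hat{R}(a^{t}, W^{t})

% Random feature model: inner weights frozen at initialization
a^{t+1} = a^{t} - \eta \, \nabla_a \hat{R}(a^{t}, W^{0}), \qquad
W^{t} \equiv W^{0}
```

In this notation, the early phase described in the abstract is one where the network trajectory (a^t, W^t) stays close to the random feature trajectory, so W^t ≈ W^0. One reading of the late phase is that the parameters of a few neurons move substantially while the rest remain close to their initial values.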

Code & Models

Repositories

TheoreticalML/GD.quenching_activation (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques