Global Convergence and Geometric Characterization of Slow to Fast Weight Evolution in Neural Network Training for Classifying Linearly Non-Separable Data

Ziang Long; Penghang Yin; Jack Xin

arXiv:2002.12563·cs.LG·December 11, 2020·1 cites

Open Access

TL;DR

This paper analyzes the dynamics of gradient descent in neural networks trained on linearly non-separable data, revealing conditions for convergence to global minima and characterizing the transition from slow to fast weight evolution.

Contribution

It introduces a geometric condition on network weights that explains the transition from slow to fast weight convergence in training neural networks for non-separable data.

Findings

01

Given a sufficient (but not exceedingly large) number of neurons, gradient descent is guaranteed to converge to a global minimum with perfect classification.

02

A geometric condition on weights predicts the transition from slow to fast convergence.

03

When the network has a sufficient (but not exceedingly large) number of neurons, all critical points of the loss landscape are global minima with perfect classification.

Abstract

In this paper, we study the dynamics of gradient descent in learning neural networks for classification problems. Unlike in existing works, we consider the linearly non-separable case where the training data of different classes lie in orthogonal subspaces. We show that when the network has a sufficient (but not exceedingly large) number of neurons, (1) the corresponding minimization problem has a desirable landscape where all critical points are global minima with perfect classification; (2) gradient descent is guaranteed to converge to the global minima. Moreover, we discovered a geometric condition on the network weights such that, when it is satisfied, the weight evolution transitions from a slow phase of weight direction spreading to a fast phase of weight convergence. The geometric condition says that the convex hull of the weights projected on the unit sphere contains the origin.
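The geometric condition in the last sentence can be checked numerically for a given set of weight vectors. The following is a minimal sketch, not the authors' procedure: it normalizes each weight vector to unit length and then runs a Frank-Wolfe (Gilbert-style) search for the minimum-norm point of the convex hull. The function name `origin_in_hull_of_directions`, the tolerance `tol`, and the iteration cap `max_iter` are hypothetical choices made for illustration, and the sketch assumes one weight vector per row with no zero rows. Which weights of the network enter the condition is defined in the paper and is not reproduced here.

```python
import numpy as np


def origin_in_hull_of_directions(weights, tol=1e-9, max_iter=10_000):
    """Return True if the origin lies (within tol) in the convex hull of the
    rows of `weights` after each row is projected onto the unit sphere."""
    W = np.asarray(weights, dtype=float)
    norms = np.linalg.norm(W, axis=1)
    if np.any(norms == 0.0):
        raise ValueError("zero weight vectors have no direction")
    P = W / norms[:, None]  # projected weights, one unit vector per row

    x = P[0].copy()  # current point, always inside the hull
    for _ in range(max_iter):
        s = P[np.argmin(P @ x)]  # vertex with the smallest inner product with x
        if x @ s > 0.0:
            # Every vertex p satisfies <x, p> >= <x, s> > 0, so a hyperplane
            # strictly separates the hull from the origin.
            return False
        if np.linalg.norm(x) < tol:
            return True
        d = s - x
        # Exact line search for the minimum-norm point on the segment [x, s].
        t = np.clip(-(x @ d) / (d @ d), 0.0, 1.0)
        x = x + t * d
    return bool(np.linalg.norm(x) < tol)


# Illustrative inputs (made up, not data from the paper).
spread = np.array([[2.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -0.5]])
clustered = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(origin_in_hull_of_directions(spread))     # True
print(origin_in_hull_of_directions(clustered))  # False
```

In the first input the directions point to all four sides of the origin, so the hull contains it. In the second they all lie in one quadrant, so the hull does not. When the origin sits exactly on the boundary of the hull, the answer can depend on `tol` and `max_iter`; a linear-programming feasibility check would be a reasonable alternative in that case.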

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications