Global Convergence and Geometric Characterization of Slow to Fast Weight Evolution in Neural Network Training for Classifying Linearly Non-Separable Data
Ziang Long, Penghang Yin, Jack Xin

TL;DR
This paper analyzes the dynamics of gradient descent in neural networks trained on linearly non-separable data, revealing conditions for convergence to global minima and characterizing the transition from slow to fast weight evolution.
Contribution
It introduces a geometric condition on network weights that explains the transition from slow to fast weight convergence in training neural networks for non-separable data.
Findings
When the network has a sufficient number of neurons, gradient descent converges to a global minimum with perfect classification.
A geometric condition on the weights predicts the transition from slow to fast convergence.
All critical points in the landscape are global minima when the network has a sufficient (but not exceedingly large) number of neurons.
Abstract
In this paper, we study the dynamics of gradient descent in learning neural networks for classification problems. Unlike existing works, we consider the linearly non-separable case where the training data of different classes lie in orthogonal subspaces. We show that when the network has a sufficient (but not exceedingly large) number of neurons, (1) the corresponding minimization problem has a desirable landscape where all critical points are global minima with perfect classification; (2) gradient descent is guaranteed to converge to the global minima. Moreover, we discover a geometric condition on the network weights such that, once it is satisfied, the weight evolution transitions from a slow phase of weight direction spreading to a fast phase of weight convergence. The geometric condition says that the convex hull of the weights projected onto the unit sphere contains the origin.
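The geometric condition can be illustrated with a small check. The sketch below is not taken from the paper: it covers only the two-dimensional special case, and the function name hull_contains_origin_2d and the tolerance tol are hypothetical. It relies on the fact that the convex hull of points on the unit circle contains the origin exactly when no angular gap between neighbouring directions exceeds pi.

```python
import math


def hull_contains_origin_2d(weights, tol=1e-12):
    """Illustrative 2-D check (an assumption, not the paper's code).

    Returns True if the convex hull of the weights, projected onto the
    unit circle, contains the origin. Zero vectors are skipped because
    they have no direction to project.
    """
    # atan2 depends only on direction, so the projection onto the unit
    # circle is implicit in taking the angle of each weight vector.
    angles = sorted(
        math.atan2(y, x) for x, y in weights if (x, y) != (0.0, 0.0)
    )
    if not angles:
        return False

    # Gaps between consecutive directions, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])

    # If some gap exceeds pi, all directions fit in an open half-plane
    # and the origin lies outside the hull.
    return max(gaps) <= math.pi + tol


# Directions confined to one quadrant: condition not yet satisfied.
clustered = [(1.0, 0.0), (0.0, 1.0)]
# Directions spread around the circle: condition satisfied.
spread = [(2.0, 0.0), (-1.0, 3.0), (-1.0, -3.0)]

print(hull_contains_origin_2d(clustered))
print(hull_contains_origin_2d(spread))
```

In higher dimensions the same question is usually posed as a linear feasibility problem (finding nonnegative coefficients that sum to one and combine the unit vectors to zero); the angular-gap test is used here only because it keeps the example dependency-free.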
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications
