Demystifying the Global Convergence Puzzle of Learning Over-parameterized ReLU Nets in Very High Dimensions
Peng He

TL;DR
This paper provides a rigorous theoretical framework explaining why gradient descent on over-parameterized ReLU neural networks converges globally when the training data are very high-dimensional, highlighting the roles of high-dimensional geometry and spectral properties.
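The phenomenon itself is easy to observe numerically. The sketch below is a hypothetical toy setup, not the paper's experiment or proof. It uses a two-layer ReLU net f(x) = (1/√m) Σ_r a_r·relu(w_r·x) with fixed ±1 output weights (a simplification common in over-parameterization analyses), unit-norm Gaussian inputs, random labels, and full-batch gradient descent on the squared loss over the hidden weights only. The function name, sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

def train_two_layer_relu(n=20, d=200, m=1000, lr=0.5, steps=150, seed=0):
    """Full-batch gradient descent on the squared loss of
    f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x),
    updating only the hidden weights W (output signs a stay fixed).
    Hypothetical toy setup; returns loss and gradient norm before each update."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.standard_normal(n)                       # arbitrary real-valued labels
    W = rng.standard_normal((m, d))                  # w_r ~ N(0, I)
    a = rng.choice([-1.0, 1.0], size=m)              # fixed output weights
    losses, grad_norms = [], []
    for _ in range(steps):
        pre = X @ W.T                                 # (n, m) pre-activations
        act = (pre > 0).astype(float)                 # (n, m) activation matrix
        resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y
        # dL/dw_r = (a_r / sqrt(m)) * sum_i resid_i * 1{w_r . x_i > 0} * x_i
        grad = (act * resid[:, None] * a[None, :]).T @ X / np.sqrt(m)
        losses.append(0.5 * np.sum(resid ** 2))
        grad_norms.append(np.linalg.norm(grad))
        W -= lr * grad
    return np.array(losses), np.array(grad_norms)

losses, grad_norms = train_two_layer_relu()
print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.2e}")
```

With width m = 1000 far exceeding n = 20 samples, the loss in this toy model should shrink by many orders of magnitude even though the labels are random. Shrinking m or raising lr is a simple way to probe where this behavior breaks down.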
Contribution
It introduces a novel geometric analysis of the optimization landscape for over-parameterized ReLU nets, which improves existing bounds on the required over-parameterization and learning rate and reveals how data geometry influences convergence.
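One piece of high-dimensional geometry behind such analyses can be checked directly: for i.i.d. isotropic Gaussian data, pairwise angles concentrate around π/2, with a spread on the order of 1/√d. The Gaussian data model, sizes, and function name below are assumptions for illustration; the paper states only that its assumptions are very mild, and they may differ from this model.

```python
import numpy as np

def pairwise_angles(n=50, d=1000, seed=0):
    """Angles (radians) between all distinct pairs of n random Gaussian vectors in R^d.
    Assumes an isotropic Gaussian data model purely for illustration."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    cos = np.clip(X @ X.T, -1.0, 1.0)   # clip guards arccos against round-off
    iu = np.triu_indices(n, k=1)        # upper triangle, excluding the diagonal
    return np.arccos(cos[iu])

for d in (3, 30, 3000):
    theta = pairwise_angles(d=d)
    print(f"d={d:5d}  mean angle/pi={theta.mean() / np.pi:.3f}  std/pi={theta.std() / np.pi:.3f}")
```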
Findings
Asymptotic characterization of gradient norms and directional curvature.
Improved bounds on over-parameterization and learning rate.
Identification of geometric and spectral data properties affecting convergence.
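For the spectral side, a quantity common in earlier neural-tangent-kernel-style analyses of over-parameterized ReLU nets is the limiting Gram matrix H_ij = ⟨x_i, x_j⟩(π − θ_ij)/(2π) for unit-norm inputs, whose smallest eigenvalue typically enters convergence bounds. The summary does not say whether this paper uses exactly this matrix. The sketch only shows how near-orthogonal high-dimensional data push λ_min(H) toward the diagonal value 1/2. Function names, sizes, and the Gaussian data model are hypothetical.

```python
import numpy as np

def relu_ntk_gram(X):
    """Limiting Gram matrix H_ij = <x_i, x_j> * (pi - theta_ij) / (2*pi); rows of X unit-norm."""
    cos = np.clip(X @ X.T, -1.0, 1.0)   # clip guards arccos against round-off
    theta = np.arccos(cos)
    return cos * (np.pi - theta) / (2 * np.pi)

def min_eigenvalue(n=40, d=500, seed=0):
    """Smallest eigenvalue of H for n random unit vectors in R^d (illustrative data model)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return np.linalg.eigvalsh(relu_ntk_gram(X)).min()

for d in (5, 50, 500):
    print(f"d={d:4d}  lambda_min(H) = {min_eigenvalue(d=d):.4f}")
```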
Abstract
This theoretical paper is devoted to developing a rigorous theory for demystifying the global convergence phenomenon in a challenging scenario: learning over-parameterized Rectified Linear Unit (ReLU) nets on very high-dimensional datasets under very mild assumptions. A major ingredient of our analysis is a fine-grained analysis of random activation matrices. The essential virtue of dissecting activation matrices is that it bridges the dynamics of optimization and the angular distribution in high-dimensional data space. This detailed angle-based analysis leads to asymptotic characterizations of the gradient norm and directional curvature of the objective function at each gradient descent iteration, revealing that the empirical loss function enjoys nice geometrical properties in the over-parameterized setting. Along the way, we significantly improve existing theoretical bounds on both…
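The link between activation matrices and angles can be illustrated with a standard Gaussian identity: for w ~ N(0, I), the probability that a ReLU neuron is active on both x_i and x_j equals (π − θ_ij)/(2π), where θ_ij is the angle between them. The sketch below builds a random binary activation matrix and compares its empirical co-activation rates with this prediction. It illustrates the general mechanism only; it is not the paper's fine-grained analysis, and all names and sizes are assumptions.

```python
import numpy as np

def activation_matrix(X, m, seed=0):
    """Binary n-by-m matrix A_ir = 1{w_r . x_i > 0} for m i.i.d. N(0, I) neurons."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))
    return (X @ W.T > 0).astype(float)

def angle_prediction(X):
    """Predicted co-activation rate (pi - theta_ij) / (2*pi) from pairwise angles."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    theta = np.arccos(np.clip(Xn @ Xn.T, -1.0, 1.0))
    return (np.pi - theta) / (2 * np.pi)

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 50))
X[1] = X[0] + 0.5 * rng.standard_normal(50)   # one correlated pair, so not all angles are near pi/2
A = activation_matrix(X, m=50_000)
empirical = A @ A.T / A.shape[1]              # fraction of neurons active on both x_i and x_j
print(np.abs(empirical - angle_prediction(X)).max())
```

The diagonal of the prediction is 1/2 (each neuron is active on a given input half the time), and the entry for the correlated pair sits well above the roughly 1/4 seen for the near-orthogonal pairs.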
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Face and Expression Recognition
