Restricted Strong Convexity of Deep Learning Models with Smooth Activations
Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Mikhail Belkin

TL;DR
This paper establishes new theoretical bounds on the Hessian's spectral norm and introduces a novel Restricted Strong Convexity analysis, demonstrating geometric convergence of gradient descent in deep models with smooth activations.
Contribution
It provides the first RSC-based analysis guaranteeing geometric convergence of gradient descent in deep learning models without relying on the Neural Tangent Kernel.
Findings
The spectral norm of the Hessian is bounded by O(poly(L)/sqrt(m)), where L is the number of layers and m is the width.
The RSC-based analysis guarantees geometric convergence of gradient descent (GD).
Experimental results support the theoretical claims.
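As a rough illustration of the first finding, the sketch below computes the Hessian spectral norm of a toy two-layer network output and shows it shrinking with width. Everything in it is an assumption made for illustration, not the paper's setup: the model f(theta; x) = (1/sqrt(m)) * sum_j v_j * tanh(w_j . x), the unit-norm input, the Gaussian initialization, and the hypothetical function name toy_hessian_spectral_norm. For this toy model the Hessian with respect to (W, v) is block diagonal with one block per neuron, so its spectral norm has a closed form.

```python
import math
import random


def toy_hessian_spectral_norm(m, d=5, sigma0=1.0, seed=0):
    """Spectral norm of the Hessian of a toy two-layer network output.

    Assumed toy model (not the paper's deep network):
        f(theta; x) = (1/sqrt(m)) * sum_j v_j * tanh(w_j . x),  ||x|| = 1.
    The Hessian w.r.t. (W, v) is block diagonal, one block per neuron.
    """
    rng = random.Random(seed)

    # Random unit-norm input (an assumption of this sketch).
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    x = [xi / x_norm for xi in x]

    largest = 0.0
    for _ in range(m):
        w = [rng.gauss(0.0, sigma0) for _ in range(d)]  # hidden weights
        v = rng.gauss(0.0, 1.0)                         # output weight
        a = sum(wi * xi for wi, xi in zip(w, x))        # pre-activation
        t = math.tanh(a)
        d1 = 1.0 - t * t      # tanh'(a)
        d2 = -2.0 * t * d1    # tanh''(a)

        # On span{(x, 0), (0, 1)} the neuron's block is [[p, q], [q, 0]]
        # (before the 1/sqrt(m) scaling); it is zero elsewhere.
        p, q = v * d2, d1
        block_norm = (abs(p) + math.sqrt(p * p + 4.0 * q * q)) / 2.0
        largest = max(largest, block_norm)

    # Block-diagonal matrix: spectral norm is the largest block norm.
    return largest / math.sqrt(m)


if __name__ == "__main__":
    for m in (64, 256, 1024, 4096):
        norm = toy_hessian_spectral_norm(m)
        print(m, norm, norm * math.sqrt(m))
```

The last printed column, the norm rescaled by sqrt(m), should stay of moderate size as m grows, which is the qualitative behaviour the bound describes. The dependence on depth L is not exercised here, since the toy model has a single hidden layer.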
Abstract
We consider the problem of optimization of deep learning models with smooth activation functions. While there exist influential results on the problem from the "near initialization" perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with L layers, width m, and initialization variance σ_0^2. First, for suitable σ_0^2, we establish an O(poly(L)/sqrt(m)) upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC) which holds as long as the squared norm of the average gradient of predictors is Ω(poly(L)/sqrt(m)) for the square loss. We also present results for more general losses. The RSC-based analysis does not need…
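For readers who want to see why an RSC-type condition yields a geometric rate, the following is the standard textbook argument, not the paper's theorem. The notation is assumed for illustration: loss \mathcal{L}, iterate \theta_t, restricted set B_t, RSC constant \alpha > 0, smoothness constant \beta \ge \alpha along the gradient step, and step size 1/\beta.

```latex
% Assumed notation (not taken from the paper): loss \mathcal{L}, iterate \theta_t,
% restricted set B_t, RSC constant \alpha, smoothness constant \beta.
\begin{align*}
\text{RSC on } B_t:\quad
  & \mathcal{L}(\theta') \ge \mathcal{L}(\theta_t)
    + \langle \theta' - \theta_t, \nabla\mathcal{L}(\theta_t) \rangle
    + \frac{\alpha}{2}\,\|\theta' - \theta_t\|_2^2
    \quad \forall\, \theta' \in B_t \\
\Rightarrow\quad
  & \inf_{\theta' \in B_t} \mathcal{L}(\theta') \ge \mathcal{L}(\theta_t)
    - \frac{1}{2\alpha}\,\|\nabla\mathcal{L}(\theta_t)\|_2^2 \\
\text{Smoothness, } \theta_{t+1} = \theta_t - \tfrac{1}{\beta}\nabla\mathcal{L}(\theta_t):\quad
  & \mathcal{L}(\theta_{t+1}) \le \mathcal{L}(\theta_t)
    - \frac{1}{2\beta}\,\|\nabla\mathcal{L}(\theta_t)\|_2^2 \\
\Rightarrow\quad
  & \mathcal{L}(\theta_{t+1}) - \inf_{B_t}\mathcal{L}
    \le \Bigl(1 - \frac{\alpha}{\beta}\Bigr)
    \Bigl(\mathcal{L}(\theta_t) - \inf_{B_t}\mathcal{L}\Bigr)
\end{align*}
```

The second line minimizes the quadratic lower bound over \theta', and the last line substitutes the resulting gradient-norm bound into the smoothness inequality. The loss gap relative to the restricted set therefore contracts by a factor of 1 - \alpha/\beta per step.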
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Medical Imaging Techniques and Applications
