Restricted Strong Convexity of Deep Learning Models with Smooth Activations

Arindam Banerjee; Pedro Cisneros-Velarde; Libin Zhu; Mikhail Belkin

arXiv:2209.15106 · cs.LG · October 3, 2022

TL;DR

This paper establishes a sharper upper bound on the spectral norm of the Hessian of deep models with smooth activations and introduces a Restricted Strong Convexity (RSC) analysis under which gradient descent converges geometrically for such models.

Contribution

It provides the first RSC-based analysis guaranteeing geometric convergence of gradient descent in deep learning models without relying on the Neural Tangent Kernel.
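The link between RSC and geometric convergence can be illustrated with the standard textbook argument below. This is a sketch in our own notation, not the paper's theorem: it assumes the training loss $\mathcal{L}$ is $\beta$-smooth along the path of gradient descent, gradient descent uses the step size $\eta = 1/\beta$, and an RSC lower bound with the same $\alpha$, where $0 < \alpha \le \beta$, holds at a comparator point $\theta^\star$ at every iterate. The symbols $\alpha$, $\beta$, $\eta$ and $\theta^\star$ are illustrative assumptions; the paper works with its own restricted set and constants.

```latex
\begin{aligned}
&\text{Smoothness, with } \theta_{t+1} = \theta_t - \tfrac{1}{\beta}\nabla\mathcal{L}(\theta_t):\\
&\qquad \mathcal{L}(\theta_{t+1}) \le \mathcal{L}(\theta_t) - \tfrac{1}{2\beta}\,\|\nabla\mathcal{L}(\theta_t)\|_2^2 \\
&\text{RSC at } \theta^\star \text{, then minimize the quadratic over } \theta^\star - \theta_t\text{:}\\
&\qquad \mathcal{L}(\theta^\star) \ge \mathcal{L}(\theta_t) + \langle \theta^\star - \theta_t, \nabla\mathcal{L}(\theta_t)\rangle + \tfrac{\alpha}{2}\|\theta^\star - \theta_t\|_2^2 \ge \mathcal{L}(\theta_t) - \tfrac{1}{2\alpha}\,\|\nabla\mathcal{L}(\theta_t)\|_2^2 \\
&\text{Hence } \|\nabla\mathcal{L}(\theta_t)\|_2^2 \ge 2\alpha\,\bigl(\mathcal{L}(\theta_t) - \mathcal{L}(\theta^\star)\bigr), \text{ so}\\
&\qquad \mathcal{L}(\theta_{t+1}) - \mathcal{L}(\theta^\star) \le \Bigl(1 - \tfrac{\alpha}{\beta}\Bigr)\bigl(\mathcal{L}(\theta_t) - \mathcal{L}(\theta^\star)\bigr) \le \Bigl(1 - \tfrac{\alpha}{\beta}\Bigr)^{t+1}\bigl(\mathcal{L}(\theta_0) - \mathcal{L}(\theta^\star)\bigr)
\end{aligned}
```

The contraction factor $1 - \alpha/\beta$ is strictly below 1 whenever $\alpha > 0$, which is what makes the rate geometric; no kernel (NTK) approximation enters this particular argument.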

Findings

1. The spectral norm of the Hessian is bounded by $O(\mathrm{poly}(L)/\sqrt{m})$.
2. The RSC-based analysis guarantees geometric convergence of GD.
3. Experimental results support the theoretical claims.
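To see where a $1/\sqrt{m}$ factor in a Hessian bound can come from, the sketch below computes the exact spectral norm of the parameter Hessian for a hypothetical one-hidden-layer model $f(x) = m^{-1/2}\sum_{i=1}^m v_i \tanh(w_i^\top x)$, that is, a single hidden layer rather than the paper's $L$-layer setting. The model, the $\tanh$ activation, the $m^{-1/2}$ output scaling, the Gaussian initialization with $\sigma_0 = 1$, and the function names are assumptions for illustration. Because $f$ is a sum over neurons, its Hessian is block diagonal, and each $(v_i, w_i)$ block reduces to a $2 \times 2$ matrix on $\mathrm{span}\{e_{v_i}, x\}$, so no autodiff library is needed.

```python
import math
import random


def tanh_derivatives(z):
    """Return (tanh'(z), tanh''(z))."""
    t = math.tanh(z)
    d1 = 1.0 - t * t       # tanh'(z) = 1 - tanh(z)^2
    d2 = -2.0 * t * d1     # tanh''(z) = -2 tanh(z) (1 - tanh(z)^2)
    return d1, d2


def hessian_spectral_norm(m, d=10, sigma0=1.0, seed=0):
    """Exact spectral norm of the parameter Hessian of the hypothetical model
    f(x) = m**-0.5 * sum_i v_i * tanh(w_i . x), with w_i ~ N(0, sigma0^2 I)
    and v_i ~ N(0, 1), evaluated at one fixed unit-norm input x."""
    rng = random.Random(seed)
    x = [1.0 / math.sqrt(d)] * d
    x_sq = sum(xk * xk for xk in x)  # ||x||^2, equal to 1 up to rounding
    largest = 0.0
    for _ in range(m):
        w = [rng.gauss(0.0, sigma0) for _ in range(d)]
        v = rng.gauss(0.0, 1.0)
        a, d2 = tanh_derivatives(sum(wk * xk for wk, xk in zip(w, x)))
        b = v * d2
        # Neurons do not share parameters, so the Hessian is block diagonal.
        # On span{e_v, x} the (v_i, w_i) block is
        # m**-0.5 * [[0, a*||x||], [a*||x||, b*||x||^2]]; it is zero elsewhere.
        lam = (abs(b) * x_sq + math.sqrt((b * x_sq) ** 2 + 4.0 * a * a * x_sq)) / 2.0
        largest = max(largest, lam)
    return largest / math.sqrt(m)


for m in (16, 256, 4096):
    print(m, round(hessian_spectral_norm(m), 4))
```

For this toy model the printed values shrink roughly like $1/\sqrt{m}$, up to the slowly growing maximum of $|v_i|$ over neurons. The depth dependence, which is what the $\mathrm{poly}(L)$ factor concerns, is not exercised here.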

Abstract

We consider the problem of optimization of deep learning models with smooth activation functions. While there exist influential results on the problem from the "near initialization" perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with $L$ layers, $m$ width, and $\sigma_0^2$ initialization variance. First, for suitable $\sigma_0^2$, we establish an $O(\frac{\mathrm{poly}(L)}{\sqrt{m}})$ upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC) which holds as long as the squared norm of the average gradient of predictors is $\Omega(\frac{\mathrm{poly}(L)}{\sqrt{m}})$ for the square loss. We also present results for more general losses. The RSC based analysis does not need…
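The RSC condition above is stated in terms of the squared norm of the average gradient of the predictors. As a sketch, assuming that phrase means $\|\frac{1}{n}\sum_{j=1}^n \nabla_\theta f(\theta; x_j)\|_2^2$ with a plain, unweighted batch average (our reading; the paper's precise definition may differ, for example in how it weights examples for general losses), the code below evaluates this quantity at initialization for a hypothetical one-hidden-layer $\tanh$ model $f(x) = m^{-1/2}\sum_i v_i \tanh(w_i^\top x)$ on a hypothetical random batch of unit-norm inputs. The model, the batch, and the function name are illustrative assumptions.

```python
import math
import random


def avg_predictor_grad_sq_norm(m, n=8, d=10, sigma0=1.0, seed=0):
    """||(1/n) sum_j grad_theta f(theta; x_j)||^2 at a random initialization of
    the hypothetical model f(x) = m**-0.5 * sum_i v_i * tanh(w_i . x)."""
    rng = random.Random(seed)
    # Hypothetical batch of n random unit-norm inputs.
    xs = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in x))
        xs.append([c / norm for c in x])

    scale = 1.0 / math.sqrt(m)
    total = 0.0
    for _ in range(m):
        w = [rng.gauss(0.0, sigma0) for _ in range(d)]
        v = rng.gauss(0.0, 1.0)
        g_v = 0.0        # batch average of df/dv_i = scale * tanh(z)
        g_w = [0.0] * d  # batch average of df/dw_i = scale * v_i * tanh'(z) * x
        for x in xs:
            t = math.tanh(sum(wk * xk for wk, xk in zip(w, x)))
            g_v += scale * t / n
            coeff = scale * v * (1.0 - t * t) / n
            for k in range(d):
                g_w[k] += coeff * x[k]
        # Each neuron owns disjoint coordinates, so squared norms simply add.
        total += g_v * g_v + sum(g * g for g in g_w)
    return total


for m in (256, 1024, 4096):
    print(m, round(avg_predictor_grad_sq_norm(m), 4))
```

In this toy setting the quantity stays roughly constant as $m$ grows (each neuron contributes a term of order $1/m$ and there are $m$ of them), whereas any width-dependent threshold that decays in $m$ shrinks. This illustrates the comparison only; it does not check the paper's conditions on $\sigma_0^2$ or depth.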

Videos

Restricted Strong Convexity of Deep Learning Models with Smooth Activations · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Medical Imaging Techniques and Applications