Loss Landscape Characterization of Neural Networks without Over-Parametrization
Rustem Islamov, Niccolò Ajroldi, Antonio Orvieto, Aurelien Lucchi

TL;DR
This paper introduces a new class of functions to characterize neural network loss landscapes without heavy over-parametrization, providing theoretical convergence guarantees and empirical validation across various models.
Contribution
The authors propose a novel function class that captures neural network loss landscapes without requiring extensive over-parametrization, enabling convergence analysis.
Findings
Gradient-based optimizers provably converge on functions in the new class.
The new class admits functions with saddle points and models realistic loss landscapes.
Empirical results validate the theoretical analysis across multiple models.
Abstract
Optimization methods play a crucial role in modern machine learning, powering the remarkable empirical achievements of deep learning models. These successes are even more remarkable given the complex non-convex nature of the loss landscape of these models. Yet, ensuring the convergence of optimization methods requires specific structural conditions on the objective function that are rarely satisfied in practice. One prominent example is the widely recognized Polyak-Lojasiewicz (PL) inequality, which has gained considerable attention in recent years. However, validating such assumptions for deep neural networks entails substantial and often impractical levels of over-parametrization. In order to address this limitation, we propose a novel class of functions that can characterize the loss landscape of modern deep models without requiring extensive over-parametrization and can also include…
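As background for the abstract's mention of the Polyak-Lojasiewicz (PL) inequality: a differentiable f with minimum value f* satisfies PL with constant μ > 0 if (1/2)‖∇f(x)‖² ≥ μ (f(x) − f*) for all x. If f is also L-smooth, gradient descent with step size 1/L converges linearly, f(x_{k+1}) − f* ≤ (1 − μ/L)(f(x_k) − f*), without requiring convexity. The minimal sketch below illustrates this on f(x) = x² + 3 sin²(x), a commonly used non-convex example that satisfies PL (μ = 1/32 is the constant usually quoted, and the code checks it numerically on a grid rather than proving it). This is a generic PL illustration under those assumptions, not the new function class proposed in the paper, whose definition is not reproduced here.

```python
import math
from typing import List

# Illustrative only: a classic non-convex 1-D function that satisfies the PL
# inequality. It is NOT the function class proposed in the paper.
def f(x: float) -> float:
    return x * x + 3.0 * math.sin(x) ** 2  # minimum f* = 0 at x = 0

def grad_f(x: float) -> float:
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

MU = 1.0 / 32.0  # assumed PL constant (checked numerically below)
L = 8.0          # smoothness constant: f''(x) = 2 + 6 cos(2x), so |f''| <= 8

def min_pl_ratio(lo: float = -10.0, hi: float = 10.0, n: int = 20001) -> float:
    """Smallest (0.5 * f'(x)^2) / (f(x) - f*) over a grid, skipping x = 0."""
    ratios = []
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        gap = f(x)  # f* = 0, so the optimality gap is just f(x)
        if gap > 1e-12:
            ratios.append(0.5 * grad_f(x) ** 2 / gap)
    return min(ratios)

def gradient_descent(x0: float, steps: int, lr: float) -> List[float]:
    """Plain gradient descent; returns the iterates x_0, ..., x_steps."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
        xs.append(x)
    return xs

trajectory = gradient_descent(x0=3.0, steps=50, lr=1.0 / L)
gaps = [f(x) for x in trajectory]  # optimality gaps f(x_k) - f*
print(f"min PL ratio on grid: {min_pl_ratio():.4f} (should be >= {MU})")
print(f"final gap f(x_50) - f*: {gaps[-1]:.3e}")
```

Although f has regions of negative curvature (f''(π/2) = −4), its only stationary point is the global minimizer, which is why PL holds and gradient descent reaches the minimum; a function with saddle points, such as those the paper's class is meant to admit, would violate PL.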
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need
