Loss Landscape Characterization of Neural Networks without Over-Parametrization

Rustem Islamov; Niccolò Ajroldi; Antonio Orvieto; Aurelien Lucchi

arXiv:2410.12455·cs.LG·October 28, 2024


TL;DR

This paper introduces a new class of functions to characterize neural network loss landscapes without heavy over-parametrization, providing theoretical convergence guarantees and empirical validation across various models.

Contribution

The authors propose a novel function class that captures neural network loss landscapes without requiring extensive over-parametrization, enabling convergence analysis.

Findings

01 Gradient-based optimizers converge under the new function class.

02 The new class includes saddle points and models real loss landscapes.

03 Empirical results validate the theoretical analysis across multiple models.

Abstract

Optimization methods play a crucial role in modern machine learning, powering the remarkable empirical achievements of deep learning models. These successes are even more remarkable given the complex non-convex nature of the loss landscape of these models. Yet, ensuring the convergence of optimization methods requires specific structural conditions on the objective function that are rarely satisfied in practice. One prominent example is the widely recognized Polyak-Łojasiewicz (PL) inequality, which has gained considerable attention in recent years. However, validating such assumptions for deep neural networks entails substantial and often impractical levels of over-parametrization. In order to address this limitation, we propose a novel class of functions that can characterize the loss landscape of modern deep models without requiring extensive over-parametrization and can also include…
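For readers unfamiliar with the PL inequality mentioned in the abstract, the sketch below may help. It is not taken from the paper and does not implement the paper's new function class. It uses a commonly cited one-dimensional test function, f(x) = x^2 + 3 sin^2(x), which is non-convex yet satisfies the PL inequality (1/2)|f'(x)|^2 >= mu (f(x) - f*) with f* = 0. The constant mu = 1/32 is the value usually quoted for this function and is treated here as an assumption that the code only checks numerically on a grid. All function and variable names are illustrative.

```python
import math

def f(x):
    # Non-convex test function with global minimum f* = 0 at x = 0.
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    # d/dx [x^2 + 3 sin^2 x] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

def second_derivative(x):
    # f''(x) = 2 + 6 cos(2x); it is negative in places, so f is not convex.
    return 2.0 + 6.0 * math.cos(2.0 * x)

def pl_gap(x, mu):
    # PL inequality: 0.5 * |f'(x)|^2 >= mu * (f(x) - f*), with f* = 0 here.
    # A non-negative gap means the inequality holds at x.
    return 0.5 * grad_f(x) ** 2 - mu * f(x)

def gradient_descent(x0, step, num_steps):
    # Plain gradient descent with a constant step size.
    x = x0
    for _ in range(num_steps):
        x -= step * grad_f(x)
    return x

MU = 1.0 / 32.0   # assumed PL constant for this test function
L_SMOOTH = 8.0    # |f''(x)| <= 2 + 6 = 8, so the gradient is 8-Lipschitz

# Numerical check of the PL inequality on a grid over [-10, 10].
grid = [-10.0 + 0.01 * i for i in range(2001)]
pl_holds_on_grid = all(pl_gap(x, MU) >= 0.0 for x in grid)

# Gradient descent with step 1/L from a point in a non-convex region.
x_final = gradient_descent(x0=3.0, step=1.0 / L_SMOOTH, num_steps=200)

print("PL holds on grid:", pl_holds_on_grid)
print("f''(pi/2):", second_derivative(math.pi / 2))
print("final iterate:", x_final, "final loss:", f(x_final))
```

The grid check is only evidence, not a proof. The point of the example is that gradient descent reaches the global minimum despite non-convexity because the PL inequality rules out spurious stationary points. The same property is what makes PL hard to justify for networks whose landscapes contain saddle points, which is the limitation the abstract refers to.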


Videos

Loss Landscape Characterization of Neural Networks without Over-Parametrization (SlidesLive)

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Attention Is All You Need