Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD

Luca Arnaboldi; Florent Krzakala; Bruno Loureiro; Ludovic Stephan

arXiv:2305.18502·stat.ML·March 4, 2024·1 cites

TL;DR

This paper analyzes how two-layer neural networks learn generalized linear models with SGD, revealing that overparameterization offers limited benefits and stochasticity plays a minor role in escaping flat initialization regions.

Contribution

It provides precise sample complexity results for two-layer networks, showing overparameterization's limited impact and the effectiveness of deterministic approximations in analyzing SGD dynamics.

Findings

01 Overparameterization improves convergence only by a constant factor.

02 Deterministic approximations effectively model SGD escape times.

03 Stochasticity has minimal influence on escaping flat regions at initialization.

Abstract

This study explores the sample complexity for two-layer neural networks to learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well-established that in this scenario $n = O(d \log d)$ samples are typically needed. However, we provide precise results concerning the pre-factors in high-dimensional contexts and for varying widths. Notably, our findings suggest that overparameterization can only enhance convergence by a constant factor within this problem class. These insights are grounded in the reduction of SGD dynamics to a stochastic process in lower dimensions, where escaping mediocrity equates to calculating an exit time. Yet, we demonstrate that a deterministic approximation of this process adequately represents the escape time, implying that the role of…
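The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a phase-retrieval target $y = (w^\star \cdot x)^2$ as the hard generalized linear model, a two-layer network with squared activation and a fixed second layer $1/p$, Gaussian inputs, and first-layer weights renormalized to the unit sphere after each step. The function name `escape_time`, the learning-rate scaling `lr / d`, and the overlap threshold are all hypothetical choices made for illustration. It runs one-pass SGD (one fresh sample per step) and reports the first step at which some hidden unit's overlap with $w^\star$ crosses the threshold, which is one way to read "escaping mediocrity" as an exit time.

```python
import numpy as np


def escape_time(d=64, p=4, lr=0.05, threshold=0.5, max_steps=50_000, seed=0):
    """One-pass SGD on a toy phase-retrieval problem (illustrative assumptions).

    Returns (t, overlap): the first step t at which max_j |w_j . w_star|
    reaches `threshold`, or (max_steps, final overlap) if it never does.
    """
    rng = np.random.default_rng(seed)

    # Assumed target direction: first basis vector (unit norm).
    w_star = np.zeros(d)
    w_star[0] = 1.0

    # Random initialization on the unit sphere: overlaps start near 1/sqrt(d),
    # i.e. the network begins in the flat, uninformative region.
    W = rng.standard_normal((p, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    overlap = float(np.max(np.abs(W @ w_star)))
    for t in range(1, max_steps + 1):
        x = rng.standard_normal(d)      # fresh sample at every step
        y = (w_star @ x) ** 2           # assumed hard GLM target
        pre = W @ x                     # hidden pre-activations, shape (p,)
        err = np.mean(pre ** 2) - y     # network output minus label

        # Gradient of 0.5 * err**2 with respect to each row of W.
        grad = (2.0 / p) * err * pre[:, None] * x[None, :]

        # SGD step with an assumed 1/d learning-rate scaling,
        # followed by projection back onto the unit sphere.
        W -= (lr / d) * grad
        W /= np.linalg.norm(W, axis=1, keepdims=True)

        overlap = float(np.max(np.abs(W @ w_star)))
        if overlap >= threshold:
            return t, overlap
    return max_steps, overlap


if __name__ == "__main__":
    for width in (1, 4):
        steps, m = escape_time(p=width)
        print(f"p={width}: exit step={steps}, overlap={m:.3f}")
```

Repeating the run over several seeds, widths `p`, and dimensions `d` gives an empirical distribution of exit times, which is the kind of quantity the abstract's deterministic approximation is said to capture. The numbers produced depend entirely on the assumed hyperparameters above.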

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Neural Networks and Applications

Methods: Stochastic Gradient Descent