On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions

Simon Martin (DI-ENS, LPENS), Francis Bach (DI-ENS), Giulio Biroli (LPENS)

arXiv:2311.03794 · math.OC · November 8, 2023 · AISTATS · 2 cites

Open Access

TL;DR

This paper analyzes how overparameterization affects the training of shallow neural networks with quadratic activations in high-dimensional Gaussian settings, providing theoretical convergence results and minimal overparameterization thresholds.

Contribution

The paper derives convergence properties of the gradient flow on the population risk and quantifies the minimal overparameterization required for strong signal recovery when training a shallow neural network in high dimensions, building on earlier work on the same architecture.

Findings

01. Convergence of gradient flow under certain overparameterization levels

02. Minimal overparameterization needed for strong signal recovery

03. Numerical validation for general initializations

Abstract

We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. We first derive convergence properties for the gradient flow and quantify the overparameterization that is necessary to achieve a strong signal recovery. Then, assuming that the teachers and the students at initialization form independent orthonormal families, we derive a high-dimensional limit for the flow and show that the minimal overparameterization is sufficient for strong recovery. We verify by numerical experiments that these results hold for more general initializations.
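
The setup described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code, and several choices are assumptions made here for illustration: the student is written as f(x) = sum_i (w_i . x)^2 and the teacher as f*(x) = sum_j (w*_j . x)^2 with no width normalization, the risk carries a 1/4 prefactor, the students start from a Gaussian draw rather than an orthonormal family, and the gradient flow is approximated by explicit Euler steps with a hand-picked step size. Under these assumptions, the standard identity E[(x^T D x)^2] = (tr D)^2 + 2 ||D||_F^2 for x ~ N(0, I_d) and symmetric D gives the population risk in closed form, so no data needs to be sampled. The function names are hypothetical.

```python
import numpy as np


def population_risk(W, W_star):
    """Closed-form population risk 0.25 * E[(f(x) - f*(x))^2] for x ~ N(0, I_d).

    Assumed parametrization (not taken from the paper):
    f(x) = x^T (W^T W) x and f*(x) = x^T (W_star^T W_star) x.
    """
    D = W.T @ W - W_star.T @ W_star  # symmetric d x d difference matrix
    return 0.25 * (np.trace(D) ** 2 + 2.0 * np.sum(D * D))


def risk_gradient(W, W_star):
    """Gradient of population_risk with respect to the student weights W."""
    D = W.T @ W - W_star.T @ W_star
    return W @ (np.trace(D) * np.eye(D.shape[0]) + 2.0 * D)


def euler_gradient_flow(W0, W_star, step=0.01, n_steps=5000):
    """Explicit Euler discretization of dW/dt = -grad R(W); returns W and the risk curve."""
    W = W0.copy()
    risks = [population_risk(W, W_star)]
    for _ in range(n_steps):
        W = W - step * risk_gradient(W, W_star)
        risks.append(population_risk(W, W_star))
    return W, np.array(risks)


def make_example(d=8, m_star=2, m=8, seed=0):
    """Orthonormal teachers (rows of W_star) and a Gaussian student initialization."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, m_star)))  # Q has orthonormal columns
    W_star = Q.T                                            # shape (m_star, d)
    W0 = rng.standard_normal((m, d)) / np.sqrt(d)           # illustrative init, shape (m, d)
    return W0, W_star


if __name__ == "__main__":
    W0, W_star = make_example()
    W, risks = euler_gradient_flow(W0, W_star)
    print("initial risk:", risks[0])
    print("final risk:  ", risks[-1])
```

Varying the student width `m` relative to the teacher width `m_star` and the dimension `d` in `make_example` is one way to explore how the degree of overparameterization changes the risk curve in this toy setting.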

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques