A Framework for Overparameterized Learning
Dávid Terjék, Diego González-Sánchez

TL;DR
This paper provides a theoretical framework for overparameterized neural networks, demonstrating convergence, implicit regularization, and generalization benefits of wide deep models, supported by empirical evidence.
Contribution
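It proves a convergence theorem for nonconvex composite optimization and applies it to a deep multilayer perceptron. For sufficiently wide networks this yields linear-rate convergence of gradient descent to a global optimum, an implicit regularization effect, and novel bounds on the generalization error, with coefficients that improve as width increases.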
Findings
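Gradient descent converges to a global optimum at a linear rate when the network is sufficiently wide.
Sufficiently wide networks benefit from the implicit regularization effect of gradient descent.
Novel bounds on the generalization error hold, and their coefficients improve as network width increases.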
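The abstract states the convergence finding as holding "with a linear rate". As a reminder of what that phrase means in optimization, here is the generic textbook form of such a bound, assuming an $L$-smooth objective $f$ with minimum value $f^\ast$ that satisfies a Polyak-Łojasiewicz inequality with constant $\mu$. The symbols $f$, $L$, $\mu$ and $\eta$ are illustrative assumptions; this is not the paper's theorem or notation.

```latex
% Generic textbook statement; f, L, \mu, \eta are illustrative symbols,
% not the paper's notation or its exact result.
\[
\tfrac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{\ast}\bigr)
\quad\Longrightarrow\quad
f(\theta_k) - f^{\ast} \;\le\; (1 - \eta\mu)^{k}\,\bigl(f(\theta_0) - f^{\ast}\bigr),
\]
\[
\text{where}\quad \theta_{k+1} = \theta_k - \eta\,\nabla f(\theta_k),
\qquad 0 < \eta \le \tfrac{1}{L}.
\]
```

The error shrinks by a constant factor at every step, so the number of iterations needed to reach accuracy $\varepsilon$ grows only like $\log(1/\varepsilon)$.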
Abstract
A candidate explanation of the good empirical performance of deep neural networks is the implicit regularization effect of first order optimization methods. Inspired by this, we prove a convergence theorem for nonconvex composite optimization, and apply it to a general learning problem covering many machine learning applications, including supervised learning. We then present a deep multilayer perceptron model and prove that, when sufficiently wide, it leads to the convergence of gradient descent to a global optimum with a linear rate, benefits from the implicit regularization effect of gradient descent, is subject to novel bounds on the generalization error, exhibits the lazy training phenomenon and enjoys learning rate transfer across different widths. The corresponding coefficients, such as the convergence rate, improve as width is further increased,…
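The lazy training phenomenon named in the abstract can be illustrated with a small experiment. The sketch below is not the paper's model or parameterization: it assumes a two-layer tanh network whose output is scaled by 1/sqrt(width) (a common NTK-style setup), toy 1-D regression data, and a hypothetical helper named `train_two_layer`. It trains with full-batch gradient descent and reports how far the parameters move relative to their initialization.

```python
import numpy as np


def train_two_layer(width, steps=200, lr=0.1, seed=0):
    """Full-batch gradient descent on a toy two-layer tanh network.

    Hypothetical helper for illustration only (not the paper's model).
    Returns (losses, relative_move), where relative_move is
    ||theta_final - theta_init|| / ||theta_init||.
    """
    rng = np.random.default_rng(seed)

    # Toy 1-D regression data (illustrative, not from the paper).
    X = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)  # shape (n, 1)
    y = np.sin(np.pi * X[:, 0])                   # shape (n,)
    n = X.shape[0]

    # Standard normal initialization; the output is scaled by 1/sqrt(width).
    W = rng.standard_normal((width, 1))
    b = rng.standard_normal(width)
    a = rng.standard_normal(width)
    theta0 = np.concatenate([W.ravel(), b, a])
    scale = 1.0 / np.sqrt(width)

    losses = []
    for step in range(steps + 1):
        H = np.tanh(X @ W.T + b)          # hidden activations, (n, width)
        residual = scale * (H @ a) - y    # prediction error, (n,)
        losses.append(0.5 * np.mean(residual ** 2))
        if step == steps:
            break

        # Manual backpropagation of the halved mean squared error.
        g_pred = residual / n                                # dLoss/dPrediction
        g_a = scale * (H.T @ g_pred)                         # (width,)
        g_Z = scale * np.outer(g_pred, a) * (1.0 - H ** 2)   # (n, width)
        g_W = g_Z.T @ X                                      # (width, 1)
        g_b = g_Z.sum(axis=0)                                # (width,)

        W -= lr * g_W
        b -= lr * g_b
        a -= lr * g_a

    theta = np.concatenate([W.ravel(), b, a])
    relative_move = np.linalg.norm(theta - theta0) / np.linalg.norm(theta0)
    return losses, relative_move


for width in (16, 256, 4096):
    losses, move = train_two_layer(width)
    print(f"width={width:5d}  loss {losses[0]:.4f} -> {losses[-1]:.4f}  "
          f"relative move {move:.4f}")
```

In this setup the training loss falls at every width, while the relative parameter movement shrinks as the width grows. That is the qualitative signature of lazy training. The learning rate, step count and data here are arbitrary choices made for the sketch.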
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
