TL;DR
This paper reviews recent theoretical advances in understanding overparameterized neural networks, focusing on mathematical models and their implications for two-layer networks and deep learning challenges.
Contribution
It summarizes key mathematical models and progress in analyzing overparameterized neural networks, highlighting their convex-like behavior and algorithmic implications.
Findings
Overparameterized networks behave like convex systems under restricted settings, such as two-layer networks
The neural tangent kernel provides a framework for analyzing learning locally around specialized initializations
Recent progress improves the understanding of two-layer neural networks
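To make the neural tangent kernel finding concrete, here is a minimal NumPy sketch of the empirical tangent kernel of a two-layer ReLU network at initialization. It is an illustration under stated assumptions, not the paper's construction: the 1/sqrt(m) output scaling, the Gaussian hidden weights, the ±1 output weights, and the function names (`init_params`, `forward`, `param_gradients`, `empirical_ntk`) are all choices made here for the example.

```python
import numpy as np


def init_params(m, d, seed=0):
    """Hypothetical initialization: Gaussian hidden weights, +/-1 output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, d))      # hidden-layer weights, shape (m, d)
    a = rng.choice([-1.0, 1.0], size=m)  # output weights, shape (m,)
    return W, a


def forward(W, a, X):
    """Two-layer ReLU network f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x)."""
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)


def param_gradients(W, a, X):
    """Gradient of f(x_i) with respect to all parameters, one row per input.

    Columns follow the order [W.ravel(), a], so the result has shape
    (n, m * d + m).
    """
    m, _ = W.shape
    pre = X @ W.T                        # pre-activations, shape (n, m)
    active = (pre > 0).astype(float)     # ReLU derivative (taken as 0 at 0)
    # d f / d w_j = a_j * 1[w_j . x > 0] * x / sqrt(m)
    grad_W = (active * a)[:, :, None] * X[:, None, :] / np.sqrt(m)
    # d f / d a_j = relu(w_j . x) / sqrt(m)
    grad_a = np.maximum(pre, 0.0) / np.sqrt(m)
    return np.concatenate([grad_W.reshape(len(X), -1), grad_a], axis=1)


def empirical_ntk(W, a, X):
    """Empirical tangent kernel K[i, k] = <grad f(x_i), grad f(x_k)>."""
    J = param_gradients(W, a, X)
    return J @ J.T


if __name__ == "__main__":
    W, a = init_params(m=64, d=3)
    X = np.random.default_rng(1).standard_normal((5, 3))
    K = empirical_ntk(W, a, X)
    print(K.shape)
```

The kernel is a Gram matrix of parameter gradients, so it is symmetric and positive semidefinite by construction. For small parameter changes the network output is approximately linear in the parameters, with these gradients as features; that linear model is what a local analysis around the initialization works with.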
Abstract
Deep learning has achieved considerable empirical success in recent years. However, while practitioners have discovered many ad hoc tricks, until recently there has been little theoretical understanding of the tricks invented in the deep learning literature. Practitioners have long known that overparameterized neural networks are easy to learn, and the past few years have seen important theoretical developments in their analysis. In particular, it was shown that such systems behave like convex systems under various restricted settings, such as for two-layer NNs, and when learning is restricted locally to the so-called neural tangent kernel space around specialized initializations. This paper discusses some of this recent progress, which has led to a significantly better understanding of neural networks. We will focus on the analysis of two-layer…
