Understanding How Over-Parametrization Leads to Acceleration: A case of learning a single teacher neuron
Jun-Kun Wang, Jacob Abernethy

TL;DR
This paper investigates how over-parametrization accelerates learning by analyzing a simplified model with a teacher neuron and multiple student neurons, providing theoretical insights into convergence speed.
Contribution
It offers a theoretical explanation for the acceleration induced by over-parametrization, obtained by analyzing a model in which multiple student neurons learn the data generated by a single teacher neuron.
Findings
Over-parametrization helps gradient descent reach near-global optima faster.
Scaling of output neurons influences convergence time.
Theoretical proof of acceleration in a simplified neuron learning setting.
Abstract
Over-parametrization has become a popular technique in deep learning. It is observed that, with over-parametrization, a larger neural network needs fewer training iterations than a smaller one to achieve a certain level of performance; namely, over-parametrization leads to acceleration in optimization. However, although over-parametrization is widely used nowadays, little theory is available to explain the acceleration it brings. In this paper, we propose understanding it by studying a simple problem first. Specifically, we consider the setting in which there is a single teacher neuron with quadratic activation, where over-parametrization is realized by having multiple student neurons learn the data generated from the teacher neuron. We provably show that over-parametrization helps the iterate generated by gradient descent to enter the neighborhood of a global optimal…
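The setting described in the abstract can be illustrated with a minimal sketch. Only the quadratic-activation teacher, the multiple student neurons, and the use of gradient descent come from the abstract; the Gaussian inputs, the unit-norm teacher, the unscaled sum of student outputs, the squared loss, the step size, and the initialization scale are all assumptions made here for illustration, and the function names (make_teacher_data, loss_and_grad, train) are hypothetical. The paper's exact formulation may differ.

```python
import numpy as np


def make_teacher_data(n_samples, dim, rng):
    """Labels from one teacher neuron with quadratic activation: y = (w_*^T x)^2.

    Assumption: Gaussian inputs and a unit-norm teacher vector.
    """
    w_star = rng.standard_normal(dim)
    w_star /= np.linalg.norm(w_star)
    X = rng.standard_normal((n_samples, dim))
    y = (X @ w_star) ** 2
    return X, y, w_star


def loss_and_grad(W, X, y):
    """Squared loss of the student network f(x) = sum_k (w_k^T x)^2.

    W has shape (num_students, dim). Assumption: student outputs are
    summed without any output scaling.
    """
    Z = X @ W.T                          # pre-activations, shape (n, K)
    residual = (Z ** 2).sum(axis=1) - y  # f(x_i) - y_i, shape (n,)
    loss = 0.5 * np.mean(residual ** 2)
    # dL/dw_k = mean_i residual_i * 2 * (w_k^T x_i) * x_i
    grad = 2.0 * (residual[:, None] * Z).T @ X / X.shape[0]
    return loss, grad


def train(num_students, X, y, lr=0.01, steps=200, init_scale=0.1, seed=0):
    """Plain gradient descent from a small random initialization."""
    rng = np.random.default_rng(seed)
    W = init_scale * rng.standard_normal((num_students, X.shape[1]))
    losses = []
    for _ in range(steps):
        loss, grad = loss_and_grad(W, X, y)
        losses.append(loss)
        W -= lr * grad
    return W, losses
```

Calling train with num_students=1 and with a larger value on the same data gives two loss curves that can be compared iteration by iteration. The sketch does not reproduce the paper's analysis or results; it only makes the teacher-student setup concrete.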
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced X-ray Imaging Techniques · Advanced Neural Network Applications
