On Generalization of Adaptive Methods for Over-parameterized Linear Regression
Vatsal Shah, Soumya Basu, Anastasios Kyrillidis, Sujay Sanghavi

TL;DR
This paper investigates how adaptive optimization methods perform in over-parameterized linear regression, revealing two classes with distinct generalization behaviors and supporting findings through experiments on linear models and neural networks.
Contribution
It characterizes the generalization properties of adaptive methods in over-parameterized linear regression, distinguishing two classes based on their convergence and saturation behaviors.
Findings
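Adaptive methods fall into two classes: in the first, the parameter vector stays in the span of the data and converges to the minimum-norm solution; in the second, the parameter vector acquires an out-of-span component that saturates.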
Experiments confirm theoretical distinctions in linear regression and neural networks.
How an adaptive method generalizes depends on whether its updates keep the parameter vector in the span of the data and on how its iterates converge.
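The in-span versus out-of-span distinction can be illustrated numerically. The sketch below is not the paper's experimental setup: the problem sizes, the random data, and the fixed diagonal preconditioner `D` are all assumptions chosen for illustration, and `run_demo` is a hypothetical helper name. It compares plain gradient descent from a zero initialization with a preconditioned variant on an over-parameterized least-squares problem, and measures how much of each solution lies outside the row space of the data matrix.

```python
import numpy as np

def run_demo(n=5, d=20, steps=5000, seed=0):
    """Compare GD and a fixed-diagonal-preconditioned GD on n < d regression."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))      # n samples, d > n features
    y = rng.standard_normal(n)

    w_min_norm = np.linalg.pinv(X) @ y   # minimum-norm interpolating solution
    P = np.linalg.pinv(X) @ X            # projector onto the row space of X
    sigma_max_sq = np.linalg.norm(X, 2) ** 2

    # Plain gradient descent from zero: every update is X.T @ residual,
    # so the iterate never leaves the span of the data.
    w_gd = np.zeros(d)
    lr = 1.0 / sigma_max_sq
    for _ in range(steps):
        w_gd -= lr * (X.T @ (X @ w_gd - y))

    # Assumed illustration: a fixed, non-uniform diagonal preconditioner D.
    # Rescaling coordinates rotates the gradient out of the data span.
    D = rng.uniform(0.5, 1.5, size=d)
    w_pre = np.zeros(d)
    lr_pre = 1.0 / (D.max() * sigma_max_sq)
    for _ in range(steps):
        w_pre -= lr_pre * D * (X.T @ (X @ w_pre - y))

    return {
        "gd_out_of_span": np.linalg.norm(w_gd - P @ w_gd),
        "gd_dist_to_min_norm": np.linalg.norm(w_gd - w_min_norm),
        "pre_out_of_span": np.linalg.norm(w_pre - P @ w_pre),
        "pre_train_residual": np.linalg.norm(X @ w_pre - y),
        "min_norm": np.linalg.norm(w_min_norm),
        "pre_norm": np.linalg.norm(w_pre),
    }

if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3e}")
```

Under these assumptions, both runs fit the training data, but only plain gradient descent lands on the minimum-norm solution; the preconditioned run keeps a nonzero out-of-span component and therefore has a larger norm.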
Abstract
Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on these areas in the quest to understand why neural networks generalize well. The setting of over-parameterized linear regression has provided key insights into understanding this mysterious behavior of neural networks. In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting. First, we focus on two sub-classes of adaptive methods depending on their generalization performance. For the first class of adaptive methods, the…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
Methods: Linear Regression
