On Generalization of Adaptive Methods for Over-parameterized Linear Regression

Vatsal Shah; Soumya Basu; Anastasios Kyrillidis; Sujay Sanghavi

arXiv:2011.14066 · stat.ML · December 1, 2020

TL;DR

This paper investigates how adaptive optimization methods perform in over-parameterized linear regression. It identifies two classes of adaptive methods with distinct generalization behaviors and supports the findings with experiments on linear models and neural networks.

Contribution

The paper characterizes the generalization properties of adaptive methods in over-parameterized linear regression, distinguishing two classes of methods by their convergence and saturation behaviors.

Findings

01

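Adaptive methods fall into two classes: those whose parameter vector stays in the span of the data and converges to the minimum-norm solution, and those whose parameter vector acquires an out-of-span component that saturates.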

02

Experiments on linear regression and on neural networks support the theoretical distinction between the two classes.

03

How an adaptive method generalizes depends on whether its updates keep the parameter vector within the span of the data, and on its convergence properties.
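The in-span versus out-of-span distinction in the findings can be illustrated with a small numerical sketch. This is not the paper's algorithm or experimental setup: it assumes a fixed diagonal preconditioner as a simplified stand-in for an adaptive method (real adaptive methods such as AdaGrad or Adam update their preconditioner over time), and the names `run_preconditioned_gd`, `safe_lr`, and `precond_diag` are hypothetical. It relies on one standard fact: any interpolating solution w with Xw = y has the minimum-norm solution as its projection onto the row space of X, so two interpolators can differ only in their out-of-span component.

```python
import numpy as np


def run_preconditioned_gd(X, y, precond_diag, lr, steps):
    """Iterate w <- w - lr * D * X^T (X w - y) from w = 0, with D = diag(precond_diag).

    With precond_diag all ones this is plain gradient descent on 0.5 * ||X w - y||^2.
    """
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y)          # gradient of the squared loss
        w = w - lr * precond_diag * grad  # diagonal preconditioning (assumed, fixed)
    return w


def safe_lr(X, precond_diag):
    """Step size 1 / lambda_max(X D X^T), which keeps the residual iteration stable."""
    return 1.0 / np.linalg.eigvalsh((X * precond_diag) @ X.T).max()


rng = np.random.default_rng(0)
n, d = 5, 20                              # over-parameterized: more features than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

X_pinv = np.linalg.pinv(X)
w_min_norm = X_pinv @ y                   # minimum-norm interpolating solution
P_span = X_pinv @ X                       # orthogonal projector onto the row space of X

# Plain gradient descent from zero: iterates never leave the span of the data.
ones = np.ones(d)
w_gd = run_preconditioned_gd(X, y, ones, safe_lr(X, ones), steps=20000)

# Fixed non-uniform diagonal preconditioner: updates are rotated out of the span.
diag = rng.uniform(0.5, 2.0, size=d)
w_pre = run_preconditioned_gd(X, y, diag, safe_lr(X, diag), steps=20000)

in_span = P_span @ w_pre
out_of_span = w_pre - in_span

print("GD distance to min-norm solution:      ", np.linalg.norm(w_gd - w_min_norm))
print("preconditioned, in-span distance:      ", np.linalg.norm(in_span - w_min_norm))
print("preconditioned, out-of-span norm:      ", np.linalg.norm(out_of_span))
```

Both runs fit the training data, but only the plain gradient descent run lands on the minimum-norm solution. The preconditioned run carries an extra component orthogonal to the data, which the training loss cannot see and which therefore can only affect predictions on new inputs.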

Abstract

Over-parameterization and adaptive methods have played a crucial role in the success of deep learning in the last decade. The widespread use of over-parameterization has forced us to rethink generalization by bringing forth new phenomena, such as implicit regularization of optimization algorithms and double descent with training progression. A series of recent works have started to shed light on these areas in the quest to understand why neural networks generalize well. The setting of over-parameterized linear regression has provided key insights into understanding this mysterious behavior of neural networks. In this paper, we aim to characterize the performance of adaptive methods in the over-parameterized linear regression setting. First, we focus on two sub-classes of adaptive methods depending on their generalization performance. For the first class of adaptive methods, the…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition

Methods: Linear Regression