Dimension Independent Generalization Error by Stochastic Gradient Descent

Xi Chen; Qiang Liu; Xin T. Tong

arXiv:2003.11196 · stat.ML · January 5, 2021 · 1 citation

TL;DR

This paper develops a theory showing that stochastic gradient descent solutions can generalize well in high-dimensional settings, especially when data and models have low effective dimension, explaining the success of overparameterized neural networks.

Contribution

It introduces a general framework for understanding the generalization error of SGD in high-dimensional models, highlighting conditions for low effective dimension and benign overfitting.

Findings

01. Generalization error can be independent of ambient dimension under certain conditions.

02. Low effective dimension naturally occurs in overparameterized models like neural networks.

03. The theory applies to both convex and non-convex models, including linear and neural network models.

Abstract

One classical canon of statistics is that large models are prone to overfitting, and model selection procedures are necessary for high dimensional data. However, many overparameterized models, such as neural networks, perform very well in practice, although they are often trained with simple online methods and regularization. The empirical success of overparameterized models, which is often known as benign overfitting, motivates us to have a new look at the statistical generalization theory for online optimization. In particular, we present a general theory on the generalization error of stochastic gradient descent (SGD) solutions for both convex and locally convex loss functions. We further discuss data and model conditions that lead to a "low effective dimension". Under these conditions, we show that the generalization error either does not depend on the ambient dimension $p$ or…
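As a rough illustration of the "low effective dimension" idea, the following NumPy sketch runs single-pass SGD on a synthetic linear regression whose covariance eigenvalues decay as j^(-2), so the trace of the covariance stays bounded as p grows. Everything here is an assumption made for illustration (the function name `sgd_excess_risk`, the decay rate, step size, noise level, and the choice of true parameter); it is not the paper's setup, algorithm, or exact conditions.

```python
import numpy as np


def sgd_excess_risk(p, n=2000, step=0.1, decay=2.0, noise=0.1, seed=0):
    """Run one pass of constant-step SGD on least squares; return the excess risk.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Covariance eigenvalues lambda_j = j^(-decay). For decay > 1 their sum
    # (the trace) stays bounded no matter how large p is.
    eigvals = np.arange(1, p + 1, dtype=float) ** (-decay)
    scale = np.sqrt(eigvals)

    w_star = np.ones(p)  # hypothetical true parameter
    w = np.zeros(p)      # SGD starts at the origin

    for _ in range(n):
        # Draw one fresh sample (x, y) with x ~ N(0, diag(eigvals)).
        x = scale * rng.standard_normal(p)
        y = x @ w_star + noise * rng.standard_normal()

        # Gradient of the squared loss 0.5 * (x @ w - y)^2 with respect to w.
        grad = (x @ w - y) * x
        w -= step * grad

    # Exact excess population risk: 0.5 * (w - w*)^T Sigma (w - w*).
    diff = w - w_star
    return 0.5 * float(np.sum(eigvals * diff ** 2))


if __name__ == "__main__":
    for p in (50, 500, 2000):
        print(p, sgd_excess_risk(p))
```

In this toy setting the excess risk should stay at a similar small level for each p, because the extra coordinates carry very little variance. The sketch conveys the intuition only and does not verify the paper's bounds.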

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms

Methods: Logistic Regression · Linear Regression