Statistical Guarantees for Regularized Neural Networks

Mahsa Taheri; Fang Xie; Johannes Lederer

arXiv:2006.00294·cs.LG·November 12, 2020

TL;DR

This paper derives statistical guarantees for regularized neural network estimators, showing that with $\ell_1$-regularization the prediction error grows at most sub-linearly in the number of layers and at most logarithmically in the total number of parameters.

Contribution

The paper develops a general statistical guarantee for regularized estimators, specifically exemplified with $\ell_1$-regularization in neural networks, linking error bounds to network depth and size.

Findings

01

Prediction error grows at most sub-linearly with the number of layers.

02

Prediction error grows at most logarithmically with the total number of parameters.

03

Provides a mathematical basis for regularized neural network estimation.

Abstract

Neural networks have become standard tools in the analysis of data, but they lack comprehensive mathematical theories. For example, there are very few statistical guarantees for learning neural networks from data, especially for classes of estimators that are used in practice or at least similar to such. In this paper, we develop a general statistical guarantee for estimators that consist of a least-squares term and a regularizer. We then exemplify this guarantee with $\ell_{1}$-regularization, showing that the corresponding prediction error increases at most sub-linearly in the number of layers and at most logarithmically in the total number of parameters. Our results establish a mathematical basis for regularized estimation of neural networks, and they deepen our mathematical understanding of neural networks and deep learning more generally.
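To make "a least-squares term and a regularizer" concrete, the following is a minimal sketch of one such objective: the mean squared error of a feedforward network plus $\lambda$ times the $\ell_1$-norm of all weights. It is an illustration only, and the paper's exact estimator may be formulated differently. The ReLU activations, the absence of bias terms, the toy data, and the names `network`, `l1_norm`, and `objective` are all assumptions made for this example.

```python
def relu(v):
    """Elementwise ReLU (an assumed activation, chosen for illustration)."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def network(weights, x):
    """Feedforward network: ReLU on hidden layers, linear output layer."""
    h = list(x)
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)


def l1_norm(weights):
    """Sum of absolute values of every weight in every layer."""
    return sum(abs(w) for W in weights for row in W for w in row)


def objective(weights, X, y, lam):
    """Least-squares term plus lam times the l1-norm of the weights."""
    n = len(X)
    squared_errors = (
        (yi - network(weights, xi)[0]) ** 2 for xi, yi in zip(X, y)
    )
    return sum(squared_errors) / n + lam * l1_norm(weights)


# Hypothetical two-layer network (2 inputs -> 2 hidden units -> 1 output).
weights = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[1.0, -2.0]],
]
X = [[1.0, 0.0], [0.0, 1.0]]
y = [0.0, -1.0]

value = objective(weights, X, y, lam=0.5)
print(value)
```

In this toy setup the network fits both data points exactly, so the objective value comes entirely from the penalty term. Increasing `lam` raises the cost of large weights relative to the data fit, which is the trade-off the regularizer controls.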
