Approximation and Estimation for High-Dimensional Deep Learning Networks

Andrew R. Barron; Jason M. Klusowski

arXiv:1809.03090 · stat.ML · September 19, 2018 · 42 cites

TL;DR

This paper gives a theoretical analysis of deep neural networks with ramp activation functions, showing that their statistical risk admits an upper bound that grows only logarithmically with the input dimension, which helps explain their surprising ability to generalize.

Contribution

It establishes a risk bound for deep networks with $\ell^1$-type controls on their parameters, showing that accurate estimation remains possible in high-dimensional settings, and develops a sampling strategy for sparse covering.

Findings

01. Risk upper bound of $[(L^3 \log d)/n]^{1/2}$ for deep ramp networks

02. Accurate estimation is possible even when the input dimension is much larger than the sample size

03. Lower bounds indicate that the risk bound is nearly optimal

Abstract

It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations. Is there a theoretical basis for this? The best available bounds on their metric entropy and associated complexity measures are essentially linear in the number of parameters, which is inadequate to explain this phenomenon. Here we examine the statistical risk (mean squared predictive error) of multi-layer networks with $\ell^{1}$-type controls on their parameters and with ramp activation functions (also called lower-rectified linear units). In this setting, the risk is shown to be upper bounded by $[(L^{3} \log d) / n]^{1/2}$, where $d$ is the input dimension to each layer, $L$ is the number of layers, and $n$ is the sample size. In this way, the input dimension can be much larger than the sample…
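
To get a feel for how the rate in the abstract behaves when $d$ is much larger than $n$, the following minimal Python sketch evaluates $[(L^{3} \log d)/n]^{1/2}$ for a few values of $d$. It is an illustration only: the function name `risk_rate` and the chosen values of $L$, $d$, and $n$ are assumptions made here, and the constants and conditions under which the paper's bound holds are omitted, so the numbers indicate scaling rather than actual risk.

```python
import math


def risk_rate(num_layers: int, input_dim: int, sample_size: int) -> float:
    """Evaluate [(L^3 log d) / n]^(1/2), ignoring all constants.

    Hypothetical helper for illustration; it is not code from the paper.
    """
    if num_layers < 1 or input_dim < 2 or sample_size < 1:
        raise ValueError("need num_layers >= 1, input_dim >= 2, sample_size >= 1")
    return math.sqrt(num_layers ** 3 * math.log(input_dim) / sample_size)


# Illustrative values (assumptions): 3 layers, 10,000 samples,
# and input dimensions below, equal to, and far above the sample size.
n = 10_000
for d in (100, 10_000, 1_000_000):
    print(f"L=3, d={d:>9,}, n={n:,}: rate = {risk_rate(3, d, n):.3f}")
```

Because $d$ enters only through $\log d$, raising $d$ from $10^{2}$ to $10^{6}$ multiplies the rate by $\sqrt{\log 10^{6} / \log 10^{2}} = \sqrt{3} \approx 1.73$, whereas quadrupling $n$ halves it.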

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning