Approximation and Estimation for High-Dimensional Deep Learning Networks
Andrew R. Barron, Jason M. Klusowski

TL;DR
This paper gives a theoretical analysis of deep neural networks with ramp activation functions, showing that their statistical risk bound grows only logarithmically with the input dimension, which helps explain their surprising generalization ability.
Contribution
It introduces a risk bound for deep networks with $\ell^1$ controls, demonstrating their effectiveness even in high-dimensional settings, and develops a sampling strategy for sparse covering.
Findings
Risk bound of $[(L^3 \log d)/n]^{1/2}$ for deep ramp networks
Accurate estimation remains possible when the input dimension is much larger than the sample size
Lower bounds indicate the risk bound is nearly optimal
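As a rough numerical illustration of the first two findings, the sketch below evaluates the rate $[(L^3 \log d)/n]^{1/2}$ for a fixed sample size and two input dimensions. It assumes all constant factors are 1, which the summary does not state, and the function name `risk_bound_rate` is hypothetical, so the values show scaling only and are not the paper's actual bound.

```python
import math


def risk_bound_rate(num_layers: int, input_dim: int, sample_size: int) -> float:
    """Return [(L^3 log d) / n]^(1/2) with all constants assumed to be 1."""
    return math.sqrt(num_layers ** 3 * math.log(input_dim) / sample_size)


# Fixed depth and sample size; only the input dimension changes.
num_layers = 3
sample_size = 10 ** 4

rate_small_d = risk_bound_rate(num_layers, 10 ** 3, sample_size)  # d < n
rate_large_d = risk_bound_rate(num_layers, 10 ** 6, sample_size)  # d >> n

print(f"d = 10^3: rate = {rate_small_d:.3f}")
print(f"d = 10^6: rate = {rate_large_d:.3f}")
print(f"ratio     = {rate_large_d / rate_small_d:.3f}")
```

Because $d$ enters only through $\log d$, raising the input dimension from $10^3$ to $10^6$ multiplies the rate by $\sqrt{2}$, whereas the depth $L$ enters as $L^{3/2}$ and has a much stronger effect.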
Abstract
It has been experimentally observed in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations. Is there a theoretical basis for this? The best available bounds on their metric entropy and associated complexity measures are essentially linear in the number of parameters, which is inadequate to explain this phenomenon. Here we examine the statistical risk (mean squared predictive error) of multi-layer networks with $\ell^1$-type controls on their parameters and with ramp activation functions (also called lower-rectified linear units). In this setting, the risk is shown to be upper bounded by $[(L^3 \log d)/n]^{1/2}$, where $d$ is the input dimension to each layer, $L$ is the number of layers, and $n$ is the sample size. In this way, the input dimension can be much larger than the sample…
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
