Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks

David Gamarnik, Eren C. Kızıldağ, and Ilias Zadik

arXiv:2103.01887·stat.ML·April 6, 2022

Open Access

TL;DR

This paper proves that for overparameterized two-layer neural networks with non-negative output weights, a small training error implies a controlled outer norm, leading to strong generalization guarantees under mild data assumptions.

Contribution

The paper establishes that, for non-negative output weights, a low training error guarantees a bounded outer norm. The bound is independent of the number of hidden units, requires only polynomially many samples, and holds under mild assumptions on the data.

Findings

01

Small training error implies controlled outer norm.

02

Generalization bounds are polynomial in input dimension.

03

Results are independent of the number of hidden units.

Abstract

We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible, as quantified by the training error, and study the following question: does a low training error guarantee that the norm of the output layer (outer norm) itself is small? We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that, under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data points necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in $d$) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the…
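To make the objects in the abstract concrete, the following is a minimal sketch of a two-layer network with non-negative output weights, together with its outer norm and training error. Several details are assumptions made for illustration and are not stated in the abstract: the outer norm is taken to be the l1 norm of the output weights, the training error is taken to be the mean squared error, bias terms are omitted, and the step activation is set to 1 at zero. All function names are hypothetical.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Fine for moderate z; math.exp overflows for very negative z.
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    # Assumed convention: the binary step equals 1 at z == 0.
    return 1.0 if z >= 0 else 0.0


def two_layer(x, hidden_weights, output_weights, activation=relu):
    """f(x) = sum_j a_j * activation(<w_j, x>), with every a_j >= 0."""
    if any(a < 0 for a in output_weights):
        raise ValueError("output weights must be non-negative")
    return sum(
        a * activation(sum(wi * xi for wi, xi in zip(w, x)))
        for a, w in zip(output_weights, hidden_weights)
    )


def outer_norm(output_weights):
    """Assumed l1 norm of the output layer; equals sum(a_j) when a_j >= 0."""
    return sum(abs(a) for a in output_weights)


def training_error(data, hidden_weights, output_weights, activation=relu):
    """Assumed mean squared error over a list of (x, y) pairs."""
    return sum(
        (y - two_layer(x, hidden_weights, output_weights, activation)) ** 2
        for x, y in data
    ) / len(data)


# Toy example with two hidden units in dimension d = 2.
hidden_weights = [[1.0, 0.0], [0.0, 1.0]]
output_weights = [0.5, 2.0]
data = [([1.0, -1.0], 0.5), ([0.0, 2.0], 4.0)]

print(outer_norm(output_weights))
print(training_error(data, hidden_weights, output_weights))
```

In this notation, the question studied in the paper is whether a small value of `training_error` on enough samples forces `outer_norm` to be small, regardless of how many hidden units (rows of `hidden_weights`) the network has.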

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Graph Neural Networks