Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks
David Gamarnik, Eren C. Kızıldağ, and Ilias Zadik

TL;DR
This paper proves that for overparameterized two-layer neural networks with non-negative output weights, a small training error implies a controlled outer norm, leading to strong generalization guarantees under mild data assumptions.
Contribution
The paper establishes that, for non-negative output weights, a low training error guarantees a bounded outer norm. The bound is independent of the number of hidden units, requires only polynomially many samples, and holds under mild assumptions on the data.
Findings
Small training error implies controlled outer norm.
Generalization bounds are polynomial in input dimension.
Results are independent of the number of hidden units.
Abstract
We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible, as quantified by the training error, and study the following question: does a low training error guarantee that the norm of the output layer (outer norm) itself is small? We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that, under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in the input dimension) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the…
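The quantities named in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: it assumes the network has the form f(x) = sum_j a_j * sigma(<w_j, x>) with a_j >= 0, that the training error is the mean squared error, and that the outer norm is the l1 norm of the output weights. The function names (`two_layer_net`, `training_error`, `outer_norm`) are hypothetical.

```python
# Illustrative sketch; the model form, the squared-error loss, and the
# l1 "outer norm" are assumptions, not definitions quoted from the paper.

def relu(z):
    """Rectified linear unit, one of the activations named in the abstract."""
    return max(z, 0.0)

def two_layer_net(x, W, a, activation=relu):
    """Evaluate f(x) = sum_j a_j * activation(<w_j, x>) with a_j >= 0."""
    if any(a_j < 0 for a_j in a):
        raise ValueError("output weights must be non-negative")
    total = 0.0
    for w_j, a_j in zip(W, a):
        inner = sum(w_ji * x_i for w_ji, x_i in zip(w_j, x))
        total += a_j * activation(inner)
    return total

def training_error(X, y, W, a, activation=relu):
    """Mean squared error of the network on the pairs (X[i], y[i])."""
    residuals = [y_i - two_layer_net(x_i, W, a, activation)
                 for x_i, y_i in zip(X, y)]
    return sum(r * r for r in residuals) / len(residuals)

def outer_norm(a):
    """l1 norm of the output layer (assumed meaning of 'outer norm')."""
    return sum(abs(a_j) for a_j in a)

# Tiny usage example: two hidden units in dimension two.
W = [[1.0, 0.0], [0.0, -1.0]]   # hidden-layer weights, one row per unit
a = [2.0, 3.0]                  # non-negative output weights
X = [[1.0, 1.0], [2.0, 0.5]]    # inputs
y = [two_layer_net(x, W, a) for x in X]  # labels generated by the network itself

print(training_error(X, y, W, a))  # zero, since the labels come from the same network
print(outer_norm(a))
```

In this framing, the question studied is whether a small value of `training_error` on enough samples forces `outer_norm(a)` to be bounded, regardless of how many rows `W` has.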
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Graph Neural Networks
