General bound of overfitting for MLP regression models

Joseph Rynkiewicz (SAMM)

arXiv:1201.0633 · math.ST · May 10, 2012 · Neurocomputing

Open Access

TL;DR

This paper establishes a universal bound on overfitting for MLP regression models that does not rely on Gaussian noise assumptions, aiding in determining the true model architecture as data size grows.

Contribution

It introduces a new theoretical bound on overfitting for MLPs that holds under weak assumptions, without requiring Gaussian noise or identifiability, and suggests how the bound can guide the selection of the true model architecture.

Findings

01. Derived a universal overfitting bound for MLPs

02. Provided criteria for true architecture selection

03. Validated the bound through theoretical analysis

Abstract

Multilayer perceptrons (MLPs) with one hidden layer have long been used for non-linear regression. However, in some tasks MLPs are overly powerful models, and a small mean square error (MSE) may be due more to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is entirely determined by the behavior of the likelihood ratio test statistic (LRTS); in numerous cases, however, the assumption of normality of the noise is arbitrary, if not false. In this paper, we present a universal bound for the overfitting of such models under weak assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model when the number of data points goes to infinity. As an illustration, we use this theoretical result to…
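
The overfitting effect described in the abstract can be illustrated with a minimal sketch. This is not the paper's method or its bound. It assumes NumPy, a hypothetical helper named `training_mse_by_width`, Student-t noise as a stand-in for non-Gaussian noise, and a simplification in which the hidden-layer weights are drawn at random and frozen, so that only the output weights are fitted by least squares. The target contains no signal at all, yet the training MSE keeps shrinking as hidden units are added.

```python
import numpy as np


def training_mse_by_width(n=200, max_hidden=6, seed=0):
    """Training MSE of nested one-hidden-layer models fitted to pure noise.

    Assumption (not from the paper): hidden weights are random and frozen,
    and only the output layer is fitted, by ordinary least squares.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    # Pure noise target with heavy tails (Student-t, 5 degrees of freedom).
    y = rng.standard_t(df=5, size=n)

    # One fixed set of hidden units; model k uses the first k of them,
    # so the models are nested.
    w = rng.normal(scale=3.0, size=max_hidden)
    b = rng.normal(scale=1.0, size=max_hidden)
    hidden = np.tanh(np.outer(x, w) + b)  # shape (n, max_hidden)

    mses = []
    for k in range(max_hidden + 1):
        # Design matrix: intercept plus the first k hidden-unit outputs.
        design = np.column_stack([np.ones(n), hidden[:, :k]])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ beta
        mses.append(float(np.mean(residuals ** 2)))
    return mses


if __name__ == "__main__":
    for k, mse in enumerate(training_mse_by_width()):
        print(f"hidden units = {k}: training MSE = {mse:.4f}")
```

Because the models are nested, each added hidden unit can only lower the training MSE or leave it unchanged, even though there is nothing to learn. A drop in MSE alone therefore says little about the true architecture, which is the gap an overfitting bound is meant to address.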

Taxonomy

Topics: Neural Networks and Applications · Blind Source Separation Techniques · Statistical Methods and Inference