Consistency for Large Neural Networks: Regression and Classification
Haoran Zhan, Yingcun Xia

TL;DR
This paper investigates the theory of overparameterized neural networks. It shows that their approximation error decreases as parameters are added, that regularization keeps their generalization error bounded, and that they are statistically consistent in regression and classification, which helps explain the tail of the double descent curve.
Contribution
It provides a theoretical analysis of the tail behavior and consistency of deep overparameterized neural networks in regression and classification tasks.
Findings
Approximation error decreases monotonically with more parameters.
Regularization keeps the generalization error bounded.
Deep overparameterized networks are statistically consistent when regularized.
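The first finding can be illustrated with a small numerical sketch. This is not the paper's construction: it assumes a hypothetical target function, a one-hidden-layer ReLU model with fixed random inner weights, and nested model classes in which each wider model keeps all units of the narrower one. Under those assumptions, the best least-squares fit over the output weights can only improve, or stay the same, as width grows.

```python
import numpy as np

def nested_approximation_errors(widths, n_points=2000, seed=0):
    """Best least-squares fit error for nested random ReLU feature classes.

    Illustrative assumptions (not from the paper): the target is sin(3x)
    on [-1, 1], inner weights are fixed random draws, and only the output
    weights are fitted.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_points)
    target = np.sin(3.0 * x)  # hypothetical target function

    max_width = max(widths)
    w = rng.normal(size=max_width)  # fixed random inner weights
    b = rng.normal(size=max_width)  # fixed random biases
    # Column j holds the j-th hidden unit evaluated on the grid.
    features = np.maximum(0.0, np.outer(x, w) + b)

    errors = []
    for m in widths:
        phi = features[:, :m]  # the first m units: classes are nested
        coef, *_ = np.linalg.lstsq(phi, target, rcond=None)
        errors.append(float(np.mean((phi @ coef - target) ** 2)))
    return errors

errors = nested_approximation_errors([1, 2, 4, 8, 16, 32, 64])
for width, err in zip([1, 2, 4, 8, 16, 32, 64], errors):
    print(f"width={width:3d}  best-fit MSE={err:.6f}")
```

The monotone decrease here follows from nesting alone: every function representable at width m is also representable at any larger width. The sketch says nothing about generalization error, which the paper controls through regularization.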
Abstract
Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that the test error curve of neural networks decreases monotonically as model size grows and eventually converges to a non-zero constant. This work aims to explain the theoretical mechanism underlying this tail behavior and to study the statistical consistency of deep overparameterized neural networks in a range of learning tasks, including regression and classification. First, we prove that as the number of parameters increases, the approximation error decreases monotonically, while explicit or implicit regularization (e.g., weight decay) keeps the generalization error non-zero but bounded. Consequently, the overall error curve eventually converges to a…
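The argument in the abstract rests on splitting the error into two parts. A standard way to write such a split is sketched below; the notation is assumed here for illustration (risk R, best possible predictor f^*, model class F_m with m parameters, fitted network \hat f_{n,m} from n samples), and the paper's own definitions may differ.

```latex
\[
R(\hat f_{n,m}) - R(f^{*})
  = \underbrace{R(\hat f_{n,m}) - \inf_{f \in \mathcal{F}_m} R(f)}_{\text{generalization (estimation) error}}
  + \underbrace{\inf_{f \in \mathcal{F}_m} R(f) - R(f^{*})}_{\text{approximation error}}
\]
```

If the classes are nested, so that \mathcal{F}_m \subseteq \mathcal{F}_{m'} whenever m \le m', the second term is non-increasing in m. The tail behavior described above then depends on whether the first term stays bounded as m grows.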
Taxonomy
Topics: Neural Networks and Applications
