Overfitting has a limitation: a model-independent generalization gap bound based on Rényi entropy
Atsushi Suzuki, Jing Wang

TL;DR
This paper presents a model-independent upper bound on the generalization gap based on Rényi entropy, explaining why large models can generalize well when the amount of data is sufficient relative to the Rényi entropy of the data distribution.
Contribution
It introduces a Rényi entropy-based bound on the generalization gap that applies to algorithms whose output depends only on the data histogram, and uses it to give insight into overfitting and the effect of data noise.
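As a minimal sketch of what "depends only on the data histogram" can mean, the Python snippet below checks that an empirical risk computed from a sample equals the one computed from its histogram (value counts), so the ordering of the sample is irrelevant. The sample, the 0-1 loss, and the function names are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter
import random


def empirical_risk(sample, loss):
    """Average loss over the sample, point by point."""
    return sum(loss(z) for z in sample) / len(sample)


def empirical_risk_from_histogram(hist, loss):
    """Same quantity, computed only from the counts of each distinct data point."""
    n = sum(hist.values())
    return sum(count * loss(z) for z, count in hist.items()) / n


# Hypothetical labelled sample of (feature, label) pairs.
sample = [("a", 1), ("b", 0), ("a", 1), ("c", 1), ("b", 0)]


def loss(z):
    # Hypothetical 0-1 loss of a constant predictor that always outputs 1.
    return float(z[1] != 1)


hist = Counter(sample)            # the data histogram
shuffled = sample[:]
random.Random(0).shuffle(shuffled)  # reordering leaves the histogram unchanged

risk_direct = empirical_risk(sample, loss)
risk_shuffled = empirical_risk(shuffled, loss)
risk_hist = empirical_risk_from_histogram(hist, loss)
```

Because the objective is a function of the histogram alone, any procedure that deterministically minimizes it inherits the same property; this is only an illustration of the idea, not the paper's formal definition.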
Findings
The bound depends solely on the data distribution's Rényi entropy.
Large models can generalize well if the data size is sufficiently large relative to that entropy.
Adding noise increases Rényi entropy, degrading generalization.
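The findings above can be illustrated numerically with a small sketch. It computes the Rényi entropy of order alpha for a discrete distribution, H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), and compares a distribution before and after noise. The order alpha = 0.5, the example distribution, and the noise model (mixing with the uniform distribution) are assumptions chosen for illustration; the summary does not state which order or noise model the paper uses.

```python
import math


def renyi_entropy(p, alpha):
    """Rényi entropy in nats of a discrete distribution p, for order alpha >= 0.

    alpha == 1 is handled as the Shannon limit.
    """
    probs = [x for x in p if x > 0]
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log(x) for x in probs)
    return math.log(sum(x ** alpha for x in probs)) / (1.0 - alpha)


def mix_with_uniform(p, eps):
    """Assumed noise model: with probability eps, replace the draw by a uniform one."""
    k = len(p)
    return [(1.0 - eps) * x + eps / k for x in p]


ALPHA = 0.5                      # assumed order, for illustration only
p = [0.7, 0.2, 0.05, 0.05]       # hypothetical data distribution

clean_entropy = renyi_entropy(p, ALPHA)
noisy_entropy = renyi_entropy(mix_with_uniform(p, 0.3), ALPHA)
uniform_entropy = renyi_entropy([0.25] * 4, ALPHA)  # equals log(4), the maximum on 4 outcomes
```

Under this noise model the noisy distribution is closer to uniform, so its Rényi entropy is at least as large as the clean one and is capped by log(4) for four outcomes.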
Abstract
Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding the generalization gap, which is the impact of overfitting. Understanding the generalization gap behavior of increasingly large-scale machine learning models remains a significant area of investigation, as conventional analyses often link error bounds to model complexity, failing to fully explain the success of extremely large architectures. This research introduces a novel perspective by establishing a model-independent upper bound for the generalization gap applicable to algorithms whose outputs are determined solely by the data's histogram, such as empirical risk minimization or gradient-based methods. Crucially, this bound is shown to depend only on the Rényi entropy of the data-generating distribution, suggesting that a small generalization…
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification
