Rethinking generalization of classifiers in separable classes scenarios and over-parameterized regimes
Julius Martinetz, Christoph Linse, Thomas Martinetz

TL;DR
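This paper analyzes how classifiers generalize when classes are separable or the classifier is over-parameterized. It shows that the proportion of zero-training-error global minima that generalize poorly shrinks exponentially with the number of training data, and it provides bounds and a model whose predictions match experimental results.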
Contribution
It introduces a theoretical framework that explains generalization in over-parameterized models through the density distribution of the true error over the classifier function set, supported by empirical validation.
Findings
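The proportion of 'bad' global minima decreases exponentially with the number of training data n.
The bounds depend only on the density distribution of the true error for the classifier function set, not on the set's size or complexity (e.g., number of parameters).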
Model predictions align with experiments on MNIST and CIFAR-10.
Abstract
We investigate the learning dynamics of classifiers in scenarios where classes are separable or classifiers are over-parameterized. In both cases, Empirical Risk Minimization (ERM) results in zero training error. However, there are many global minima with a training error of zero, some of which generalize well and some of which do not. We show that in separable classes scenarios the proportion of "bad" global minima diminishes exponentially with the number of training data n. Our analysis provides bounds and learning curves dependent solely on the density distribution of the true error for the given classifier function set, irrespective of the set's size or complexity (e.g., number of parameters). This observation may shed light on the unexpectedly good generalization of over-parameterized Neural Networks. For the over-parameterized scenario, we propose a model for the density…
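The exponential decay of "bad" global minima can be illustrated with a small numerical sketch. It relies on one standard fact: a fixed classifier with true error e classifies n i.i.d. training samples without error with probability (1 - e)^n. Everything else here is an assumption made for illustration and is not taken from the paper: the function name `bad_minima_fraction`, the Gaussian-shaped density centered at error 0.5, and the threshold 0.1 that defines a "bad" minimum.

```python
import numpy as np

def bad_minima_fraction(eps, density, n, threshold):
    """Expected fraction of zero-training-error classifiers with true error > threshold.

    eps       : uniform grid of true-error values in [0, 1]
    density   : assumed (hypothetical) density of true error over the function set
    n         : number of i.i.d. training samples
    threshold : true error above which a minimum counts as "bad" (illustrative choice)
    """
    # A classifier with true error e fits all n samples with probability (1 - e)^n,
    # so the density of surviving (zero-training-error) classifiers is reweighted by it.
    survive = density * (1.0 - eps) ** n
    return survive[eps > threshold].sum() / survive.sum()

eps = np.linspace(0.0, 1.0, 1001)
# Hypothetical density: most classifiers in the set sit near chance level (error 0.5).
density = np.exp(-((eps - 0.5) ** 2) / (2 * 0.1 ** 2))

for n in (10, 100, 1000):
    print(n, bad_minima_fraction(eps, density, n, threshold=0.1))
```

In this toy setting the printed fraction shrinks as n grows, even though the assumed density places almost all classifiers near chance level. The result depends only on the error density and n, not on how many classifiers or parameters the function set contains, which mirrors the claim made in the abstract.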
