Good Classifiers are Abundant in the Interpolating Regime

Ryan Theisen; Jason M. Klusowski; Michael W. Mahoney

arXiv:2006.12625·stat.ML·March 5, 2021

TL;DR

This paper introduces a statistical mechanics-inspired method to analyze the distribution of test errors among interpolating classifiers, revealing that most interpolating classifiers perform well.

Contribution

It develops a new methodology to compute the full distribution of test errors for interpolating classifiers, challenging traditional uniform convergence bounds.

Findings

01

Test errors concentrate around a small typical value $ε^{*}$.

02

Bad classifiers are extremely rare among interpolating models.

03

The distribution of test errors can be characterized analytically in simple settings.

Abstract

Within the machine learning community, the widely-used uniform convergence framework has been used to answer the question of how complex, over-parameterized models can generalize well to new data. This approach bounds the test error of the worst-case model one could have fit to the data, but it has fundamental limitations. Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes. We apply our method to compute this distribution for several real and synthetic datasets, with both linear and random feature classification models. We find that test errors tend to concentrate around a small typical value $ε^{*}$, which deviates substantially from the test error of the worst-case interpolating model on the same datasets,…
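To make the idea of a "distribution of test errors among interpolating classifiers" concrete, the following is a minimal Monte Carlo sketch and not the paper's methodology. It assumes a synthetic Gaussian dataset labeled by a hypothetical linear teacher, an over-parameterized linear model (more dimensions than training points), and an arbitrary sampling scheme over interpolators chosen only for illustration. The function name `sample_interpolator_errors` and all of its parameters are assumptions introduced here.

```python
import numpy as np

def sample_interpolator_errors(n_train=20, dim=40, n_test=2000,
                               n_samples=500, seed=0):
    """Sample linear classifiers with zero training error and
    return their (train_errors, test_errors) on synthetic data.

    Assumption: the sampling distribution over interpolators below
    is an illustrative choice, not the one defined in the paper.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical linear teacher that generates all labels.
    teacher = rng.standard_normal(dim)
    X_train = rng.standard_normal((n_train, dim))
    y_train = np.sign(X_train @ teacher)
    X_test = rng.standard_normal((n_test, dim))
    y_test = np.sign(X_test @ teacher)

    # With dim > n_train, X_train has full row rank (almost surely),
    # so X_train @ w can be set to any target vector of margins.
    X_pinv = np.linalg.pinv(X_train)            # shape (dim, n_train)
    null_proj = np.eye(dim) - X_pinv @ X_train  # projector onto null space

    train_errors = np.empty(n_samples)
    test_errors = np.empty(n_samples)
    for k in range(n_samples):
        # Random positive margins with the correct sign on every point.
        margins = y_train * (0.1 + np.abs(rng.standard_normal(n_train)))
        # Any null-space component leaves the training predictions unchanged.
        w = X_pinv @ margins + null_proj @ rng.standard_normal(dim)
        train_errors[k] = np.mean(np.sign(X_train @ w) != y_train)
        test_errors[k] = np.mean(np.sign(X_test @ w) != y_test)
    return train_errors, test_errors

if __name__ == "__main__":
    train_err, test_err = sample_interpolator_errors()
    print("max training error:", train_err.max())
    print("test error quantiles (5%, 50%, 95%):",
          np.quantile(test_err, [0.05, 0.5, 0.95]))
```

Every sampled classifier fits the training labels exactly, so the spread of the returned test errors is an empirical stand-in for the distribution the abstract describes. How tightly it concentrates depends on the assumed sampling scheme and on the ratio of `dim` to `n_train`.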

Taxonomy

Topics: Housing Market and Economics · Italy: Economic History and Contemporary Issues · Imbalanced Data Classification Techniques