The role of regularization in classification of high-dimensional noisy Gaussian mixture

Francesca Mignacco; Florent Krzakala; Yue M. Lu; Lenka Zdeborová

arXiv:2002.11544·stat.ML·March 22, 2021·6 cites

TL;DR

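This paper rigorously analyzes the generalization error of regularized convex classifiers (ridge, hinge, and logistic regression) on a high-dimensional noisy mixture of two Gaussians. It shows that regularization can, in some cases, reach Bayes-optimal performance, and that an interpolation peak appears at low regularization.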

Contribution

It provides a rigorous analysis of the generalization error of regularized convex classifiers on a high-dimensional mixture of two Gaussians, in the limit where the number of samples $n$ and the dimension $d$ go to infinity at a fixed ratio $α = n / d$.

Findings

01

In some cases, regularization allows the classifier to reach Bayes-optimal performance in the high-dimensional noisy setting.

02

An interpolation peak appears at low regularization.

03

The size imbalance of clusters affects classifier performance.

Abstract

We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and their dimension $d$ go to infinity while their ratio is fixed to $α = n / d$. We discuss surprising effects of the regularization, which in some cases allows the classifier to reach Bayes-optimal performance. We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters.
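
The setting described in the abstract can be explored numerically at finite size. The following is a minimal sketch, not the authors' code or their asymptotic formulas. It assumes a particular parameterization that the abstract does not spell out: two balanced clusters with labels $y = ±1$, centers at $± v / \sqrt{d}$ with $v$ drawn from a standard Gaussian, and isotropic noise of variance $Δ$ (written `delta` below). It fits only the ridge classifier, in closed form, and compares its test error with the error of an oracle that knows $v$. All function names and parameters are hypothetical.

```python
import math
import numpy as np


def sample_mixture(n, d, delta, v, rng):
    """Draw n points from two balanced Gaussian clusters centered at +/- v / sqrt(d).

    Assumed model (not stated in the abstract): x = y * v / sqrt(d) + sqrt(delta) * z,
    with y uniform in {-1, +1} and z standard Gaussian noise.
    """
    y = rng.choice([-1.0, 1.0], size=n)
    noise = np.sqrt(delta) * rng.standard_normal((n, d))
    X = y[:, None] * v[None, :] / np.sqrt(d) + noise
    return X, y


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on the +/-1 labels (no bias term)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def classification_error(w, X, y):
    """Fraction of points whose predicted sign disagrees with the label."""
    return float(np.mean(np.sign(X @ w) != y))


def oracle_error(v, d, delta):
    """Error of the rule sign(v . x) under the assumed model: Q(||v|| / sqrt(d * delta))."""
    snr = np.linalg.norm(v) / math.sqrt(d * delta)
    return 0.5 * math.erfc(snr / math.sqrt(2.0))


def error_vs_alpha(alphas, d, delta, lam, seed=0, n_test=4000, n_trials=3):
    """Average ridge test error for each sample ratio alpha = n / d."""
    rng = np.random.default_rng(seed)
    errors = []
    for alpha in alphas:
        n = int(alpha * d)
        trial_errors = []
        for _ in range(n_trials):
            v = rng.standard_normal(d)
            X, y = sample_mixture(n, d, delta, v, rng)
            X_test, y_test = sample_mixture(n_test, d, delta, v, rng)
            w = ridge_fit(X, y, lam)
            trial_errors.append(classification_error(w, X_test, y_test))
        errors.append(float(np.mean(trial_errors)))
    return errors


if __name__ == "__main__":
    alphas = [0.5, 1.0, 2.0, 4.0]
    for lam in (1e-6, 10.0):
        # A tiny lam approximates the unregularized fit; a larger lam regularizes it.
        print(lam, error_vs_alpha(alphas, d=100, delta=1.0, lam=lam))
```

Running the sweep with a very small `lam` and with a larger one, for values of `alpha` around 1, is one way to look for the interpolation peak mentioned in the abstract. Finite-size simulations of this kind fluctuate from run to run and only approximate the high-dimensional limit analyzed in the paper.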

Videos

The Role of Regularization in Classification of High-dimensional Noisy Gaussian Mixture· slideslive

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models