Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation
Stanislav Minsker, Mohamed Ndaoud, Yiqiu Shen

TL;DR
This paper analyzes supervised clustering in high-dimensional anisotropic Gaussian mixtures, showing that interpolating classifiers can outperform regularized ones and can be robust to corruption of the noise covariance, which challenges the conventional view that fitting the training labels exactly hurts generalization.
Contribution
It provides the first minimax risk bounds for supervised clustering in anisotropic Gaussian mixtures and demonstrates the optimality and robustness of interpolation in this setting.
Findings
Interpolating classifiers can outperform regularized classifiers in high dimensions.
Interpolation can be robust to corruption of the noise covariance when the signal is aligned with the "clean" part of the covariance.
LDA is sub-optimal in the high-dimensional minimax sense for this problem.
Abstract
We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model in high dimensions and in the non-asymptotic setting. We first derive a lower and a matching upper bound for the minimax risk of clustering in this framework. We also show that in the high-dimensional regime, the linear discriminant analysis (LDA) classifier turns out to be sub-optimal in the minimax sense. Next, we characterize precisely the risk of ℓ2-regularized supervised least squares classifiers. We deduce the fact that the interpolating solution may outperform the regularized classifier, under mild assumptions on the covariance structure of the noise. Our analysis also shows that interpolation can be robust to corruption in the covariance of the noise when the signal is aligned with the "clean" part of the covariance, for the properly defined notion of alignment. To the…
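The comparison between regularized and interpolating least squares classifiers described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes the standard two-component model X_i = η_i θ + Σ^{1/2} ξ_i with labels η_i ∈ {-1, +1} and standard Gaussian ξ_i, and it assumes a diagonal covariance Σ for simplicity. The function names, the dimensions, and the choice of θ and Σ are all hypothetical. The direction w_λ = Xᵀ(XXᵀ + λI)⁻¹η is the ridge least squares solution, and λ = 0 gives the minimum-norm interpolator when p > n. For a fixed direction w, the misclassification risk of sign(⟨w, x⟩) under this model is Φ(-⟨w, θ⟩ / sqrt(wᵀΣw)), which the code evaluates exactly.

```python
import math
import numpy as np


def sample_mixture(n, theta, sigma_diag, rng):
    """Draw n labeled points X_i = eta_i * theta + Sigma^{1/2} xi_i (diagonal Sigma assumed)."""
    labels = rng.choice([-1.0, 1.0], size=n)
    noise = rng.standard_normal((n, theta.size)) * np.sqrt(sigma_diag)
    return labels[:, None] * theta + noise, labels


def least_squares_direction(X, labels, lam):
    """Ridge solution w = X^T (X X^T + lam I)^{-1} labels; lam = 0 interpolates when p > n."""
    n = X.shape[0]
    gram = X @ X.T + lam * np.eye(n)
    return X.T @ np.linalg.solve(gram, labels)


def exact_risk(w, theta, sigma_diag):
    """Misclassification probability of sign(<w, x>): Phi(-<w, theta> / sqrt(w^T Sigma w))."""
    margin = float(w @ theta) / math.sqrt(float(np.sum(sigma_diag * w ** 2)))
    return 0.5 * math.erfc(margin / math.sqrt(2.0))


rng = np.random.default_rng(0)
n, p = 50, 500  # high-dimensional regime: p much larger than n (illustrative values)

# Hypothetical anisotropic noise: half of the coordinates are noisy, half are quiet.
sigma_diag = np.concatenate([np.full(p // 2, 4.0), np.full(p - p // 2, 0.25)])

# Hypothetical signal placed on the low-noise coordinates, with Euclidean norm 2.
theta = np.zeros(p)
theta[p // 2:] = 2.0 / math.sqrt(p - p // 2)

X, labels = sample_mixture(n, theta, sigma_diag, rng)

directions = {}
risks = {}
for lam in (0.0, 1.0, 100.0):
    directions[lam] = least_squares_direction(X, labels, lam)
    risks[lam] = exact_risk(directions[lam], theta, sigma_diag)
    print(f"lambda = {lam:6.1f}  risk = {risks[lam]:.4f}")
```

Which value of λ gives the lowest risk depends on the chosen θ and Σ, so the printed numbers only illustrate the comparison and do not reproduce any result from the paper.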
Taxonomy
Topics: Bayesian Methods and Mixture Models · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference
