Kernel spectral clustering of large dimensional data
Romain Couillet, Florent Benaych-Georges

TL;DR
This paper analyzes kernel spectral clustering in high-dimensional settings, showing that the normalized Laplacian behaves like a spiked random matrix, with eigenvalues and eigenvectors revealing clustering information, validated on MNIST data.
Contribution
It provides a theoretical analysis of kernel spectral clustering in large dimensions, linking eigenstructure to clustering performance under Gaussian mixture models.
Findings
Isolated eigenvalue-eigenvector pairs of the normalized Laplacian carry the clustering information.
Theoretical predictions match MNIST clustering results.
A separability condition, classical in spiked matrix models, determines when spectral clustering is effective in high dimensions.
Abstract
This article proposes a first analysis of kernel spectral clustering methods in the regime where the dimension of the data vectors to be clustered and their number grow large at the same rate. We demonstrate, under a k-class Gaussian mixture model, that the normalized Laplacian matrix associated with the kernel matrix asymptotically behaves similarly to a so-called spiked random matrix. Some of the isolated eigenvalue-eigenvector pairs in this model are shown to carry the clustering information upon a separability condition classical in spiked matrix models. We evaluate precisely the position of these eigenvalues and the content of the eigenvectors, which unveil important (sometimes quite disruptive) aspects of kernel spectral clustering from both theoretical and practical standpoints. Our results are then compared to the actual clustering performance of images from the MNIST…
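
The pipeline described in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It draws a two-class Gaussian mixture, builds a kernel matrix, forms the normalized Laplacian, and reads the class labels off the second dominant eigenvector. Everything specific here is an assumption made for illustration: the kernel function f(t) = exp(-t/2), the scalings (squared distances divided by p, Laplacian multiplied by n), the sizes n and p, the class means, and the function name. Thresholding the eigenvector at zero stands in for the k-means step that would be needed with more than two classes.

```python
import numpy as np


def spectral_cluster_two_classes(X):
    """Cluster the rows of X into two groups from a kernel normalized Laplacian.

    Illustrative assumptions: Gaussian-type kernel f(t) = exp(-t / 2) applied
    to squared distances scaled by the dimension p, and L = n D^{-1/2} K D^{-1/2}.
    """
    n, p = X.shape
    sq_norms = np.sum(X**2, axis=1)
    # Pairwise squared Euclidean distances, clipped to remove tiny negatives.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-sq_dists / (2.0 * p))           # kernel matrix
    d = K.sum(axis=1)                           # degree vector, D = diag(d)
    L = n * K / np.sqrt(np.outer(d, d))         # n D^{-1/2} K D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    # The top eigenvector is proportional to D^{1/2} 1 and carries no class
    # information; the next one is used to split the data by sign.
    labels = (eigvecs[:, -2] > 0).astype(int)
    return labels, eigvals


rng = np.random.default_rng(0)
n, p = 400, 100                                 # both "large", n / p fixed
mu = np.zeros(p)
mu[0] = 3.0                                     # hypothetical class mean
y = np.repeat([0, 1], n // 2)                   # ground-truth classes
X = rng.standard_normal((n, p)) + np.where(y[:, None] == 0, mu, -mu)

labels, eigvals = spectral_cluster_two_classes(X)
agreement = np.mean(labels == y)
accuracy = max(agreement, 1.0 - agreement)      # labels are defined up to a swap
print(f"clustering accuracy: {accuracy:.3f}")
print("three largest eigenvalues:", eigvals[-3:])
```

In this sketch the largest eigenvalue equals n by construction, and with well-separated means a second eigenvalue detaches from the rest of the spectrum, which is the spiked-matrix picture the abstract refers to. Shrinking the entry of `mu` weakens the separation and eventually degrades the accuracy.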
