The geometry of kernelized spectral clustering
Geoffrey Schiebinger, Martin J. Wainwright, Bin Yu

TL;DR
This paper analyzes the geometry of kernelized spectral clustering, showing that it recovers the latent labels of samples from a nonparametric mixture model when the overlap between mixture components is small relative to their indivisibility.
Contribution
It provides a theoretical framework linking the geometry of spectral clustering to the separation of mixture components in nonparametric settings.
Findings
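The principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities when the overlap is small compared to the indivisibility of the components.
Embedded samples from different components are nearly orthogonal when the sample size is large.
The fraction of samples mislabeled by spectral clustering can be controlled under finite mixture models.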
Abstract
Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same…
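The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a Gaussian kernel, takes the normalized Laplacian to be the matrix D^{-1/2} K D^{-1/2} (a convention under which the top eigenvectors are the informative ones), and normalizes each embedded row to unit length. The function name `spectral_embedding`, the `bandwidth` parameter, and the toy two-component Gaussian mixture are hypothetical choices made for illustration.

```python
import numpy as np

def spectral_embedding(X, k, bandwidth=1.0):
    """Embed the rows of X using the top-k eigenvectors of D^{-1/2} K D^{-1/2}.

    Assumptions (not taken from the paper): Gaussian kernel, unit-norm rows.
    """
    # Pairwise squared distances, clipped at zero for numerical safety.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))       # kernel matrix
    deg = K.sum(axis=1)                            # degrees (row sums)
    L = K / np.sqrt(np.outer(deg, deg))            # D^{-1/2} K D^{-1/2}
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    V = vecs[:, -k:]                               # top-k eigenvectors
    return V / np.linalg.norm(V, axis=1, keepdims=True)

# Toy mixture: two well-separated 1-D Gaussian components (illustrative only).
rng = np.random.default_rng(0)
n = 100
X = np.concatenate([rng.normal(-4.0, 1.0, size=(n, 1)),
                    rng.normal(4.0, 1.0, size=(n, 1))])
labels = np.repeat([0, 1], n)

V = spectral_embedding(X, k=2)
cos = np.abs(V @ V.T)                              # |cosine| between embedded samples
between = cos[labels == 0][:, labels == 1].mean()  # across components
within = cos[labels == 0][:, labels == 0].mean()   # inside one component
print(f"mean |cos| between components: {between:.3f}")
print(f"mean |cos| within a component: {within:.3f}")
```

On a mixture with little overlap, the between-component cosines should come out close to zero and the within-component cosines close to one, which is the near-orthogonality the findings refer to. Running k-means on the rows of `V` would then produce the cluster labels.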
Taxonomy
Methods: Spectral Clustering
