Data spectroscopy: Eigenspaces of convolution operators and clustering
Tao Shi, Mikhail Belkin, Bin Yu

TL;DR
This paper introduces a theoretical framework and a clustering algorithm, DaSpec, built on the eigenvectors of kernel-based data adjacency matrices, with the aim of improving cluster detection in unbalanced and complex-shaped data.
Contribution
The paper develops population-level analyses of which eigenvectors to use in spectral clustering and introduces DaSpec, an algorithm that determines the number of clusters automatically, groups the data accordingly, and improves clustering accuracy.
Findings
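DaSpec handles unbalanced groups effectively.
It recovers clusters of various shapes better than existing methods.
A fixed number of top eigenvectors can both contain redundant clustering information and miss relevant clustering information, which helps explain when and why spectral methods succeed or fail.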
Abstract
This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm that utilizes properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the…
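The eigenvector-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian choice of radial kernel, the sign-change tolerance `max|v| / n`, and the `min_rel_eigval` numerical guard are all assumptions made for illustration. The sketch keeps eigenvectors of the kernel matrix that show no sign change up to a tolerance, takes their count as the number of clusters, and labels each point by the kept eigenvector with the largest absolute value at that point.

```python
import numpy as np


def gaussian_kernel_matrix(X, bandwidth):
    """Pairwise Gaussian (radial) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from round-off
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def no_sign_change(v, eps):
    """True if all entries of v share one sign, up to tolerance eps."""
    return bool(np.all(v > -eps) or np.all(v < eps))


def daspec_sketch(X, bandwidth, min_rel_eigval=1e-8):
    """Illustrative eigenvector selection and labelling (hypothetical helper).

    Returns (labels, number_of_selected_eigenvectors).
    """
    n = X.shape[0]
    K = gaussian_kernel_matrix(X, bandwidth)

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(K / n)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    selected = []
    for j in range(n):
        # Assumed guard: skip eigenvectors whose eigenvalue is numerically zero.
        if eigvals[j] <= min_rel_eigval * eigvals[0]:
            break
        v = eigvecs[:, j]
        eps = np.max(np.abs(v)) / n  # assumed tolerance, not from the source
        if no_sign_change(v, eps):
            selected.append(j)

    if not selected:
        raise ValueError("no eigenvector without sign change was found")

    # Label each point by the selected eigenvector that dominates there.
    labels = np.argmax(np.abs(eigvecs[:, selected]), axis=1)
    return labels, len(selected)
```

As a usage example, two well-separated groups of unequal size (say 80 and 20 points) passed to `daspec_sketch` with a bandwidth comparable to the within-group spread should yield one sign-constant eigenvector per group. Scanning all eigenvectors rather than a fixed number of leading ones reflects the abstract's point that the top few may be redundant or may miss a small group.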
