Spectral Clustering on Large Datasets: When Does it Work? Theory from Continuous Clustering and Density Cheeger-Buser
Timothy Chu, Gary Miller, Noel Walkington

TL;DR
This paper provides a theoretical analysis of spectral clustering on large datasets drawn from probability densities, identifying conditions under which it effectively finds true clusters and introducing a new Cheeger-Buser inequality for densities.
Contribution
It introduces a continuous form of spectral clustering analysis, proving when it aligns with true density clusters and establishing a new Cheeger-Buser inequality applicable to all probability densities.
Findings
Spectral clustering works well on mixtures of Laplace distributions.
It performs poorly on certain densities like the 'square-root trough'.
A new Cheeger-Buser inequality for all probability densities is established.
Abstract
Spectral clustering is one of the most popular clustering algorithms that has stood the test of time. It is simple to describe, can be implemented using standard linear algebra, and often finds better clusters than traditional clustering algorithms like k-means and k-centers. The foundational algorithm for two-way spectral clustering, by Shi and Malik, creates a geometric graph from data and finds a spectral cut of the graph. In modern machine learning, many data sets are modeled as a large number of points drawn from a probability density function. Little is known about when spectral clustering works in this setting, and when it doesn't. Past researchers justified spectral clustering by appealing to the graph Cheeger inequality (which states that the spectral cut of a graph approximates the "Normalized Cut"), but this justification is known to break down on large data sets.…
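For readers unfamiliar with the two-way procedure the abstract refers to, the following is a minimal sketch of a Shi-Malik-style spectral cut on a finite point set. It is not the paper's continuous analysis. The fully connected Gaussian-weighted graph, the bandwidth `sigma`, the function name `two_way_spectral_cut`, and the synthetic two-blob data are all assumptions chosen for illustration.

```python
import numpy as np


def two_way_spectral_cut(points, sigma=1.0):
    """Split points into two clusters via a sweep cut on the second eigenvector."""
    n = len(points)
    # Geometric graph: fully connected with Gaussian edge weights
    # (an assumed kernel choice, not taken from the paper).
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)

    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; take the second eigenvector
    # and rescale it to solve the generalized problem (D - W) v = lambda D v.
    _, vecs = np.linalg.eigh(L)
    fiedler = d_inv_sqrt * vecs[:, 1]

    # Sweep cut: try every threshold along the sorted eigenvector and keep
    # the split with the smallest Normalized Cut value.
    order = np.argsort(fiedler)
    total_vol = deg.sum()
    best_ncut, best_k = np.inf, 1
    for k in range(1, n):
        in_a = np.zeros(n, dtype=bool)
        in_a[order[:k]] = True
        cut = W[in_a][:, ~in_a].sum()
        vol_a = deg[in_a].sum()
        ncut = cut / vol_a + cut / (total_vol - vol_a)
        if ncut < best_ncut:
            best_ncut, best_k = ncut, k

    labels = np.zeros(n, dtype=int)
    labels[order[best_k:]] = 1
    return labels, best_ncut


# Hypothetical example data: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
blob_b = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(30, 2))
points = np.vstack([blob_a, blob_b])
labels, ncut = two_way_spectral_cut(points)
```

On well-separated data like this, the sweep cut should recover the two blobs. The dense n-by-n weight matrix and the full eigendecomposition keep the sketch short but would not scale to the large-sample regime the paper studies.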
Taxonomy
Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Face and Expression Recognition
Methods: Test · Spectral Clustering
