Clustering with Spectral Norm and the k-means Algorithm
Amit Kumar, Ravindran Kannan

TL;DR
This paper introduces a simple, model-agnostic clustering algorithm based on spectral norms and the k-means method, which effectively clusters data under a proximity condition without relying on probabilistic generative assumptions.
Contribution
The authors establish a proximity condition under which k-means provably recovers the clustering without any generative assumptions, and they prove that k-means converges even in the presence of spurious points.
Findings
The proximity condition is satisfied in common generative models.
The k-means algorithm converges to the true centers despite spurious points.
The paper presents new techniques for boosting the ratio of inter-center separation to standard deviation.
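The convergence finding above concerns k-means iterations of the Lloyd type. The following is a minimal sketch of such iterations, not the authors' exact procedure. The function names `project_to_top_k` and `lloyd`, the fixed iteration count, and the use of a rank-k SVD projection as a preprocessing step are all assumptions made for illustration; this summary does not spell out the stages of the paper's algorithm.

```python
import numpy as np

def project_to_top_k(points, k):
    """Best rank-k approximation of the data matrix (rows are points).

    Assumption: a spectral preprocessing step of this kind precedes
    k-means; the summary itself does not state the exact step used.
    """
    u, s, vt = np.linalg.svd(points, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]

def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def lloyd(points, initial_centers, iterations=20):
    """Plain Lloyd iterations starting from the given centers."""
    points = np.asarray(points, dtype=float)
    centers = np.array(initial_centers, dtype=float)
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(len(centers)):
            members = points[labels == j]
            # An empty cluster keeps its previous center.
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, assign(points, centers)

# Hypothetical toy data: two tight groups, rough initial centers.
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_centers, toy_labels = lloyd(toy_points, [[1.0, 0.0], [9.0, 0.0]])
```

On this toy input the initial centers are already close to the two groups, which mirrors the setting of the convergence finding: the iterations only have to refine a reasonable starting estimate.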
Abstract
There has been much progress on efficient algorithms for clustering data points generated by a mixture of k probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least Ω(k) standard deviations. These results generally make heavy use of the generative model and particular properties of the distributions. In this paper, we show that a simple clustering algorithm works without assuming any generative (probabilistic) model. Our only assumption is what we call a "proximity condition": the projection of any data point onto the line joining its cluster center to any other cluster center is Ω(k) standard deviations closer to its own center than the other center. Here the notion of standard deviations is based on the spectral norm of the matrix whose rows represent…
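The proximity condition described in the abstract can be checked numerically for a given clustering. The sketch below is illustrative only: the function names, the scale `spectral_sigma` (spectral norm of the point-minus-center matrix divided by the square root of the number of points), and the single `threshold_factor` parameter are assumptions, and the paper's exact threshold is not reproduced here. It also assumes the cluster centers are pairwise distinct.

```python
import numpy as np

def proximity_margins(points, labels, centers):
    """margins[i, s]: how much closer the projection of point i is to its
    own center than to center s, measured along the line joining them.
    Entries for a point's own cluster are set to +inf."""
    n, k = len(points), len(centers)
    margins = np.full((n, k), np.inf)
    for i in range(n):
        r = labels[i]
        for s in range(k):
            if s == r:
                continue
            direction = centers[s] - centers[r]
            length = np.linalg.norm(direction)
            # Signed position of the projection on the line, from center r.
            t = np.dot(points[i] - centers[r], direction) / length
            margins[i, s] = abs(length - t) - abs(t)
    return margins

def spectral_sigma(points, labels, centers):
    """Assumed 'standard deviation' scale: spectral norm of the matrix
    whose rows are (point - its assigned center), divided by sqrt(n)."""
    residual = points - centers[labels]
    return np.linalg.norm(residual, 2) / np.sqrt(len(points))

def satisfies_proximity(points, labels, centers, threshold_factor):
    """True if every margin is at least threshold_factor * sigma.
    threshold_factor is a hypothetical stand-in for the Omega(k) factor."""
    sigma = spectral_sigma(points, labels, centers)
    margins = proximity_margins(points, labels, centers)
    return bool(margins.min() >= threshold_factor * sigma)

# Hypothetical toy clustering: two well-separated groups.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
lab = np.array([0, 0, 1, 1])
ctr = np.array([[0.0, 0.5], [10.0, 0.5]])
margins = proximity_margins(pts, lab, ctr)
sigma = spectral_sigma(pts, lab, ctr)
```

Note that the check depends only on the data matrix, the labels, and the centers; no probabilistic model enters, which is the point the abstract makes.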
Taxonomy
Topics: Data Management and Algorithms · Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research
