Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering
Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn

TL;DR
This paper proves that Johnson-Lindenstrauss projections approximately preserve the cost of the optimal clustering, and of every clustering, for Euclidean k-means and k-medians. The required target dimension is nearly optimal, and the result resolves open problems posed by Cohen et al. and by Kannan.
Contribution
It establishes nearly optimal dimension bounds for Johnson-Lindenstrauss transforms preserving clustering costs, solving open problems for k-means and k-medians.
Findings
The optimal k-means and k-medians cost is preserved within a factor of (1+ε), and so is the cost of every clustering.
Dimension bound of O(log(k/ε)/ε^2) is nearly optimal.
Results extend to p-th power Euclidean distances for any constant p.
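The objective and the dimension bound in the findings can be made concrete with a short Python sketch. The function names are hypothetical. The constant `c` in `target_dimension` is an assumption, because the bound is only stated up to constant factors. Note that the target dimension depends only on k and ε, not on the number of points.

```python
import math


def target_dimension(k: int, eps: float, c: float = 1.0) -> int:
    """Return m = ceil(c * log(k / eps) / eps**2).

    ASSUMPTION: c is a hypothetical placeholder. The bound is stated as
    O(log(k/eps)/eps^2), so the true constant is not given here.
    """
    if k < 1 or not (0.0 < eps < 1.0):
        raise ValueError("need k >= 1 and 0 < eps < 1")
    return math.ceil(c * math.log(k / eps) / eps ** 2)


def clustering_cost(points, centers, assignment, p: float) -> float:
    """Sum over points of ||x - center(x)||^p.

    p = 2 gives the k-means objective and p = 1 the k-medians objective.
    Other constant p give the generalized objective from the findings.
    """
    total = 0.0
    for x, j in zip(points, assignment):
        total += math.dist(x, centers[j]) ** p
    return total


print(target_dimension(k=10, eps=0.1))  # 461 with the placeholder c = 1.0
pts = [(0.0, 0.0), (3.0, 0.0)]
ctr = [(1.0, 0.0)]
print(clustering_cost(pts, ctr, [0, 0], p=2))  # 1^2 + 2^2 = 5.0
print(clustering_cost(pts, ctr, [0, 0], p=1))  # 1 + 2 = 3.0
```

The `n`-independence is the notable part. A classical Johnson-Lindenstrauss argument that preserves all pairwise distances needs a dimension that grows with log n.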
Abstract
Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of (1+ε) under a projection onto a random O(log(k/ε)/ε^2)-dimensional subspace. Further, the cost of every clustering is preserved within (1+ε). More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the p-th power for any constant p. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.
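The abstract's main claim can be checked numerically on a single instance. This Python sketch is an illustration under stated assumptions and is not the paper's construction or proof:

* The map is a dense Gaussian matrix scaled by 1/sqrt(m), which is one example of a map with sub-Gaussian tails.
* The data is synthetic.
* The target dimension m = 60 is chosen by hand rather than derived from the bound.
* The function names are hypothetical.

The sketch compares the k-means cost of one fixed partition before and after projection. The theorem makes the stronger statement that all partitions are preserved simultaneously.

```python
import math
import random


def gaussian_projection(points, m, seed=0):
    """Map d-dimensional points to m dimensions with a Gaussian matrix.

    Entries are N(0, 1) / sqrt(m), so squared norms are preserved in
    expectation. ASSUMPTION: this is one illustrative sub-Gaussian map,
    not the only map the result covers.
    """
    rng = random.Random(seed)
    d = len(points[0])
    G = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    return [[sum(g * xi for g, xi in zip(row, x)) for row in G] for x in points]


def kmeans_cost_of_partition(points, labels):
    """k-means cost of a fixed partition.

    Each cluster uses its centroid, which is the optimal center under
    squared Euclidean distance.
    """
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(x[t] for x in members) / len(members) for t in range(dim)]
        cost += sum(math.dist(x, centroid) ** 2 for x in members)
    return cost


# Synthetic data (assumption): 3 Gaussian blobs in 300 dimensions, 15 points each.
rng = random.Random(1)
d, k, per_cluster = 300, 3, 15
points, labels = [], []
for j in range(k):
    center = [rng.uniform(-10.0, 10.0) for _ in range(d)]
    for _ in range(per_cluster):
        points.append([c + rng.gauss(0.0, 1.0) for c in center])
        labels.append(j)

projected = gaussian_projection(points, m=60, seed=2)
ratio = kmeans_cost_of_partition(projected, labels) / kmeans_cost_of_partition(points, labels)
print(f"projected / original k-means cost: {ratio:.3f}")
```

The map is linear, so the centroid of the projected cluster equals the projection of the original centroid. The printed ratio therefore measures how well the map preserves within-cluster spread. For this kind of data, the ratio is expected to be close to 1.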
Taxonomy
Topics: Automated Road and Building Extraction · Data Management and Algorithms · Face and Expression Recognition
