Random Projections for k-Means: Maintaining Coresets Beyond Merge & Reduce

Marc Bury; Chris Schwiegelshohn

arXiv:1504.01584·cs.DS·February 19, 2020

TL;DR

This paper introduces a new method for maintaining small, accurate coresets for k-means clustering in data streams using Johnson-Lindenstrauss embeddings, avoiding the traditional merge-and-reduce approach.

Contribution

It presents a new coreset construction whose size is the smallest among known constructions, and it uses Johnson-Lindenstrauss embeddings to maintain coresets efficiently over data streams without merge and reduce.
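As a rough illustration of the Johnson-Lindenstrauss ingredient (not of the paper's streaming algorithm, which combines such embeddings with results from online algorithms), the pure-Python sketch below projects points with a dense Gaussian matrix and compares the k-means cost of one fixed partition before and after projection. The function names, the toy data, and the target dimension m = 60 are illustrative assumptions, not parameters taken from the paper.

```python
import math
import random

def squared_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def partition_cost(points, labels, k):
    """k-means cost of a fixed partition: squared distance of each point to its cluster mean."""
    cost = 0.0
    for j in range(k):
        cluster = [p for p, lab in zip(points, labels) if lab == j]
        if cluster:
            mu = centroid(cluster)
            cost += sum(squared_dist(p, mu) for p in cluster)
    return cost

def gaussian_jl_matrix(d, m, rng):
    """m x d matrix with i.i.d. N(0, 1/m) entries (a standard dense JL map)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]

def project(matrix, v):
    """Map a d-dimensional vector to m dimensions via the matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

# Hypothetical toy data: n Gaussian points in d dimensions, split into k fixed clusters.
rng = random.Random(0)
d, m, n, k = 200, 60, 40, 3
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
labels = [i % k for i in range(n)]

G = gaussian_jl_matrix(d, m, rng)
projected = [project(G, p) for p in points]

# Ratio of the partition's cost in m dimensions to its cost in d dimensions.
ratio = partition_cost(projected, labels, k) / partition_cost(points, labels, k)
print(f"cost ratio after projecting {d} -> {m} dims: {ratio:.3f}")
```

Since the map is linear, the mean of each projected cluster is the image of the original cluster mean, so a ratio close to 1 means the cost of this particular partition survived the dimension reduction. Preserving the cost of all partitions simultaneously needs a more careful choice of m than this toy example makes.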

Findings

01

Offline coreset size of $\tilde{O}(k \epsilon^{-2} \min(d, k \epsilon^{-2}))$ points, the smallest among known constructions.
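For reference, the coreset guarantee is commonly stated as below for a weighted set $S$ with weights $w$ that summarizes the input $P \subset \mathbb{R}^d$. This is the standard strong-coreset form, written out here as an assumption; the paper's exact definition may differ in details (for example, by allowing an additive offset).

```latex
% Standard strong-coreset guarantee for k-means (assumed form, for reference only)
\left| \sum_{s \in S} w(s) \min_{c \in C} \|s - c\|^2
     - \sum_{p \in P} \min_{c \in C} \|p - c\|^2 \right|
\le \epsilon \sum_{p \in P} \min_{c \in C} \|p - c\|^2
\quad \text{for all } C \subset \mathbb{R}^d,\ |C| = k,
\qquad |S| \in \tilde{O}\!\left(k\,\epsilon^{-2} \min(d,\, k\,\epsilon^{-2})\right).
```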

02

Maintains a coreset in a data stream while storing $\tilde{O}(k^2 \epsilon^{-2} \log^2 n)$ points.

03

Avoids both the exponential dependence on the dimension of the two earlier constructions that do not use merge and reduce, and the large $\log n$ space dependency that merge and reduce incurs.
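To see where the $\log n$ dependency of merge and reduce comes from, here is the usual back-of-the-envelope accounting, a sketch under standard assumptions rather than a bound stated in the paper. Summaries are merged pairwise in a binary tree of depth about $\log_2 n$, each level multiplies the error by $(1+\epsilon')$, so $\epsilon'$ must shrink like $\epsilon / \log n$. Here $s(\epsilon)$ denotes the offline coreset size, and $\tilde{O}$ is assumed to hide only factors polylogarithmic in $k$ and $1/\epsilon$.

```latex
% Merge-and-reduce accounting; s(eps) = offline coreset size for error eps (illustrative)
\begin{aligned}
L &= \lceil \log_2 n \rceil, \qquad
  \Bigl(1 + \tfrac{\epsilon}{2L}\Bigr)^{L} \le e^{\epsilon/2} \le 1 + \epsilon
  \quad (0 < \epsilon \le 1), \\
\text{space} &\approx L \cdot s\!\left(\tfrac{\epsilon}{2L}\right)
  \quad \text{with} \quad s(\epsilon) = \tilde{O}\!\left(k\,\epsilon^{-2} \min(d,\, k\,\epsilon^{-2})\right) \\
  &= \tilde{O}\!\left(k\,\epsilon^{-2} \log^{3} n \cdot \min\!\left(d,\, k\,\epsilon^{-2} \log^{2} n\right)\right).
\end{aligned}
```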

Abstract

We give a new construction for a small space summary satisfying the coreset guarantee of a data set with respect to the $k$-means objective function. The number of points required in an offline construction is in $\tilde{O}(k \epsilon^{-2} \min(d, k \epsilon^{-2}))$, which is minimal among all available constructions. Aside from two constructions with exponential dependence on the dimension, all known coresets are maintained in data streams via the merge and reduce framework, which incurs a large space dependency on $\log n$. Instead, our construction crucially relies on Johnson-Lindenstrauss type embeddings which, combined with results from online algorithms, give us a new technique for efficiently maintaining coresets in data streams without relying on merge and reduce. The final number of points stored by our algorithm in a data stream is in $\tilde{O}(k^2 \epsilon^{-2} \log^2 n…
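The merge and reduce framework that the paper avoids can be pictured as a binary counter of buckets, where bucket $i$ summarizes $2^i$ blocks of the stream. The Python below is a minimal illustration under stated assumptions: `MergeAndReduce` and `reduce_uniform` are hypothetical names, and the reduce step is plain uniform sampling with reweighting, a placeholder rather than an actual coreset construction (and not the one from this paper).

```python
import random

def reduce_uniform(weighted_points, target_size, rng):
    """Hypothetical placeholder 'reduce': uniform sample, reweighted to keep total weight.
    NOT a coreset construction; a real implementation would plug one in here."""
    if len(weighted_points) <= target_size:
        return list(weighted_points)
    sample = rng.sample(weighted_points, target_size)
    scale = sum(w for _, w in weighted_points) / sum(w for _, w in sample)
    return [(p, w * scale) for p, w in sample]

class MergeAndReduce:
    """Binary-counter bucket structure: buckets[i] is None or a summary of 2**i blocks."""

    def __init__(self, block_size, rng):
        self.block_size = block_size
        self.rng = rng
        self.buffer = []   # raw (point, weight) pairs of the current, incomplete block
        self.buckets = []

    def insert(self, point):
        self.buffer.append((point, 1.0))
        if len(self.buffer) == self.block_size:
            self._carry(self.buffer, level=0)
            self.buffer = []

    def _carry(self, summary, level):
        # Like incrementing a binary counter: merge with an occupied bucket, reduce, move up.
        while level < len(self.buckets) and self.buckets[level] is not None:
            merged = self.buckets[level] + summary
            self.buckets[level] = None
            summary = reduce_uniform(merged, self.block_size, self.rng)
            level += 1
        if level == len(self.buckets):
            self.buckets.append(None)
        self.buckets[level] = summary

    def summary(self):
        """Union of the buffer and all occupied buckets."""
        out = list(self.buffer)
        for bucket in self.buckets:
            if bucket is not None:
                out.extend(bucket)
        return out

    def stored_points(self):
        return len(self.summary())

# Usage on a hypothetical stream of random 2-D points.
rng = random.Random(1)
mr = MergeAndReduce(block_size=32, rng=rng)
n = 10_000
for _ in range(n):
    mr.insert((rng.random(), rng.random()))
total_weight = sum(w for _, w in mr.summary())
print(len(mr.buckets), sum(b is not None for b in mr.buckets), mr.stored_points())
```

With a block size of 32 and 10,000 points, 312 full blocks have been processed, so only the buckets matching the set bits of 312 in binary are occupied and the bucket list has about $\log_2(n / 32)$ entries. A summary in bucket $i$ has been re-reduced $i$ times, which is why the per-level error must shrink as $n$ grows.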
