Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering
Dan Feldman, Melanie Schmidt, Christian Sohler

TL;DR
This paper shows how to compress a large, high-dimensional point set into a small weighted coreset whose size does not depend on the dimension. Tasks such as k-means and PCA can then be solved approximately on the coreset, and because coresets can be merged, the method also works in streaming and distributed settings.
Contribution
The authors develop a new coreset construction whose size is independent of both the data dimension and the number of input points. The resulting coresets can be merged, which enables scalable streaming and distributed data analysis.
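Why mergeability holds can be seen from a short derivation. It assumes the usual strong-coreset definition for k-means: a weighted set S is an ε-coreset for P if |cost(S, C) − cost(P, C)| ≤ ε·cost(P, C) for every set C of k centers. The notation (cost, w, S_i, P_i, C) is ours, not the page's, and P_1 and P_2 are assumed disjoint (or combined as multisets, so that weights add).

```latex
\begin{aligned}
\operatorname{cost}(P, C) &= \sum_{p \in P} w(p)\, \min_{c \in C} \lVert p - c \rVert_2^2, \\
\bigl|\operatorname{cost}(S_1 \cup S_2, C) - \operatorname{cost}(P_1 \cup P_2, C)\bigr|
  &= \bigl|\bigl(\operatorname{cost}(S_1, C) - \operatorname{cost}(P_1, C)\bigr)
       + \bigl(\operatorname{cost}(S_2, C) - \operatorname{cost}(P_2, C)\bigr)\bigr| \\
  &\le \varepsilon \operatorname{cost}(P_1, C) + \varepsilon \operatorname{cost}(P_2, C) \\
  &= \varepsilon \operatorname{cost}(P_1 \cup P_2, C).
\end{aligned}
```

The first equality uses additivity of the cost over disjoint unions, and the inequality is the triangle inequality applied to the two coreset guarantees. Because the bound holds for every C, S_1 ∪ S_2 is an ε-coreset for P_1 ∪ P_2.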
Findings
Coreset size is independent of the data dimension and of the number of input points.
The method supports streaming and distributed algorithms.
Applicable to k-means, PCA, and subspace clustering.
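Mergeable coresets are usually turned into streaming algorithms with the classical merge-and-reduce scheme. The following is a minimal sketch of that generic scheme under stated assumptions, not the authors' implementation. Points arrive in blocks, each block is reduced, and summaries at the same level are merged and reduced again, like carries in a binary counter. The `reduce` argument stands for a coreset construction. Here `identity_reduce` is a hypothetical stand-in that returns its input unchanged: an exact but non-compressing coreset. A real construction would instead return a weighted set whose size does not grow with the stream length or the dimension. All function names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
WeightedSet = Dict[Point, float]  # point -> weight; identical points share one entry


def weighted_kmeans_cost(points: WeightedSet, centers: Sequence[Point]) -> float:
    """Sum over points of weight * squared distance to the nearest center."""
    return sum(
        w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p, w in points.items()
    )


def union(a: WeightedSet, b: WeightedSet) -> WeightedSet:
    """Multiset union: weights of identical points are added."""
    out = dict(a)
    for p, w in b.items():
        out[p] = out.get(p, 0.0) + w
    return out


def identity_reduce(s: WeightedSet) -> WeightedSet:
    # Hypothetical stand-in for a real coreset construction: returning the
    # input is an exact (error-free) but non-compressing coreset.
    return dict(s)


def merge_and_reduce(
    stream: Iterable[Point],
    block_size: int,
    reduce: Callable[[WeightedSet], WeightedSet],
) -> Tuple[WeightedSet, int]:
    """Return a summary of the whole stream and the number of levels used."""
    levels: List[Optional[WeightedSet]] = []  # levels[i] summarizes 2**i blocks
    buffer: List[Point] = []

    def push(summary: WeightedSet) -> None:
        # Binary-counter carry: merge two summaries of the same level, reduce again.
        i = 0
        while i < len(levels) and levels[i] is not None:
            summary = reduce(union(levels[i], summary))
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(summary)
        else:
            levels[i] = summary

    def to_weighted(points: List[Point]) -> WeightedSet:
        return {p: float(c) for p, c in Counter(points).items()}

    for p in stream:
        buffer.append(p)
        if len(buffer) == block_size:
            push(reduce(to_weighted(buffer)))
            buffer = []

    # Queries are answered on the union of all stored summaries plus the partial block.
    result: WeightedSet = to_weighted(buffer)
    for s in levels:
        if s is not None:
            result = union(result, s)
    return result, len(levels)


stream = [(float(i % 5), float(i % 3)) for i in range(1000)]
summary, depth = merge_and_reduce(stream, block_size=16, reduce=identity_reduce)
centers = [(0.0, 0.0), (4.0, 2.0)]
print(weighted_kmeans_cost(summary, centers))
```

Only about log2(number of blocks) summaries are stored at any time. With a real, lossy `reduce`, the approximation error compounds once per level, so each reduction is typically run with a correspondingly smaller accuracy parameter. The same merge step serves the distributed setting: each machine reduces its own data, and the summaries are unioned.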
Abstract
We develop and analyze a method to reduce the size of a very large set of data points in a high-dimensional Euclidean space ℝ^d to a small set of weighted points such that the result of a predetermined data analysis task on the reduced set is approximately the same as that for the original point set. For example, computing the first k principal components of the reduced set will return approximately the first k principal components of the original set, or computing the centers of a k-means clustering on the reduced set will return an approximation for the original set. Such a reduced set is also known as a coreset. The main new feature of our construction is that the cardinality of the reduced set is independent of the dimension d of the input space and that the sets are mergeable. The latter property means that the union of two reduced sets is a reduced set for the union of the two…
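The PCA claim in the abstract can be made concrete with a textbook exact reduction. This is not the paper's construction. Writing the SVD of the data matrix as A = UΣVᵀ, replacing the n rows of A by the at most d rows of ΣVᵀ preserves AᵀA. Every quantity that depends only on AᵀA is therefore unchanged: the sum of squared distances to any subspace through the origin, and the principal directions. This reduction has size d, which is exactly the dimension dependence the paper removes, so it only illustrates what "the same result on the reduced set" means. The sketch below assumes uncentered PCA (the best subspace through the origin), and the helper names are hypothetical.

```python
import numpy as np


def svd_reduction(A: np.ndarray) -> np.ndarray:
    """Replace the n rows of A by the (at most d) rows of diag(s) @ Vt.

    The result B satisfies B.T @ B == A.T @ A up to rounding error.
    """
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    return s[:, None] * vt


def subspace_cost(X: np.ndarray, V: np.ndarray) -> float:
    """Sum of squared distances from the rows of X to span(V); V has orthonormal columns."""
    residual = X - (X @ V) @ V.T
    return float(np.sum(residual ** 2))


def top_k_directions(X: np.ndarray, k: int) -> np.ndarray:
    """First k right singular vectors (uncentered principal directions) as columns."""
    return np.linalg.svd(X, full_matrices=False)[2][:k].T


rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 30))  # n = 2000 points in d = 30 dimensions
B = svd_reduction(A)                 # only 30 rows
V, _ = np.linalg.qr(rng.standard_normal((30, 4)))  # a random 4-dimensional subspace
print(subspace_cost(A, V), subspace_cost(B, V))    # equal up to rounding
```

The equality follows from ‖X − XVVᵀ‖_F² = tr(XᵀX) − tr(VᵀXᵀXV), which depends on X only through XᵀX. To center the data first, subtract the column means before calling `svd_reduction`.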
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Sparse and Compressive Sensing Techniques
