Linear time small coresets for k-mean clustering of segments with applications
David Denisov, Shlomi Dolev, Dan Feldman, Michael Segal

TL;DR
This paper introduces a novel linear-time algorithm for constructing small coresets for k-means clustering of segments, enabling efficient and accurate clustering in streaming and distributed settings with practical applications.
Contribution
It presents the first coreset construction for arbitrary segments in k-means clustering, achieving size $O(\log^2 n)$ in linear time for fixed parameters.
Findings
Coresets enable fast, approximate clustering with minimal accuracy loss.
Method scales efficiently to large datasets and real-time applications.
Experimental results show significant speedups with maintained accuracy.
Abstract
We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S}, X) := \sum_{S \in \mathcal{S}} \min_{x \in X} D(S, x)$, where $D(S, x) := \int_{p \in S} |p - x| \, dp$ measures the total distance from each point along a segment to a center. Variants of this problem include handling outliers, employing alternative distance functions such as M-estimators, weighting distances to achieve balanced clustering, or enforcing unique cluster assignments. For any $\varepsilon > 0$, an $\varepsilon$-coreset is a weighted subset that approximates $D(\mathcal{S}, X)$ within a factor of $1 \pm \varepsilon$ for any set of $k$ centers, enabling efficient streaming, distributed, or parallel computation. We propose the first coreset construction that provably handles…
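To make the objective in the abstract concrete, here is a minimal Python sketch of the segment clustering cost and of the check an approximation must pass to qualify as a coreset for one given set of centers. It is an illustration under stated assumptions, not the paper's algorithm: the per-segment cost is taken to be the integral of the Euclidean distance over the segment's points, each segment is charged to its single cheapest center, and the integral is approximated with a midpoint rule. The function names `segment_cost`, `clustering_cost`, and `within_eps` are hypothetical.

```python
import numpy as np

def segment_cost(a, b, x, samples=1000):
    """Approximate the integral of ||p - x|| over the segment from a to b.

    Assumption: midpoint rule with `samples` equal sub-intervals.
    """
    a, b, x = (np.asarray(v, dtype=float) for v in (a, b, x))
    t = (np.arange(samples) + 0.5) / samples       # midpoints in (0, 1)
    points = a + t[:, None] * (b - a)              # sample points on the segment
    length = np.linalg.norm(b - a)
    return length * np.linalg.norm(points - x, axis=1).mean()

def clustering_cost(segments, centers, weights=None):
    """Sum over segments of the weighted cost to the cheapest single center."""
    if weights is None:
        weights = np.ones(len(segments))
    return sum(
        w * min(segment_cost(a, b, x) for x in centers)
        for w, (a, b) in zip(weights, segments)
    )

def within_eps(full_cost, approx_cost, eps):
    """Check the (1 +/- eps) guarantee for one fixed set of centers."""
    return (1 - eps) * full_cost <= approx_cost <= (1 + eps) * full_cost

# Two unit-direction segments of length 2, each with a center at its midpoint.
segments = [((0.0, 0.0), (2.0, 0.0)), ((10.0, 0.0), (12.0, 0.0))]
centers = [(1.0, 0.0), (11.0, 0.0)]
total = clustering_cost(segments, centers)
```

A coreset must satisfy the `within_eps` condition for every choice of centers simultaneously, so a check like this can only refute a candidate coreset on sampled center sets; it cannot certify one.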
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Face and Expression Recognition · Sparse and Compressive Sensing Techniques
