TL;DR
This paper introduces simple, space-efficient MapReduce and streaming algorithms for $k$-center clustering with outliers, achieving near-optimal solutions with high scalability and improved quality on large datasets.
Contribution
The paper presents the first coreset-based 2-round MapReduce and 1-pass streaming algorithms for $k$-center clustering with outliers that are nearly as accurate as sequential algorithms.
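To make the 2-round coreset idea concrete, here is a minimal sketch for the variant without outliers. It assumes farthest-first traversal (Gonzalez's greedy heuristic) as the coreset builder, round-robin partitioning, and Euclidean points; this summary does not state the paper's exact construction or coreset sizes, and the names `farthest_first`, `two_round_k_center`, `num_parts`, and `coreset_size` are hypothetical.

```python
import math

def farthest_first(points, k):
    """Greedy selection: repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        # Each point keeps its distance to the nearest center chosen so far.
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers

def two_round_k_center(points, k, num_parts, coreset_size):
    # Round 1 (assumed): each partition independently shrinks to a small coreset.
    parts = [points[i::num_parts] for i in range(num_parts)]
    coreset = [c for part in parts if part
               for c in farthest_first(part, coreset_size)]
    # Round 2 (assumed): a single reducer clusters the union of the coresets.
    return farthest_first(coreset, k)

# Illustrative data only: three well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (50, 50), (50, 51), (51, 50), (100, 0), (100, 1)]
centers = two_round_k_center(points, k=3, num_parts=2, coreset_size=3)
radius = max(min(math.dist(p, c) for c in centers) for p in points)
```

In this sketch, a larger `coreset_size` per partition costs more memory in the second round but tends to give a better final radius.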
Findings
The algorithms achieve approximation ratios within an additive $\epsilon$ of those of the best sequential algorithms.
Experiments show better solution quality and scalability on large datasets.
Algorithms are faster and more space-efficient, especially for small doubling dimension $D$.
Abstract
Center-based clustering is a fundamental primitive for data analysis and becomes very challenging for large datasets. In this paper, we focus on the popular $k$-center variant which, given a set $S$ of points from some metric space and a parameter $k < |S|$, asks for a subset of $k$ centers in $S$ that minimizes the maximum distance of any point of $S$ from its closest center. A more general formulation, introduced to deal with noisy datasets, features a further parameter $z$ and allows up to $z$ points of $S$ (outliers) to be disregarded when computing the maximum distance from the centers. We present coreset-based 2-round MapReduce algorithms for the above two formulations of the problem, and a 1-pass Streaming algorithm for the case with outliers. For any fixed $\epsilon > 0$, the algorithms yield solutions whose approximation ratios are a mere additive term $\epsilon$ away from…
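The objective described in the abstract can be illustrated with a short sketch that evaluates a candidate set of centers: compute each point's distance to its closest center, drop the largest distances as outliers, and report the largest remaining one. Euclidean distance, the function name `radius_with_outliers`, and the sample data are assumptions made for illustration; the problem itself is defined over any metric space.

```python
import math

def radius_with_outliers(points, centers, z=0):
    """Maximum distance from a point to its closest center,
    after disregarding the z points that are farthest from the centers."""
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    keep = len(dists) - z
    if keep <= 0:
        return 0.0  # every point is treated as an outlier
    return dists[keep - 1]

# Illustrative data only: two tight groups plus one far-away noisy point.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (100, 100)]
centers = [(0, 0), (10, 10)]

r_plain = radius_with_outliers(points, centers, z=0)     # noisy point dominates
r_outliers = radius_with_outliers(points, centers, z=1)  # noisy point disregarded
```

With `z=0` the single noisy point determines the radius, while with `z=1` the radius reflects the two actual groups, which is the motivation for the formulation with outliers.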
