A Nearly Optimal Size Coreset Algorithm with Nearly Linear Time
Yichuan Deng, Zhao Song, Yitan Wang, Yuanyuan Yang

TL;DR
This paper introduces a nearly optimal coreset construction algorithm for the $(k,z)$-clustering problem, achieving nearly linear time complexity while maintaining state-of-the-art coreset sizes, thus improving efficiency in clustering tasks.
Contribution
The paper presents a new sketching-based approach that significantly speeds up coreset construction for $(k,z)$-clustering without increasing the coreset size.
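As a rough illustration of what "sketching-based" distance estimation can look like, the snippet below uses a generic Gaussian (Johnson-Lindenstrauss style) random projection to approximate point-to-center distances in a lower dimension. This is an assumption-laden sketch, not the data structure from the paper: the function names `make_sketch` and `estimate_distances`, the Gaussian projection, and the choice of sketch dimension are all hypothetical.

```python
import numpy as np

def make_sketch(dim, sketch_dim, seed=0):
    """Return a (sketch_dim x dim) Gaussian projection matrix (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Scaling by 1/sqrt(sketch_dim) preserves squared norms in expectation.
    return rng.standard_normal((sketch_dim, dim)) / np.sqrt(sketch_dim)

def estimate_distances(sketch, points, centers):
    """Approximate all point-to-center Euclidean distances from sketched vectors."""
    sketched_points = points @ sketch.T      # shape (n, sketch_dim)
    sketched_centers = centers @ sketch.T    # shape (k, sketch_dim)
    diffs = sketched_points[:, None, :] - sketched_centers[None, :, :]
    return np.linalg.norm(diffs, axis=2)     # shape (n, k)
```

The points are projected once and reused for every candidate set of centers, so each distance estimate costs time proportional to the sketch dimension rather than the original dimension.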
Findings
Achieves nearly linear time coreset construction.
Maintains state-of-the-art coreset sizes.
Applicable to generalized clustering problems.
Abstract
A coreset is a point set containing information about geometric properties of a larger point set. A series of previous works shows that in many machine learning problems, especially clustering problems, coresets can be very useful for building efficient algorithms. Two main measures of a coreset construction algorithm's performance are the running time of the algorithm and the size of the coreset it outputs. In this paper we study the construction of coresets for the $(k,z)$-clustering problem, which is a generalization of the $k$-means and $k$-median problems. By properly designing a sketching-based distance estimation data structure, we propose faster algorithms that construct coresets whose size matches the state-of-the-art results.
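For readers unfamiliar with the objective, the following minimal sketch computes the $(k,z)$-clustering cost under its standard definition: the sum over points of the distance to the nearest center raised to the power $z$, so $z=1$ gives $k$-median and $z=2$ gives $k$-means. The function name `kz_cost` and the optional `weights` argument are illustrative assumptions, not taken from the paper. A weighted subset is an $\epsilon$-coreset, in the usual sense, if its weighted cost stays within a $(1 \pm \epsilon)$ factor of the full cost for every choice of $k$ centers.

```python
import numpy as np

def kz_cost(points, centers, z, weights=None):
    """(k,z)-clustering cost: sum of weight * (distance to nearest center)**z."""
    # Pairwise Euclidean distances, shape (n_points, n_centers).
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    if weights is None:
        weights = np.ones(len(points))  # unweighted input: every point counts once
    return float(np.sum(weights * nearest ** z))
```

Evaluating `kz_cost` on the full data and again on a candidate coreset with its weights, for the same centers, gives a direct empirical check of how well the coreset preserves the cost.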
Taxonomy
Topics: Automated Road and Building Extraction · Facility Location and Emergency Management · Remote-Sensing Image Classification
