Provable Imbalanced Point Clustering

David Denisov, Dan Feldman, Shlomi Dolev, and Michael Segal

arXiv:2408.14225 · cs.LG · March 13, 2025

Open Access

TL;DR

This paper introduces efficient, provable approximation methods for imbalanced point clustering based on coresets, with theoretical guarantees and empirical validation on real images, synthetic data, and real-world data.

Contribution

It presents novel coreset-based algorithms for imbalanced clustering with provable approximation guarantees and demonstrates their effectiveness through experiments.

Findings

01. Effective clustering on real and synthetic data

02. Coreset methods achieve approximation guarantees

03. Choice clustering improves performance

Abstract

We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$-centers to a set of points in $\mathbb{R}^{d}$, for any $d, k \geq 1$. To this end, we utilize coresets, which, in the context of the paper, are essentially weighted sets of points in $\mathbb{R}^{d}$ that approximate the fitting loss for every model in a given set, up to a multiplicative factor of $1 \pm \varepsilon$. We provide [Section 3 and Section E in the appendix] experiments that show the empirical contribution of our suggested methods for real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which by combining clustering algorithms yields better performance than each one separately.
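The two ideas in the abstract, a weighted point set whose loss tracks the full-data loss and a selection step over several clustering results, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's method: the loss is the standard sum of squared distances to the nearest center (the paper's imbalanced loss may differ), the "coreset" is a plain uniform sample with weights n/m (which carries no provable $1 \pm \varepsilon$ guarantee), and `choose_best` is one simple reading of choice clustering that keeps the candidate with the smallest loss. All function names are hypothetical.

```python
import numpy as np


def weighted_cost(points, weights, centers):
    """Weighted sum of squared distances from each point to its nearest center.

    Assumption: a k-means-style loss used as a stand-in for the paper's loss.
    """
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((weights * d2.min(axis=1)).sum())


def uniform_sample_coreset(points, m, rng):
    """Sample m points uniformly without replacement, each weighted by n/m.

    Assumption: a naive baseline, not the paper's coreset construction.
    """
    n = len(points)
    idx = rng.choice(n, size=m, replace=False)
    return points[idx], np.full(m, n / m)


def choose_best(points, candidates):
    """Return the index of the candidate center set with the smallest loss."""
    ones = np.ones(len(points))
    costs = [weighted_cost(points, ones, c) for c in candidates]
    return int(np.argmin(costs)), costs


rng = np.random.default_rng(0)

# An imbalanced toy input: one large cluster and one small cluster.
points = np.vstack([
    rng.normal(0.0, 1.0, size=(5000, 2)),
    rng.normal(6.0, 1.0, size=(200, 2)),
])
centers = np.array([[0.0, 0.0], [6.0, 6.0]])

# Compare the loss on the full data with the loss on the weighted sample.
full = weighted_cost(points, np.ones(len(points)), centers)
sample, weights = uniform_sample_coreset(points, 500, rng)
approx = weighted_cost(sample, weights, centers)
ratio = approx / full
print(f"full={full:.1f} weighted-sample={approx:.1f} ratio={ratio:.3f}")

# Selection over two candidate solutions, e.g. the outputs of two algorithms.
poor_centers = np.array([[0.0, 0.0], [0.1, 0.1]])
best, costs = choose_best(points, [centers, poor_centers])
print("selected candidate:", best)
```

A coreset in the abstract's sense must keep the ratio within $1 \pm \varepsilon$ for every model in the given set, not only for the single `centers` checked here. A uniform sample can miss the small cluster entirely, which is one reason imbalanced inputs call for more careful constructions.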

Taxonomy

Topics: Data Management and Algorithms · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training