Computational Feasibility of Clustering under Clusterability Assumptions
Shai Ben-David

TL;DR
This paper surveys recent research on the computational feasibility of clustering under clusterability assumptions, critically evaluating whether such assumptions can explain the practical efficiency of clustering algorithms.
Contribution
It provides a comprehensive review and critical analysis of recent work on clusterability notions, assessing their potential to justify efficient clustering in practice.
Findings
The "Clustering is difficult only when it does not matter" (CDNM) thesis is still far from being formally substantiated.
Existing notions of clusterability often do not meet necessary formal requirements.
Open research challenges include defining stronger clusterability conditions.
Abstract
It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is routinely carried out. One approach to providing a theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be clustering algorithms that are provably efficient on such 'clusterable' instances. In other words, the hope is that "Clustering is difficult only when it does not matter" (CDNM thesis, for short). We believe that to some extent this may indeed be the case. This paper provides a survey of recent papers along this line of research and a critical evaluation of their results. Our bottom-line conclusion is that the CDNM thesis is still far from being formally substantiated. We start by discussing which requirements…
Taxonomy
Topics: Data Management and Algorithms · Advanced Clustering Algorithms Research · Advanced Database Systems and Queries
