Clustering - What Both Theoreticians and Practitioners are Doing Wrong
Shai Ben-David

TL;DR
This paper argues that model selection, meaning the choice of a clustering algorithm and its parameters, is the critical issue in clustering. It criticizes current practice and theory for offering no guidance on that choice, which leads to inconsistent results and suboptimal clustering solutions.
Contribution
It emphasizes the need for systematic methods for clustering tool selection and discusses recent proposals to address this gap.
Findings
Model selection is the most significant challenge in clustering.
Practitioners often choose algorithms without understanding the implications of that choice.
Current theory focuses more on optimization than on practical suitability.
Abstract
Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, is very rudimentary. This note focuses on clustering. I claim that the most significant challenge for clustering is model selection. In contrast with other common computational tasks, for clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm, and their parameters (like the number of clusters) may play a crucial role in the usefulness of an output clustering solution. However, currently there exists no methodical guidance for clustering tool-selection for a given clustering task. Practitioners pick the…
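The abstract's claim that different algorithms often yield drastically different outcomes can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not something taken from the paper: the two algorithms compared (k-means via Lloyd's iterations and single-linkage agglomerative clustering), the toy data set of two parallel rows of points, the fixed initial centers, and all function names.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's iterations from fixed initial centers; returns one label per point."""
    centers = list(init_centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def single_linkage(points, k):
    """Merge the closest pair of clusters (single linkage) until k remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Processing point pairs in order of distance is equivalent to
    # repeatedly merging the two clusters with the smallest gap.
    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: sq_dist(points[e[0]], points[e[1]]),
    )
    components = len(points)
    for i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    return [find(i) for i in range(len(points))]


def as_partition(labels):
    """Turn a label list into a set of index sets, ignoring label names."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return {frozenset(g) for g in groups.values()}


# Toy data (hypothetical): two parallel rows, spacing 1 within a row, 2 between rows.
points = [(x, 0) for x in range(10)] + [(x, 2) for x in range(10)]

km_part = as_partition(kmeans(points, [points[0], points[-1]]))
sl_part = as_partition(single_linkage(points, k=2))

print("k-means clusters:       ", sorted(sorted(g) for g in km_part))
print("single-linkage clusters:", sorted(sorted(g) for g in sl_part))
print("same partition?", km_part == sl_part)
```

On this data, with the same number of clusters requested, single linkage recovers the two rows, while k-means cuts both rows in half and groups the left halves together and the right halves together. Neither output is wrong in itself; which one is useful depends on the task, which is the model-selection problem the abstract describes.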
