Clustering - What Both Theoreticians and Practitioners are Doing Wrong

Shai Ben-David

arXiv:1805.08838 · cs.LG · May 24, 2018

TL;DR

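This paper argues that model selection, meaning the choice of a clustering algorithm and its parameters (such as the number of clusters), is the central challenge in clustering. It criticizes both current practice and current theory for offering no methodical guidance on that choice, even though different algorithms often yield drastically different outcomes on the same data.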

Contribution

It emphasizes the need for systematic methods for clustering tool selection and discusses recent proposals to address this gap.

Findings

01 Model selection is the most significant challenge in clustering.

02 Practitioners often choose algorithms without understanding the implications of that choice.

03 Current theory focuses more on optimization than on practical suitability.

Abstract

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, is very rudimentary. This note focuses on clustering. I claim that the most significant challenge for clustering is model selection. In contrast with other common computational tasks, for clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm, and their parameters (like the number of clusters) may play a crucial role in the usefulness of an output clustering solution. However, currently there exists no methodical guidance for clustering tool-selection for a given clustering task. Practitioners pick the…
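The abstract's claim that different algorithms can yield drastically different outcomes can be illustrated with a toy sketch. The example below is not taken from the paper: the data set, the function names, and the choice of single linkage versus the 2-means objective are all assumptions made for illustration. It uses one-dimensional points, where single linkage stopped at two clusters reduces to cutting at the widest gap, and the optimal 2-means solution is always a split of the sorted points.

```python
def single_linkage_two_clusters(points):
    """Single linkage stopped at 2 clusters: in 1-D this cuts at the widest gap."""
    xs = sorted(points)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return xs[:cut], xs[cut:]


def sse(cluster):
    """Sum of squared distances to the cluster mean (the k-means cost)."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def two_means_optimal(points):
    """Exact 2-means in 1-D: try every split of the sorted points, keep the cheapest."""
    xs = sorted(points)
    cut = min(range(1, len(xs)), key=lambda i: sse(xs[:i]) + sse(xs[i:]))
    return xs[:cut], xs[cut:]


# Hypothetical data: an evenly spaced chain plus one slightly detached point.
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11.5]

linkage_result = single_linkage_two_clusters(data)
kmeans_result = two_means_optimal(data)

print("single linkage:", linkage_result)
print("2-means:       ", kmeans_result)
```

On this input, single linkage isolates the detached point as its own cluster, while the 2-means objective splits the chain near the middle. Neither answer is wrong in itself; which one is useful depends on the task, which is the model-selection question the paper raises.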
