A framework for benchmarking clustering algorithms
Marek Gagolewski

TL;DR
This paper introduces a comprehensive framework for benchmarking clustering algorithms, standardizing datasets and methodologies to enable consistent and fair evaluation across diverse clustering problems.
Contribution
The paper develops a unified framework with standardized datasets, an interactive dataset explorer, and APIs for several programming languages to improve the benchmarking of clustering algorithms.
Findings
Standardized and aggregated benchmark datasets.
Interactive dataset explorer and Python API.
Support for multiple programming languages.
Abstract
The evaluation of clustering algorithms can involve running them on a variety of benchmark problems and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. To overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive dataset explorer, the documentation of the Python API, a…
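The abstract notes that a dataset may admit several equally valid expert groupings. One natural way to account for this is to compare an algorithm's output against every reference partition and keep the best match. The sketch below illustrates that idea, assuming the adjusted Rand index (ARI) as the similarity measure; the abstract does not say which measure or aggregation rule the framework uses, and the helper names `adjusted_rand_index` and `best_score` are hypothetical, not part of the framework's Python API.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have equal length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two points")
    # Contingency-table cell counts n_ij and their row/column margins.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    # Number of point pairs placed together in each table cell / margin.
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in row_sums.values())
    sum_b = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only when both partitions are a single cluster, or both are all
        # singletons; the partitions are then identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def best_score(reference_labelings, predicted, score=adjusted_rand_index):
    """Hypothetical helper: best similarity over all valid reference partitions."""
    return max(score(ref, predicted) for ref in reference_labelings)


# Two equally valid expert groupings of the same six points (illustrative data).
references = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 2, 2, 3, 3],
]
predicted = [0, 0, 1, 1, 2, 2]
print(best_score(references, predicted))
```

Here the prediction matches the second reference exactly (cluster names do not matter), so the best score is 1.0, even though it agrees only weakly with the first reference. Taking the maximum avoids penalising an algorithm for finding a different but legitimate structure.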
Taxonomy
Topics: Advanced Clustering Algorithms Research
