A framework for benchmarking clustering algorithms

Marek Gagolewski

arXiv:2209.09493·cs.LG·October 27, 2023


TL;DR

This paper introduces a framework for benchmarking clustering algorithms. It standardizes benchmark datasets and the testing methodology so that algorithms can be evaluated consistently across diverse clustering problems.

Contribution

It develops a unified framework with standardized datasets, an interactive explorer, and multi-language API support to improve clustering algorithm benchmarking.

Findings

01. Standardized and aggregated benchmark datasets.
02. Interactive dataset explorer and Python API.
03. Support for multiple programming languages.

Abstract

The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Also, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark dataset collections referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a…
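
The abstract's two points, comparing an algorithm's output to a reference grouping and allowing for several equally valid groupings, can be illustrated with a minimal sketch. It is not the framework's API: the function names `adjusted_rand_index` and `best_score` are hypothetical, the adjusted Rand index is used only as a familiar stand-in for whatever similarity measure the paper adopts, and taking the maximum over the reference labelings is an assumption about how multiple valid answers might be handled.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same points")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree about

    # Contingency counts: cell sizes plus row and column marginals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of point pairs placed together under each view.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (together_both - expected) / (maximum - expected)


def best_score(reference_labelings, labels_pred):
    """Assumed convention: score against every reference, keep the best."""
    return max(adjusted_rand_index(ref, labels_pred)
               for ref in reference_labelings)


# Hypothetical dataset with two equally valid reference groupings.
references = [
    [0, 0, 1, 1, 2, 2],  # fine-grained: three clusters
    [0, 0, 0, 0, 1, 1],  # coarse: two clusters
]
predicted = [1, 1, 1, 1, 0, 0]  # label names differ, grouping is the coarse one

print(best_score(references, predicted))
```

Scoring against the fine-grained reference alone would penalise this output even though it reproduces the coarse reference exactly, which is the situation the abstract says is rarely taken into account.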


Taxonomy

Topics: Advanced Clustering Algorithms Research