TL;DR
This paper introduces HAWKS, a framework using evolutionary algorithms to generate diverse, challenging synthetic datasets for benchmarking clustering algorithms, addressing the lack of standardized evaluation methods.
Contribution
The paper presents HAWKS, a novel framework that evolves synthetic datasets with specific properties to improve benchmarking and comparison of clustering algorithms.
Findings
Supports flexible generation of benchmark datasets
Enables evolution of data with specific properties
Assists in identifying performance differences between algorithms
Abstract
Comprehensive benchmarking of clustering algorithms is rendered difficult by two key factors: (i) the elusiveness of a unique mathematical definition of this unsupervised learning approach and (ii) dependencies between the generating models or clustering criteria adopted by some clustering algorithms and indices for internal cluster validation. Consequently, there is no consensus regarding the best practice for rigorous benchmarking, and whether this is possible at all outside the context of a given application. Here, we argue that synthetic datasets must continue to play an important role in the evaluation of clustering algorithms, but that this necessitates constructing benchmarks that appropriately cover the diverse set of properties that impact clustering algorithm performance. Through our framework, HAWKS, we demonstrate the important role evolutionary algorithms play to support…
