HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis

Cameron Shand, Richard Allmendinger, Julia Handl, Andrew Webb, and John Keane

arXiv:2102.06940 · cs.NE · January 11, 2022

TL;DR

This paper introduces HAWKS, a framework using evolutionary algorithms to generate diverse, challenging synthetic datasets for benchmarking clustering algorithms, addressing the lack of standardized evaluation methods.

Contribution

The paper presents HAWKS, a novel framework that evolves synthetic datasets with specific properties to improve benchmarking and comparison of clustering algorithms.

Findings

01. Supports flexible generation of benchmark datasets
02. Enables evolution of data with specific properties
03. Assists in identifying performance differences between algorithms
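
To make "evolution of data with specific properties" concrete, here is a minimal sketch of the general idea, not HAWKS itself. It assumes the target property is the mean silhouette width, that each dataset consists of 2-D Gaussian clusters described only by their centers, and that a simple (1+1) evolutionary strategy is used. These choices, and all function names (sample_dataset, silhouette_width, fitness, evolve), are hypothetical and chosen for illustration; the framework's actual representation and operators are not described on this page.

```python
import math
import random


def sample_dataset(centers, spread, points_per_cluster, rng):
    """Draw 2-D Gaussian clusters around the given centers."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def silhouette_width(points, labels):
    """Mean silhouette width over all points (plain O(n^2) version)."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        total += (b - a) / max(a, b)
    return total / len(points)


def fitness(centers, target, seed=0):
    """Distance between the dataset's silhouette width and the target."""
    # Assumption: a fixed sampling seed, so fitness depends only on centers.
    rng = random.Random(seed)
    points, labels = sample_dataset(centers, 1.0, 20, rng)
    return abs(silhouette_width(points, labels) - target)


def evolve(target, n_clusters=3, generations=200, seed=1):
    """(1+1) strategy: mutate the centers, keep the child if not worse."""
    rng = random.Random(seed)
    parent = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(n_clusters)]
    parent_fit = fitness(parent, target)
    history = [parent_fit]
    for _ in range(generations):
        child = [(x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5)) for x, y in parent]
        child_fit = fitness(child, target)
        if child_fit <= parent_fit:
            parent, parent_fit = child, child_fit
        history.append(parent_fit)
    return parent, history


if __name__ == "__main__":
    centers, history = evolve(target=0.6)
    print(f"initial error {history[0]:.3f}, final error {history[-1]:.3f}")
```

Lowering the target pushes the clusters toward overlap and raising it pushes them apart, which is one simple way a generator could produce datasets of graded difficulty for comparing clustering algorithms.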

Abstract

Comprehensive benchmarking of clustering algorithms is rendered difficult by two key factors: (i) the elusiveness of a unique mathematical definition of this unsupervised learning approach and (ii) dependencies between the generating models or clustering criteria adopted by some clustering algorithms and indices for internal cluster validation. Consequently, there is no consensus regarding the best practice for rigorous benchmarking, and whether this is possible at all outside the context of a given application. Here, we argue that synthetic datasets must continue to play an important role in the evaluation of clustering algorithms, but that this necessitates constructing benchmarks that appropriately cover the diverse set of properties that impact clustering algorithm performance. Through our framework, HAWKS, we demonstrate the important role evolutionary algorithms play to support…
