# Evolutionary Dataset Optimisation: learning algorithm quality through evolution

**Authors:** Henry Wilde, Vincent Knight, Jonathan Gillard

arXiv: 1907.13508 · 2019-11-01

## TL;DR

This paper introduces Evolutionary Dataset Optimisation, a novel method that uses genetic algorithms to generate datasets for analyzing and understanding algorithm performance, demonstrated through clustering case studies.

## Contribution

The paper presents a new approach to evaluating algorithms: rather than comparing them on fixed benchmarks, it evolves datasets that highlight each algorithm's strengths and weaknesses.

## Key findings

- Generated datasets reflect known properties of clustering algorithms.
- The method reveals attributes that influence algorithm performance.
- The approach is demonstrated with the $k$-means and DBSCAN clustering algorithms.

## Abstract

In this paper we propose a novel method for learning how algorithms perform. Classically, algorithms are compared on a finite number of existing (or newly simulated) benchmark datasets based on some fixed metrics. The algorithm(s) with the smallest value of this metric are chosen to be the 'best performing'. We offer a new approach to flip this paradigm. We instead aim to gain a richer picture of the performance of an algorithm by generating artificial data through genetic evolution, the purpose of which is to create populations of datasets for which a particular algorithm performs well on a given metric. These datasets can be studied so as to learn what attributes lead to a particular progression of a given algorithm. Following a detailed description of the algorithm as well as a brief description of an open source implementation, a case study in clustering is presented. This case study demonstrates the performance and nuances of the method which we call Evolutionary Dataset Optimisation. In this study, a number of known properties about preferable datasets for the clustering algorithms known as $k$-means and DBSCAN are realised in the generated datasets.
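The core idea in the abstract, evolving a population of datasets so that a given algorithm scores well on a given metric, can be illustrated with a minimal sketch. This is not the authors' implementation or its API: the function names (`two_means_inertia`, `evolve`, and so on), the one-dimensional datasets, the truncation selection, the per-point crossover, the uniform mutation, and all parameter values are assumptions chosen for brevity. The fitness here is the inertia of a simple 2-means clustering, which the genetic algorithm tries to minimise.

```python
import random


def two_means_inertia(points, iterations=10):
    """Within-cluster sum of squares for 1-D 2-means (Lloyd's algorithm)."""
    centres = [min(points), max(points)]  # deterministic initialisation
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
    return sum(min((p - c) ** 2 for c in centres) for p in points)


def random_dataset(rng, size):
    """A hypothetical 'individual': a list of points drawn from [0, 1)."""
    return [rng.random() for _ in range(size)]


def crossover(rng, a, b):
    """Build a child by taking each point from one of the two parents."""
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(rng, dataset, rate):
    """Resample each point with probability `rate`."""
    return [rng.random() if rng.random() < rate else p for p in dataset]


def evolve(fitness, size=20, population_size=30, generations=40,
           best_fraction=0.25, mutation_rate=0.05, seed=0):
    """Evolve datasets that minimise `fitness`; parents survive unchanged."""
    rng = random.Random(seed)
    population = [random_dataset(rng, size) for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: max(2, int(best_fraction * population_size))]
        offspring = []
        while len(parents) + len(offspring) < population_size:
            a, b = rng.sample(parents, 2)
            offspring.append(mutate(rng, crossover(rng, a, b), mutation_rate))
        population = parents + offspring
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population, history


population, history = evolve(two_means_inertia)
print(f"best inertia: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because the selected parents are carried over unchanged in this sketch, the best fitness can never get worse from one generation to the next. The final population, rather than a single best dataset, is the object of interest: inspecting what its members have in common is the step that the abstract describes as studying the datasets to learn which attributes suit the algorithm.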

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.13508/full.md

## Figures

25 figures with captions in the complete paper: https://tomesphere.com/paper/1907.13508/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1907.13508/full.md

---
Source: https://tomesphere.com/paper/1907.13508