# Recombinator-k-means: An evolutionary algorithm that exploits k-means++ for recombination

**Authors:** Carlo Baldassi

arXiv: 1905.00531 · 2022-02-10

## TL;DR

This paper presents recombinator-k-means, an evolutionary algorithm that uses a novel population-wide recombination strategy based on k-means++ to improve clustering solutions, especially on challenging datasets.

## Contribution

It introduces a population-wide recombination method for k-means optimization that is generally better at escaping local minima than a state-of-the-art genetic algorithm with pairwise-nearest-neighbor crossover.
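Local minima arise because the k-means objective (the sum of squared distances from each point to its nearest centroid) is non-convex. The following Python/NumPy sketch computes that objective and runs plain Lloyd iterations to show how a poor initialization can trap a local optimizer. It is not taken from the paper or its Julia implementation, and `kmeans_cost` and `lloyd` are hypothetical helper names.

```python
import numpy as np


def kmeans_cost(X, C):
    """k-means objective: sum over points of the squared distance to the nearest centroid."""
    C = np.asarray(C, dtype=float)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (n, k) squared distances
    return float(d2.min(axis=1).sum())


def lloyd(X, C, max_iter=100):
    """Plain Lloyd iterations from initial centroids C; stops at a local minimum."""
    C = np.array(C, dtype=float)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C


# Two well-separated pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
print(kmeans_cost(X, lloyd(X, [[0.0, 0.0], [0.0, 1.0]])))   # 100.0: stuck in a local minimum
print(kmeans_cost(X, lloyd(X, [[0.0, 0.0], [10.0, 0.0]])))  # 1.0: the global minimum
```

Starting both centroids inside the left pair makes Lloyd split the data horizontally and stop there. Escaping traps of this kind is what population-based schemes such as recombinator-k-means aim to do.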

## Key findings

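- For fixed population sizes, recombinator-k-means generally reaches a lower k-means objective than a state-of-the-art genetic algorithm with pairwise-nearest-neighbor crossover, at the cost of a more expensive crossover step.
- When running times are matched, the augmented pairwise-nearest-neighbor method wins at short times. At longer times recombinator-k-means catches up and, on the most difficult datasets, overtakes it, making it generally better at escaping local minima.
- It is algorithmically simpler and more general: the same scheme could be applied to other clustering problems such as k-medians or k-medoids.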

## Abstract

We introduce an evolutionary algorithm called recombinator-$k$-means for optimizing the highly non-convex $k$-means problem. Its defining feature is that its crossover step involves all the members of the current generation, stochastically recombining them with a repurposed variant of the $k$-means++ seeding algorithm. The recombination also uses a reweighting mechanism that realizes a progressively sharper stochastic selection policy and ensures that the population eventually coalesces into a single solution. We compare this scheme with a state-of-the-art alternative, a more standard genetic algorithm with deterministic pairwise-nearest-neighbor crossover and an elitist selection policy, of which we also provide an augmented and efficient implementation. Extensive tests on large and challenging datasets (both synthetic and real-world) show that for fixed population sizes recombinator-$k$-means is generally superior in terms of the optimization objective, at the cost of a more expensive crossover step. When adjusting the population sizes of the two algorithms to match their running times, we find that for short times the (augmented) pairwise-nearest-neighbor method is always superior, while at longer times recombinator-$k$-means will match it and, on the most difficult examples, take over. We conclude that the reweighted whole-population recombination is more costly, but generally better at escaping local minima. Moreover, it is algorithmically simpler and more general (it could be applied even to $k$-medians or $k$-medoids, for example). Our implementations are publicly available at [https://github.com/carlobaldassi/RecombinatorKMeans.jl](https://github.com/carlobaldassi/RecombinatorKMeans.jl).
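The whole-population crossover described in the abstract can be sketched roughly as follows. This is a hedged Python/NumPy illustration, not the author's Julia code. It pools the centroids of all members, gives each pooled centroid a weight derived from its parent's cost, and uses a weighted variant of k-means++ seeding to pick k of them as an offspring's starting centroids. The function names and, in particular, the exponential weighting `exp(-beta * normalized_cost)` with a selection parameter `beta` are assumptions made for illustration. The paper's exact reweighting rule is not reproduced here.

```python
import numpy as np


def weighted_kmeanspp(P, w, k, rng):
    """k-means++ seeding over points P, where point i also carries a weight w[i]."""
    n = len(P)
    # First seed: sampled proportionally to the weights alone.
    chosen = [rng.choice(n, p=w / w.sum())]
    d2 = ((P - P[chosen[0]]) ** 2).sum(axis=1)  # squared distance to nearest chosen seed
    for _ in range(1, k):
        # Next seed: sampled proportionally to weight times squared distance.
        p = w * d2
        if p.sum() == 0:  # degenerate case: every weighted point coincides with a seed
            p = w.copy()
        j = rng.choice(n, p=p / p.sum())
        chosen.append(j)
        d2 = np.minimum(d2, ((P - P[j]) ** 2).sum(axis=1))
    return P[chosen]


def recombine(population, costs, beta, rng):
    """Hypothetical whole-population crossover returning k seed centroids.

    population: list of (k, d) centroid arrays, one per member.
    costs: k-means cost of each member.
    beta: assumed selection sharpness; larger values favour low-cost members.
    """
    k = population[0].shape[0]
    pool = np.vstack(population)  # every centroid of every member, shape (J*k, d)
    costs = np.asarray(costs, dtype=float)
    spread = costs.max() - costs.min()
    z = (costs - costs.min()) / spread if spread > 0 else np.zeros_like(costs)
    # Assumed reweighting: exp(-beta * normalized cost), shared by a member's k centroids.
    w = np.repeat(np.exp(-beta * z), k)
    return weighted_kmeanspp(pool, w, k, rng)


rng = np.random.default_rng(0)
population = [np.array([[0.0, 0.0], [10.0, 0.0]]),
              np.array([[5.0, 0.0], [5.0, 1.0]]),
              np.array([[1.0, 1.0], [9.0, 1.0]])]
seeds = recombine(population, costs=[1.0, 100.0, 50.0], beta=1.0, rng=rng)
print(seeds.shape)  # (2, 2)
```

With a very large `beta`, only the lowest-cost member keeps nonzero weight, so the seeds reduce to that member's centroids. This is one way to picture how sharper selection pushes the population toward a single solution. In a full run one would presumably refine each offspring's seeds with Lloyd iterations on the data, build the next generation from several such offspring, and increase `beta` over time. Those outer-loop details are likewise assumptions of this sketch.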

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00531/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00531/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1905.00531/full.md

---
Source: https://tomesphere.com/paper/1905.00531