Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Andrew Lensen; Bing Xue; Mengjie Zhang

arXiv:1910.10264·cs.NE·October 24, 2019

TL;DR

This paper introduces a genetic programming approach to automatically evolve dataset-specific similarity functions for clustering, improving performance and interpretability over traditional fixed metrics.

Contribution

The paper presents a novel genetic programming method for automatically creating tailored similarity functions for clustering, including feature selection and construction, with demonstrated performance gains.

Findings

01 Evolved similarity functions outperform benchmark methods.

02 Multi-tree approach enhances clustering performance.

03 Analyzed similarity functions offer interpretability insights.

Abstract

Clustering is a difficult and widely-studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g. Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally pre-defined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this paper, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible…
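
To make the idea in the abstract concrete, the following is a minimal sketch of what a tree-based similarity function over a small feature subset might look like. It is not the authors' implementation: the function set (`add`, `mul`, `max`), the tuple-based tree encoding, the choice to let each leaf return the absolute difference of one feature, and the helper names `evaluate` and `selected_features` are all assumptions made for illustration. The tree here is written by hand rather than evolved, and it returns a dissimilarity score (lower means more similar).

```python
import math

# Hypothetical function set for illustration only; the function set used
# in the paper is not given in this summary.
FUNCTIONS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": max,
}


def evaluate(tree, x, y):
    """Evaluate an expression tree on a pair of instances x and y.

    Assumed encoding: a leaf is an int feature index i and yields
    |x[i] - y[i]|; an internal node is a tuple (function_name, left, right).
    """
    if isinstance(tree, int):
        return abs(x[tree] - y[tree])
    name, left, right = tree
    return FUNCTIONS[name](evaluate(left, x, y), evaluate(right, x, y))


def selected_features(tree):
    """Return the set of feature indices the tree actually uses."""
    if isinstance(tree, int):
        return {tree}
    _, left, right = tree
    return selected_features(left) | selected_features(right)


def euclidean(x, y):
    """Fixed baseline: Euclidean distance over all features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hand-written example tree: add(|x0 - y0|, mul(|x2 - y2|, |x2 - y2|)).
# It uses only features 0 and 2 (feature selection) and combines them
# non-linearly (feature construction).
tree = ("add", 0, ("mul", 2, 2))

x = [1.0, 50.0, 3.0, 7.0]
y = [2.0, -50.0, 5.0, 7.0]

d_tree = evaluate(tree, x, y)  # ignores the large gap in feature 1
d_euc = euclidean(x, y)        # dominated by feature 1
```

In this sketch the two instances differ mostly in feature 1, which the tree never reads, so the tree-based score stays small while the Euclidean distance is large. A genetic programming search would explore many such trees and keep those that yield better clusters under some fitness measure; that search loop is omitted here.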

Taxonomy

Methods, Interpretability