Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis
Andrew Lensen, Bing Xue, Mengjie Zhang

TL;DR
This paper introduces a genetic programming approach to automatically evolve dataset-specific similarity functions for clustering, improving performance and interpretability over traditional fixed metrics.
Contribution
The paper presents a novel genetic programming method for automatically creating tailored similarity functions for clustering, including feature selection and construction, with demonstrated performance gains.
Findings
Evolved similarity functions outperform benchmark methods.
The multi-tree representation improves clustering performance.
Analysis of the evolved similarity functions offers insight into their interpretability.
Abstract
Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g. Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally pre-defined and cannot be easily tailored to the properties of a particular dataset, which limits both the quality and the interpretability of the clusters produced. In this paper, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible…
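The abstract describes a similarity function that uses only a few selected features and combines them with various functions. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. The names `Feat`, `Op`, `evaluate`, `used_features` and `assign` are hypothetical; each leaf is assumed to return the absolute per-feature difference between two instances; the function set and protected division are common GP conventions, not taken from the paper; and a fixed nearest-medoid assignment stands in for whatever clustering algorithm the evolved function is paired with. No evolutionary search is shown: the tree is written by hand in place of an evolved one.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set, Union


# Hypothetical GP tree representation (illustrative, not the authors' code).
@dataclass
class Feat:
    index: int  # leaf: one selected input feature


@dataclass
class Op:
    name: str  # assumed function set: "add", "mul", "max", "min", "div"
    children: List["Tree"] = field(default_factory=list)


Tree = Union[Feat, Op]


def evaluate(tree: Tree, a: Sequence[float], b: Sequence[float]) -> float:
    """Dissimilarity of instances a and b under a (binary) tree.

    Assumption: each leaf yields |a[i] - b[i]| for its feature i, so all
    intermediate values are non-negative.
    """
    if isinstance(tree, Feat):
        return abs(a[tree.index] - b[tree.index])
    x, y = (evaluate(child, a, b) for child in tree.children)
    if tree.name == "add":
        return x + y
    if tree.name == "mul":
        return x * y
    if tree.name == "max":
        return max(x, y)
    if tree.name == "min":
        return min(x, y)
    if tree.name == "div":
        return x / y if y != 0 else 1.0  # protected division (GP convention)
    raise ValueError(f"unknown operator {tree.name!r}")


def used_features(tree: Tree) -> Set[int]:
    """Features the tree actually reads, i.e. the implicit feature subset."""
    if isinstance(tree, Feat):
        return {tree.index}
    return set().union(*(used_features(c) for c in tree.children))


Distance = Callable[[Sequence[float], Sequence[float]], float]


def assign(data: List[List[float]], medoids: List[int], dist: Distance) -> List[int]:
    """Label each instance with the index of its nearest medoid under dist."""
    labels = []
    for x in data:
        d = [dist(x, data[m]) for m in medoids]
        labels.append(d.index(min(d)))
    return labels


# Toy data: features 0 and 1 carry the cluster structure, feature 2 is noise.
DATA = [
    [0.0, 0.1, 0.0],
    [0.2, 0.0, 9.0],
    [5.0, 5.1, 9.0],
    [5.2, 4.9, 0.0],
]
MEDOIDS = [0, 2]

# Hand-written stand-in for an evolved tree: max(|df0|, |df1|).
TREE = Op("max", [Feat(0), Feat(1)])


def evolved(a: Sequence[float], b: Sequence[float]) -> float:
    return evaluate(TREE, a, b)


print(used_features(TREE))               # {0, 1}
print(assign(DATA, MEDOIDS, evolved))    # [0, 0, 1, 1]
print(assign(DATA, MEDOIDS, math.dist))  # [0, 1, 1, 0]
```

On this toy data the noisy third feature pulls Euclidean distance (`math.dist`) into mixing the two groups, while the tree, which reads only features 0 and 1, recovers them. This mirrors the motivation in the abstract: a function built from a small, dataset-specific feature subset can be both more accurate and easier to read than a fixed metric over all features.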
Taxonomy
Methods, Interpretability
