An Investigation into Distance Measures in Cluster Analysis

Zoe Shapcott

arXiv:2404.13664·stat.OT·April 23, 2024

Open Access

TL;DR

This paper explores various distance measures for the K-means clustering algorithm, comparing their effectiveness on simulated and real datasets, including an analysis of the Mahalanobis distance versus traditional metrics.

Contribution

It provides a comparative analysis of distance measures in K-means clustering, including the application of Mahalanobis distance and evaluation of their performance on different datasets.

Findings

01. Mahalanobis distance can offer benefits over traditional measures in certain cases

02. Different distance measures impact cluster quality and interpretability

03. One section analyses the information ChatGPT returns in response to prompts on this topic

Abstract

This report explores different distance measures that can be used with the $K$-means algorithm for cluster analysis. Specifically, we investigate the Mahalanobis distance and critically assess any benefits it may have over the more traditional Euclidean, Manhattan and Maximum distances. We do this by first defining the metrics, then considering their advantages and drawbacks as discussed in the literature. We apply these distances first to simulated data and then to subsets of the Dry Bean dataset [1], to explore whether one metric yields detectably better clustering quality than the others in these cases. One section is devoted to analysing the information obtained from ChatGPT in response to prompts on this topic.
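
As a rough illustration of the four distances named in the abstract, the sketch below computes each one with NumPy and uses it in a $K$-means style assignment step. The function names (`euclidean`, `manhattan`, `maximum`, `mahalanobis`, `assign`) are hypothetical, and the Mahalanobis version assumes a single covariance matrix supplied by the caller; the report itself may estimate or apply the covariance differently.

```python
import numpy as np


def euclidean(x, c):
    """Square root of the summed squared coordinate differences."""
    return np.sqrt(np.sum((x - c) ** 2, axis=-1))


def manhattan(x, c):
    """Sum of absolute coordinate differences."""
    return np.sum(np.abs(x - c), axis=-1)


def maximum(x, c):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return np.max(np.abs(x - c), axis=-1)


def mahalanobis(x, c, cov):
    """sqrt((x - c)^T cov^{-1} (x - c)).

    Assumption: `cov` is one fixed, invertible covariance matrix
    chosen by the caller, not something taken from the report.
    """
    diff = x - c
    inv = np.linalg.inv(cov)
    return np.sqrt(np.einsum("...i,ij,...j->...", diff, inv, diff))


def assign(points, centres, dist):
    """K-means assignment step: index of the nearest centre under `dist`."""
    d = np.stack([dist(points, c) for c in centres], axis=1)
    return np.argmin(d, axis=1)


p = np.array([3.0, 4.0])
origin = np.zeros(2)
print(euclidean(p, origin))  # 5.0
print(manhattan(p, origin))  # 7.0
print(maximum(p, origin))    # 4.0

# With the identity as covariance, Mahalanobis reduces to Euclidean.
print(mahalanobis(p, origin, np.eye(2)))
```

To use the Mahalanobis distance in `assign`, the covariance can be bound first, for example `assign(points, centres, lambda x, c: mahalanobis(x, c, cov))`. Note that a full $K$-means loop would also need a centre update step, and the usual mean update is only guaranteed to reduce the objective for the squared Euclidean distance.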

Taxonomy

Topics: Advanced Clustering Algorithms Research