Cluster Metric Sensitivity to Irrelevant Features

Miles McCrory; Spencer A. Thomas

arXiv:2402.12008·cs.LG·February 20, 2024·2 cites

Open Access

TL;DR

This paper examines how irrelevant features affect clustering performance, revealing that some metrics are more sensitive than others and suggesting better metrics for feature selection in unsupervised learning.

Contribution

It provides an empirical analysis of the impact of irrelevant features on clustering metrics, highlighting the sensitivity differences among popular evaluation scores.

Findings

01. ARI and NMI are resilient to Gaussian irrelevant features.

02. Silhouette and Davies-Bouldin scores are highly sensitive to irrelevant features.

03. Irrelevant features can cause significant score fluctuations, especially in certain metrics.

Abstract

Clustering algorithms are used extensively in data analysis for data exploration and discovery. Technological advancements lead to continual growth of data in terms of volume, dimensionality and complexity. This provides great opportunities in data analytics, as the data can be interrogated for many different purposes. This, however, leads to challenges, such as the identification of relevant features for a given task. In supervised tasks, one can utilise a number of methods to optimise the input features for the task objective (e.g. classification accuracy). In unsupervised problems, such tools are not readily available, in part due to an inability to quantify feature relevance in unlabeled tasks. In this paper, we investigate the sensitivity of clustering performance to noisy uncorrelated variables iteratively added to baseline datasets with well-defined clusters. We show how different types of…
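The experimental idea in the abstract (adding noisy, uncorrelated variables step by step to a baseline dataset with well-defined clusters and tracking the clustering scores) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the synthetic blob dataset, the use of k-means, the unit-variance Gaussian noise, the step sizes, and the function name `noise_sweep` are all assumptions made for the example. The four scores are the ones named in the findings (ARI, NMI, Silhouette, Davies-Bouldin), computed with scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    normalized_mutual_info_score,
    silhouette_score,
)


def noise_sweep(noise_counts=(0, 2, 8, 32), seed=0):
    """Score k-means clusterings as irrelevant Gaussian features accumulate.

    Hypothetical helper for illustration; the dataset, noise scale, and
    clustering algorithm are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Baseline: three well-separated 2-D Gaussian blobs (assumed setup).
    centers = [[-8.0, -8.0], [0.0, 8.0], [8.0, -8.0]]
    X, y_true = make_blobs(
        n_samples=300, centers=centers, cluster_std=1.0, random_state=seed
    )

    # Draw all noise columns once so that each step adds to the previous one.
    noise = rng.normal(loc=0.0, scale=1.0, size=(X.shape[0], max(noise_counts)))

    rows = []
    for k in noise_counts:
        X_k = np.hstack([X, noise[:, :k]])  # baseline + first k noise features
        labels = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X_k)
        rows.append(
            {
                "n_noise": k,
                # External scores: compare predicted labels with ground truth.
                "ari": adjusted_rand_score(y_true, labels),
                "nmi": normalized_mutual_info_score(y_true, labels),
                # Internal scores: computed from distances in the full feature space.
                "silhouette": silhouette_score(X_k, labels),
                "davies_bouldin": davies_bouldin_score(X_k, labels),  # lower is better
            }
        )
    return rows


if __name__ == "__main__":
    for row in noise_sweep():
        print(row)
```

In a setup like this, ARI and NMI depend only on the label assignment, while Silhouette and Davies-Bouldin are computed from distances that include the noise dimensions, which is one plausible reason the two groups of scores can respond differently as irrelevant features are added.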

Taxonomy

Topics: Remote-Sensing Image Classification

Methods: Feature Selection