Assessing the impact of dimensionality reduction on clustering performance -- a systematic study
Ousmane Assani-Amate, Mohammadreza Bakhtyari, Émilie Roy, Vladimir Makarenkov

TL;DR
This systematic study evaluates how five dimensionality reduction techniques affect the performance of four clustering algorithms across different data types. It shows that both the reduction method and the reduction level should be chosen to suit the data and the clustering algorithm.
Contribution
The study provides a comprehensive analysis of how several dimensionality reduction methods affect clustering quality, offering guidance for choosing preprocessing steps.
Findings
Dimensionality reduction significantly influences clustering outcomes.
The optimal reduction technique varies with data type and clustering algorithm.
Careful selection of the reduction level improves clustering performance.
Abstract
Dimensionality reduction is a critical preprocessing step for clustering high-dimensional data, yet comprehensive evaluation of its impact across diverse methods and data types remains limited. In this study, we systematically assess the influence of five dimensionality reduction techniques - Principal Component Analysis (PCA), Kernel Principal Component Analysis (Kernel PCA), Variational Autoencoder (VAE), Isometric Mapping (Isomap), and Multidimensional Scaling (MDS) - on the performance of four popular clustering algorithms - k-means, Agglomerative Hierarchical Clustering (AHC), Gaussian Mixture Models (GMM), and Ordering Points to Identify the Clustering Structure (OPTICS). We evaluate clustering quality using the Adjusted Rand Index (ARI), comparing results without and with dimensionality reduction at different reduction levels recommended in the literature (i.e., k-1, where k is…
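The evaluation protocol in the abstract (reduce, cluster, score with ARI against the true labels, and compare with clustering the unreduced data) can be sketched as below. This is a minimal sketch assuming scikit-learn, not the authors' code: the VAE is omitted because it needs a deep-learning framework, and the digits dataset, the RBF kernel for Kernel PCA, `n_neighbors=10` for Isomap, Ward linkage for AHC, `min_samples=10` for OPTICS, and the helper names `make_reducers`, `make_clusterers` and `evaluate` are all illustrative assumptions. Only the k-1 reduction level comes from the abstract.

```python
import numpy as np
from sklearn.cluster import OPTICS, AgglomerativeClustering, KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import MDS, Isomap
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def make_reducers(n_components, seed=0):
    """Hypothetical helper: four of the five reducers (no VAE), plus a no-reduction baseline."""
    return {
        "none": None,  # baseline: cluster the standardized original features
        "PCA": PCA(n_components=n_components, random_state=seed),
        "KernelPCA": KernelPCA(n_components=n_components, kernel="rbf", random_state=seed),
        "Isomap": Isomap(n_components=n_components, n_neighbors=10),  # assumed neighborhood size
        "MDS": MDS(n_components=n_components, n_init=1, random_state=seed),
    }


def make_clusterers(k, seed=0):
    """Hypothetical helper: the four algorithms; OPTICS infers the number of clusters itself."""
    return {
        "k-means": KMeans(n_clusters=k, n_init=10, random_state=seed),
        "AHC": AgglomerativeClustering(n_clusters=k, linkage="ward"),  # Ward linkage is an assumption
        # A slightly larger ridge keeps full covariances well-conditioned in the 64-D baseline.
        "GMM": GaussianMixture(n_components=k, reg_covar=1e-4, random_state=seed),
        "OPTICS": OPTICS(min_samples=10),  # noise points get label -1
    }


def evaluate(X, y, n_components):
    """Return {(reducer, clusterer): ARI} for a single reduction level."""
    k = len(np.unique(y))
    X = StandardScaler().fit_transform(X)
    scores = {}
    for r_name, reducer in make_reducers(n_components).items():
        # Each reducer is fit without labels; the labels are used only for scoring.
        Z = X if reducer is None else reducer.fit_transform(X)
        for c_name, clusterer in make_clusterers(k).items():
            labels = clusterer.fit_predict(Z)
            scores[(r_name, c_name)] = adjusted_rand_score(y, labels)
    return scores


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)
    X, y = X[:600], y[:600]  # subsample so that MDS stays fast
    k = len(np.unique(y))
    scores = evaluate(X, y, n_components=k - 1)  # the "k - 1" reduction level
    for (r_name, c_name), ari in sorted(scores.items()):
        print(f"{r_name:>10} + {c_name:<7} ARI = {ari:.3f}")
```

Calling `evaluate` at several values of `n_components` and comparing each row against the `"none"` baseline gives the kind of method-by-level comparison the study describes. Because OPTICS assigns noise points the label -1, ARI treats them as one extra cluster in this sketch.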
