Feature selection or extraction decision process for clustering using PCA and FRSD
Jean-Sebastien Dessureault, Daniel Massicotte

TL;DR
This paper introduces a decision framework for choosing between feature selection and extraction for clustering, utilizing PCA and FRSD, to improve unsupervised learning outcomes.
Contribution
It proposes a novel method to select the optimal dimensionality reduction technique based on data parameters for clustering tasks.
Findings
The method effectively guides the choice between feature selection and extraction.
Application to smart city data demonstrates practical utility.
The paper analyzes the advantages and disadvantages of each approach.
Abstract
This paper concerns the critical decision of whether to extract or select features before applying a clustering algorithm. Evaluating feature importance is not straightforward in this setting, because the most popular methods are designed for supervised learning. Clustering is unsupervised: there are no known output labels to match against the input data. This paper proposes a new method for choosing the best dimensionality reduction approach (selection or extraction) according to the data scientist's parameters, with the aim of applying a clustering process afterwards. It uses the Feature Ranking Process Based on Silhouette Decomposition (FRSD) algorithm, Principal Component Analysis (PCA), and K-Means along with its metric, the Silhouette Index (SI). This paper presents 5 use cases based on a smart city dataset. This…
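The abstract names the Silhouette Index as the metric used to judge clustering quality. As a rough illustration only (this is neither FRSD nor the authors' implementation), the standard silhouette definition can be sketched in pure Python. The function name `silhouette_index`, the use of Euclidean distance, and the toy data are assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (O(n^2), illustration only)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy data (assumed): feature 0 separates the two groups, feature 1 is noise.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

print(silhouette_index(points, labels))                       # both features
print(silhouette_index([(p[0],) for p in points], labels))    # feature 0 only
```

On this toy data, keeping only feature 0 raises the index from about 0.90 to 1.0, which hints at why a silhouette-based score can serve as a signal when comparing reduced feature sets without labels.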
Taxonomy
Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Anomaly Detection Techniques and Applications
