Feature selection or extraction decision process for clustering using PCA and FRSD

Jean-Sebastien Dessureault; Daniel Massicotte

arXiv:2111.10492 · cs.LG · November 23, 2021

TL;DR

This paper introduces a decision process for choosing between feature selection (ranked with FRSD) and feature extraction (with PCA) before clustering.

Contribution

It proposes a new method for choosing the dimensionality reduction approach (selection or extraction) according to the data scientist's parameters, ahead of a clustering step.

Findings

01 The method guides the choice between feature selection and feature extraction.

02 Five use cases on a smart city dataset illustrate its practical use.

03 The paper analyzes the advantages and disadvantages of each approach.

Abstract

This paper concerns the critical decision of whether to extract or select features before applying a clustering algorithm. Evaluating the importance of the features is not straightforward, since the most popular methods for doing so are designed for supervised learning. A clustering algorithm is an unsupervised method: there is no known output label to match the input data. This paper proposes a new method for choosing the best dimensionality reduction approach (selection or extraction) according to the data scientist's parameters, with the aim of applying a clustering process at the end. It uses the Feature Ranking Process Based on Silhouette Decomposition (FRSD) algorithm, a Principal Component Analysis (PCA) algorithm, and a K-Means algorithm along with its metric, the Silhouette Index (SI). This paper presents 5 use cases based on a smart city dataset. This…
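
The abstract names the Silhouette Index (SI) as the metric used to judge a clustering. As a rough illustration of how such a score can compare candidate reduced representations, here is a minimal pure-Python sketch of the standard mean silhouette coefficient. It is not the paper's FRSD algorithm or its decision process. The function name `silhouette_index`, the toy data, and the fixed cluster labels are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (standard definition).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to the members of any other
    cluster. The point's score is (b - a) / max(a, b). Points in singleton
    clusters are scored 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # dist(point, point) is 0, so summing over the whole cluster and
        # dividing by len(own) - 1 gives the mean over the other members.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: feature 0 separates the two groups, feature 1 does not.
points_full = [(0.0, 5.0), (0.2, 1.0), (0.1, 3.0),
               (5.0, 4.8), (5.2, 1.2), (5.1, 3.1)]
labels = [0, 0, 0, 1, 1, 1]  # assumed cluster assignment, fixed for the demo

informative = [(p[0],) for p in points_full]  # keep only feature 0
noisy = [(p[1],) for p in points_full]        # keep only feature 1

print("SI, feature 0 only:", silhouette_index(informative, labels))
print("SI, feature 1 only:", silhouette_index(noisy, labels))
```

In this toy setup the representation that keeps the separating feature scores far higher than the one that keeps the uninformative feature. A full comparison in the spirit of the abstract would presumably re-run K-Means on each candidate representation (selected features or PCA components) before scoring it, rather than reuse fixed labels as this sketch does.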

Taxonomy

Topics: Advanced Clustering Algorithms Research · Complex Network Analysis Techniques · Anomaly Detection Techniques and Applications