Combining clustering of variables and feature selection using random forests

Marie Chavent (CQFD); Robin Genuer (SISTM); Jerome Saracco (CQFD)

arXiv:1608.06740 · math.ST · November 7, 2018 · Commun. Stat. Simul. Comput.

TL;DR

This paper introduces a novel method combining hierarchical clustering of variables with random forest-based feature selection to improve high-dimensional classification, especially with mixed data types, enhancing interpretability and performance.

Contribution

The proposed approach automatically identifies variable groups and selects relevant synthetic variables without prior knowledge of group structure, handling mixed numerical and categorical data.

Findings

01. Improved classification accuracy over standard random forests.

02. Effective reduction of variable redundancy and dimensionality.

03. Enhanced interpretability through variable grouping.

Abstract

Standard approaches to high-dimensional supervised classification problems often include variable selection and dimension reduction procedures. The novel methodology proposed in this paper combines clustering of variables and feature selection. More precisely, a hierarchical clustering of variables procedure builds groups of correlated variables in order to reduce the redundancy of information, and summarizes each group by a synthetic numerical variable. The originality is that the groups of variables (and the number of groups) are unknown a priori. Moreover, the clustering approach used can deal with both numerical and categorical variables (i.e. mixed datasets). Among all the possible partitions resulting from dendrogram cuts, the most relevant synthetic variables (i.e. groups of variables) are selected with a variable selection procedure using random forests. Numerical…
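The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the linked repository is the reference for that), and it makes several simplifying assumptions: only numerical variables are handled, the dissimilarity between variables is taken as one minus the squared correlation with average linkage, each group is summarized by its first principal component, and the number of groups is chosen by the out-of-bag error of a plain scikit-learn random forest rather than by a dedicated variable selection procedure. The function names `synthetic_variables` and `choose_partition`, and the simulated data, are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def synthetic_variables(X, labels):
    """Summarize each cluster of columns by its first principal component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable
    columns = []
    for k in np.unique(labels):
        block = Z[:, labels == k]
        # First left singular vector times its singular value = PC1 scores.
        U, s, _ = np.linalg.svd(block, full_matrices=False)
        columns.append(U[:, 0] * s[0])
    return np.column_stack(columns)


def choose_partition(X, y, max_groups=10, seed=0):
    """Try each dendrogram cut and keep the one with the lowest OOB error."""
    corr = np.corrcoef(X, rowvar=False)
    dissim = 1.0 - corr ** 2          # assumed dissimilarity between variables
    np.fill_diagonal(dissim, 0.0)
    tree = linkage(squareform(dissim, checks=False), method="average")
    best = None
    for k in range(2, max_groups + 1):
        labels = fcluster(tree, t=k, criterion="maxclust")
        S = synthetic_variables(X, labels)
        rf = RandomForestClassifier(
            n_estimators=200, oob_score=True, random_state=seed
        )
        rf.fit(S, y)
        oob_error = 1.0 - rf.oob_score_
        if best is None or oob_error < best[0]:
            best = (oob_error, k, labels, rf.feature_importances_)
    return best


# Simulated data (an assumption for illustration): three latent factors,
# each observed through four noisy copies; only the first drives the class.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 3))
X = np.hstack([latent[:, [j]] + 0.3 * rng.normal(size=(n, 4)) for j in range(3)])
y = (latent[:, 0] > 0).astype(int)

oob_error, k, labels, importances = choose_partition(X, y, max_groups=6)
```

Here `labels` assigns each original variable to a group, and `importances` ranks the synthetic variables of the retained partition, so a group with high importance can be read back as a set of correlated original variables.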

Code & Models

Repositories

robingenuer/CoVVSURF (official)