On feature selection in double-imbalanced data settings: a Random Forest approach

Fabio Demaria

arXiv:2506.10929 · stat.ME · June 13, 2025

TL;DR

This paper introduces a new minimal depth-based feature selection method for Random Forests tailored to double-imbalanced high-dimensional data, improving stability and accuracy of variable importance rankings.

Contribution

It proposes a novel thresholding scheme based on minimal depth to enhance feature selection stability and interpretability in double-imbalanced settings.

Findings

01. More parsimonious variable subsets than conventional minimal depth-based selection

02. More accurate variable subsets than conventional minimal depth-based selection

03. Validated on both simulated and real-world datasets

Abstract

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional asymmetry in the data ($n \gg p$). In such scenarios, traditional feature selection methods applied to Random Forests (RF) often yield unstable or misleading importance rankings. This paper proposes a novel thresholding scheme for feature selection based on minimal depth, which exploits the tree topology to assess variable relevance. Extensive experiments on simulated and real-world datasets demonstrate that the proposed approach produces more parsimonious and accurate subsets of variables compared to conventional minimal depth-based selection. The method provides a practical and interpretable solution for variable selection in RF under double imbalance…
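
To make the notion of minimal depth concrete, here is a minimal Python sketch. The minimal depth of a variable in a tree is the depth of the shallowest node that splits on it (the root is depth 0), and variables that split close to the root are treated as more relevant. The sketch computes forest-averaged minimal depth with scikit-learn on a simulated dataset with a 95/5 class split and $n \gg p$. Everything here is an illustrative assumption rather than the paper's procedure: the helper names `tree_minimal_depth` and `forest_minimal_depth` are hypothetical, unused variables are assigned the tree's maximum depth as one possible convention, and the cutoff (the mean of the averaged minimal depths) is a simple stand-in, not the thresholding scheme the paper proposes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_minimal_depth(estimator, n_features):
    """Depth of the shallowest split on each feature in one fitted tree.

    Features the tree never splits on receive the tree's maximum depth
    (an assumed convention for this sketch).
    """
    t = estimator.tree_
    min_depth = np.full(n_features, np.inf)
    stack = [(0, 0)]  # (node id, depth); node 0 is the root
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # leaf: both children are -1
            continue
        feature = t.feature[node]
        min_depth[feature] = min(min_depth[feature], depth)
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    min_depth[np.isinf(min_depth)] = t.max_depth
    return min_depth


def forest_minimal_depth(forest, n_features):
    """Average the per-tree minimal depths over all trees in the forest."""
    per_tree = np.array(
        [tree_minimal_depth(est, n_features) for est in forest.estimators_]
    )
    return per_tree.mean(axis=0)


# Simulated data: about 5% minority class, n = 2000 rows, p = 20 columns.
# With shuffle=False the 4 informative features are columns 0..3.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=4, n_redundant=0,
    weights=[0.95, 0.05], shuffle=False, random_state=0,
)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

avg_depth = forest_minimal_depth(forest, X.shape[1])

# Stand-in cutoff: keep features whose average minimal depth is below the mean.
selected = avg_depth < avg_depth.mean()
print("average minimal depth:", np.round(avg_depth, 2))
print("selected feature indices:", np.flatnonzero(selected))
```

Because the informative columns are known in this simulation, comparing the selected indices with columns 0 to 3 gives a quick check of how a given cutoff trades parsimony against recall.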

Taxonomy

Topics: Imbalanced Data Classification Techniques · Face and Expression Recognition · Financial Distress and Bankruptcy Prediction

Methods: Feature Selection