Feature Selection for Imbalanced Data with Deep Sparse Autoencoders Ensemble
Michela C. Massi, Francesca Ieva, Francesca Gasperoni, Anna Maria Paganoni

TL;DR
This paper introduces a feature selection method that uses an ensemble of deep sparse autoencoders to improve minority-class classification on imbalanced datasets, and reports better performance than benchmark feature selection methods.
Contribution
The paper proposes a filter-type feature selection algorithm, tailored to imbalanced datasets, that ranks features by the reconstruction error of deep sparse autoencoders trained on majority-class data.
Findings
Outperforms benchmark feature selection methods on high-dimensional datasets.
Effectively identifies features that discriminate the minority class in imbalanced data.
Successfully applied to a real radiogenomics case.
Abstract
Class imbalance is a common issue in many domain applications of learning algorithms. Oftentimes, in the same domains it is much more relevant to correctly classify and profile minority class observations. This need can be addressed by Feature Selection (FS), which offers several further advantages, such as decreasing computational costs and aiding inference and interpretability. However, traditional FS techniques may become sub-optimal in the presence of strongly imbalanced data. To achieve FS advantages in this setting, we propose a filtering FS algorithm that ranks feature importance on the basis of the Reconstruction Error of a Deep Sparse AutoEncoders Ensemble (DSAEE). We use each DSAE, trained only on the majority class, to reconstruct both classes. From the analysis of the aggregated Reconstruction Error, we determine the features where the minority class presents a different distribution of…
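The procedure outlined in the abstract (train each autoencoder on the majority class only, reconstruct both classes, aggregate the per-feature reconstruction error, rank features) can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: a shallow one-hidden-layer autoencoder written in NumPy stands in for a deep sparse autoencoder, the ensemble members differ only by random seed, and features are scored by the gap between minority and majority mean squared reconstruction error, since the abstract is cut off before stating the exact criterion. The names `train_sparse_autoencoder` and `rank_features`, and the synthetic data, are hypothetical.

```python
import numpy as np

def train_sparse_autoencoder(X, n_hidden=3, l1=1e-3, lr=0.05, epochs=400, seed=0):
    """Fit a one-hidden-layer autoencoder with an L1 penalty on the hidden
    activations (assumption: a shallow stand-in for a deep sparse autoencoder)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden code
        R = H @ W2 + b2                           # reconstruction
        dR = 2.0 * (R - X) / (n * d)              # gradient of the mean squared error
        dH = dR @ W2.T + l1 * np.sign(H) / n      # plus gradient of the L1 sparsity term
        dZ = dH * (1.0 - H ** 2)                  # backpropagate through tanh
        W2 -= lr * (H.T @ dR); b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dZ); b1 -= lr * dZ.sum(axis=0)
    return lambda A: np.tanh(A @ W1 + b1) @ W2 + b2

def rank_features(X_major, X_minor, n_models=5):
    """Score each feature by how much worse the ensemble reconstructs it on the
    minority class than on the majority class (assumed scoring criterion)."""
    gaps = []
    for seed in range(n_models):
        reconstruct = train_sparse_autoencoder(X_major, seed=seed)  # majority only
        err_major = ((reconstruct(X_major) - X_major) ** 2).mean(axis=0)
        err_minor = ((reconstruct(X_minor) - X_minor) ** 2).mean(axis=0)
        gaps.append(err_minor - err_major)
    score = np.mean(gaps, axis=0)                 # aggregate over the ensemble
    return np.argsort(score)[::-1], score         # feature indices, highest score first

# Synthetic data: 8 features that are noisy copies of one shared latent signal.
rng = np.random.default_rng(42)
n_features = 8

def sample(n):
    z = rng.normal(size=(n, 1))
    return z + 0.1 * rng.normal(size=(n, n_features))

X_major = sample(400)
X_minor = sample(30)
X_minor[:, 2] += 3.0   # the minority class departs from the majority on feature 2 only

ranking, score = rank_features(X_major, X_minor)
print("features ranked by reconstruction-error gap:", ranking)
```

Because the autoencoders only ever see majority-class structure, a feature on which the minority class behaves differently is reconstructed poorly for minority observations, which is what pushes it up the ranking in this toy setup.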
Taxonomy
Topics: Imbalanced Data Classification Techniques · AI in cancer detection · Artificial Intelligence in Healthcare
Methods: Feature Selection
