Ensembling improves stability and power of feature selection for deep learning models
Prashnna K Gyawali, Xiaoxia Liu, James Zou, Zihuai He

TL;DR
This paper introduces an ensembling approach to stabilize and enhance feature importance scores in deep learning models, addressing inherent stochasticity and variability in feature selection across different training runs.
Contribution
The paper proposes a simple framework that ensembles feature importance scores across training epochs and hyperparameter settings, improving both the stability and the power of feature selection in deep learning models.
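As a rough illustration of the idea, the sketch below averages per-feature importance scores over several checkpoints and then selects the top-ranked features. It is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not say how scores are aggregated, so simple averaging is assumed here, and the names `ensemble_importance`, `top_k`, and the toy numbers are hypothetical.

```python
def ensemble_importance(score_sets):
    """Average per-feature importance scores over several models or checkpoints.

    score_sets[m][j] is the importance of feature j from model/checkpoint m.
    Plain averaging is an assumption; the aggregation rule is not given here.
    """
    if not score_sets:
        raise ValueError("need at least one set of scores")
    n_features = len(score_sets[0])
    if any(len(s) != n_features for s in score_sets):
        raise ValueError("all score sets must cover the same features")
    # Mean of feature j across all score sets.
    return [sum(s[j] for s in score_sets) / len(score_sets)
            for j in range(n_features)]


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy scores from three checkpoints of one training run (made-up numbers).
checkpoint_scores = [
    [0.9, 0.1, 0.4, 0.0],
    [0.7, 0.5, 0.2, 0.1],
    [0.8, 0.0, 0.6, 0.1],
]
ensembled = ensemble_importance(checkpoint_scores)
selected = top_k(ensembled, 2)
```

In this toy data the second checkpoint alone would rank feature 1 above feature 2, while the averaged scores do not, which is the kind of run-to-run noise that ensembling is meant to smooth out.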
Findings
Ensembling feature importance scores reduces instability across runs.
The approach improves feature selection power in simulated and real datasets.
Ensembling outperforms single-model feature importance methods.
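One way to make "instability across runs" concrete is to compare the sets of features selected by repeated runs. The sketch below uses the mean pairwise Jaccard overlap of the selected sets, which is an assumed metric chosen for illustration; this summary does not state which stability measure the paper uses. The function names and the toy selections are hypothetical and are not results from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two feature selections (1.0 means identical sets)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(selections):
    """Mean pairwise Jaccard overlap across the selections from repeated runs."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need selections from at least two runs")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up selections (feature indices) from three repeated runs.
single_model_runs = [[0, 1], [0, 2], [2, 3]]
ensembled_runs = [[0, 2], [0, 2], [0, 3]]

stability_single = selection_stability(single_model_runs)
stability_ensembled = selection_stability(ensembled_runs)
```

A value near 1 means the runs agree on which features matter; a value near 0 means each run selects a largely different set.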
Abstract
With the growing adoption of deep learning models in different real-world domains, including computational biology, it is often necessary to understand which data features are essential for the model's decision. Despite extensive recent efforts to define different feature importance metrics for deep learning models, we identified that inherent stochasticity in the design and training of deep learning models makes commonly used feature importance scores unstable. This results in varied explanations or selections of different features across different runs of the model. We demonstrate how the signal strength of features and correlation among features directly contribute to this instability. To address this instability, we explore the ensembling of feature importance scores of models across different epochs and find that this simple approach can substantially address this issue. For…
Taxonomy
Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Machine Learning in Materials Science
Methods: Feature Selection
