Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data
Youngro Lee, Giacomo Baruzzo, Jeonghwan Kim, Jongmo Seo, Barbara Di Camillo

TL;DR
This study challenges the belief that high accuracy is necessary for valid feature importance in biomedical tabular data, showing that low-performing models can still provide meaningful feature importance insights.
Contribution
It demonstrates that feature importance validity can be maintained at low performance levels if data size is sufficient, using experiments on synthetic and real biomedical datasets.
Findings
In synthetic datasets, feature cutting (training on fewer features) leaves feature importance rankings unchanged.
In synthetic datasets, data cutting (training on fewer samples) produces larger discrepancies in feature rank as performance falls.
Models can still distinguish important features when performance is degraded through feature cutting.
Abstract
In tabular biomedical data analysis, tuning models to high accuracy is considered a prerequisite for discussing feature importance, as medical practitioners expect the validity of feature importance to correlate with performance. In this work, we challenge the prevailing belief, showing that low-performing models may also be used for feature importance. We propose experiments to observe changes in feature rank as performance degrades sequentially. Using three synthetic datasets and six real biomedical datasets, we compare the rank of features from full datasets to those with reduced sample sizes (data cutting) or fewer features (feature cutting). In synthetic datasets, feature cutting does not change feature rank, while data cutting shows higher discrepancies with lower performance. In real datasets, feature cutting shows similar or smaller changes than data cutting, though some…
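The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the summary names neither their models nor their rank metric, so this example assumes a least-squares linear model whose absolute coefficients serve as feature importance, a Spearman rank correlation as the agreement measure, and a hypothetical synthetic task in which feature 0 is the most informative. All function names, sample sizes, and the choice of which features to cut are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, weights, rng):
    """Hypothetical synthetic binary task: label depends linearly on the features plus noise."""
    X = rng.normal(size=(n, len(weights)))
    y = (X @ weights + rng.normal(size=n) > 0).astype(float)
    return X, y

def fit_linear(X, y):
    """Least-squares linear model with intercept (a stand-in for any classifier)."""
    A = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[:-1], beta[-1]

def accuracy(coef, intercept, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((X @ coef + intercept > 0.5) == (y == 1)))

def rank_of(importance):
    """Rank 0 is the most important feature."""
    return np.argsort(np.argsort(-importance))

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (no ties assumed)."""
    return float(np.corrcoef(rank_of(a), rank_of(b))[0, 1])

# Assumed ground truth: importance decreases from feature 0 to feature 9.
weights = np.linspace(2.0, 0.2, 10)
X_train, y_train = make_data(20000, weights, rng)
X_test, y_test = make_data(5000, weights, rng)

# Reference: model trained on the full dataset.
coef_full, b_full = fit_linear(X_train, y_train)
imp_full = np.abs(coef_full)
acc_full = accuracy(coef_full, b_full, X_test, y_test)

# Data cutting: keep all features, train on a small random subset of samples.
idx = rng.choice(len(X_train), size=60, replace=False)
coef_data, b_data = fit_linear(X_train[idx], y_train[idx])
acc_data = accuracy(coef_data, b_data, X_test, y_test)
rho_data = spearman(imp_full, np.abs(coef_data))

# Feature cutting: keep all samples, drop the four most informative features.
keep = np.arange(4, 10)
coef_feat, b_feat = fit_linear(X_train[:, keep], y_train)
acc_feat = accuracy(coef_feat, b_feat, X_test[:, keep], y_test)
rho_feat = spearman(imp_full[keep], np.abs(coef_feat))

print(f"full:            accuracy={acc_full:.3f}")
print(f"data cutting:    accuracy={acc_data:.3f}  rank agreement={rho_data:.3f}")
print(f"feature cutting: accuracy={acc_feat:.3f}  rank agreement={rho_feat:.3f}")
```

For feature cutting, rank agreement is computed only over the features that remain, since the dropped features have no importance in the reduced model. In this toy setting the features are independent, which is the favorable case for rank stability under feature cutting; correlated features, as in many real biomedical tables, may behave differently.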
Taxonomy
Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · AI in cancer detection
