Validity of Feature Importance in Low-Performing Machine Learning for Tabular Biomedical Data

Youngro Lee; Giacomo Baruzzo; Jeonghwan Kim; Jongmo Seo; Barbara Di Camillo

arXiv:2409.13342 · stat.ML · October 20, 2025

TL;DR

This study challenges the belief that high accuracy is necessary for valid feature importance in biomedical tabular data, showing that low-performing models can still provide meaningful feature importance insights.

Contribution

Using experiments on synthetic and real biomedical datasets, it demonstrates that feature importance can remain valid at low performance levels, provided the sample size is sufficient.

Findings

01. Feature cutting maintains stable feature importance rankings.

02. Data cutting leads to larger discrepancies in feature importance as performance decreases.

03. When performance is degraded by feature cutting, models can still distinguish the important features.

Abstract

In tabular biomedical data analysis, tuning models to high accuracy is considered a prerequisite for discussing feature importance, as medical practitioners expect the validity of feature importance to correlate with performance. In this work, we challenge the prevailing belief, showing that low-performing models may also be used for feature importance. We propose experiments to observe changes in feature rank as performance degrades sequentially. Using three synthetic datasets and six real biomedical datasets, we compare the rank of features from full datasets to those with reduced sample sizes (data cutting) or fewer features (feature cutting). In synthetic datasets, feature cutting does not change feature rank, while data cutting shows higher discrepancies with lower performance. In real datasets, feature cutting shows similar or smaller changes than data cutting, though some…

Taxonomy

Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · AI in cancer detection