# Revisiting AI Interpretability in Precision Oncology: Why Predictive Accuracy Does Not Ensure Stable Feature Importance

**Authors:** Souichi Oka, Yoshiyasu Takefuji

PMC · DOI: 10.3390/cancers18040593 · 2026-02-11

## TL;DR

This paper shows that accurate AI models in cancer research can give unstable explanations, and simpler methods may offer more reliable insights.

## Contribution

The study introduces feature ranking order consistency, a new metric for evaluating the stability of AI interpretability in precision oncology.
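The summary does not give an exact formula for feature ranking order consistency. Here is a minimal sketch under one assumption about its meaning: after the top-ranked feature is removed and the remaining features are re-scored, the new Top 20 list should reproduce the original ranks 2 to 21 in the same order. The names `top_k_ranking`, `ranking_order_consistent` and the pluggable `score_fn` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def top_k_ranking(scores, feature_names, k=20):
    """Return the k feature names with the highest scores, best first."""
    # Negate so ascending argsort yields descending scores; "stable"
    # breaks ties by original column position.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return [feature_names[i] for i in order[:k]]


def ranking_order_consistent(score_fn, X, feature_names, k=20):
    """Hypothetical consistency check (assumed definition):
    1. rank all features with score_fn and keep the top k + 1,
    2. drop the top-ranked feature's column from X,
    3. re-score and re-rank the remaining columns,
    4. report whether the new top k equals the old ranks 2..k+1 in order.
    """
    baseline = top_k_ranking(score_fn(X), feature_names, k=k + 1)
    removed = baseline[0]
    keep = [j for j, name in enumerate(feature_names) if name != removed]
    kept_names = [feature_names[j] for j in keep]
    perturbed = top_k_ranking(score_fn(X[:, keep]), kept_names, k=k)
    return perturbed == baseline[1:k + 1]


def variance_score(A):
    """Per-gene sample variance; each gene's score ignores other columns."""
    return A.var(axis=0, ddof=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))  # synthetic: 50 samples x 30 genes
names = [f"gene_{i}" for i in range(30)]
print(ranking_order_consistent(variance_score, X, names, k=20))  # True
```

A per-gene score such as variance passes this check by construction, because removing one column does not change any other column's score. Model-based importances (for example, from a Random Forest) can instead redistribute credit among correlated features once the top feature is gone. This is one plausible route to the instability the authors report for supervised models.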

## Key findings

- Supervised models such as XGBoost and Random Forest produce unstable feature importance rankings under small input perturbations.
- Unsupervised and statistical methods such as Highly Variable Gene Selection and Spearman’s correlation provide stable, biologically meaningful results.
- High predictive accuracy does not guarantee reliable or reproducible AI explanations in cancer data analysis.

## Abstract

Artificial intelligence (AI) is becoming a powerful tool in cancer research, helping researchers and clinicians predict patient outcomes and identify important biological markers. However, many AI models can appear highly accurate while still giving unstable or unreliable explanations about which factors are truly critical. This study evaluates the consistency and reliability of different AI methods in the analysis of complex breast cancer data. We found that some popular machine learning models change their explanations dramatically with only tiny changes to the input, raising concerns about their reliability. In contrast, simpler data-driven approaches identified important features more consistently and still achieved superior predictive performance. These findings highlight the importance of evaluating not only how accurate an AI model is, but also how stable and transparent its reasoning is. Improving the stability of AI explanations can support the development of safer, more dependable tools for understanding cancer and guiding future decisions.

**Background:** Artificial intelligence (AI) is becoming important in oncology, supporting risk prediction, treatment planning, and biomarker discovery. However, current evaluation practices often assume that high predictive accuracy implies reliable interpretation, a misconception that may undermine reproducibility and clinical decision-making. This study aims to reassess interpretability by introducing feature ranking order consistency as a stability-focused metric to evaluate how model explanations respond to minimal input perturbations.

**Methods:** Using The Cancer Genome Atlas (TCGA) breast cancer multi-omics dataset, we compared supervised models (Linear Regression, Least Absolute Shrinkage and Selection Operator [LASSO], Random Forest, and Extreme Gradient Boosting [XGBoost]) with unsupervised and statistical methods, including Principal Component Analysis (PCA), Highly Variable Gene Selection, and Spearman’s rank correlation. Each method produced a Top 20 feature ranking, and stability was assessed by testing whether rankings remained consistent after removing the top-ranked feature. Predictive performance was evaluated using a Random Forest classifier with stratified 10-fold cross-validation.

**Results:** Supervised models exhibited unstable feature importance rankings even under minimal perturbations (<0.1% feature removal), suggesting that high predictive accuracy may obscure fragile or misleading explanations. In contrast, Highly Variable Gene Selection and Spearman’s correlation consistently produced stable, biologically coherent feature sets and maintained competitive predictive performance.

**Conclusions:** Interpretive instability is a major limitation of many machine learning models in oncology. Incorporating stability-based criteria, such as feature ranking consistency, into evaluation frameworks is essential for ensuring reproducible, trustworthy, and clinically actionable AI. As AI adoption accelerates, prioritizing interpretability alongside accuracy is critical for responsible deployment in precision oncology.
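The evaluation step, a Random Forest classifier under stratified 10-fold cross-validation, can be sketched with scikit-learn as follows. Several details are assumptions for illustration because the abstract does not specify them: accuracy as the metric, `n_estimators=200`, the fixed seed, and the helper name `evaluate_feature_set`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def evaluate_feature_set(X, y, selected_idx, n_splits=10, seed=0):
    """Mean and standard deviation of Random Forest accuracy on the selected
    columns, using stratified k-fold CV (fold class ratios match y)."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X[:, selected_idx], y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()


# Usage on synthetic data: the label depends mainly on column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))
y = (X[:, 0] + 0.1 * rng.normal(size=120) > 0).astype(int)
mean_acc, std_acc = evaluate_feature_set(X, y, [0, 1, 2])
print(f"accuracy {mean_acc:.3f} +/- {std_acc:.3f}")
```

Choosing a feature list with the labels (for example, by Spearman correlation with the outcome) on the full dataset before cross-validation can bias these scores upward. Wrapping selection and classifier in a scikit-learn `Pipeline` keeps selection inside each training fold. Variance-based selection does not use the labels, so it does not leak label information across folds.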

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ADH1B (alcohol dehydrogenase 1B (class I), beta polypeptide) [NCBI Gene 125] {aka ADH2, HEL-S-117}, ABCA12 (ATP binding cassette subfamily A member 12) [NCBI Gene 26154] {aka ARCI4A, ARCI4B, ICR2B, LI2}, TBX3 (T-box transcription factor 3) [NCBI Gene 6926] {aka TBX3-ISO, UMS, XHL}, GSTM1 (glutathione S-transferase mu 1) [NCBI Gene 2944] {aka GST1, GSTM1-1, GSTM1a-1a, GSTM1b-1b, GTH4, GTM1}, TFF1 (trefoil factor 1) [NCBI Gene 7031] {aka BCEI, D21S21, HP1.A, HPS2, pNR-2, pS2}, FOXA1 (forkhead box A1) [NCBI Gene 3169] {aka HNF3A, TCF3A}, ADIPOQ (adiponectin, C1Q and collagen domain containing) [NCBI Gene 9370] {aka ACDC, ACRP30, ADIPQTL1, ADPN, APM-1, APM1}, PTEN (phosphatase and tensin homolog) [NCBI Gene 5728] {aka 10q23del, BZS, CWS1, DEC, GLM2, MHAM}, SCGB2A2 (secretoglobin family 2A member 2) [NCBI Gene 4250] {aka MGB1, PSBP1, UGB2}, AKT1 (AKT serine/threonine kinase 1) [NCBI Gene 207] {aka AKT, PKB, PKB-ALPHA, PRKBA, RAC, RAC-ALPHA}
- **Diseases:** breast cancer (MESH:D001943), Ductal Carcinoma (MESH:D044584), death (MESH:D003643), metastasis (MESH:D009362), ILC (MESH:D018275), obesity (MESH:D009765), AI (MESH:C538142), inflammation (MESH:D007249), injury to (MESH:D014947), Cancer (MESH:D009369)
- **Chemicals:** lipid (MESH:D008055), alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

The complete paper contains 1 captioned figure: https://tomesphere.com/paper/PMC12939461/full.md

---
Source: https://tomesphere.com/paper/PMC12939461