# A SHAP-interpretable XGBoost model: MRI-based intratumoral perfusion heterogeneity predicts HER2-zero, -low, and -positive ternary expression status in breast cancer

**Authors:** Shuxing Wang, Xiaowen Liu, Yudie Pan, Cici Zhang, Yu Wu, Changsi Jiang, Xue Tang, Yan Luo, Jingshan Gong

PMC · DOI: 10.1186/s40644-026-01000-4 · Cancer Imaging · 2026-02-02

## TL;DR

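An XGBoost model built on MRI perfusion-habitat radiomics features predicts HER2-zero, HER2-low, and HER2-positive status in breast cancer, with AUCs of 0.844 to 0.902 across one training and two external test cohorts. SHAP shows which features drive each prediction.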

## Contribution

A novel XGBoost model with SHAP interpretability that predicts HER2-zero, -low, and -positive statuses using MRI perfusion heterogeneity.

## Key findings

- The Habitat model achieved AUCs of 0.902 for HER2-zero, 0.877 for HER2-low, and 0.880 for HER2-positive in the training cohort.
- SHAP analysis identified subregion-specific radiomic features most influential in distinguishing HER2 statuses.
- The model demonstrated consistent performance across external test cohorts with AUCs above 0.84 for all HER2 categories.
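The findings report one AUC per HER2 category even though the task has three classes. A common way to do this is one-vs-rest: each class's predicted probability is scored against all other classes combined. The summary does not say how the authors computed per-class AUCs, so treat the following as a minimal sketch under that one-vs-rest assumption. It uses the rank (Mann-Whitney) form of AUC on made-up probabilities. `binary_auc`, `one_vs_rest_auc`, and the toy data are hypothetical.

```python
# Assumption: per-class AUCs are one-vs-rest; names and data below are hypothetical.
CLASSES = ("HER2-zero", "HER2-low", "HER2-positive")


def binary_auc(scores, is_positive):
    """AUC as the probability a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs, y_true):
    """One AUC per HER2 class: that class's predicted probability vs. all other classes."""
    return {
        name: binary_auc([row[c] for row in probs], [y == c for y in y_true])
        for c, name in enumerate(CLASSES)
    }


# Made-up predicted probabilities (columns follow CLASSES) and true class indices.
probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.25, 0.35],
    [0.1, 0.3, 0.6],
    [0.3, 0.2, 0.5],
]
y_true = [0, 0, 1, 1, 2, 2]
print(one_vs_rest_auc(probs, y_true))
# {'HER2-zero': 1.0, 'HER2-low': 0.75, 'HER2-positive': 1.0}
```

The pairwise loop is quadratic, which is fine for a few hundred patients per cohort. A library routine such as scikit-learn's `roc_auc_score` with `multi_class="ovr"` does the same job with a sort-based computation.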

## Abstract

This study aimed to predict HER2 status (HER2-zero, -low, and -positive) in breast cancer from MRI perfusion heterogeneity, a distinction that is crucial for guiding treatment with novel antibody-drug conjugates (ADCs). The SHapley Additive exPlanations (SHAP) method was used to interpret the outputs of the machine learning model.

The retrospective study included 912 women from three centers (Center A [n = 570] as the training cohort; Centers B [n = 173] and C [n = 169] as external test cohorts) who underwent MRI between April 2018 and March 2024. Voxel-wise vectors of three MRI perfusion parameters (wash-in, wash-out, and wash-out ratio) were clustered into subregions with k-means. Radiomics features were extracted from these subregions, and an XGBoost classifier built on them formed the Habitat model. SHAP was applied to evaluate each feature's contribution and importance.
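The habitat step groups voxels by their perfusion behaviour instead of treating the tumor as one region. Below is a minimal sketch of that idea, assuming a single k-means run with k = 4 on z-scored (wash-in, wash-out, wash-out ratio) vectors. The summary does not state the normalization, the initialization, or whether clustering was done per patient or on pooled voxels. The per-habitat "features" here (voxel fraction and parameter means) are hypothetical stand-ins for real radiomics features. All names (`PARAM_NAMES`, `kmeans`, `habitat_features`) and the synthetic voxels are illustrative only.

```python
import random
from statistics import mean, pstdev

# Hypothetical per-voxel parameter order; the paper's exact definitions are not given here.
PARAM_NAMES = ("wash_in", "wash_out", "wash_out_ratio")


def zscore_columns(vectors):
    """Standardize each perfusion parameter so no single map dominates distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*vectors)]
    return [[(v - m) / s for v, (m, s) in zip(vec, stats)] for vec in vectors]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's k-means with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each voxel joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its voxels.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


def habitat_features(params, labels, k):
    """Toy per-habitat features: voxel fraction and mean of each raw parameter."""
    feats = {}
    for c in range(k):
        members = [p for p, lab in zip(params, labels) if lab == c]
        feats[f"habitat{c + 1}_fraction"] = len(members) / len(params)
        for j, name in enumerate(PARAM_NAMES):
            vals = [m[j] for m in members]
            feats[f"habitat{c + 1}_{name}_mean"] = mean(vals) if vals else 0.0
    return feats


# Synthetic tumor: 200 voxels drawn around four made-up perfusion patterns.
rng = random.Random(1)
patterns = [(0.2, 0.1, 0.5), (1.5, 0.2, 0.1), (0.3, 1.2, 2.0), (1.8, 1.5, 1.0)]
voxels = [[v + rng.gauss(0, 0.05) for v in pat] for pat in patterns for _ in range(50)]

labels, _ = kmeans(zscore_columns(voxels), k=4)
feats = habitat_features(voxels, labels, k=4)
print({name: round(v, 3) for name, v in feats.items() if name.endswith("fraction")})
```

Cluster indices from k-means are arbitrary. For features to be comparable across patients, "habitat 1" must mean the same perfusion pattern everywhere. Clustering pooled voxels, or sorting clusters by centroid, is one way to get that. A real pipeline would extract texture and shape features from each subregion mask with a radiomics toolkit rather than these toy summaries.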

Four subregions of tumor perfusion patterns were identified, yielding 8, 8, 8, and 10 radiomics features, respectively. The Habitat model achieved AUCs of 0.902 for HER2-zero, 0.877 for HER2-low, and 0.880 for HER2-positive in the training cohort (Center A). In the external test cohorts, AUCs were 0.873, 0.845, and 0.865 for Center B and 0.865, 0.844, and 0.878 for Center C, respectively. SHAP analysis revealed the radiomic features that contributed most strongly to distinguishing HER2-zero, -low, and -positive tumors across the four perfusion-derived subregions. The global SHAP results identified the subregion-specific features with the greatest influence on model decisions, while the local SHAP explanations showed how individual feature patterns drove the prediction for a specific patient.
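The difference between local and global SHAP explanations is easier to see on a model small enough to compute Shapley values exactly. The sketch below enumerates every feature coalition for a hypothetical three-feature class score. Features outside a coalition are replaced by a baseline value, which is an assumed, simplified value function. This is not how the SHAP library's `TreeExplainer` handles XGBoost; it uses a tree-specific algorithm instead of enumerating 2^n subsets. For a three-class XGBoost model, SHAP yields one attribution vector per class, whereas this toy uses a single scalar score. `shapley_values`, `global_importance`, and `toy_score` are illustrative names.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample (a local explanation).

    Features outside a coalition are set to their baseline value
    (an assumed value function chosen for illustration)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def global_importance(model, X, baseline):
    """Global importance as mean |Shapley value| per feature across samples."""
    rows = [shapley_values(model, x, baseline) for x in X]
    return [sum(abs(r[j]) for r in rows) / len(rows) for j in range(len(baseline))]


def toy_score(z):
    # Hypothetical class score: additive in feature 0, interaction of features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, [1.0, 2.0, 3.0], baseline)
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
imp = global_importance(toy_score, [[1.0, 2.0, 3.0], [-1.0, 1.0, 1.0]], baseline)
print([round(v, 6) for v in imp])  # [2.0, 1.75, 1.75]
```

The local values for one patient sum to the gap between that patient's score and the baseline score (here 8 = 2 + 3 + 3). The interaction term is split evenly between the two features that jointly produce it. Averaging absolute values over patients gives the kind of global ranking a SHAP summary plot displays.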

The Habitat model accurately predicts HER2-zero, HER2-low, and HER2-positive expression status, while SHAP clarifies the contribution of subregion-derived radiomic features and enhances the overall interpretability and clinical transparency of the prediction framework.


The online version contains supplementary material available at 10.1186/s40644-026-01000-4.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Genes:** ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}
- **Diseases:** breast cancer (MESH:D001943)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12954891/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12954891/full.md

---
Source: https://tomesphere.com/paper/PMC12954891