Predicting MammaPrint Recurrence Risk from Breast Cancer Pathological Images Using a Weakly Supervised Transformer

Chaoyang Yan; Linwei Li; Xiaolong Qian; Yang Ou; Zhidong Huang; Zhihan Ruan; Wenting Xiang; Hong Liu; Jian Liu

PMC · DOI:10.1002/advs.202510307·November 7, 2025

Predicting MammaPrint Recurrence Risk from Breast Cancer Pathological Images Using a Weakly Supervised Transformer

Chaoyang Yan, Linwei Li, Xiaolong Qian, Yang Ou, Zhidong Huang, Zhihan Ruan, Wenting Xiang, Hong Liu, Jian Liu

PDF

Open Access

TL;DR

A new AI model called CPMP predicts breast cancer recurrence risk from histopathological images, offering a cost-effective alternative to genomic tests.

Contribution

CPMP is a weakly supervised transformer model that enables spatial and morphological analysis of tumor images to predict MammaPrint risk groups.

Findings

01

CPMP achieves an AUROC of 0.824 ± 0.03 in predicting MammaPrint recurrence risk groups.

02

The model reveals distinct spatial and morphological patterns associated with high and low MammaPrint risk groups.

03

Prognostic evaluation shows CPMP significantly stratifies distant metastasis risk in an external cohort.

Abstract

Recurrence related to poor prognosis is a leading cause of mortality in patients with breast cancer (BC). The MammaPrint (MP) genomic assay is designed to stratify recurrence risk and evaluate chemotherapy benefits for early‐stage HR+/HER2‐ BC patients. However, MP fails to reveal spatial tumor morphology and is limited by high costs. In this study, a BC MP cohort is established and CPMP is developed, a weakly supervised agent‐attention transformer model, to predict MP recurrence risk from annotation‐free BC histopathological slides. CPMP achieves an AUROC of 0.824 ± 0.03 in predicting MP risk groups. CPMP is further leveraged for spatial and morphological analyses to explore histological patterns associated with MP risk groups. The model reveals tumor spatial localization at the whole‐slide level and highlights distinct intercellular interaction patterns of MP groups. It also…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes1

ERBB2

Proteins1

Species1

Homo sapiens(human · species)

Diseases4

breast cancer distant metastasis BC tumor

Figures7

Click any figure to enlarge with its caption.

Overview of CPMP and study design. a) Depiction of the proposed weakly‐supervised transformer architecture CPMP based on a multi‐instance learning paradigm. It first performs linear projection on the feature embeddings of local tiles. Then it employs the Agent Transformer architecture for feature aggregation to obtain a slide‐level global feature representation, followed by a regression‐based MLP predictor head that predicts the slide‐level results for the case. b) The Agent Transformer block adopted in the CPMP model. We introduced a 2D absolute coordinate encoding method to incorporate the spatial positional information and added the resulting positional embeddings to the projected tile embeddings. Then we initialized a class token (cls_token) and concatenated it with the spatial‐aware tile embeddings. This agent transformer block consists of 1 layer with four heads for global information aggregation. c) The agent‐attention mechanism utilized in the Agent Transformer block. The agent‐attention paradigm consists of a stacked state of two Softmax attention operations, comprising the first phase of first‐aggregation (colored in light blue) and the second phase of second‐broadcast (colored in light green). d–g) Flowchart of the study design: d) MP‐informed recurrence risk prediction, e) Spatial‐morphological attention heatmap visualization, f) Tumor‐associated phenotypical diversity discovery, and g) Distant metastasis prognostic risk assessment. These also summarize the clinical value of CPMP. MLP = Multi‐Layer Perceptron, MP = MammaPrint.

Performance evaluation and comparison of CPMP with state‐of‐the‐art methods for MP recurrence risk prediction. a) Scatter plot showing the correlation between the continuous genomic MP risk score and the continuous CPMP‐predicted risk probability for each patient in the test set, with a Spearman correlation coefficient (R) of 0.648 (p‐value < 0.0001). The fitted linear regression line is also shown. b) ROC curve of the CPMP model for MP recurrence risk group prediction across 5‐Time experiments (time0 to time4). c) Radar chart comparing the overall performance of CPMP with other comparative models (CEMIL, CLAM, TransMIL, Wagner et al., ABMIL, DTFDMIL, SAMMIL) across multiple metrics, including AUROC, balanced accuracy (bACC), average F1 score, average recall (avg REC), and average precision‐recall curve (avg PRC). d–f) Comprehensive comparison of CPMP with seven state‐of‐the‐art weakly supervised MIL methods using multiple metrics: d) ROC curves, e) Precision‐Recall (PR) curves at the low‐as‐positive mode, and f) PR curves at the high‐as‐positive mode. Low‐as‐positive is the condition that we aim to focus on low‐risk patients to avoid overtreatment. High‐as‐positive is to observe the high‐risk patients who may be at risk of recurrence. g) Evaluation of the efficiency‐AUC performance trade‐off of CPMP and comparative models in terms of AUC versus computational operations (G/MACs) and AUC versus inference runtime (ms). The data points are colored by the corresponding method. The results were obtained on a workstation equipped with two RTX 4090 GPUs. h) Evaluation of the number of parameters (K) of trained models for CPMP and comparative models. ROC = Receiver Operating Characteristic, PR = Precision‐Recall, AUC = Area Under the Receiver Operating Characteristic.

Effectiveness overview of CPMP in leveraging annotation‐free WSIs. a) Comparison of the performance of CPMP and the model with noisy tissue‐type filtering in terms of 5 evaluation metrics. The Wilcoxon signed‐rank test analysis was performed. b) AUROC values of models trained on different sets with a specific number of tiles randomly sampled from each slide for evaluation. The AUROC values were averaged across 5‐time experiments. c) Comparison of the AUROC performance of CPMP and the clinical‐based logistic regression model trained on clinicopathological features only. The Wilcoxon signed‐rank test analysis was performed. d) AUROC performance trend evaluation of the agent‐transformer models with a varying number of agent tokens. The light blue shading represents the standard deviation interval of performance across 5‐Time fivefold experiments, while the dashes are the AUROC performance lines of CPMP and the self‐attention transformer. e) Performance evaluation of the experimental models equipped with different positional encoding configurations, including PE, No‐PE, and PPEG. f) Performance evaluation of the experimental models equipped with different loss functions. CE: Classification‐based Cross‐Entropy loss, MSE: Regression‐based Mean‐Square‐Error loss. MSE+CE: Regression‐based MSE loss and CE loss with soft label, MSE+MSE: Regression‐based dual MSE loss strategy. g) Performance evaluation of the experimental models equipped with different feature extraction foundation models, including UNI, Virchow, H‐optimus‐0, Phikon, Prov‐GigaPath, and CTransPath. w/ = With, w/o = Without, ns = Statistically no significant difference, AUC = Area Under the receiver operating Characteristic, PE = Positional coordinate information Embedding, PPEG = Pyramid Position Encoding Generator, CE = Cross‐Entropy, MSE = Mean Square Error.

Heatmap visualization from CPMP revealing the spatial localization of tumors at the whole‐slide level. a,e) Example H&E‐stained WSIs of patients with genomic MP low‐risk (a) and high‐risk (e) in the TJMUCH‐MP cohort. Genomic MP diagnostic risk index and corresponding risk prediction results from CPMP are marked in the box. Scale bar: 5 mm/3 mm. b,f) The overlaid spatial heatmap visualization for the corresponding low‐risk (b) and high‐risk (f) cases at the whole‐slide level. The tile attention scores that contributed to the final patient‐level risk prediction were yielded from the agent‐attention matrix of local tiles using the attention rollout method. Spatial heatmap results were then generated by mapping the attention scores of all local tiles to their corresponding locations within the WSIs, and they were superposed onto the histopathological WSIs for interpretability. Please see the Experimental Section. A colorbar was added on the lower right of the heatmap. Scale bar: 5 mm/3 mm. c,g) Fine‐grained pathological ROIs extracted from the WSIs and the corresponding spatial heatmap visualization results. Scale bar: 500 µm. d,h) Top 10 local tiles with the highest attention score from each slide for both MP low‐risk (d) and high‐risk (h) cases. Scale bar: 50 µm. WSI = Whole Slide Image, ROI = Region of Interest, MP = MammaPrint diagnostic test.

Intercellular interaction patterns within the tumor ecosystem. a,b) Visualization of nuclei segmentation results for MP low‐risk (a), high‐risk (b) ROIs and their corresponding fine‐grained regions. HoverNet was applied for simultaneous nuclei segmentation and classification. Each cell nucleus was classified as one of 5 cell types (neoplastic, inflammatory, necrotic, connective, and non‐neoplastic epithelial cells). The contours of identified nuclei were located and drawn on the fine‐grained pathological ROIs. Scale bar: 500 and 100 µm. c,e) Nuclei segmentation results of the top 10 local tiles with the highest attention scores for MP low‐risk (c) and high‐risk (e). Scale bar: 50 µm. d,f) The cellular composition quantitation of the five cell types in each local tile region. The number of cells for each type is labeled on the right side of the plot, and the TcD values representing the density of tumor cells are marked in the lower part of the plot. g,h) Visualization of cellular interaction topological graphs in the tumor ecosystem. Three main cellular components (neoplastic, inflammatory, and connective stromal cells) within the tumor ecosystem were retained to construct topological graphs among these detected cell nuclei. MP low‐risk (g), high‐risk (h). Scale bar: 500 and 100 µm. i–k) The feature distribution of graph‐edge‐length‐related MeanEdgeLength (the average edge lengths of a tumor cell interacting with tumor cells (i, Graph_T‐T_MeanEdgeLength), stromal cells (j, Graph_T‐S_MeanEdgeLength), and inflammatory cells (k, Graph_T‐I_MeanEdgeLength)) at the single‐cell level for both MP risk groups. The sc‐MTOP framework was leveraged for profiling the graph‐based features of tumor cells interacting with other cells. P‐value was calculated by the nonparametric Mann–Whitney U rank test. WSI = Whole Slide Image, ROI = Region of Interest, MP = MammaPrint diagnostic test, TcD = Tumor cell Density.

Characterization of MP‐specific and colocalized morphological phenotypes and their associated biological features. a–c) UMAP subplots of high‐specific (a), colocalized (b), and low‐specific (c) subclusters of embeddings of local tiles, derived from Leiden clustering of tile‐level representations. Subclusters are labeled with their inferred IDs (SC0‐SC26). The high‐specific subclusters (a) are exclusively associated with the MP high‐risk group, while the low‐specific subclusters (c) are specifically related to the MP low‐risk group. d–f) Depiction of representative H&E‐stained local tiles and corresponding nuclei segmentation results for selected subclusters demonstrating the phenotypic characteristics of MP risk groups. d) Subcluster 23 and 14 (high‐specific), e) Subcluster 18 and 2 (colocalized), and f) Subcluster 22 and 10 (low‐specific). Nuclei are classified as connective (blue), inflammatory (green), necrotic (yellow), neoplastic (red), and non‐neoplastic epithelial cells (orange). Scale bars: 50 µm. g) Heatmap depicting the cellular composition quantitation of the five cell types (connective, inflammatory, necrotic, neoplastic, and non‐neoplastic epithelial cells) across the identified 27 subclusters. Min–max normalization was applied to scale the quantitative features of the cellular composition. Hierarchical clustering was performed on 27 subclusters using the Ward variance minimization algorithm and the Euclidean distance metric. Hierarchical clustering reveals distinct cellular compositions among subclusters, with high‐specific and low‐specific subclusters forming separate clusters. h) KEGG pathway enrichment analysis of genes within the MP signature that showed a significant correlation with CPMP‐predicted risk scores, including nine significantly enriched pathways. i,j) Scatter plots showing the correlation between CPMP‐predicted risk score (x‐axis) and AUCell scores (y‐axis) for two key biological pathways: i) Cell cycle and j) DNA replication. The data points are colored by the corresponding genomic MP risk group (low‐risk in Spring green, high‐risk in Lilac). The Spearman correlation coefficient (R) and p‐value are reported for each pathway. The fitted linear regression line is also shown. A negative correlation is observed, suggesting that CPMP‐predicted “high‐risk” scores are associated with higher activity in these proliferation‐related pathways. UMAP = Uniform Manifold Approximation and Projection, MP = MammaPrint diagnostic test.

Prognostic risk stratification performance of CPMP on the external TCGA‐BRCA data. a,b) Whole‐slide‐level spatial attention heatmap visualization results for tumor morphological patterns associated with prognostic events in the TCGA‐BRCA dataset. The H&E‐stained diagnostic WSIs of patients with (b, Scale bar: 5 mm) or without (a, Scale bar: 4 mm) distant metastasis event, the whole‐slide‐level overlaid spatial heatmap visualization derived from CPMP, and top 10 local tiles (Scale bar: 50 µm) with the highest attention score revealing the tumor morphological patterns of the slide are all shown. c–f) Kaplan–Meier curves for disease‐free Interval (c), Progression‐free Interval (d), Overall Survival (e), and Distant metastasis‐free Survival (f) of patient groups with high‐risk and low‐risk in the external TCGA‐BRCA dataset. Risk groups were directly predicted by CPMP trained in the TJMUCH‐MP cohort. HR and 95% CI were calculated by the Cox proportional hazard model, while the p‐value was calculated using the two‐sided log‐rank test. g) Forest plot of multivariate Cox regression analysis including the covariates age, AJCC pathologic tumor stage, PR status, ER status, HER2 status, LNp, and CPMP risk group, and their association with distant metastasis‐free survival. HR and 95% CI were calculated by the Cox proportional hazard model, and the p‐value was calculated using the two‐sided log‐rank test. (* indicates statistical significance, * p < 0.05, p < 0.01). WSI = Whole Slide Image, HR = Hazard Ratio, DMFS = Distant Metastasis‐Free Survival, CI = Confidence Intervals, AJCC = The American Joint Committee on Cancer, PR = Progesterone Receptor, ER = Estrogen Receptor, HER2 = Human Epidermal growth factor Receptor 2, LNp = The number of Lymph Nodes positive, HR+ = Hormone Receptor positive, LN0 = Lymph Nodes negative, LN1 = The number of Lymph Nodes positive≤ 3.

Funding2

—National Key Research and Development Program of China10.13039/501100012166
—National Natural Science Foundation of China10.13039/501100001809

Keywords

breast cancercomputational pathologyrecurrence risk assessmenttumor spatial morphologyweakly supervised learning

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAI in cancer detection · Breast Cancer Treatment Studies · Single-cell and spatial transcriptomics

Full text

Introduction

1

Breast cancer (BC) is the most common cancer and the leading cause of cancer deaths among women worldwide.^[^ 1 ^]^ Recurrence and distant metastasis are significant factors that contribute to poor prognostic outcomes and are the primary causes of death among BC patients.^[^ 2, 3 ^]^ Recurrence risk stratification is essential for informing therapeutic regimens in the clinical management of BC. Currently, recurrence and metastasis prediction remains a challenging task, typically assessed through clinical characteristics, the Nottingham Grading system, or the tumor‐node‐metastasis staging system.^[^ 4 ^]^ However, it can be challenging to determine the optimal treatment for individuals using these schemes, which may lead to overtreatment in some cases.

The 70‐gene MammaPrint (MP) test, which has drawn attention among clinicians, is specifically designed to evaluate recurrence risk in early‐stage Hormone Receptor positive (HR+), Human Epidermal Growth Factor Receptor 2 negative (HER2‐) invasive BC patients using formalin‐fixed paraffin‐embedded (FFPE) or fresh frozen tissue.^[^ 5, 6, 7 ^]^ This test estimates the 10‐year risk of distant metastasis and classifies patients into low or high‐risk groups for recurrence. The MP test effectively identifies high‐risk patients who may benefit from adjuvant chemotherapy and low‐risk patients who, despite having clinical high‐risk features, may safely be exempt from chemotherapy.^[^ 7, 8 ^]^ Although the genomic MP test offers significant clinical value, it falls short in revealing the spatial characteristics of tumors for recurrence risk diagnosis. Additionally, the MP test is time‐consuming, labor‐intensive, and requires substantial resources. Its high cost presents a financial barrier to many patients.^[^ 9 ^]^ This situation ultimately hinders patients from receiving precise treatments based on a risk assessment of their condition. Therefore, there is a need to develop an accurate and cost‐efficient method for patient recurrence risk stratification.

Histopathology provides extensive insights into the spatial distribution of tissue, cellular interactions, and tumor morphology,^[^ 10 ^]^ establishing itself as the “gold standard” for cancer diagnosis. The increasing accessibility of digital histopathological whole‐slide images (WSIs), coupled with advancements in artificial intelligence (AI) technology, has propelled the research of computational pathology (CPATH) forward^[^ 11, 12, 13 ^]^ and made remarkable strides in molecular biomarker discovery^[^ 14, 15, 16, 17, 18, 19, 20 ^]^‐ Recent studies^[^ 21, 22, 23, 24, 25 ^]^ have demonstrated the potential of deep learning to extract prognostic information from BC histopathology. Wahab et al. developed an AI‐based BRACE marker from stromal and tumor heterogeneity to predict survival in early‐stage luminal BC patients using routine WSIs.^[^ 26 ^]^ Chen et al. created a computational pathology framework that enhances risk stratification for early‐stage BC patients within the risk categories defined by the Oncotype DX (ODX) assay.^[^ 27 ^]^ Furthermore, Howard et al. integrated clinicopathological variables with image‐based features to predict recurrence risk scores based on ODX assay results.^[^ 28 ^]^ These studies underscore the value of identifying morphological biomarkers from pathological WSIs to complement genomic assays in enhancing prognostic stratification. However, the application of deep learning to predict recurrence risk based on paired MP genomic data and histopathological images remains unexplored. There is a critical gap in research that leverages the MP test's genomic foundation to characterize histological patterns relevant to distant metastasis risk stratification and patient prognosis.

To bridge this gap, we construct the clinical MP cohort of HR+/HER2‐ invasive BC patients who underwent standardized MP genomic testing at Tianjin Medical University Cancer Institute and Hospital (TJMUCH), National Clinical Research Center for Cancer, China. We present an AI‐driven computational pathology framework (CPMP) for MP‐informed recurrence risk assessment using histopathological WSIs from early‐stage BC patients. We develop a spatially‐aware, weakly supervised learning model with a regression‐based design and an agent‐attention transformer paradigm to predict continuous recurrence risk scores from annotation‐free digital WSIs. We further generate spatial attention heatmap visualization at the whole‐slide level to reveal intratumoral spatial localization, as well as the cellular interaction patterns associated with genomic MP within the tumor ecosystem. We characterize the phenotypic diversity in tumors and uncover high‐specific, low‐specific, and colocalized morphological phenotypes among MP risk groups, each differing in their quantitative cellular compositions. Moreover, we expand our model to conduct the generalization analysis in the external cohorts and demonstrate its predictive capability for risk stratification across multiple prognostic indicators. The CPMP framework is capable of predicting MP recurrence risk, facilitating the discovery of prognostic knowledge, and has the potential to serve as a flexible complement to the standard clinical diagnostic workflow for early‐stage BC patients. Here, “flexible” refers to cost‐efficient, rapid recurrence risk assessment directly from Hematoxylin and Eosin (H&E)‐stained slides, while also providing spatial and morphological information that enhances the genomic MP test.

Results

2

Overview of the CPMP Model and Study Design

2.1

We establish an early‐stage HR+/HER2‐ invasive BC cohort, TJMUCH‐MP, comprising 477 female patients (Figures S1,S2a, Supporting Information). The patient cohort is composed of clinicopathologic characteristics, digital WSIs, and genomic MP diagnostic results (Figure S1b–d, Supporting Information). The clinicopathological characteristics of patients in the TJMUCH‐MP cohort are presented in Table S1 (Supporting Information). MP diagnostic tests have been performed on tumor specimens of patients who met the inclusion criteria with invasive BC, HR+/HER2‐, staged T1‐2, lymph node‐negative or 1–3 metastases from the StGallen recommendations.^[^ 29 ^]^ Patients are categorized into high‐risk and low‐risk based on the risk values of the genomic MP test (Figures S1c,S2b, Supporting Information).

We present CPMP, a weakly supervised multiple instance learning (MIL) model for MP‐informed recurrence risk assessment of BC patients based on annotation‐free histopathological WSIs (Figure 1a). Our proposed CPMP framework first preprocesses WSIs for the tissue region identification and tessellates local tiles with a size of 256 × 256 within the foreground regions at 20× objective magnification (Figure S2c, Supporting Information). The median number of tiles across WSIs in the TJMUCH‐MP cohort is 8229, and the distribution of the number of tiles in WSIs has been reported (Figure S2d, Supporting Information). Then, the CPMP framework captures diverse patterns and generates a low‐dimensional feature representation of local tile instances using self‐supervised foundation models (Figure S2e, Supporting Information). Given that the spatial relationships among tissue regions suggest cellular interactions that potentially reveal biological behaviors and prognostic significance, the Positional coordinate information Embedding (PE) of each tile is encoded and integrated with low‐dimensional tissue embeddings to enhance spatial correlation among local regions. CPMP introduces the transformer architecture (Figure 1b) for feature aggregation and employs the agent‐attention mechanism^[^ 30 ^]^ (Figure 1c) to achieve a balance between global representation capability and computational efficiency. Additionally, CPMP utilizes a regression‐based multi‐layer perceptron (MLP) predictor to generate a continuous numerical value representing genomic MP risk scores.

Overview of CPMP and study design. a) Depiction of the proposed weakly‐supervised transformer architecture CPMP based on a multi‐instance learning paradigm. It first performs linear projection on the feature embeddings of local tiles. Then it employs the Agent Transformer architecture for feature aggregation to obtain a slide‐level global feature representation, followed by a regression‐based MLP predictor head that predicts the slide‐level results for the case. b) The Agent Transformer block adopted in the CPMP model. We introduced a 2D absolute coordinate encoding method to incorporate the spatial positional information and added the resulting positional embeddings to the projected tile embeddings. Then we initialized a class token (cls_token) and concatenated it with the spatial‐aware tile embeddings. This agent transformer block consists of 1 layer with four heads for global information aggregation. c) The agent‐attention mechanism utilized in the Agent Transformer block. The agent‐attention paradigm consists of a stacked state of two Softmax attention operations, comprising the first phase of first‐aggregation (colored in light blue) and the second phase of second‐broadcast (colored in light green). d–g) Flowchart of the study design: d) MP‐informed recurrence risk prediction, e) Spatial‐morphological attention heatmap visualization, f) Tumor‐associated phenotypical diversity discovery, and g) Distant metastasis prognostic risk assessment. These also summarize the clinical value of CPMP. MLP = Multi‐Layer Perceptron, MP = MammaPrint.

We developed and trained the CPMP model to predict MP‐informed patient‐level recurrence risk (Figure 1d) using the clinical TJMUCH‐MP cohort. In addition, we leverage CPMP to generate whole‐slide‐level spatial attention heatmaps using histopathological slides (Figure 1e), which provide key morphological patterns and spatial localization associated with MP recurrence risk. We investigate tumor morphological patterns by clustering the learned tile embeddings from CPMP to characterize tumor diversity and identify specific, colocalized morphological phenotypes across MP risk groups (Figure 1f). Moreover, we generalize our trained CPMP model to the external cohorts for recurrence risk stratification and prognostic analysis, aiming to validate the potential capabilities of CPMP in assessing distant metastasis risk (Figure 1g). More details are provided in the Experimental Section.

CPMP Outperforms Other Methods for MP Recurrence Risk Prediction

2.2

The model's ability to predict the recurrence risk from histopathological WSIs was measured on both continuous risk scores and discrete risk groups using the TJMUCH‐MP cohort. To mitigate data bias and underlying scanning effects (Figure S2f, Supporting Information), we employed a patient‐level 5‐Time fivefold cross‐validation strategy (Figure S3a, Supporting Information) for model development and evaluation. Our model achieved an average Spearman correlation coefficient of 0.648 [p < 0.0001, range: 0.606–0.684], indicating a favorable consistency between genomic MP risk scores and recurrence risk prediction probabilities (Figure 2a; Figure S3d–i, Supporting Information). In classifying the MP recurrence risk groups, CPMP attained an average area under the receiver operating characteristic curve (AUROC) of 0.824 ± 0.03 [range: 0.789–0.853] (Figure 2b), with a balanced accuracy of 0.772 ± 0.03 [range: 0.750–0.773]. CPMP provides low‐as‐positive and high‐as‐positive modes, which means that we aim to focus on low‐risk patients to avoid overtreatment, or, conversely, to monitor high‐risk patients who may be at risk of recurrence. CPMP reached average areas under the precision‐recall curves (AUPRCs) of 0.873 ± 0.02 [range: 0.835–0.900] on the Low‐as‐positive mode (Figure S3b, Supporting Information) and 0.774 ± 0.01 [range: 0.751–0.795] on the High‐as‐positive mode (Figure S3c, Supporting Information).

Performance evaluation and comparison of CPMP with state‐of‐the‐art methods for MP recurrence risk prediction. a) Scatter plot showing the correlation between the continuous genomic MP risk score and the continuous CPMP‐predicted risk probability for each patient in the test set, with a Spearman correlation coefficient (R) of 0.648 (p‐value < 0.0001). The fitted linear regression line is also shown. b) ROC curve of the CPMP model for MP recurrence risk group prediction across 5‐Time experiments (time0 to time4). c) Radar chart comparing the overall performance of CPMP with other comparative models (CEMIL, CLAM, TransMIL, Wagner et al., ABMIL, DTFDMIL, SAMMIL) across multiple metrics, including AUROC, balanced accuracy (bACC), average F1 score, average recall (avg REC), and average precision‐recall curve (avg PRC). d–f) Comprehensive comparison of CPMP with seven state‐of‐the‐art weakly supervised MIL methods using multiple metrics: d) ROC curves, e) Precision‐Recall (PR) curves at the low‐as‐positive mode, and f) PR curves at the high‐as‐positive mode. Low‐as‐positive is the condition that we aim to focus on low‐risk patients to avoid overtreatment. High‐as‐positive is to observe the high‐risk patients who may be at risk of recurrence. g) Evaluation of the efficiency‐AUC performance trade‐off of CPMP and comparative models in terms of AUC versus computational operations (G/MACs) and AUC versus inference runtime (ms). The data points are colored by the corresponding method. The results were obtained on a workstation equipped with two RTX 4090 GPUs. h) Evaluation of the number of parameters (K) of trained models for CPMP and comparative models. ROC = Receiver Operating Characteristic, PR = Precision‐Recall, AUC = Area Under the Receiver Operating Characteristic.

The performance of our spatially aware, agent‐attention transformer model was further compared with other weakly supervised MIL methods, including CEMIL,^[^ 31 ^]^ SAMMIL,^[^ 32 ^]^ DTFDMIL,^[^ 33 ^]^ ABMIL,^[^ 34 ^]^ CLAM,^[^ 35 ^]^ Wagner et al.,^[^ 36 ^]^ and TransMIL.^[^ 37 ^]^ Details of all comparative models are provided in the Experimental Section. CPMP outperformed all comparative approaches in MP recurrence risk prediction in terms of systematic metrics, including AUROC, balanced accuracy, weighted average recall, precision, F1 scores, ROC curves, and precision‐recall curves (Figure 2c–f, Table S2, Supporting Information). The average AUROC value of the CPMP model is 0.824, while those of other methods, namely CEMIL, SAMMIL, DTFDMIL, CLAM, TransMIL, ABMIL, and Wagner et al., are 0.741 ± 0.04, 0.770 ± 0.04, 0.819 ± 0.03, 0.798 ± 0.04, 0.755 ± 0.06, 0.810 ± 0.03, and 0.806 ± 0.03, respectively. It is noteworthy that the Transformer‐based methods (Wagner et al. and TransMIL) obtained acceptable performance but fell short in comparison with our transformer‐based CPMP. We visualized the predicted risk probabilities from comparative methods alongside genomic MP risk scores of patients (Figure S4a–h, Supporting Information). The results indicate that these classification‐based methods tend to predict risk probabilities that are skewed toward both ends. Meanwhile, CPMP achieved performance improvements ranging from 4.8% (compared to DTFDMIL) to 33.6% (compared to CEMIL) in the Spearman correlation coefficient over the comparative methods (Figure S4a,b,d, Supporting Information). This is attributed to the regression‐based design in our method. The computational operations and runtime costs for inference were conducted on a workstation with RTX 4090 GPUs for comparison. We can see that CPMP reduces inference operations (G/MACs) by 7.50× times and increases inference time by 5.75× times, while improving AUC performance by 2.2% (p < 0.05, statistically significant difference), compared with the Wagner et al. transformer (Figure 2g). Also, CPMP requires fewer parameters than other methods (Figure 2h). CPMP attains the most superior efficiency‐AUC performance during inference, striking a balance between slide representation capability and computational complexity.

CPMP Highlights its Effectiveness in Leveraging Annotation‐Free WSIs

2.3

Blurry areas, pen markings, and non‐tissue regions are inevitably present in annotation‐free histopathological WSIs from real‐world clinical data, introducing noise into the training data. Thus, we investigated CPMP's sensitivity to noise by removing noisy tiles. Specifically, we performed zero‐shot classification to determine tissue types (tumor, adipose, stroma, immune infiltrates, gland, necrosis or hemorrhage, background or black, and non) across all tiles through the visual‐language foundation model PLIP^[^ 38 ^]^ (Figure S5a,b, Supporting Information). Local tile images along with their corresponding zero‐shot classification types were visualized (Figure S5c, Supporting Information). Tiles classified as “background or black” and “non” types are typically blurry, pen‐marked, or lacking morphological detail. In contrast, tiles identified as the six specific tissue types (tumor, adipose, stroma, immune infiltrates, gland, and necrosis or hemorrhage) contain rich morphological information and are thus utilized for training the other model. Evaluation results for both models indicate that the model with noisy tile filtering achieved comparable performance to the CPMP model without such filtering (Figure 3a, Table S3, Supporting Information), showing no statistically significant difference between the two. This reflects the robustness of our weakly supervised CPMP approach in handling local noisy information within annotation‐free slides, making it more suitable for clinical scenarios.

Effectiveness overview of CPMP in leveraging annotation‐free WSIs. a) Comparison of the performance of CPMP and the model with noisy tissue‐type filtering in terms of 5 evaluation metrics. The Wilcoxon signed‐rank test analysis was performed. b) AUROC values of models trained on different sets with a specific number of tiles randomly sampled from each slide for evaluation. The AUROC values were averaged across 5‐time experiments. c) Comparison of the AUROC performance of CPMP and the clinical‐based logistic regression model trained on clinicopathological features only. The Wilcoxon signed‐rank test analysis was performed. d) AUROC performance trend evaluation of the agent‐transformer models with a varying number of agent tokens. The light blue shading represents the standard deviation interval of performance across 5‐Time fivefold experiments, while the dashes are the AUROC performance lines of CPMP and the self‐attention transformer. e) Performance evaluation of the experimental models equipped with different positional encoding configurations, including PE, No‐PE, and PPEG. f) Performance evaluation of the experimental models equipped with different loss functions. CE: Classification‐based Cross‐Entropy loss, MSE: Regression‐based Mean‐Square‐Error loss. MSE+CE: Regression‐based MSE loss and CE loss with soft label, MSE+MSE: Regression‐based dual MSE loss strategy. g) Performance evaluation of the experimental models equipped with different feature extraction foundation models, including UNI, Virchow, H‐optimus‐0, Phikon, Prov‐GigaPath, and CTransPath. w/ = With, w/o = Without, ns = Statistically no significant difference, AUC = Area Under the receiver operating Characteristic, PE = Positional coordinate information Embedding, PPEG = Pyramid Position Encoding Generator, CE = Cross‐Entropy, MSE = Mean Square Error.

We next evaluated the scale flexibility of CPMP in the number of local tiles in WSIs. We conducted random sampling from the set of local tiles in each WSI, employing sample sizes of 1000, 5000, and 10 000 tiles as subsets for model training. The trained models on various subsets of tiles exhibit consistent performance trends on various numbers of tiles, with differences in AUROC remaining within 0.01 compared to the model using all tiles (p‐value > 0.05, indicating no statistically significant differences, Figure 3b, Table S4, Supporting Information). Notably, these models demonstrated robust performance with AUROC values exceeding 0.810 when making predictions based on as few as 100 tiles up to the full set of tiles. This suggests that CPMP is both scalable and efficient in recurrence risk inference using histopathological WSIs.

We further compared CPMP with a clinical‐based logistic regression model that was trained on clinicopathological features for MP recurrence risk prediction for patients. The detailed settings for this clinical‐based model are listed in the Experimental Section and Table S5 (Supporting Information). Clinical‐based model obtained an acceptable AUC value of 0.800 (Figure 3c, Table S6, Supporting Information), but it was relatively inferior to CPMP, especially given the fact that the clinical‐based logistic regression model requires molecular information such as hormone receptor status and Ki‐67 percentages, whereas CPMP relies solely on routine H&E‐stained histopathological slides.

We conducted ablation studies on CPMP with respect to attention mechanisms, spatial positional encoding schemes, predictor head strategies, and tile‐level feature representation models. The design outlines and the experimental configurations utilized for the ablation models are illustrated in Figure S6 (Supporting Information). Details about these modules are provided in the Experimental Section. The quantitative results reported that our CPMP method with the agent‐attention mechanism brought a 1.8% improvement in AUROC over the self‐attention transformer model, and it was insensitive to the number of agent tokens (Figure 3d, Table S7, Supporting Information). Our PE‐based model outperformed the PPEG‐based^[^ 37 ^]^ and No‐PE models in AUROC, weighted average recall, and F1 score (Figure 3e, Table S8, Supporting Information). Additionally, a similar improvement can be observed in our regression‐based mean square error supervision scheme when compared to the cross‐entropy approach (Figure 3f, Table S9, Supporting Information). In the ablation study of tile‐level feature representation, the Prov‐GigaPath^[^ 39 ^]^ and Virchow^[^ 40 ^]^ models excelled in UNI^[^ 41 ^]^ in terms of AUROC (0.827 vs 0.824), while H‐optimus‐0^[^ 42 ^]^ achieved the highest values across other metrics (Figure 3g, Table S10, Supporting Information). Additionally, the performance of CTransPath^[^ 43 ^]^ and Phikon^[^ 44 ^]^ was inferior to that of other models in representing tile embeddings for MP recurrence risk prediction in the TJMUCH‐MP cohort. Given that the embedding dimensions of tile representations of Prov‐GigaPath, Virchow, and H‐optimus‐0 are larger than those of UNI, and considering the balance with computational efficiency, we selected the UNI‐based model for the subsequent analyses reported in this study.

CPMP Reveals the Spatial Localization of Tumors at the Whole‐Slide Level

2.4

Our proposed weakly supervised CPMP model not only predicts patient recurrence risk using spatial histopathological WSIs but also infers tile‐level attention scores. These scores can offer insights into spatial factors that influence patient outcomes from the perspective of the whole‐slide level. We performed whole‐slide‐level attention heatmap inference on test cases. Spatial heatmap results were generated by mapping the attention scores of tiles to their corresponding locations within the slide. This allows us to visualize the contributions of different regions and display key morphological patterns. Details are provided in the Experimental Section. The representative histopathological WSIs selected from genomic MP low‐risk and high‐risk groups were presented, along with their spatial visualization results at both the whole‐slide and region‐of‐interest (ROI) levels (Figure 4a–c,e–g; Figure S7a–c,e–g, Supporting Information). Genomic MP diagnostic results and recurrence risk values predicted by the CPMP model were also marked in the box (Figure 4a,e; Figure S7a,e, Supporting Information). One can observe that CPMP exhibits varying levels of attention confidence in tissue regions, with high confidence indicated in red and low confidence in blue–purple, at the whole‐slide level for both low‐risk and high‐risk cases (Figure 4b,f; Figure S7b,f, Supporting Information). The pathologists confirmed that the areas identified as high confidence closely correspond to regions where tumor cells proliferate, based on their spatial locations. It can also be observed from the magnified ROIs that the high‐attention heatmap focuses on the tumor regions and effectively outlines the boundary between tumor and normal stroma tissue (Figure 4c,g; Figure S7c,g, Supporting Information). Hot regions, as rendered by the spatial heatmap visualization, are highly consistent with the tumor regions pre‐marked by experts (Figure S7e,f, Supporting Information). CPMP further identifies the top 10 local tiles with the highest attention scores from each slide (Figure 4d,h; Figure S7d,h, Supporting Information). These selected tiles display clear patterns of tumor morphology applicable to both low‐risk and high‐risk cases. To quantitatively validate the interpretability of CPMP, breast pathologists manually annotated tumor regions on WSIs, allowing us to quantify the concordance between CPMP‐generated high‐attention regions (threshold = 0.5) and pathologist‐labeled ground‐truth tumor regions (Figures S8,S9, Supporting Information). Multiple evaluation metrics confirmed the agreement: mean AUC = 0.946, recall ratio = 0.630 ± 0.171, and Dice coefficient = 0.622 ± 0.126 (Table S11, Supporting Information). These results demonstrate a desired alignment between model attention and expert annotations. A mean overlap ratio of 0.704 further indicates that the model's attention is mainly focused on regions marked by pathologists as tumor tissue, providing quantitative support for the reliability of CPMP attention maps. The spatial heatmap results indicate that our model can comprehensively pinpoint tumor growth locations and capture the spatial localization of tumor ROIs at the whole‐slide level. This capability of tumor spatial localization offers a visually explainable approach to recurrence risk prediction that complements the genomic MP test with valuable spatial insights.

Heatmap visualization from CPMP revealing the spatial localization of tumors at the whole‐slide level. a,e) Example H&E‐stained WSIs of patients with genomic MP low‐risk (a) and high‐risk (e) in the TJMUCH‐MP cohort. Genomic MP diagnostic risk index and corresponding risk prediction results from CPMP are marked in the box. Scale bar: 5 mm/3 mm. b,f) The overlaid spatial heatmap visualization for the corresponding low‐risk (b) and high‐risk (f) cases at the whole‐slide level. The tile attention scores that contributed to the final patient‐level risk prediction were yielded from the agent‐attention matrix of local tiles using the attention rollout method. Spatial heatmap results were then generated by mapping the attention scores of all local tiles to their corresponding locations within the WSIs, and they were superposed onto the histopathological WSIs for interpretability. Please see the Experimental Section. A colorbar was added on the lower right of the heatmap. Scale bar: 5 mm/3 mm. c,g) Fine‐grained pathological ROIs extracted from the WSIs and the corresponding spatial heatmap visualization results. Scale bar: 500 µm. d,h) Top 10 local tiles with the highest attention score from each slide for both MP low‐risk (d) and high‐risk (h) cases. Scale bar: 50 µm. WSI = Whole Slide Image, ROI = Region of Interest, MP = MammaPrint diagnostic test.

CPMP Deciphers Cellular Interaction Patterns within the Tumor Ecosystem

2.5

To gain deeper insights into the intercellular spatial relationships within the tumor ecosystem across MP recurrence risk groups, high‐attention ROIs localized by CPMP were extracted for simultaneous nuclear instance segmentation and classification (Figure S10a, Supporting Information). HoverNet^[^ 45 ^]^ was utilized for the identification of five types of cell nuclei: neoplastic (tumor) cells, inflammatory cells, necrotic (dead) cells, connective (stroma) cells, and non‐neoplastic epithelial cells. The contours of the identified nuclei were delineated on the fine‐grained histopathological ROIs. From the nuclei segmentation visualization results for both low‐risk and high‐risk ROIs (Figure 5a,b), we can observe that the identified cell nuclei are predominantly tumor cells (outlined in red), while stromal cells (indicated in blue) are sparsely distributed around the tumor tissue. The results of nuclei identification and visualization of the top 10 local tiles further support this finding (Figure 5c,e; Figure S10c–h, Supporting Information). It is noteworthy that high‐risk cases exhibit a more dispersed spatial distribution among tumor cells compared to low‐risk cases. We thus quantified the cellular composition of the five cell types in each local tile and defined the density of tumor cells (TcD) as the number of identified neoplastic cells per square millimeter (Figure 5d,f). We can see that there is a clear difference in tumor cell density between the two MP risk groups. The number of neoplastic cells in low‐risk tiles is generally larger than that in high‐risk tiles (median: 103 vs 53, low‐risk vs high‐risk). The measured TcD values for low‐risk tiles are consistently greater than 4000 tumor cells per mm^2^ (median: 5698, [range: 4149–7081]), while those for high‐risk tiles are all below 4000 neoplastic cells per mm^2^ (median: 2932, [range: 1604–3983]).

Intercellular interaction patterns within the tumor ecosystem. a,b) Visualization of nuclei segmentation results for MP low‐risk (a), high‐risk (b) ROIs and their corresponding fine‐grained regions. HoverNet was applied for simultaneous nuclei segmentation and classification. Each cell nucleus was classified as one of 5 cell types (neoplastic, inflammatory, necrotic, connective, and non‐neoplastic epithelial cells). The contours of identified nuclei were located and drawn on the fine‐grained pathological ROIs. Scale bar: 500 and 100 µm. c,e) Nuclei segmentation results of the top 10 local tiles with the highest attention scores for MP low‐risk (c) and high‐risk (e). Scale bar: 50 µm. d,f) The cellular composition quantitation of the five cell types in each local tile region. The number of cells for each type is labeled on the right side of the plot, and the TcD values representing the density of tumor cells are marked in the lower part of the plot. g,h) Visualization of cellular interaction topological graphs in the tumor ecosystem. Three main cellular components (neoplastic, inflammatory, and connective stromal cells) within the tumor ecosystem were retained to construct topological graphs among these detected cell nuclei. MP low‐risk (g), high‐risk (h). Scale bar: 500 and 100 µm. i–k) The feature distribution of graph‐edge‐length‐related MeanEdgeLength (the average edge lengths of a tumor cell interacting with tumor cells (i, Graph_T‐T_MeanEdgeLength), stromal cells (j, Graph_T‐S_MeanEdgeLength), and inflammatory cells (k, Graph_T‐I_MeanEdgeLength)) at the single‐cell level for both MP risk groups. The sc‐MTOP framework was leveraged for profiling the graph‐based features of tumor cells interacting with other cells. P‐value was calculated by the nonparametric Mann–Whitney U rank test. WSI = Whole Slide Image, ROI = Region of Interest, MP = MammaPrint diagnostic test, TcD = Tumor cell Density.

Further, neoplastic cells, connective cells, and inflammatory cells (the main cellular components in the tumor ecosystem^[^ 46 ^]^) were employed to construct topological graphs among these cell nuclei to facilitate our understanding of intercellular interactions within the tumor microenvironment (Figure 5g,h). The cellular topological graphs were visually characterized by interactions between tumor cells and stromal cells, with minimal evidence of lymphocyte presence and interaction with other cell types within both low‐risk and high‐risk OIs. To quantitatively characterize the spatial intercellular interaction patterns within the tumor ecosystem, the sc‐MTOP^[^ 47 ^]^ framework was utilized to extract the topological graph‐based features at the single‐cell level. We only focus on profiling the topological features of tumor cells that interact with other cells, specifically examining the connectivity graphs of tumor–tumor, tumor–stroma, and tumor–inflammatory interactions (Figure S10b, Supporting Information). The distribution of the MeanEdgeLength features (the average edge lengths of a cell interacting with other cells) related to graph‐edge lengths of tumor cells was analyzed and depicted in Figure 5i–k. Among low‐risk patients, the graph‐edge length distribution of tumor–tumor interaction graphs exhibits a higher peak, suggesting a denser and more abundant distribution of tumor cells. In contrast, high‐risk patients display a sparse distribution of tumor cells (Figure 5i and p < 0.0001, statistically significant difference). The edge length distribution for tumor–stroma and tumor–inflammatory interaction graphs remains generally consistent across both patient groups (Figure 5j and p = 0.2467, and Figure 5k and p = 0.2729, no statistically significant difference). These results provide evidence that CPMP harnesses whole‐slide‐level heatmap visualization for deciphering the patterns of intercellular interactions within the tumor ecosystem, which are associated with genomic MP recurrence risk groups.

CPMP Characterizes Morphological Phenotypes Associated with MP Risk Groups

2.6

Breast cancer ecosystems often exhibit both intra‐tumoral and inter‐tumoral heterogeneity, which influences tumor progression, metastasis formation, and patient prognosis.^[^ 48 ^]^ To this end, we aimed to characterize the tumor morphological patterns throughout the entire test set and explore the specific and common phenotypes associated with MP risk groups using CPMP. Specifically, the top 100 tiles with the highest attention scores in each slide were identified to create a collection of high‐confidence local regions. Each tile in this collection was also fed into CPMP for inference. It can be observed from the uniform manifold approximation and projection (UMAP) plot that tile instances display a clear distinction among different levels of the genomic MP risk (Figure S11a, Supporting Information). The recurrence risk prediction at both the WSI‐level and the single‐tile‐level exhibits similar distribution characteristics in the UMAP (Figure S11b,c, Supporting Information), in keeping with the observations from Figure S11a (Supporting Information). The model's capability of predicting recurrence risk at the single‐tile level is more pronounced, as it demonstrates a clear trajectory direction of recurrence risk development (indicated by the arrow in Figure S11c, Supporting Information). These results suggest that the high‐confidence local tiles identified by the CPMP model provide a sufficient representation of WSIs.

We further performed Leiden clustering on tile embeddings to characterize phenotypic diversity of tumor morphology and attached the Low‐risk or high‐risk group information for each subcluster based on the genomic MP risk group of the WSI from which the local tile originated. High‐attention tumor tiles were classified into 27 subclusters (named SC0‐SC26) with different morphological embeddings (Figure S11d, Supporting Information). Also, some subclusters are only present in a specific risk group, and others appear in both risk groups (Figure 6a–c; Figure S12a,b, Supporting Information). Our model identifies four subclusters (SC14, SC21, SC23, and SC24) that are exclusively associated with the MP high‐risk group (Figure 6a) and 10 subclusters (SC1, SC5, SC6, SC7, SC9, SC10, SC20, SC22, SC25, and SC26) specifically related to the MP low‐risk group (Figure 6c). The remaining 13 subclusters colocalize in both MP risk groups (Figure 6b), reflecting the consistent tumor morphological characteristics of early‐stage BC patients to whom the genomic MP test is applicable.

Characterization of MP‐specific and colocalized morphological phenotypes and their associated biological features. a–c) UMAP subplots of high‐specific (a), colocalized (b), and low‐specific (c) subclusters of embeddings of local tiles, derived from Leiden clustering of tile‐level representations. Subclusters are labeled with their inferred IDs (SC0‐SC26). The high‐specific subclusters (a) are exclusively associated with the MP high‐risk group, while the low‐specific subclusters (c) are specifically related to the MP low‐risk group. d–f) Depiction of representative H&E‐stained local tiles and corresponding nuclei segmentation results for selected subclusters demonstrating the phenotypic characteristics of MP risk groups. d) Subcluster 23 and 14 (high‐specific), e) Subcluster 18 and 2 (colocalized), and f) Subcluster 22 and 10 (low‐specific). Nuclei are classified as connective (blue), inflammatory (green), necrotic (yellow), neoplastic (red), and non‐neoplastic epithelial cells (orange). Scale bars: 50 µm. g) Heatmap depicting the cellular composition quantitation of the five cell types (connective, inflammatory, necrotic, neoplastic, and non‐neoplastic epithelial cells) across the identified 27 subclusters. Min–max normalization was applied to scale the quantitative features of the cellular composition. Hierarchical clustering was performed on 27 subclusters using the Ward variance minimization algorithm and the Euclidean distance metric. Hierarchical clustering reveals distinct cellular compositions among subclusters, with high‐specific and low‐specific subclusters forming separate clusters. h) KEGG pathway enrichment analysis of genes within the MP signature that showed a significant correlation with CPMP‐predicted risk scores, including nine significantly enriched pathways. i,j) Scatter plots showing the correlation between CPMP‐predicted risk score (x‐axis) and AUCell scores (y‐axis) for two key biological pathways: i) Cell cycle and j) DNA replication. The data points are colored by the corresponding genomic MP risk group (low‐risk in Spring green, high‐risk in Lilac). The Spearman correlation coefficient (R) and p‐value are reported for each pathway. The fitted linear regression line is also shown. A negative correlation is observed, suggesting that CPMP‐predicted “high‐risk” scores are associated with higher activity in these proliferation‐related pathways. UMAP = Uniform Manifold Approximation and Projection, MP = MammaPrint diagnostic test.

Representative local tiles from each subcluster, along with the corresponding nuclear identification results, were depicted to highlight the morphological and phenotypic characteristics linked to genomic MP risk groups (Figure 6d–f; Figures S12c,d,S13a, Supporting Information). The MP high‐specific subcluster phenotypes (Figure 6d; Figure S12c, Supporting Information) exhibit loosely distributed tumor cells interconnected with stromal cells, while the MP low‐specific subcluster phenotypes (Figure 6f; Figure S12d, Supporting Information) present a high percentage of aggregated tumor cells. These visually observed morphological patterns are also reflected in the colocalized subcluster phenotypes (Figure 6e; Figure S13a, Supporting Information). These results reinforce that the morphological differences among subcluster phenotypes mainly depend on the patterns of intercellular interactions and the aggregation abundance of tumor cells. We next quantified the cellular composition of the five cell types for each subcluster and performed hierarchical clustering on the 27 subclusters (Figure 6g; Figure S13b, Supporting Information). Four MP high‐specific subclusters are grouped together, while the MP low‐specific subclusters are clustered at the opposite end, forming four subgroups (SC26, SC25, and SC5; SC22, SC9, and SC20; SC6; and SC10, SC1, and SC7). The distinct cellular compositions between the high‐specific and low‐specific subcluster phenotypes underline the capacity of CPMP to disclose the morphological patterns related to molecular risk groups.

To further explore the potential link between morphological features in the tumor ecosystem and underlying molecular biology mechanisms, we performed a KEGG pathway enrichment analysis on the subset of MP genes that showed a correlation with the CPMP‐predicted risk score. Our analysis revealed that the most significantly enriched pathways (p‐value < 0.05) include DNA replication, cell cycle, purine metabolism, cellular senescence, mismatch repair, nucleotide excision repair, ether lipid metabolism, pyrimidine metabolism, and p53 signaling pathway (Figure 6h). These are consistent with the known biological functions^[^ 49 ^]^ (growth, proliferation, transformation, and cell death) of the 70‐gene MP profile. We then analyzed the correlation between CPMP‐predicted risk scores and pathway activity scores for each significantly enriched pathway (Figure 6i,j; Figure S14, Supporting Information). Significant negative correlations were observed for Cell cycle (R = −0.248, p‐value < 0.0001) and DNA replication (R = −0.188, p‐value = 0.0009), indicating that high‐risk patients are characterized by heightened cellular proliferation at the molecular level. This micro‐genomic proliferative activity presents co‐occurrence with loosely distributed tumor macro‐morphology at the whole‐slide level among high‐risk patients in our cohort.

CPMP Serves as a Potential Tool for Prognostic Risk Assessment

2.7

The predictive ability and generalizability of CPMP require validation in external cohorts. To this end, we collected and analyzed a new independent external cohort comprising 54 early‐stage HR+/HER2− BC patients (20 high‐risk, 34 low‐risk) from Tianjin Medical University Cancer Institute and Hospital, Binhai Hospital (TJMUCH‐BH), which was processed using the same scanning platform as the TJMUCH‐MP cohort. Without any model fine‐tuning, we applied the trained CPMP model to this independent cohort for external evaluation. The model achieved an AUROC of 0.772 and showed a positive correlation (Spearman R = 0.433, p = 0.001) between genomic MP risk values and CPMP‐predicted recurrence risk scores (Figure S15, Supporting Information). These results provide quantitative evidence of CPMP's robustness across clinical cohorts.

To further assess generalizability, we also evaluated CPMP in the Cancer Genome Atlas (TCGA)–BRCA cohort for recurrence risk and prognostic analysis (Figure S16a–c, Supporting Information). This analysis was motivated by the established influence of MP‐related gene activity on tumor progression and its association with patient outcomes. Whole‐slide‐level spatial heatmaps generated by CPMP in the TCGA‐BRCA cohort reveal its ability to capture tumor localization patterns that may be clinically relevant (Figure 7a,b; Figures S17,S18, Supporting Information). We further assessed the prognostic value of CPMP using clinical outcome endpoints. Disease‐free interval (DFI), progression‐free interval (PFI), and overall survival (OS) were derived from the TCGA Clinical Data Resource.^[^ 50 ^]^ We can observe a significant risk‐group stratification on DFI, PFI, and OS prognostic indicators (log‐rank test, p‐value < 0.05) from Kaplan–Meier survival curves (Figure 7c–e). Patients in the high‐risk group had a worse outcome than patients predicted as low‐risk by CPMP, exhibiting statistically significant findings in DFI (a hazard ratio (HR) of 1.91, 95% confidence interval (95% CI): 1.20–3.05, p‐value = 0.018), PFI (HR: 1.58, 95% CI: 1.11–2.26, p‐value = 0.022), and OS (HR: 1.54, 95% CI: 1.08–2.18, p‐value = 0.027). We performed a prognostic analysis of CPMP risk groups correlating with distant metastasis‐free survival (DMFS) in the TCGA‐BRCA dataset (Figure 7f). Patients in the high‐risk group experienced a higher probability of distant metastasis events than those classified as low‐risk (HR: 3.14, 95% CI: 1.83–5.37, p‐value = 0.0014). Especially, the distinction between patients in the high‐risk and low‐risk groups became more evident beyond the early survival times (≈5 years) as observed in the Kaplan–Meier (KM) survival curves for DMFS. This reveals the capability of CPMP in identifying late distant metastasis events. We investigated 16 patients suffering from distant metastasis beyond 5 years, of whom 14 (87.5%, a mean follow‐up time of ≈9.44 years) were identified as high‐risk by CPMP. We also evaluated the prognostic impact of the predicted risk groups on DMFS, DFI, PFI, and OS using multivariable Cox proportional hazards models (Figure 7g; Figure S19a–c, Supporting Information). The risk groups inferred by CPMP exhibited a significant prognostic impact on DMFS, with an HR of 2.76 (95% CI: 1.27–6, p‐value = 0.01, Figure 7g). It suggests that the CPMP‐derived risk group is an increased risk factor for BC distant metastasis in TCGA‐BRCA patients, demonstrating the prognostic capability of CPMP. It can be implied that CPMP effectively captures valuable prognostic features from tumor morphology in BC histopathological slides. This further highlights the potential of CPMP to be applied in clinical settings for prognostic risk assessment of BC patients.

Prognostic risk stratification performance of CPMP on the external TCGA‐BRCA data. a,b) Whole‐slide‐level spatial attention heatmap visualization results for tumor morphological patterns associated with prognostic events in the TCGA‐BRCA dataset. The H&E‐stained diagnostic WSIs of patients with (b, Scale bar: 5 mm) or without (a, Scale bar: 4 mm) distant metastasis event, the whole‐slide‐level overlaid spatial heatmap visualization derived from CPMP, and top 10 local tiles (Scale bar: 50 µm) with the highest attention score revealing the tumor morphological patterns of the slide are all shown. c–f) Kaplan–Meier curves for disease‐free Interval (c), Progression‐free Interval (d), Overall Survival (e), and Distant metastasis‐free Survival (f) of patient groups with high‐risk and low‐risk in the external TCGA‐BRCA dataset. Risk groups were directly predicted by CPMP trained in the TJMUCH‐MP cohort. HR and 95% CI were calculated by the Cox proportional hazard model, while the p‐value was calculated using the two‐sided log‐rank test. g) Forest plot of multivariate Cox regression analysis including the covariates age, AJCC pathologic tumor stage, PR status, ER status, HER2 status, LNp, and CPMP risk group, and their association with distant metastasis‐free survival. HR and 95% CI were calculated by the Cox proportional hazard model, and the p‐value was calculated using the two‐sided log‐rank test. ( indicates statistical significance, * p < 0.05, ** p < 0.01). WSI = Whole Slide Image, HR = Hazard Ratio, DMFS = Distant Metastasis‐Free Survival, CI = Confidence Intervals, AJCC = The American Joint Committee on Cancer, PR = Progesterone Receptor, ER = Estrogen Receptor, HER2 = Human Epidermal growth factor Receptor 2, LNp = The number of Lymph Nodes positive, HR+ = Hormone Receptor positive, LN0 = Lymph Nodes negative, LN1 = The number of Lymph Nodes positive≤ 3.*

Discussion

3

In this study, we presented the weakly supervised CPMP model for MP recurrence risk prediction from H&E‐stained histopathological WSIs of BC patients. The predictive capability of the CPMP model was shown by inferring both the continuous risk scores and the discrete recurrence risk groups in the real‐world TJMUCH‐MP clinical cohort. Compared to the existing weakly supervised MIL methods, our method achieved superior AUC‐efficiency performance and yielded a better alignment with genomic MP risk scores in the cohort.

To the best of our knowledge, this is the first work to explore spatially morphological tumor patterns associated with genomic MP for recurrence risk assessment and prognosis analysis in BC patients. Our model facilitated the characterization of the tumor pattern diversity in early‐stage BC patients for whom the genomic MP test is applicable. It enhanced the discovery of high‐specific, low‐specific, and colocalized morphological phenotypes among MP risk groups from routine histopathology slides. The intercellular interactions and quantized cellular composition patterns exhibited distinctions across diverse tumor phenotypes, which are related to genomic MP recurrence risk groups. Moreover, our study provided evidence that the CPMP model demonstrated superior generalization capabilities in identifying late distant metastasis events in the TCGA‐BRCA cohort. It also exhibited significant risk‐group stratification capabilities across multiple prognostic indicators. We propose that the prognostic value of CPMP arises from its ability to bridge the gap between macro‐morphology and micro‐genomic expression, by capturing the potential connections between spatial tumor morphology and underlying gene activities.

The genomic MP test provides significant clinical value and has been proven to be an effective diagnostic tool in current clinical applications. It is recommended for invasive BC patients who are HR+/HER2‐, staged T1‐2, and either lymph node‐negative or with 1–3 metastases.^[^ 29 ^]^ This test evaluates the 10‐year risk of distant metastasis and classifies patients into high or low recurrence risk groups. High‐risk patients are likely to benefit from adjuvant chemotherapy, while low‐risk patients with clinically high‐risk features can safely be exempted from chemotherapy. The genomic MP profiling test for recurrence risk diagnosis is performed on tumor specimens using targeted RNA next‐generation sequencing technology. However, it fails to provide the spatial and morphological characteristics of tumors. To make up for the limitations of traditional MP diagnostic tests in terms of spatial analysis, we implemented a WSI‐level attention heatmap visualization method in our model. This approach captures the spatial relationships and cellular interaction patterns within tumors at the whole‐slide level. This indicates that our spatially aware, weakly supervised learning CPMP model is capable of generating a whole‐slide‐level attention heatmap, facilitating the visualization and discovery of spatial morphological localization.

In addition, the time and financial costs of MP diagnostic tests are substantial, creating a barrier that prevents many patients from accessing precise therapeutic regimens. CPMP, an AI‐driven weakly supervised learning model, demonstrates remarkable capability in predicting distant metastasis risk for early‐stage BC patients using only routine histopathological slides. We experimentally validated that the CPMP model can infer a reliable risk score and a whole‐slide‐level spatial attention heatmap visualization within ≈5 min for a WSI of size 50 000 × 50 000 pixels (25 mm × 25 mm). This approach not only reduces the financial burden on patients but also enables rapid result delivery. Consequently, CPMP presents a viable complementary option for patients who cannot afford the genomic MP test and has the potential to serve as a cost‐efficient, AI‐powered system for pre‐screening BC patients in the clinical workflow (depicted in Figure S20, Supporting Information).

Our study brings favorable insights into the fields of computational pathology and biomarker discovery, but there are still certain limitations. For example, our weakly supervised CPMP model leverages general‐purpose foundation models^[^ 39, 40, 41, 42, 43, 44 ^]^ for feature extraction, which reduces computational costs. However, these foundation models were pretrained on large datasets whose data distribution may differ from ours due to variations in slides. As a result, one line of future work is to address these domain distribution challenges by fine‐tuning existing self‐supervised models.^[^ 51, 52 ^]^ While this approach has the potential to enhance predictive performance, the associated computational resources must be considered, and the trade‐off between performance gain and resource cost needs careful evaluation. Additionally, our model was developed using the TJMUCH‐MP cohort and is currently limited to predicting MP high and low‐risk groups for early‐stage BC patients. Recent studies^[^ 53, 54 ^]^ suggest that an ultra‐high (H2) subgroup exists within the high‐risk group, potentially benefiting from immunotherapy. However, the number of H2 patients in the TJMUCH‐MP cohort is limited (n = 11), which constrains the CPMP's ability to learn and identify this subgroup. We will expand the cohort through multi‐institutional collaborations to develop and validate a refined model capable of identifying this high‐impact patient population. Furthermore, although we have demonstrated the model's generalization capability in the external TCGA‐BRCA cohort, its robustness to domain shifts caused by variations in slide preparation, staining, and scanning protocols across different institutions and ethnic populations remains incompletely validated. For broader clinical application, multi‐center clinical trial studies involving diverse, multi‐ethnic cohorts will be essential to evaluate CPMP's real‐world impact on clinical outcomes such as treatment decisions and patient survival. To improve cross‐institutional robustness, future efforts could incorporate methodological advancements in domain adaptation. For instance, unsupervised domain adaptation^[^ 55 ^]^ can be leveraged to align the feature distributions of histopathological images between the source domain and new target domains, thereby mitigating domain shifts. Alternatively, self‐supervised fine‐tuning on a small set of unlabeled WSIs from a target institution may help the model adapt to local staining and scanning protocols. In parallel, future research should aim to further elucidate the intercellular morphological patterns associated with MP recurrence risk groups and investigate the causal links between tumor morphology and the MP genomic signature.

Altogether, our study revealed that AI‐driven weakly supervised computational pathology technology can effectively predict risk scores for prognostic distant metastasis hints using annotation‐free histopathological WSIs. This capability can further facilitate prognostic knowledge discovery, paving the way for advancements in precision treatment. These findings make it possible to achieve a complementary, computational, and cost‐efficient diagnostic tool for assessing recurrence risk for early‐stage BC patients from routine H&E‐stained slides, ultimately supporting better clinical decision‐making and patient management.

Experimental Section

4

Clinical Cohort Curation

The TJMUCH‐MP cohort examined female patients who were diagnosed with primary breast cancer and received treatment between January 2019 and December 2023 at TJMUCH, National Clinical Research Center for Cancer, China. Patients who underwent MP diagnostic tests were included in this retrospective analysis. Whole‐slide images were scanned using the Aperio Leica Biosystems GT450 v1.0.1 scanner at an apparent magnification of 40×. The genomic MP test was conducted by ZhenHe Genecast Biotechnology Ltd, the exclusive partner for the 70‐GS assay in China, as appointed by Agendia. All participants enrolled in this study met the StGallen guidelines for the MP test.^[^ 29 ^]^ Inclusion criteria comprised invasive BC that was HR+/HER2‐, staged T1‐2, and either lymph node‐negative or with 1–3 metastases. Additionally, tumor specimens were required for genomic analysis. Exclusion criteria included HR‐ or HER2+ tumors and patients receiving neoadjuvant therapy. Cases involving bilateral breast cancer and inconclusive tissue sections were also excluded. Clinical information and histopathological characteristics were extracted from medical records, while adverse events, such as recurrence and death, were tracked through follow‐up visits. FFPE tumor samples were collected from biopsies and excised surgical materials. Based on the genomic MP diagnostic test, continuous risk scores were yielded, ranging from −1 to 1. Patients were categorized into high‐risk and low‐risk groups using a cutoff value of 0, with scores ≤ 0 categorized as high‐risk and scores > 0 as low‐risk.^[^ 7 ^]^ For model training and prediction, the original MP risk scores were normalized to the range [0, 1] using the formula (x + 1)/2. After normalization, a threshold of 0.5 was applied for binary classification, where scores ≤ 0.5 were classified as high‐risk and scores > 0.5 as low‐risk. Ultimately, a HR+/HER2‐ invasive BC cohort was established, comprising digital WSIs, genomic MP results, and clinicopathological features of 477 female patients (Figures S1a–d,S2a,b, Table S1, Supporting Information). This cohort was utilized for subsequent model development and evaluation.

WSIs Preprocessing

Histopathological WSIs typically contained billions of pixels and possess high‐resolution, multi‐scale attributes. However, these images included areas of white background and non‐tissue regions that should be filtered out. To improve the computational efficiency of WSI analysis, data preprocessing was first performed on the H&E‐stained histopathological slides (Figure S2c, Supporting Information). Automated tissue segmentation techniques proposed by CLAM^[^ 35 ^]^ were utilized for effective tissue detection, resulting in segmented foreground contours. Following this, a non‐overlapping sliding window tiling strategy was employed to crop image tiles measuring 256 × 256 pixels from the identified tissue regions at a 20× magnification. The resulting stacks of image tiles, along with their spatial coordinates, were organized and stored using the hierarchical data format version 5 (HDF5).^[^ 35 ^]^ The distribution of the number of tiles obtained from each WSI varies widely, ranging from thousands to hundreds of thousands, as illustrated in Figure S2d (Supporting Information).

CPMP Model

This model employed the weakly supervised MIL framework for predicting recurrence risk from histopathological WSIs (Figure 1a). To clarify, the authors begin with outlining the classic MIL approach. In this framework, a dataset D={(Xi,Yi)}i=1N consists of a total of N bags Xi, each containing multiple instances Xi={tji}j=1ni. The primary objective of the MIL approach is to develop a function F(·) that can predict the bag‐level category Yi based on the multiple instances {tji}j=1ni within that bag. While the bag‐level labels {Yi} are known, the labels for the individual instances {tji} are unknown. The function is typically decomposed into three key components: 1) An instance‐level transformation function that transforms instances into feature vectors. 2) An aggregation function at the instance‐level that integrates the features of all instances to yield a global feature representation at the bag‐level. 3) A classifier head function that predicts the bag‐level results from this global feature representation. In alignment with the MIL paradigm, CPMP was composed of three components: a feature representation function T(·) operating at the tile‐level, a feature aggregation function S(·) that integrates features across tiles within a slide, and a regression‐based MLP predictor head R(·) at the slide‐level. These components were described in detail in the following section.

Feature Representation at the Tile‐Level

Feature representation, serving as the instance‐level transformation function in MIL, was executed on tiles cropped from WSIs. Recently, several self‐supervised models^[^ 39, 40, 41, 42, 43, 44 ^]^ pretrained on extensive datasets of histopathological WSIs across pan‐cancer pan‐tissue types have become publicly available, significantly facilitating advances in the field of computational pathology. These general‐purpose foundation models succeeded in capturing the diverse patterns in pathology images and held promise for integration into clinical diagnostical workflows, particularly in applications with limited data availability. Several models were explored, including UNI,^[^ 41 ^]^ Prov‐GigaPath,^[^ 39 ^]^ Virchow,^[^ 40 ^]^ H‐optimus‐0,^[^ 42 ^]^ Phikon,^[^ 44 ^]^ and CTransPath,^[^ 43 ^]^ for tile‐level feature extraction (Figure S2e, Supporting Information). The 256 × 256 tiles were resized to 224 × 224 pixels before inputting them into feature encoders for forward computation. This process allowed authors to represent each tile with low‐dimensional feature vectors: 1024D (UNI encoder), 1536D (Prov‐GigaPath), 2560D (Virchow), 1536D (H‐optimus‐0), and 768D (Phikon and CTransPath). Low‐dimensional feature representations at the tile level made it feasible to reduce computational complexity and aggregated diverse patterns from all tiles within a slide.

Feature Aggregation

Based on the feature embeddings of local tiles, the transformer^[^ 56 ^]^ architecture for feature aggregation was employed to derive a global feature representation at the slide‐level. The self‐attention module in the transformer was crucial, as it could capture long‐range dependencies across tiles. However, the global self‐attention mechanism incurred a high computational cost, particularly when the number of tiles in a single slide can reach up to 100 000. This excessive computation and redundancy limited the applicability of the self‐attention mechanism in high‐resolution scenarios. To this end, an agent‐attention mechanism^[^ 30 ^]^ was implemented to integrate the embedding sequences by introducing an agent token (Figure 1b,c). This modification transformed the traditional self‐attention (Q,K,V) into the agent‐attention paradigm (A,Q,K,V), balancing between the need for global representation capabilities and the requirement for computational efficiency.

Specifically, for the input tile features x={T(tj)}j=1n,x∈Rn×c, the general self‐attention module could be formulated as follows:

[eqn]

[eqn]

where n and c are the number of tile embeddings in one slide and the dimension of tile embedding, respectively. wQ,wK,wV∈Rc×d denote respective linear transformation layers for query, key, and value in the transformer block, where d is the output dimension of linear layers. Sim(·) represents the similarity function, commonly Sim(Q,K)=exp(QKTd) in the Softmax attention. The similarity function was abbreviated to σ(·) and the Softmax attention could be rephrased as follows:

[eqn]

Softmax attention computed a query‐key product, resulting in a computation complexity of O(n2). It was noteworthy that it did not consider multi‐heads here.

In the agent‐attention paradigm, an agent token A was added to the foregoing attention triplet (Q,K,V). This modification resulted in the agent‐attention paradigm of (A,Q,K,V), which was denoted as AAttn(A,Q,K,V). To be specific, the agent‐attention paradigm consisted of a stacked state of two Softmax attention operations SAttn(·). In the first phase, referred to as first‐aggregation, the agent token A was treated as a query Q in the first Softmax attention operation SAttn(A,K,V). Information was aggregated into the agent‐value Vagent from the value V, with the attention matrix computed based on agent A and key K. The second Softmax attention operation SAttn(Q,A,Vagent) was applied in the second phase, known as the second‐broadcast. Here, the agent‐value Vagent served as the output from the first Softmax attention. This process yielded the final output from the agent‐attention mechanism (Figure 1c).

[eqn]

[eqn]

The pairwise query‐key product computation was replaced by the agent‐key and query‐agent product, and the information flow was connected and preserved through the agent. Notably, the number of agent tokens k is a very small constant, which theoretically reduces the computational complexity of the agent‐attention to O(nd), where d ≪ n.

Using the agent attention mechanism described above, the agent‐attention transformer was employed for instance aggregation (Figure 1b). n tile embeddings were first transformed into a dimension of 256 through a linear projection layer, followed by the non‐linear activation function ReLU. To incorporate the spatial positional information of the tiles, a 2D absolute position encoding method was employed^[^ 56, 57 ^]^ to encode the tile coordinates. The resulting 256D positional embeddings were then added to the 256D tile embeddings. Next, a class token xcls∈R1×256 was initialized and concatenated with the spatially‐aware tile embeddings, yielding a fused representation {x,xcls}∈R(n+1)×256. An agent‐attention transformer block consisting of one layer with four heads for global information perception and aggregation was configured. The output dimensions of the latent projection layers for query, key, and value were set to 256, while the feed‐forward hidden linear projection layers produced an output dimension of 512.

A Regression‐Based MLP Predictor Head

Based on the agent‐attention transformer block, the class token *x_cls_

was utilized as the aggregated global representation at the slide‐level. This token was then passed into the MLP predictor head, which included a LayerNorm normalization followed by a linear layer. Given that the patient's MP risk was represented as a continuous score, binarizing these values might resulted in information loss. Recent studies^[^ 58, 59 ^]^ provided evidence that regression‐based approaches were superior to classification‐based methods for predicting continuous biomarkers. Therefore, a regression‐based strategy was adopted to output a continuous numerical value as the risk prediction result. The linear projection layer transformed the feature embedding of the class token *x_cls_
into a single‐neuron logit output, which was then passed through a Sigmoid activation function to yield a continuous risk value between 0 and 1. The mean square error loss was employed as the overall objective function L(·) to optimize CPMP.

[eqn]

[eqn]

where B is the batch size during training, *Y_i_

represents the slide‐level continuous MP risk score, and Y^i signifies the predicted continuous risk probability value for a patient. The parameters λ_1_ and λ_2_ are scale factors that balance the contributions of different loss components in the objective function. The indicator function 1_>0.5_(·) returns 1 when *Y_i_ * > 0.5, and 0 when *Y_i_ * ≤ 0.5. The optimization process is represented as argmin{θS}L(·), indicating that the optimization algorithm seeks to find the optimized parameters {θS} which minimizes the objective loss L(·). Here, {θS} refers to the set of optimized parameters in this model.

Spatial Attention Heatmap Visualization at the Whole‐Slide Level

The whole‐slide‐level attention heatmap visualization was conducted to explore spatial localization in tumors. It was implemented by mapping attention scores of tiles to their corresponding locations within WSIs. Specifically, data preprocessing was first performed on each WSI to detect tissue areas. The identified foreground tissue regions were tessellated into 256 × 256 tiles with an overlap rate of 0.1 at a 20× objective magnification. Subsequently, the self‐supervised foundation model UNI was leveraged for feature extraction from these tiles, resulting in 1024D feature vectors. The extracted tile embeddings were then passed through CPMP for inference, producing slide‐level prediction results alongside tile‐level attention scores that contributed to the final risk prediction. The attention scores at the tile‐level were calculated using the attention rollout method,^[^ 60 ^]^ based on the agent‐attention matrix *M_AAttn_ *(·) between tiles.

[eqn]

These attention scores were normalized to a range between 0 and 1, serving as attention weights. Finally, the normalized attention scores were arranged according to their corresponding spatial coordinates within the WSI, allowing for the visualization of an attention heatmap. This heatmap could be optionally overlaid on the original H&E‐stained WSI, providing insights into the spatial distribution of attention that aligned with tumor areas.

Cell Identification and Intercellular Graph Construction

To gain further insights into the cellular interactions within the tumor ecosystem of MP high and low risk groups, the high‐attention ROIs highlighted by the spatial attention heatmap at the whole‐slide level were first extracted. Subsequently, HoverNet was applied^[^ 45 ^]^ for simultaneous segmentation and classification of nuclei. The pretrained model weights trained on the public PanNuke dataset were utilized. The identified cells were classified into one of five cell categories: neoplastic (tumor), inflammatory, necrotic (dead), connective (stroma), and non‐neoplastic epithelial cells. The spatial coordinates of the centroids and contours of these cells were located in the ROIs (illustrated in Figure S10a, Supporting Information).

The density of tumor cells (TcD) was defined as the number of identified neoplastic cells per square millimeter. To further analyze the tumor ecosystem, the three main cellular components^[^ 46 ^]^ (tumor, inflammatory, and stromal cells) were selected to construct topological graphs among the detected cell nuclei. The sc‐MTOP framework^[^ 47 ^]^ was employed for profiling graph‐based features at the single‐cell level within the tumor microenvironment. The authors focused on the topological features of tumor cells interacting with other cells, specifically examined the connectivity graphs of tumor–tumor, tumor–inflammatory, and tumor–stroma interactions. Topological features related to graph‐edge length, especially MeanEdgeLength,^[^ 47 ^]^ were calculated to characterize the spatial intercellular relationships among different cell types within tumors. This metric represented the average edge lengths of each tumor cell interacting with other tumor, stromal, and inflammatory cells (illustrated in Figure S10b, Supporting Information).

Characterization of Phenotypic Diversity

To investigate phenotypes potentially associated with MP recurrence risk groups, the model demonstrating optimal performance from the experiments based on a patient‐level 5‐time fivefold cross‐validation scheme was first chosen. The histopathological WSIs of the testing cases were then tessellated into 256 × 256 tiles, which were characterized using the UNI foundation model for feature representation. These tiles were subsequently fed into the CPMP model for inference, yielding WSI‐level risk scores. In addition, the top 100 tiles with the highest attention scores from the slide, which were deemed representative of local regions within the tumor were identified and saved. Each instance in this tile collection was also processed through CPMP to infer tile‐level risk scores. The intermediate vector (cls_token) between the transformer block and the MLP classifier was detached, resulting in a 256D embedding vector for each tile instance.

The Scanpy toolkit was employed for further analysis of the embedding data. All tile embeddings were combined into an AnnData object, which contained a data matrix of the tile embeddings along with the additional metadata. The metadata included the spatial coordinates of the tiles within WSIs, genomic MP risk results corresponding to the slide from which the tile originated, WSI‐level risk scores, and tile‐level risk scores from the CPMP model. The nearest neighbors distance matrix and a neighborhood graph of tile embeddings were computed using the scanpy.pp.neighbors() function, employing cosine similarity as the distance measurement metric. The minimum size for local neighborhoods was set to 50 to balance between local information and global views. Based on the constructed neighborhood graph, Leiden clustering implemented by the scanpy.tl.leiden() function was performed for subcluster community detection on tile embeddings. Finally, UMAP dimensionality reduction was applied to visualize the tile embeddings and their subclusters in 2D space via the scanpy.tl.umap() function. The low‐dimensional embedding was initialized using coarse‐grained connectivity structures generated by the scanpy.tl.paga() function, which employed the partition‐based graph abstraction (PAGA) algorithm.

Based on the Leiden clustering on the tile embeddings, subclusters representing local morphological patterns were identified. The following analytic steps were undertaken for these subclusters. First, the respective submaps were extracted for low‐risk and high‐risk groups from the global subclusters, based on the MP risk group of the WSI from which the tile originated. The authors only preserved those subclusters whose number of tiles exceeds 25% of that in the corresponding global subcluster in a specific risk group. Next, the intersection of subcluster ID sets from both the high‐ and low‐risk groups was determined to identify subclusters that present in both MP risk categories, termed as colocalized subclusters. Subcluster IDs exclusive to the high‐risk group were identified as high‐specific subclusters, while those unique to the low‐risk group were designated as low‐specific subclusters. Subsequently, HoverNet was applied to all tiles within the high‐specific, low‐specific, and colocalized subclusters for nuclei segmentation and classification. This process identified five cell types (neoplastic, inflammatory, necrotic, connective, and non‐neoplastic epithelial cells), as well as their corresponding spatial coordinates within the tiles. Finally, the cellular composition for each subcluster was quantified by calculating the average number of cells for each cell type across all tiles within the subcluster. Then min–max normalization was applied to standardize these quantitative features of the cellular composition and conducted hierarchical clustering on all subclusters using the Ward variance minimization algorithm and the Euclidean distance metric.

Prognostic Analysis on TCGA‐BRCA

The CPMP model was applied to the external TCGA‐BRCA cohort for recurrence risk assessment and prognostic analysis. The TCGA‐BRCA cohort included 1056 patients with digital diagnostic slides as well as clinical follow‐up information (Figure S16a, Supporting Information). All digital slides (n = 1133) in the TCGA‐BRCA cohort were first preprocessed and tessellated into local tiles. Tile‐level feature representations were generated using UNI, and they were fed into the trained CPMP model for slide‐level distant metastasis risk score inference. The prediction results of multiple slides from the same patient were averaged into the patient‐level risk values. The continuous prediction risk scores were categorized into high‐risk and low‐risk groups based on a cutoff value determined by the trained model. Spatial attention heatmap inference at the whole‐slide level was also performed to visualize how regions with distinct tumor morphological patterns contribute to patient clinical outcomes. The prognostic value of the CPMP model within the TCGA‐BRCA cohort was assessed using the Kaplan–Meier survival curves and the multivariable Cox proportional hazards model. The clinical outcome endpoints including disease‐free interval, progression‐free interval, and overall survival were derived from the tcga clinical data resource.^[^ 50 ^]^

Comparative Methods

The CPMP model was compared with existing state‐of‐the‐art weakly supervised MIL methods in WSIs classification. Brief introductions of the comparative methods were summarized as follows:

ABMIL,^[^ 34 ^]^ an attention‐based MIL method, currently a MIL baseline for WSI classification in the field of CPATH.
CLAM,^[^ 35 ^]^ an instance‐level clustering‐constrained‐attention MIL algorithm for pathological subtyping. It relied on instance‐level clustering to refine the feature space in MIL.
DTFD‐MIL,^[^ 33 ^]^ a double‐tier MIL framework that introduced pseudo‐bags to enhance computational efficiency.
SAM‐MIL,^[^ 32 ^]^ a MIL framework that leveraged the segment anything model (SAM) to extract and integrate spatial context from WSIs into model training.
CEMIL,^[^ 31 ^]^ an instructor‐learner framework utilizing knowledge distillation for resource‐efficient, low‐computational‐cost learning.
The full Transformer‐based architecture model applied for end‐to‐end biomarker prediction from WSIs^[^ 36 ^]^ (denoted as Wagner et al.). The architecture was halved to a smaller scale with 4 attention heads due to data limitations in the experiments.
TransMIL,^[^ 37 ^]^ a correlated Transformer‐based MIL framework for pathology diagnosis on WSIs. It designed a PPEG for position encoding to capture both the local contextual information and the correlation between regions.

To verify the effectiveness of CPMP in using annotation‐free WSIs, the experiments were conducted on noisy tissue filtering, tile sampling, and clinical‐features‐only logistic regression. In the experiment of tissue filtering, it was aimed to investigate CPMP's sensitivity to noisy tiles. Zero‐shot classification was performed to determine tissue types for all tiles using the visual‐language foundation model PLIP^[^ 38 ^]^ (illustrated in Figure S5a, Supporting Information). Tiles were identified as one of eight tissue types: tumor, adipose, stroma, immune infiltrates, gland, necrosis or hemorrhage, background or black, and non (Figure S5b, Supporting Information). Tiles classified as “background or black” and “non‐tissue” types were filtered out. In contrast, tiles identified as tumor, adipose, stroma, immune infiltrates, gland, and necrosis or hemorrhage tissue types were preserved and utilized for training the other model. This model with noisy tile filtering was compared with the CPMP model. To evaluate the scale flexibility of CPMP on the number of local tiles in WSIs, random sampling from the set of local tiles in each WSI was conducted, employing sample sizes of 1000, 5000, and 10 000 tiles as subsets for model training. The trained models were further evaluated on multiple subsets of tiles with variant sizes.

A clinical‐based logistic regression model designed to predict patients' MP recurrence risk from clinicopathological features was used for comparison. This clinical‐based model was constructed using a Logistic Regression classifier, incorporating a set of eight clinical variables: “Surgery Option”, “T”, “N”, “AJCC”, “Histological grading”, “TILs”, “PR”, and “Ki67”. The clinical‐based model underwent training, validation, and independent testing on the same patient cases, utilizing a patient‐level 5‐time fivefold cross‐validation strategy, consistent with the approach used in the CPMP model. Detailed hyperparameter settings for the Logistic Regression classifier are provided in the Supporting Information. Among the eight clinical variables, “T”, “N”, “AJCC”, and “Histological grading” are categorical variables, while “TILs”, “PR”, and “Ki67” are continuous variables representing intensity percentages or positive proportions, with values ranging from 0 to 100. The “Surgery Option” variable was categorized by assigning numerical values to different surgical options, with the specific categorization rules outlined in Table S5 (Supporting Information).

Ablation studies were also performed to analyze modules in the spatially‐aware, regression‐based weakly supervised transformer architecture. The design outline and the experimental configurations utilized for the ablation models are illustrated in Figure S6a (Supporting Information). Specifically, the transformer‐based aggregation function was first analyzed with self‐attention or agent‐attention mechanisms. Regression‐based mean square error (MSE) loss strategy was compared with a classification‐based cross‐entropy (CE) loss. Positional coordinate information embedding (PE) was introduced into tile embeddings for spatial awareness in the CPMP model. PE, No‐PE, and PPEG‐based experimental configurations were thus adopted for comparison. In the ablation study of tile‐level feature representation, six self‐supervised general‐purpose foundation models, UNI,^[^ 41 ^]^ Prov‐GigaPath,^[^ 39 ^]^ Virchow,^[^ 40 ^]^ H‐optimus‐0,^[^ 42 ^]^ Phikon,^[^ 44 ^]^ and CTransPath^[^ 43 ^]^ were explored, for tile‐level feature extraction and experimental comparison.

Implementation Details

CPMP was implemented using Python 3.8 and the PyTorch 2.2 deep learning framework. All models, including comparative methods, were conducted on one workstation equipped with two 24GB NVIDIA GPUs (GeForce RTX 4090). The CUDA version was 12.2 and the GPU driver version was 535.171. The Adam algorithm was employed for parameter optimization. The momentum factor was 0.9, and the learning rate was initially set to 0.0001. The LinearLR scheduler aimed at enabling the learning rate decay was operated and was set to a limit of 0.00001 when the epoch count reached 500. The maximum number of epochs was set to 1000, and an early stopping strategy was utilized. The early stop was determined when the loss value of the validation set no longer decreased for 50 consecutive epochs. A gradient accumulation strategy was also adopted to address the issue of an inconsistent number of patches across WSIs. The gradient accumulation size was set to 32, which is equivalent to using a minibatch size of 32. The weights of all layers were initialized with the Kaiming uniform strategy. The optimized model with the lowest loss metric value on the validation set was recorded and utilized for evaluation on the test set. WSIs were processed by OpenSlide Toolkit v3.4.1.

For fair evaluation in case of data bias, a patient‐level 5‐time fivefold cross‐validation scheme (Figure S3a, Supporting Information) was employed in the TJMUCH‐MP cohort. In the experiment of each time, 20% of the samples were randomly divided for testing, while the remaining samples were evenly divided into 5 non‐overlapping subsets (folds). Slides from each patient were only assigned to one of these sets to prevent data leakage. Then the model on 4 subsets (fourfolds) was iteratively trained while using the remaining subset for validation. This process was repeated for each fold, ensuring that each fold served as the validation subset exactly once. The trained models corresponding to the fivefolds were evaluated on the independent testing cases, and the results were averaged. This experiment was repeated for 5 random times and the average results over the 5‐time experiments are reported.

Statistical Analysis

The original genomic MP risk scores (ranging from −1 to 1) were normalized to the range [0, 1] using the formula (x + 1)/2 for model training and prediction. This transformation was necessary to ensure compatibility. No other data transformation or normalization was applied. Data and results were presented as mean ± standard deviation (SD) where applicable, or as median and range for non‐normally distributed data (e.g., tumor cell density). The sample size for the performance evaluation was n = 477 (TJMUCH‐MP cohort). For survival analysis in the TCGA‐BRCA cohort, the sample size was n = 1056 patients. The nonparametric Spearman rank‐order correlation coefficient (R) was used to measure the relationship between two continuous variables. Spearman's R correlation coefficient varied between −1 and +1 with 0 indicating no correlation. Correlations of +1 or −1 implied an exact positive or negative relationship. Continuous variables were compared between independent groups using the nonparametric Mann–Whitney U rank test. Kaplan–Meier survival curves were generated to visualize survival differences between the high‐risk and low‐risk groups. The statistical significance of the survival differences was assessed using the log‐rank test. Multivariable Cox proportional hazards models were used to evaluate the prognostic impact of the CPMP‐derived risk group while adjusting for covariates (age, AJCC pathologic tumor stage, PR status, ER status, HER2 status, LNp). The proportional hazards assumption was checked and found to be valid for the models. Hazard ratios (HR) and 95% confidence intervals (CI) were reported. Multiple metrics, including the ROC analysis with AUC value, the PRC with the AUPRC value, the balanced accuracy, as well as the averaged Precision, Recall, and F1 score were employed to evaluate the performance of the approach and comparative methods on MP risk groups. The performance of CPMP was compared to that of comparative models and the clinical‐based logistic regression model using the Wilcoxon signed‐rank test on the AUROC metrics obtained from the 5‐time fivefold cross‐validation. All tests were two‐sided, and a p‐value < 0.05 was considered statistically significant. All statistical analyses were conducted using Python 3.8 and R 4.2.2.

Ethics Approval

This retrospective study was approved by the Ethics Committee of Tianjin Medical University Cancer Institute and Hospital (bc20240004). Informed consent was obtained from all participants in the cohort for all data acquisition. For TCGA, no formal ethics approval was required for a retrospective study of anonymous samples.

Conflict of Interest

The authors declare no conflict of interest.

Author Contributions

C.Y., L.L., and X.Q. are joint first authors and contributed equally to this work. H.L. curated the clinical cohort and designed the clinical investigation. J.L. conceived and designed the study. C.Y. developed the approach and conducted the experiments. L.L. and X.Q. assisted with clinical analysis of tumor spatial location experiment results. L.L., X.Q., Y.O., and Z.H. collected the histopathological slides and preprocessed the clinical data. Z.R., and W.X. assisted with image preprocessing and ablation experiments. C.Y. and J.L. prepared the manuscript with input from all authors. All authors read and approved the final manuscript.

Supporting information

Supporting Information

Bibliography60

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1F. Bray , M. Laversanne , H. Sung , J. Ferlay , R. L. Siegel , I. Soerjomataram , A. Jemal , Cancer J. Clin. 2024, 74, 229.10.3322/caac.2183438572751 · doi ↗ · pubmed ↗
2R. N. Pedersen , B. Ö. Esen , L. Mellemkjær , P. Christiansen , B. Ejlertsen , T. L. Lash , M. Nørgaard , D. Cronin‐Fenton , JNCI: J. Natl. Cancer Inst. 2022, 114, 391.34747484 10.1093/jnci/djab 202PMC 8902439 · doi ↗ · pubmed ↗
3M. Y. Kim , in Translational Research in Breast Cancer, (Eds: D.‐Y. Noh , W. Han , M. Toi ), Springer, Singapore, 2021, pp. 183–204.
4A. E. Giuliano , J. L. Connolly , S. B. Edge , E. A. Mittendorf , H. S. Rugo , L. J. Solin , D. L. Weaver , D. J. Winchester , G. N. Hortobagyi , Ca‐Cancer J. Clin. 2017, 67, 290.28294295 10.3322/caac.21393 · doi ↗ · pubmed ↗
5L. J. van 't Veer , H. Dai , M. J. van de Vijver , Y. D. He , A. A. M. Hart , M. Mao , H. L. Peterse , K. van der Kooy , M. J. Marton , A. T. Witteveen , G. J. Schreiber , R. M. Kerkhoven , C. Roberts , P. S. Linsley , R. Bernards , S. H. Friend , Nature 2002, 415, 530.11823860 10.1038/415530 a · doi ↗ · pubmed ↗
6A. M. Glas , A. Floore , L. J. Delahaye , A. T. Witteveen , R. C. Pover , N. Bakx , J. S. Lahti‐Domenici , T. J. Bruinsma , M. O. Warmoes , R. Bernards , L. F. Wessels , L. J. Van 't Veer , BMC Genomics 2006, 7, 278.17074082 10.1186/1471-2164-7-278PMC 1636049 · doi ↗ · pubmed ↗
7F. Cardoso , L. J. van't Veer , J. Bogaerts , L. Slaets , G. Viale , S. Delaloge , J.‐Y. Pierga , E. Brain , S. Causeret , M. Delorenzi , A. M. Glas , V. Golfinopoulos , T. Goulioti , S. Knox , E. Matos , B. Meulemans , P. A. Neijenhuis , U. Nitz , R. Passalacqua , P. Ravdin , I. T. Rubio , M. Saghatchian , T. J. Smilde , C. Sotiriou , L. Stork , C. Straehle , G. Thomas , A. M. Thompson , J. M. Van Der Hoeven , P. Vuylsteke , et al., N. Engl. J. Med. 2016, 375, 717.2 · doi ↗ · pubmed ↗
8M. Piccart , L. J. van 't Veer , C. Poncet , J. M. N. Lopes Cardozo , S. Delaloge , J.‐Y. Pierga , P. Vuylsteke , E. Brain , S. Vrijaldenhoven , P. A. Neijenhuis , S. Causeret , T. J. Smilde , G. Viale , A. M. Glas , M. Delorenzi , C. Sotiriou , I. T. Rubio , S. Kümmel , G. Zoppoli , A. M. Thompson , E. Matos , K. Zaman , F. Hilbers , D. Fumagalli , P. Ravdin , S. Knox , K. Tryfonidis , A. Peric , B. Meulemans , J. Bogaerts , et al., Lancet Oncol. 2021, 22, 476.33721 · doi ↗ · pubmed ↗