MMHC-OCPR: Prediction of Platinum Response and Recurrence Risk in Ovarian Cancer with Multimodal Deep Learning

Enyu Tang; Haoming Xia; Zhenlong Yuan; Yuting Zhao; Shengnan Wang; Zhenbang Ye; Shangshu Gao; Ziqi Zhou; Yuxi Zhao; Jia Zeng; Nenan Lyu; Jing Zuo; Ning Li; Jianming Ying; Lingying Wu

PMC · DOI: 10.3390/biomedicines14020348 · February 2, 2026

Open Access

TL;DR

This paper introduces a deep learning model that predicts platinum response and recurrence risk in ovarian cancer patients using pathology images and clinical data, aiming to improve personalized treatment.

Contribution

A novel multimodal deep learning model (MMHC-OCPR) is developed for predicting platinum resistance and recurrence risk in ovarian cancer.

Findings

1. The model achieved an AUC of 0.914 for platinum response prediction when integrating metastatic images and clinical data.

2. Recurrence risk prediction reached a C-index of 0.838 with multimodal input.

3. Patients were stratified into three risk groups with distinct 2-year progression-free survival rates.

Abstract

Background/Objectives: Ovarian cancer has the highest mortality among gynecological malignancies, with platinum resistance significantly contributing to poor prognosis. We aimed to develop a multimodal model (MMHC-OCPR) to predict platinum response and recurrence risk, enabling earlier personalized treatment and improved outcomes. Methods: This multicenter retrospective study included a combined cohort of 431 patients, comprising 1182 whole slide images (WSIs) curated from two independent datasets. The primary cohort consisted of 376 patients from the National Cancer Center (China), which was further partitioned into training, validation and internal test sets to ensure model development and evaluation. An additional external test cohort was incorporated using publicly available data from TCGA, enhancing the generalizability of our findings. We implemented a weakly supervised multiple…

Funding

CAMS Innovation Fund for Medical Sciences (CIFMS)

Keywords

ovarian cancer; MMHC-OCPR; clustering-constrained attention multiple instance learning; UNI2-h; platinum response; recurrence risk

Taxonomy

Topics: Ovarian cancer diagnosis and treatment · AI in cancer detection · Radiomics and Machine Learning in Medical Imaging

Full text

1. Introduction

Ovarian cancer remains the most lethal gynecologic malignancy [1]. Global statistics from 2020 reported 313,959 new cases of ovarian, fallopian tube, and primary peritoneal cancers, with 207,252 attributable deaths [2]. Approximately 70% of ovarian cancers are diagnosed at advanced stages. The conventional first-line treatment for advanced disease consists of primary or interval debulking surgery (PDS/IDS) followed by platinum-based combination chemotherapy, but median overall survival and progression-free survival (PFS) remain limited at approximately 30 and 12 months, respectively [3]. The incorporation of targeted agents such as bevacizumab and poly ADP-ribose polymerase inhibitors (PARPi) into first-line chemotherapy or maintenance therapy has significantly improved survival outcomes [4,5,6]. Emerging therapies including antibody-drug conjugates and immune checkpoint inhibitors show promise for platinum-resistant cases, defined as recurrence within 6 months post-chemotherapy [7,8,9].

Current first-line therapies face several clinical challenges: (1) Significant prognostic heterogeneity after platinum chemotherapy, particularly between platinum-sensitive and -resistant subgroups, coupled with a lack of reliable predictive biomarkers; (2) Potential selection bias in maintenance therapy due to the inclusion of platinum-sensitive patients who might benefit from chemotherapy alone [3]; (3) Substantial physiological and psychological burdens associated with maintenance agents like PARPi and bevacizumab, resulting from prolonged use and notable adverse effects [10,11]. These challenges highlight the urgent need for early predictive models to optimize personalized treatment and reduce toxicity.

Current biomarkers for platinum response, including BRCA1/2 mutations, HRD status, CA125, and imaging, demonstrate suboptimal predictive performance [12]. Histopathology encodes comprehensive disease information for diagnosis, classification, and prognostication. Although the chemotherapy response score (CRS) system enables pathologists to assess platinum response through residual tumor cell evaluation and fibroinflammatory changes [13], its clinical utility is limited by interobserver variability. Artificial intelligence (AI)-enhanced computational pathology improves the detection of subcellular and spatial features, using deep learning to identify prognostic patterns beyond human perception [14,15].

Recent advances in AI have extended its applications in pathology from diagnostic and molecular prediction to sophisticated outcome forecasting [16,17,18]. AI systems demonstrate growing capabilities in prognostic risk stratification and therapy response prediction through multimodal data integration [19,20]. Current models excel not only in baseline survival estimation but also in forecasting responses to chemotherapy, immunotherapy, and other novel treatments [21,22]. Tumors resulting from ovarian cancer metastasis to other sites may harbor a greater number of pathological features associated with invasiveness compared to the primary ovarian tumors, and these characteristics are closely linked to drug sensitivity and survival prognosis [23]. This study pioneers the integration of whole-slide image features from primary/metastatic lesions and clinicopathological variables using a weakly-supervised multiple instance learning framework (clustering-constrained attention multiple instance learning, CLAM) [24,25], building a multimodal model that predicts platinum response and recurrence risk in advanced ovarian cancer. As a clinical decision-support tool, it optimizes first-line therapy selection by enabling the early prediction of platinum resistance for timely treatment adjustment and stratifying recurrence risk to guide maintenance therapy—thus avoiding overtreatment, particularly in patients with excellent outcomes after surgery and platinum chemotherapy alone.

2. Materials and Methods

2.1. Patient Cohort

We retrospectively enrolled patients with advanced-stage (III–IV) high-grade serous ovarian carcinoma (HGSOC) treated at the National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences between January 2010 and December 2015 (NCC dataset, Figure 1). Inclusion and exclusion criteria are described in Supplementary Note S1. This study was registered with the Chinese Clinical Trial Registry (date of the retrospective registration: 17 December 2025; Registration ID: ChiCTR2500114794; https://www.chictr.org.cn/bin/userProject (accessed on 30 September 2025)).

After screening, the NCC dataset comprised 376 patients with a total of 1127 WSIs of H&E-stained tissue sections and was used for model development (Supplementary Figure S1). This included 750 WSIs of primary ovarian cancer (with 374/376 patients contributing 2 WSIs each and 2/376 patients contributing 1 WSI each) and 377 WSIs of metastatic tumors (with 375/376 patients contributing 1 WSI each and 1/376 patients contributing 2 WSIs). All metastatic tumors originated from the peritoneum. All slides were derived from FFPE tissue blocks and were digitized using a digital pathology scanner (NanoZoomer, Hamamatsu Photonics K.K., Hamamatsu City, Japan). Patients from the NCC dataset were randomly allocated into training, validation, and internal test cohorts in a ratio of 0.65:0.2:0.15. To evaluate the model’s generalizability, the TCGA-OV dataset from the TCGA database (https://portal.gdc.cancer.gov/ (accessed on 9 February 2025)) served as an external test cohort. This dataset consists of 55 FFPE-derived WSIs of primary ovarian cancer from 55 patients. Because the TCGA dataset provides only primary WSIs, external validation was limited to the single-modality components, while models integrating metastatic WSIs and clinical variables were evaluated only through internal validation.
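
The random allocation described above can be sketched at the patient level as follows. This is an illustrative sketch only: the function name `split_patients`, the seed, and the rounding rule are assumptions, and the exact procedure used in the study is not given in the text, so the resulting set sizes need not match the reported 245/76/55.

```python
import random

def split_patients(patient_ids, ratios=(0.65, 0.20, 0.15), seed=0):
    """Shuffle patient IDs and cut them into train/validation/test lists.

    Splitting by patient (not by slide) keeps all slides of one patient
    in the same subset. Rounding rule and seed are assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train, val, test = split_patients(range(376))
```

Splitting on patient identifiers rather than on individual slides is one way to keep a patient's primary and metastatic slides from straddling the training and test sets.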

Patients in the NCC dataset received platinum-based adjuvant therapy (specific surgical and detailed chemotherapeutic regimens are provided in Supplementary Note S2). They underwent clinical and/or radiological examinations periodically for efficacy and recurrence assessment. If the relapse occurred ≥6 months after completing prior platinum-based chemotherapy, the disease was defined as platinum-sensitive, and otherwise as platinum-resistant. Disease recurrence was defined as the objective evidence of confirmed disease progression based on radiographic, histological, or biomarker criteria. PFS was defined as the time from completion of prior platinum-based chemotherapy to the first documented progression or death from any cause. The follow-up cutoff date for all patients was 24 May 2024. The primary endpoints of this study were the platinum response (platinum-sensitive/resistant) and PFS.

2.2. Construction of MMHC-OCPR

2.2.1. Preprocessing of Pathological Images

The images from the National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and those from TCGA-OV were acquired using different scanning protocols. To mitigate the discrepancies introduced by varying scanning devices and protocols, we applied a standardized preprocessing pipeline: First, whole-slide images (WSIs) were converted to RGB, and a uniform size was enforced: image patches were resized to the target dimensions (224 × 224 pixels) using bicubic interpolation to ensure consistent input size. Subsequently, normalization was performed: pixel values were scaled from [0, 255] to [0, 1] and then standardized using the mean and standard deviation corresponding to the pretrained model. This normalization strategy maps images from different sources into a unified feature space, effectively reducing the impact of device and protocol variations on model performance and enhancing the model’s generalizability across multi-institutional data. The preprocessed images were then fed into a pretrained feature encoder (such as UNI or CONCH) to extract deep features, which were subsequently used within the multiple-instance learning framework for further analysis.
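
The scaling and standardization step can be illustrated for a single RGB pixel. This is a minimal sketch: the mean and standard deviation below are the widely used ImageNet statistics and are an assumption here, since the text only says the values correspond to the pretrained model. The bicubic resize to 224 × 224 is omitted because it needs an imaging library.

```python
# Assumed normalization statistics (ImageNet convention); the actual values
# depend on the pretrained encoder and are not stated in the text.
ASSUMED_MEAN = (0.485, 0.456, 0.406)
ASSUMED_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Scale 8-bit RGB values to [0, 1], then standardize per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
```

In practice the same operation is applied to every pixel of every resized patch before it is passed to the encoder.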

2.2.2. Tile Extraction Parameters

During the tile extraction stage, a standardized pipeline was employed to ensure consistency and efficiency. All whole-slide images were processed at the highest resolution level, corresponding to the native scanning resolution under a 40× objective (approximately 0.25 μm/pixel). The multi-resolution pyramid structure of each WSI was automatically identified using the OpenSlide library, from which image tiles were extracted. The tile size was uniformly set to 256 × 256 pixels, corresponding to a tissue area of approximately 64 μm × 64 μm (256 pixels × 0.25 μm/pixel), with a default stride of 256 pixels (no overlap) to balance computational efficiency and feature coverage. Tissue masks were generated via an automated segmentation pipeline: images were first converted to HSV color space, and the saturation channel was extracted. Median filtering (kernel size mthresh = 7) was applied for noise reduction, followed by binarization using a fixed threshold (sthresh = 8) or Otsu’s adaptive thresholding. Morphological closing (close = 4) was then performed to connect fragmented regions. Artifact removal was implemented through multiple steps: white-area filtering excluded background regions based on a saturation threshold; black-area filtering removed over-stained regions based on an RGB mean threshold; and a four_pt strategy was used to inspect tissue mask coverage around the tile center, ensuring that tiles primarily resided within valid tissue regions.
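
The tiling geometry can be sketched in a few lines. At roughly 0.25 μm/pixel, a 256-pixel tile spans about 64 μm per side. The helper names below are hypothetical, and the four-point check is a simplified stand-in modeled on the common convention of testing four points offset from the tile center. The acceptance rule (any point inside tissue) and the `center_shift` value are assumptions, not details taken from the text.

```python
def tile_side_um(tile_px=256, mpp=0.25):
    """Physical side length of a tile in micrometres."""
    return tile_px * mpp

def tile_origins(width, height, tile=256, stride=256):
    """Top-left corners of all full tiles on a non-overlapping grid."""
    return [(x, y)
            for y in range(0, height - tile + 1, stride)
            for x in range(0, width - tile + 1, stride)]

def four_pt_check(mask, x, y, tile=256, center_shift=0.5):
    """Accept a tile if any of four points around its center is tissue.

    mask is a 2D list indexed as mask[row][col] with truthy tissue pixels.
    """
    cx, cy = x + tile // 2, y + tile // 2
    d = int(tile // 2 * center_shift)
    points = [(cx - d, cy - d), (cx + d, cy + d),
              (cx + d, cy - d), (cx - d, cy + d)]
    return any(mask[py][px] for px, py in points)

# Toy 8 x 8 mask with one tissue pixel, tiled with 4-pixel tiles.
mask = [[0] * 8 for _ in range(8)]
mask[1][1] = 1
kept = [(x, y) for x, y in tile_origins(8, 8, tile=4, stride=4)
        if four_pt_check(mask, x, y, tile=4)]
```

A stricter variant would require all four points to fall inside tissue, which discards more tiles at tissue borders.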

2.2.3. Stain Normalization and Color Enhancement Strategy

To enhance the model’s cross-institutional generalizability, we adopted a stain normalization strategy based on a pretrained encoder rather than traditional H&E-specific methods. This normalization approach maps images from different sources into a unified feature space. The advantage of this method lies in the robust feature representations learned by the pretrained model from diverse natural images, enabling it to adapt to color variations caused by different staining protocols and thereby reducing interference from inter-institutional differences. All slides were converted to RGB format and normalized as described above prior to training, ensuring consistency in feature space. Regarding color enhancement, no additional augmentation techniques (such as color jitter or contrast adjustment) were introduced during training in order to preserve the original biological information of tissue morphology and avoid introducing artificial artifacts that could interfere with model learning.

2.2.4. Definition of Tumor Regions

Tumor regions were defined entirely through an automated pipeline, without requiring pathologists to manually delineate regions of interest (ROIs). We performed fully automatic tissue segmentation on each WSI to identify all tissue-containing regions, from which image tiles were extracted. This approach ensures that the model can leverage morphological information from the entire slide and is suitable for whole-slide analysis tasks. Automated segmentation not only improves efficiency but also avoids subjective bias inherent in manual annotation, making it highly compatible with weakly supervised learning frameworks.

2.2.5. Handling of Multiple WSIs per Patient During Training

During training, although each patient may contribute multiple WSIs (e.g., from primary and metastatic sites), we addressed the issue of repeated measures using a slide-level strategy. Specifically, the dataset was constructed with WSIs as the unit rather than patients: for each individual WSI, the attention mechanism of the MMHC-OCPR model aggregated features from all of its internal tiles to generate a slide-level prediction. During the evaluation phase, patient-level information was used only for subsequent analysis (e.g., aggregating predictions from multiple WSIs via mean or median to derive a patient-level outcome). However, the training process itself did not employ patient-level aggregation, thereby avoiding the risk of data leakage. This design ensures that the model learns content-relevant features rather than being biased by patient frequency.

2.2.6. Workflow of Model Construction

The construction of the MMHC-OCPR model comprises two stages. Stage 1 involves the selection of the optimal WSI encoder, and Stage 2 entails the performance evaluation of the model using multimodal and metastatic site data as input. Firstly, we individually evaluated various open-source, pre-trained histopathology image encoders (including CONCH [26], CTransPath [27], GigaPath [28], Phikon-v2 [29], ResNet50, UNI [30], and UNI2-h [31]) using primary ovarian cancer WSIs. This evaluation assessed their feature extraction and predictive capabilities for the platinum response classification and recurrence prediction tasks, aiming to identify the optimal histopathology image encoder for the subsequent model development.

Subsequently, we innovatively constructed the multimodal model, integrating histopathology images with clinical baseline data. We first utilized primary ovarian cancer WSIs to evaluate the performance advantage conferred by the multimodal architecture for both prediction tasks. Considering that each patient had paired primary and metastatic samples, we further integrated primary ovarian cancer WSIs with metastatic WSIs to assess the potential performance gain from incorporating metastatic WSIs. This process aimed to determine the optimal predictive input configuration for the MMHC-OCPR model.

2.2.7. MMHC-OCPR Architecture

Figure 2 presents an overview of the MMHC-OCPR model. This model extends the CLAM architecture [31] by innovatively integrating dual-modality data comprising histopathological images and clinical baseline information to enhance predictive performance for both platinum response classification and recurrence prediction tasks. Furthermore, the model incorporates dedicated attention networks for distinct surgery regimens (IDS/PDS). Each network employs a gated attention mechanism to discern the relative importance of individual image patches within the overall prediction task, generating representative image-level feature vectors via weighted aggregation. Subsequently, the model employs a gated fusion mechanism to jointly fuse WSI features and clinical features. This mechanism dynamically computes fusion weights based on the characteristics of both modalities, enabling adaptive balancing of the contributions from histopathological imaging and clinical data according to individual patient profiles and specific task requirements.

The integrated feature vector is fed into the output layer for final prediction. For the classification task, a dual-output structure is utilized to discriminate between different classes, optimized with a multi-objective loss function that combines bag-level cross-entropy loss and instance-level clustering loss to balance global and local feature learning. For the recurrence risk prediction task, a single-output structure generates continuous risk scores, with model optimization based on the Cox proportional hazards loss function. This framework not only fully leverages the spatial semantic information from WSIs and the clinical relevance of variables but also establishes an interpretable interaction mechanism between multimodal data. It provides a deep learning solution for personalized prognostic assessment and treatment decision-making in ovarian cancer.

We propose a multimodal attention-based deep learning framework that integrates WSI features with clinical data for predicting treatment response and recurrence in ovarian cancer. Our approach extends the CLAM architecture to handle dual-modality data through a gated fusion mechanism. Our model leverages several pretrained encoders, including CONCH, CTransPath, GigaPath, Phikon-v2, ResNet50, UNI, and UNI2-h, each specialized in extracting high-level features from WSIs. These encoders capture diverse tissue characteristics, enabling the effective integration of WSI features with clinical data through a gated fusion mechanism. The model employs two separate attention networks corresponding to different pre-treatment protocols (IDS or PDS), where each attention network follows a gated attention mechanism to compute attention weights $[eqn]$ for each WSI patch. The weighted bag representation $[eqn]$ (where $[eqn]$ represents the $[eqn]$ -th patch feature and $[eqn]$ is the total number of patches) is obtained through the weighted sum of patch features. To effectively integrate WSI features with clinical data, we designed a gated fusion mechanism that computes adaptive fusion weights $[eqn]$ (where $[eqn]$ denotes clinical features and $[eqn]$ is the sigmoid function) based on both modalities, allowing the model to dynamically balance the contribution of WSI and clinical features. The fused representation $[eqn]$ is then passed through an output head to produce final predictions. The classification task employs a dual-output head, while the recurrence task uses a single-output head to generate the risk score.
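
The attention pooling and gated fusion described above can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: the gated attention follows the widely used form (a tanh branch multiplied elementwise by a sigmoid branch, then a softmax over patches), and the fusion rule `g * h_wsi + (1 - g) * h_clin` is one common choice that the text does not confirm. All names (`gated_attention_pool`, `gated_fusion`, `V`, `U`, `w`, `W_g`, `b_g`) and the toy weights are assumptions, and the clinical vector is assumed to be already projected to the same dimension as the slide vector.

```python
import math

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def gated_attention_pool(feats, V, U, w):
    """Return (attention weights, bag vector) for K patch features."""
    scores = []
    for h in feats:
        t = [math.tanh(a) for a in _matvec(V, h)]   # tanh branch
        s = [_sigmoid(a) for a in _matvec(U, h)]    # sigmoid gate branch
        scores.append(sum(wi * ti * si for wi, ti, si in zip(w, t, s)))
    top = max(scores)                               # stabilize the softmax
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(feats[0])
    bag = [sum(a * h[d] for a, h in zip(attn, feats)) for d in range(dim)]
    return attn, bag

def gated_fusion(h_wsi, h_clin, W_g, b_g):
    """Gate computed from both modalities, then a convex combination."""
    gate = [_sigmoid(z + b)
            for z, b in zip(_matvec(W_g, h_wsi + h_clin), b_g)]
    return [g * a + (1.0 - g) * c for g, a, c in zip(gate, h_wsi, h_clin)]

feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # three toy patches
V = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.2, 0.4], [-0.3, 0.1]]
w = [1.0, -1.0]
attn, bag = gated_attention_pool(feats, V, U, w)
fused = gated_fusion(bag, [0.3, 0.7], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
```

With all gate weights set to zero, as in this toy call, the gate equals 0.5 and the fused vector is the plain average of the two modalities. A trained gate would shift that balance per patient.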

The model employs distinct training configurations for the classification and survival tasks. For the classification task, a multi-objective loss function combines bag-level cross-entropy loss ( $[eqn]$ ) and instance-level clustering loss ( $[eqn]$ ): $[eqn]$ ( $[eqn]$ = 0.9). Optimization uses the Adam optimizer (learning rate $[eqn]$ , weight decay $[eqn]$ ), with training limited to 50 epochs and early stopping (patience 5). A dropout rate of 0.5 is applied to the attention networks and fusion modules. For recurrence analysis, training uses the Cox proportional hazards loss:

[eqn]

This loss incorporates event indicators δ, survival times t, and predicted risks $[eqn]$ . Optimization employs Adam (learning rate $[eqn]$ , weight decay $[eqn]$ ) with a step scheduler, early stopping (patience 5), and a dropout rate of 0.25 across the feature extraction, clinical encoding, and survival layers. Both tasks were run on an NVIDIA GeForce RTX 4080 GPU, with fixed random seeds for reproducibility.
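
For orientation, the Cox proportional hazards loss is conventionally written as the negative log partial likelihood. The expression below is the standard textbook form, not the equation recovered from the source. The symbol $\hat{r}_i$ for the predicted risk of patient $i$ and the normalization by the number of observed events are assumptions.

```latex
\mathcal{L}_{\mathrm{Cox}}
  = -\frac{1}{\sum_{i} \delta_i}
    \sum_{i:\,\delta_i = 1}
    \left( \hat{r}_i - \log \sum_{j:\, t_j \ge t_i} \exp\!\left(\hat{r}_j\right) \right)
```

Here $\delta_i$ is the event indicator and $t_i$ the survival time, so the inner sum runs over the patients still at risk at time $t_i$.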

For each patient, the model processes multiple WSIs (including primary and metastatic lesions). After feature extraction, each WSI generates an independent prediction score (e.g., probability of platinum response or recurrence risk score). To consolidate this multi-slide information into a single patient-level prediction, we employed a statistical aggregation strategy. For classification tasks (such as platinum response prediction), we calculated the mean or median of the prediction scores across all WSIs to derive the final patient-level label. For survival analysis tasks (e.g., progression-free survival prediction), the risk scores from different WSIs were similarly aggregated by taking their mean or median.
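
The slide-to-patient aggregation can be sketched as follows. The function name `aggregate_by_patient` and the toy scores are illustrative assumptions. The mean and median options mirror the two strategies named in the text.

```python
from collections import defaultdict
from statistics import mean, median

def aggregate_by_patient(slide_scores, how="median"):
    """Collapse (patient_id, score) pairs into one score per patient."""
    reducer = {"mean": mean, "median": median}[how]
    grouped = defaultdict(list)
    for patient_id, score in slide_scores:
        grouped[patient_id].append(score)
    return {pid: reducer(scores) for pid, scores in grouped.items()}

# Toy example: patient P1 has three slides, patient P2 has one.
slide_scores = [("P1", 0.2), ("P1", 0.4), ("P1", 0.9), ("P2", 0.7)]
by_median = aggregate_by_patient(slide_scores, how="median")
by_mean = aggregate_by_patient(slide_scores, how="mean")
```

The median is less sensitive than the mean to a single outlying slide, which matters when a patient contributes only two or three slides.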

2.2.8. Heatmap Visualization

During the forward inference of the model, an attention module assigns an attention score to each patch, reflecting its contribution to the overall classification outcome. The features of all patches are then aggregated into a global representation through attention-weighted pooling, which is subsequently used for slide-level classification to produce the probability distribution across classes. Finally, the attention scores of individual patches are mapped back to their original spatial coordinates and overlaid onto the WSI to generate a heatmap, where the color intensity indicates the level of model attention.

2.3. Statistics

Quantitative variables are reported as median with interquartile range (IQR), and comparisons were made using the Kruskal–Wallis test, depending on the data distribution. Categorical variables were analyzed using the chi-square test or Fisher’s exact test. A p-value < 0.05 was considered statistically significant for all comparisons. The genomic profiling and functional enrichment analysis were performed utilizing R software (version 4.5.0) with the key packages including DESeq, clusterProfiler, xCell, and others. Model evaluation metrics included AUC or C-index with 95% CI. During the prediction model training phase, all models underwent 10 independent trials with different random seeds for validation and test set evaluations. For classification models, AUC served as the primary evaluation metric, with secondary metrics including precision, recall, F1-score, and specificity. Survival models were assessed using the C-index. The predictive performance across different models was compared based on the mean (95% CI) of the 10 training outcomes. Bootstrap resampling was employed to quantify the differences in the C-index and AUC and to generate corresponding two-sided p-values and 95% confidence intervals. This work has been reported in line with the STROCSS criteria [32].
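
The C-index and a bootstrap confidence interval can be illustrated with a small pure-Python sketch. This is a simplified version of Harrell's C-index that skips pairs with tied times, and a percentile bootstrap. The function names, the number of resamples, and the seed are assumptions rather than the study's settings.

```python
import random

def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an event and a strictly
    shorter time than patient j. Returns None if no pair is comparable.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else None

def bootstrap_ci(times, events, risks, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-index."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = concordance_index([times[k] for k in idx],
                              [events[k] for k in idx],
                              [risks[k] for k in idx])
        if c is not None:
            stats.append(c)
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

times = [2.0, 5.0, 7.0, 9.0, 12.0, 15.0]
events = [1, 1, 0, 1, 1, 0]
risks = [0.9, 0.7, 0.8, 0.4, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
ci_low, ci_high = bootstrap_ci(times, events, risks)
```

Higher predicted risk should go with shorter time to progression, so a perfectly ordered set of risks gives a C-index of 1 and a random ordering gives about 0.5.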

3. Results

3.1. Baseline Characteristics

Baseline characteristics of the training, validation, and internal test sets derived from the NCC dataset are summarized in Table 1. The distribution of platinum-based chemotherapy response remained balanced across the groups, with platinum-sensitive patients accounting for 62.0% (152/245) in the training set, 61.8% (47/76) in the validation set, and 61.8% (34/55) in the internal test set. In the external TCGA-OV test set, platinum-sensitive patients constituted 72.7% (40/55).

Across the datasets, the median PFS was as follows: 8.8 months (95% CI: 7.6–10.7 months) in the training set, 9.6 months (95% CI: 8.0–17.6 months) in the validation set, 11.6 months (95% CI: 9.1–17.4 months) in the internal test set, and 14.0 months (95% CI: 12.0–17.6 months) in the external test set. The median PFS for the entire cohort was 10.2 months (95% CI: 9.1–11.7 months). The 1 year, 3 year, and 5 year PFS rates were 43.9% (95% CI: 39.4–48.8%), 13.6% (95% CI: 10.6–17.4%), and 7.1% (95% CI: 4.9–10.3%), respectively.

3.2. MMHC-OCPR: Development and Performance Evaluation of Platinum Response Prediction Model

To identify the optimal WSI feature encoder, we first evaluated the performance of pre-trained WSI encoders using primary ovarian cancer WSIs for both platinum-response prediction and recurrence prediction tasks. As shown in Supplementary Tables S1 and S2, the UNI2-h model demonstrated superior predictive performance in both tasks. For the platinum-response task, the AUC values in the training, validation, internal test, and external test sets were 0.956 (95% CI: 0.947–0.965), 0.909 (95% CI: 0.895–0.923), 0.884 (95% CI: 0.852–0.917), and 0.878 (95% CI: 0.839–0.917), respectively. For the recurrence prediction task, the C-index values were 0.836 (95% CI: 0.822–0.850), 0.782 (95% CI: 0.772–0.798), 0.762 (95% CI: 0.746–0.778), and 0.764 (95% CI: 0.751–0.777), respectively. Based on these results, UNI2-h was selected as the WSI encoder for subsequent development of the multimodal MMHC-OCPR model.

The MMHC-OCPR model integrates both WSI and baseline clinical data within a dual-modal framework and incorporates a dual-path architecture (IDS/PDS) to accommodate different treatment strategies. We compared the predictive performance of the multimodal MMHC-OCPR model with that of the unimodal UNI2-h pathology model for platinum-response classification in the NCC dataset. Given that each patient had two primary ovarian cancer WSIs, the prediction results from both slides were averaged to generate a patient-level prediction. As summarized in Table 2, MMHC-OCPR achieved higher AUC values than UNI2-h across all datasets: 0.967 (95% CI: 0.957–0.977) vs. 0.956 (95% CI: 0.947–0.965) in the training set, 0.929 (95% CI: 0.915–0.943) vs. 0.909 (95% CI: 0.895–0.923) in the validation set, and 0.903 (95% CI: 0.870–0.936) vs. 0.884 (95% CI: 0.852–0.917) in the internal test set, underscoring the performance advantage of the multimodal architecture.

Furthermore, we evaluated whether integrating primary and metastatic ovarian cancer WSIs within the pathological modality of the multimodal MMHC-OCPR model could enhance its performance in platinum-response prediction. As presented in Table 2 and the confusion matrix (Supplementary Figure S1), the multimodal MMHC-OCPR model incorporating both primary and metastatic WSIs demonstrated further improved predictive performance. The optimal results were achieved when multiple WSIs were aggregated at the patient level using median fusion, yielding AUC values of 0.960 (95% CI: 0.946–0.975) in the training set, 0.933 (95% CI: 0.920–0.946) in the validation set, and 0.912 (95% CI: 0.886–0.938) in the internal test set. Therefore, the multimodal MMHC-OCPR model, which integrates both primary and metastatic WSIs along with baseline clinical data, represents the optimal configuration for platinum-response prediction.

3.3. MMHC-OCPR: Development and Performance Evaluation for Recurrence Risk Prediction Model

To evaluate the PFS prediction capability of the MMHC-OCPR model, we compared the multimodal MMHC-OCPR model—which integrates primary ovarian cancer WSIs and baseline clinical data—against the unimodal UNI2-h pathology model in the NCC dataset. Predictions from two WSIs per patient were averaged to generate a patient-level risk score. As shown in Table 3, MMHC-OCPR demonstrated superior C-index performance compared to UNI2-h across all sets: training set, 0.868 (95% CI: 0.854–0.883) vs. 0.836 (95% CI: 0.822–0.850); validation set, 0.804 (95% CI: 0.794–0.814) vs. 0.782 (95% CI: 0.772–0.798); and internal test set, 0.793 (95% CI: 0.767–0.818) vs. 0.762 (95% CI: 0.746–0.778).

Subsequently, we incorporated metastatic WSIs into the MMHC-OCPR model. By aggregating predictions from multiple WSIs using median fusion to derive a patient-level risk score (designated as the MMHC-OCPR score), the model achieved optimal predictive performance (Table 3), with C-index values of 0.868 (95% CI: 0.852–0.883) in the training set, 0.822 (95% CI: 0.809–0.834) in the validation set, and 0.825 (95% CI: 0.796–0.854) in the internal test set. Thus, the multimodal MMHC-OCPR model integrating both primary and metastatic WSIs along with baseline clinical data represents the optimal configuration for PFS prediction.

We further assessed the prognostic value of the MMHC-OCPR score using Cox regression analysis with testing of Proportional Hazards (Supplementary Table S3). Univariate Cox regression analysis confirmed its significant association with PFS (Supplementary Table S4). Moreover, the MMHC-OCPR score remained an independent prognostic factor after adjusting for high-risk clinical variables, including high CA125 level, vascular tumor thrombus (VTT), FIGO stage IV, and suboptimal cytoreduction (Supplementary Table S5). To enhance clinical applicability and interpretability, patients were stratified into three risk groups—low-, intermediate-, and high-risk—based on tertile thresholds of the risk score derived from the training set. A progressive enrichment of high-risk clinical features was observed from the low-risk to the high-risk group (Supplementary Figure S2). Furthermore, the MMHC-OCPR risk groups retained independent prognostic value even after adjustment for clinical risk factors (Supplementary Table S6).
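
The tertile-based grouping can be sketched as follows. The cut points are computed once from training scores and then applied unchanged to any other set. The function names and the quantile convention (the default method of Python's `statistics.quantiles`) are assumptions, as the text does not state how the tertile thresholds were computed.

```python
from statistics import quantiles

def tertile_cutoffs(train_scores):
    """Two cut points that split the training scores into three groups."""
    low_cut, high_cut = quantiles(train_scores, n=3)
    return low_cut, high_cut

def risk_group(score, cutoffs):
    """Assign a risk label using cut points fixed on the training set."""
    low_cut, high_cut = cutoffs
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"

cutoffs = tertile_cutoffs([1, 2, 3, 4, 5, 6, 7, 8, 9])  # toy training scores
groups = [risk_group(s, cutoffs) for s in (2, 5, 8)]
```

Fixing the thresholds on the training set keeps the validation and test sets from influencing the group definitions.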

We subsequently evaluated the PFS predictive performance of the MMHC-OCPR risk groups. In the entire NCC dataset, the model achieved a C-index of 0.75 (95% CI: 0.73–0.77) and a 2 year time-dependent AUC of 0.87 (95% CI: 0.84–0.91). Across the training, validation, and internal test sets, the C-index values were 0.80 (95% CI: 0.78–0.81), 0.71 (95% CI: 0.66–0.76), and 0.74 (95% CI: 0.68–0.79), respectively, with corresponding 2 year time-dependent AUCs of 0.90 (95% CI: 0.87–0.93), 0.92 (95% CI: 0.85–0.99), and 0.88 (95% CI: 0.80–0.95).

Kaplan–Meier survival curves, shown in Figure 3, demonstrated clear prognostic stratification by the MMHC-OCPR groups across all datasets. Detailed survival data for patients stratified using the MMHC-OCPR model are provided in Supplementary Tables S5 and S7. In the entire NCC dataset (Supplementary Table S5), both median PFS and 2 year PFS rates exhibited a pronounced decreasing trend from the low-risk to the high-risk group: median PFS was 29.2, 10.2, and 4.1 months for the low-, intermediate-, and high-risk groups, respectively, and the corresponding 2 year PFS rates were 61.4%, 15.7%, and 1.0%. A consistent trend was observed in the training, validation, and internal test sets (Table 4), reflecting robust intergroup prognostic stratification.

Furthermore, to validate the model’s generalizability across different patient populations, we evaluated the performance of the MMHC-OCPR model within various clinical subgroups. As summarized in Supplementary Table S8, the model demonstrated great generalizability across diverse subpopulations.

3.4. Refined Risk Stratification Based on FIGO Staging

To validate the clinical applicability of the MMHC-OCPR risk groups, we compared their prognostic performance with the guideline-recommended FIGO staging system (Figure 4 and Supplementary Table S9). The MMHC-OCPR model demonstrated significantly superior predictive performance across all datasets. To assess clinical utility, we compared decision curve analysis (DCA) curves of MMHC-OCPR and FIGO staging at the 1, 2, and 3 year time points. As shown in Supplementary Figure S3, the MMHC-OCPR model consistently provided the highest net benefit across all risk thresholds.
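As background for the decision curve comparison, net benefit at a threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t). The sketch below evaluates that formula for a binary outcome at a fixed time point. It assumes every patient's status at that time is known (no censoring before the horizon), which is a simplification; the section does not describe how censoring was handled in the DCA. Function names and data are hypothetical.

```python
def net_benefit(pred_probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(pred_probs)
    true_pos = sum(
        1 for p, y in zip(pred_probs, outcomes) if p >= threshold and y == 1
    )
    false_pos = sum(
        1 for p, y in zip(pred_probs, outcomes) if p >= threshold and y == 0
    )
    odds = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted 2 year progression risks and observed outcomes.
pred_probs = [0.9, 0.8, 0.6, 0.2]
outcomes = [1, 1, 0, 0]
nb_model = net_benefit(pred_probs, outcomes, threshold=0.5)
nb_all = net_benefit_treat_all(outcomes, threshold=0.5)
print(nb_model, nb_all)
```

A decision curve plots these values over a range of thresholds; a model is preferred at a given threshold when its net benefit exceeds both the treat-all line and zero (treat none).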

Furthermore, the MMHC-OCPR model enabled more refined risk stratification than FIGO staging. As illustrated in Figure 5, a substantial proportion of FIGO stage III patients (30.50–64.06%) across the datasets were up-classified into the MMHC-OCPR high-risk group, while 20.31–35.42% were reassigned to the intermediate-risk group. Conversely, among FIGO stage IV patients, 0–17.78% were down-classified to the low-risk group and 8.33–35.56% to the intermediate-risk group. Histograms further illustrated the quantitative redistribution of patients. In addition, compared with FIGO staging (Stage III: 2 year PFS rate 22.90–31.20%, median PFS 10.1–15.3 months; Stage IV: 2 year PFS rate 0–8.30%, median PFS 6.0–7.0 months; Supplementary Tables S10 and S11), the MMHC-OCPR risk groups exhibited more discriminative 2 year PFS rates and median PFS across all datasets (Table 4). To provide further support regarding model calibration and overall clinical reliability, Supplementary Tables S12–S14 show the calibration curves, Brier scores, and detailed censoring data distributions at key clinically relevant time points corresponding to the MMHC-OCPR model for each dataset.
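The reclassification percentages above are row-wise proportions of a stage-by-risk-group cross-tabulation. A minimal sketch of that arithmetic follows, using hypothetical patients and a hypothetical function name rather than study data.

```python
from collections import Counter


def reclassification_table(stages, risk_groups):
    """Return {stage: {risk_group: percent of that stage's patients}}."""
    pair_counts = Counter(zip(stages, risk_groups))
    stage_totals = Counter(stages)
    table = {}
    for (stage, group), count in pair_counts.items():
        table.setdefault(stage, {})[group] = 100.0 * count / stage_totals[stage]
    return table


# Hypothetical FIGO stages and model-assigned risk groups.
stages = ["III", "III", "III", "III", "IV", "IV"]
groups = ["low", "intermediate", "high", "high", "intermediate", "high"]
table = reclassification_table(stages, groups)
print(table)
```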

3.5. Model Interpretation and Significant Features for the Prediction

In collaboration with pathologists, we systematically analyzed histomorphological differences between the predicted platinum-sensitive and resistant groups, using heatmaps of the regions that the model focused on most strongly when classifying internal test set pathology images (Figure 6a). In PDS specimens (Figure 6b), both groups displayed solid and complex glandular patterns. However, resistant cases exhibited significantly higher proportions of these features, along with distinctive micropapillary architectures. In contrast, sensitive cases primarily showed papillary-solid patterns accompanied by tumor cell degeneration and neutrophil infiltration. For IDS cases, predicted resistant tumors maintained prominent micropapillary components and atypical tumor giant cells, whereas sensitive cases demonstrated treatment-related changes including psammomatous calcifications, sclerotic stroma, and edematous stroma with hyalinization.

Transcriptomic analysis of TCGA cases revealed significant pathway enrichment differences between the predicted groups. GO analysis of the differentially expressed genes demonstrated significant enrichment in developmental processes, particularly in pattern specification processes, regionalization, and cell fate commitment. KEGG pathway analysis identified three key signaling pathways: neuroactive ligand–receptor interaction, neuroactive ligand signaling, and the cAMP signaling pathway (Figure 6c–e). xCell deconvolution analysis showed no significant differences in immune cell infiltration between the platinum-sensitive and resistant groups (Figure 6f), consistent with the well-established immune-cold tumor microenvironment of high-grade serous ovarian carcinoma.
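GO and KEGG enrichment of a gene list is commonly assessed by over-representation analysis with a hypergeometric test. The sketch below shows that test in plain Python. The section does not say which enrichment method or software was used, so the method, the function name, and the gene counts are all assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_selected, n_overlap):
    """One-sided p-value P(X >= n_overlap) for over-representation.

    n_background: genes in the background set
    n_in_term:    background genes annotated to the term or pathway
    n_selected:   differentially expressed genes drawn from the background
    n_overlap:    differentially expressed genes annotated to the term
    """
    total = comb(n_background, n_selected)
    max_overlap = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 300 in the pathway,
# 500 differentially expressed genes, 20 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20000, 300, 500, 20)
print(p_value)
```

In practice the p-values for all tested terms are then corrected for multiple testing before any term is called enriched.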

4. Discussion

Ovarian cancer, the most challenging gynecologic malignancy to treat, often exhibits drug resistance as a key factor contributing to poor prognosis. Accurate prediction of therapeutic response at an early treatment stage is therefore crucial for timely treatment adjustment and personalized management. This study employed a systematic approach: first, we evaluated seven pre-trained encoder models using the CLAM framework, identifying UNI2-h as the optimal architecture for subsequent development. For platinum response prediction, a multi-phase optimization strategy significantly improved the model performance by progressively increasing input slide numbers (incorporating both primary ovarian and metastatic lesions) and integrating key clinical features as multimodal data. The optimized model achieved an AUC of 0.914 in internal testing, demonstrating notable generalizability. Furthermore, our recurrence risk prediction model successfully stratified patients into distinct risk groups, with significant survival differences (p < 0.001) consistently observed between low-, intermediate-, and high-risk groups across the validation and test sets, providing reliable guidance for clinical decision-making.

By integrating pathologists’ expert assessments with histomorphological analysis, we developed an interpretable predictive model. In PDS samples, the platinum-resistant group predicted by the model exhibited aggressive pathological features, including marked cellular atypia and high proportions of micropapillary and solid patterns, findings consistent with known poor prognostic markers [33]. In contrast, the platinum-sensitive group showed tumor cell degeneration and neutrophil infiltration, suggesting enhanced chemotherapy efficacy through inflammatory responses [34]. For IDS samples obtained after neoadjuvant chemotherapy, the model demonstrated similar discriminative power. The predicted platinum-resistant group retained micropapillary dominance and atypical tumor giant cells, while the platinum-sensitive group exhibited treatment-responsive features such as psammomatous calcifications, stromal sclerosis, and hyalinized edematous stroma, pathological changes indicative of therapeutic efficacy [13,35]. GO analysis revealed differentially enriched pathways, including the pattern specification process, regionalization, and cell fate commitment, in platinum-sensitive vs. resistant recurrent cases, implicating disrupted spatial organization and differentiation in tumor initiation and heterogeneity [36,37]. Furthermore, KEGG analysis identified neuroactive ligand–receptor interaction and cAMP signaling pathways as potential drivers of proliferation, metastasis, and treatment resistance [38,39,40].

Employing the CLAM architecture with attention-based learning, our model identified diagnostically critical subregions in WSIs and refined feature representation through instance-level clustering [30]. UNI2-h is an enhanced version of UNI, a large-scale vision transformer (ViT-Large) pretrained on more than 100 million pathology images. This self-supervised model outperforms existing methods in computational pathology, supporting cross-resolution classification, few-shot learning, and cancer subtype generalization, thereby advancing AI applications in diverse clinical diagnostics [31,41]. By integrating metastatic tumor tissue and key clinical factors, we developed a highly accurate predictive model for platinum response and recurrence risk in ovarian cancer. This facilitates early intervention in platinum-resistant cases and identifies best-prognosis patients who may safely avoid maintenance therapy, thereby reducing treatment-related toxicity and psychological burden.
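The attention-based aggregation mentioned above can be illustrated with a simplified sketch. In CLAM the per-patch attention scores come from a learned gated attention network and the patch features from a pretrained encoder such as UNI2-h; here both are passed in as plain lists, so this shows only the pooling step. All names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_scores):
    """Combine patch embeddings into one slide-level vector.

    Returns the attention-weighted average of the patch features and the
    normalized weights, which are what an attention heatmap visualizes.
    """
    weights = softmax(attention_scores)
    dim = len(patch_features[0])
    slide_vector = [
        sum(w * feat[d] for w, feat in zip(weights, patch_features))
        for d in range(dim)
    ]
    return slide_vector, weights


# Three hypothetical patches with 2-dimensional embeddings.
patch_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attention_scores = [2.0, 0.0, 0.0]  # the first patch receives most attention
slide_vector, weights = attention_pool(patch_features, attention_scores)
print(slide_vector, weights)
```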

Unlike prior models that primarily rely on primary tumor pathology or limited clinical variables [24,25,42,43], our approach uniquely integrates whole-slide images from both primary and metastatic lesions, capturing the tumor’s spatial heterogeneity. Recognizing that ovarian cancer metastases might exhibit distinct pathological features associated with aggressiveness compared to the primary tumor [23], we have validated that incorporating whole-slide images from metastatic lesions further improves the predictive accuracy of our model beyond that achieved using primary tumor pathology alone. Furthermore, it incorporates key intraoperative and pathological determinants such as cytoreductive surgery (CRS) score and completeness of cytoreduction, which are strongly prognostic but often omitted in purely image-based models. This multimodal integration of dual-site pathology and granular clinical data provides a more comprehensive biological and clinical profile, which likely contributes to the notable performance observed in our test set. Our work thus proposes a more holistic framework for prognostication, moving beyond conventional single-source models.

This study has several limitations. First, the model was developed exclusively on a cohort from China, while external validation employed a more heterogeneous population from the United States with differing demographic and clinical characteristics. Although the model demonstrated notable performance, this geographical and clinical disparity may affect its generalizability to other healthcare settings. Furthermore, the external validation cohort comprised only 55 patients. Therefore, the relatively large hazard ratios (HRs) observed in the Kaplan–Meier analysis should be interpreted with consideration for the limited sample size. Future multi-center, multinational studies are warranted to further validate and calibrate the model across diverse populations. Second, our model incorporates only histopathological and clinical data; including additional modalities such as genomic or radiomic features could further enhance predictive performance. Finally, as maintenance therapy has become standard in advanced ovarian cancer, extending our model to predict responses to maintenance therapy could refine personalized treatment strategies and better identify patients who are most likely to benefit.

5. Conclusions

In summary, this study developed an advanced AI model that integrates histopathological images and clinical data to accurately predict treatment outcomes in ovarian cancer patients after cytoreductive surgery and platinum-based chemotherapy. By leveraging a reinforced multimodal architecture based on CLAM and the pre-trained encoder UNI2-h, the model achieved high predictive accuracy through the deep integration of heterogeneous data modalities, yielding clinically actionable insights. The model serves as an effective clinical decision-support tool. It optimizes frontline therapeutic strategy by enabling the early prediction of platinum resistance and the stratification of recurrence risk, thus facilitating timely treatment modifications and informed maintenance therapy decisions.

Bibliography

1. Siegel R.L., Miller K.D., Wagle N.S., Jemal A. Cancer statistics, 2023. CA Cancer J. Clin. 2023, 73, 17–48. doi:10.3322/caac.21763. PMID: 36633525.
2. Sung H., Ferlay J., Siegel R.L., Laversanne M., Soerjomataram I., Jemal A., Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 2021, 71, 209–249. doi:10.3322/caac.21660. PMID: 33538338.
3. Li H., Sheng J.J., Zheng S.A., Liu P.W., Wu N., Zeng W.J., Li Y.H., Wang J. Platinum-resistant ovarian cancer: From mechanisms to treatment strategies. Genes Dis. 2025, 13, 101801. doi:10.1016/j.gendis.2025.101801. PMID: 41376855. PMCID: PMC12688694.
4. Monk B.J., Lorusso D., Fujiwara K., Sehouli J. Optimal bevacizumab treatment strategy in advanced ovarian cancer: A review. Cancer Treat. Rev. 2025, 137, 102945. doi:10.1016/j.ctrv.2025.102945. PMID: 40349571.
5. Ray-Coquard I., Leary A., Pignata S., Sehouli J. Olaparib plus bevacizumab first-line maintenance in ovarian cancer: Final overall survival results from the PAOLA-1/ENGOT-ov25 trial. Ann. Oncol. 2023, 34, 681–692. doi:10.1016/j.annonc.2023.05.005. PMID: 37211045.
6. Li N., Zhang Y., Wang J., Zhu J., Wang L., Wu X., Yao D., Wu Q., Liu J., Tang J. Fuzuloparib Maintenance Therapy in Patients With Platinum-Sensitive, Recurrent Ovarian Carcinoma (FZOCUS-2): A Multicenter, Randomized, Double-Blind, Placebo-Controlled, Phase III Trial. J. Clin. Oncol. 2022, 40, 2436–2446. doi:10.1200/JCO.21.01511. PMID: 35404684.
7. Li W., Zhang K., Wang W., Liu Y., Huang J., Zheng M., Li L., Zhang X., Xu M., Chen G. Combined inhibition of HER2 and VEGFR synergistically improves therapeutic efficacy via PI3K-AKT pathway in advanced ovarian cancer. J. Exp. Clin. Cancer Res. 2024, 43, 56. doi:10.1186/s13046-024-02981-5. PMID: 38403634. PMCID: PMC10895844.
8. Ghisoni E., Morotti M., Sarivalasis A., Grimm A.J., Kandalaft L., Laniti D.D., Coukos G. Immunotherapy for ovarian cancer: Towards a tailored immunophenotype-based approach. Nat. Rev. Clin. Oncol. 2024, 21, 801–817. doi:10.1038/s41571-024-00937-4. PMID: 39232212.