Development of a hybrid 2.5D deep learning model for glioma survival prediction using T1-weighted MRI from the CGGA database
Kai Jin, Caixing Sun, Liang Xia

TL;DR
A new non-invasive deep learning model using MRI scans improves survival prediction for glioma patients, combining imaging and clinical data.
Contribution
A hybrid 2.5D deep learning model for glioma survival prediction using T1CE MRI and clinical data is developed and validated.
Findings
The combined model achieved a testing C-index of 0.804, outperforming other models.
Time-dependent AUC values ranged from 0.851 to 0.906 over 1–5 years.
Stratified survival curves showed distinct prognostic groups with log-rank p < 0.001.
Abstract
Current glioma survival prediction relies on invasive molecular profiling. To overcome this, a non-invasive deep learning framework using T1-weighted contrast-enhanced MRI (T1CE) was developed to predict overall survival. This framework addresses computational limitations associated with the volumetric analysis while preserving important spatial information. We designed a hybrid 2.5D convolutional neural network to process multi-slice inputs, including the center slice and its adjacent slices, from 217 patients in the CGGA database. Transfer learning using ResNet and DenseNet architectures were employed to initialize the models. These models were subsequently fine-tuned with the Cox proportional hazards loss function. After the fine-tuning process was completed, the imaging signature was combined with clinical and molecular variables, including IDH and 1p19q status, to build an…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRadiomics and Machine Learning in Medical Imaging · Glioma Diagnosis and Treatment · Brain Tumor Detection and Classification
Introduction
1
Gliomas are the most common type of primary brain tumors. They exhibit considerable variability in their biological behavior and clinical outcomes [1]. This variability highlights the need for effective prognostic tools that can help tailor treatment strategies for individual patients. Currently, prognosis is largely based on the WHO classification, which combines histopathological evaluations with molecular profiling. This molecular profiling includes the assessment of IDH mutation status and 1p/19q codeletion to categorize risk levels. However, acquiring these biomarkers typically requires invasive procedures like surgical resection or biopsy. These procedures carry risks including infection and neurological damage, and may not be suitable for all patients [2]. As a result, there is a growing focus on developing non-invasive prognostic tools derived from magnetic resonance imaging (MRI), which is commonly used in clinical settings.
Multiparametric MRI techniques include T1-weighted contrast-enhanced imaging, T2-weighted imaging, FLAIR, diffusion, and perfusion sequences. These methods enable detailed in vivo analysis of glioma characteristics by revealing tumor structure, blood-brain barrier integrity, cellular density, and blood flow dynamics [3], [4]. Among these, T1-weighted contrast-enhanced imaging is particularly useful for identifying disruptions in the blood-brain barrier, which are signs often associated with more aggressive tumor types. Volumetric MRI data provide rich information but pose significant computational challenges when used for deep learning-based survival predictions. consequently, the computational demand to process full three-dimensional datasets often exceeds the capabilities of available hardware. As a result, researchers either work with individual two-dimensional slices, which can lose important spatial information, or use network architectures that are less effective at capturing three-dimensional spatial features [5].
Traditional radiomics and machine learning methods have shown some initial promise in predicting outcomes for glioma patients. However, their application in clinical settings faces challenges due to inconsistencies in methodology, varying imaging protocols, and limited abilities to generalize across different populations. Neural network-based survival models, such as DeepSurv and NMTLR, show potential by effectively capturing complex, non-linear relationships in high-dimensional data and by offering advantages over standard Cox regression models. Although deep learning has advanced survival prediction in other cancer imaging fields, its application in glioma prognosis—particularly with advanced multiparametric MRI—remains underexplored, primarily because 3D data demands high computational cost and hardware resources [6], [7], [8]. To address this issue, we propose a novel 2.5D deep learning framework that leverages multiparametric MRI, with particular emphasis on T1ce sequences, to predict overall survival in glioma patients [9].
By integrating slice information from multiple adjacent slices, 2.5D deep learning models represent a significant evolution from their 2D predecessors. This approach benefits medical imaging by providing a more comprehensive understanding of anatomical structures. The 2.5D model effectively bridges the gap between the simplicity of 2D CNNs and the high computational expense of full 3D models. It offers a balanced solution that leverages depth information without requiring excessive computational resources. A prime example of 2.5D CNN model is its application in segmenting contrast-enhanced lesions in brain MRI scans [10], [11]. Here, "2.5D" refers to a method that processes individual 2D slices but incorporates contextual information from adjacent slices to capture some 3D spatial information. This method balances computational efficiency with the preservation of critical spatial context within the imaging data. Our framework aims to create a reliable, non-invasive prognostic tool that can improve clinical decision-making.
Methods
2
Patients
2.1
The research flowchart is illustrated in Fig. 1. The dataset used in this work is collected from CGGA (Chinese Glioma Genome Atlas) [12], which is a single-center dataset. The collected data were normalized, and samples with defective clinical and follow-up information were eliminated. The images with poor quality or lack of contrast-enhanced series were also excluded. There were 217 samples left after filtration. This study divided the samples into two cohorts randomly: a training cohort, comprising 70 % of the data, and a testing cohort, consisting of the remaining 30 % (Fig. 2). Results of the two cohort comparisons (p > 0.05) indicated that there were no significant differences between the two groups, that we will conduct further analysis after increasing the sample size in the future (Table 1).Fig. 1. Workflow of this study.Fig. 1. Fig. 2Flowchart of the criteria for patient selection.Fig. 2. Table 1Baseline characters of our cohorts.Table 1. FeatureALLtesttrainP-valueAge43.89 ± 12.2643.39 ± 11.6344.11 ± 12.560.754Histology0.635Non-GBM82(37.79)27(40.91)55(36.42)GBM135(62.21)39(59.09)96(63.58)Grade0.76WHO II76(35.02)23(34.85)53(35.10)WHO III59(27.19)16(24.24)43(28.48)WHO IV82(37.79)27(40.91)55(36.42)Gender0.295Female82(37.79)21(31.82)61(40.40)Male135(62.21)45(68.18)90(59.60)Radiotherapy0.872No30(13.82)10(15.15)20(13.25)Yes187(86.18)56(84.85)131(86.75)Chemotherapy0.652No79(36.41)26(39.39)53(35.10)Yes138(63.59)40(60.61)98(64.90)IDH0.969Wildtype104(47.93)31(46.97)73(48.34)Mutant113(52.07)35(53.03)78(51.66)1p19q codeletion0.249No185(85.25)53(80.30)132(87.42)Yes32(14.75)13(19.70)19(12.58)MGMTp methylation0.652No138(63.59)40(60.61)98(64.90)Yes79(36.41)26(39.39)53(35.10)
Image acquisition
2.2
MRI scans for these patients were performed using 3.0-T Trio scanners (Siemens). The protocol typically included axial T1-weighted scans with repetition times of 156–2520 ms, echo times of 2–19.7 ms, and slice thicknesses of 3–5 mm. Additionally, contrast-enhanced (CE) scans were acquired following the administration of 0.1 mmol/kg of Gd-DTPA contrast agent (Beijing Beilu Pharmaceutical Co., Beijing, China), with repetition times of 560–2520 ms, echo times of 2.3–19.7 ms, and slice thicknesses of 3–5 mm.
ROI Segmentation: In our study, two experienced radiologists used ITK-SNAP to manually delineate the ROI on the training dataset images. To ensure accuracy, a senior radiologist with over 15 years of experience reviewed the annotations made by both radiologists and addressed any discrepancies that arose. This meticulous manual delineation process served as the foundation for the training data used to develop our automated segmentation model.
Radiomics procedure
2.3
Feature extraction
2.3.1
Handcrafted features were divided into three main categories: Geometry, related to tumor shape; Intensity, concerning voxel intensity distribution; and Texture, which analyzes advanced intensity patterns, including those from the Gray Level Co-occurrence Matrix (GLCM). The process of feature extraction adhered to the guidelines set forth by the Imaging Biomarker Standardization Initiative (IBSI) and utilized the PyRadiomics tool, specifically version 3.0.1. To maintain consistency in texture feature analysis, images were resampled using cubic interpolation to achieve isotropic voxel spacing of 1 mm × 1 mm × 1 mm. Before feature extraction, intensity values were discretized into 64 gray levels. For multi-center standardization studies, the Standardized Environment for Radiomics Analysis (SERA), a MATLAB-based framework compliant with the IBSI standards was employed to ensure consistency across datasets. To focus the analysis on relevant tumor characteristics, ROIs were manually delineated. From each ROI, a total of 225 radiomic features were extracted.
Feature selection
2.4
Intraclass Correlation Coefficient (ICC): We evaluated the consistency of feature extraction amid segmentation uncertainties through test-retest and inter-rater reliability analyses. Specifically, one rater performed two segmentations on the same randomly selected sample of 30 patients to assess test-retest reliability. For inter-rater reliability, two independent raters segmented the subregions within the Volumes of Interest (VOIs) in a separate group of 30 patients. The ICC was employed to assess the reliability of radiomic features derived from these segmented subregions. Features with ICC ≥ 0.85 were considered reliable despite segmentation variability.
Correlation Analysis: Pearson's correlation coefficient was used to analyze features with high test-retest reliability. If two features had a correlation coefficient greater than 0.8, one was retained based on its clinical relevance as determined by expert knowledge, to avoid redundancy. Then, a recursive feature elimination strategy removed the feature with the highest average correlation to others in each iteration.
Univariate Cox Regression: To streamline the extensive feature set, we implemented univariate Cox regression analysis. We ranked features by their p-values and selected a number of top features equal to the size of the training sample. This choice helps to balance model complexity and to avoid overfitting. Our experiments showed that this feature selection method outperformed using all features whose p-values were below 0.05.
Lasso-Cox Regression: The final feature set for the radiomic signature was determined using LASSO (Least Absolute Shrinkage and Selection Operator) Cox regression. This technique identifies irrelevant features by zeroing their coefficients, which depends on the regularization parameter λ. The optimal regularization parameter λ was determined by 10-fold cross-validation, selecting the value that minimized the cross-validated partial likelihood deviance.
Deep learning procedure
2.5
Data preparation
2.5.1
Data Preprocessing: This study addressed challenges in medical image analysis by standardizing voxel spacing in Volumes of Interest (VOIs) to 1 × 1 × 1 mm using a linear interpolation resampling method. This standardization improved the consistency of image quality and the accuracy of feature extraction in the analysis.
Model training
2.5.2
Image Preparation: To effectively leverage 2.5D data (e.g., volumetric medical images with limited slices), we adopted a multi-view ROI extraction strategy. For each 3D volume, ROIs were extracted ±1 from the central slice and include the central slice. These 2D slices were then stacked along the channel dimension to form a multi-channel input, preserving spatial context and enabling compatibility with standard 2D convolutional architectures. This approach combines cross-sectional features while reducing the memory requirements associated with full 3D processing.
Transfer learning: We evaluated five well-established architectures—ResNet50, ResNet101, Inception-V3, DenseNet201, and DenseNet161—initialized with weights pretrained on ImageNet. Transfer learning was employed to adapt these models to our domain-specific task, with the following modifications:
- Input Layer Adjustment: The first convolutional layer was reconfigured to accept multi-channel inputs (e.g., 3 channels for 1 planes × 3 slices per ROI).
- Feature Fusion: Late-stage features from all architectures were fed into a fully connected layer with L2 regularization, followed by a single-node output for log-risk prediction.
We conducted a comparative analysis to identify the optimal backbone neural network by evaluating training stability, convergence speed, and discriminative performance. The discriminative performance was assessed using the concordance index (C-index).
Objective Function: Cox Proportional Hazards Loss: The model was trained to minimize the negative partial log-likelihood (Cox loss), which quantifies the alignment between predicted risks and observed survival times:
Training Protocol:
- Warm-up: Early layers were frozen for 5 epochs to stabilize feature extraction.
- Fine-tuning: All layers were unfrozen, with gradient clipping (norm=1.0) to prevent explosion.
- Early Stopping: Monitored validation loss (patience=10 epochs).
Signature building
2.5.3
Deep learning signature (DL Signature): is derived from a neural network trained using the partial likelihood loss function of the Cox proportional hazards model to capture non-linear relationships between imaging-derived features and survival outcomes. The training hyperparameters of 2D and 2.5D deep learning were set as a batch size of 32, epoch of 40, and stochastic gradient descent (SGD) optimizer with a learning rate of 0.01.
Metrics
2.5.4
Our research employed sophisticated statistical and computational techniques to address prevailing challenges in medical image analysis. Specifically, we constructed Cox proportional hazards models with L2 regularization for survival analysis. We then performed Kaplan-Meier analysis on samples stratified into high-risk and low-risk groups according to their predicted hazard ratios (HRs). The significance of group separation was assessed via a multivariate log-rank test.
Construction of the fusion model
2.6
The current study employed feature-level fusion strategies to establish the fusion model. Feature-level fusion, also known as early fusion, involves concatenating features from different data modalities into a single feature vector. The radiomics features of the primary tumor were extracted using PyRadiomics, while the 2D/2.5D_DL features were obtained through deep convolutional neural networks (DCNNs), as described in the Methods section. These features, along with clinical and radiological characteristics, were standardized using z-score normalization. Subsequently, Spearman correlation analysis, ICC, and LASSO regression were performed to select the most relevant features. Finally, the support vector machine (SVM) classifier was trained to construct the feature-level fusion model, known as the combined model.
Statistical analysis
2.7
The normality of clinical features was assessed using the Shapiro-Wilk test. Continuous variables were subjected to the independent samples t-test or the Mann-Whitney U test based on their distribution characteristics. Categorical variables were examined using Chi-square (χ²) tests. The baseline characteristics across all cohorts are detailed in Table 1. Notably, p-values exceeding 0.05 between cohorts indicated no significant differences, which suggests that the division between cohorts was unbiased.
All data analyses were performed on the OnekeyAI platform version 4.9.1 utilizing Python 3.7.12. Statistical assessments were conducted with Statsmodels version 0.13.2, and radiomics features were extracted using PyRadiomics version 3.0.1. Machine learning models, such as SVM were implemented with Scikit-learn version 1.0.2. Our deep learning frameworks were developed using PyTorch version 1.11.0, with performance improvements achieved through optimization leveraging CUDA version 11.3.1 and cuDNN version 8.2.1.
Results
3
Clinical signature
3.1
In our study, we initially evaluated clinical features, from which multivariate analysis identified 3 significant characteristics (Table 2). These features were instrumental in constructing the clinical model.Table 2. Univariable and multivariable analysis of clinical features.Table 2. FeatureUnivariate analysisMultivariate analysisHR95 %CIp-valueHR95 %CIP-valueHistology0.2860.19–0.429< 0.050.7050.488–1.0810.062Grade2.3751.84–3.065< 0.051.3681.103–1.697< 0.05Gender0.7640.512–1.1410.189Age1.0481.031–1.066< 0.051.0211.008–1.034< 0.05Radiotherapy1.5220.783–2.9580.215Chemotherapy1.180.772–1.8040.444IDH mutation0.2990.197–0.453< 0.050.6780.482–0.952< 0.051p19q codeletion0.2590.114–0.588< 0.050.6350.371–1.0870.098MGMTp methylation0.8690.572–1.320.510
Survival analysis
3.2
Among the five models, DenseNet-161 showed the best prediction performance based on its output scores. The Area Under the Curve (AUC) values of different indicators were calculated separately in the training and test cohorts to assess their predictive performance. In the training cohort, the Combined indicators-which integrate clinical, radiomics, and deep learning features—demonstrated the highest AUC at 0.819 (95 % CI: 0.758–0.880), followed by the Clinical indicator at 0.797 (95 % CI: 0.733–0.861). Radiomics indicators showed an AUC of 0.764 (95 % CI: 0.696–0.832), while the 2D_DL and 2.5D_DL indicators showed AUCs of 0.685 (95 % CI: 0.611–0.759) and 0.739 (95 % CI: 0.669–0.809), respectively. Meanwhile, in the test cohort, the Combined indicators maintained the highest AUC at 0.804 (95 % CI: 0.708–0.900), with Clinical indicators closely following at 0.799 (95 % CI: 0.703–0.896). Moreover, Radiomics indicators exhibited an AUC of 0.712 (95 % CI: 0.603–0.821), and both 2D_DL and 2.5D_DL indicators showed AUCs of 0.710 (95 % CI: 0.600–0.819) and 0.739 (95 % CI: 0.633–0.845), respectively (Table 3, Fig. 3). The results indicate that the Combined indicator consistently outperformed individual indicators, and these findings highlight the potential utility of combining multiple indicators for improved diagnostic performance in clinical applications.Table 3C-index of different signature.Table 3. ClinicalRadiomics2D_DL2.5D_DLCombinedCohort0.797(0.733–0.861)0.764(0.696–0.832)0.685(0.611–0.759)0.739(0.669–0.809)0.819(0.758–0.880)train0.799(0.703–0.896)0.712(0.603–0.821)0.710(0.600–0.819)0.739(0.633–0.845)0.804(0.708–0.900)testFig. 3KM of Combined Signature.Fig. 3
Grad-CAM
3.3
To investigate the recognition capabilities of the deep learning models across various samples, we employed the gradient-weighted class activation mapping (Grad-CAM) technique to visualize the activations in the final convolutional layer associated with predictions of cancer-types. As shown in Fig. 4, Grad-CAM highlights the image regions that contributed most strongly to model predictions. This visualization provides clearer insights into the interpretability of the models by revealing which image features influence the models' decisions.Fig. 4. Grad-CAM visualizations for 2 representative samples.Fig. 4
Time-dependent ROC analysis
3.4
The AUC (Area Under the Curve) values for the different indicators varied across Clinical, Radiomics, 2D_DL, 2.5D_DL, and Combined models. For the Clinical model, AUC ranged from 0.850 to 0.868, indicating high discriminative performance across different survival periods (1–5 years). The Radiomics model showed AUC values between 0.759 and 0.840, suggesting moderate to high discriminative performance. The 2D_DL model exhibited lower AUC values, ranging from 0.660 to 0.765, indicating limited discriminative performance compared to other models. In contrast, the 2.5D_DL model had AUC values between 0.753 and 0.830, indicating moderate discriminative performance and representing a distinct model that integrates 2.5-dimensional input. Furthermore, the Combined models demonstrated the highest AUC values, ranging from 0.851 to 0.906, suggesting excellent discriminative performance across all survival periods. Notably, the combined models consistently outperformed individual Clinical, Radiomics, 2D_DL, and 2.5D_DL models (Table 4) and conferred greater clinical net benefits in both training and validation cohorts, as shown by decision curve analysis (DCA) (Fig. 5). The AUC analysis revealed that the combined signatures had the highest performance in predicting survival. These findings highlight the importance of combining multiple indicators that enhance predictive accuracy in survival analysis. Moreover, the consistent performance of the combined signatures across different survival periods suggests their robust predictive ability. Consequently, this consistency indicates their potential utility in clinical decision-making.Table 4. Metrics in train and test cohorts for predicting the risk of survival at 1–5 Years.Table 4modelsAccuracyAUC95 % CISensitivitySpecificityPPVNPVSurvivalCohortClinical0.8120.850.7726–0.92740.8320.7330.9250.5241YearsTrainRadiomics0.7580.7930.7062–0.88040.7480.80.9370.4441YearsTrain2D_DL0.7990.660.5482–0.77110.8990.40.8560.51YearsTrain2.5D_DL0.7580.7530.6547–0.85180.7820.6670.9030.4351YearsTrainCombined0.7250.8510.7847–0.91670.6810.90.9640.4151YearsTrainClinical0.7850.8610.7651–0.95610.7450.9290.9740.51YearsTestRadiomics0.80.7590.6149–0.90330.8240.7140.9130.5261YearsTest2D_DL0.80.7590.6162–0.90200.8430.6430.8960.5291YearsTest2.5D_DL0.8310.8140.6740–0.95340.8820.6430.90.61YearsTestCombined0.8770.8740.7698–0.97810.9020.7860.9390.6871YearsTestClinical0.7850.8680.8073–0.92820.740.8750.9220.6272YearsTrainRadiomics0.7920.80.7209–0.87890.8120.750.8670.6672YearsTrain2D_DL0.7570.7340.6469–0.82190.8960.4790.7750.6972YearsTrain2.5D_DL0.7360.7890.7108–0.86640.6980.8120.8820.5742YearsTrainCombined0.8190.8850.8319–0.93810.7810.8960.9370.6722YearsTrainClinical0.7970.8580.7688–0.94800.8050.7830.8680.6922YearsTestRadiomics0.8280.7660.6276–0.90370.9270.6520.8260.8332YearsTest2D_DL0.6250.7590.6405–0.87810.415110.4892YearsTest2.5D_DL0.750.7890.6699–0.90800.7320.7830.8570.6212YearsTestCombined0.8120.8620.7680–0.95630.8290.7830.8720.722YearsTestClinical0.7940.8620.8012–0.92280.7850.8060.8380.7463YearsTrainRadiomics0.8090.8330.7641–0.90270.8730.7260.8020.8183YearsTrain2D_DL0.7230.7650.6843–0.84490.8350.5810.7170.7353YearsTrain2.5D_DL0.7660.830.7625–0.89780.7470.790.8190.713YearsTrainCombined0.830.9010.8525–0.94860.8350.8230.8570.7973YearsTrainClinical0.8120.8750.7873–0.96360.6760.9670.9580.7253YearsTestRadiomics0.7970.7820.6619–0.90280.9410.6330.7440.9053YearsTest2D_DL0.7340.7540.6347–0.87310.9410.50.6810.8823YearsTest2.5D_DL0.7660.7830.6697–0.89700.7060.8330.8280.7143YearsTestCombined0.8590.8850.8009–0.96970.9410.7670.8210.923YearsTestClinical0.8050.8650.8026–0.92750.8390.7750.7650.8464YearsTrainRadiomics0.7890.840.7724–0.90730.9030.690.7180.8914YearsTrain2D_DL0.6920.7510.6685–0.83310.8060.5920.6330.7784YearsTrain2.5D_DL0.7670.8140.7416–0.88720.7740.7610.7380.7944YearsTrainCombined0.8570.9060.8551–0.95680.790.9150.8910.8334YearsTrainClinical0.8570.8980.8168–0.97840.7590.9410.9170.8214YearsTestRadiomics0.7780.8120.7053–0.919510.5880.67414YearsTest2D_DL0.6980.7910.6823–0.899910.4410.60414YearsTest2.5D_DL0.7620.780.6655–0.89430.7240.7940.750.7714YearsTestCombined0.8250.8990.8258–0.97130.9660.7060.7370.964YearsTestClinical0.7830.8470.7690–0.92540.8250.760.6470.8915YearsTrainRadiomics0.7570.840.7693–0.91130.9250.6670.5970.9435YearsTrain2D_DL0.6960.7470.6585–0.83550.750.6670.5450.8335YearsTrain2.5D_DL0.7740.8280.7483–0.90770.80.760.640.8775YearsTrainCombined0.870.9050.8452–0.96480.80.9070.8210.8955YearsTrainClinical0.8570.880.7801–0.97920.7270.9410.8890.8425YearsTestRadiomics0.750.8060.6931–0.919210.5880.61115YearsTest2D_DL0.7140.8290.7235–0.93430.9090.5880.5880.9095YearsTest2.5D_DL0.7860.810.6973–0.92300.7730.7940.7080.8445YearsTestCombined0.8040.890.8094–0.97140.9550.7060.6770.965YearsTestAUC: Area Under Curve; CI: Confidence Interval; PPV: Positive Predictive Value; NPV: Negative Predictive Value.Fig. 5ROC Curves, Calibration Curve and DCA (decision curve analysis) in Training and Testing Cohorts for 2-Years and 5-Years Time-Dependent Analysis.Fig. 5
Deep learning radiomics nomogram (DLRN) construction
3.5
The chart presents multiple indicators, including Points, Grade, Age, IDH, Radiomics, 2D_DL, 2.5D_DL, Total Points, and survival probabilities for 1- to 5- year periods. Points range from 0 to 100. Grade takes values of 2, 3, and 4; Age ranges from 10 to 80; and IDH takes values of 0 and 1. Radiomics ranges from 0 to 9; 2D_DL and 2.5D_DL take values from −2–1.5; and Total Points range from 0 to 400. The survival probabilities for 1- to 5-year periods correspond to different cutoff thresholds of Total Points (Fig. 6). The 1-Year indicator shows a relatively wide range of survival probabilities, indicating potential variability. However, the 5-Year indicator demonstrates more consistent and concentrated survival probability estimates, suggesting that the model's predictions for 5-Year survival are more stable. This stability implies that the model has a better ability to accurately predict long-term survival, making it the best-performing model among the analyzed indicators.Fig. 6. The constructed nomogram for the combined model.Fig. 6
Discussion
4
This research introduces a multi-source deep learning framework aimed at predicting survival outcomes for glioma patients. The framework utilizes T1-weighted contrast-enhanced MRI (T1CE) and integrates both 2D and pseudo-3D deep learning architectures — the latter, also known as 2.5D, analyzes multiple adjacent 2D slices to capture volumetric information without full 3D processing [13]. This design leverages the strengths of each imaging dimension. Moreover, our approach shows that combining deep learning-extracted radiomic features with established clinical and molecular biomarkers significantly improves survival prediction. This outperforms conventional imaging techniques and common clinical assessment methods in neuro-oncology.
The innovative 2.5D model represents a significant technical leap, effectively capturing the three-dimensional structure of tumors while addressing the computational challenges associated with full 3D convolutional networks [14], [15], [16]. By encoding orthogonal planes as multi-channel inputs, this architecture maintains crucial spatial relationships without the need for isotropic volumetric processing. This makes it a practical solution for clinical settings with limited resources where high-resolution 3D analysis is not feasible [17]. These models successfully address challenges posed by datasets with small sample sizes, a common issue in medical imaging caused by the high cost and complexity of data acquisition. These advancements not only pave the way for more sophisticated and accurate diagnostic tools but also demonstrate how deep learning technologies are evolving to become more integrated and context-aware. Building upon the 2.5D model architecture, we employed transfer learning from pre-trained convolutional neural networks (CNNs), such as ResNet50 and DenseNet121, to extract subtle malignancy features indicative of tumor aggressiveness. This approach resulted in strong prognostic performance, outperforming unimodal benchmarks—models that rely on a single data modality.
We implemented a rigorous feature refinement process by applying stability thresholding (ICC ≥ 0.85) to reduce variability caused by segmentation, while using sparsity-constrained Cox modeling (L1-penalized regression) to eliminate redundant features without losing biological significance. This dual-filter strategy contrasts with previous radiomic studies that often led to overfitting in diverse patient cohorts due to insufficient feature selection. We improved predictive accuracy and aligned our model with current glioma classification systems by explicitly including molecular markers such as IDH mutation status and 1p/19q codeletion, along with clinical variables. This integration with established classification systems enhances its applicability in clinical practice [18], [19]. To evaluate the model, internal validation was performed using demographically and clinically balanced cohorts divided into training and testing sets at a ratio of 7:3. All baseline characteristics showed no significant differences (p-values > 0.05) between the training and testing groups, confirming the model's generalizability. Time-dependent ROC analyses further supported these results, demonstrating area under the curve (AUC) values exceeding 0.80 across 1- to 5- year survival horizons, consistently outperforming both isolated radiomic features and clinicopathological predictors. Collectively, these results suggest that multi-source integration of radiomic features, molecular, and clinical data captures the biological complexity of gliomas more effectively than single-modality feature sets.
Our findings are consistent with emerging evidence indicating that deep learning techniques demonstrate a superior predictive performance compared to traditional linear models, such as the Cox proportional hazards model. These techniques are particularly effective in identifying nonlinear determinants of survival. The superiority of the 2.5D architecture over 2D methods aligns with new multiplanar sampling paradigms, which better characterize tumor heterogeneity [20]. Importantly, this approach advances the field of survival analysis by integrating clinical and imaging data in a bidirectional manner. This means that information flows both ways between these data types, setting it apart from studies that focus solely on imaging features.
Additionally, the combined model showed better calibration performance, as indicated by the Hosmer-Lemeshow test. It also demonstrated increased clinical utility, as shown by the decision curve analysis. The improvements in calibration and clinical utility directly address the discrepancy between algorithmic performance and clinical implementation by enhancing the model's calibration accuracy. This improvement, in turn, supports practical aspects such as clinical decision-making, patient management, and treatment planning. Furthermore, this development aligns with Decuyper's emphasis on the need for prognostic tools that balance statistical rigor with clinical practicality [21].
Our study has several limitations. Firstly, the retrospective nature of data collection may introduce selection bias and limit the ability to infer causality from the observed relationships. Moreover, the generalizability of our findings to other populations or settings remains to be validated through well-designed prospective studies. Secondly, constraints related to imaging techniques, such as limited resolution or variability in feature extraction methods, could affect the reproducibility of results in different clinical environments. Thirdly, this study included only T1CE sequences and did not incorporate advanced imaging modalities such as diffusion-weighted imaging (DWI) and perfusion-weighted imaging (PWI). These advanced techniques can evaluate microenvironmental factors like cellularity and angiogenesis, and their inclusion would significantly improve the biological relevance of the findings [22]. While internal validation helps reduce selection bias, it is crucial to conduct external tests across various institutions with differing imaging protocols to confirm the robustness of the results across different scanners. Future research could incorporate treatment-response biomarkers, such as MGMT methylation status, which would help refine risk-adapted therapeutic strategies and build on the foundational work by Shreyesh, who demonstrated the prognostic value of integrating molecular markers with imaging features [23].
This framework provides a non-invasive preoperative decision-support tool that is particularly beneficial in cases where diagnoses are ambiguous or when surgery is not advisable. By generating personalized survival estimates, the framework helps to select appropriate management strategies across the risk spectrum, ranging from active surveillance for low-risk cases to aggressive adjuvant therapies for high-risk patients. To enhance the clinical feasibility of the approach, standardized preprocessing techniques such as isotropic resampling, are employed, along with open-source tools like PyRadiomics and PyTorch.
Conclusions
5
In conclusion, combining 2.5D deep learning with rigorous radiomic data processing significantly improves survival prediction for glioma patients, and adding multi-source data further enhances this predictive capability. This research addresses challenges related to both generalizability and clinical interpretability, which facilitates the adoption of AI-driven prognostic systems in routine neuro-oncology. Consequently, these advancements in predictive modeling enable the development of personalized therapeutic approaches, including treatment plans based on individual patient profiles.
CRediT authorship contribution statement
jin kai: Writing – original draft, Software, Project administration, Methodology, Formal analysis, Conceptualization. Caixing Sun: Writing – review & editing, Investigation, Funding acquisition, Data curation. Liang Xia: Writing – review & editing, Funding acquisition.
Ethical statement
Ethical approval was not required for this study because it utilized publicly available, de-identified data from the Chinese Glioma Genome Atlas (CGGA). These datasets contain aggregated population-level statistics and do not involve individual patient information or direct human/animal experimentation.
Funding
The present study was supported by Zhejiang Provincial National Science Foundation of China and Zhejiang Health Science and Technology Plan (LY21H160007, LY22H160043, and 2022KY679),and the Medical Science and Technology Project of Zhejiang Province (2022RC116, 2024KY805).
Declaration of Competing Interest
All authors have no conflicts of interest to declare.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Luo T.Liu L.Wang H.Wen S.Protein lactylation and immunotherapy in gliomas: a novel regulatory axis in tumor metabolism (review)Int. J. Oncol.67202511310.3892/ijo.2025.5764 PMC 1222112140539457 · doi ↗ · pubmed ↗
- 2Sisodiya S.Kasherwal V.Khan A.Roy B.Goel A.Kumar S.Liquid biopsies: emerging role and clinical applications in solid tumours Transl. Oncol.35202310171610.1016/j.tranon.2023.101716 PMC 1028527837327582 · doi ↗ · pubmed ↗
- 3Scola E.Del Vecchio G.Busto G.Bianchi A.Desideri I.Gadda D.Conventional and advanced magnetic resonance imaging assessment of non-enhancing peritumoral area in brain tumor Cancers 152023299210.3390/cancers 1511299237296953 PMC 10252005 · doi ↗ · pubmed ↗
- 4Rollin N.Guyotat J.Streichenberger N.Honnorat J.Tran Minh V.-A.Cotton F.Clinical relevance of diffusion and perfusion magnetic resonance imaging in assessing intra-axial brain tumors Neuroradiology 48200615015910.1007/s 00234-005-0030-716470375 · doi ↗ · pubmed ↗
- 5Rastogi D.Johri P.Donelli M.Kadry S.Khan A.A.Espa G.Deep learning-integrated MRI brain tumor analysis: feature extraction, segmentation, and survival prediction using replicator and volumetric networks Sci. Rep.15202510.1038/s 41598-024-84386-0PMC 1171825439789043 · doi ↗ · pubmed ↗
- 6Li X.Bao H.Shi Y.Zhu W.Peng Z.Yan L.Machine learning methods for accurately predicting survival and guiding treatment in stage I and II hepatocellular carcinoma Medicine 1022023 e 3589210.1097/md.0000000000035892 PMC 1063752937960763 · doi ↗ · pubmed ↗
- 7Wang S.Shao M.Fu Y.Zhao R.Xing Y.Zhang L.Deep learning models for predicting the survival of patients with hepatocellular carcinoma based on a surveillance, epidemiology, and end results (SEER) database analysis Sci. Rep.14202410.1038/s 41598-024-63531-9PMC 1116300438853169 · doi ↗ · pubmed ↗
- 8Yan L.Gao N.Ai F.Zhao Y.Kang Y.Chen J.Deep learning models for predicting the survival of patients with chondrosarcoma based on a surveillance, epidemiology, and end results analysis Front Oncol.12202210.3389/fonc.2022.967758 PMC 944203236072795 · doi ↗ · pubmed ↗
