MRI-derived radiomics for risk stratification of tumor deposits in rectal cancer: a dual-center study
Changjiang Zhang, Xiaojuan Deng, Zehong Cao, Feng Shi, Yi Yang, Yutong Chen, Huan Zhao, Xiaojing He, Xinjie Liu, Yindeng Luo

TL;DR
MRI-based radiomics can accurately predict the number of tumor deposits in rectal cancer patients, helping doctors plan personalized treatments.
Contribution
A fusion model combining tumor and nodule features outperforms individual models and radiologists in predicting tumor deposit risk.
Findings
The fusion model achieved AUCs of 0.873 (test) and 0.858 (validation), outperforming individual models and radiologists.
Tumor and nodule features effectively stratify rectal cancer patients into three TD categories.
MRI-derived radiomics provides a non-invasive method for preoperative risk assessment in rectal cancer.
Abstract
To evaluate the diagnostic utility of MRI-based radiomics in stratifying the risk of tumor deposits (TD) in patients with rectal cancer (RC). This study retrospectively analyzed 729 patients with RC from two institutions (January 2018–August 2024). Patients were classified into three groups according to the number of TD: no TD (TD0), 1–2 TD (TD1-2), and ≥ 3 TD (TD3+). Radiomics features were extracted from the tumor and the largest nodule within the rectal mesentery on MRI images. Predictive models were developed with the XGBoost algorithm. Model performance was evaluated using the receiver operating characteristic curve, area under the curve, confusion matrix, precision, accuracy, recall, and F1 score. Three hundred seventy-six patients were ultimately included and allocated into training, test, and validation sets. The tumor model (developed using tumor features) achieved AUCs of…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRadiomics and Machine Learning in Medical Imaging · Colorectal Cancer Surgical Treatments · Colorectal and Anal Carcinomas
Introduction
Colorectal cancer remains a significant global health challenge. In China, its incidence has risen sharply, making it the second most common malignancy, with rectal cancer (RC) being the most prevalent subtype [1, 2]. Tumor deposit (TD), defined as isolated tumor nodules lacking identifiable lymph node or vascular structures within the lymphatic drainage area of the primary tumor, are a key pathological feature of RC [3]. Extensive research has identified TD positivity as an independent prognostic factor in CRC. Reflecting this clinical relevance, the 8th edition of the AJCC cancer staging manual introduced a specific N1c category in the TNM staging system for patients with TD-positive lesions in the absence of lymph node metastasis [4].
However, growing clinical evidence has begun to challenge this binary classification approach. Recent studies suggest that quantifying the number of TD offers provides more accurate prognostic stratification than simply noting their presence or absence. Increasing TD counts have been consistently associated with worse outcomes, regardless of lymph node involvement. These findings underscore the limitations of current staging criteria and emphasize the biological significance of TD burden [5–7].
At present, clinical evaluation of TD relies predominantly on postoperative histopathological assessment, with no standardized protocols available for preoperative risk stratification based on TD quantification. MRI, recognized for its excellent soft tissue contrast, serves as the gold standard for preoperative locoregional staging and comprehensive risk evaluation in RC. Although morphology-based MRI criteria are currently the main imaging method for preoperative TD detection, this approach is limited by suboptimal diagnostic accuracy and significant variability between observers.
Radiomics, a quantitative imaging technique used to assess tumor microenvironment heterogeneity, has gained considerable attention for its ability to characterize the biological aggressiveness of RC. It has shown strong predictive performance for pathological risk factors, such as metastatic lymph node (MLN) involvement and extramural vascular invasion [8–10]. Critically, microenvironmental biomarkers manifest spatial distribution patterns transcending tumor margins into adjacent peritumoral anatomical compartments. Several previous studies have highlighted that advanced radiomic analysis of mesorectal nodule structures, particularly through detailed quantification of nodule features, can significantly improve the predictive capacity for TD assessment [10, 11]. This approach addresses limitations of earlier radiomics techniques by expanding feature extraction beyond the primary tumor area. We hypothesized that radiomic features derived from MRI, specifically extracted from both the primary tumor and the largest mesorectal nodule, hold significant discriminative potential for stratifying TD burden in RC. These distinct anatomical regions may capture complementary microenvironmental and morphological characteristics associated with TD development and progression.
Accordingly, by integrating features from both the tumor and adjacent mesorectal nodules, we aimed to enhance accuracy in preoperative evaluation of TD status.
Materials and methods
Study population
The study protocol was approved by the Ethics Committees of the Second Affiliated Hospital (approval number: 2024-79) and the Third Affiliated Hospital of Chongqing Medical University (approval number: 2024-66), with a waiver of informed consent. A total of 729 patients who underwent radical resection for RC between January 2018 and August 2024 at two institutions (designated A and B) were initially screened. Inclusion criteria were: (1) histologically confirmed rectal adenocarcinoma with documented TD counts, (2) no prior anticancer therapy, and (3) preoperative MRI performed within two weeks before surgery. Exclusion criteria were: (1) invisible tumor on T2-weighted imaging (T2WI) (n = 32), (2) no nodules with short-axis diameter (SD) ≥ 3 mm (n = 156), (3) poor image quality (n = 59), (4) missing TD status (n = 68), and (5) prior malignancy (n = 38)
A total of 376 patients were ultimately included. Based on TD counts from histopathology, patients were grouped into TD0 (no TD), TD1–2 (1–2 TDs), and TD3+ (≥ 3 TDs). Patients from Institution A were randomly split into training and internal test sets (8:2 ratio), while patients from Institution B served as the external validation cohort. The clinical and pathologic features for all enrolled patients were retrospectively obtained from the institutional medical records. The inclusion process is shown in Fig. 1.Fig. 1. The workflow for patient enrollment and assignment
Histopathological analysis
Two gastrointestinal pathologists (> 8 years of experience) assessed TD status on hematoxylin and eosin (H&E)-stained slides. TDs were defined as irregular, discrete carcinoma nodules in the pericolorectal adipose tissue lacking residual lymphoid architecture typical of lymph nodes.
Image acquisition and preprocessing
Institution A used a 3.0-T MRI scanner (Ingenia CX, Philips Healthcare) and a 1.5-T scanner (Avanto Eco, Siemens Healthineers), while Institution B used a 3.0-T MRI scanner (Discovery 750W, GE Healthcare). At both institutions, non-fat-suppressed T2WI was performed in the oblique axial plane, perpendicular to the involved rectal segment. Detailed MRI acquisition parameters are detailed in Table 1.Table 1MRI acquisition parametersInstitution AInstitution BPHILIPSSIEMENSGEIngenia CXAvanto ecoDiscovery 750wField strength (T)3.01.53.0Fat suppressionNonNonNonRepeat time (ms)154115004046Echo time (ms)1579770Number of average222Echo train length1306016Slice thickness (mm)0.923Slice gap (mm)000.5
All T2WI images underwent N4 bias field correction and were resampled to a voxel size of 1 mm × 1 mm × 1 mm
Image analysis
Two radiologists (with 3 and 5 years of gastrointestinal imaging experience, respectively) independently segmented regions of interest (ROIs) on T2WI using ITK-SNAP (v3.6.1, http://www.itksnap.org). They delineated the primary tumor and the largest SD mesorectal nodule, blinded to histopathology and guided by multiparametric sequences. Inter-observer reproducibility was evaluated on a random subset of 100 training cases.
Additionally, to assess model performance, Radiologists A and B received training from a senior radiologist (> 10 years of experience) and independently reviewed preoperative MRI from the validation set. They categorized TD status (TD0, TD1–2, or TD3+) based on established MR–TD morphological criteria, defined as an irregular nodule within or along a venous channel, continuous with major mesorectal venous branches but distinct from the primary tumor [12, 13]. As shown in Fig. 2.Fig. 2. Representative MRI and corresponding histopathological images from patients with different TD statuses. a A 75-year-old male with the largest nodule measuring 5.5 mm in SD on T2WI. Histopathological examination identified no TD; the largest nodule was confirmed as a normal lymph node. b A 55-year-old female with the largest nodule measuring 8.5 mm in SD on T2WI. Histology revealed two TDs, and the largest nodule was an MLN. c A 66-year-old female with the largest nodule measuring 12.6 mm in SD on T2WI. Histology identified five TDs, and the largest nodule was pathologically confirmed as a TD
Radiomics feature extraction and selection
A total of 1743 radiomics features were extracted from each ROI (tumor and nodule) using Pyradiomics (v3.1.0, https://pypi.org/project/pyradiomics). The features were categorized as follows: first-order statistics (n = 342), textural features (n = 1387) including gray-level co-occurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level run-length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighboring gray-tone difference matrix (NGTDM), and 3D morphological features (n = 14). A pre-fusion strategy combined features from both regions. All features were standardized using Z-score normalization before modeling.
Each feature set (tumor, nodule, and fusion) underwent an identical preprocessing pipeline. Feature selection comprised three sequential steps: (1) initial filtration using two-tailed t-tests (p < 0.05) and inter-observer coherence index (ICC ≥ 0.75); (2) elimination of redundant features via Pearson correlation (|r| ≥ 0.9); and (3) final selection with least absolute shrinkage and selection operator regression using ten-fold cross-validation to optimize the feature subset.
Radiomics modeling and evaluation
To address class imbalance among TD subgroups, the synthetic minority over-sampling technique (SMOTE) was applied during training to augment the TD1–2 and TD3+ groups. Three models—tumor, nodule, and fusion—were subsequently developed using the extreme gradient boosting (XGBoost) algorithm, with hyperparameters optimized through five-fold cross-validation.
Model performance was evaluated within a multidimensional framework. Receiver operating characteristic (ROC) curves were generated, and key metrics—including area under the curve (AUC), precision, accuracy, recall, and F1-score—were derived from confusion matrices using a micro-averaging approach to mitigate class imbalance. The overall workflow is illustrated in Fig. 3.Fig. 3. The radiomics workflow of this study
Statistical analysis
Statistical analyses were conducted using Python (v3.12.1, https://www.python.org) and SPSS 26.0 (IBM) software. Continuous variables were assessed for normality using the Kolmogorov–Smirnov test and expressed as mean ± standard deviation or median (interquartile range), as appropriate. Group comparisons were performed using analysis of variance (ANOVA) for continuous variables and the Pearson chi-square test for categorical variables. A two-tailed p < 0.05 was considered statistically significant.
Results
Patients’ characteristics
This study included 376 patients (211 males and 165 females) from two institutions (Institution A: n = 274; Institution B: n = 102). Based on TD count, patients were categorized into TD0 (n = 172), TD1–2 (n = 124), and TD3+ (n = 82) groups. The mean ages in the training, test, and validation sets were 64 ± 12, 60 ± 14, and 61 ± 14 years, respectively. Among baseline characteristics, only the T stage differed significantly across the three datasets (p = 0.010) (Table 2).Table 2. Baseline characteristics of the study populationCharacteristicsTraining set (n = 219)Test set (n = 55)Validation set (n = 102)p valueAge (years)64 ± 1260 ± 1461 ± 140.069Gender0.743 Male125 (55%)32 (58%)54 (53%) Female94 (45%)23 (42%)48 (48%)Tumor location0.715 Low58 (26%)18 (33%)34 (33%) Middle120 (55%)27 (49%)49 (48%) High41 (19%)10 (18%)19 (19%)SD of the largest nodules (mm)0.7695.9 (4.7, 8.3)6.3 ± 2.46.3 (4.6, 7.7)T stage0.010 T1–T231 (14%)15 (27%)27 (26%) T3–T4188 (86%)40 (73%)75 (74%)N stage0.410 N044 (20%)11 (20%)27 (26%) N1–N2175 (80%)44 (80%)75 (74%)EMVI status0.716 Negative103 (47%)25 (46%)43 (42%) Positive116 (53%)30 (54%)59 (58%)TDs status TD098 (47%)26 (47%)48 (47%) TD1–275 (31%)17 (31%)32 (31%) TD3+48 (22%)12 (22%)22 (22%)Tumor location: low (< 5 cm from the anal verge), middle (5–10 cm), high (≥ 10 cm)SD short-axis diameter, TD tumor deposit, EMVI extramural vascular invasion
Radiomic modeling features
Following feature selection, three optimized radiomics feature sets were obtained (Table S1). The tumor, nodule, and fusion models incorporated 20, 16, and 21 features, respectively. The tumor model included 5 first-order, 5 GLCM, 1 GLDM, 3 GLRLM, 3 GLSZM, 2 NGTDM, and 1 shape feature. The nodule model comprised 1 first-order, 1 GLCM, 2 GLDM, 3 GLRLM, 6 GLSZM, 2 NGTDM, and 1 shape feature. The fusion model contained 3 first-order, 6 GLCM, 2 GLDM, 8 GLSZM, and 2 NGTDM features.
Performance of radiomics models
Tables 3 and 4 summarize the diagnostic performance of the three radiomics models, evaluated on the training, testing, and validation cohorts, respectively. Of these, the fusion model outperformed the tumor and nodule models across most metrics in the training set. It achieved the highest accuracy (0.872 vs 0.775 and 0.670), AUC (0.961 vs 0.930 and 0.926), precision (0.854 vs 0.808 and 0.776), and F1-score (0.861 vs 0.816 and 0.771). Although its recall (0.849) was slightly lower than that of the nodule model (0.909), it was comparable to the tumor model (0.863), indicating strong overall performance.Table 3. Overall performance of different models based on the training setModelsAUC (95% CI)AccuracyPrecisionRecallF1Tumor model0.930 (0.911–0.950)0.8080.7750.8630.816Nodule model0.926 (0.906–0.946)0.7760.6700.9090.771Fusion model0.961 (0.947–0.975)0.8540.8720.8490.861AUC the area under the receiver operating characteristic curve, CI confidence intervalTable 4Overall performance of different models based on test and validation setsModelsTumor modelNodule modelFusion modelPrecisionRecallF1PrecisionRecallF1PrecisionRecallF1Test set (n = 55) TD00.8330.7690.8000.9440.6540.7230.8150.8460.830 TD1–20.6670.7060.6860.5000.8240.6220.7860.6470.710 TD3+0.6920.7500.7200.4290.7500.5451.0000.5830.737 Micro0.7640.7640.7640.5710.8000.6670.8040.7450.774 AUC (95% CI)0.871 (0.811–0.931)0.839 (0.775–0.902)0.873 (0.812–0.933) Accuracy0.7450.6730.800Validation set (n = 102) TD00.7550.8330.7920.7860.6880.7330.8540.7290.787 TD1–20.7040.5940.6440.5380.6560.5920.7000.6560.677 TD3+0.4290.9550.5920.3530.8180.4930.5290.8180.643 Micro0.6880.7350.7100.6140.6860.6480.7120.7750.742 AUC (95% CI)0.848 (0.803–0.893)0.804 (0.753–0.856)0.858 (0.809–0.907) Accuracy0.7160.6370.784
Consistent trends were observed in the internal test and external validation sets, where the fusion model achieved superior accuracy (test: 0.800; validation: 0.784) and AUC (test: 0.873; validation: 0.858), with balanced precision, recall, and F1-scores. In contrast, the tumor (accuracy: 0.745/0.716; AUC: 0.871/0.848) and nodule (accuracy: 0.673/0.637; AUC: 0.839/0.804) models showed lower diagnostic performance. The ROC curves and confusion matrices across the three datasets demonstrated that the fusion model achieved the best overall classification performance, exhibiting the highest prediction accuracy and AUC values for TD0 through TD3+ (Figs. 4 and 5). In subgroup classification, the fusion model achieved the highest F1-scores in the test set for TD0 (0.830), TD1–2 (0.710), and TD3+ (0.734). In the validation set, it maintained comparable results, although slightly underperformed the tumor model for TD0 classification (F1-score: 0.787 vs 0.792). The fusion model preserved strong validation performance for TD1–2 (F1 = 0.677) and TD3+ (F1 = 0.643) groups. Despite reduced performance across all subgroups in the validation set relative to training, it consistently achieved higher AUCs than the other models, underscoring its superior discriminative capability.Fig. 4ROC curves for different models on training, test, and validation sets. a, d, g Show tumor model ROC curves for training, internal test, and external validation sets. b, e, h Display nodule model results across corresponding datasets. c, f, i Present fusion model performance in these three phasesFig. 5Confusion matrix for different models on training, test, and validation sets. a, d, g Show tumor confusion matrix for training, internal test, and external validation sets. b, e, h display nodule model results across corresponding datasets. c, f, i Present fusion model performance in these three phases
Comparison of the best radiomics model and radiologists
In the external validation set, the fusion model demonstrated superior classification performance compared with both radiologists (Table 5). It achieved higher accuracy (0.783) than Radiologist A (0.589) and Radiologist B (0.676). The model also exhibited greater sensitivity across all subgroups (TD0: 0.729 vs A/B: 0.708/0.771; TD1–2: 0.626 vs 0.500/0.625; TD3+: 0.818 vs 0.455/0.545). In addition, it achieved higher specificity for TD0 (0.889) and TD1–2 (0.871) compared with both radiologists, although its specificity for TD3+ (0.800) was slightly lower than that of Radiologist A (0.825) and Radiologist B (0.900).Table 5. Performance of radiologists and radiomics model on validation set (n = 102)Fusion modelRadiologist ARadiologist BSensitivitySpecificitySensitivitySpecificitySensitivitySpecificityTD00.7290.8890.7080.8150.7710.833TD1-20.6260.8710.5000.7430.6250.814TD3+0.8180.8000.4550.8250.5450.900Accuracy0.7840.5890.676
Discussion
Clinical studies have consistently shown that a higher TD burden is strongly associated with poorer clinical outcomes, with increasing TD counts correlating with progressively lower postoperative survival rates [14–17]. In this study, we evaluated the diagnostic value of radiomic features extracted from the primary tumor on T2WI and the largest mesorectal nodule (based on maximum SD) for preoperative TD risk stratification. The fusion model integrating features from both regions achieved superior classification performance compared with single-region models, with accuracies of 0.800 (44/55 cases) in the test set and 0.784 (80/102 cases) in the validation set. Moreover, it outperformed two radiologists, demonstrating accuracy improvements of 0.195 and 0.108, respectively. These results highlight the effectiveness of the dual-region radiomics model in preoperatively stratifying TD risk in RC patients and its potential to enhance clinical decision-making and treatment planning.
Current preoperative TD assessment in RC primarily relies on MRI-based morphological evaluation [12, 18]. This conventional approach depends on subjective image interpretation, resulting in high interobserver variability and limited diagnostic consistency. These limitations underscore the need for objective, standardized, and reproducible tools to improve the accuracy and reliability of TD risk stratification in RC management.
Previous studies on radiomics for preoperative TD assessment have mainly focused on binary patient-level classification. Atre et al [19] analyzed texture features from 2D-ROI on T2WI of the largest TD and MLN, identifying skewness as a key differentiator (AUC = 0.70, sensitivity = 70%, specificity = 72%). Yang et al [20] developed a T2WI-based model using tumor dimension features, achieving improved performance (AUC = 0.80, sensitivity = 72%, specificity = 94%). Zhang et al [10] further advanced this approach by employing contrast-enhanced CT arterial-phase images with 3D-ROI from both the primary tumor and perirectal nodules. Their nodule-based nomogram outperformed tumor-based models, demonstrating higher diagnostic accuracy (AUC = 0.918, sensitivity = 88.3%, specificity = 82.6%) and underscoring the predictive value of features derived from peritumoral nodules.
In this study, a combined feature extraction strategy incorporating both tumor and maximal nodule regions was applied to explore multiregional radiomics characteristics on T2WI for three-category TD classification. Through a rigorous feature selection process, three optimized models were developed: the tumor model included 20 features, the nodule model comprised 16, and the fusion model integrated 21 features across both regions (11 from the tumor and 10 from the nodule). The three models exhibited different predictive performance levels, reflecting the distinct contributions of tumor- and nodule-based features. The nodule model demonstrated the lowest overall classification performance, which is likely attributed to the fact that not all the largest mesorectal nodules visible on T2WI corresponded directly to TD in this study. Some represented metastatic or benign enlarged lymph nodes, which reduced model specificity. However, the primary objective of this study was to stratify the overall TD-related risk in RC rather than determine the pathological nature of individual nodules. Notably, a significant difference was observed in the maximum SD of mesorectal nodules across TD groups: TD0 = 5.2 (4.0, 6.3) mm, TD1–2 = 6.4 (5.2, 7.7) mm, and TD3 + = 8.7 ± 3.0 mm (p < 0.001), suggesting that an increase in the SD of the largest mesorectal nodule is associated with elevated TD risk. Selecting the largest nodule for analysis improved both stratification accuracy and reproducibility. Importantly, the superior performance of the fusion model likely stems from both the primary tumor and the largest mesorectal nodule, supporting the hypothesis that tumor microenvironmental remodeling extending from the primary tumor lesion into adjacent perirectal tissues contributes to TD formation [21, 22].
Additionally, we compared the diagnostic performance of the fusion model with that of radiologists, and the fusion model demonstrated clear superiority (accuracy: 0.784 vs 0.589/0.676). This highlights its potential to reduce diagnostic subjectivity, an important advancement given the prognostic significance of TD count in the clinical staging of RC [23–25]. When the SD of the largest mesorectal nodule is ≥ 3 mm, the fusion model significantly improves the accuracy of preoperative TD risk assessment, thereby facilitating a more precise prognostic evaluation and potential clinical decision optimization. For instance, patients in the TD3+ group could benefit from intensified neoadjuvant therapy or wider surgical margins, while those in the TD0 group, lacking additional risk factors, may avoid unnecessary overtreatment [26].
This study has several limitations. Firstly, because this was a retrospective study, selection bias may have been introduced. Secondly, due to the lack of a direct correlation between the images and pathological sections, the features of the largest nodule do not necessarily reflect the true TD characteristics. Thirdly, due to the short follow-up period for most patients, the prognostic value of the radiomics model could not be evaluated. Lastly, the model’s predictions should be interpreted alongside clinical information, as a supplement to radiologists’ interpretations. Future studies with larger samples, advanced imaging technologies, and deep learning will improve its accuracy and applicability.
Conclusion
Our study demonstrates that a T2WI-based radiomics model effectively preoperatively stratifies TD risk in RC. By integrating features from both the primary tumor and the largest mesorectal nodule via the XGBoost algorithm, the fusion model captures complementary tumor microenvironment information and achieves superior predictive performance over single-region models.
Supplementary information
ELECTRONIC SUPPLEMENTARY MATERIAL
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Pub Med (2024) Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Pub Med. https://pubmed.ncbi.nlm.nih.gov/33538338/. Accessed 1 Feb 202410.3322/caac.2166033538338 · doi ↗ · pubmed ↗
- 2Weiser MR (2018) AJCC 8th edition: Colorectal cancer. Ann Surg Oncol. 10.1245/s 10434-018-6462-110.1245/s 10434-018-6462-129616422 · doi ↗ · pubmed ↗
- 3Li H, Chen X-L, Liu H et al (2023) MRI-based multiregional radiomics for preoperative prediction of tumor deposit and prognosis in resectable rectal cancer: a bicenter study. Eur Radiol. 10.1007/s 00330-023-09723-910.1007/s 00330-023-09723-937160427 · doi ↗ · pubmed ↗
- 4Zhao W, Xu H, Zhao R et al (2023) MRI-based radiomics model for preoperative prediction of lateral pelvic lymph node metastasis in locally advanced rectal cancer. Acad Radiol. 10.1016/j.acra.2023.07.01610.1016/j.acra.2023.07.01637643928 · doi ↗ · pubmed ↗
- 5Wang Q, Lin Y, Ding C et al (2024) Multi-modality radiomics model predicts axillary lymph node metastasis of breast cancer using MRI and mammography. Eur Radiol. 10.1007/s 00330-024-10638-210.1007/s 00330-024-10638-238337068 · doi ↗ · pubmed ↗
