Predicting intrahepatic recurrence of colorectal cancer liver metastases after curative hepatectomy using a machine learning model with data integration of ultrasound radiomics and clinicopathological parameters

Ting Hu; Zhong Liu; Pengpeng Kuang; Yunyun Li; Weixuan Kong; Wei Zheng; Jianjun Li; Guangjian Liu; Han Zhang; Xin Chen; Ruhai Zou

PMC · DOI:10.1186/s13244-026-02227-2·March 16, 2026

Predicting intrahepatic recurrence of colorectal cancer liver metastases after curative hepatectomy using a machine learning model with data integration of ultrasound radiomics and clinicopathological parameters

Ting Hu, Zhong Liu, Pengpeng Kuang, Yunyun Li, Weixuan Kong, Wei Zheng, Jianjun Li, Guangjian Liu, Han Zhang, Xin Chen, Ruhai Zou

PDF

Open Access

TL;DR

A machine learning model combining ultrasound imaging data and clinical factors improves the prediction of cancer recurrence in liver metastases patients after surgery.

Contribution

A novel machine learning model integrating ultrasound radiomics and clinicopathological data for predicting intrahepatic recurrence in CRLM patients.

Findings

01

The cRadiomics model achieved AUC values of 0.811 and 0.784 in main and external cohorts, outperforming clinical and radiomics-only models.

02

Six clinical parameters and seven radiomics features were identified as strong predictors of intrahepatic recurrence.

03

The model shows potential to improve clinical decision-making and personalized follow-up for CRLM patients.

Abstract

To develop and validate a machine learning model integrating ultrasound radiomics and clinicopathological parameters to predict intrahepatic recurrence in colorectal cancer liver metastases (CRLM) patients after curative hepatectomy. This retrospective study enrolled 278 eligible CRLM patients (age, 55 ± 12 years; male, 188) from two centers, including a main cohort (n = 224, July 2010–February 2021) and an external cohort (n = 54, February 2015–October 2020). Patients were stratified by recurrence status during a 2-year follow-up. Preoperative ultrasound images and clinicopathological parameters were collected. Radiomics features were extracted from liver metastases, peri-tumor areas, and disease-free liver parenchyma. Using least absolute shrinkage and selection operator (LASSO) analysis and support vector machine algorithms, three predictive models were developed: clinical,…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Figures7

Click any figure to enlarge with its caption.

Workflow for model development and evaluation

Bar plots of the coefficients for correlations between intrahepatic recurrence and selected features according to Spearman correlation analysis. GLCM, gray-level co-occurrence matrix; GLSZM, gray-level size zone matrix; GLRLM, gray-level run length matrix; CRLM, colorectal cancer liver metastases; CA19-9, carbohydrate antigen 19-9Fig. 4Heatmap showing the correlation degree for each pair of selected features. GLCM, gray-level co-occurrence matrix; GLSZM, gray-level size zone matrix; GLRLM, gray-level run length matrix; CRLM, colorectal cancer liver metastases; CA19-9, carbohydrate antigen 19-9

ROC and calibration curves of various models in the internal validation: a, c Present respectively the ROC and calibration curves on the training data of the main cohort (the sample size of the training data is 180 × 20, i.e., the number of patients for four training folds in each FFCV multiplied by the number of FFCV runs). b, d Correspond respectively to the ROC and calibration curves on the testing data of the main cohort (the sample size of the testing data is 44 × 20, i.e., the number of patients for one testing fold in each FFCV multiplied by the number of FFCV runs). The circle points i

ROC and calibration curves of various models in the external validation: a, c show respectively the ROC and calibration curves on the training data of the main cohort (the sample size of the training data is 224, i.e. the total number of patients in the main cohort); b, d show respectively to the ROC and calibration curves on the testing data of the external cohort (the sample size of the testing data is 54, i.e. the total number of patients in the external cohort). The circle points in a denote the cutoff points identified according to the Youden index. ROC, receiver-operating-characteristic;

Funding3

—http://dx.doi.org/10.13039/501100001809National Natural Science Foundation of China
—http://dx.doi.org/10.13039/501100002858China Postdoctoral Science Foundation
—Shenzhen Basic Science Research (Key Program)

Keywords

Colorectal cancer liver metastasesUltrasound radiomicsIntrahepatic recurrenceMachine learning

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsHepatocellular Carcinoma Treatment and Prognosis · Radiomics and Machine Learning in Medical Imaging · Liver Disease Diagnosis and Treatment

Full text

Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer worldwide and the second leading cause of cancer-related mortality [1]. The liver is the preferred site of metastasis for CRC due to its anatomical proximity and connection to the portal systemic circulation. Nearly 50% of patients develop liver metastases during the course of the disease [2, 3]. Therapeutic options for CRC liver metastases (CRLM) include hepatectomy, systemic chemotherapy, radiotherapy, hepatic artery embolization, and thermal ablation techniques, such as microwave coagulation therapy and radiofrequency ablation [4]. Hepatectomy remains the standard treatment for CRC liver metastases (CRLM), providing long-term survival and potential cure in 20%–50% of patients who undergo complete resection [5]. However, previous studies have shown that only 10%–20% of patients with CRLM are initially eligible for curative resection [6, 7]. For resectable liver metastases, patients should receive perioperative chemotherapy, which comprises three months of FOLFOX4 (5-fluorouracil, folinic acid, and oxaliplatin) before surgery, followed by another three months postoperatively. For unresectable liver metastases, neoadjuvant chemotherapy is administered to convert initially unresectable CRLM into resectable cases [8, 9]. However, prolonged chemotherapy can result in chemotherapy-induced liver injury, such as hepatic steatosis, which can impair the sensitivity of preoperative imaging modalities in detecting lesions [10, 11]. Despite therapeutic advancements having reduced the risk of recurrence after surgery, approximately 50%–70% of patients still experience recurrence, with the majority occurring within two years post-resection. One-third of these patients have a recurrence confined to the liver [12–14]. This high recurrence rate is associated with a reduced 5-year survival rate of only 37%–58% [15]. Thus, timely diagnosis and reliable recurrence prediction are critical for precision medicine and personalized follow-up, thereby ultimately improving outcomes and quality of life.

Radiomics extracts a wide range of quantitative features from medical images to detect subtle microstructural changes within tumors that are hardly perceptible to the naked eye, providing insights correlating with tumor biology, treatment response, potential recurrence risk, and prognosis [16, 17]. Based on preoperative whole-liver enhanced computed tomography (CT) images, a radiomics model showed promising performance [18] for predicting metachronous metastases within two years following rectal cancer surgery. Simpson et al demonstrated that radiomics characteristics based on CT may identify patients at risk of recurrence after hepatectomy, including early CRLM recurrence within 6 months [19, 20].

Recent studies report that ultrasound-based radiomics, as a promising branch of radiomics, has been applied to assess focal liver lesions, such as hepatocellular carcinoma [21], liver metastases [22], and diffuse hepatic diseases [23]. This study aims to develop and validate a machine learning model to predict intrahepatic recurrence in patients with colorectal liver metastases after curative hepatectomy. The model integrates ultrasound radiomics features from tumors, peritumoral regions, and grossly disease-free liver parenchyma with clinicopathological parameters. The findings of this study may contribute to improving clinical decision-making and enabling personalized management and risk-adapted follow-up for CRLM patients.

Materials and methods

Study population

This study was conducted in accordance with the Helsinki Declaration and was approved by the Medical Ethics Committee of Sun Yat-Sen University Cancer Center (No. B2023-605-01). Informed consent for this retrospective study was waived due to its observational design. However, each participant provided written informed consent for the Collection and Application of Clinical Samples and Medical Data, certified and approved by the ethics committee of Sun Yat-Sen University Cancer Center upon hospital admission. Consecutive patients who underwent initial resection of CRLM between July 2010 and February 2021 were retrospectively recruited from the clinical database of Sun Yat-Sen University Cancer Center (SYSUCC, Guangzhou, China). Among these patients, eligible individuals were selected to form the main cohort for model training and internal evaluation. The inclusion criteria were: (1) age ≥ 18 years old, (2) pathologically confirmed colorectal adenocarcinoma, (3) curative resection for the liver metastases, and (4) a preoperative liver ultrasound examination. The exclusion criteria were: (1) extrahepatic metastases to the lung, bone, brain, peritoneum, and distant lymph nodes, (2) inadequate clinical information, (3) poor quality of ultrasound images, (4) extrahepatic relapse, and (5) history of other malignancies. From February 2015 to October 2020, eligible patients from the Sixth Affiliated Hospital of Sun Yat-Sen University (SYSUASH, Guangzhou, China) were identified based on the same criteria, constituting an independent cohort for external validation of the model derived from the main cohort. A flowchart illustrating the patient screening process is presented in Fig. 1.Fig. 1. Workflow for patient screening

Clinical data collection

Clinical data for patients with CRLM, including patient-related factors, primary tumor factors, liver metastasis factors, and treatment-related factors, were retrospectively collected from medical records. Synchronous liver metastases were defined as those detected at or before the diagnosis of the primary lesion, while metachronous metastases were identified as those detected after CRC surgery [24]. Some CRLM patients underwent not only hepatectomy but also intraoperative open ablation. “Complete ablation” was defined as the absence of enhancement in the entire ablated area on follow-up contrast-enhanced CT or magnetic resonance imaging (MRI) at 1 month post-ablation [25]. Preoperative levels of carcinoembryonic antigen (CEA) and carbohydrate antigen 19-9 (CA19-9) were obtained from laboratory tests. Intrahepatic recurrence, defined as the detection of new intrahepatic lesions or metastases with imaging features typical of colorectal liver metastasis on abdominal contrast-enhanced CT or contrast-enhanced MRI, or atypical findings confirmed by histopathological examination, within two years of curative resection.

Ultrasound image acquisition and preprocessing

Preoperative liver grayscale ultrasound images in DICOM format, acquired using Philips IU22 (Philips Medical Systems), GE Logiq E9 (GE Healthcare), and similar diagnostic instruments, were also collected from the medical recording system. Under the supervision of a senior radiologist with 10 years of ultrasound experience, a junior radiologist with 3 years’ experience delineated the tumor, peritumoral, and liver parenchyma region of interest (ROI) for all patients. For each lesion, an ROI was manually delineated around the tumor boundary on the grayscale ultrasound image of the largest cross-section. A peritumoral ROI was automatically expanded by 1 cm from the lesion boundary, while a parenchymal ROI with a specified diameter (2 cm) was manually plotted in areas of the liver parenchyma, ensuring that the selected area was free of visible disease or anatomical structures. Cases for which a 1 cm peritumoral expansion was infeasible, because of the proximity to the liver capsule or other anatomical boundaries, were excluded. All ROIs were reviewed by the senior radiologist, and adjustments were made as necessary to ensure accurate delineation. Subsequently, all images underwent standardized preprocessing, including intensity discretization using a fixed bin width. Radiomics features were extracted using the PyRadiomics extension in 3D Slicer (version 5.0.3) and standardized using z-score to ensure comparability across patients.

Model development

Prognostic models were established using data from the main cohort. During model development, radiomics features and clinicopathological parameters, were screened using least absolute shrinkage and selection operator (LASSO) in order to identify strong predictors of intrahepatic recurrence. Spearman analysis was performed to assess the correlations between each pair of feature variables screened out by LASSO, and those between the variables and the outcome of recurrence, with the sole purpose of evaluating both the utility of the LASSO-selected features and their redundancy. Thereafter, the machine learning algorithm of support vector machine was used to build three prognostic models: a clinical model based on clinicopathological parameters, a radiomics model based on ultrasound radiomics features, and a combined model (cRadiomics) integrating both. Figure 2 presents an overview of the model development procedure.Fig. 2. Workflow for model development and evaluation

Model evaluation

Both internal and external validations were performed to evaluate the models’ classification performances. For internal validation, twenty-five-fold cross-validation (FFCV) was performed on the main cohort (Multiple runs of FFCV enable robust comparisons, as each run involves random data splitting, and repeating the process accounts for such randomness to enhance result reliability.). After completing twenty rounds of FFCV, the predicted probabilities for samples involved in internal training or testing of FFCV were aggregated together to generate a composite receiver operating characteristic (ROC) curve [26], from which a set of quantitative metrics, including accuracy, sensitivity, specificity, and the area under the ROC curve (AUC), were calculated for performance evaluation. For external validation, all models were retrained on the entire main cohort data; the resulting models were then evaluated on the external cohort using metrics identical to those of internal validation (Since multiple models were trained on the main cohort across repeated FFCV runs during internal validation, these internal validation-derived models were excluded from the external test to alleviate storage and computational burdens in real-world deployment scenarios. Instead, retraining on the entire main cohort produced a single model for external validation—a process that mirrors real-world practices, ensuring efficient model development and deployment. See supplementary material for more details on the internal/external validations. Apart from assessing the models’ classification performance, we also evaluated their calibrations by means of calibration plots, calibration slope/intercept, and Brier score [27].

Statistical analysis

Continuous clinical variables were presented as mean ± standard deviation and compared using the Mann–Whitney U-test, while categorical variables were shown as frequencies (proportion, %) and analyzed using the Chi-square test, except preoperative CEA (< 200 or > 200 ng/mL), which was assessed using Fisher’s exact test. Spearman analysis was conducted to assess the correlations between each pair of feature variables screened out by LASSO, and those between the variables and the outcome of recurrence. The 95% confidence intervals (CIs) for the accuracy, sensitivity, specificity, and AUC were computed using the Clopper–Pearson method [28]. The DeLong test was used to measure the significance of differences between AUC values. A two-tailed p-value less than 0.05 was considered statistically significant. Data processing and model development were conducted using scikit-learn (version 0.23.2) in Python (version 3.8.1).

Results

Patient characteristics

A total of 224 patients (age, 54 ± 11 years; male 154) from SYSUCC were included. The main cohort of patients consisted of 154 (68.8%) males and 70 (31.2%) females, all of whom underwent curative liver resection with or without additional intraoperative open ablation. By the end of follow-up, 129 (57.6%) patients experienced intrahepatic relapse, while 95 (42.4%) showed no evidence of relapse. Patients with intrahepatic recurrence were more likely to present with synchronous liver metastases (n = 109, 84.5% vs n = 63, 66.3%; p = 0.001), bilobar liver metastases (n = 68, 52.7% vs n = 21, 22.1%; p < 0.001), and a tumor number ≥ 3 (n = 79, 61.2% vs n = 37, 38.9%; p = 0.001). These patients also tend to receive adjuvant chemotherapy (n = 122, 94.6% vs n = 74, 77.9%; p < 0.001), targeted therapies (n = 57, 44.2% vs n = 25, 26.3%; p = 0.006), and intraoperative ablation (n = 48, 37.2% vs n = 19, 20%; p = 0.005). Furthermore, pathological lymph node positivity (N1–N2) (n = 75, 58.2% vs n = 37, 38.9%; p = 0.005) and preoperative CA19-9 > 200 U/mL (n = 13, 10.1% vs n = 2, 2.1%; p = 0.018) were common in patients with intrahepatic recurrence. No significant differences were observed between the two groups in terms of age, gender, primary tumor location, maximum diameter of CRLMs, and other clinical variables (all p > 0.05). Due to a relatively high proportion of missing values (23.0%) of microsatellite instability (MSI) across both cohorts, it was not included in groupwise statistical comparisons or multivariable modeling.

By screening patients from SYSUASH, a total of 54 patients (age, 58 ± 13 years; male 34) were included as the external cohort, among which 51 patients were diagnosed as synchronous CRLM, and 44 patients underwent simultaneous resection. Most patients (74.1%) had a tumor number <3. Thirty-eight patients experienced intrahepatic relapse within two years after liver resection. Patient baseline characteristics are summarized in Table 1.Table 1. Baseline characteristics for patients in the main and external cohortsVariableMain cohortExternal cohortTotalIntrahepatic recurrence (n = 129)Without intrahepatic recurrence (n = 95)pTotalIntrahepatic recurrence (n = 38)Without intrahepatic recurrence (n = 16)pAge, years22453 ± 12 (25–73)55 ± 11 (29–83)0.2035458 ± 13 (27–81)59 ± 11 (38–81)0.631Age0.5830.808 ≤ 6014690 (69.8)63 (66.3)2920 (52.6)9 (56.3) > 607839 (30.2)32 (33.7)2518 (47.4)7 (43.7)Sex0.6230.964 Male15487 (67.4)67 (70.5)3424 (63.2)10 (62.5) Female7042 (32.6)28 (29.5)2014 (36.8)6 (37.5)Primary tumor location0.2390.843 Right colon4227 (20.9)15 (15.8)2218 (47.4)4 (25) Left colon9448 (37.2)46 (48.4)94 (10.5)5 (31.3) Rectum8854 (41.9)34 (35.8)2316 (42.1)7 (43.7)Timing of liver metastasis0.0010.547 Synchronous172109 (84.5)63 (66.3)5135 (92.1)16 (100) Metachronous5220 (15.5)32 (33.7)33 (7.9)0Number of CRLMs0.0010.515 < 310850 (38.8)58 (61.1)4027 (71.1)13 (81.3) ≥ 311679 (61.2)37 (38.9)1411 (28.9)3 (18.7)Maximum diameter of CRLMs0.8590.306 < 3 cm11266 (51.2)46 (48.4)3724 (63.2)13 (81.3) 3–5 cm7139 (30.2)32 (33.7)1310 (26.3)3 (18.7) > 5 cm4124 (18.6)17 (17.9)44 (10.5)0Location of CRLMs< 0.0010.08 Unilobar13561 (47.3)74 (77.9)3622 (57.9)14 (87.5) Bilobar8968 (52.7)21(22.1)1816 (42.1)2 (12.5)Simultaneous resection0.2470.249 No14680 (62)66 (69.5)109 (23.7)1 (6.3) Yes7849 (38)29 (30.5)4429 (76.3)15 (93.7)Preoperative chemotherapy< 0.0010.947 No287 (5.4)21 (22.1)2417 (44.7)7 (43.7) Yes196122 (94.6)74 (77.9)3021 (55.3)9 (56.3)Use of targeted drugs0.0060.474 No14272 (55.8)70 (73.7)4329 (76.3)14 (87.5) Yes8257 (44.2)25 (26.3)119 (23.7)2 (12.5)Intraoperative ablation0.0051 No15781 (62.8)76 (80)5337 (97.4)16 (100%) Yes6748 (37.2)19 (20)11 (2.6)0Pathological T stage0.380.342 1-23920 (41.9)19 (20)3728 (73.7)9 (56.3) 3-4185109 (58.1)76 (80)1710 (26.3)7 (43.7)Pathological N stage0.0050.3 011254 (41.8)58 (61.1)1713 (34.2)4 (25) 1–211275 (58.2)37 (38.9)3725 (65.8)12 (75)High MSINANA No160100 (77.5)60 (63.2)4936 (94.7)13 (81.3) Yes42 (1.6)2 (2.1)101 (6.3) Unknown6027 (20.9)33 (34.7)42 (5.3)2 (12.4)Preoperative CEA, ng/mL0.507 ≤ 56942 (32.6)27 (28.4)3221 (55.3)11 (68.7)0.357 > 515587 (67.4)68 (71.6)2217 (44.7)5 (31.3)Preoperative CA19-9, U/mL0.9050.751 ≤ 35173100 (77.5)73 (76.8)3826 (68.4)12 (75) > 355129 (22.5)22 (23.2)1612 (31.6)4 (25)Preoperative CEA, ng/mL0.243NA ≤ 200217123 (95.3)94 (98.9)5438 (100)16 (100) > 20076 (4.7)1 (1.1)000Preoperative CA19-9, U/mL**0.0*180.306 ≤ 200209116 (89.9)93 (97.9)4933 (86.8)16 (100) > 2001513 (10.1)2 (2.1)55 (13.2)0Continuous and categorical variables were compared using the Mann–Whitney U-test and chi-square test, respectively. The asterisk denotes that Fisher’s exact test was utilized. “Unknown” MSI status indicates missing or unrecorded data. Bold values indicate statistically significant differences (p < 0.05)CRLM colorectal cancer liver metastases, CAE carcinoembryonic antigen, CA19-9 carbohydrate antigen 19-9, NA not applicable, MSI microsatellite instability

Selected features

The most significant predictors were identified through LASSO, while Spearman correlation analysis was used to measure the correlation between each pair of the identified variables. Preoperative CA19-9 > 200 U/mL, pathological lymph node positivity, the location and timing of liver metastases, preoperative chemotherapy and use of targeted drugs were correlated well with intrahepatic recurrence, while the bilobar liver metastases were the strongest risk factor, showing the highest Spearman correlation coefficient (r = 0.29) (Fig. 3). A total of 837 radiomics features were extracted, including 162 first-order features and 675 texture features [29]. Finally, three liver tumor texture features (autocorrelation, size zone nonuniformity, short run emphasis), two peritumoral texture features (cluster prominence, zone nonuniformity), and two liver parenchyma texture features (small area emphasis, uniformity) were selected. Autocorrelation and uniformity demonstrated negative correlations with intrahepatic recurrence after liver resection, with correlation coefficients of −0.12 and −0.24, respectively. Short-run emphasis and uniformity show a strong negative correlation value of −0.65. Preoperative CA19-9 levels greater than 200 U/mL show a moderate positive correlation with pathological N staging (Fig. 4).Fig. 3. Bar plots of the coefficients for correlations between intrahepatic recurrence and selected features according to Spearman correlation analysis. GLCM, gray-level co-occurrence matrix; GLSZM, gray-level size zone matrix; GLRLM, gray-level run length matrix; CRLM, colorectal cancer liver metastases; CA19-9, carbohydrate antigen 19-9Fig. 4Heatmap showing the correlation degree for each pair of selected features. GLCM, gray-level co-occurrence matrix; GLSZM, gray-level size zone matrix; GLRLM, gray-level run length matrix; CRLM, colorectal cancer liver metastases; CA19-9, carbohydrate antigen 19-9

Model performance and comparisons

In internal validation, cRadiomics achieved an AUC of 0.841 (95% CI: 0.836–0.845) on the training data of the main cohort, compared to 0.784 (0.779–0.789) and 0.749 (0.744–0.754) for Radiomics and the Clinical model, respectively (ps < 0.01) (Fig. 5a and Table 2). Regarding calibration in this comparison, cRadiomics presented a calibration slope of 0.974, an intercept of 0.015, and a Brier score of 0.163, which were more favorable than those of the other two models, specifically 0.950, 0.023, and 0.195 for Radiomics, and 0.908, 0.055, and 0.181 for the Clinical model (Fig. 5c and Table 2). On the testing data of the main cohort, cRadiomics achieved an AUC of 0.811 (95% CI: 0.755–0.861), compared to 0.744 (0.683–0.801) and 0.706 (0.641–0.764) for Radiomics and the Clinical model, respectively (ps < 0.01) (Fig. 5b and Table 3). For calibration in this context, compared to the other models, cRadiomics exhibited a lower calibration slope (0.811 vs 0.832 and 0.960) and a higher intercept (0.108 vs 0.102 and 0.085), whereas its Brier score was higher (0.183 vs 0.192 and 0.204) (Fig. 5d and Table 3). Boxplots of the AUCs yielded by multiple runs of FFCV of each model demonstrated the superior performance of cRadiomics over the others (Fig. S1).Fig. 5ROC and calibration curves of various models in the internal validation: a, c Present respectively the ROC and calibration curves on the training data of the main cohort (the sample size of the training data is 180 × 20, i.e., the number of patients for four training folds in each FFCV multiplied by the number of FFCV runs). b, d Correspond respectively to the ROC and calibration curves on the testing data of the main cohort (the sample size of the testing data is 44 × 20, i.e., the number of patients for one testing fold in each FFCV multiplied by the number of FFCV runs). The circle points in a denote the cutoff points identified according to the Youden index. ROC, receiver-operating-characteristic; AUC, area under the ROC curve; cRadiomics, clinical and radiomics feature-combined modelTable 2Predictive performances of various models on the training data of the main cohort during internal validationModelACC (95% CI)SEN (95% CI)SPE (95% CI)AUC (95% CI)Cutoff valueCalib slopeCalib interceptBrier scoreClinical model0.684 (0.679–0.690)0.648 (0.640–0.655)0.745 (0.737–0.753)0.749 (0.744–0.754)0.4540.9080.0550.181Radiomics model0.714 (0.708–0.719)0.691 (0.684–0.698)0.751 (0.742–0.759)0.784 (0.779–0.789) p < 0.0010.4430.9500.0230.195cRadiomics model0.765 (0.760–0.770)0.745 (0.738–0.751)0.799 (0.791–0.806)0.841 (0.836–0.845) p < 0.0010.4410.9740.0150.16395% CIs were computed using the Clopper–Pearson method. p values measure the differences of AUC values with that of the Clinical model as baseline. Cutoff values were derived from the optimal cutoff point identified based on the Youden index of the ROC curves on the training data. Calibration (Calib) slope and intercept were derived by line fitting to the sample points of the calibration curveACC accuracy, SEN sensitivity, SPE specificity, AUC area under the receiver operating characteristic curve, CI confidence interval, cRadiomics clinical and radiomics feature-combined modelTable 3Predictive performances of various models on the testing data of the main cohort during internal validationModelACC (95% CI)SEN (95% CI)SPE (95% CI)AUC (95% CI)Cutoff valueCalib slopeCalib interceptBrier scoreClinical model0.663 (0.599–0.727)0.679 (0.594–0.761)0.641 (0.537–0.738)0.706 (0.641–0.764)0.4540.8600.0850.204Radiomics model0.674 (0.608–0.735)0.674 (0.586–0.754)0.675 (0.570–0.766)0.744 (0.683–0.801) p < 0.010.4430.8320.1020.192cRadiomics model0.735 (0.674–0.793)0.738 (0.652–0.810)0.731 (0.625–0.813)0.811 (0.755–0.861) p < 0.010.4410.8110.1080.18395% CIs were computed using the Clopper–Pearson method. p values measure the differences of AUC values with that of the Clinical model as baseline. Cutoff values were derived from the optimal cutoff point identified based on the Youden index of the ROC curves on the training data of the main cohort. Calibration (Calib) slope and intercept were derived by line fitting to the sample points of the calibration curveACC accuracy, SEN sensitivity, SPE specificity, AUC area under the receiver operating characteristic curve, CI confidence interval, cRadiomics clinical and radiomics feature-combined model

In external validation, cRadiomics achieved an AUC of 0.820 (95% CI: 0.765–0.869) on the training data of the main cohort, which was significantly higher than the other two models [0.790 (0.731–0.842) and 0.737 (0.674–0.793) for Radiomics and the Clinical model, respectively, ps < 0.05] (Fig. 6a and Table 4). Regarding calibration in this comparison, cRadiomics presented a calibration slope of 1.016, an intercept of 0.010, and a Brier score of 0.184, which were better than those of the other models, specifically 0.952, 0.066, and 0.210 for Radiomics, and 0.908, 0.084, and 0.224 for the Clinical model (Fig. 6c and Table 4). For testing on the external cohort, cRadiomics achieved an AUC of 0.784 (95% CI: 0.644–0.880), which was also significantly higher than those achieved by the other two models [AUCs: 0.724 (0.584–0.835) and 0.696 (0.564–0.820) for Radiomics and the Clinical model, respectively, ps < 0.05] (Fig. 6b and Table 5). For calibration in this context, cRadiomics presented a calibration slope of 0.784, an intercept of 0.122, and a Brier score of 0.250, which were more favorable than those of the other two [0.446, 0.294, and 0.285 for Radiomics, and 0.772, 0.211, and 0.336 for the Clinical model (Fig. 6d and Table 5).Fig. 6ROC and calibration curves of various models in the external validation: a, c show respectively the ROC and calibration curves on the training data of the main cohort (the sample size of the training data is 224, i.e. the total number of patients in the main cohort); b, d show respectively to the ROC and calibration curves on the testing data of the external cohort (the sample size of the testing data is 54, i.e. the total number of patients in the external cohort). The circle points in a denote the cutoff points identified according to the Youden index. ROC, receiver-operating-characteristic; AUC, area under the ROC curve; cRadiomics, clinical and radiomics feature-combined modelTable 4Predictive performances of various models on the training data of the main cohort during external validationModelACC (95% CI)SEN (95% CI)SPE (95% CI)AUC (95% CI)Cutoff valueCalib slopeCalib interceptBrier scoreClinical model0.708 [159/224] (0.646–0.768)0.690 [89/129] (0.603–0.768)0.738 [70/95] (0.636–0.822)0.737 (0.674–0.793)0.4640.9080.0840.224Radiomics model0.737 [165/224] (0.674–0.793)0.755 [97/129] (0.668–0.824)0.708 [67/95] (0.603–0.794)0.790 (0.731–0.842) p < 0.050.4280.9520.0660.210cRadiomics model0.766 [172/224] (0.707–0.821)0.769 [99/129] (0.685–0.837)0.762 [72/95] (0.659–0.840)0.820 (0.765–0.869) p < 0.010.4511.0160.0100.184Data in brackets denote the raw number. 95% CIs were computed using the Clopper–Pearson method. p values measure the differences of AUC values with the one of the Clinical model as baseline. Cutoff values were derived from the optimal cutoff point identified by maximizing the Youden index of the ROC curves yielded by training on the main cohort for external validation. Calibration (Calib) slope and intercept were derived by line fitting to the sample points of the calibration curvecRadiomics, clinical and radiomics feature-combined modelACC accuracy, SEN sensitivity, SPE specificity, AUC area under the receiver-operating-characteristic curve, CI confidence intervalTable 5Predictive performances of various models on the testing data of the external cohort during external validationModelACC (95% CI)SEN (95% CI)SPE (95% CI)AUC (95% CI)Cutoff valueCalib slopeCalib interceptBrier scoreClinical model0.695 [38/54] (0.564–0.820)0.667 [25/38] (0.486–0.804)0.765 [12/16] (0.476–0.927)0.696 (0.564–0.820)0.4640.7720.2110.336Radiomics model0.678 [37/54] (0.544–0.805)0.667 [25/38] (0.486–0.804)0.706 [11/16] (0.413–0.890)0.724 (0.584–0.835) p < 0.050.4280.4460.2940.285cRadiomics model0.763 [41/54] (0.624–0.865)0.762 [29/38] (0.598–0.886)0.765 [12/16] (0.476–0.927)0.784 (0.644-0.880) p < 0.050.4510.7840.1220.250Data in brackets denote the raw number. 95% CIs were computed using the Clopper–Pearson method. p values measure the differences of AUC values with that of the Clinical model as baseline. Cutoff values were derived from the optimal cutoff point identified by maximizing the Youden index of the ROC curves yielded by training on the main cohort for external validation. Calibration (Calib) slope and intercept were derived by line fitting to the sample points of the calibration curveACC accuracy, SEN sensitivity, SPE specificity, AUC area under the receiver-operating-characteristic curve, CI confidence interval, cRadiomics clinical and radiomics feature-combined model

In addition to the highest AUC values and promising calibrations, cRadiomics achieved an accuracy of 0.735 (95% CI: 0.674–0.793) in the testing data of the main cohort and an accuracy of 0.763 (0.634–0.864) in the testing data of the external cohort, with a sensitivity of 0.738 (0.652–0.810) and 0.762 (0.605–0.879), and a specificity of 0.731 (0.625–0.813) and 0.765 (0.501–0.932), respectively (Tables 3 and 5). These findings suggest that the model not only accurately distinguishes between different cases but also maintains robust and reliable predictive power across different data subsets, indicating its potential for clinical application in improving clinical decision-making and enabling personalized follow-up management.

Discussion

Despite complete resection of hepatic lesions in CRC patients, significant heterogeneity remains in clinical outcomes, particularly in recurrence and survival rates. More than half of patients develop recurrence during follow-up, with the majority occurring within the first two years [30, 31]. The Clinical Risk Score (CRS) is one of the common prognostic models for predicting long-term survival in patients with CRLM [21]. Other prognostic systems, such as Tumor Burden Score, Genetic and Morphological Evaluation score, and Comprehensive Evaluation of Relapse Risk score, incorporate a variety of clinical and molecular markers, including bilobar liver metastases, CEA and CA19-9 levels, and KRAS, NRAS, BRAF mutations [32–34]. However, as more prognostic factors are continuously identified, relying solely on these systems to assess intrahepatic recurrence risk becomes increasingly inadequate.

In this study, we developed and validated a machine learning model based on clinicopathologic parameters and radiomics features derived from liver metastases, peritumoral regions, and grossly disease-free liver parenchyma on preoperative grayscale ultrasound images. The model demonstrated promising classification and calibration performances in predicting intrahepatic recurrence within two years after curative liver resection of CRLM. In addition to the lesion, the tumor margin could reveal biological and pathological characteristics of the tumor. A 2017 international consensus guideline emphasizes that histopathologic growth patterns of CRLM—desmoplastic, pushing, and replacement—carry significant prognostic and predictive value [35]. Desmoplastic growth pattern responds well to chemotherapy, while replacement growth pattern shows a poor response to systemic treatment, suggesting a less favorable prognosis. However, definitive evaluation of the growth patterns primarily relies on postoperative histopathological assessment. Some studies achieved a high accuracy in predicting the growth patterns by applying radiomics on MRI images of CRLMs [36, 37]. Importantly, research indicated that there was a strong correlation between intrahepatic micrometastases and poorer clinical outcomes [38]. Micrometastases can lead to intrahepatic hemodynamic alterations, but are often undetectable by conventional imaging techniques. CT or MRI texture analysis is emerging as a non-invasive technique with the potential to detect subtle changes caused by micrometastases or occult CRLMs [39–41]. Similarly, in our study, two peritumoral texture features (cluster prominence, zone nonuniformity) and two liver parenchyma texture features (small area emphasis, uniformity), derived from grayscale ultrasound images, were identified to be associated with intrahepatic recurrence.

Preoperative chemotherapy and the use of targeted drugs were identified as risk factors for intrahepatic recurrence by the proposed model. It is likely due to the higher tumor burden in these patients, as univariable analysis revealed that a tumor number ≥ 3, pathological lymph node positivity, and bilobar liver metastases were significantly associated with intrahepatic recurrence. Notably, chemotherapy-related liver injuries, such as steatohepatitis and sinusoidal obstruction, can alter liver parenchyma, leading to a heterogeneous appearance on CT scans that may obscure the detection of recurrent lesions [42]. This complicates precise lesion detection in preoperative imaging and accurate localization during surgery [43].

Although we have developed a promising model to predict intrahepatic recurrence within two years after curative resection of CRLM, our study has several limitations. First, as a retrospective study, it is subject to unavoidable selection bias, which may influence the reliability of data analysis. Second, not all tumors visualized through ultrasound imaging exhibit distinct and well-defined margins, and this may lead to a reduced number of radiomics features being extracted. But ultrasound is a convenient, non-invasive, and cost-effective modality for evaluating liver lesions compared to CT or MRI. Third, the extraction of radiomics features can be influenced by various factors, including the type and setting of examination equipment, as well as the data analysis algorithms employed. Therefore, prospective, multicenter studies are necessary to validate our findings and further establish their relevance in clinical practice.

In conclusion, the cRadiomics model, integrating ultrasound radiomics and clinicopathological data, improved the prediction of intrahepatic recurrence within two years for colorectal liver metastases after curative hepatectomy, and it holds great potential to improve clinical decision-making and enable personalized management and risk-adapted follow-up for CRLM patients.

Supplementary information

ELECTRONIC SUPPLEMENTARY MATERIAL

Bibliography4

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Weiser MR, Jarnagin WR, Saltz LB (2013) Colorectal cancer patients with oligometastatic liver disease: What is the optimal approach? Oncology (Williston Park) 27:1074–107824575534 · pubmed ↗
2Fernandez FG, Drebin JA, Linehan DC, Dehdashti F, Siegel BA, Strasberg SM (2004) Five-year survival after resection of hepatic metastases from colorectal cancer in patients screened by positron emission tomography with F-18 fluorodeoxyglucose (FDG-PET). Ann Surg 240:438–44710.1097/01.sla.0000138076.72547.b 1PMC 135643415319715 · doi ↗ · pubmed ↗
3Song C, Li W, Cui J et al (2024) Pre-operative prediction of histopathological growth patterns of colorectal cancer liver metastasis using MRI-based radiomic models. Abdom Radiol (NY) 49:4239–424810.1007/s 00261-024-04290-z 39069557 · doi ↗ · pubmed ↗
4Li H, Shi M, Long X et al (2024) Contrast-enhanced intraoperative ultrasound improved hepatic recurrence-free survival in initially unresectable colorectal cancer liver metastases. Dig Liver Dis. 10.1016/j.dld.2024.09.00910.1016/j.dld.2024.09.00939343654 · doi ↗ · pubmed ↗