Radiomics for Predicting the Efficacy of Immunotherapy in Hepatocellular Carcinoma: A Systematic Review and Radiomics Quality Score Assessment

Ruixin Zhang; Chengjie Zhang; Yi Liu; Zhiguo Gui; Anhong Zhang

PMC · DOI: 10.3390/cancers18020186 · January 6, 2026

Open Access

TL;DR

This review explores how radiomics can predict immunotherapy success in liver cancer, highlighting the need for better standardization and data sharing.

Contribution

The paper systematically evaluates radiomics models for immunotherapy prediction in HCC and identifies key methodological gaps.

Findings

01. Radiomics models perform better for short-term responses than long-term outcomes in HCC immunotherapy.

02. Combining radiomic features with clinical data improves prediction accuracy.

03. Standardization and open data sharing are critical for clinical translation of radiomics.

Abstract

Radiomics shows strong potential to predict immunotherapy efficacy in hepatocellular carcinoma, whether used alone or with immune checkpoint inhibitors. Current models perform better for short-term responses (mRECIST/RECIST 1.1) than for long-term outcomes (overall survival/progression-free survival). Integrating radiomic features with clinical characteristics markedly improves prediction. Major challenges persist: heterogeneous imaging and protocols, limited external generalizability, weak biological interpretability, suboptimal clinical applicability, and poor data sharing. This review synthesizes current evidence and recommends prioritizing standardization, multimodal and clinical data fusion, prospective multicenter validation, and the adoption of open, FAIR-compliant datasets to facilitate the translation of radiomics into reliable decision-support tools for personalized…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species (1)

Homo sapiens (human · species)

Diseases (3)

hepatocellular carcinoma · HCC · tumor

Funding (1)

National Natural Science Foundation of China

Keywords

hepatocellular carcinoma; immunotherapy; radiomics; treatment outcome; radiomics quality scoring; systematic review

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Hepatocellular Carcinoma Treatment and Prognosis · Cancer Immunotherapy and Biomarkers

Full text

1. Introduction

Primary liver cancer (PLC) is the sixth most common malignancy worldwide and the third leading cause of cancer-related death [1]. Hepatocellular carcinoma (HCC), the main histological subtype of PLC, accounts for about 80% of cases [2]. Due to its insidious onset, most patients with HCC are diagnosed at advanced stages, when curative surgical resection is no longer possible. Systemic therapies, particularly immune checkpoint inhibitors (ICIs), have become central to the treatment of unresectable HCC [3,4]. In 2017, nivolumab became the first ICI approved by the FDA for HCC, as a second-line therapy, marking the start of the immunotherapy era [5]. In July 2019, the FDA granted breakthrough therapy designation to the combination of the PD-1 inhibitor pembrolizumab with lenvatinib as a first-line treatment for unresectable HCC [6]. Since then, combined approaches that integrate locoregional and systemic therapies have increasingly been used in clinical practice [7]. Nonetheless, due to the marked biological heterogeneity of HCC, only a subset of patients responds favorably to immunotherapy. With ICI monotherapy, durable objective responses are achieved in only 15–20% of patients [8]. The subsequent development of combination strategies, such as ICIs with molecularly targeted agents, and their further integration with locoregional treatments, has improved response rates and survival outcomes in patients with HCC. Nevertheless, the therapeutic benefit of these regimens is still confined to a limited proportion of patients [9]. Although multiple immunotherapy combinations are now available, non-responders face substantial challenges, including increased medical costs, a higher risk of severe adverse events, and the possible loss of the optimal treatment window for other effective therapies. Therefore, accurately identifying patients who are most likely to benefit from specific regimens is crucial for advancing precision and individualized treatment.

Currently, no reliable biomarkers are available to accurately predict the efficacy of immunotherapy in patients with HCC. Proposed biomarkers include programmed death-ligand 1 (PD-L1) expression [10], tumor-infiltrating lymphocytes [11], tumor mutation burden (TMB) [12], and the expression of specific genes or signaling pathways. However, these tissue-based biomarkers require invasive biopsy to obtain tumor samples and cannot fully capture the spatial and temporal heterogeneity of tumors. Radiomics, as a new artificial intelligence technology, enables the extraction of high-throughput quantitative features from medical images such as CT and MRI, transforming them into analyzable data for the non-invasive assessment of tumor heterogeneity [13]. Radiomics has already been applied to the differential diagnosis, molecular subtyping, and prognostic evaluation of HCC [14].

Due to marked tumor heterogeneity, the clinical benefit of ICIs in HCC varies substantially among patients, and robust, reproducible, and scalable biomarkers for treatment response remain lacking. Radiomics enables non-invasive, quantitative characterization of tumor phenotypes and has emerged as a promising approach for predicting therapeutic responses. However, current radiomics research faces challenges in reproducibility and generalizability across scanners and institutions, methodological transparency (e.g., incomplete reporting of image acquisition, segmentation, and feature extraction), and biological interpretability, all of which may hinder clinical translation. In recent years, several studies have developed CT- or MRI-based radiomics models to predict the efficacy of immunotherapy in patients with HCC. However, no systematic review has yet been conducted to integrate and evaluate their overall performance and methodological quality. Therefore, this study aims to systematically summarize the available evidence to assess the predictive performance of radiomics models for immunotherapy efficacy in HCC and to evaluate the reporting quality of these studies. Collectively, our findings provide evidence-based support for the clinical translation of radiomics in precision immunotherapy for HCC.

2. Methods

This systematic review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [15] and the PRISMA 2020 checklist. Figure 1 presents a schematic overview of a typical CT/MRI-based radiomics pipeline for predicting immunotherapy outcomes in HCC.

2.1. Search Strategy

A comprehensive literature search was performed in PubMed, Web of Science, Embase, and the Cochrane Library to identify studies published from database inception to 21 June 2025. The search strategy combined Medical Subject Headings (MeSH) and free-text terms, including “Radiomics”, “Hepatocellular Carcinoma”, and “Immune Checkpoint Inhibitor”. Detailed search strategies for each database are provided in the Supplementary Materials (Supplementary Table S1). In addition, the reference lists of the included articles were screened to identify any other relevant studies.

2.2. Study Selection

All retrieved studies were imported into EndNote version X9 for management. After removing duplicates, two independent researchers screened the titles and abstracts to preliminarily exclude irrelevant articles. The full texts of the remaining studies were then obtained and reviewed according to predefined inclusion and exclusion criteria. Any disagreements during the screening process were resolved through consultation with the corresponding author. Given the high dimensionality and multi-step nature of radiomics modeling, extremely small cohorts are highly susceptible to overfitting and feature instability and frequently lack robust internal and external validation. To mitigate small-sample biases and prevent overly optimistic performance estimates, we pre-specified a minimum sample size (n ≥ 50).

Eligibility criteria and the selection process are summarized in the PRISMA flow chart (Figure 2). We included original studies involving histologically or clinically confirmed HCC patients treated with ICI-based therapy that utilized pretreatment CT or MRI radiomics to predict immunotherapy outcomes. We excluded studies focusing exclusively on molecular or radiogenomic features rather than clinical efficacy endpoints.

2.3. Data Extraction

The following information was extracted from the original studies: (1) Study characteristics: first author, country, publication year, study design, and sample size; (2) Patient characteristics: age, gender, treatment regimen received, and predicted outcome indicators; (3) Radiomics process characteristics: imaging modality, region of interest (ROI) segmentation method, dimension reduction techniques for radiomic features, and algorithms used to construct the model; and (4) Predictive performance indicators of radiomics models: area under the curve (AUC) and concordance index (C-index). In cases where multiple models were constructed in a study, the model with the highest AUC or C-index value from the validation cohort was selected. Given the significant heterogeneity in the design of these studies, this review does not perform a meta-analysis to summarize AUC, C-index, and other indicators. Instead, this study provides a categorized description based on the type of immune-combination therapy and the different efficacy evaluation metrics.

2.4. Quality Assessment

For the research quality assessment, two tools were employed: the radiomics quality score (RQS) (https://www.radiomics.world/rqs, accessed on 22 June 2025) and the METhodological Radiomics Quality Score (METRICS) (https://metricsscore.github.io/metrics/METRICS.html, accessed on 22 June 2025). The RQS, proposed by Lambin et al. in 2017, consists of 16 items with a total score range of 0–36 [13]. It evaluates six dimensions: image protocol, feature extraction, data analysis and statistics, model development and validation, clinical applicability, and open science. The RQS is a classical tool widely used to assess the methodological quality of radiomics models; however, it has limitations in interpretability and applicability for deep learning studies. To address these limitations, the European Society of Medical Imaging Informatics introduced METRICS [16]. This tool includes 30 items (+5 conditional items) across nine categories: study design, imaging data, segmentation, image processing and feature extraction, feature processing, preparation for modeling, metrics and comparison, testing, and open science. Two researchers independently applied RQS and METRICS to evaluate the quality of the included radiomics studies. Each reviewer assigned individual scores, and any discrepancies were resolved through discussion or consultation with senior researchers.

3. Results

3.1. Literature Search

A total of 144 relevant publications were identified in the initial literature search, with an additional 3 articles obtained from the references of related studies. After removing 66 duplicates, 54 articles were excluded based on title and abstract screening due to irrelevance or because they were conference abstracts or reviews. Full texts of the remaining 24 articles were then reviewed, and 13 were excluded for not meeting the inclusion criteria. Ultimately, 11 studies were included in this review. The study selection process is presented in Figure 2.

3.2. Overall Characteristics of Included Studies

This systematic review included 11 studies [17,18,19,20,21,22,23,24,25,26,27], of which 10 (90.9%) were conducted in China. All studies were retrospective and published between 2021 and 2025, with six being multicenter studies. A total of 2014 patients were included, with the majority being male (87.3%) and the average age ranging from 47 to 67 years. Based on the immunotherapy regimens administered to patients, the studies were categorized into three groups: ICI monotherapy (1/11), ICIs combined with molecular targeted agents (6/11), and ICIs combined with molecular targeted agents plus locoregional treatments (4/11). Regarding outcome measures, seven studies predicted treatment response, while the remaining four predicted only OS and/or PFS. The characteristics of the included studies are summarized in Table 1.

3.3. Methodological Quality Assessment

The detailed RQS results for each study are presented in Figure 3. The median RQS across all studies was 15 (range: 11–19), corresponding to a median percentage of 41.7% (range: 30.6–52.8%). No study scored on the following four items: phantom study, imaging at multiple timepoints, prospective study registered in a trial database, and cost-effectiveness analysis. Only one study examined the correlation between radiomic features and biological characteristics [24]. Conversely, the studies performed well on the following four items: multiple segmentations, feature reduction or adjustment for multiple testing, multivariable analysis with non-radiomics features, and comparison to a gold standard, with average scores exceeding 90%.
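The percentages reported above follow from dividing each raw total by the 36-point RQS maximum given in the Methods. The short sketch below reproduces that arithmetic; the function name `rqs_percent` is illustrative and is not part of any RQS tool.

```python
RQS_MAX = 36  # maximum total RQS as stated in Section 2.4 (score range 0-36)

def rqs_percent(score: int) -> float:
    """Express a raw RQS total as a percentage of the 36-point maximum."""
    return round(100.0 * score / RQS_MAX, 1)

# Median and range reported in this review
for label, score in [("median", 15), ("minimum", 11), ("maximum", 19)]:
    print(f"{label}: {score}/{RQS_MAX} = {rqs_percent(score)}%")
```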

The detailed METRICS results for each study are presented in Figure 4. The median METRICS score across all studies was 72.5% (range: 56.0–79.5%). For each dimension, the proportion of “yes” responses was calculated for individual items. The open science dimension had the lowest proportion, at 6.1%, with only two studies providing accessible code [24,25]. In contrast, the dimensions of image processing and feature extraction, feature processing, and preparation for modeling performed well, with more than 70% of studies answering “yes” for items in these categories. A detailed assessment of each study by item and dimension is provided in Supplementary Tables S2 and S3.

3.4. Characteristics of the Radiomics Model Pipeline

The feature extraction parameters and validation methods are summarized in Table 2. The workflow for constructing radiomics models generally involves five main steps: image acquisition, image segmentation, feature extraction, feature selection and dimension reduction, and model development. (1) Image acquisition: Seven studies used CT-derived features to build predictive models, while four studies relied on MRI images. Most studies employed multiphase imaging, with only one study using a single-phase image (portal venous phase, PVP) [23]. Additionally, the majority of studies (7/11) utilized multiple imaging devices for acquisition (Supplementary Table S4). (2) Image segmentation: Ten studies performed ROI segmentation, with seven using manual segmentation and three employing semi-automated methods. (3) Feature extraction: Pyradiomics was the most commonly used extraction tool (7/11). Across the studies, the median number of extracted features was 2236 (range: 428–3376). (4) Feature selection and dimension reduction: Six studies assessed feature reproducibility using the intraclass correlation coefficient (ICC) before feature selection, only retaining features with ICC values above a predefined threshold (minimum 0.75) for further analysis. The least absolute shrinkage and selection operator (LASSO) was the most commonly used method for feature selection. After dimensionality reduction, the median number of retained features was 10 (range: 5–32). (5) Model development: Most studies (8/11) applied more than one algorithm for model construction, with random forest (RF) (4/11) and support vector machine (SVM) (4/11) being the most frequently used. Notably, nine studies integrated clinical parameters to develop combined clinical–radiomics models (Supplementary Table S5).
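The reproducibility filter in step (4) can be illustrated with a small sketch. The included studies do not all state which ICC form they used, so the example assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement) computed from two readers' segmentations; the feature names and values are invented toy data, and only the 0.75 threshold comes from the text.

```python
from statistics import mean

ICC_THRESHOLD = 0.75  # minimum threshold reported across the included studies

def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per lesion and one column per reader."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-lesion mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical feature values: 5 lesions, each segmented by 2 readers
features = {
    "stable":   [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2], [5.0, 5.0]],
    "unstable": [[1.0, 3.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0], [5.0, 4.0]],
}

# Keep only features whose inter-reader ICC meets the threshold
retained = [name for name, table in features.items()
            if icc_2_1(table) >= ICC_THRESHOLD]
print(retained)
```

Features that pass this filter would then go on to a selector such as LASSO; that step is omitted here to keep the sketch dependency-free.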

3.5. Performance of Radiomics Models in Predicting Treatment Response

Seven studies evaluated radiomics models for predicting treatment response to immunotherapy in HCC patients [17,18,19,20,22,26,27] (Table 3). Among these, one study focused on predicting treatment response to ICI monotherapy, assessed using the mRECIST criteria, and reported AUC values of 0.894 (95% CI: 0.797–0.991) and 0.883 (95% CI: 0.716–0.998) in the training and internal validation cohorts, respectively [17]. Three studies evaluated the performance of radiomics models for predicting treatment response to ICIs combined with molecular targeted therapy, assessed using the RECIST 1.1 criteria. AUC values ranged from 0.886 to 0.956 in training sets and from 0.792 to 0.802 in internal validation sets. Notably, a clinical–radiomics model integrating five clinical factors with imaging features demonstrated robust performance, achieving AUC values of 0.987 (95% CI: 0.968–1.000) and 0.884 (95% CI: 0.762–1.000) in the training and external validation cohorts, respectively [19]. In addition, three studies evaluated the utility of radiomics for predicting treatment response to ICIs combined with both molecular targeted therapy and locoregional therapy, assessed using the mRECIST criteria. Radiomics models achieved AUC values of 0.877–0.920 in training sets and 0.721–0.790 in internal validation sets. Clinical–radiomics models demonstrated superior performance, with AUC values of 0.950–0.960 in training sets and 0.840–0.850 in internal validation sets.
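For readers less familiar with the AUC, it can be read as the probability that a randomly chosen responder receives a higher model score than a randomly chosen non-responder. The sketch below computes it by direct pairwise comparison; the labels and scores are invented toy values, not data from any included study.

```python
def auc(labels, scores):
    """AUC as the fraction of responder/non-responder pairs ranked correctly.

    labels: 1 = responder, 0 = non-responder; ties in score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical radiomics scores for 3 responders and 4 non-responders
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(round(auc(labels, scores), 3))  # 11 of 12 pairs ranked correctly
```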

3.6. Performance of Radiomics Models in Predicting OS

Four studies evaluated the performance of radiomics models in predicting OS in HCC patients [18,21,23,24] (Table 4). In three studies focusing on ICIs combined with molecular targeted therapy, radiomics models achieved C-index values of 0.76–0.77 in training cohorts, 0.70 in internal validation cohorts, and 0.63–0.69 in external validation cohorts. Of note, incorporating clinical factors to develop clinical–radiomics models significantly enhanced predictive performance, with C-index values increasing to 0.78–0.82 in training cohorts, 0.82 in internal validation cohorts, and 0.67–0.74 in external validation cohorts. For OS prediction in patients receiving systemic therapy combined with locoregional therapy, the radiomics model yielded a C-index of 0.838 (95% CI: 0.806–0.870) in the training cohort and 0.817 (95% CI: 0.748–0.886) in the internal validation cohort. After integrating key clinical factors, including albumin–bilirubin (ALBI) grade and portal vein tumor thrombus (PVTT), model performance improved further, achieving C-index values of 0.867 (95% CI: 0.839–0.898) in the training cohort and 0.840 (95% CI: 0.782–0.897) in the validation cohort [21].
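The C-index generalizes the AUC to censored survival data: among pairs of patients whose ordering in time is known, it is the share in which the patient with the shorter survival was assigned the higher risk. A minimal sketch of Harrell's formulation follows, with invented follow-up data; tied survival times are simply skipped, which is a simplification.

```python
def c_index(times, events, risks):
    """Harrell's C-index.

    times:  follow-up time per patient
    events: 1 = death observed, 0 = censored
    risks:  model-predicted risk (higher = worse expected survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i died before patient j's
            # last follow-up; censored patients cannot anchor a pair.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical cohort of 5 patients (times in months)
times = [5, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.6, 0.7, 0.3, 0.2]
print(c_index(times, events, risks))  # 7 of 8 comparable pairs concordant
```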

3.7. Performance of Radiomics Models in Predicting PFS

A total of four studies evaluated the performance of radiomics models in predicting PFS in HCC patients [23,24,25,26] (Table 5). Three studies focused on patients receiving ICIs combined with molecular targeted agents. In these studies, radiomics models achieved C-index values ranging from 0.67 to 0.837 in training sets, 0.64–0.830 in internal validation sets, and 0.54–0.66 in external validation sets. Clinical–radiomics combined models showed improved performance, with C-index values of 0.70–0.846 in training sets, 0.68–0.845 in internal validation sets, and 0.59–0.69 in external validation sets. Only one study developed a radiomics model to predict PFS in patients receiving systemic therapy combined with locoregional treatment. The initial radiomics model achieved a C-index of 0.59, which improved significantly to 0.75 after incorporating key clinical factors [26].

4. Discussion

Currently, no accurate and reliable biomarkers are available in clinical practice to guide individualized precision immunotherapy for HCC. Increasing evidence suggests that radiomics may provide valuable insights into tumor heterogeneity and help predict response and outcomes to immunotherapy [28,29,30]. This systematic review indicates that pretreatment CT/MRI-based radiomics models show overall promise for predicting immunotherapy outcomes in HCC, particularly for short-term responses, and model performance is generally improved when clinical variables are integrated. However, the evidence base remains constrained by substantial clinical and methodological heterogeneity, limited evaluation of long-term endpoints (OS/PFS), and a consistent gap between training and validation performance, highlighting concerns regarding generalizability. Quality appraisal using the RQS and METRICS further suggests that current studies have methodological limitations, with recurring shortcomings in external validation, prospective design, and transparency. Collectively, these findings support radiomics as a candidate imaging biomarker while underscoring the need for standardized workflows and geographically diverse, multicenter prospective validation prior to clinical adoption.

This systematic review synthesizes the available evidence on radiomics models based on pretreatment CT or MRI imaging for predicting immunotherapy efficacy in HCC. The 11 included studies demonstrated substantial heterogeneity in study design, particularly regarding the types of immunotherapy regimens and the metrics used to evaluate effectiveness. Currently, a variety of immunotherapy approaches are employed in clinical practice for HCC treatment [31]. The immunotherapy regimens in the included studies can be grouped into three categories: ICI monotherapy, ICIs combined with molecular targeted therapy, and ICIs combined with molecular targeted therapy plus locoregional therapy. The metrics used to evaluate immunotherapy efficacy also varied across studies. Primary endpoints included OS, PFS, and treatment response assessed using the RECIST 1.1 or mRECIST criteria. While RECIST 1.1 and mRECIST focus on short-term responses, OS and PFS reflect long-term outcomes. Predictive biomarkers for long-term efficacy are particularly valuable, as they can help clinicians make informed treatment decisions [32]. However, relatively few studies have evaluated time-to-event endpoints in HCC patients receiving ICI monotherapy. It is also important to note that immunotherapy can produce delayed responses or pseudoprogression, meaning that traditional solid tumor evaluation criteria like RECIST may not fully capture therapeutic effects. The recently proposed iRECIST criteria were specifically designed to evaluate immunotherapy responses, but none of the included studies applied this framework.

4.1. Study Heterogeneity and Predictive Performance

Owing to substantial clinical and methodological heterogeneity among the included studies, a pooled analysis of effect sizes was not conducted. Only one eligible study evaluated radiomics for predicting outcomes under ICI monotherapy [17]; these findings should therefore be considered hypothesis-generating rather than generalizable. This limitation likely reflects current clinical practice in HCC, where the limited efficacy of single-agent ICIs has led to a shift toward combination regimens [3]. In studies predicting immunotherapy response, models for HCC patients receiving ICIs combined with molecular targeted therapy demonstrated excellent performance in training cohorts, with AUC values ranging from 0.880 to 0.956. For patients treated with ICIs combined with both molecular targeted therapy and locoregional therapy, models achieved AUC values of 0.877–0.920. Corresponding validation sets showed lower AUC values of 0.792–0.820 and 0.721–0.790, respectively. These findings suggest some degree of overfitting, although the models generally maintained good discriminative performance. The predictive performance of models for patients receiving combined locoregional therapy was relatively lower, which may reflect the increased difficulty in assessing efficacy due to the complexity of combination regimens. Regarding long-term efficacy prediction, studies remain limited, likely due to the need for extended follow-up and higher research costs. Across training cohorts of HCC patients treated with ICI-based combination regimens, radiomics models predicting OS and PFS achieved C-index values of 0.76–0.838 and 0.59–0.837, respectively, while validation cohorts yielded C-index values of 0.63–0.817 and 0.54–0.830.

A comparison across endpoints indicated that predictive performance for long-term efficacy indicators was generally lower than that for short-term outcomes, reflecting the inherent challenges of long-term prognosis prediction. Despite variability in immunotherapy regimens and efficacy evaluation metrics across studies, the overall findings support the good discriminative ability of radiomics models in predicting HCC immunotherapy efficacy, suggesting their potential clinical utility. Furthermore, integrating clinical factors consistently improved model performance, highlighting the added value of combining clinical information with radiomic features.

4.2. Methodological Quality Assessment (RQS and METRICS)

Next, we assessed the methodological quality of the included studies using two tools: the RQS and METRICS. The RQS assessment revealed a median score of 15, corresponding to 41.7%, with only one study scoring above 50%. These results indicate that the overall quality of current radiomics models remains suboptimal, with notable deficiencies in areas such as phantom studies, multi-timepoint imaging, prospective study design, cost-effectiveness analysis, biological relevance, and open data sharing. Our findings are consistent with previous systematic reviews evaluating radiomics models for immunotherapy response in lung cancer (median RQS: 11) [28], histopathological grading in HCC (median RQS: 10) [33], and lymph node metastasis in colorectal cancer (median RQS: 18) [34]. To enhance methodological rigor and facilitate clinical translation, future radiomics studies should ensure adherence to several core domains. Specifically, investigators should (1) provide sufficiently detailed reporting of image acquisition and reconstruction protocols to support reproducibility; (2) implement rigorous validation strategies, at least including internal validation and, whenever feasible, independent external validation to establish generalizability; (3) minimize overfitting and data leakage through strict separation between training and validation phases, with feature selection and hyperparameter tuning confined exclusively to the training set; (4) evaluate the reproducibility and robustness of segmentation and radiomic features using quantitative metrics (e.g., intraclass correlation coefficients and/or test–retest or multi-timepoint assessments); and (5) report model performance comprehensively, including both discrimination and calibration, along with transparent disclosure of key methodological parameters.
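Recommendation (3) can be made concrete with a toy experiment. The sketch below runs k-fold cross-validation on pure-noise features twice: once with feature selection performed on the full dataset before splitting (the leak to avoid) and once with selection repeated inside each training fold. Everything here is an illustrative assumption: the mean-difference ranking stands in for LASSO, the nearest-centroid rule stands in for RF or SVM, and the data are randomly generated.

```python
import random
from statistics import mean

def select_top_k(X, y, k):
    """Rank features by |mean(class 1) - mean(class 0)|; return top-k columns.
    A simple stand-in for LASSO or any other data-driven selector."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def nearest_centroid_predict(X_train, y_train, row, cols):
    """Assign the class whose training centroid is closest on the chosen columns."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        members = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [mean(r[j] for r in members) for j in cols]
        dist = sum((row[j] - c) ** 2 for j, c in zip(cols, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, k_features=5, n_folds=5, leak=False):
    """k-fold accuracy; leak=True selects features on ALL data before splitting."""
    indices = list(range(len(X)))
    leaked_cols = select_top_k(X, y, k_features) if leak else None
    correct = 0
    for f in range(n_folds):
        test_idx = indices[f::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Proper variant: selection sees only the training fold
        cols = leaked_cols if leak else select_top_k(X_tr, y_tr, k_features)
        correct += sum(nearest_centroid_predict(X_tr, y_tr, X[i], cols) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Pure-noise "radiomic features": labels carry no real signal
rng = random.Random(0)
n_patients, n_features = 40, 300
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_patients)]
y = [i % 2 for i in range(n_patients)]

print("leaky CV accuracy: ", cv_accuracy(X, y, leak=True))
print("proper CV accuracy:", cv_accuracy(X, y, leak=False))
```

Because the features contain no signal, an honest estimate should sit at roughly chance level, whereas the leaky variant typically reports a more flattering figure. The same principle applies to hyperparameter tuning and to any scaling or harmonization step whose parameters are estimated from data.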

In contrast to the more established RQS, METRICS is a relatively new assessment tool, and its application in radiomics research is still in the early stages. Evaluation using METRICS yielded a median score of 72.5%, indicating higher apparent methodological quality than under the RQS (41.7%), which reflects the distinct scoring framework METRICS uses to assess study rigor.

4.3. Major Limitations of Current Evidence

Through systematic analysis of low-scoring items in both quality assessment tools, we identified the following major limitations in current radiomics research: (1) Heterogeneity in imaging data: None of the included studies adhered to standardized image acquisition protocols, and several studies used scanners from different manufacturers, introducing variability in the raw imaging data. This heterogeneity largely reflects the inherent limitations of retrospective study designs in data selection. Furthermore, no study has explicitly examined how this variability affects radiomic features or subsequent analytical results. (2) Insufficient model generalizability: Most radiomics models were developed using small, highly homogeneous datasets and lacked external validation across multiple centers or diverse populations. Among the eleven included studies, only four performed external validation. Although some studies incorporated multicenter data, limited sample sizes often necessitated pooling the data for model training, making it difficult to evaluate the models’ generalizability in real-world clinical settings. (3) Lack of biological interpretability: Establishing correlations between radiomic features and underlying biological mechanisms would link model decisions to pathophysiological processes, enhancing clinical trust and translational potential. However, only one study examined the biological relevance of radiomic features. (4) Limited clinical usability: Most studies relied on manual or semi-automated image segmentation, which is time-consuming, labor-intensive, and highly dependent on operator expertise, limiting reproducibility and efficiency in routine clinical practice. Furthermore, physiological and technical factors, such as respiratory motion, gastrointestinal peristalsis, and patient positioning, can cause significant liver deformation and signal fluctuations over short periods, resulting in the instability of radiomic features extracted from single scans. No study employed multi-timepoint imaging to ensure feature robustness. While most studies used multiphase images to construct radiomics models, current designs tend to be overly complex. Although this complexity may improve performance, it increases computational burden and costs, with no comparisons to simpler models or cost-effectiveness analysis. (5) Inadequate transparency: The radiomics modeling process is inherently complex, and open access to data and methods is essential for validation, methodological refinement, and clinical translation. Unfortunately, most studies did not provide easily accessible open-source data.

4.4. Clinical Implications and Future Directions

Immunotherapy is a cornerstone of systemic treatment for HCC. Radiomics-based risk stratification could complement conventional clinical variables and biomarker profiles to guide patient selection, treatment escalation or de-escalation, and clinical trial enrichment. If prospectively validated, such models may reduce avoidable treatment-related toxicity and unnecessary financial costs, including drug expenditures and the management of immune-related adverse events. By promptly identifying the individuals most likely to benefit from a given immunotherapy strategy, they may also help clinicians avoid delays that could cause patients to miss the optimal therapeutic window. To accelerate clinical translation, future studies should (1) adopt rigorous designs through preregistered, prospective, multicenter protocols with prespecified endpoints and locked analysis plans to minimize selection bias and analytic flexibility; (2) standardize and quantify imaging variability by harmonizing acquisition parameters and, where feasible, incorporating phantom-based or test–retest scans to characterize scanner-specific effects and inform protocol harmonization; (3) implement multi-timepoint imaging protocols aligned with immunotherapy dynamics, including at minimum a baseline scan within 2 weeks before treatment initiation, an early on-treatment scan coinciding with the first response assessment, and follow-up scans every 8–12 weeks, with an additional confirmation scan 4–8 weeks after immune-unconfirmed progressive disease (iUPD) when immune-adapted criteria are used, thereby enabling delta-radiomics and longitudinal modeling while allowing explicit evaluation of feature stability under motion-related liver deformation; (4) improve usability and efficiency by implementing externally validated, fully automated or strictly standardized semi-automated segmentation workflows and reporting segmentation reliability when manual intervention is required; and (5) demonstrate clinical and economic value by supplementing the AUC with decision curve analysis and conducting model-based cost–utility analyses that compare radiomics-guided treatment allocation with standard-of-care pathways, accounting for treatment costs, adverse-event management, downstream imaging, and quality-adjusted life years (QALYs), while transparently sharing de-identified data, segmentations, and code in accordance with FAIR principles to support independent verification and benchmarking.
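Recommendation (5) asks that the AUC be supplemented with decision curve analysis. As a minimal sketch of what that involves, the snippet below computes the standard net benefit of a prediction model at a chosen threshold probability, alongside the "treat all" reference strategy. The function names, variable names, and toy data are illustrative assumptions and are not taken from any of the reviewed studies.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold.

    Uses the standard decision-curve definition:
        NB = TP/n - FP/n * (pt / (1 - pt))
    where pt is the threshold probability (0 < pt < 1).
    """
    y_true = np.asarray(y_true, dtype=bool)   # True = responder (hypothetical label)
    y_prob = np.asarray(y_prob, dtype=float)  # model-predicted response probability
    n = y_true.size
    treat = y_prob >= threshold
    true_pos = np.sum(treat & y_true)
    false_pos = np.sum(treat & ~y_true)
    odds = threshold / (1.0 - threshold)      # harm-to-benefit weighting at pt
    return true_pos / n - false_pos / n * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = np.mean(np.asarray(y_true, dtype=bool))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example (fabricated numbers for illustration only).
labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.4, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

Plotting these quantities over a range of clinically plausible thresholds yields the decision curve. A model adds value at a given threshold only if its net benefit exceeds both the "treat all" curve and the "treat none" strategy, whose net benefit is zero by definition.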

4.5. Limitations of This Review

This systematic review has several limitations that warrant consideration. First, significant clinical heterogeneity existed among the included studies due to variations in treatment regimens and evaluation criteria. In the current therapeutic landscape of HCC, multiple immunotherapy strategies are used alongside diverse efficacy assessment metrics. Since radiomics-based prediction of immunotherapy response in HCC is still in its early stages, imposing strict restrictions on specific treatment protocols or outcome measures would have reduced the number of eligible studies to fewer than five. Therefore, during study selection, we did not strictly limit the types of immunotherapy regimens or efficacy endpoints; instead, we conducted stratified analyses according to different treatment–outcome combinations. Second, the evidence is geographically concentrated, with most studies conducted in China. This may reflect the high burden of HCC and early, focused investment in radiomics research in this region. Geographic clustering may limit the external validity of the findings, as regional differences in patient demographics, etiology, clinical pathways, and imaging protocols can affect model performance and transportability. Therefore, although the findings are informative given the current evidence base, their generalizability to other regions requires validation through prospective, multicenter studies across diverse populations and clinical settings with harmonized imaging workflows. Finally, this review did not include prediction models based on ultrasound or PET imaging. Ultrasound image quality is highly operator-dependent, and PET scanning is costly and not routinely performed in all HCC patients. In contrast, CT and MRI are integral to the HCC diagnostic and therapeutic workflow. Accordingly, we focused exclusively on radiomics models constructed from CT/MRI images to enhance the generalizability of our findings.

5. Conclusions

Radiomics signatures derived from pretreatment CT or MRI are promising candidate imaging biomarkers for predicting immunotherapy response in hepatocellular carcinoma. However, clinical translation requires geographically diverse, multicenter prospective validation with rigorous external testing across institutions and standardized radiomics pipelines to ensure reproducibility and transportability.
