Quantitative Imaging Advances in HPV-Positive Oropharyngeal Carcinoma

Dermot Farrell; Houda Bahig; Richard Khor; Luiz P. Kowalski; Remco de Bree; Avraham Eisbruch; Heleen Bollen; Fernando Lopez; M. P. Sreeram; Orlando Guntinas-Lichius; Juan P. Rodrigo; Nabil F. Saba; Karthik N. Rao; Sandra Nuyts; Anna Luíza Damaceno Araújo; Alfio Ferlito; Sweet Ping Ng

PMC · DOI: 10.3390/cancers18020303 · January 19, 2026

Open Access

TL;DR

This paper reviews how quantitative imaging techniques like MRI, CT, and PET, combined with AI, can improve care for patients with HPV-positive oropharyngeal cancer by enabling personalized treatment decisions.

Contribution

The paper uniquely maps imaging findings to specific decision points in radiotherapy workflows and highlights requirements for clinical translation.

Findings

01

Quantitative imaging supports risk stratification, treatment adaptation, and surveillance in HPV-positive OPSCC.

02

Standardized reporting and validation are essential for integrating imaging biomarkers into clinical workflows.

03

AI and radiomics show promise but face barriers like limited validation and heterogeneous methods.

Abstract

This review article surveys published research on quantitative imaging for HPV-positive oropharyngeal carcinomas. It examines advances across multiple modalities, including MRI, CT, and PET, and the use of deep learning tools in this field. Progress in this area could change diagnostic workflows for these patients. Unlike prior narrative reviews that have primarily catalogued modality-specific performance, this review focuses on both decision and implementation by mapping results from imaging investigations to decision points in the radiotherapy pathway (diagnosis, treatment adaptation, and post-treatment surveillance). Further, we highlight where evidence is now informing de-escalation and where it remains exploratory. We have summarised technical and validation requirements for embedding these biomarkers into radiotherapy workflows…

Keywords

HPV; oropharyngeal; MRI; CT; PET; radiomics

Taxonomy

Topics: Head and Neck Cancer Studies · Radiomics and Machine Learning in Medical Imaging · Esophageal Cancer Research and Treatment

Full text

1. Introduction

Human papillomavirus (HPV)-positive oropharyngeal squamous cell carcinoma (OPSCC) is biologically distinct from HPV-negative disease and generally carries a more favourable prognosis [1]. This survival advantage has fuelled interest in treatment de-escalation strategies; however, safely individualising therapy depends on robust biomarkers that can identify risk, forecast response early, and guide posttreatment decisions. Quantitative imaging is well positioned to meet this need because it can be acquired non-invasively at scale, repeated longitudinally, and integrated with radiotherapy planning.

Despite the favourable prognosis of HPV-positive OPSCC, clinically meaningful heterogeneity persists: a subset of patients experiences recurrence or treatment-related morbidity that is not well predicted by current staging and clinicopathologic stratification alone. This creates three practical gaps that motivate this review. Firstly, risk categorisation remains coarse—clinicians cannot reliably distinguish those suitable for de-escalation from those who require standard (or intensified) therapy using routinely available variables, particularly when competing objectives include cure, function preservation, and long-term toxicity [2]. Secondly, treatment modification is limited by the lack of validated early-response triggers—during radiotherapy (± systemic therapy), adaptation decisions are often constrained to anatomy- or symptom-driven changes, while biological response may precede visible size change and is rarely quantified in a standardised way [3,4]. Thirdly, posttreatment monitoring is frequently uncertain—equivocal findings can lead to additional imaging, invasive procedures, or delayed salvage, highlighting the need for objective criteria that better triage patients after treatment [5]. Imaging biomarkers are anticipated to address these gaps by providing scalable, longitudinal, non-invasive measurements that can support decision-making at diagnosis, during treatment, and in surveillance—provided they are technically robust and prospectively validated.

Contemporary radiotherapy pathways already rely on multimodality imaging for staging, target delineation, and response assessment. Embedding quantitative readouts within these pathways offers a pragmatic route to precision care: baseline models can refine risk at diagnosis; on-treatment metrics can provide early readouts to trigger adaptive strategies; and posttreatment endpoints can standardise surveillance and triage. Importantly, quantitative methods bring the possibility of objective, reproducible measurements across centres, an important property if predictors are to support de-intensification without compromising disease control.

Quantitative imaging in head and neck oncology is evolving rapidly beyond conventional radiology, driven by advances in machine learning and new sensing modalities. For example, hyperspectral imaging combined with computer-aided diagnostic methods is increasingly being explored to enhance detection and diagnosis in head and neck cancer, reflecting a broader shift towards high-dimensional, algorithm-assisted tissue characterisation [6].

In parallel, contemporary reviews of AI-driven radiomics in head and neck cancer highlight both the breadth of emerging applications and persistent translation barriers—particularly limited external validation, site/scanner variability, and practical workflow integration [7]. Against this backdrop, the key unmet need is not another modality-specific catalogue of model performance but a clinically anchored synthesis focused on HPV-positive OPSCC that maps quantitative imaging biomarkers to real decision points (risk stratification for de-escalation, response-adaptive treatment modification, and posttreatment assessment) while critically evaluating methodological robustness and readiness for multi-centre testing and implementation.

Translation hinges on technical robustness as much as on statistical significance. Feature values vary with acquisition, reconstruction, segmentation policy, and preprocessing; consequently, adherence to consensus definitions and transparent reporting (e.g., exact b-values for diffusion, reconstruction kernels for CT, and response criteria for PET) is central to external validity. Throughout this review, we highlight where the biomarker signal is most consistent, where it attenuates after appropriate adjustment (e.g., for HPV status and stage), and how study design choices (class imbalance handling, leakage-free validation, and multi-centre testing) influence apparent performance.

This review synthesises advances across four complementary pillars:

Diffusion-weighted MRI (DWI) and apparent diffusion coefficient (ADC) mapping: capturing tumour microstructure and early therapy-induced change. Multiple OPSCC cohorts report lower baseline ADC in HPV-positive tumours and robust midtreatment ΔADC (change in ADC) signals that anticipate outcomes [8,9], whereas baseline associations can attenuate after adjusting for HPV status and stage [8,10]. Prospective studies and MR-Linac feasibility work highlight clinically actionable timepoints for adaptive strategies [11,12].
MRI radiomics: converting routine sequences (contrast-enhanced T1-weighted imaging (CE-T1), T2, and ADC) into high-throughput descriptors of phenotype. Across single-centre and multi-centre analyses, diffusion-derived features frequently rank among the most informative for HPV/p16 classification [13], and radiomics augments clinicopathologic models for survival rather than replacing them. Methodological appraisals underline the importance of class-imbalance handling, feature stability, and transparent validation [14,15].
CT radiomics and deep learning: leveraging ubiquitous planning CT. Early texture studies showed reproducible HPV-related signals on CT [16], subsequently validated across centres; nodal features add complementary information to primary tumour features [3]. CT radiomics improves prediction of overall survival, progression-free survival (OS/PFS), and locoregional control when combined with clinical variables, while modern deep learning approaches demonstrate segmentation-free HPV classification and improved extranodal-extension screening on trial data.
FDG-PET/CT metrics and radiomics: quantifying metabolic burden and heterogeneity. Volumetric indices such as metabolic tumour volume (MTV) and total lesion glycolysis (TLG) outperform single-voxel standardised uptake value (SUV) measures for prognosis, including within HPV-positive cohorts. Texture features add signals beyond MTV/TLG, and PERCIST-aligned longitudinal endpoints [17] plus PET-NECK trial evidence [5] support PET-guided post-chemoradiotherapy (CRT) management.
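
To make the volumetric PET indices in the fourth pillar concrete, the following is a minimal sketch of how MTV and TLG relate to voxel-level SUV, using the standard relation TLG = MTV × SUVmean. The function name `pet_volumetrics`, the sample values, and the 41%-of-SUVmax segmentation threshold are illustrative assumptions; the review does not specify a segmentation rule.

```python
def pet_volumetrics(suv_voxels, voxel_volume_ml, threshold_fraction=0.41):
    """Return SUVmax, MTV (mL), SUVmean and TLG for one lesion.

    Voxels at or above threshold_fraction * SUVmax are counted as
    metabolically active. The 41% default is an illustrative assumption,
    not a value taken from the review.
    """
    suv_max = max(suv_voxels)
    cutoff = threshold_fraction * suv_max
    active = [v for v in suv_voxels if v >= cutoff]
    mtv = len(active) * voxel_volume_ml      # metabolic tumour volume
    suv_mean = sum(active) / len(active)     # mean SUV inside the MTV
    tlg = mtv * suv_mean                     # total lesion glycolysis
    return {"suv_max": suv_max, "mtv_ml": mtv, "suv_mean": suv_mean, "tlg": tlg}


# Hypothetical lesion: five voxels of 0.5 mL each.
lesion = [1.0, 2.0, 5.0, 8.0, 10.0]
result = pet_volumetrics(lesion, voxel_volume_ml=0.5)
```

Because MTV and TLG depend on the chosen threshold, the segmentation rule would need to be reported alongside any cut-point for the values to be comparable across centres.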

2. Materials and Methods

2.1. Review Design

This manuscript is a structured narrative review of quantitative imaging biomarkers in HPV-positive oropharyngeal squamous cell carcinoma (OPSCC), focused on (i) diffusion-weighted MRI (DWI)/apparent diffusion coefficient (ADC) and diffusion radiomics; (ii) radiomics and machine learning/deep learning models across MRI/CT/PET; (iii) PET-based metrics and response frameworks; and (iv) translational readiness, including workflow integration and external validation. The purpose was to synthesise evidence around clinically relevant decision points (baseline risk stratification and de-escalation selection; on-treatment response assessment for adaptive strategies; posttreatment assessment/surveillance).

2.2. Information Sources

A literature search was conducted on PubMed/MEDLINE, Embase, Scopus, Web of Science Core Collection, and the Cochrane Library. Searches were supplemented by backward citation searching (screening reference lists of key included papers and relevant reviews) and forward citation searching (“cited by” functions in Scopus/Web of Science).

2.3. Search Strategy and Limits

Searches combined three concept groups:

Disease/site: Oropharyngeal cancer, OPSCC, oropharyngeal squamous cell carcinoma, head and neck squamous cell carcinoma.
HPV status: HPV, p16, human papillomavirus.
Quantitative imaging/computational methods: Diffusion-weighted imaging (DWI), ADC/apparent diffusion coefficient, IVIM; radiomics/texture analysis; machine learning, deep learning, artificial intelligence; CT, MRI, FDG-PET/PET-CT; MR-Linac/MR-guided radiotherapy and adaptive radiotherapy; and methodological terms, including standardisation, reproducibility/repeatability, harmonisation (including ComBat), multi-centre design, and external validation.

Limits applied: January 2009 to June 2025, English language, and human studies. An example full database search strategy is provided in Appendix A. When restricted to 1 January 2009–30 June 2025, PubMed keyword searches illustrate both the scale and fragmentation of the literature that this review synthesises: HPV (46,775 records), oropharyngeal (28,583), MRI (614,877), CT (510,021), PET (133,252), and radiomics (17,570).

2.4. Eligibility Criteria

Included studies were peer-reviewed journal publications relevant to HPV-positive OPSCC and quantitative imaging, including (i) original clinical studies (prospective or retrospective); (ii) multi-centre or externally validated studies where available; (iii) trial-embedded imaging analyses; and (iv) high-quality systematic reviews, guidelines, and standards, directly informing quantitative imaging methodology or response assessment.

Excluded sources were preprints, conference abstracts without full manuscripts, theses, non-indexed reports, and other non-peer-reviewed materials. Studies of mixed head and neck cohorts were included only when HPV-positive OPSCC data were reported separately or constituted a major analytic focus.

Generative AI was used in the writing of this review article. Portions of this manuscript were edited using an AI-assisted language tool (ChatGPT, OpenAI, https://chat.openai.com, accessed on 16 January 2026) to improve clarity, grammar, and flow. All scientific content, interpretations, and conclusions are those of the authors, who reviewed and approved the final text.

3. Results

3.1. Diffusion-Weighted MRI (DWI) and Apparent Diffusion Coefficient (ADC) Mapping

Diffusion-weighted MRI (DWI) is a functional MRI technique that sensitises the signal to the random (Brownian) motion of water molecules within tissue. The derived apparent diffusion coefficient (ADC) quantifies this motion and provides a surrogate measure of tumour cellularity, stromal architecture, and necrosis. In solid tumours, highly cellular regions with intact cell membranes typically exhibit restricted diffusion and lower ADC, whereas treatment-induced cell death, oedema, and necrosis lead to less restricted diffusion and higher ADC. Because of this, DWI and ADC offer a non-invasive way to probe tumour microstructure beyond what is visible on conventional anatomical MRI. They are particularly attractive as imaging biomarkers because they can be acquired on most modern clinical MRI systems without contrast, are relatively quick to perform, and can be repeated throughout the course of chemoradiotherapy to track early biological response and microstructural change.
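
As a rough illustration of how an ADC value is derived from DWI signal, the sketch below assumes the commonly used mono-exponential model S(b) = S0·exp(−b·ADC) and fits it by log-linear least squares. The function name `fit_adc` and the synthetic signal values are hypothetical, and the studies reviewed here may have used other fitting approaches.

```python
import math


def fit_adc(b_values, signals):
    """Estimate ADC (mm^2/s) for one voxel by a log-linear least-squares fit.

    Assumes the mono-exponential model S(b) = S0 * exp(-b * ADC), so that
    ln S(b) is a straight line in b with slope -ADC. Signals must be > 0.
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matched (b, signal) pairs")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (l - mean_log) for b, l in zip(b_values, logs))
    return -sxy / sxx


# Synthetic, noise-free voxel with a known ADC (illustrative values only).
b = [0, 50, 100, 500, 750, 1000]   # s/mm^2
true_adc = 1.0e-3                  # mm^2/s
s = [1000.0 * math.exp(-bi * true_adc) for bi in b]

adc_all = fit_adc(b, s)                          # all six b-values
adc_pair = fit_adc([0, 1000], [s[0], s[-1]])     # two-point estimate
```

On noise-free mono-exponential data both estimates recover the same ADC; in real tissue, perfusion effects at low b-values and the noise floor at high b-values make the result depend on which b-values enter the fit.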

3.1.1. Baseline ADC and HPV Phenotyping

Across key diffusion studies in head and neck squamous cell carcinoma, including OPSCC, DWI/ADC has been evaluated at three clinically relevant timepoints: at baseline, during chemoradiotherapy (CRT), and early after CRT. In a prospective HNSCC cohort with mixed subsites, Vandecaveye et al. acquired DWI at baseline and again at weeks 2 and 4 of CRT, showing that serial ΔADC increases were significantly larger in complete responders than in non-responders, with early ΔADC correlating with 2-year local control [11]. A subsequent prospective study from the same group obtained DWI 2–4 weeks after CRT and found that early posttreatment ADC had a very high negative predictive value (NPV) for residual disease (NPV ≈ 100% at the primary site and ≈ 91% for nodal disease), supporting DWI as a triage tool to avoid unnecessary early salvage surgery or biopsy [18]. In a complementary baseline-focused analysis, Ravanelli et al. studied advanced OPSCC with known HPV status and showed that pretreatment ADC histogram metrics were associated with progression-free and overall survival, with heterogeneity measures such as entropy predicting overall survival in HPV-negative disease [19]. Together, these findings illustrate why DWI/ADC is emphasised in this review: it provides repeatable, non-invasive readouts that capture early microstructural response and baseline heterogeneity, and it has demonstrated associations with control, survival, and the need for salvage interventions across HNSCC and HPV-aware OPSCC cohorts [11,18,19].

3.1.2. Baseline ADC as a Prognostic Marker

Multiple OPSCC cohorts report lower baseline ADC (that is, pretreatment ADC metrics derived from MRI acquired before chemoradiotherapy) in HPV-positive primary tumours, with additional discriminative value from ADC histogram and texture features. In a focused diffusion study of 34 patients, Lenoir et al. acquired 3T DWI with six b-values (0–1000 s/mm²), reconstructed multiple ADC maps, and showed that the ability of ADC histograms to distinguish HPV status depended strongly on the chosen b-value combination; kurtosis on the ADCb0–1000 map achieved the best separation of HPV-positive from HPV-negative OPSCC (AUC ≈ 0.89, sensitivity: 100%, and specificity: 82.6%) [20]. Ravanelli and colleagues similarly demonstrated that pretreatment mean ADC was significantly lower in HPV-positive OPSCC and that ADC histogram parameters were related to progression-free and overall survival in advanced disease with known HPV status [9,19]. In a hybrid PET/MRI cohort, Freihat et al. also found lower pretreatment primary tumour ADCmean in HPV-positive patients and reported that ADCmean, but not FDG-PET metrics, helped predict chemoradiotherapy response [21]. However, this pattern is not universal. In a 44-patient OPSCC series with p16-based HPV testing, Schouten et al. observed no statistically significant difference in baseline mean ADC between HPV-positive and HPV-negative tumours [8], underscoring the influence of acquisition parameters, ROI definition, and cohort composition on reported ADC–HPV associations.
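
The histogram descriptors mentioned above (mean, skewness, kurtosis, entropy) can be sketched as follows. This is a minimal illustration, assuming population moments, non-excess kurtosis (a Gaussian gives 3; some conventions subtract 3), and a fixed number of equal-width bins for entropy. The function name `histogram_metrics` and the bin count are assumptions and are not taken from any cited study.

```python
import math


def histogram_metrics(values, n_bins=32):
    """First-order descriptors of the ADC distribution inside one ROI."""
    n = len(values)
    lo, hi = min(values), max(values)
    if n < 2 or hi == lo:
        raise ValueError("need at least two distinct values")
    mean = sum(values) / n
    # Central moments (population form, dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2          # non-excess kurtosis
    # Shannon entropy of the discretised histogram (equal-width bins).
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return {"mean": mean, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}


# Hypothetical ROI values; real ROIs contain many more voxels.
metrics = histogram_metrics([1.0, 2.0, 3.0, 4.0, 5.0], n_bins=5)
```

Both kurtosis and entropy change with the binning and with which b-values were used to compute the underlying ADC map, which is one reason the reported separability of HPV status varies between protocols.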

Early diffusion studies reported an association between lower pretreatment ADC and subsequent local treatment failure after radiotherapy or chemoradiotherapy [12]. These were typically retrospective, single-centre head and neck cohorts that included mixed subsites (often not limited to OPSCC), used variable DWI protocols (different b-value schemes and readouts), and lacked systematic HPV stratification. Patients who later failed treatment tended to show lower baseline ADC at diagnosis, which was interpreted as reflecting higher cellularity and more restricted diffusion in biologically aggressive tumours. However, these analyses were constrained by small sample sizes, selection bias, heterogeneous acquisition and segmentation methods, and—critically—the absence of adjustment for HPV status and stage, both of which are major prognostic determinants in OPSCC. Consequently, the observed association between low baseline ADC and failure may reflect underlying tumour biology and protocol-related artefacts as much as any independent prognostic contribution from ADC itself [12].

Across key diffusion studies in head and neck squamous cell carcinoma, a consistent pattern emerges. Baseline ADC and ADC-histogram metrics often correlate with HPV status and outcomes, but their independent prognostic value tends to attenuate once HPV/p16 status, T/N stage, and other clinical factors are included, and effect sizes are sensitive to b-value schemes, ROI definition, and segmentation. In contrast, midtreatment ΔADC—typically measured at week 2–4 of CRT—shows more robust and reproducible associations with locoregional control and survival, with low early ΔADC repeatedly identifying poor-responding tumours that are candidates for treatment intensification or early salvage [8,9,11,18,19,20,21,22].

Across studies, baseline ADC sometimes appears prognostic, but the direction and magnitude of associations are not uniform and, in several analyses, weaken when models account for dominant clinical determinants such as HPV status, stage, and related covariates [10,22]. This pattern suggests that part of the apparent prognostic signal may reflect correlated biology (e.g., HPV-linked tumour microstructure) or protocol-dependent measurement differences rather than an independent risk factor [8,20]. Heterogeneity in acquisition (b-value scheme, fitting model, distortion correction), segmentation definitions (primary vs. nodal; whole lesion vs. solid tumour; necrosis/cyst handling), and outcome definitions further shifts ADC distributions and cut-points, plausibly explaining why similar cohorts can yield different “optimal” thresholds and performance [14,20]. Clinically, baseline ADC should, therefore, be interpreted as a candidate adjunct that requires demonstration of incremental value (calibration/reclassification/decision benefit) beyond standard predictors, particularly within HPV-positive cohorts.

3.1.3. Methodological Considerations and Link to MRI Radiomics

MR-Linac implementations further demonstrate that weekly on-treatment DWI is both technically feasible and biologically informative. In a prospective R-IDEAL stage 2a study on a 1.5 T MR-Linac, 30 HNSCC patients underwent baseline and weekly DWI during radiotherapy; ADC metrics rose significantly over the course of treatment for both the primary gross tumour volume (GTV-P) and the nodal gross tumour volume (GTV-N), with larger midtreatment ΔADC in primaries that achieved complete remission [23]. Recursive partitioning identified a GTV-P ΔADC 5th percentile > 13% at mid-RT as the strongest discriminator of complete response, and increasing ADC was associated with progressive reductions in residual tumour volume, with a negative correlation between ΔADC and change in volume [23]. These data confirm the feasibility of serial intra-treatment DWI on MR-Linac platforms and support the concept of ΔADC-guided, response-adapted planning within the same treatment course [23].
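
One way a percentile-based ΔADC rule of this kind might be computed is sketched below. It assumes that ΔADC is the percentage change in the 5th-percentile ADC value of the contoured volume between baseline and mid-radiotherapy; the source does not spell out the exact calculation, so this interpretation, the helper names, and the sample values are all assumptions. The 13% cut-point is the figure reported in [23] and is shown for illustration only.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    frac = pos - low
    return ordered[low] + (ordered[high] - ordered[low]) * frac


def delta_adc_percent(baseline_voxels, mid_voxels, q=5):
    """Percentage change in the q-th percentile ADC from baseline to mid-RT."""
    base = percentile(baseline_voxels, q)
    mid = percentile(mid_voxels, q)
    return 100.0 * (mid - base) / base


THRESHOLD_PERCENT = 13.0   # cut-point reported in [23]; illustration only

# Hypothetical voxel ADC values in units of 10^-3 mm^2/s.
baseline = [0.8, 0.9, 1.0, 1.1, 1.2]
mid_rt = [1.0, 1.1, 1.2, 1.3, 1.4]

change = delta_adc_percent(baseline, mid_rt)
exceeds = change > THRESHOLD_PERCENT
```

Whether the change is computed per percentile, per voxel, or on the mean, and whether it is absolute or relative, alters the resulting number, so any threshold is only meaningful together with its exact definition.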

Complementing this, multiparametric MRI radiomics that include ADC features have repeatedly surfaced diffusion-derived descriptors among the most informative predictors, and cross-modal PET/MRI work has shown concordant pretreatment differences—with ADC relating to HPV status/response, while FDG-PET metrics (e.g., MTV, TLG) contribute orthogonal biological information about tumour metabolism. Together, these findings suggest that ADC-based histogram/texture features add complementary signals within multi-sequence MRI and PET/MRI models and can strengthen phenotype or response prediction when diffusion acquisition and feature extraction are standardised [21].

Against this background, we must also consider baseline diffusion signatures that differentiate HPV phenotypes. Head and neck DWI is highly sensitive to motion, susceptibility, and protocol heterogeneity, including field strength, echo-planar readout, fat suppression, and b-value selection. Lower b-values increase perfusion sensitivity, whereas very high b-values reduce signal-to-noise ratios; both effects alter ADC histogram shape and the stability of texture features. In an OPSCC cohort, Lenoir et al. systematically varied b-value combinations and showed that ADC kurtosis on an ADCb0–1000 map best distinguished HPV-positive from HPV-negative tumours, with performance dropping when only higher b-pairs were used [20]. By contrast, Schouten et al. used mean ADC from single-slice ROIs with a different b-value scheme and found no significant baseline ADC difference by HPV [8]. Together, these data illustrate how motion control, readout/fat suppression, segmentation policy, and b-value schemes can drive conflicting conclusions about HPV separability, underscoring the need for explicit reporting and standardisation when interpreting ADC-based phenotyping in OPSCC [8,20].

To improve generalisability, feature extraction should be aligned with the Image Biomarker Standardisation Initiative (IBSI) and reported in a way that allows for full reproducibility. In practice, this means, at minimum, (i) reporting exact b-values and diffusion readouts (including EPI and fat-suppression choices, which directly affect diffusion contrast and derived metrics), (ii) stating the diffusion model used for ADC (mono-exponential vs. alternatives) and key fitting options (e.g., noise-floor handling), (iii) defining the segmentation policy (2D vs. 3D; manual vs. semi-automatic; software/tool; thresholds and morphological edits) and how inter-observer variability was managed (consensus contours, repeated segmentations, or agreement statistics), and (iv) specifying the timing and calculation of ΔADC (week 2/week 4 or other prespecified timepoints; absolute vs. percentage change; whole-lesion vs. subvolume; per-patient aggregation rules). Systematic methodological reviews highlight inconsistent acquisition, segmentation, and feature protocols as major sources of between-study variability and modest Radiomics Quality Scores, underlining the need for standardised pipelines [24]. IBSI provides consensus feature definitions, reference values, and reporting checklists that improve cross-software agreement and test–retest reliability for DWI and radiomics workflows [14]. In parallel, HPV-aware ADC studies show that ostensibly strong prognostic effects can disappear once p16 status and clinical covariates are properly modelled [10], reinforcing the need to document both imaging methodology and statistical adjustment. 
Accordingly, minimum reportable items include scanner/vendor and field strength; sequence type and bandwidth; all diffusion b-values; motion and fat-suppression strategy; ADC model and fitting details; ROI workflow and quality control; ΔADC definition and normalisation; and any preprocessing (registration, resampling, intensity normalisation/clipping) prior to feature extraction—each directly affecting reproducibility and comparability across centres [10,14,24].
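
The minimum reportable items listed above could be captured in a simple machine-checkable record. The sketch below is one hypothetical way to do so; the class name `DwiAcquisitionReport`, its field names, and the example values are illustrative assumptions and do not correspond to an IBSI or QIBA schema.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class DwiAcquisitionReport:
    """Hypothetical checklist of DWI/ADC items a study should report."""
    scanner_vendor: Optional[str] = None
    field_strength_tesla: Optional[float] = None
    sequence_type: Optional[str] = None
    bandwidth: Optional[str] = None
    b_values_s_per_mm2: Optional[Tuple[int, ...]] = None
    motion_strategy: Optional[str] = None
    fat_suppression: Optional[str] = None
    adc_model: Optional[str] = None
    roi_workflow: Optional[str] = None
    delta_adc_definition: Optional[str] = None
    preprocessing: Optional[str] = None


def missing_items(report):
    """Return the names of checklist items that were left unreported."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


# Example: a partially documented study (values are illustrative).
example = DwiAcquisitionReport(
    field_strength_tesla=3.0,
    b_values_s_per_mm2=(0, 50, 100, 500, 750, 1000),
    adc_model="mono-exponential",
)
gaps = missing_items(example)
```

A check like this could run at data-curation time so that incomplete acquisition metadata is flagged before features are pooled across centres.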

For HPV-associated OPSCC, midtreatment ΔADC demonstrates the most consistent link to outcomes: across prospective cohorts with prespecified week-2/week-4 DWI, patients who ultimately achieve better locoregional control show larger intratreatment ΔADC, and this effect persists in multivariable analyses that include HPV status and stage [11,18,22]. MR-Linac programmes further show that on-treatment DWI is feasible on a weekly basis, with systematic midtreatment ADC rises in primary and nodal disease, supporting the clinical concept of response-adapted management during the same course of CRT [23].

Baseline ADC and ADC-radiomics assist HPV phenotyping and can augment risk models when acquisition is controlled: studies using standardised b-value schemes and consistent ROI/feature extraction report that pretreatment ADC distributions and histogram shape (e.g., kurtosis/skewness) help distinguish HPV phenotypes and relate to early responses [9,19,20,21]. However, findings are not uniform—some OPSCC series report no significant baseline ADC difference by HPV—underscoring that protocol choices (b-value spread, readout/fat-suppression, and segmentation) can mask or magnify biological effects [8]. In HPV-positive OPSCC, ΔADC at week 2/week 4 is the most actionable early biomarker for outcome-oriented decisions (e.g., surveillance intensity and adaptive planning), while baseline ADC/ADC-radiomics are best used as a phenotypic context that can refine risk estimates when methods are explicitly standardised [8,9,11,18,19,20,21,22,23].

Baseline ADC is available upfront and is, therefore, attractive for initial stratification (e.g., supporting risk phenotyping or de-escalation eligibility), but it is also more vulnerable to inter-protocol variability and can represent mixed processes (cellularity, oedema, and necrosis), contributing to inconsistent prognostic strength across cohorts and potential attenuation after adjustment for HPV/p16 status and other clinical covariates [8,10,20,22]. By contrast, midtreatment ΔADC functions as an early response biomarker, capturing therapy-induced microstructural change before gross anatomical response, and is, therefore, more directly actionable for response-adaptive treatment modification (escalation or de-intensification) when measured at a predefined timepoint [11,18,22]. Practically, this supports a two-stage interpretation: baseline ADC is best positioned as a pretreatment adjunct that must prove its incremental value beyond clinicopathologic predictors within HPV-positive cohorts, whereas midtreatment ΔADC is best positioned as a trigger candidate for adaptive protocols, where the key evidentiary requirements are prospective validation of a pre-specified timepoint and threshold linked to an actionable decision rule, and robust cross-centre standardisation [14].

Key clinical DWI/ADC evidence is summarised in Table 1, including prospective serial-treatment cohorts defining early response timepoints and contemporary HPV-focused analyses that clarify the comparative utility of baseline ADC versus midtreatment ΔADC for clinical decision-making.

3.2. MRI Radiomics

3.2.1. Reproducibility, Standardisation, and Model Credibility

MRI radiomics refers to the extraction of quantitative image biomarkers from routine MRI to characterise tumour phenotypes in a reproducible way [14]. Typical pipelines comprise standardised image acquisition (e.g., CE-T1, T2, DWI) with key parameters reported; tumour/ROI segmentation (2D or 3D; manual or semi-automatic) with software and editing rules documented; preprocessing to place images on a comparable intensity scale and voxel size (normalisation, resampling, and optional registration/denoising); feature extraction using IBSI-compliant definitions for first-order (intensity histogram), shape, and texture families (GLCM, GLRLM, GLSZM, NGTDM), often with wavelet or Laplacian-of-Gaussian filter expansions; and finally, feature selection, model building (e.g., regularised regression or tree-based methods), and performance evaluation with prespecified cross-validation and, where possible, external validation in line with radiomics quality recommendations [14,15]. In HPV-associated OPSCC, the MRI radiomics studies referenced in this section broadly follow this framework: features are derived mainly from CE-T1, T2, and DWI/ADC within primary tumour (±nodal) ROIs, with pipelines that include intensity normalisation, voxel resampling, stability and redundancy filtering, explicit handling of class imbalance, and cross-validated model fitting [4,13,27,28]. Within such standardised workflows, MRI radiomics yields descriptors that plausibly reflect cellularity, stromal reaction, necrosis, and microarchitecture and can support HPV status classification and prognostic modelling when methods are transparently reported and quality criteria are met [4,13,15,27,28].

Across modalities, a recurring barrier to clinical translation is reproducibility. Feature values vary with acquisition, reconstruction, segmentation policy, and preprocessing; therefore, adherence to the Image Biomarker Standardisation Initiative (IBSI) for feature definitions, explicit reporting of diffusion b-values and ΔADC calculation, and use of harmonisation methods (e.g., ComBat) are essential [14]. Guidance from EIBALL/EORTC outlines validation tiers, endpoints, and reporting for biomarker studies, and Quantitative Imaging Biomarkers Alliance (QIBA) diffusion profiles support scanner performance for DWI [29,30]. Radiomics quality assessments continue to find modest Radiomics Quality Scores and limited external validation, underscoring the need for transparent pipelines, pre-registration, and independent testing [15].

3.2.2. HPV Phenotype Classification from MRI

Diffusion-derived radiomic features repeatedly emerge as informative for HPV phenotype discrimination. In a focused DWI study of pretreatment OPSCC with p16-defined HPV status, Lenoir et al. acquired 3T DWI with six b-values (0, 50, 100, 500, 750, 1000 s/mm²); whole-tumour ADC histograms showed that kurtosis on ADCb0–1000 separated HPV-positive from HPV-negative disease with AUC ≈ 0.89, and the authors emphasised that separability depended on the b-value scheme used to compute ADC [20]. Building on this diffusion signal in a full radiomics framework, a single-centre multiparametric MRI study reported mean test AUC ≈ 0.77 for HPV classification when ADC-derived features were included alongside CE-T1 and T2, indicating that diffusion features add measurable discriminative power beyond conventional sequences [13]. Extending to heterogeneous, multi-centre data with explicit class-imbalance handling, an ADC-only radiomics model performed at least as well as (and in some external-testing analyses better than) CE-T1/T2-based models, supporting the generalisability of diffusion-anchored HPV phenotyping when pipelines are transparently validated [28]. Taken together, these results suggest that, within MRI radiomics workflows, much of the separative signal for HPV status resides in diffusion, provided ADC is computed with appropriate b-values and models are developed and tested with attention to class balance and external validation [13,20,28].
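
Attention to class balance during validation can be illustrated with a small sketch. It shows two generic techniques, stratified fold assignment and inverse-frequency class weights; neither is claimed to be the method used in the cited studies, and the function names and the 30:10 class split are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets a similar share.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return {y: n / (len(counts) * c) for y, c in counts.items()}


# Hypothetical cohort: 30 HPV-positive (1) and 10 HPV-negative (0) cases.
labels = [1] * 30 + [0] * 10
folds = stratified_folds(labels, k=5)
weights = balanced_class_weights(labels)
```

To keep validation free of leakage, any normalisation, feature selection, or resampling would be fitted on the training folds only and then applied unchanged to the held-out fold.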

3.2.3. Prognostic Models and De-Intensification

Beyond phenotype classification, MRI radiomics is also predictive of clinical outcome. In a large single-centre OPSCC cohort, Boot et al. manually delineated 249 primary tumours on pretreatment native T1-weighted MRI and extracted 498 radiomic features for HPV status classification and overall survival modelling [27]. Radiomics-only logistic regression and random forest models achieved an AUC of 0.79 for HPV prediction, whereas a model combining radiomic factors with clinical parameters improved HPV classification to an AUC of 0.89 and yielded an overall survival C-index of 0.72 [27]. This supports MRI radiomics as an augment to, rather than a replacement for, established clinicopathologic risk factors in OPSCC.

In HPV-positive neoadjuvant-chemotherapy cohorts, Lyu et al. conducted a retrospective multicohort study of p16-positive OPSCC in which primary-plus-nodal MRI radiomics were used to predict neoadjuvant chemotherapy response and to stratify survival in an independent validation cohort, supporting response-oriented risk stratification in HPV-positive disease [4]. Together with the large single-centre series by Boot et al. [27], these findings support MRI radiomics as an augmentation to clinicopathologic factors rather than a replacement. However, historical imaging observations require cautious reinterpretation because of confounding caused by subsite mix, protocol heterogeneity, and incomplete HPV stratification.

Between-scanner and protocol differences can shift radiomics feature values—not just the visual appearance of images. In MRI radiomics, this is critical because classifiers rely on subtle intensity and texture patterns; if those patterns mainly capture scanner or protocol differences rather than tumour biology, models may appear to perform well in single-centre experiments but fail when applied to new sites. First-order intensity features (e.g., mean, median, and entropy) and texture features (e.g., GLCM contrast/entropy, GLRLM run-lengths, GLSZM zone sizes, and NGTDM coarseness) are particularly sensitive to scanner/vendor and field strength, coil configuration, sequence/readout, fat suppression, intensity normalisation and binning, and voxel size/resampling; ADC-derived features additionally vary with the b-value scheme and fitting model [14]. To limit these effects, the IBSI recommends standardised feature definitions and full reporting of acquisition and preprocessing (scanner/vendor, key sequence parameters, all b-values, segmentation policy, and normalisation/resampling) [14]. For multi-centre HPV-associated OPSCC radiomics, statistical harmonisation methods such as ComBat can further mitigate scanner- and site-related shifts in feature distributions and improve comparability across centres, as shown in multi-centre imaging and head and neck radiomics studies [31,32]. Wherever possible, test–retest or phantom data should be used to identify which features remain stable in the chosen pipeline, ensuring that downstream HPV classification or prognostic models are driven by robust biological signals rather than hidden protocol differences [14,31,32].
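The idea behind site harmonisation can be sketched with a deliberately simplified location-scale adjustment. This is not ComBat itself: it omits the empirical Bayes shrinkage and the covariate terms that ComBat uses to preserve biological signal, and the function name and synthetic data are assumptions for illustration only.

```python
import numpy as np

def location_scale_harmonise(features, sites):
    """Align each site's per-feature mean and SD to the pooled mean and SD.

    Simplified stand-in for ComBat: no empirical Bayes shrinkage and no
    protection of biological covariates such as HPV status or stage.
    """
    x = np.asarray(features, dtype=float)
    sites = np.asarray(sites)
    pooled_mean = x.mean(axis=0)
    pooled_sd = x.std(axis=0, ddof=1)
    out = np.empty_like(x)
    for site in np.unique(sites):
        mask = sites == site
        site_mean = x[mask].mean(axis=0)
        site_sd = x[mask].std(axis=0, ddof=1)
        out[mask] = (x[mask] - site_mean) / site_sd * pooled_sd + pooled_mean
    return out

# Two synthetic sites with different offsets and spreads (illustrative only).
rng = np.random.default_rng(42)
site_a = rng.normal(loc=[10.0, 0.5], scale=[2.0, 0.1], size=(40, 2))
site_b = rng.normal(loc=[14.0, 0.8], scale=[3.0, 0.2], size=(60, 2))
raw = np.vstack([site_a, site_b])
sites = np.array(["A"] * 40 + ["B"] * 60)

harmonised = location_scale_harmonise(raw, sites)
print("site means before:", raw[sites == "A"].mean(axis=0), raw[sites == "B"].mean(axis=0))
print("site means after: ", harmonised[sites == "A"].mean(axis=0), harmonised[sites == "B"].mean(axis=0))
```

Because this naive version forces every site to the same distribution, it would also erase genuine case-mix differences between centres; that limitation is the reason covariate-aware methods are preferred in practice.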

For outcomes, radiomics is most persuasive when combined with clinicopathologic factors rather than used in isolation. In a large single-centre OPSCC cohort, Boot et al. showed that T1-based MR radiomics predicted HPV status with AUC ~0.79 and that adding radiomic factors to clinical variables improved HPV-classification (AUC 0.89) and overall survival prediction (OS C-index ~ 0.72) [27]. In HPV-positive neoadjuvant chemotherapy cohorts, Lyu et al. used baseline MR radiomics to predict NAC responses across internal and external validation sets and then constructed nomograms that combined radiomic signatures with clinical characteristics to stratify PFS and OS [4]. Together with ADC-focused classification work showing that diffusion-derived features often carry the strongest HPV-separating signal [13,20,28], these studies support a pragmatic approach: we should leverage ADC-anchored radiomics to assist HPV phenotyping, and integrate radiomic signatures with established clinical factors to refine risk stratification, provided that acquisition, segmentation, and validation are standardised and transparently reported [4,27].

Representative MRI radiomics pipelines and validation approaches are summarised in Table 2, spanning HPV classification and prognostic modelling, with feature definition and reporting expectations aligned with IBSI guidance.

3.3. CT Radiomics and Deep Learning Models

3.3.1. CT Radiomics for HPV Phenotype Classification

Early feasibility work by Buch et al. showed that hand-crafted CT texture features from tumour ROIs differ between HPV-positive and HPV-negative primaries when referenced against p16/HPV status, establishing a reproducible radiomic signal on routine scans [16]. This was followed by multi-centre development with external validation, where a CT radiomic signature for p16/HPV achieved AUCs around 0.70–0.80 across scanners and institutions [13,33]. Independent cohorts then corroborated HPV discrimination using CT radiomics in both primary tumours and metastatic nodes, linking heterogeneity metrics to HPV status in OPSCC [34,35], while related methodological work examined pipeline generalisability and robustness [36]. Fully automated 3D models were then introduced to reduce manual annotation and capture contextual cues on CT, and radiomics was combined with HPV status to enable risk stratification beyond imaging alone [37]. Together, these studies support CT radiomics as a scalable tool for HPV phenotyping, with complementary information available from nodal analyses and evolving workflows that emphasise robustness and clinical utility [16,34,35,36,37,38].

3.3.2. Prognostic CT Radiomics and Nodal/Peritumoural Features

CT radiomics, especially when combined with clinical variables, improves predictions of OS/PFS and locoregional control over clinical factors or stage alone [39]. In a 2020 study, pretreatment contrast-enhanced CT radiomics was developed and validated to classify HPV status and stratify overall survival; when radiomics was fused with clinical variables, survival prediction outperformed clinical-only or stage-only models, supporting integration of imaging-derived phenotypes into risk prediction [39].

Larger OPSCC series relate pretreatment CT texture to recurrence and control, and adding peritumoural rings to intratumoural features improves prediction and yields disease-free survival (DFS) nomograms that are straightforward to apply clinically [40,41]. In [40], primary tumour textures on planning CT were associated with local recurrence, indicating an independent failure-risk signal from radiomic heterogeneity [40]. In [41], CT features associated with HPV status also informed prognosis, and combining intratumoural with peritumoural features improved discrimination and supported clinically usable DFS nomograms [41].

Methodological decisions materially shape radiomic performance. In Ren and Yuan’s CT radiomics study, 3D segmentations produced a higher proportion of reproducible texture features than 2D, yet HPV classification accuracy was actually slightly better with 2D models—illustrating that more complex volumetric segmentation does not necessarily improve prediction and that practicality and endpoint performance must be considered alongside feature stability [42]. Consistent with this, radiomics quality audits and OPSCC-specific texture reviews highlight that heterogeneous segmentation strategies and incomplete reporting of ROI definitions (for example, gross tumour versus post-contrast enhancing volume), reconstruction kernels, and post-processing parameters make it difficult to reproduce and externally validate models across centres [24,43].

Kann et al. retrospectively evaluated a previously developed CT-based deep learning model within the ECOG-ACRIN E3311 randomised de-escalation trial cohort. Pretreatment contrast-enhanced CTs were used; on each scan, the largest pathologic node (by short axis) and up to two additional nodes were segmented and labelled for extranodal extension (ENE) using surgical pathology as the ground truth. The algorithm produced node-level ENE probabilities and was benchmarked against four board-certified head and neck radiologists (blinded), with AUC as the primary endpoint [3]. In 178 scans (313 nodes; 71 with ENE), the model achieved AUC = 0.86 (95% confidence interval (CI) 0.82–0.90), outperforming all readers (p < 0.0001). At the reader-matched specificity (false-positive rate ≈ 22%), algorithm sensitivity reached 75% (≈+13% absolute vs. the best reader); at a 30% false-positive rate, sensitivity was ≈90%. Reader specificity and sensitivity varied widely (43–86% and 45–96%, respectively) with poor inter-reader agreement (κ ≈ 0.32), underscoring the potential clinical utility of a consistent AI screener [3]. Because overt ENE was excluded by protocol in E3311, the cohort represents a diagnostically challenging, pre-operative screening population; the authors argue that this trial-anchored evaluation offers rare, high-credibility evidence for head and neck AI. Noted caveats include retrospective design and the need for node selection/segmentation; nevertheless, performance gains were most pronounced for >1 mm ENE and in nodes ≥ 1 cm short axis, aligning with clinically meaningful thresholds [3].
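The two comparison metrics used above, sensitivity at a fixed false-positive rate and inter-reader agreement, can be sketched as follows. The data are toy values, not trial data, and the pairwise Cohen's kappa shown here is an assumption: the agreement statistic used in the trial analysis for four readers may differ.

```python
import numpy as np

def sensitivity_at_fpr(y_true, scores, max_fpr):
    """Highest sensitivity achievable with a score threshold whose FPR <= max_fpr."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        fpr = np.sum(pred & ~y_true) / np.sum(~y_true)
        tpr = np.sum(pred & y_true) / np.sum(y_true)
        if fpr <= max_fpr:
            best = max(best, tpr)
    return best

def cohen_kappa(a, b):
    """Cohen's kappa for two raters giving binary calls on the same nodes."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    observed = np.mean(a == b)
    expected = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (observed - expected) / (1 - expected)

# Toy node-level labels (1 = ENE on pathology) and model probabilities.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
print("sensitivity at FPR <= 0.20:", sensitivity_at_fpr(y, p, 0.20))

# Toy binary calls from two hypothetical readers.
reader_1 = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
reader_2 = [1, 0, 1, 0, 1, 1, 0, 0, 0, 0]
print("kappa:", round(cohen_kappa(reader_1, reader_2), 2))
```

Fixing the false-positive rate at a reader's operating point is what allows a continuous model output to be compared fairly with a binary human call.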

In interpretable pipelines, useful performance can be retained while keeping the decision process transparent. Altinok et al. used the Mens eX Machina selector to reduce 851 CT radiomic features to two shape descriptors: sphericity and max2DDiameterRow [44]. A Bayesian Network trained on these two features achieved AUC 0.78 in training and 0.72 on the held-out test set. The low-dimensional, conditional-probability structure made feature contributions explicit. A higher-capacity SVM using 25 features increased test AUC to 0.83 but reduced interpretability. Fanizzi et al. trained an Inception-V3 CNN on GTV-cropped CT images from the OPC-Radiomics cohort and evaluated it on an independent multi-centre test set [45]. They applied Grad-CAM to visualise salient regions. In correctly classified HPV-positive cases, saliency maps were predominantly intratumoural, whereas HPV-negative predictions emphasised tumour edges. The independent-test AUC was 0.735, showing that post hoc explainability can be combined with competitive predictive performance.

Methodological quality remains a constraint. In a focused systematic review of OPSCC-HPV radiomics, Spadarella et al. [15] reported a median Radiomics Quality Score of ~33% (range: 0–42%), with few studies performing external validation and no studies providing a public protocol, phantom study, or repeated-timepoint imaging. These findings underscore the need for transparent, standardised pipelines; preregistered analysis plans; and independent, multi-centre testing before clinical adoption. Accordingly, rigorous reporting and harmonisation are essential; the following checklist summarises the minimum items required for reproducible studies.

CT radiomics robustness is sensitive to acquisition and reconstruction choices, including reconstruction kernels, slice thickness, and denoising. Both the IBSI consensus and the Spadarella et al. review of OPSCC-HPV radiomics highlight this sensitivity and emphasise protocol standardisation and careful reporting to limit feature instability and bias [14,15].

To improve multi-centre portability, adherence to the IBSI reference feature definitions and processing scheme is recommended; the IBSI provides consensus definitions and reference values for a broad set of features and outlines validation practices that enhance reproducibility. In parallel, harmonisation strategies such as ComBat can reduce inter-centre shifts in feature distributions, supporting pooled modelling while maintaining biological signal [14].

For HPV-associated OPSCC, CT radiomics is most mature for HPV status classification and for augmenting prognostic models when fused with clinical variables; peritumoural features add value [16,24,33,34,35,37,38,39,40,41,42,46]. Deep learning extends this to segmentation-free status prediction and ENE triage on trial data [3,47]. Robust deployment depends on IBSI-consistent pipelines, harmonisation, and external validation [14,15].

CT radiomics and deep learning studies relevant to HPV phenotyping and nodal risk features are summarised in Table 3, from early CT texture signals through multi-centre HPV classifiers and recent trial-linked deep learning for clinically important endpoints such as ENE, alongside standardisation/validation frameworks intended to improve portability.

3.4. FDG-PET/CT and Texture Analysis

At diagnosis, FDG-PET/CT complements contrast-enhanced CT/MRI by improving whole-body staging (occult nodal and distant metastases), helping localise unknown primaries, and informing treatment intent. In radiotherapy workflows, it assists GTV delineation and highlights metabolically active nodes that may be equivocal on anatomical imaging alone. After chemoradiotherapy (CRT), a timed posttreatment scan (typically at 10–12 weeks) is used for response assessment: a complete metabolic response has a high negative predictive value and can obviate planned neck dissection, whereas residual or equivocal uptake prompts short-interval imaging, biopsy, or salvage. Outside this window, PET/CT is used selectively in surveillance when there is clinical or biochemical suspicion of recurrence. In HPV-associated OPSCC—where disease often presents with small primaries and nodal metastases, outcomes are favourable but heterogeneous, and HPV/p16 status and T/N stage do not fully capture risk—FDG-PET/CT contributes baseline metabolic metrics such as SUV_{max}/SUV_{mean}, metabolic tumour volume (MTV), total lesion glycolysis (TLG), and intratumour heterogeneity descriptors that reflect tumour biology beyond size. These parameters can refine risk stratification, aid early and post-CRT response assessment, and support posttreatment decisions (e.g., observation, biopsy, or salvage) while complementing MRI-based diffusion and texture information.

3.4.1. Volumetric PET Metrics (MTV/TLG) and Prognosis

Across treatment-uniform CRT cohorts and larger single-centre series, volumetric PET metrics stratify risk. In a uniform intensity-modulated radiotherapy (IMRT) + concurrent-chemo cohort of 176 OPSCC patients, Lim et al. delineated MTV with a 42% SUVmax threshold and computed TLG as SUVmean × MTV; each doubling of primary tumour MTV predicted locoregional failure (HR ≈ 2.4; p = 0.005), as well as distant metastasis (DM) and overall survival (HR ≈ 1.9 and 1.8; p < 0.001). TLG was likewise associated with DM and OS (p < 0.001). In Kim et al. (n = 221), pretreatment MTV and TLG (primary tumour volume of interest (VOI)) were significant on univariate analysis, with empiric cut-offs (e.g., MTV ≈ 11 mL; TLG ≈ 79 g) separating DFS/OS; overall 5-year OS of 72% and DFS of 79.5% were reported. Together, these series support baseline volumetric burden as a pragmatic class of predictors for locoregional control (LRC), DM, PFS/DFS, and OS [49,50].
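The threshold-based definitions above can be written out in a few lines. This is a minimal sketch that assumes the SUV array has already been cropped to a VOI around the primary tumour and that voxels at or above the threshold are included; the function name, the inclusive comparison, and the toy values are assumptions, not details taken from the cited studies.

```python
import numpy as np

def mtv_tlg(suv, voxel_volume_ml, fraction=0.42):
    """Return (MTV in mL, TLG) using voxels at or above `fraction` of SUVmax.

    TLG is computed as SUVmean x MTV over the thresholded voxels.
    """
    suv = np.asarray(suv, dtype=float)
    mask = suv >= fraction * suv.max()
    mtv = mask.sum() * voxel_volume_ml
    suv_mean = suv[mask].mean()
    return mtv, suv_mean * mtv

# Toy VOI: six voxels of 0.5 mL each (illustrative values only).
suv_voi = np.array([10.0, 8.0, 5.0, 4.0, 2.0, 1.0])
mtv, tlg = mtv_tlg(suv_voi, voxel_volume_ml=0.5)
print(f"MTV = {mtv:.1f} mL, TLG = {tlg:.1f}")
```

With SUVmax = 10, the 42% threshold is 4.2, so three voxels are retained. Changing `fraction` (for example to 0.5) or switching to a fixed SUV cut-off alters both metrics, which is one reason reported cut-offs are difficult to transfer between studies.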

By contrast, in Lim et al., primary tumour SUVmax was associated with death on univariate analysis but lost significance after adjustment for T stage, whereas MTV and TLG remained independently associated with overall survival (p ≈ 0.013 for TLG and p ≈ 0.032 for MTV; SUVmax p ≈ 0.16) [49]. Kim et al. likewise found that age, tumour SUVmax, and MTV were independent predictors of OS and DFS in multivariable models [50]. Conceptually, SUVmax samples only the hottest voxel, whereas MTV and TLG encode whole-tumour metabolic burden, which likely explains why volumetric PET metrics often demonstrate stronger or more consistent prognostic associations than SUVmax across series. Emerging HPV-positive OPSCC cohorts suggest that volumetric and, in some studies, heterogeneity measures can further refine risk stratification, although findings remain heterogeneous and dependent on model specification.

3.4.2. Intratumoural Heterogeneity and Texture-Derived Risk

Within HPV-positive OPSCC cohorts, volumetric FDG burden generally remains prognostic, but cut-offs and effect sizes are often more modest and variable than in mixed HNSCC series. Alluri et al. studied 70 patients with HPV-positive stage III–IV OPSCC and found that total and primary tumour MTV were significantly associated with event-free survival, with total MTV and then primary MTV remaining independent prognostic markers in multivariable models, whereas SUV metrics and most TLG measures did not [51]. In Mena et al. (n = 105 HPV-positive OPSCC), baseline MTV measured on PET/CT using gradient-based and 50% SUV_{max} segmentation showed an optimal total MTV cut-off of ≈12.7 mL for event-free survival in Kaplan–Meier analysis, while multivariable models identified intratumoural heterogeneity (AUC-CSH) as an independent predictor; patients with both higher heterogeneity and higher SUV_{max} had the worst outcomes [52]. Together, these data suggest that in HPV-positive OPSCC, tumour burden remains clinically informative, but prognostic thresholds and independent effects are more nuanced and depend on how volumetric and heterogeneity metrics are modelled.

Methodological details matter for replication. Alluri et al. focused on stage III–IV HPV-positive OPSCC treated definitively, analysed PET-derived SUVmax, SUVmean, SUVpeak, MTV, and TLG, and reported MTV as an independent prognosticator after adjustment [51]. Mena et al. computed primary tumour MTV via gradient-based and 50% SUVmax segmentation and defined TLG as SUVmean × MTV; optimal cut-offs (e.g., SUVmax ~ 16.7; MTV ~ 12.7 mL) separated risk groups, and heterogeneity (lower AUC-CSH) plus higher MTV identified the poorest EFS, underscoring that volumetric burden remains valuable even as HPV-positive biology moderates effect sizes [52].

3.4.3. PET Texture and Radiomics-Based Prognostic Models

In advanced T-stage OPSCC treated with curative-intent (chemo)RT, pretreatment PET texture adds prognostic signal beyond bulk uptake. Cheng et al. extracted histogram, GLCM (e.g., uniformity, entropy, and dissimilarity), and NGTDM features from primary tumour PET VOIs (SUV ≥ 2.5) [53]. On multivariable Cox analysis, GLCM uniformity remained independently associated with PFS/DSS/OS alongside TLG; a simple score combining TLG > 121.9 and uniformity ≤ 0.138 identified markedly poorer outcomes (p < 0.001 for PFS/DSS; p = 0.002 for OS). These findings illustrate that spatial texture—particularly low uniformity (i.e., higher heterogeneity)—can stratify risk independent of volumetric burden.
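As a rough illustration of what GLCM uniformity measures, the sketch below builds a symmetric, normalised co-occurrence matrix for a small 2D array and sums its squared entries (also called the angular second moment). It is a toy under several assumptions: a single 2D offset, pre-quantised grey levels, and a two-factor count as the combined score. The cited study worked on 3D PET VOIs with its own discretisation, so the cut-offs quoted above would not transfer to values produced by this code.

```python
import numpy as np

def glcm_uniformity(image, levels, offset=(0, 1)):
    """Uniformity (angular second moment) of a symmetric, normalised 2D GLCM."""
    img = np.asarray(image, dtype=int)
    dr, dc = offset
    rows, cols = img.shape
    glcm = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                glcm[img[r, c], img[r2, c2]] += 1
                glcm[img[r2, c2], img[r, c]] += 1  # count both directions
    glcm /= glcm.sum()
    return float(np.sum(glcm ** 2))

def risk_factor_count(tlg, uniformity, tlg_cut=121.9, uniformity_cut=0.138):
    """Number of adverse factors present (0, 1, or 2); the counting rule is an assumption."""
    return int(tlg > tlg_cut) + int(uniformity <= uniformity_cut)

homogeneous = np.zeros((4, 4), dtype=int)
checkerboard = np.array([[0, 1], [1, 0]])
print("homogeneous:", glcm_uniformity(homogeneous, levels=2))
print("checkerboard:", glcm_uniformity(checkerboard, levels=2))
print("risk factors:", risk_factor_count(tlg=150.0, uniformity=0.10))
```

A perfectly homogeneous region gives a uniformity of 1, and the value falls as grey-level pairs spread across more cells of the matrix, which matches the reading of low uniformity as higher heterogeneity.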

Within HPV-positive OPSCC, heterogeneity metrics retain prognostic value and refine volumetric risk. In 105 HPV-positive OPSCC patients, Mena et al. quantified intratumoural heterogeneity using the AUC-CSH index (where lower AUC-CSH indicates higher heterogeneity) and showed that AUC-CSH was an independent predictor of event-free survival, while Kaplan–Meier analysis identified optimal baseline cut-offs of SUV_{max} ≈ 16.7 and total MTV ≈ 12.7 mL that separated risk groups [52]. Patients with both higher heterogeneity and higher MTV, or higher heterogeneity and higher SUV_{max}, had the worst EFS, supporting the view that texture-derived heterogeneity captures biologically meaningful spatial variation not fully reflected by MTV/TLG alone in HPV-positive disease [52].
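The AUC-CSH index can be illustrated with a short sketch: for each threshold expressed as a fraction of SUVmax, compute the fraction of tumour volume at or above it, then integrate the resulting curve. The threshold grid, the inclusive comparison, and the scaling to the range 0 to 1 are assumptions made here; the cited study may have used different conventions.

```python
import numpy as np

def auc_csh(suv, n_thresholds=101):
    """Area under the cumulative SUV-volume histogram, scaled to the range 0 to 1."""
    suv = np.asarray(suv, dtype=float).ravel()
    thresholds = np.linspace(0.0, 1.0, n_thresholds)  # fraction of SUVmax
    volume_fraction = np.array(
        [(suv >= t * suv.max()).mean() for t in thresholds]
    )
    # Trapezoidal integration over the threshold axis.
    widths = np.diff(thresholds)
    heights = (volume_fraction[1:] + volume_fraction[:-1]) / 2.0
    return float(np.sum(widths * heights))

uniform_lesion = np.full(50, 4.0)                  # every voxel has the same uptake
hot_spot_lesion = np.array([10.0] + [1.0] * 9)     # one hot voxel, low background
print("uniform lesion:", round(auc_csh(uniform_lesion), 3))
print("hot-spot lesion:", round(auc_csh(hot_spot_lesion), 3))
```

The uniform lesion keeps its full volume at every threshold and so scores close to 1, whereas the hot-spot lesion loses most of its volume at low thresholds and scores much lower, consistent with lower AUC-CSH indicating higher heterogeneity.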

In a multi-institutional OPSCC cohort (n = 311, Yale + TCIA), Haider et al. extracted 1037 PET and 1037 CT radiomic features from primary tumours and nodes and built random survival forest models using three inputs: AJCC-8-only, radiomics-only, and combined (AJCC + radiomics) [2]. In HPV-positive disease, the best radiomics model achieved a C-index of ≈ 0.62 for PFS versus ≈ 0.54 for AJCC alone, with radiomics and combined models generally outperforming AJCC in time-dependent analyses for PFS (and often OS), indicating that clinical + radiomics models provide an added prognostic value beyond staging and standard covariates. To assess generalisability and guard against single-centre optimism, we next consider multi-centre benchmarking and blind-test challenges.
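For readers less familiar with the concordance index quoted above, the following minimal sketch computes Harrell's C for right-censored data. It assumes higher predicted risk should correspond to earlier events, ignores tied event times, and uses toy follow-up data; the cited study's exact estimator may differ.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's C: share of comparable pairs in which the earlier event has higher risk."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is only comparable if the earlier time is an observed event
        for j in range(len(time)):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Toy cohort: follow-up in months, event flag (1 = progression), predicted risk.
months = [5, 10, 15, 20]
progressed = [1, 1, 0, 1]
predicted_risk = [0.9, 0.7, 0.4, 0.1]
print("C-index:", harrell_c_index(months, progressed, predicted_risk))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a move from roughly 0.54 to 0.62 is a real but modest gain in discrimination.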

Where available, multi-centre blind-test benchmarks such as HECKTOR are particularly informative. These challenges aggregate PET/CT data from several institutions, apply a common preprocessing and labelling scheme, and keep test labels hidden so models are evaluated on a truly unseen, multi-centre cohort [54]. Demonstrating good performance under these conditions shows that radiomics and AI models can generalise beyond their development site, helping to mitigate single-centre overfitting and providing a realistic benchmark for clinical translation.

3.4.4. Multi-Centre Benchmarks

While benchmarks demonstrate technical feasibility, translation ultimately depends on practice-changing clinical evidence, exemplified by PET-NECK. In a multi-centre randomised trial, PET-CT–guided surveillance after CRT was non-inferior to planned neck dissection for overall survival, dramatically reducing operations (54 vs. 221 neck dissections) and yielding cost savings of approximately GBP 1492 per patient, with similar complication rates and quality of life—a practice-changing result in a predominantly oropharyngeal, p16-positive (HPV-positive) cohort [5].

In parallel, shared multi-centre datasets and challenge platforms help define current performance ceilings and reinforce the case for rigorous standardisation. The TCIA head and neck FDG-PET/CT collections provide openly available multi-centre data, while the HECKTOR challenges offer multi-centre, blind-test benchmarks in which fully automated PET/CT pipelines tackle tumour segmentation and progression-free survival prediction. In the 2021 edition, the best methods achieved Dice ≈ 0.76 for primary tumour segmentation and C-index ≈ 0.70–0.72 for PFS on a held-out multi-centre test set, illustrating that robust external performance is achievable but still leaves headroom for further refinement and standardisation [54].
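The segmentation metric quoted above can be sketched in a few lines. This is a generic per-case Dice coefficient on binary masks with toy inputs; the challenge's own aggregation across cases is not reproduced here, and treating two empty masks as perfect agreement is a convention assumed for this example.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy 1D masks standing in for flattened 3D segmentations.
predicted_mask = [1, 1, 1, 0]
reference_mask = [0, 1, 1, 1]
print("Dice:", round(dice(predicted_mask, reference_mask), 3))
```

Here two of the three voxels in each mask overlap, giving 2 × 2 / (3 + 3) ≈ 0.667.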

PET radiomics are highly susceptible to scanner and reconstruction choices (e.g., reconstruction kernel, iterations, post-reconstruction filters, and voxel size), and numerous head and neck PET radiomics papers have emphasised that inconsistent acquisition and preprocessing can markedly alter feature values and model performance. Studies from Payabvash and others on head and neck cohorts illustrate that voxel-intensity normalisation and consistent preprocessing pipelines can materially change which radiomic features are selected as important and can improve prognostic model performance compared with using raw SUV values alone [55]. Conceptually, pre-specifying the normalisation method and keeping reconstruction parameters fixed helps to reduce spurious variability and should yield more reproducible, generalisable signatures, particularly in multi-centre OPSCC radiomics.

Differences between fixed and relative SUV thresholds can alter radiomic values and downstream risk groups. Cheng et al. explicitly thresholded primary tumour VOIs using SUV ≥ 2.5 and still found GLCM-derived uniformity to be independently prognostic alongside TLG, illustrating that while segmentation choices modulate feature values, they do not abolish prognostic texture signal [53]. Multi-centre blind-test challenges such as HECKTOR further show that fully automatic PET/CT segmentation and outcome-prediction models achieve only moderate performance and struggle in more difficult cases, reinforcing that standardised preprocessing, consistent VOI definitions, and rigorous external validation are essential, along with feature stability testing and parsimonious, interpretable models [54].

Clinically credible PET models in HPV-associated OPSCC, therefore, combine clinical covariates with volumetric PET metrics (and, where robust, texture features), are trained on standardised or harmonised data, and demonstrate performance on independent test sets. Single- and multi-centre OPSCC series show that radiomics can add prognostic value beyond AJCC-8 [2,49,50,52,53], while multi-centre blind-test platforms such as HECKTOR and randomised data such as PET-NECK illustrate the importance of external validation and reproducibility for clinical adoption [5,54,55].

Key FDG-PET/CT metrics, radiomics models, and response frameworks are summarised in Table 4, including baseline metabolic burden studies, PET-guided management evidence, response standardisation, and multi-centre benchmarking efforts.

3.5. Methodological Issues Driving Heterogeneity

Throughout this review, we have referenced and discussed methodological issues that are driving heterogeneity in reported performance. In Table 5, we summarise these issues to create a framework for future research in this field. Reproducible quantitative imaging in HPV-positive OPSCC depends on transparent, standardised reporting across acquisition, segmentation, feature extraction, and validation because small methodological differences (e.g., diffusion b-value selection, ROI rules, or inter-site harmonisation) can materially alter reported performance. We, therefore, contextualise current radiomics quality limitations and reporting gaps [14,15,24] and align the checklist with practical approaches to reducing between-site variability [31,32] and modality-specific protocol sensitivity (e.g., b-value dependence in diffusion biomarkers) [20]. For PET-derived endpoints and response assessment, we additionally reference standard response frameworks to support consistent reporting in longitudinal workflows [17,57].

4. Future Directions

As discussed in this article, quantitative imaging in HPV-positive OPSCC is an area of active research and development globally, and several future directions are particularly promising. Serial intratreatment DWI on MR-Linac platforms consistently shows mid-course ADC rises in responding head and neck disease [11,18,22], supporting adaptive designs that use pre-specified ΔADC thresholds to modify target volumes and doses within a single course of CRT. Recent prospective MR-Linac work demonstrates the feasibility of weekly DWI acquisition and quantifies temporal ADC change at whole-tumour and subvolume levels, providing an operational footing for response-adapted radiotherapy in HPV-positive OPSCC [22]. In parallel, PET-CT-guided post-CRT surveillance has already proven non-inferior to planned neck dissection in PET-NECK [5] while markedly reducing operations. Next-generation studies should integrate baseline PET burden (e.g., MTV/TLG) [49,50,52] and PET radiomics [2,55] with PET-guided surveillance schedules to personalise follow-up intensity in HPV-positive populations, formally testing health–economic and quality-of-life endpoints alongside oncologic outcomes. For radiomics to influence guidelines, pipelines must also be prospectively specified and auditable, aligned with trial-readiness criteria such as those articulated in the EIBALL/EORTC imaging biomarker guidelines (including pre-registration, clearly defined endpoints, data-management plans, and decision-impact analyses), so that imaging biomarkers can be embedded credibly into de-intensification and adaptive-RT trials.

Moving from proof-of-concept performance to clinical adoption in HPV-positive OPSCC now requires a coordinated translational programme that prioritises generalisability, decision impact, and workflow readiness. In the short term (12–24 months), the field should converge on a small number of “trial-ready” biomarkers per modality and agree upon base definitions (acquisition parameters, preprocessing, and ROI rules); mandate site-separated external validation as a minimum evidentiary threshold; and adopt transparent, pre-registered pipelines aligned with reproducibility standards and radiomics quality recommendations [14,15]. Given known scanner/protocol effects, harmonisation and technical QA (including ComBat-style approaches and diffusion performance profiling) should be used where appropriate to support multi-centre pooling and comparability [14,31,32]. In the medium term (2–5 years), candidate biomarkers should be embedded prospectively as decision triggers within multi-centre de-escalation and adaptive radiotherapy designs—supported by the feasibility of serial intratreatment DWI on MR-Linac platforms and repeated evidence that midtreatment ΔADC captures clinically meaningful response during treatment. In parallel, surveillance pathways provide a pragmatic translation target: PET-CT-guided follow-up has already reduced unnecessary operations without compromising outcomes in PET-NECK, and future studies should test whether baseline PET burden (MTV/TLG) and PET radiomics can personalise surveillance intensity with health–economic and quality-of-life endpoints [2,5,54]. In the long term (5+ years), translation will require auditable, trial-ready workflows (pre-specified endpoints, data governance, and decision-impact evaluation) aligned with frameworks such as EIBALL/EORTC, enabling biomarkers to be embedded credibly into adaptive-RT and de-intensification trials.

5. Conclusions

Quantitative imaging in HPV-positive OPSCC has moved beyond exploratory correlations to a small number of convergent, clinically meaningful signals. Across prospective head and neck cohorts, midtreatment ΔADC rises in DWI are repeatedly associated with better local control and survival and outperform baseline ADC once HPV status and stage are accounted for [11,18,22]. PET studies show that volumetric metrics such as MTV and TLG are more robust prognosticators than SUVmax alone, linking higher metabolic burden to worse locoregional control, distant metastasis, and overall survival in both mixed HNSCC and HPV-positive OPSCC [48,49,51,52]. Radiomics and AI models—based on MRI, CT, and PET—consistently add incremental value when combined with clinicopathologic factors, improving HPV status discrimination and progression-free or overall survival prediction beyond AJCC-8 staging in multi-institutional series [2,3,4,16,27]. At the same time, systematic reviews highlight low median Radiomics Quality Scores, limited external validation, and heterogeneous methodology, underscoring that these promising tools are not yet ready for guideline-directed use [15].

Bibliography


1. Ang K.K.; Harris J.; Wheeler R.; Weber R.; Rosenthal D.I.; Nguyen-Tân P.F.; Westra W.H.; Chung C.H.; Jordan R.C.; Lu C. Human Papillomavirus and Survival of Patients with Oropharyngeal Cancer. N. Engl. J. Med. 2010, 363, 24–35. doi:10.1056/NEJMoa0912217. PMID: 20530316. PMCID: PMC2943767.
2. Haider S.P.; Zeevi T.; Baumeister P.; Reichel C.; Sharaf K.; Forghani R.; Kann B.H.; Judson B.L.; Prasad M.L.; Burtness B. Potential Added Value of PET/CT Radiomics for Survival Prognostication beyond AJCC 8th Edition Staging in Oropharyngeal Squamous Cell Carcinoma. Cancers 2020, 12, 1778. doi:10.3390/cancers12071778. PMID: 32635216. PMCID: PMC7407414.
3. Kann B.H.; Likitlersuang J.; Bontempi D.; Ye Z.; Aneja S.; Bakst R.; Kelly H.R.; Juliano A.F.; Payabvash S.; Guenette J.P. Screening for Extranodal Extension in HPV-Associated Oropharyngeal Carcinoma: Evaluation of a CT-Based Deep Learning Algorithm in Patient Data from a Multicentre, Randomised de-Escalation Trial. Lancet Digit. Health 2023, 5, e360–e369. doi:10.1016/S2589-7500(23)00046-8. PMID: 37087370. PMCID: PMC10245380.
4. Lyu W.; Gong J.; Zhu L.; Xu T.; Huang S.; Shen C.; Wang C.; He X.; Ying H.; Hu C. MR Radiomics Unveils Neoadjuvant Chemo-Responsiveness with Insights into Selective Treatment de-Intensification in HPV-Positive Oropharyngeal Carcinoma. Oral Oncol. 2024, 159, 107049. doi:10.1016/j.oraloncology.2024.107049. PMID: 39341091.
5. Mehanna H.; Wong W.-L.; McConkey C.C.; Rahman J.K.; Robinson M.; Hartley A.G.J.; Nutting C.; Powell N.; Al-Booz H.; Robinson M. PET-CT Surveillance versus Neck Dissection in Advanced Head and Neck Cancer. N. Engl. J. Med. 2016, 374, 1444–1454. doi:10.1056/NEJMoa1514493. PMID: 27007578.
6. Wu I.C.; Chen Y.C.; Karmakar R.; Mukundan A.; Gabriel G.; Wang C.C.; Wang H.C. Advancements in Hyperspectral Imaging and Computer-Aided Diagnostic Methods for the Enhanced Detection and Diagnosis of Head and Neck Cancer. Biomedicines 2024, 12, 2315. doi:10.3390/biomedicines12102315. PMID: 39457627. PMCID: PMC11504349.
7. Alabi R.O.; Elmusrati M.; Leivo I.; Almangush A.; Mäkitie A.A. Artificial Intelligence-Driven Radiomics in Head and Neck Cancer: Current Status and Future Prospects. Int. J. Med. Inform. 2024, 188, 105464. doi:10.1016/j.ijmedinf.2024.105464. PMID: 38728812.
8. Schouten C.S.; de Graaf P.; Bloemena E.; Witte B.I.; Braakhuis B.J.M.; Brakenhoff R.H.; Leemans C.R.; Castelijns J.A.; de Bree R. Quantitative Diffusion-Weighted MRI Parameters and Human Papillomavirus Status in Oropharyngeal Squamous Cell Carcinoma. AJNR Am. J. Neuroradiol. 2015, 36, 763–767. doi:10.3174/ajnr.A4271. PMID: 25721078. PMCID: PMC7964304.