Assessing the Relative Importance of Imaging and Serum Biomarkers in Capturing Disability, Cognitive Impairment, and Clinical Progression in Multiple Sclerosis
Alessandro Cagol, Pascal Benkert, Sabine Schaedelin, Mario Ocampo‐Pineda, Noemi Montobbio, Po‐Jui Lu, Batuhan Ayci, Antonia Wenger, Alfi Aran Shukur, Kornelius Kaim, Lester Melie‐Garcia, Matthias Weigel, Alessio Signori, Pasquale Calabrese, Ludwig Kappos, Maria Pia Sormani

TL;DR
This study identifies spinal cord atrophy and cortical degeneration as key predictors of disability and progression in multiple sclerosis, with serum biomarkers adding complementary value.
Contribution
The study introduces a multimodal approach combining MRI and serum biomarkers to predict MS severity and progression.
Findings
Spinal cord atrophy is the strongest predictor of disability severity and progression independent of relapses.
Cortical thinning and subcortical atrophy, especially in deep gray matter, improve prediction of MS progression.
Serum biomarkers like sNfL and sGFAP provide independent and complementary information for outcome prediction.
Abstract
The heterogeneity of multiple sclerosis (MS) pathology calls for robust biomarkers to predict disability and progression, particularly progression independent of relapse activity (PIRA). Here, we aimed to identify the most informative MRI and serum biomarkers for predicting clinical outcomes in people with MS (pwMS), including disability severity, cognitive impairment, disease phenotype, and risk of PIRA. We applied a machine learning–based feature selection approach to cross‐sectional and longitudinal data from two independent pwMS cohorts. Cohort 1 (n = 120) included 57 MRI biomarkers, incorporating advanced quantitative MRI (qMRI). Cohort 2 (n = 279) included 35 MRI biomarkers derived from conventional MRI. Both cohorts obtained serum neurofilament light chain (sNfL) and glial fibrillary acidic protein (sGFAP) measurements. Spinal cord atrophy consistently emerged as the strongest…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1
FIGURE 2
FIGURE 3
FIGURE 4
FIGURE 5
FIGURE 6| Cohort 1 | Cohort 2 | |
|---|---|---|
| Participants, No. | 120 | 279 |
| Females, No. (%) | 70 (58.3) | 200 (71.7) |
| Age, mean (SD), years | 47.7 (13.6) | 45.6 (11.6) |
| Disease duration, median [IQR], years | 12.4 [4.2; 24.5] | 10.2 [6.0; 17.8] |
| Disease phenotype: | ||
| ‐ RRMS | 89 (74.2) | 259 (92.8) |
| ‐ SPMS | 20 (16.7) | 10 (3.6) |
| ‐ PPMS | 11 (9.2) | 10 (3.6) |
|
DMT: ‐ Platform treatments, No. (%) |
5 (4.2) |
26 (9.3) |
| ‐ Oral treatments, No. (%) | 40 (33.3) | 160 (57.3) |
| ‐ Monoclonal antibody treatments, No. (%) | 57 (47.5) | 47 (16.8) |
| ‐ Untreated, No. (%) | 18 (15.0) | 46 (16.5) |
| EDSS, median [IQR] | 3.0 [2.0; 5.0] | 2.0 [1.5; 3.25] |
| sNfL Z‐score, median [IQR] | 0.39 [‐0.36; 1.08] | 0.23 [‐0.63; 0.99] |
| sGFAP Z‐ascore, median [IQR] | 0.14 [‐0.71; 1.00] | 0.00 [‐0.71; 0.60] |
| T2LV, median [IQR], mL | 7.2 [2.8; 16.3] | 5.6 [2.2; 14.1] |
| Cortical lesion count, median [IQR] | 2 [0; 5.5] | / |
| PRL count, median [IQR] | 3 [0.3; 6.0] | / |
| C1‐C4 lesion count, median [IQR] | 1 [0; 2.3] | 3 [3; 5] |
| Normalized brain volume, median [IQR] | 0.70 [0.66; 0.74] | 0.69 [0.66; 0.72] |
| Normalized WM volume, median [IQR] | 0.29 [0.27; 0.30] | 0.28 [0.27; 0.30] |
| Normalized GM volume, median [IQR] | 0.40 [0.37; 0.42] | 0.38 [0.36; 0.41] |
| Normalized cortical volume, median [IQR] | 0.30 [0.27; 0.32] | 0.29 [0.20; 0.30] |
| Normalized DGM volume, median [IQR] | 0.03 [0.03; 0.04] | 0.03 [0.03; 0.04] |
| CTh, median [IQR], mm | 2.33 [2.2; 2.41] | 2.38 [2.28; 2.45] |
| C2‐C3 CSA, median [IQR], mm2 | 60.2 [53.3; 65.6] | 60.8 [55.7; 65.9] |
| Follow‐up duration, median [IQR], years | 3.5 [2.6; 4.2] | 7.3 [6.7; 7.9] |
| Patients exhibiting PIRA during follow‐up, No. (%) | 28 (23.3) | 79 (28.3) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultiple Sclerosis Research Studies · Amyotrophic Lateral Sclerosis Research · Neuroinflammation and Neurodegeneration Mechanisms
Introduction
1
Multiple sclerosis (MS) is a chronic condition characterized by demyelination, inflammation, and neurodegeneration within the central nervous system (CNS). Magnetic resonance imaging (MRI) plays a pivotal role in the diagnosis and clinical management of people with MS (pwMS), primarily by enabling the detection of white matter lesions (WMLs). However, despite the recognized importance of WML assessment in clinical practice, the correlation between WML burden and clinical symptoms is often weak – a phenomenon referred to as the clinico‐radiological paradox [1]. This discrepancy stems from several factors, encompassing the limited pathological specificity of conventional MRI, the frequent oversight of spinal cord involvement, and the insensitivity of standard imaging techniques to diffuse damage in normal‐appearing tissue [1]. Additionally, reparative mechanisms and adaptive processes largely go undetected by conventional MRI.
These limitations have driven the search for biomarkers with greater sensitivity and specificity to quantify pathological changes in pwMS. Significant progress has been made in characterizing focal damage, particularly through the development of techniques to identify cortical lesions (CLs), which are closely associated with physical and cognitive impairment, as well as disease progression [2, 3]. Advanced imaging techniques have also emerged to capture the heterogeneity of WMLs in terms of demyelination, neuroaxonal loss, smoldering inflammation, and repair. For instance, the detection of paramagnetic rim lesions (PRLs) has enabled the identification of chronic active lesions, which are characterized by persistent inflammatory activity and are linked to worse clinical outcomes [4, 5]. Furthermore, quantitative MRI (qMRI) has enriched our understanding of MS pathology by offering microstructural insights into tissue changes and providing proxy measures of demyelination, macromolecular loss, neuroaxonal degeneration, and iron dysregulation [6]. These metrics are valuable for characterizing WMLs and detecting subtle abnormalities in tissue that appears normal on conventional MRI [7]. In addition, brain volumetric measurements have also become a widely used research tool for quantifying tissue loss, serving as a surrogate marker for the overall neurodegenerative burden [8]. Improved techniques for assessing spinal cord pathology, both in terms of focal lesions and diffuse tissue damage, have similarly been developed [8]. Beyond imaging, serum biomarkers such as neurofilament light chain (sNfL) and glial fibrillary acidic protein (sGFAP) have demonstrated significant potential in capturing inflammatory and neurodegenerative processes. While sNfL primarily reflects axonal damage, sGFAP serves as a marker of astrocytic activation, and both have been shown to correlate with clinical severity and predict disease progression [9, 10, 11].
These advancements have expanded our knowledge of MS pathology and provided a broad array of potential biomarkers for use in clinical trials and practice. However, most studies to date have focused on a limited subset of these biomarkers, leaving gaps in our understanding of their relative contributions. Moreover, implementing all these biomarkers in clinical practice is impractical and could lead to redundancy. Thus, it is essential to identify the most relevant biomarkers for disease management, monitoring, and treatment decisions to optimize care for pwMS. This is particularly critical for developing biomarkers that predict disease evolution, especially in addressing the neurodegenerative component of MS, which is considered the primary driver of progression independent of relapse activity (PIRA) [12]. Tackling PIRA remains one of the most pressing unmet needs for improving long‐term outcomes in pwMS.
In this context, we conducted a cross‐sectional and longitudinal study to assess the relative contribution of a wide range of MRI biomarkers reflecting brain and spinal cord pathology, along with serum biomarkers, in explaining clinical outcomes in pwMS. Our analysis incorporated both conventional and advanced MRI metrics, capturing focal and diffuse inflammatory as well as neurodegenerative changes at macroscopic and microstructural levels. To identify the most informative biomarkers, we applied a data‐driven machine learning approach designed to robustly capture the relative importance of each feature in explaining clinical outcomes. Specifically, we aimed to determine the extent to which each biomarker contributes to explaining: (1) the severity of neurological disability, (2) the degree of cognitive impairment, (3) the clinical disease phenotype, and (4) the risk of future disease progression due to PIRA. Associations of a subset of the selected biomarkers with outcomes (1), (3), and (4) were further evaluated in an independent validation cohort.
Methods
2
Participants and Study Design
2.1
In this longitudinal, prospective observational study, we used data from two independent cohorts. Cohort 1 included both pwMS and healthy controls (HCs) who underwent advanced MRI and serum assessments at baseline, along with longitudinal clinical follow‐up. Cohort 2 included pwMS who underwent conventional MRI and serum assessments at baseline, also followed by longitudinal clinical observation.
The study was approved by the local ethics committee, and all patients provided written informed consent before study entry. An overview of the study design is provided in Figure 1.
Study design. Two independent cohorts of people with multiple sclerosis (PwMS) were analyzed. Cohort 1 included extensive multimodal MRI and serum biomarkers and was used for cross‐sectional analyses of disability and disease phenotype, as well as longitudinal analyses of time to progression independent of relapse activity (PIRA) and progression independent of relapse and MRI activity (PIRMA). Cohort 2 included a more limited set of clinically available MRI and serum measures and was used for replication and validation analyses. Cross‐sectional outcomes were assessed at baseline, while longitudinal outcomes were evaluated over follow‐up. Abbreviations: CLs, cortical lesions; CSA, cross‐sectional area; CTh, cortical thickness; EDSS, Expanded Disability Status Scale; HCs, healthy controls; ICVF, intracellular volume fraction; MTsat, magnetization transfer saturation; MWF, myelin water fraction; PIRMA, progression independent of relapse and MRI activity; PIRA, progression independent of relapse activity; PMS, progressive multiple sclerosis; PRLs, paramagnetic rim lesions; PwMS, people with multiple sclerosis; qT1, quantitative T1; QSM, quantitative susceptibility mapping; RRMS, relapsing–remitting multiple sclerosis; SC, spinal cord; SDMT, Symbol Digit Modalities Test; sGFAP, serum glial fibrillary acidic protein; sNfL, serum neurofilament light chain; T1‐WMLs, T1‐weighted white matter lesions; T2‐WMLs, T2‐weighted white matter lesions.
This study followed the STROBE reporting guideline.
Cohort 1
2.1.1
Cohort 1 included pwMS who participated in both the INsIDER study (NCT05177523) and the Swiss Multiple Sclerosis Cohort (SMSC) study [13]. Inclusion criteria for pwMS were as follows:
- Availability of an advanced brain MRI scan conducted as part of the INsIDER study. The MRI scan date served as the study baseline for each participant.
- Availability of at least two clinical follow‐up examinations, each conducted at least six months apart from one another and from the study baseline, performed prospectively as part of the SMSC study protocol.
- Diagnosis of MS according to the 2017 revised McDonald criteria [14].
- Age between 18 and 80 years.
- Absence of relevant neurological or psychiatric comorbidities.
Additionally, Cohort 1 included HCs participating in the INsIDER study, with the availability of an advanced brain MRI scan obtained using the same protocol as in pwMS.
Exclusion criteria for all participants encompassed pregnancy, contraindications to MRI, inability to provide informed consent, relapses or steroid treatment within the prior 3 months, and insufficient MRI quality. All eligible participants were enrolled.
Cohort 2
2.1.2
Cohort 2 included pwMS participating in the SMSC study, followed at the University Hospital of Basel, and meeting the following inclusion criteria:
- Availability of a 3‐Tesla (3T) brain MRI scan conducted as part of the SMSC study. The date of the first available 3T MRI scan served as the study baseline for each participant.
- Availability of at least two clinical follow‐up examinations, each conducted at least six months apart from one another and from the study baseline, performed prospectively as part of the SMSC study protocol.
- Diagnosis of MS according to the 2017 revised McDonald criteria [14].
- Age between 18 and 80 years.
- Absence of relevant neurological or psychiatric comorbidities.
Exclusion criteria were identical to those applied in Cohort 1.
Clinical Data
2.2
All participants in both cohorts underwent regular clinical evaluations, conducted at least annually, as part of the SMSC study [13]. Standardized assessments included the calculation of the Expanded Disability Status Scale (EDSS) score (https://www.neurostatus.net/)) by certified raters.
The occurrence of PIRA during follow‐up was defined as an increase in EDSS score (≥1.5, ≥1.0, or ≥0.5 points if baseline EDSS was 0, 1.0–5.5, or >5.5, respectively) using a roving baseline [15], confirmed after at least 6 months, in the absence of relapses between the EDSS increase and the preceding reference visit (conducted ≥90 days before the EDSS increase) and between the EDSS increase and its confirmation [16]. PIRA events were identified using the msprog package [17].
In Cohort 1, cognitive performance was assessed using the oral version of the Symbol Digit Modalities Test (SDMT) in a subset of 97 pwMS and 100 HCs, with raw SDMT scores converted to z‐scores based on normative data [18].
MRI Acquisition and Analysis
2.3
For Cohort 1, advanced brain MRI examinations were conducted using a standardized acquisition protocol on a 3T whole‐body MR system (Magnetom Prisma, Siemens Healthineers), using a 64‐channel phased‐array head and neck coil for radiofrequency reception. The protocol included: (1) 3D fluid‐attenuated inversion recovery (FLAIR), (2) 3D magnetization‐prepared 2 rapid gradient‐echo (MP2RAGE), (3) multi‐shell diffusion, (4) 3D echo planar imaging (EPI), (5) three 3D radiofrequency spoiled gradient echo acquisitions used to obtain magnetization transfer saturation (MTsat) maps [19], and (6) fast acquisition with spiral trajectory and adiabatic T2‐prep (FAST‐T2). FLAIR and MP2RAGE images covered the spinal cord from C1 to C4.
The standardized brain MRI protocol for Cohort 2 included: (1) 3D FLAIR, and (2) 3D magnetization‐prepared rapid gradient‐echo (MPRAGE), both covering the spinal cord from C1 to C4. Further protocol details are provided in Tables S1 and S2.
Due to differences in MRI acquisition protocols, only a subset of MRI biomarkers could be quantified in Cohort 2 (Figure 1).
Lesion Segmentation
2.3.1
T2‐hyperintense white matter lesions (T2‐WMLs) were segmented using a deep learning‐based tool [20] and manually refined. Cortical lesions (CLs) were manually segmented on MP2RAGE by two expert raters [21]. T1‐hypointense WMLs (T1‐WMLs) were automatically segmented on MP2RAGE/MPRAGE using SAMSEG (v7.2.0) [22]. PRLs, defined as discrete FLAIR‐hyperintense lesions either completely or partially encircled by a rim of paramagnetic signal, visible in at least one contrast between unwrapped phase and quantitative susceptibility mapping (QSM) [23], were manually detected by two expert raters. Spinal cord lesions in the C1‐C4 tract were manually identified on FLAIR and MP2RAGE/MPRAGE images.
Brain Morphometry
2.3.2
Segmentation of brain structures and calculation of global/regional brain volumes and cortical thickness (CTh) were performed with FreeSurfer (v.6.0.0; http://surfer.nmr.mgh.harvard.edu/), with lesion‐filled MP2RAGE/MPRAGE images as input. Results were manually checked and edited as needed. Brain volumes were normalized by the total intracranial volume (TIV) obtained with SAMSEG [24].
Spinal Cord Morphometry
2.3.3
The mean cross‐sectional area (CSA) at C1, C2, C3, and C4 was measured using Spinal Cord Toolbox [25] (v.5.3.0) on MP2RAGE/MPRAGE images. Manual labelling of intervertebral discs was performed to ensure accuracy, and outputs were manually checked.
qMRI Analysis
2.3.4
We analyzed 5 different qMRI contrasts providing insights into: (1) micro/macrostructural integrity (quantitative T1 [qT1]), (2) macromolecular content (MTsat), (3) myelin content (myelin water fraction [MWF]), (4) axon and dendrite density (intracellular volume fraction [ICVF]), and (5) iron and myelin content (QSM) [6]. qT1 maps were derived from MP2RAGE images [26]. MTsat maps were reconstructed using magnetization transfer‐weighted, proton density‐weighted, and T1‐weighted images [19]. MWF maps were generated from FAST‐T2 data [27]. ICVF values were estimated using Neurite Orientation Dispersion and Density Imaging (NODDI) applied to diffusion images [28], after denoising and correction for motion, geometric distortions, and eddy‐currents. QSM maps were reconstructed from 3D‐EPI images using the morphology‐enabled dipole inversion (MEDI) algorithm [29].
Mean qMRI values were extracted within WMLs, normal‐appearing white matter (NAWM), normal‐appearing cortical and deep gray matter (GM), and the normal‐appearing thalamus.
Longitudinal WML Changes
2.3.5
All longitudinal conventional MRI scans acquired as part of the SMSC study during the clinical follow‐up were analyzed to identify new or enlarging T2‐WMLs. These lesions were automatically detected [30] and manually verified to ensure accuracy. The identification of new or enlarging WMLs was instrumental in determining who among pwMS experienced progression independent of relapse and MRI activity (PIRMA), defined as PIRA occurring in the absence of MRI‐detected inflammatory activity [12, 16]. The protocol details for conventional MRI scans are provided in Table S2.
Neurofilament Light Chain
2.4
sNfL and sGFAP levels were quantified using the ultrasensitive single molecule array (Simoa) technology (Quanterix), following the manufacturer's protocol [10]. Z‐scores were calculated using normative reference data, accounting for age and body mass index (BMI) for sNfL, and for age, BMI, and sex for sGFAP [31].
Statistical Analysis
2.5
All statistical analyses were conducted using R (version 4.3.1), with statistical significance set at p<0.05.
The Boruta algorithm (configured with 10,000 trees, 2,000 iterations) [32, 33], an extension of the random forest algorithm, was employed to evaluate the contribution of each biomarker in explaining neurological disability, cognitive performance, disease phenotype, and occurrence of PIRA. The Boruta algorithm is a feature selection method that compares the importance of original variables against randomly permuted “shadow” variables. By iteratively fitting random forest models, it evaluates feature importance by comparing it to the highest importance achieved by the shadow variables.
In Cohort 1, we included as predictors in the Boruta models clinical and demographic factors, brain and spinal cord volumetric measurements, qMRI metrics, measures of lesion burden, and serum biomarkers. A comprehensive list of the 59 included predictors is available in Table S3. To account for the linear dependence of brain and spinal cord volumetric measurements and qMRI metrics on age and sex, these variables were adjusted by calculating their deviation from the HC population. Linear regressions were fitted within the HC population with volumetric or qMRI metrics as the dependent variables and age and sex as explanatory variables. The residuals from these regressions in pwMS, representing deviations from expected values, were used as input variables in the Boruta models [34]. Age and sex were also included as predictors in the models to minimize potential remaining confounding effects.
Separate Boruta models were applied to the following dependent variables: (1) EDSS, analyzed both as a continuous variable and dichotomized based on a cut‐off of either ≥3.0 or ≥6.0; (2) SDMT z‐score; (3) disease phenotype (PMS vs. RRMS); and (4) PIRA incidence. For PIRA, we used a survival model to analyze time‐to‐event data and additionally assessed PIRA as a binary outcome. To account for potential baseline differences in demographic and clinical variables between patients with and without PIRA, 1:1 nearest‐neighbor propensity score matching was conducted. The matching criteria included age, sex, disease duration, EDSS, disease‐modifying treatment (DMT) class, disease phenotype, and follow‐up duration.
The predictive performance of variables selected by the Boruta models was assessed using random forest models with tenfold cross‐validation. Performance metrics included mean R^2^ for regression tasks, mean area under the receiver operating characteristic curve (AUC) for classification tasks, and mean concordance index (C‐index) for survival analysis.
Sensitivity and complementary analyses were conducted to address methodological limitations, improve interpretability, and further characterize model performance. Because variable‐importance estimates in the Boruta model may be affected by strong collinearity among predictors, we conducted sensitivity analyses using conditional Boruta models implemented with conditional random forests (cforest, party R package), which estimate conditional variable importance while accounting for correlations between predictors. To further evaluate the temporal stability of PIRA prediction, model performance was additionally quantified using time‐dependent AUCs calculated at multiple follow‐up time intervals. To improve interpretability of the PIRA prediction models, we explored post hoc explanation approaches based on Shapley additive explanations, including SHAP dependence plots, which describe both the relative importance and the directionality of individual predictors in the random forest models. In parallel, we fitted linear survival models using ridge‐penalized Cox regression as a complementary, more interpretable modeling framework.
Additional sensitivity analyses included: (1) recalculating SDMT z‐scores based on the HC population rather than normative data, (2) including EDSS as an additional explanatory variable in the model predicting time to PIRA, (3) including DMT class as an additional explanatory variable in the model predicting time to PIRA, (4) analyzing time to PIRA specifically within the RRMS patient group, and (5) investigating time to PIRMA.
Analogous Boruta models, implemented with the same hyperparameters, were applied to the validation cohort (Cohort 2) to assess the relative importance of biomarkers. In this cohort, data were available for a subset of 37 out of the 59 biomarkers used in Cohort 1, as detailed in Table S4. Due to the absence of a paired HC cohort, biomarkers were entered into the models using their original values; all models included age, sex, and disease duration as covariates. As in Cohort 1, outcomes of interest included cross‐sectional EDSS and disease phenotype at baseline, as well as prediction of future time to PIRA and discrimination between propensity score‐matched pwMS with and without PIRA during follow‐up. Matching criteria were identical to those applied in Cohort 1. To assess clinical applicability, we derived a parsimonious risk score using ridge regression based on clinically accessible biomarkers identified as predictors of PIRA by the Boruta analysis in Cohort 2; this score was subsequently validated in Cohort 1.
Additional methodological details are available in Supporting Methods.
Results
3
Cohort 1
3.1
A total of 120 pwMS and 105 HCs (55% female; mean age: 37.8 (SD: 13.0) years) were included in Cohort 1. Key characteristics of pwMS are provided in Table 1. The median follow‐up time was 3.5 years (IQR: 2.6; 4.2). Univariate associations between variables used in the Boruta models and outcome measures (i.e. EDSS, SDMT, disease phenotype, and time to PIRA) are visually represented in Figure 2.
Univariate associations in Cohort 1. Abbreviations: CC, corpus callosum; CI, confidence interval; CL, cortical lesion; CSA, cross‐sectional area; CTh, cortical thickness; DGM, deep gray matter; EDSS, Expanded Disability Status Scale; GM, gray matter; HR, hazard ratio; ICVF, intracellular volume fraction; MVF, myelin volume fraction from magnetization transfer saturation; MWF, myelin water fraction; NAWM, normal‐appearing white matter; PIRA, progression independent of relapse activity; PRL, paramagnetic rim lesions; qMRI, quantitative MRI; QSM, quantitative susceptibility mapping; qT1, quantitative T1; RMS, relapsing multiple sclerosis; SC, spinal cord; SDMT, Symbol Digit Modalities Test; sGFAP, serum glial fibrillary acidic protein; sNfL, serum neurofilament light chain; WM, white matter; WML, white matter lesions.
EDSS Prediction
3.1.1
Twenty‐four variables were identified by the Boruta model as significant predictors of the EDSS score, with age and disease duration being the strongest. Among biomarkers, spinal cord volumetric measurements emerged as the most critical, with C2, C1, C3, and C4 CSA contributing the most to the prediction model. Additional selected biomarkers included brain volumetric measurements, T1‐ and T2‐WML burden, spinal cord and CL burden, and qMRI metrics in both lesions and normal‐appearing GM (Figure 3). The model incorporating these predictors achieved a mean R^2^ of 0.55.
Selected predictors of the EDSS, SDMT, and disease phenotype in Cohort 1. Abbreviations: CC, corpus callosum; CL, cortical lesion; CSA, cross‐sectional area; CTh, cortical thickness; DGM, deep gray matter; GM, gray matter; ICVF, intracellular volume fraction; MVF, myelin volume fraction from magnetization transfer saturation; MWF, myelin water fraction; NAWM, normal‐appearing white matter; PRL, paramagnetic rim lesions; qMRI, quantitative MRI; QSM, quantitative susceptibility mapping; qT1, quantitative T1; RMS, relapsing multiple sclerosis; SC, spinal cord; sGFAP, serum glial fibrillary acidic protein; sNfL, serum neurofilament light chain; WM, white matter; WML, white matter lesions.
The most relevant biomarkers for distinguishing patients with an EDSS score ≥3.0 (n = 69) were thalamic qT1 and C2 CSA (model mean AUC = 0.91), while for differentiating those with an EDSS score ≥6.0 (n = 24), the key biomarkers were C2 and C1 CSA (model mean AUC = 0.93) (Figure S1).
SDMT Prediction
3.1.2
Seven variables emerged as significant predictors of the SDMT z‐score. These included, in order of importance, brain volumetric measurements (WM, thalamic, total brain, hippocampal, and DGM volumes), T1‐WML burden, and DGM qT1 (Figure 3). The model incorporating these predictors achieved a mean R^2^ of 0.31. In the sensitivity analysis, similar results, confirming the same volumetric variables, were observed when SDMT z‐scores were recalculated based on the HC population rather than normative data (mean R^2^ = 0.25) (Figure S2).
Prediction of the Disease Phenotype
3.1.3
Sixteen variables significantly contributed to differentiating between PMS and RRMS, with age being the strongest predictor, followed by C2 CSA. Other selected biomarkers included T1‐WML burden, C4, C3, and C1 CSA, qMRI metrics in both WMLs and normal‐appearing tissue, and spinal cord lesion burden (Figure 3). The model incorporating these predictors achieved a mean AUC of 0.92. When EDSS was included as an additional predictor, it emerged as the strongest determinant, while the other selected biomarkers were largely consistent with the main analysis (mean AUC = 1.00) (Figure S3).
Prediction of Time to PIRA
3.1.4
During the follow‐up period, 28 pwMS experienced PIRA and 26 pwMS experienced PIRMA. Among them, 16 patients with PIRA and 15 patients with PIRMA had an RRMS course. Only one patient experienced relapse‐associated worsening.
Seven variables emerged as significant predictors of time to PIRA. These included, in order of importance, CTh in the temporal lobe, volumetric measurements of the ventricles, caudate, DGM and thalamus, and C3 and C4 CSA (Figure 4). The selected variables yielded a mean C‐index of 0.67. Results remained consistent when including EDSS as an additional predictor (mean C‐index = 0.70).
Selected predictors of PIRA in Cohort 1. Abbreviations: CC, corpus callosum; CL, cortical lesion; CSA, cross‐sectional area; CTh, cortical thickness; DGM, deep gray matter; GM, gray matter; ICVF, intracellular volume fraction; MVF, myelin volume fraction from magnetization transfer saturation; MWF, myelin water fraction; NAWM, normal‐appearing white matter; PIRA, progression independent of relapse activity; PRL, paramagnetic rim lesions; qMRI, quantitative MRI; QSM, quantitative susceptibility mapping; qT1, quantitative T1; RMS, relapsing multiple sclerosis; SC, spinal cord; sGFAP, serum glial fibrillary acidic protein; sNfL, serum neurofilament light chain; WM, white matter; WML, white matter lesions.
When predicting PIRMA events, five variables were selected: CTh in the temporal lobe, DGM volume, C1 CSA, ventricular volume, and thalamic volume (mean C‐index = 0.70) (Figure S4).
In the RRMS subgroup, CTh in the temporal lobe was the only significant predictor of both PIRA (mean C‐index = 0.64) and PIRMA (mean C‐index = 0.69) (Figure S5).
In the propensity score‐matched analysis, six variables were identified as discriminators between patients with and without PIRA, including measures of cortical damage (CL volume, CL count, and cortical qT1), neuroaxonal density within WMLs (ICVF), spinal cord C4 CSA, and sNfL levels (model mean AUC = 0.80) (Figure 4). Group characteristics before and after propensity score‐matching are reported in Table S5.
In SHAP dependence plots aimed at exploring the relationship between individual predictors and model‐predicted PIRA risk, lower DGM volumes, reduced cervical CSA, and increased ventricular volume were associated with higher model‐predicted PIRA risk. Temporal CTh displayed a non‐linear association, with lower thickness corresponding to higher predicted PIRA risk (Figure S6). The ridge Cox model confirmed the direction of associations observed in the SHAP analyses (Figure S7).
Validation Cohort
3.2
A total of 279 pwMS were included in Cohort 2; key characteristics are reported in Table 1. The median follow‐up time was 7.3 years (IQR: 6.7; 7.9).
Eighteen variables were identified as significant predictors of the EDSS score. As in Cohort 1, spinal cord volumetric measurements – particularly CSA at all cervical levels – were among the strongest contributors. Additional selected biomarkers included T2‐ and T1‐WML volume, sGFAP levels, and several brain volumetric measures, with thalamic volume again emerging as the most relevant (Figure 5). The model achieved a mean R^2^ of 0.34. Models discriminating pwMS based on EDSS cut‐offs of ≥3.0 and ≥6.0 achieved mean AUCs of 0.79 and 0.92, respectively (Figure S8).
Selected predictors of the EDSS and disease phenotype in Cohort 2. Abbreviations: CC, corpus callosum; CI, confidence interval; CL, cortical lesion; CSA, cross‐sectional area; CTh, cortical thickness; DGM, deep gray matter; GM, gray matter; SC, spinal cord; sGFAP, serum glial fibrillary acidic protein; sNfL, serum neurofilament light chain; WM, white matter; WML, white matter lesions.
Eighteen variables also significantly contributed to distinguishing between RRMS and PMS. As in Cohort 1, spinal cord volumetric measures were the most important, alongside T1‐ and T2‐WML volume, sGFAP values, and multiple brain volumetric metrics (model mean AUC = 0.87) (Figure 5).
During follow‐up, 79 pwMS experienced PIRA. Seven variables were selected as predictors of time to PIRA. These included CSA at the C1, C2, and C3 levels, total brain volume, pallidum volume, and cingulate CTh (model mean C‐index: 0.70) (Figure 6).
Selected predictors of PIRA in Cohort 2. Abbreviations: CC, corpus callosum; CI, confidence interval; CL, cortical lesion; CSA, cross‐sectional area; CTh, cortical thickness; DGM, deep gray matter; GM, gray matter; SC, spinal cord; sGFAP, serum glial fibrillary acidic protein; sNfL, serum neurofilament light chain; WM, white matter; WML, white matter lesions.
In the RRMS subgroup, the top predictors of time to PIRA, in order of importance, were C3 CSA, pallidum volume, thalamic volume, mean CTh, and caudate volume (model mean C‐index: 0.76) (Figure S9).
Fifty‐six out of the 79 pwMS with PIRA fulfilled criteria for PIRMA. The same five variables – C3 CSA, pallidum volume, thalamic volume, mean CTh, and caudate volume – were selected as predictors of time to PIRMA (model mean C‐index: 0.81) (Figure S10).
In the propensity score‐matched analysis, six variables emerged as discriminators between patients with and without PIRA. The most relevant was cingulate CTh, followed by C1 CSA, sGFAP levels, total brain volume, thalamic volume, and mean CTh (model mean AUC = 0.67) (Figure 6).
Conditional Boruta in both cohorts produced findings that were overall comparable with the main analyses, although with a more parsimonious set of selected predictors (Figures S11–S14).
Derivation and Validation of a Clinically Accessible PIRA Risk Score
3.3
To explore the feasibility of deriving a clinically applicable risk score for PIRA based on widely available MRI biomarkers, we developed a parsimonious prognostic score using ridge regression. From the subset of predictors identified by the Boruta analysis in Cohort 2, we selected variables with broad clinical availability, resulting in a model including age, C1 CSA, normalized total brain volume, and normalized thalamic volume. This score was derived in Cohort 2. In this cohort, the risk score was strongly associated with time to PIRA (hazard ratio [HR] 3.30, 95% CI 1.99–5.62; p <0.0001) and achieved a C‐index of 0.64. Time‐dependent AUCs calculated between 2 and 7 years were all statistically significant, ranging from 0.81 at 3 years to 0.61 at 7 years (Table S6).
When applied to Cohort 1, the risk score showed consistent prognostic performance, with an HR of 3.60 (95% CI 1.57–8.25; p = 0.002) and a C‐index of 0.67 (Figure S15). Time‐dependent AUCs between 2 and 4 years were all significant, with values ranging from 0.68 at 2 years to 0.73 at 3 years (Table S7).
Discussion
4
In this prospective cross‐sectional and longitudinal study, we assessed the relative importance of conventional and advanced MRI biomarkers of brain and spinal cord damage, alongside serum biomarkers, in predicting key clinical outcomes in pwMS. A machine learning approach was employed to identify the variables most significantly contributing to clinical outcomes and to rank their importance. Our analysis suggested several predictors of neurological disability, cognitive impairment, as well as future disability progression, including spinal cord and brain volumetric metrics, qMRI parameters, and measures of lesion burden. Among the evaluated biomarkers, spinal cord atrophy emerged as a relevant determinant of disability severity and a distinguishing feature of patients with progressive disease. Cortical damage, encompassing both focal lesion burden and diffuse neurodegeneration, contributed to predicting the risk of PIRA. Notably, PIRA was also linked to markers of spinal cord and deep gray matter damage, in addition to sNfL levels. The reproducibility of these findings was broadly consistent in an independent validation cohort of 279 pwMS with a longer follow‐up. In this second cohort, spinal cord CSA measurements at all cervical levels again emerged as top predictors of EDSS, reinforcing their central role in determining physical disability. Thalamic volume, lesion burden, and sGFAP levels also contributed meaningfully to model performance. Consistent predictors of time to PIRA included spinal cord atrophy, deep gray matter volume, and cortical thickness, particularly in the cingulate cortex. These findings support the relevance of the selected biomarkers in capturing both cross‐sectional disability and future disease progression.
Pathological changes in the spinal cord are a hallmark of MS, often manifesting early in the disease course and becoming particularly pronounced in PMS [8]. These changes include both focal demyelinating lesions and diffuse degeneration leading to atrophy. Spinal cord lesions are associated with an increased risk of disease progression [35], while spinal cord atrophy has been consistently linked to clinical disability in several studies [8, 36]. Atrophy reflects a complex interplay of microstructural changes, including demyelination, neuroaxonal loss, and gliosis [37], and progresses partially independently of both spinal cord lesions and brain pathology [38]. Importantly, spinal cord atrophy has been shown to predict future clinical worsening [39], including PIRA [23, 40]. In this study, spinal cord atrophy emerged as the most significant marker of disability severity, with CSA at the four rostral cervical cord levels identified as primary predictors of the EDSS score. Among these, C2 CSA was the most informative biomarker for distinguishing PMS from RRMS. Furthermore, spinal cord CSA predicted disease progression due to PIRA. Notably, the influence of spinal cord CSA on disability severity and progression was independent of spinal cord lesion burden, which played a comparatively minor role. Collectively, these findings underscore spinal cord degeneration as a key determinant of MS severity. The validation cohort further reinforced these conclusions, with CSA at C1–C3 levels among the most important predictors of both overall disability and time to PIRA. Notably, C3 CSA was the strongest predictor of progression in the RRMS subgroup, supporting its robustness as a marker across disease phenotypes. Remarkably, these findings are noteworthy given that spinal cord CSA was derived directly from brain MRI scans. Although coverage was limited to the upper cervical cord and no specific spinal cord imaging protocol was available, the results indicate that meaningful and clinically relevant information can be obtained from cervical spinal cord segments that are commonly included in standard brain MRI examinations.
Measures of cortical damage also emerged as crucial predictors of clinical outcomes. Cortical degeneration, encompassing both focal lesions and diffuse atrophy reflected by volume loss and thinning, is a critical yet often underappreciated manifestation of MS pathology. Focal CLs, long overlooked due to imaging limitations, are now recognized as significant contributors to disease progression, linked to inflammation, demyelination, and neuronal loss [2, 41]. Cortical atrophy correlates with cognitive decline and physical disability, progressing steadily and most prominently in progressive disease forms [8, 42]. This degeneration arises from a complex interplay of primary neurodegenerative processes and secondary effects of white matter damage [43], occurring in non‐random clinically‐relevant anatomical patterns [44]. Advances in imaging have enhanced the detection of cortical abnormalities, including both lesions and atrophy, providing valuable insights into their role as predictors of long‐term disability and their potential as therapeutic targets [7]. In our study, while cortical damage was significantly associated with disease severity, its contribution was relatively lower than other MRI‐based metrics. However, cortical damage was pivotal in predicting PIRA. Remarkably, CTh in the temporal lobe emerged as the strongest predictor of time to PIRA in Cohort 1, even in sensitivity analyses restricted to PIRMA events, and restricted to PIRA events that occurred in patients with a classical RRMS disease course. Cerebral cortex has previously been shown to exhibit accelerated thinning in patients experiencing PIRA as compared to stable patients [42]. Interestingly, our findings also suggest a potential connection with results from a large longitudinal study, which identified accelerated GM atrophy in the temporal lobe as the only volumetric measurement distinguishing patients with secondary progressive MS from those with RRMS [45]. In line with these findings, the validation cohort confirmed the predictive value of CTh – particularly cingulate and mean CTh – for PIRA and PIRMA.
Propensity score‐matched analysis in Cohort 1 further underscored the importance of focal cortical degeneration in explaining disability accumulation due to PIRA. CLs showed the highest importance in distinguishing patients with PIRA from stable patients, with both CL volume and count being significant predictors. Interestingly, a previous study has likewise linked CL burden to the risk of conversion to secondary progressive MS [2]. Additionally, mean qT1 values in the normal‐appearing cerebral cortex significantly contributed to the classification, highlighting their relevance as a measure sensitive to tissue micro‐ and macrostructural integrity, influenced by myelin, water, and iron content [6].
As expected, the biomarkers most associated with performance on the SDMT differed from those predicting physical disability. Subcortical volumetric measurements – including white matter, thalamus, hippocampus, and deep GM – emerged as the most relevant predictors. These findings align with prior studies emphasizing the importance of subcortical structures in explaining cognitive performance, particularly attention and information processing speed in pwMS [46, 47]. The relevance of subcortical degeneration was further supported by the importance of qT1 values in the deep GM as a measure of tissue degeneration and loss of structural integrity.
The results of this study highlighted the complementary roles of brain and spinal cord volumetric measurements alongside measures of focal lesion burden. While CL load and T1‐hypointense WMLs were significant predictors of the clinical outcomes under consideration, PRLs showed predictive value in univariate analyses but did not provide additional information in multivariate models. This finding may reflect the enhanced insights offered by advanced qMRI metrics, which capture WML microstructural changes with greater sensitivity. Alternatively, it could be partially attributable to the limited statistical power of the study, underscoring the need for further investigation with larger cohorts. Notably, qMRI metrics of normal‐appearing tissue also significantly contributed to predicting disease severity and progression. Additionally, sNfL and sGFAP levels were useful in predicting PIRA. Elevated sNfL levels, a biomarker of neuroaxonal injury, are valuable for detecting acute and chronic neuronal damage. Existing literature has also shown the potential value of sNfL in predicting PIRA [10, 16, 48]. Given its increasing availability in clinical settings, sNfL is a promising biomarker; however, previous studies have highlighted sGFAP as a complementary marker that may offer greater specificity for smoldering disease activity and lower sensitivity to acute inflammation [10, 49]. Indeed, in the validation cohort, sGFAP emerged as a significant predictor not only of disability severity and disease phenotype but also of PIRA occurrence.
In this study, we identified several predictors of PIRA, predominantly reflecting cortical degeneration, cervical spinal cord atrophy, and deep gray matter damage. These predictors yielded overall moderate performance in time‐to‐event models, with C‐indices ranging from 0.64 to 0.81 across analyses. This finding is consistent with the known complexity of predicting an outcome such as PIRA, which is intrinsically challenging. Silent progression is influenced by multiple, partially overlapping biological and clinical mechanisms. In addition, its identification and the determination of its onset are technically challenging due to reliance on EDSS‐based definitions, measurement noise, variability in follow‐up intervals, and the influence of confounding factors such as aging and comorbidities. Together, these factors likely contribute to the moderate discriminative performance observed, which is nonetheless comparable to – or in some cases higher than – previous investigations [50, 51, 52, 53, 54]. Importantly, a prediction score derived from a limited set of clinically accessible biomarkers achieved consistent performance and was corroborated across the two cohorts, suggesting that meaningful risk stratification is achievable to a certain extent. Such an approach may be particularly valuable for research applications and may also support clinical monitoring by facilitating the identification of individuals at potentially higher risk of silent progression.
A key strength of this study is the simultaneous evaluation of a diverse array of MRI and serum biomarkers, providing a comprehensive assessment of their relative contributions to clinical outcomes. The machine learning approach employed for feature selection reduced noise and overfitting while maintaining model interpretability, enabling the handling of complex, non‐linear relationships. Models based on the predictors selected by Boruta showed high performance for disability classification and moderate performance for PIRA prediction, in line with the inherent complexity of this outcome. Importantly, results from the second cohort corroborated these overall patterns.
Nevertheless, the study has some limitations. First, PIRA was assessed exclusively through EDSS, which may underestimate subtle progression with still significant clinical impact. Second, the absence of spinal cord regional damage assessment is a limitation, especially considering recent studies showing that spinal cord GM atrophy outperforms whole CSA in explaining disability [33, 55]. Third, although the inclusion of a validation cohort strengthens the robustness of our findings, several differences between cohorts should be acknowledged. The cohorts differed in follow‐up duration – an important factor when studying outcomes such as PIRA – and in baseline clinical characteristics, with Cohort 1 including a higher proportion of progressive MS and higher baseline EDSS. These differences may partly explain discrepancies in the specific biomarkers selected across cohorts. In addition, fewer biomarkers were available in Cohort 2, and the absence of a matched healthy control cohort precluded adjustment of imaging biomarkers for physiological confounders using deviation‐from‐normal approaches. Despite these differences, we deliberately included the second cohort to enhance generalizability, acknowledging that cohorts with an availability of advanced MRI metrics as extensive as Cohort 1 are exceptionally rare. Importantly, both cohorts consistently highlighted the relevance of cortical and spinal cord damage, and predictors of PIRA identified in Cohort 2 generalized well when applied to Cohort 1 for prognostication, supporting the robustness of the identified biomarkers across heterogeneous populations. Fourth, the disproportionate inclusion of a broad set of MRI‐derived biomarkers compared to only two serum biomarkers (sNfL and sGFAP) may have introduced a bias toward the selection of MRI features in the variable importance ranking. Although serum biomarkers were associated with PIRA, the study was not powered to determine whether their addition to quantitative MRI metrics yields a statistically robust improvement in predictive performance, and further studies specifically designed to address this question will be required. Finally, despite the inclusion of two independent cohorts and a relatively long follow‐up, it should be acknowledged that PIRA occurred only in a subset of patients, representing a common challenge in PIRA research. Consequently, the absolute number of PIRA events was limited, which may affect the generalizability of the findings. Nevertheless, the consistency of results across cohorts supports the robustness of the observed associations, while further confirmation in larger populations and additional settings will be important to refine and validate these predictive approaches.
Despite these limitations, the findings are of potential clinical relevance. They highlight the crucial role of spinal cord and cortical degeneration as prognostic markers in MS, underscoring the potential value of their inclusion in clinical evaluations to bridge the gap between conventional MRI biomarkers and measures of disease severity and progression. These insights potentially pave the way for more refined and targeted approaches to understanding and managing MS.
Funding
No specific funding was received for this work.
Conflicts of Interest
No specific conflict of interest related to this work.
Supporting information
Supporting File: advs73801‐sup‐0001‐SuppMat.docx.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1F. Barkhof , “The Clinico‐radiological Paradox in Multiple Sclerosis Revisited,” Current Opinion in Neurology 15, no. 3 (2002): 239–245, 10.1097/00019052-200206000-00003.12045719 · doi ↗ · pubmed ↗
- 2A. Scalfari , C. Romualdi , R. S. Nicholas , et al., “The Cortical Damage, Early Relapses, and Onset of the Progressive Phase in Multiple Sclerosis,” Neurology 90, no. 24 (2018): e 2099–e 2106, 10.1212/WNL.0000000000005685.29769373 · doi ↗ · pubmed ↗
- 3M. Calabrese , M. Filippi , and P. Gallo , “Cortical Lesions in Multiple Sclerosis,” Nature Reviews Neurology 6, no. 8 (2010): 438–444, 10.1038/nrneurol.2010.93.20625376 · doi ↗ · pubmed ↗
- 4F. Bagnato , P. Sati , C. C. Hemond , et al., “Imaging Chronic Active Lesions in Multiple Sclerosis: A Consensus Statement,” Brain 147, no. 9 (2024): 2913–2933, 10.1093/BRAIN/AWAE 013.38226694 PMC 11370808 · doi ↗ · pubmed ↗
- 5M. Absinta , P. Sati , F. Masuzzo , et al., “Association of Chronic Active Multiple Sclerosis Lesions with Disability in Vivo,” JAMA Neurology 76, no. 12 (2019): 1474, 10.1001/JAMANEUROL.2019.2399.31403674 PMC 6692692 · doi ↗ · pubmed ↗
- 6C. Granziera , J. Wuerfel , F. Barkhof , et al., “Quantitative Magnetic Resonance Imaging towards Clinical Application in Multiple Sclerosis,” Brain 144, no. 5 (2021): 1296–1311, 10.1093/brain/awab 029.33970206 PMC 8219362 · doi ↗ · pubmed ↗
- 7A. Cagol , C. Tsagkas , and C. Granziera , “Advanced Brain Imaging in Central Nervous System Demyelinating Diseases,” Neuroimaging Clinics 34, no. 3 (2024): 335–357, 10.1016/J.NIC.2024.03.003.38942520 · doi ↗ · pubmed ↗
- 8J. Sastre‐Garriga , D. Pareto , M. Battaglini , et al., “MAGNIMS Consensus Recommendations on the Use of Brain and Spinal Cord Atrophy Measures in Clinical Practice,” Nature Reviews Neurology 16, no. 3 (2020): 171–182, 10.1038/s 41582-020-0314-x.32094485 PMC 7054210 · doi ↗ · pubmed ↗
