Beyond Fixed Thresholds: Cluster-Derived MRI Boundaries Improve Assessment of Crohn’s Disease Activity
Jelena Pilipovic Grubor, Sanja Stojanovic, Dijana Niciforovic, Marijana Basta Nikolic, Zoran D. Jelicic, Mirna N. Radovic, Jelena Ostojic

TL;DR
This study shows that cluster-based analysis of MRI activity indices separates Crohn’s disease activity groups more clearly than traditional fixed thresholds.
Contribution
The study introduces unsupervised clustering of MRI features to better stratify Crohn’s disease activity.
Findings
Cluster-derived classification showed clearer separation of disease activity groups than fixed thresholds.
Wall thickness was the main factor in cluster-based separation, supported by DWI MaRIA and ADC.
Clustering increased the Mahalanobis distances between the inactive, active, and severe disease categories.
Abstract
Background/Objectives: Crohn’s disease (CD) requires precise, noninvasive monitoring to guide therapy and support treat-to-target management. Magnetic resonance enterography (MRE), particularly diffusion-weighted imaging (DWI), is the preferred cross-sectional technique for assessing small-bowel inflammation. Indices such as the Magnetic Resonance Index of Activity (MaRIA) and its diffusion-weighted variant (DWI MaRIA) are widely used for grading disease activity. This study evaluated whether unsupervised clustering of MRI-derived features can complement these indices by providing more coherent and biologically grounded stratification of disease activity. Materials and Methods: Fifty patients with histologically confirmed CD underwent 1.5 T MRE. Of 349 bowel segments, 84 were pathological and classified using literature-based thresholds (MaRIA, DWI MaRIA) and unsupervised clustering.…
- Provincial Secretariat for Higher Education and Scientific Research, Autonomous Province of Vojvodina, Republic of Serbia
- Ministry of Science, Technological Development and Innovation
- Faculty of Technical Sciences, University of Novi Sad
1. Introduction
Crohn’s disease (CD) is a chronic, relapsing inflammatory bowel disease (IBD) that can affect any segment of the gastrointestinal tract, most commonly the terminal ileum. Due to its transmural nature, CD is frequently associated with complications such as abscesses, fistulas, and strictures, resulting in significant morbidity and a lifelong need for disease monitoring [1,2]. The incidence of CD continues to rise worldwide, with a peak onset in adolescence and early adulthood [2]. Given its chronic course, patients with CD often undergo multiple imaging evaluations over their lifetime. In this context, magnetic resonance enterography (MRE) has become the modality of choice, offering detailed anatomical and functional information without ionizing radiation. This is particularly important in pediatric and young adult populations, where minimizing cumulative radiation exposure is a major concern [3,4]. Diffusion-weighted imaging (DWI), a functional magnetic resonance imaging (MRI) technique based on the Brownian motion of water molecules, has been integrated into MRE protocols as a non-contrast method for detecting inflammatory changes. Active inflammation is typically associated with restricted diffusion and reduced apparent diffusion coefficient (ADC) values, whereas chronic fibrotic segments tend to show higher ADC values than those observed in active inflammation [5,6]. Several studies have demonstrated that DWI can differentiate acute from chronic inflammation, with diagnostic performance comparable or superior to contrast-enhanced imaging [7,8]. Concerns regarding gadolinium-based contrast agents (GBCAs), including potential retention in brain tissue and risks in patients with impaired renal function, have further strengthened the role of DWI as a safer alternative for repeated evaluations [9,10]. 
To objectively assess disease activity, semiquantitative indices, such as the Magnetic Resonance Index of Activity (MaRIA) and the DWI-based MaRIA (also known as the Clermont score; DWI MaRIA), have been developed and validated [11,12]. The MaRIA index uses the mural thickness, edema, ulceration, and relative contrast enhancement, whereas DWI MaRIA is a non-contrast variant that substitutes the enhancement term with diffusion metrics (high-b DWI signal and ADC) and includes the T2 signal-intensity (SI) ratio per the Clermont formula [12,13]. Although these indices have demonstrated strong correlations with endoscopic findings, they rely on literature-based MaRIA cut-offs derived from heterogeneous populations and varied imaging protocols. In daily practice, this may lead to inconsistent results, particularly in patients with borderline or mild disease activity [12,13,14]. Advanced data analysis methods, such as cluster analysis, may offer a useful complementary approach to refine classification of bowel segment activity. Cluster analysis is a robust, unsupervised multivariate method widely applied in biomedical research to identify natural groupings within complex datasets [15]. In IBD, clustering techniques have been used to stratify patients by disease behavior, treatment response, and prognosis [16,17], yet their application to MRI-based activity indices remains largely unexplored. By grouping bowel segments according to intrinsic imaging patterns rather than rigid thresholds, cluster-derived classifications may provide more biologically coherent and clinically meaningful separations of disease activity. The aim of the present study was therefore to compare conventional literature-based MaRIA and DWI MaRIA classifications with cluster-derived groupings in order to evaluate which approach provides a more consistent and clinically relevant categorization of CD activity.
2. Materials and Methods
2.1. Study Population
Fifty consecutive patients with histologically confirmed CD and under regular follow-up at the Clinic for Gastroenterology and Hepatology were enrolled. Patients were referred for MRE based on clinical or laboratory indications, either during active disease or for therapeutic monitoring. All participants provided written informed consent prior to imaging. The study was approved by the institutional ethics committee and conducted in accordance with the Declaration of Helsinki. Histologic confirmation of CD was obtained by endoscopic biopsy in all 50 patients, ensuring diagnostic verification prior to MRI analysis. Patients with prior bowel surgery were excluded because postoperative adhesions, anastomoses, and mesenteric remodeling can displace bowel loops across abdominal quadrants and distort native segment boundaries, potentially compromising the predefined seven-segment map and the comparability of segment-level measurements across the cohort. Characteristics of the study population are summarized in Table 1. Inclusion criteria comprised histologically verified CD, clinical indication for MRE as part of disease management, and successful completion of MRI examination with adequate image quality for interpretation. Exclusion criteria were prior surgical resection of the bowel, contraindications to MRI such as metallic implants, pacemakers, artificial heart valves, surgical clips, or insulin pumps, known allergy to gadolinium-based contrast agents (GBCAs), severe renal impairment defined as estimated glomerular filtration rate (eGFR) <30 mL/min, first trimester of pregnancy, and severe claustrophobia.
2.2. MR Acquisition
All examinations were performed with patients in the supine position. Imaging was carried out on a 1.5 Tesla scanner (Signa HDxT, GE Healthcare, Boston, MA, USA) using an eight-channel phased-array abdominal coil. To minimize peristaltic motion, 20 mg of intravenous hyoscine butylbromide (Buscopan) was administered immediately prior to scanning. For bowel distention, patients ingested 1500 mL of a biphasic oral contrast solution consisting of 500 mL mannitol and 1000 mL water over a 40 min period prior to imaging. The imaging protocol included coronal and axial T2-weighted Fast Imaging Employing Steady-state Acquisition (FIESTA; balanced steady-state free precession, bSSFP) sequences with and without fat suppression (repetition time/echo time, TR/TE = 3.9/1.6 ms; slice thickness 6.0 mm with 1.0 mm gap; field of view, FOV = 614 × 440 mm^2^ coronal/600 × 430 mm^2^ axial; acquisition matrix = 192 × 320; number of excitations, NEX = 1; flip angle = 75°). DWI was acquired axially with b = 0 and 800 s/mm^2^ (TR/TE = 8000.0/78.6 ms, NEX = 4), and coronally with b = 0 and 1400 s/mm^2^ (TR/TE = 2000.0/71.6 ms, NEX = 4), both with geometry identical to the corresponding T2 FIESTA planes. Axial three-dimensional (3D) Liver Acquisition with Volume Acceleration (LAVA) was performed during a single breath-hold (TR/TE = 4.1/2.1 ms; slice thickness = 4.4 mm; FOV = 614 × 440 mm^2^; matrix = 320 × 160; NEX = 0.70; flip angle = 12°). Coronal multiphase FIESTA was obtained with TR/TE = 3.9/1.7 ms; 15 acquisitions; slice thickness = 6.0 mm with 1.0 mm gap; FOV = 614 × 440 mm^2^; matrix = 192 × 320; NEX = 1; flip angle = 75°. Dynamic post-contrast imaging was performed with coronal multiphase 3D LAVA with fat suppression. Acquisition included one pre-contrast and four post-contrast phases (15 s, 45 s, 70 s, 90 s) (TR/TE = 4.3/2.1 ms; slice thickness = 4.4 mm; FOV = 600 × 430 mm^2^; matrix = 320 × 160; NEX = 0.73; flip angle = 12°). 
Gadobutrol (Gadovist, 0.1 mL/kg; 157.25 mg gadolinium per mL, Bayer AG, Leverkusen, Germany) was injected intravenously at 1.5–2.5 mL/s, followed by 20 mL saline at the same rate, using a power injector (Optistar LE; Liebel-Flarsheim Company LLC, Cincinnati, OH, USA). Axial and coronal post-contrast 3D LAVA sequences were acquired with parameters identical to those of the corresponding pre-contrast sequences. Coronal DWI, multiphase coronal FIESTA, and 3D LAVA were obtained with Array Spatial Sensitivity Encoding Technique (ASSET) parallel imaging (acceleration factor = 2.0). ADC maps were calculated using a mono-exponential model.
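As an illustration of the mono-exponential model mentioned above, the following minimal sketch assumes the standard two-point form S_b = S_0 · exp(−b · ADC), solved for ADC from the b = 0 and b = 1400 s/mm^2^ signals. The function name and the signal values are hypothetical and are not taken from the study; scanner software typically performs this fit voxel by voxel.

```python
import math

def adc_mono_exponential(s0, sb, b_value):
    """Two-point ADC estimate (mm^2/s) from S_b = S_0 * exp(-b * ADC).

    s0: signal at b = 0; sb: signal at the high b-value; b_value in s/mm^2.
    """
    if s0 <= 0 or sb <= 0 or b_value <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b_value

# Hypothetical signal intensities for one voxel (arbitrary units).
s0 = 1000.0
s1400 = 200.0
adc = adc_mono_exponential(s0, s1400, b_value=1400.0)
print(f"ADC = {adc * 1e3:.3f} x 10^-3 mm^2/s")
```

A stronger signal drop between b = 0 and the high b-value gives a higher ADC; restricted diffusion in actively inflamed bowel wall preserves more signal at high b and therefore gives a lower ADC.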
2.3. Image Analysis and Measurements
The gastrointestinal tract was divided into seven anatomical segments per patient: jejunum, proximal ileum, terminal ileum, cecum and ascending colon, transverse colon, descending colon, and sigmoid colon with rectum, yielding 350 segments in total. To standardize segment selection, segment labels were assigned as follows: segment I (jejunum) was assumed in the left upper quadrant of the abdomen; segment II (proximal and middle ileum) in the left lower quadrant; segment III (distal and terminal ileum) in the right upper and right lower quadrants; segment IV (cecum and ascending colon, right colon); segment V (transverse colon); segment VI (descending colon, left colon); and segment VII (sigmoid colon and rectum). This seven-segment scheme reflects a published convention and represents one of several accepted segmentations used in MRE studies [11,12,18,19], selected to provide fixed anatomic rules for reproducible segment labeling across patients. The perianal region was excluded. One poorly distended pathological segment was excluded, leaving 349 segments for analysis. Wall thickness was measured on T2-weighted fat-suppressed images. Edema was assessed by the signal intensity (SI) ratio between the bowel wall and the contralateral psoas muscle. Ulceration was evaluated qualitatively. The SI of the bowel wall was measured with regions of interest (ROIs) placed over the most affected area and normalized to the SI of the psoas muscle (T2 SI ratio). ADC values were measured on coronal DWI (b = 1400 s/mm^2^) at corresponding anatomical locations, with ROIs defined on T2-weighted reference images. ROI size was adapted to segmental thickness (0.2–1.0 cm^2^). Segments were considered morphologically normal if no thickening, edema, ulceration, or pathological contrast enhancement was present, and if not adjacent to inflamed bowel. Segmental status was confirmed in at least two planes and multiple sequences. 
Relative contrast enhancement (RCE) was calculated from pre- and post-contrast T1-weighted values according to the formula: RCE = [(wall signal intensity (WSI) post-gadolinium − WSI pre-gadolinium)/WSI pre-gadolinium] × 100 × (SD noise pre-gadolinium/SD noise post-gadolinium). Activity indices were calculated as follows:
MaRIA = 1.5 × wall thickness (mm) + 0.02 × RCE + 5 × edema + 10 × ulceration [11].
DWI MaRIA = 1.5 × wall thickness (mm) + 3.5 × DWI signal + 1.75 × T2 SI ratio − 1.321 × ADC × 10^3^ [12].
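The three formulas above can be sketched in code as follows. This is a minimal illustration, not the authors' implementation. Several points are assumptions that the text does not state explicitly: edema, ulceration, and DWI signal are coded as 0 (absent) or 1 (present), and ADC is passed in mm^2^/s so that ADC × 10^3^ is expressed in units of 10^−3^ mm^2^/s. All input values in the example are hypothetical.

```python
def relative_contrast_enhancement(wsi_pre, wsi_post, sd_noise_pre, sd_noise_post):
    """RCE (%) from pre/post-gadolinium wall signal intensity and noise SD."""
    return ((wsi_post - wsi_pre) / wsi_pre) * 100.0 * (sd_noise_pre / sd_noise_post)

def maria(wall_thickness_mm, rce, edema, ulceration):
    """MaRIA as written in the text; edema/ulceration assumed coded 0 or 1."""
    return 1.5 * wall_thickness_mm + 0.02 * rce + 5.0 * edema + 10.0 * ulceration

def dwi_maria(wall_thickness_mm, dwi_signal, t2_si_ratio, adc_mm2_per_s):
    """DWI MaRIA as written in the text; dwi_signal assumed coded 0 or 1."""
    return (1.5 * wall_thickness_mm + 3.5 * dwi_signal
            + 1.75 * t2_si_ratio - 1.321 * adc_mm2_per_s * 1e3)

# Hypothetical segment: 6 mm wall, edema present, no ulceration.
rce = relative_contrast_enhancement(wsi_pre=100.0, wsi_post=220.0,
                                    sd_noise_pre=10.0, sd_noise_post=10.0)
maria_score = maria(wall_thickness_mm=6.0, rce=rce, edema=1, ulceration=0)
dwi_maria_score = dwi_maria(wall_thickness_mm=6.0, dwi_signal=1,
                            t2_si_ratio=2.0, adc_mm2_per_s=1.2e-3)
print(round(rce, 1), round(maria_score, 2), round(dwi_maria_score, 2))
```

In both indices wall thickness carries the same weight of 1.5 per millimetre; the indices differ in whether the remaining terms come from contrast enhancement or from diffusion and T2 signal.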
Segments were stratified into three groups using literature-based cut-offs: for MaRIA, ≤7 indicated inactive disease, >7 and ≤11 defined active disease, and >11 defined severe disease; for DWI MaRIA, ≤8 indicated inactive disease, >8 and ≤12.5 defined active disease, and >12.5 defined severe disease (Figure 1).
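The fixed-threshold stratification described above amounts to a two-boundary lookup. The sketch below encodes the cut-offs exactly as stated in the text; the function and constant names are illustrative only.

```python
# (upper bound of "inactive", upper bound of "active"), as stated in the text.
MARIA_CUTOFFS = (7.0, 11.0)
DWI_MARIA_CUTOFFS = (8.0, 12.5)

def classify(score, cutoffs):
    """Return 'inactive', 'active', or 'severe' for an index value."""
    inactive_max, active_max = cutoffs
    if score <= inactive_max:
        return "inactive"
    if score <= active_max:
        return "active"
    return "severe"

# Hypothetical index values chosen to sit on and around the boundaries.
for value in (7.0, 7.1, 11.0, 11.1):
    print("MaRIA", value, classify(value, MARIA_CUTOFFS))
for value in (8.0, 12.5, 12.6):
    print("DWI MaRIA", value, classify(value, DWI_MARIA_CUTOFFS))
```

Because the boundaries are inclusive on the lower category, a segment scoring exactly 7 (MaRIA) or exactly 8 (DWI MaRIA) is labeled inactive.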
2.4. Statistical Analysis
Two complementary strategies were applied to classify bowel segments into disease activity groups. The first approach relied on literature-based MaRIA categorization using established cut-off values, while the second approach employed cluster analysis performed separately on the final composite indices (the MaRIA index and the DWI MaRIA index), in order to identify data-driven groupings. The number of clusters was prespecified as three, consistent with the expected clinical categories of inactive, active, and severe disease. Clustering was implemented separately on the final MaRIA index and on the final DWI MaRIA index after z-standardization, using one-dimensional (1D) Euclidean inter-segment distances and agglomerative hierarchical clustering with single-link (nearest-neighbor) linkage. Starting from singleton clusters, agglomeration proceeded until exactly three clusters remained (k = 3). We clustered on the composite indices rather than on individual index components to align directly with clinical cut-off usage. Separation patterns were inspected on the agglomeration schedule and dendrogram to assess chaining [20,21]. Group differences were tested for gender and age, with no significant differences observed. Normality of distribution was assessed using skewness, kurtosis, and p-values. Multivariate analysis of variance (MANOVA) was used to examine overall differences between the three disease-severity groups. When significant, one-way analysis of variance (ANOVA) was applied to individual parameters, followed by post hoc two-tailed Student’s t-tests to assess pairwise differences. For clarity of presentation, the results of t-tests are displayed within the same tables as discriminant analysis (DA). Superscript markers indicate significance levels: *^1^ denotes a significant difference compared to the lower-value group, and *^2^ indicates that the highest-value group differed significantly from both other groups. 
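To make the clustering step concrete, the following sketch shows one way to obtain three single-linkage clusters from a one-dimensional index. It is not the SPSS procedure used in the study. It relies on a property of the one-dimensional case: single-link agglomeration merges neighboring values in order of increasing gap, so stopping at k clusters is equivalent to splitting the sorted values at the k − 1 widest gaps (assuming no ties among those gaps). The scores are hypothetical.

```python
from statistics import mean, stdev

def zscore(values):
    """Z-standardize values using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("values must not all be equal")
    return [(v - m) / s for v in values]

def single_linkage_1d(values, k=3):
    """Return one label per value (0 = lowest cluster, k - 1 = highest)."""
    if len(values) < 2 or not 1 <= k <= len(values):
        raise ValueError("need at least 2 values and 1 <= k <= len(values)")
    z = zscore(values)
    order = sorted(range(len(z)), key=lambda i: z[i])
    # Gap between each pair of neighbors in sorted order, tagged by position.
    gaps = [(z[order[j + 1]] - z[order[j]], j) for j in range(len(order) - 1)]
    # The k - 1 widest gaps are the links that single linkage never merges.
    cut_positions = {j for _, j in sorted(gaps, reverse=True)[:k - 1]}
    labels = [0] * len(values)
    current = 0
    for j, idx in enumerate(order):
        labels[idx] = current
        if j in cut_positions:
            current += 1
    return labels

# Hypothetical composite index values for nine segments (unsorted on purpose).
scores = [10.5, 3.1, 19.0, 4.0, 21.3, 9.8, 5.2, 18.4, 11.1]
labels = single_linkage_1d(scores, k=3)
names = ["inactive", "active", "severe"]
print([names[label] for label in labels])
```

Z-standardization does not change the ordering of gaps in one dimension; it is kept here only to mirror the described workflow. Because single linkage depends solely on nearest-neighbor gaps, a run of closely spaced values can chain otherwise distinct groups together, which is why the text reports inspecting the agglomeration schedule and dendrogram.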
All comparisons were considered statistically significant at p < 0.05. DA was performed to determine which imaging parameters contributed most to group separation. The relative contribution of each parameter was expressed as a percentage, and group homogeneity was calculated to evaluate classification consistency. Pairwise Mahalanobis distances were computed to quantify the degree of separation between groups and to compare the effectiveness of literature-based versus cluster-derived classifications (larger distances indicating clearer separation). Internal validation of the clustering then assessed within-cluster homogeneity as a measure of cohesion and pairwise Mahalanobis distances between the three clusters as a measure of separation. All statistical analyses were performed using SPSS software (version 29.0; IBM Corp., Armonk, NY, USA).
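The Mahalanobis distance between two groups can be illustrated with a small two-feature example. The sketch below assumes the common discriminant-analysis form, in which the distance between group centroids is scaled by the pooled within-group covariance matrix; the text does not specify whether SPSS output was reported as the distance or its square, so this choice is an assumption. The feature pairs (wall thickness in mm, ADC in 10^−3^ mm^2^/s) are hypothetical.

```python
import math

def mean_vector(rows):
    """Column means of a list of (x, y) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_covariance(groups):
    """Pooled within-group 2x2 covariance matrix across all groups."""
    sums = [[0.0, 0.0], [0.0, 0.0]]
    for group in groups:
        m = mean_vector(group)
        for row in group:
            d = (row[0] - m[0], row[1] - m[1])
            for a in range(2):
                for b in range(2):
                    sums[a][b] += d[a] * d[b]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[sums[a][b] / dof for b in range(2)] for a in range(2)]

def mahalanobis(group_a, group_b, cov):
    """Distance between two group centroids under covariance `cov`."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    d0, d1 = ma[0] - mb[0], ma[1] - mb[1]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d2 = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
          + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
    return math.sqrt(d2)

# Hypothetical (wall thickness mm, ADC x10^-3 mm^2/s) pairs per group.
inactive = [(3.0, 2.0), (3.5, 1.9), (4.0, 2.2)]
active = [(5.5, 1.5), (6.0, 1.4), (6.5, 1.6)]
severe = [(8.5, 1.0), (9.0, 0.9), (9.5, 1.1)]
cov = pooled_covariance([inactive, active, severe])
d_inactive_active = mahalanobis(inactive, active, cov)
d_inactive_severe = mahalanobis(inactive, severe, cov)
print(round(d_inactive_active, 2), round(d_inactive_severe, 2))
```

Unlike a plain Euclidean distance, this measure accounts for the spread and correlation of the features within groups, so a given centroid difference counts for more when the groups are internally tight.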
All aspects of this study, including design, data acquisition, statistical analysis, interpretation, and manuscript preparation, were conducted entirely by the authors and members of the research team without the use of generative artificial intelligence (GenAI) systems or large language models (LLMs). The authors take full responsibility for the integrity, accuracy, and originality of all data and analyses presented in this study. The work represents genuine human intellectual contribution and scientific judgment.
3. Results
Across 50 patients, 350 bowel segments were evaluated (seven per patient). After exclusion of one poorly distended pathological segment, 349 segments remained; of these, 84 met the predefined criteria for pathological involvement and entered the comparative analyses (literature-based cut-offs and data-driven clustering).
3.1. Literature-Based MaRIA Classification
Application of the literature-based MaRIA classification resulted in 5 segments being classified as inactive, 16 as active, and 63 as severe (Table 2, Table 3 and Table 4). The distribution of the examined MRI-derived variables within these groups was approximately normal, as indicated by descriptive statistics and dispersion parameters. MANOVA revealed significant overall differences between the three severity groups (p < 0.001). One-way ANOVA confirmed significant between-group differences for all examined variables (p < 0.001). Pairwise comparisons were assessed with two-tailed t-tests, with results incorporated into the discriminant analysis table (Table 5, indicated by *^1^ and *^2^). Discriminant analysis identified DWI MaRIA, MaRIA, and ADC as the strongest contributors to intergroup differentiation, while confirming high within-group homogeneity (Table 5). Mahalanobis distances (Table 6) quantified the degree of separation between the three literature-based MaRIA groups, providing a multivariate measure of intergroup similarity and dissimilarity.
3.2. Literature-Based DWI MaRIA (Clermont) Classification
The literature-based DWI MaRIA classification yielded 21 inactive, 14 active, and 49 severe segments. The distribution of the examined MRI-derived variables within these groups was approximately normal, as indicated by descriptive statistics and dispersion parameters (Table 7, Table 8 and Table 9). MANOVA revealed significant overall differences between the three groups (p < 0.001). One-way ANOVA confirmed significant between-group differences for all examined variables (p < 0.001). Post hoc pairwise differences were assessed using two-tailed t-tests, and the results are incorporated in the discriminant analysis table (Table 10, indicated by *^1^ and *^2^). DA highlighted DWI MaRIA, wall thickness, and ADC as the most influential contributors to group differentiation, with homogeneity exceeding 95% across groups (Table 10). Mahalanobis distances (Table 11) quantified the intergroup distances between the three literature-based DWI MaRIA groups.
3.3. Cluster-Derived MaRIA Classification
Unsupervised clustering of MaRIA-derived variables produced 22 inactive, 37 active, and 25 severe segments (Table 12, Table 13 and Table 14). The distribution of the examined MRI-derived variables within these clusters was approximately normal, as indicated by descriptive statistics and dispersion parameters. Importantly, the minimum and maximum values observed in the descriptive statistics for each cluster define new data-driven thresholds for group boundaries, representing cluster-derived cut-offs that more accurately reflect the characteristics of the analyzed cohort. MANOVA revealed significant differences between the three cluster-derived groups (p < 0.001). One-way ANOVA confirmed significant between-group differences for all examined variables (p < 0.001). Post hoc pairwise differences were assessed using two-tailed t-tests, and the results are incorporated in the discriminant analysis table (Table 15, indicated by *^1^ and *^2^). DA identified DWI MaRIA, MaRIA, and ADC as the three most important contributors to group differentiation, with high within-cluster homogeneity (Table 15). Mahalanobis distances were 9.26 for inactive vs. active, 24.22 for inactive vs. severe, and 15.27 for active vs. severe. Compared with literature-based MaRIA, the corresponding distances were larger under clustering (inactive vs. active: 9.26 vs. 2.60; inactive vs. severe: 24.22 vs. 4.95; active vs. severe: 15.27 vs. 4.12) (Table 16).
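The idea that each cluster's minimum and maximum define cohort-specific boundaries can be sketched as below. The scores and cluster labels are hypothetical and do not reproduce the values in the study's tables; the helper names are illustrative.

```python
def cluster_ranges(scores, labels):
    """Return {label: (min score, max score)} for each cluster."""
    ranges = {}
    for score, label in zip(scores, labels):
        low, high = ranges.get(label, (score, score))
        ranges[label] = (min(low, score), max(high, score))
    return ranges

def ranges_overlap(ranges):
    """True if any two cluster ranges touch or overlap."""
    ordered = sorted(ranges.values())
    return any(prev[1] >= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))

# Hypothetical index values and their cluster assignments.
scores = [10.5, 3.1, 19.0, 4.0, 21.3, 9.8, 5.2, 18.4, 11.1]
labels = ["active", "inactive", "severe", "inactive", "severe",
          "active", "inactive", "severe", "active"]
ranges = cluster_ranges(scores, labels)
for label in ("inactive", "active", "severe"):
    print(label, ranges[label])
print("overlap:", ranges_overlap(ranges))
```

For clusters formed on a single index, the ranges are non-overlapping by construction, so the observed minima and maxima can be read directly as the lower and upper bounds of each activity category in that cohort.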
3.4. Cluster-Derived DWI MaRIA Classification
Clustering of DWI MaRIA-derived variables yielded 21 inactive, 37 active, and 26 severe segments (Table 17, Table 18 and Table 19). The distribution of the examined MRI-derived variables within these clusters was approximately normal, as indicated by descriptive statistics and dispersion parameters. The minimum and maximum values in the descriptive statistics provide new, cluster-derived thresholds for distinguishing between inactive, active, and severe disease. MANOVA revealed significant differences between the three cluster-derived groups (p < 0.001). One-way ANOVA confirmed significant between-group differences for all examined variables (p < 0.001). Post hoc pairwise differences were assessed using two-tailed t-tests, and the results are incorporated in the discriminant analysis table (Table 20, indicated by *^1^ and *^2^). DA identified DWI MaRIA, wall thickness, and ADC as the most relevant contributors to group differentiation, with high internal homogeneity (Table 20). Mahalanobis distances were 7.40 for inactive vs. active, 16.35 for inactive vs. severe, and 9.41 for active vs. severe. Relative to the literature-based DWI MaRIA classification, these distances were larger under clustering (inactive vs. active: 7.40 vs. 3.59; inactive vs. severe: 16.35 vs. 5.72; active vs. severe: 9.41 vs. 2.85) (Table 21).
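To put the reported distances on a common scale, the short calculation below expresses each cluster-derived Mahalanobis distance as a multiple of its literature-based counterpart. It uses only the values quoted in Sections 3.3 and 3.4; the ratio itself is an illustrative summary and is not a statistic reported by the authors.

```python
# (cluster-derived, literature-based) Mahalanobis distances from the text.
REPORTED = {
    "MaRIA": {
        "inactive vs. active": (9.26, 2.60),
        "inactive vs. severe": (24.22, 4.95),
        "active vs. severe": (15.27, 4.12),
    },
    "DWI MaRIA": {
        "inactive vs. active": (7.40, 3.59),
        "inactive vs. severe": (16.35, 5.72),
        "active vs. severe": (9.41, 2.85),
    },
}

def fold_increase(reported):
    """Ratio of cluster-derived to literature-based distance per comparison."""
    return {
        index: {pair: round(cluster / literature, 2)
                for pair, (cluster, literature) in pairs.items()}
        for index, pairs in reported.items()
    }

ratios = fold_increase(REPORTED)
for index, pairs in ratios.items():
    for pair, ratio in pairs.items():
        print(f"{index}, {pair}: {ratio}x")
```

On these figures, the cluster-derived distances are roughly 3.6 to 4.9 times the literature-based ones for MaRIA and roughly 2.1 to 3.3 times for DWI MaRIA.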
4. Discussion
This study explores whether data-driven grouping can improve MRI-based classification of Crohn’s disease (CD) activity beyond literature-based categories derived from fixed thresholds. In our analysis, bowel segments were classified both by applying literature-based thresholds from established indices (MaRIA and DWI MaRIA) and by using data-driven clustering. For each approach, we examined which parameters contributed most strongly to separating inactive, active, and severe segments. While the diffusion-based DWI MaRIA consistently showed the highest contribution under literature-based thresholds, clustering shifted the emphasis toward wall thickness, with DWI MaRIA and ADC remaining close behind. This yielded clearer separation of disease-activity groups and demonstrated that the two approaches reflect different aspects of the underlying biology. Our cluster-derived groups showed wider separation than literature-derived categories, and the observed differences were consistent with biologically expected imaging patterns in wall thickness, diffusion, and signal-intensity measures. In practical terms, clustering yielded boundaries that better reflected the expected gradients of inflammation across segments in this cohort, while remaining consistent with established MRI indices. MRE addresses a central limitation of ileocolonoscopy: the inability to evaluate the proximal small bowel comprehensively and to quantify transmural inflammation. Contemporary guidance from the European Crohn’s and Colitis Organisation (ECCO), the European Society of Gastrointestinal and Abdominal Radiology (ESGAR), the European Society of Pathology (ESP), and the International Bowel Ultrasound Group (IBUS) underscores the value of cross-sectional imaging for assessing mural and extramural disease, treatment response, and complications, particularly in situations where endoscopy cannot fully capture disease extent [22,23,24,25,26]. 
Recent reviews likewise emphasize MRE as a radiation-free modality suitable for repeated follow-up and for integrating objective, quantitative features into treat-to-target strategies [27,28,29,30]. CT enterography and intestinal ultrasound remain complementary cross-sectional techniques. While CT provides high spatial resolution and wide availability, MRE offers superior soft-tissue contrast, quantitative evaluation of transmural inflammation, and no ionizing radiation, making it preferable for repeated longitudinal monitoring. The data-driven clustering framework proposed here could also be explored in CT or ultrasound datasets in future comparative studies to assess whether similar cohort-specific boundaries emerge across modalities. We performed MRE at 1.5 T, a choice consistent with reports that, despite higher signal-to-noise ratio at 3 T, small-bowel imaging does not consistently achieve superior diagnostic performance at higher field strength and may be more prone to motion and susceptibility artifacts [30,31]. A notable limitation of the threshold-based MaRIA classification was that only five segments were categorized as inactive, despite the broader MRI profile suggesting a higher proportion of non-inflamed bowel. Unlike the predefined thresholds of the MaRIA and DWI MaRIA indices, clustering yielded new cohort-specific boundaries that redistributed segments into inactive, active, and severe groups in a way that was more consistent with the overall imaging profile. This discrepancy highlights how fixed cut-offs can fail to capture the distribution of segments, whereas cluster-derived groupings provided a more balanced and biologically consistent classification. 
Our findings offer a complementary perspective to recent efforts proposing simplified or pediatric-specific activity indices, such as the simplified MaRIA (sMaRIA), the Pediatric simplified MaRIA (P-sMaRIA), the Pediatric Inflammatory Crohn’s MRE Index (PICMI), and the modified Clermont score [32,33,34,35,36,37,38]. While those efforts recalibrate established formulas to improve clinical performance, our analysis shows that direct application of original literature-based thresholds can yield distributions that are poorly aligned with the broader imaging profile. In contrast, cluster-derived boundaries adapt to the characteristics of the cohort and generate classifications that are more consistent with biological expectations. The clinical relevance of redefining boundaries emerges clearly from our results, but these boundaries were derived from our own cohort and imaging protocol. Rather than suggesting universal cut-offs, our findings support deriving cohort-specific clusters at each center to refine classification in the local setting. Activity thresholds influence therapeutic decisions, including initiation or modification of treatment and the timing of response assessment. In practice, centers performing MRE for CD can use accumulated imaging data to establish cluster-derived reference ranges specific to their scanners and patient populations, supporting more consistent interpretation and treatment decisions. If cohort-specific clustering identifies groups that are more clearly separated, this could enhance classification accuracy at the margins and support more precise decision rules for treat-to-target care. Similar to the development of the original MaRIA and DWI MaRIA indices, this study was designed as a methodological proof of concept focused on improving the classification of disease activity rather than predicting treatment outcomes. 
The clustering approach refined the separation of inactive, active, and severe disease categories within the established MRI framework, without the intention to replace existing indices. Although therapeutic outcomes were not analyzed, future prospective studies should test whether cluster-derived categories can predict treatment escalation, surgery, or relapse-free survival, thereby linking imaging-derived clusters to clinical decision-making. These refinements are particularly relevant for the increasingly valued treatment targets of transmural response and healing, which reflect resolution of inflammation throughout the bowel wall rather than mucosa alone [39,40,41,42]. Cluster-derived boundaries can therefore serve as an initial framework for calibration at the level of individual MRI centers. They can be prospectively validated and, if shown to be robust, incorporated into routine reporting alongside widely used indices. At the same time, reproducibility across scanners and protocols remains essential, especially for multicenter studies. Standardized terminology and reporting frameworks promoted by international guidelines aim to reduce variability introduced by differences in technique [22,23,24,34]. Centers adopting cluster-derived thresholds should document protocol details, maintain rigorous quality control of ROI methodology and measurement procedures, and periodically re-estimate boundaries when hardware or sequences change. Multicenter consortia could use harmonized pipelines to test whether cluster-based boundaries stabilize when sample size and diversity increase [16,17]. Looking forward, radiomics and artificial intelligence (AI) offer a natural extension. Deep learning and conventional radiomics applied to MRE can capture patterns beyond human perception and show promise for distinguishing inflammation from fibrosis and predicting outcomes [43,44,45,46]. 
Incorporating cluster-derived labels in future AI studies may help define more biologically coherent phenotypes, but these applications will require rigorous external validation before translation into clinical practice. Cluster-derived categories can serve as a form of weak supervision, guiding AI models to recognize biologically meaningful distinctions in disease activity even when full histologic labeling is not available. In this approach, clusters function as pseudo-labels that orient feature learning toward patterns reflecting true inflammatory burden rather than predefined thresholds. Recent studies demonstrate that radiomic and deep learning models can extract quantitative MRI and CT features that distinguish inflammatory and fibrotic components of CD and predict the therapeutic response [47,48,49,50]. Weakly supervised and self-supervised learning strategies are increasingly used in medical imaging to address label scarcity and enhance generalizability across datasets [49]. These concepts support the potential of cluster-informed AI pipelines for more objective and reproducible disease assessment.
This study has several limitations. First, it was a single-center study conducted on a 1.5 T platform, which may limit the generalizability of the results. Future studies using 3 T systems could explore whether higher field strength further improves the sensitivity and reproducibility of cluster-derived boundaries [29,30]. Second, our reference standard relied solely on imaging, which inherently emphasizes mural inflammation and transmural features rather than mucosal status. Histologic grading across all bowel segments is not technically feasible, since endoscopic biopsy provides mucosal samples from a limited portion of the colon and terminal ileum, whereas MRE assesses transmural and proximal small-bowel inflammation beyond the reach of endoscopy. Consequently, imaging–histology correspondence is inherently partial, and MRI-based evaluation reflects a broader, transmural disease component rather than mucosal severity alone. Recent evidence further supports this concept, emphasizing that transmural healing has emerged as an independent therapeutic target alongside mucosal healing and is associated with improved long-term outcomes in CD [50,51]. While endoscopy is traditionally regarded as the reference for assessing mucosal disease activity, it cannot comprehensively evaluate the proximal small bowel or transmural involvement. As a result, imaging-based and endoscopy-based assessments may diverge, with MRE capturing deeper wall and extramural changes that are invisible to mucosal inspection [22,26]. In addition, recent work has highlighted areas of discordance between MRE and ileocolonoscopy for ileal strictures, underscoring the need to integrate modalities when defining decision rules [52]. Third, the sample size was modest, which may limit the stability of clustering results, and prospective validation is required to ensure that “inactive,” “active,” and “severe” categories correspond to meaningful differences in prognosis and treatment response. 
Fourth, the relative importance of individual parameters may vary with differences in imaging protocols or patient populations. Nevertheless, the ranking observed in this cohort provides meaningful insight into disease characterization and can stimulate further validation in broader settings. Finally, we did not perform external validation or a direct comparison between cluster-derived classifications and simplified indices (such as sMaRIA, P-sMaRIA, or the modified Clermont score) for specific clinical decision-making. Such evaluations remain an important priority for future research, and multicenter studies with harmonized acquisition protocols will be essential to confirm the generalizability of these findings. Despite these limitations, our findings provide proof-of-concept that data-driven clustering can generate biologically consistent classifications that complement established indices.
5. Conclusions
Cluster-derived groupings of bowel segments based on MRI features achieved clearer distinction of disease-activity categories than literature-based classifications, with wall thickness, diffusion-based metrics, and ADC emerging as the most informative contributors. These data-driven boundaries offer a framework for local recalibration of cut-offs, improving the alignment between imaging assessments and clinical decision-making. Looking ahead, multicenter external validation across vendors and protocols, prospective linkage to therapeutic outcomes and transmural targets, and integration with radiomics/AI pipelines remain essential priorities. This approach also aligns with a learning-health-system perspective, where centers iteratively refine imaging classifications to better mirror the biology observed in their own patient populations.
References
- 1. Panés J., Bouzas R., Chaparro M., García-Sánchez V., Gisbert J.P., Martínez de Guereñu B., Mendoza J.L., Paredes J.M., Quiroga S., Ripollés T. Systematic review: The use of ultrasonography, computed tomography and magnetic resonance imaging for the diagnosis, assessment of activity and abdominal complications of Crohn’s disease. Aliment. Pharmacol. Ther. 2011, 34, 125–145. doi:10.1111/j.1365-2036.2011.04710.x. PMID: 21615440.
- 2. Torres J., Mehandru S., Colombel J.F., Peyrin-Biroulet L. Crohn’s disease. Lancet 2017, 389, 1741–1755. doi:10.1016/S0140-6736(16)31711-1. PMID: 27914655.
- 3. Hudson A.S., Wahbeh G.T., Zheng H.B. Imaging and endoscopic tools in pediatric inflammatory bowel disease: What’s new? World J. Clin. Pediatr. 2024, 13, 89091. doi:10.5409/wjcp.v13.i1.89091. PMID: 38596437. PMCID: PMC11000065.
- 4. Durak M.B. Increased Use of Magnetic Resonance Enterography in Crohn’s Disease. Dis. Colon. Rectum 2025, 68, e108. doi:10.1097/DCR.0000000000003610. PMID: 39655802.
- 5. Foti P.V., Travali M., Farina R., Palmucci S., Coronella M., Spatola C., Puzzo L., Garro R., Inserra G., Riguccio G. Can Conventional and Diffusion-Weighted MR Enterography Biomarkers Differentiate Inflammatory from Fibrotic Strictures in Crohn’s Disease? Medicina 2021, 57, 265. doi:10.3390/medicina57030265. PMID: 33803953. PMCID: PMC8000737.
- 6. Thormann M., Melekh B., Bär C., Pech M., Omari J., Wienke A., Meyer H.-J., Surov A. Apparent Diffusion Coefficient for Assessing Crohn’s Disease Activity: A Meta-Analysis. Eur. Radiol. 2023, 33, 1677–1686. doi:10.1007/s00330-022-09149-9. PMID: 36169687. PMCID: PMC9935736.
- 7. Jannatdoust P., Valizadeh P., Razaghi M., Rouzbahani M., Abbasi A., Arian A. Role of Abbreviated Non-Contrast-Enhanced MR-Enterography in the Evaluation of Crohn’s Disease Activity and Complications as an Alternative for Full-Protocol Contrast-Enhanced Study: A Systematic Review and Meta-Analysis. Res. Diagn. Interv. Imaging 2023, 6, 100030. doi:10.1016/j.redii.2023.100030. PMID: 39077544. PMCID: PMC11265495.
- 8. Park S.H. DWI at MR Enterography for Evaluating Bowel Inflammation in Crohn Disease. AJR Am. J. Roentgenol. 2016, 207, 40–48. doi:10.2214/AJR.15.15862. PMID: 26959382.
