Use of Modified YOLOv5 Algorithm in the Differential Diagnosis of Colonic Crohn's Disease and Ulcerative Colitis on CTE Images

Mingbo Bao; Wenjia Liu; Haifeng Shi; Mingzhu Meng; Jian Cao

PMC · DOI:10.1155/grp/1506567·July 23, 2025

Use of Modified YOLOv5 Algorithm in the Differential Diagnosis of Colonic Crohn's Disease and Ulcerative Colitis on CTE Images

Mingbo Bao, Wenjia Liu, Haifeng Shi, Mingzhu Meng, Jian Cao

PDF

Open Access

TL;DR

This study explores using a modified YOLOv5 algorithm to help distinguish between two types of inflammatory bowel disease using CT scans.

Contribution

The novel use of a modified YOLOv5 algorithm for differential diagnosis of colonic Crohn's disease and ulcerative colitis on CTE images.

Findings

01

The YOLOv5x model achieved high mAP scores (0.97 mAP_0.5 and 0.83 mAP_0.5:0.95) in both validation and test sets.

02

The model's performance was comparable to two radiologists (84.5% diagnostic accuracy).

03

The modified YOLOv5 algorithm shows promise for early detection and differential diagnosis of IBD.

Abstract

Background: Inflammatory bowel disease (IBD) is an immune-mediated disorder characterized by intestinal inflammation and includes two subtypes: Crohn's disease (CD) and ulcerative colitis (UC). The computed tomography manifestations of colonic CD (cCD) and UC are similar, and differential diagnosis is challenging. Our study aimed to investigate the feasibility of using a modified YOLOv5 algorithm for differentiating between cCD and UC on computed tomography enterography (CTE) images. Methods: This multicenter retrospective study analyzed data from a total of 29 cCD patients and 29 UC patients. Five submodels (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x) of YOLOv5 were trained and evaluated on the datasets. The CTE images of the cCD group and UC group were divided into a training set, validation set, and test set at a ratio of 8:1:1. Finally, the precision (Pr), recall rate (Rc),…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Genes1

VPS72

Proteins1

Species1

Homo sapiens(human · species)

Chemicals3

mannitol magnesium CTE

Diseases15

Crohn's disease ulcerative colitis inflammatory bowel disease UC intestinal diseases prostatic hypertrophy colon cancer DL IBD glaucoma intestinal obstruction arrhythmia CD intestinal inflammation immune-mediated disorder

Figures7

Click any figure to enlarge with its caption.

a,b](#fig4)). Formulas for calculating S1, S2, IoU, and loss_IoU_ are as follows:

Funding3

—Changzhou Medical Center
—Nanjing Medical University
—Changzhou Municipal Science and Technology Bureau

Keywords

artificial intelligencediagnosisinflammatory bowel diseasemachine learning

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsRadiomics and Machine Learning in Medical Imaging · Inflammatory Bowel Disease · Colorectal Cancer Screening and Detection

Full text

1. Introduction

Over the last few years, the prevalence and incidence of inflammatory bowel disease (IBD), including Crohn's disease (CD) and ulcerative colitis (UC), have steadily increased, posing a substantial social and economic burden on governments and health systems. [1–4] This growing concern has attracted the attention of clinicians and radiologists. CD can be subdivided into four types based on the location: ileal CD, colonic CD (cCD), ileocolonic CD, and isolated upper CD [5]. Among them, cCD accounts for 22%~35% of all IBD cases according to the reported literature [6, 7]. However, the CT manifestations of cCD and UC are similar, and differentiating between the two conditions remains a challenge.

Artificial intelligence (AI) has been increasingly adopted across different areas of medicine, with a growing body of research exploring its application in gastroenterology [8]. The clinical use of AI has been primarily aimed at exploring the causes, diagnosing, and predicting outcomes for patients with IBD [9–13]

Sutton et al. [14] investigated the use of deep learning (DL) algorithms to differentiate UC from other intestinal diseases and evaluate the severity of UC during endoscopy. The team analyzed a dataset containing 851 images of UC patients and conducted a five-fold cross-validation. The results revealed that the highest accuracy (87.50%) and area under the curve (AUC) (0.90) were achieved using the DenseNet121 architecture, compared to 72.02% and 0.50 by predicting the majority class (“no skill” model). Additional research into the automatic evaluation of UC endoscopic imagery has highlighted the benefits of DL techniques. A study by Takenaka et al. utilized a DL-based model to assess endoscopic images from UC patients [15]. Their approach achieved a high degree of accuracy (90.1%) in analyzing endoscopic images from 40,758 UC patients in endoscopic remission, all of whom had been assigned a UC Endoscopic Severity Index Score (UCEIS) of 0. The availability of extensive datasets has facilitated the development of innovative solutions to address needs in the management of IBD patients. Nonetheless, the diversity of AI methods, data sources, and clinical outcomes presents challenges in seamlessly integrating AI technology into daily clinical practice, especially in analyzing CT imaging data in the diagnosis of IBD.

The 5th edition of You Only Look Once (YOLOv5), which is a cutting-edge, state-of-the-art computer vision model developed by Ultralytics, is capable of performing both object detection and classification tasks simultaneously. In this paper, YOLOv5 was adapted to detect and classify cCD and UC on computed tomography enterography (CTE) images.

2. Patients and Methods

2.1. Patients

Ethics approval was obtained from the Second Hospital of Changzhou Affiliated with Nanjing Medical University (Ethics Number: [2022] KY224-01). This retrospective study analyzed data from two independent hospitals in Jiangsu, China, namely, The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University and the Chinese Medical Hospital of Wujin. All patients had complete clinical and CTE data. A total of 29 cCD patients (21 males; 8 females) and 29 UC patients (18 males; 11 females) were included. Exclusion criteria are as follows: (1) patients in the remission phase; (2) CTE images with poor quality. A flow chart of the study is shown in Figure 1.

2.2. Bowel Preparation Before CTE

Patients were instructed to consume a low-residue diet and remain fasting after the evening meal on the day before the examination. In addition, oral magnesium was administered half an hour after the evening meal. The patients fasted (refrained from eating and drinking) before the CT examination in the morning. A hypotonic solution was diluted with mannitol to prepare an isotonic solution (300 mOsmol). Subsequently, 1000–1500 mL of isotonic solution was ingested approximately 45–50 min before the scan to distend the bowels. Ten minutes before the CT scan, patients were intramuscularly injected with 10 mg of raceanisodamine hydrochloride (1 mL:10 mg), and 500 mL of isotonic solution was ingested again.

Attention: (i) The amount of isotonic solution was modulated to reach the appropriate dosing in pediatric and elderly patients. (ii) Oral isotonic solution was not required for patients with complete intestinal obstruction, as the intestinal fluid was sufficient for intestinal distension. (iii) Raceanisodamine hydrochloride is contraindicated in patients with prostatic hypertrophy, glaucoma, and arrhythmia.

2.3. CT Scan Protocols

Two different models of CT scanners were used: a General Electric CT scanner (Discovery CT750 HD) and a Siemens CT scanner (Somatom Definition Flash CT, Germany). The CT scans (tube voltage of 100~120 kV, tube current modulated automatically, pitch of 1.1, slice thickness of 5 mm, layer spacing of 3 mm, 512∗512 matrix, and field of view of 70 cm) included the abdomen and pelvis from the dome of the diaphragm to the pubic symphysis; reformatted sagittal and coronal CT images were generated. Specifically, 100 mL of nonionic iodinated contrast material (320 or 350 mg I/mL) was injected through the elbow vein at a rate of 4.0 mL/s. Enhanced CT scans started at 45 s after the injection of the contrast agent.

2.4. Image Analysis

Two radiologists, each with more than 10 years of abdominal CT experience, assessed the CTE images through the picture archiving and communication systems (PACSs) (http://www.jinpacs.com/) independently. The diagnosis of IBD was based on CTE signs of active IBD such as wall thickness of greater than 3 mm in a distended bowel loop, mural hyperenhancement (increased attenuation of the inner layer), mural stratification (bilaminar or trilaminar appearance of the bowel wall), prominence of the vasa recta (comb sign), and mesenteric lymph nodes [16, 17]. The analysis was conducted in two stages. In Stage 1, the two radiologists were blinded to the patients' histopathological diagnoses and performed the diagnostic assessment. In Stage 2, the CTE images were assessed again after unmasking the histopathological diagnosis, and the CTE images were saved at this stage. Disparities between the two radiologists were resolved by discussion or by a third senior radiologist.

2.5. CTE Image Dataset and Data Labeling

Finally, 1290 cCD images (cCD group) and 982 UC images (UC group) were obtained. Image annotation (YOLO format) was performed using the LabelImg offline software package (Tzutalin, Version 1.8.1, available at https://github.com/tzutalin/labelImg). Annotations completed by one radiologist were verified by the other radiologist. The data of the two groups were randomly (Simple random method) divided into a training set, validation set, and test set at a ratio of 8:1:1. Labeled data distribution is shown in Figure 2.

2.6. Computer Configuration

The computer environment consisted of Windows 10 Enterprise 64-bit, Intel(R) Core (TM) i7-10700F 2.9GHz, 32 GB RAM, and NVIDIA GeForce RTX 2060 GPU with 12 GB memory. Data were stored on a hard disk (512 GB solid-state drive) for offline analysis. The models were built based on Python (Version, 3.8), Pytorch (Version, 2.0.1), and CUDA 11.8. While running the model, all other programs were closed.

2.7. The Architecture of YOLOv5

The performance of five modified submodels (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x) of YOLOv5 was compared in the same dataset. The architecture of YOLOv5 and modified hyperparameter settings are presented in Figure 3 and Table 1. ReLU6 and HS were chosen as the activation functions in the convolutional layers; the formulas for the calculation are as follows:

[eqn]

[eqn]

2.8. Intersection Over Union (IoU) and Loss Function

The IoU is calculated by dividing the surface area of the intersection (S1, red area) between two rectangular boxes by the surface area (S2) of the union between those two rectangles. (Xp1, Yp1) and (Xp2, Yp2) represent the top-left and bottom-right corners, respectively, of the predicted rectangular box. (Xl1, Yl1) is the top left corner, and (Xl2, Yl2) is the bottom right corner of the labeled rectangular box (see Figure 4a,b). Formulas for calculating S1, S2, IoU, and loss_IoU_ are as follows:

[eqn]

[eqn]

[eqn]

[eqn]

IoU can have a value between 0 and 1. An IoU value of 0 indicates no overlap between the two rectangular boxes, while an IoU of 1 indicates complete overlap. This study utilized the GIoU algorithm. While IoU only considers overlapping areas, GIoU includes nonoverlapping areas as well, providing a more accurate assessment of the degree of coincidence (Figure 4c). The formula for calculating GIoU was defined as follows:

[eqn]

[eqn]

An IoU of 0 suggests no intersection between A and B. However, GIoU can also display negative values, with a value of −1 indicating infinite separation between A and B. Conversely, a GIoU of 1 indicates complete overlap between A and B. The GIoU algorithm solves the problem of the IoU value being 0 when there is no intersection between A and B. Loss_GIoU_ can be calculated as:

[eqn]

2.9. Metrics for Evaluation

The metrics calculated in this study included [18]: precision (Pr), recall rate (Rc), and mean average precision (AP) (mAP). The mathematical formulations are as follows:

[eqn]

[eqn]

[eqn]

True positives (TPs) and true negatives (TN) represent samples that were correctly identified as positive and negative, respectively. False positives (FP) and false negatives (FN) are instances of samples that were incorrectly identified as positives or negatives, respectively. As a two-axis mapping, AP represents the region enclosed by Pr and recall as a function of class for a given class [19]. An essential parameter for network model training is the mAP, which is derived from the mean average Pr for each category.

2.10. Statistical Analysis

Statistical analysis was performed using SPSS 16.0 statistical software (IBM). For variables conforming to a normal distribution (Kolmogorov–Smirnov test), the count data were expressed in the form of mean ± standard deviation, and one-way analysis of variance (ANOVA) was used to analyze the variance between the two groups. The Mann–Whitney U test was performed if the data did not conform to a normal distribution. p < 0.05 was considered statistically significant.

3. Results

3.1. Age Distribution

A total of 29 cCD patients (mean age 45.31 ± 20.37) and 29 UC patients (mean age 48.10 ± 18.94) were included. The Kolmogorov–Smirnov analysis demonstrated that age in the two groups conformed to a normal distribution (p = 0.147 and 0.109, respectively). No significant difference in age was found between the two groups (F = 0.292, p = 0.591).

3.2. Results of Model Training and Validation

In the interrater reliability analysis, the ICC for annotations by two radiologists was 0.945 (95% CI: 0.943–0.999), indicating excellent agreement. In both the training and validation sets, Pr, Rc, mAP__0.5_, and mAP__0.5:0.95_ gradually increased as the training epoch increased, whereas the loss value gradually decreased.

After 200 training epochs, the YOLOv5x model achieved the highest mAP__0.5_ and mAP__0.5:0.95_ of 0.98 and 0.86 in the validation set compared with the other four models (Figure 5). Among all models in our dataset, YOLOv5x exhibited the highest giga floating-point operations per second (GFLOPs) and the time used for training, at 142.1 and 8.14 h, respectively. Detailed data are provided in Table 2.

3.3. Visualization

Different features were extracted from different convolutional layers in different modules and were observed in the form of pictures, as shown in Figure 6.

3.4. Comparison

During Stage 1, the diagnostic accuracies of radiologists for cCD and UC were 0.79 (23/29) and 0.90 (26/29), respectively. About 84.5% (49/58) of IBD patients were correctly diagnosed by the two radiologists based on CTE images, and nearly 15.5% (9/58) were misdiagnosed. The CT images of those misdiagnosed cases showed signs that could be interpreted as both cCD and UC, with some cases resembling colon cancer. Hence, differentiating between cCD and UC based on CTE images was challenging, even after involving the third senior radiologist (Figure 7).

In the test set, the YOLOv5n model achieved the lowest Pr (0.93), Rc (0.92), mAP__0.5_(0.94), and mAP__0.5:0.95_(0.68), while the YOLOv5x model obtained the highest Pr (0.97), Rc(0.95), mAP__0.5_(0.97), and mAP__0.5:0.95_(0.83). Specifically, the mAP__0.5:0.95_ of YOLOv5x for cCD and UC were 0.83 and 0.86, respectively. These results are much closer to the diagnostic accuracies of two radiologists. Detailed data are provided in Table 3.

4. Discussion

In this paper, five submodels of YOLOv5 were adapted to detect and classify cCD and UC on CTE images. This methodological framework demonstrates potential for characterizing CTE imaging biomarkers in IBD. It has contributed to a better understanding of the pathogenesis of IBD. Our findings revealed that variations in model complexity within the YOLOv5 architecture resulted in different mAP in the differential diagnosis of cCD and UC on CTE images, with mAP increasing with model complexity. The YOLOv5x model achieved the highest mAP__0.5_ and mAP__0.5:0.95_ of 0.98 and 0.86 in the validation set, respectively. Moreover, similar results were obtained in the test set, achieving an mAP__0.5_ and mAP__0.5:0.95_ of 0.97 and 0.83, respectively. Hence, the trained YOLOv5x model demonstrated a strong generalization ability in our dataset.

At present, the diagnosis and monitoring of IBD have not been standardized. Therefore, the diagnosis of IBD warrants caution and may require amending the diagnoses during follow-up and adapting treatments accordingly. CD can affect any section of the digestive tract and is subdivided into four types according to the Montreal classification [5, 20]. Due to the similar CT manifestations of cCD and UC, differentiating between these two diseases based solely on radiological images is challenging.

Diverse data types can be used for AI analysis of IBD, including endoscopic data, medical imaging examination, histological data, gene sequence, and protein sequence [21–24]. Among these data types, endoscopic and medical imaging examination data are relatively easy to access and are the most frequently used tools for the diagnosis and management of IBD in daily clinical practice. A previous report showed that supervised ML models yielded a classification accuracy of 71.0%, 76.9%, and 82.7% based on endoscopic data only, histological data only, and combined endoscopic/histological data, respectively. The combined model also obtained an accuracy of 83.3% on the test set [25]. Currently, no other studies are available to compare our DL-based research on CTE data. Dhaliwal et al. [26] developed a random forest model based on multimodal data (including digestive endoscopy, pathology, and magnetic resonance images), achieving an accuracy of 97% in distinguishing between cCD and UC, identifying the seven most significant features (three histological features and four endoscopic features). In contrast to the study by Dhaliwal et al., our proposed YOLOv5 model can visually identify cCD and UC directly from CTE images, potentially providing assistance to clinicians in formulating treatment strategies.

DL is a relatively new field compared to traditional ML, especially the application of DL in medical image analysis. DL models require a large amount of heterogeneous training data and employ a convolutional neural network to extract features of input data from all sites automatically, enabling flexible decisions based on the situation [27, 28]. YOLOv5 is a new one-stage object-detection DL algorithm that features a very high speed of detection and accuracy comparable to that of other state-of-the-art architectures. YOLOv5 has several submodels with different parameter configurations.

Our results also indicate that increased model complexity is an effective way to improve the generalization ability of YOLOv5. Meanwhile, model size also increased with increasing model complexity. The comparative analysis revealed that the YOLOv5x model occupied 120.6 MB of memory, representing a nearly 39.8 times larger size than YOLOv5n (3.03 MB). This large size limits its applicability in deploying the models to mobile devices.

In contrast to the diagnostic accuracies of radiologists in the first stage, the mAP__0.5:0.95_ of the YOLOv5x (0.84, validation set) model was close to the diagnostic accuracies of radiologists (84.5%) to some extent. However, due to the small number of cases, this outcome should be confirmed through further studies.

Nevertheless, the limitations of the present study should be acknowledged. First, the relatively small sample size may lead to model overfitting. A total of 58 patients were enrolled, and selection bias was inevitable. This directly contributed to the relatively low mAP of the model. Additional studies with larger cohorts are required to enhance the generalization ability of YOLOv5, and multicenter collaboration is necessary to obtain larger sample sizes, thereby strengthening the applicability of the results. Second, no external validation dataset was established, and the model's robustness requires further verification. Third, since the aim was to distinguish between cCD and UC on CTE images, the intestinal tube was labeled rather than the intestinal wall. In future work, we plan to analyze the intestinal walls of cCD and UC using YOLOv5. Additionally, the interpretability of the YOLOv5 model and the deployment of the model for distinguishing cCD and UC require further investigation.

5. Conclusion

Accurate differentiation between cCD and UC is essential for guiding targeted therapeutic strategies in IBD. Our findings suggest that YOLOv5-based models may assist in the differential diagnosis between cCD and UC. This pilot study explores the feasibility of a YOLOv5-based approach to analyze CTE features, with initial findings indicating that such models could help augment the differential diagnosis of IBD subtypes. However, further in-depth research is required to improve the model's robustness and advance its clinical deployment.

Bibliography28

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Alatab S. Sepanlou S. G. Ikuta K. The Global, Regional, and National Burden of Inflammatory Bowel Disease in 195 Countries and Territories, 1990-2017: A Systematic Analysis for the Global Burden of Disease Study 2017 Lancet Gastroenterology and Hepatology 202051173010.1016/S 2468-1253(19)30333-431648971 PMC 7026709 · doi ↗ · pubmed ↗
2Singh S. Qian A. S. Nguyen N. H. Trends in U.S. Health Care Spending on Inflammatory Bowel Diseases, 1996-2016 Inflammatory Bowel Diseases 202228336437210.1093/ibd/izab 07433988697 PMC 8889287 · doi ↗ · pubmed ↗
3Lewis J. D. Parlett L. E. Jonsson Funk M. L. Incidence, Prevalence, and Racial and Ethnic Distribution of Inflammatory Bowel Disease in the United States Gastroenterology 2023165511971205.e 210.1053/j.gastro.2023.07.00337481117 PMC 10592313 · doi ↗ · pubmed ↗
4Gilliland A. Chan J. J. De Wolfe T. J. Yang H. Vallance B. A. Pathobionts in Inflammatory Bowel Disease: Origins, Underlying Mechanisms, and Implications for Clinical Care Gastroenterology 20241661445810.1053/j.gastro.2023.09.01937734419 · doi ↗ · pubmed ↗
5Satsangi J. Silverberg M. S. Vermeire S. Colombel J. F. The Montreal Classification of Inflammatory Bowel Disease: Controversies, Consensus, and Implications Gut 20065567497531669874610.1136/gut.2005.082909 PMC 1856208 · doi ↗ · pubmed ↗
6Martin S. T. Vogel J. D. Restorative Procedures in Colonic Crohn Disease Clinics in Colon & Rectal Surgery 201326210010510.1055/s-0033-13480482-s 2.0-8487999177224436657 PMC 3709952 · doi ↗ · pubmed ↗
7Hedrick T. L. Friel C. M. Colonic Crohn Disease Clinics in Colon & Rectal Surgery 2013262848910.1055/s-0033-13480462-s 2.0-8487961144124436655 PMC 3709925 · doi ↗ · pubmed ↗
8Lai W. Y. Lin K. W. Ling L. P. Li J. W. Lau L. H. S. Chiu P. W. Y. Artificial Intelligence in Colonoscopy - Where Are We Now in 2024? Gastroenterology 20251-2812810.1159/000544030 PMC 1253403739947147 · doi ↗ · pubmed ↗