# Stool-Based Proteomic Signature for the Noninvasive Classification of Crohn's Disease and Ulcerative Colitis Using Machine Learning

**Authors:** Elmira Shajari, David Gagné, Francis Bourassa, Mandy Malick, Patricia Roy, Jean-François Noël, Hugo Gagnon, Maxime Delisle, François-Michel Boisvert, Marie Brunet, Jean-François Beaulieu

PMC · DOI: 10.14309/ctg.0000000000000925 · Clinical and Translational Gastroenterology · 2025-10-02

## TL;DR

This study uses stool proteins and machine learning to noninvasively distinguish Crohn's disease from ulcerative colitis, reaching an area under the curve of 0.96 on both the training data set and a blind test data set.

## Contribution

A novel noninvasive proteomic signature and machine learning model for classifying Crohn's disease and ulcerative colitis.

## Key findings

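- The final predictive model achieved an area under the curve (AUC) of 0.96 on both the training data set and a blind prospective test data set of 16 samples.
- Sixteen stool proteins were selected by several feature selection algorithms as candidate biomarkers for distinguishing Crohn's disease from ulcerative colitis.
- Of the 6 machine learning algorithms evaluated, the Naive Bayes model had the best performance metrics on the training data set and was selected as the final classifier.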
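The AUC reported above can be read as the probability that a randomly chosen CD sample receives a higher model score than a randomly chosen UC sample. A minimal sketch of that computation (the Mann-Whitney formulation of ROC AUC), assuming binary labels with CD coded as 1 and UC as 0; the function name and the scores are hypothetical placeholders, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for the positive class (here, hypothetically CD), 0 for the negative class (UC).
    scores: model scores, where a higher value means "more likely positive".
    A tie between a positive and a negative score counts as half a correctly ordered pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    # Fraction of all (positive, negative) pairs that the scores order correctly.
    return correct / (len(pos) * len(neg))


# Toy example with made-up scores (not results from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 pairs ordered correctly, about 0.889
```

An AUC of 0.96 therefore means that roughly 96% of CD/UC sample pairs are ranked in the right order by the model's score, independent of any single decision threshold.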

## Abstract

Crohn's disease (CD) and ulcerative colitis (UC) have overlapping symptoms, but they differ in pathology and treatment. Distinguishing between them currently requires invasive procedures such as colonoscopy and histopathology. Fecal proteins, which are stable and in direct contact with sites of inflammation, offer a noninvasive alternative. This study combines high-throughput data-independent acquisition mass spectrometry with machine learning to develop an accurate biomarker signature from complex stool samples.

Stool samples from 69 patients with active disease were analyzed. Analysis of the stool proteome identified and quantified approximately 1,250 proteins. The samples were divided into training and testing groups. After data processing, several feature selection algorithms were applied to the training group to determine which proteins differed significantly between the CD and UC groups. In addition, 6 machine learning algorithms were evaluated to identify the best-performing classifier.
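The abstract does not name the feature selection algorithms used. As an illustration only, the sketch below uses one simple filter method, ranking proteins by the absolute Welch t-statistic between CD and UC training samples and keeping the top k. The function names, protein IDs and intensity values are hypothetical, and this is a stand-in technique, not the authors' pipeline.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: treat any mean difference as perfect separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_proteins(matrix, labels, protein_ids, k):
    """Return the k protein IDs with the largest |t| between CD (1) and UC (0).

    matrix: one row per training sample, each row a list of (log-transformed) intensities.
    """
    scored = []
    for j, pid in enumerate(protein_ids):
        cd = [row[j] for row, y in zip(matrix, labels) if y == 1]
        uc = [row[j] for row, y in zip(matrix, labels) if y == 0]
        scored.append((abs(welch_t(cd, uc)), pid))
    scored.sort(reverse=True)  # largest |t| first
    return [pid for _, pid in scored[:k]]


# Hypothetical training data: 3 CD and 3 UC samples, 3 proteins.
proteins = ["P_A", "P_B", "P_C"]
train = [
    [5.0, 2.1, 3.0],
    [5.4, 1.9, 3.2],
    [5.2, 2.0, 2.9],
    [2.0, 2.2, 3.1],
    [2.3, 1.8, 3.0],
    [1.9, 2.1, 2.8],
]
y = [1, 1, 1, 0, 0, 0]
print(rank_proteins(train, y, proteins, k=1))  # ['P_A']
```

Running the selection on the training samples only, as the abstract describes, keeps the held-out test samples from influencing which proteins are chosen. In practice a filter like this is often combined with other selectors and the consensus features are kept.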

Sixteen proteins were selected using several feature selection algorithms, and 6 models were trained on them. Based on the performance metrics of each algorithm on the training data set, the Naive Bayes model was selected. For performance validation, the final predictive model was applied to 16 blind prospective samples as the test data set. The model achieved an area under the curve of 0.96 on both the training and test data sets, indicating that its performance held up on unseen samples.
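The abstract does not say which Naive Bayes variant was used. Assuming the common Gaussian form, where each protein's log-intensity is treated as normally distributed within each class and independent of the other proteins given the class, the classifier can be sketched in plain Python as follows. The class name, the `var_smoothing` parameter (borrowed from the idea behind scikit-learn's `GaussianNB`) and the two-protein toy data are hypothetical; this is not the authors' code.

```python
import math
from statistics import mean, pvariance


class GaussianNB:
    """Minimal Gaussian Naive Bayes: features independent and normal within each class."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        # Add a small fraction of the largest feature variance so no class variance is zero.
        max_var = max(pvariance([row[j] for row in X]) for j in range(n_features))
        eps = self.var_smoothing * (max_var or 1.0)
        self.params = {}
        for c in self.classes:
            rows = [row for row, label in zip(X, y) if label == c]
            stats = []
            for j in range(n_features):
                col = [r[j] for r in rows]
                stats.append((mean(col), pvariance(col) + eps))
            # Store the log prior and per-feature (mean, variance) for this class.
            self.params[c] = (math.log(len(rows) / len(X)), stats)
        return self

    def _log_joint(self, x):
        """log P(class) + sum of per-feature Gaussian log-likelihoods, for each class."""
        out = {}
        for c, (log_prior, stats) in self.params.items():
            ll = log_prior
            for xj, (mu, var) in zip(x, stats):
                ll += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
            out[c] = ll
        return out

    def predict_proba(self, x):
        """Posterior probability of each class, normalised with the log-sum-exp trick."""
        lj = self._log_joint(x)
        m = max(lj.values())
        total = sum(math.exp(v - m) for v in lj.values())
        return {c: math.exp(v - m) / total for c, v in lj.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


# Hypothetical log-intensities of two selected proteins per training sample.
X_train = [[5.0, 3.0], [5.4, 3.2], [5.2, 2.9], [2.0, 3.1], [2.3, 3.0], [1.9, 2.8]]
y_train = ["CD", "CD", "CD", "UC", "UC", "UC"]
model = GaussianNB().fit(X_train, y_train)
print(model.predict([5.1, 3.0]))  # CD
print(model.predict_proba([2.1, 3.0]))
```

In a workflow like the one described, a model of this kind would be fit only on the training samples restricted to the selected proteins, and its predicted probability for the CD class would serve as the score for computing the AUC on the blind test samples. Naive Bayes has few parameters to estimate, which is one plausible reason it can remain stable with a cohort of this size.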

This study demonstrates the potential of combining multiple stool protein biomarkers through high-throughput data-independent acquisition mass spectrometry and machine learning tools to develop a predictive model for efficiently distinguishing CD from UC.

## Linked entities

- **Diseases:** Crohn's disease (MONDO:0005011), ulcerative colitis (MONDO:0005101)

## Full-text entities

- **Diseases:** Crohn's Disease (MESH:D003424), Ulcerative Colitis (MESH:D003093), inflammation (MESH:D007249)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12637349/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12637349/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12637349/full.md

---
Source: https://tomesphere.com/paper/PMC12637349