# Machine Learning–Based Prediction of Histopathological Classification in Colorectal Polyps

**Authors:** Gökhan Koker, Gizem Zorlu Gorgulugil, Muhammed Ali Coskuner, Merve Eren Durmus

PMC · DOI: 10.5152/tjg.2025.25542 · The Turkish Journal of Gastroenterology · 2025-10-01

## TL;DR

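This study uses machine learning to predict the histopathological type of colorectal polyps (adenomatous, hyperplastic, or no polyp) from routine demographic, clinical, and dietary data, with the aim of supporting more personalized colorectal cancer screening.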

## Contribution

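The study applies four machine learning algorithms to classify colorectal polyp histopathology from non-invasive demographic, clinical, and dietary data, and uses SHapley Additive exPlanations to identify which variables drive the predictions.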

## Key findings

- Support vector machines (76.4%) and random forest (75.7%) achieved the highest overall accuracy.
- Extreme gradient boosting was the only model that identified hyperplastic polyps, but it had the lowest overall accuracy (70.9%).
- In the SHapley Additive exPlanations analysis, frequent bulgur consumption (>2 times/week), red meat intake, age, and body mass index were the major predictors.

## Abstract

Colorectal polyps are precursor lesions of colorectal cancer, and their histopathological types are critical for determining malignant potential. Predicting polyp histopathological types may support early and appropriate clinical management. Machine learning (ML) algorithms based on accessible demographic, clinical, and lifestyle data can contribute to individualized screening strategies.

This retrospective cross-sectional study included 491 individuals who underwent colonoscopy for the first time between 2022 and 2025 at University of Health Sciences, Antalya Training and Research Hospital. Demographic and clinical data were recorded, and dietary habits were assessed using the Food Frequency Questionnaire. Patients were classified into 3 groups according to histopathology: adenomatous polyp, hyperplastic polyp, and no polyp. Four ML algorithms—decision tree, random forest, support vector machines (SVMs), and extreme gradient boosting—were applied. Model performance was evaluated using accuracy, sensitivity, specificity, kappa statistic, and McNemar’s test. Variable contributions were further analyzed with SHapley Additive exPlanations.
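The evaluation metrics named above can all be derived from a 3×3 confusion matrix. The following is a minimal standard-library Python sketch, not the authors' code: the class labels are hypothetical names for the three histopathology groups, sensitivity, specificity, and precision are computed one-vs-rest for each class, and kappa is taken to be Cohen's kappa. The abstract does not say whether McNemar's test compared models pairwise or was applied to each model's confusion matrix (some toolkits report a symmetry test there). The `mcnemar` function assumes the classic paired form, which compares two classifiers on the same test cases.

```python
import math

# Hypothetical label names for the study's three histopathology groups.
CLASSES = ["adenomatous", "hyperplastic", "no_polyp"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m


def accuracy(m):
    total = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / total


def one_vs_rest(m, k):
    """Sensitivity, specificity, and precision of class k versus all other classes."""
    total = sum(map(sum, m))
    tp = m[k][k]
    fn = sum(m[k]) - tp                  # class k predicted as another class
    fp = sum(row[k] for row in m) - tp   # other classes predicted as k
    tn = total - tp - fn - fp
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    prec = tp / (tp + fp) if tp + fp else nan  # undefined if k is never predicted
    return sens, spec, prec


def cohens_kappa(m):
    """Agreement between predicted and true labels, corrected for chance."""
    total = sum(map(sum, m))
    p_obs = accuracy(m)
    p_exp = sum(sum(m[i]) * sum(row[i] for row in m) for i in range(len(m))) / total ** 2
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers on the same cases."""
    b = sum(a == t != p for t, a, p in zip(y_true, pred_a, pred_b))  # only A correct
    c = sum(p == t != a for t, a, p in zip(y_true, pred_a, pred_b))  # only B correct
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, df = 1
    return stat, p_value


# Toy labels (invented) for six patients.
y_true = ["adenomatous", "no_polyp", "no_polyp", "hyperplastic", "adenomatous", "no_polyp"]
y_pred = ["adenomatous", "no_polyp", "no_polyp", "no_polyp", "no_polyp", "no_polyp"]
m = confusion_matrix(y_true, y_pred)
print("accuracy:", round(accuracy(m), 3))
print("kappa:", round(cohens_kappa(m), 3))
for k, name in enumerate(CLASSES):
    print(name, one_vs_rest(m, k))
```

In the toy data every hyperplastic case is predicted as no polyp, so hyperplastic sensitivity is 0 and its precision is undefined (NaN). This is the pattern a model shows when it never identifies a hyperplastic polyp, which the abstract reports for every model except extreme gradient boosting.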

Accuracy ranged from 70.9% to 76.4%, with the highest performance from SVM (76.4%) and random forest (75.7%). Extreme gradient boosting showed lower overall accuracy (70.9%) but was the only model that identified hyperplastic polyps. The no polyp group was consistently predicted with high accuracy (sensitivity 85.6%-95.9%). Precision for adenomatous polyps was highest with SVM (71.4%). SHapley Additive exPlanations analysis highlighted frequent bulgur consumption (>2 times/week), red meat intake, age, and body mass index as major predictors.
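SHapley Additive exPlanations attributes each prediction to the input variables using Shapley values from cooperative game theory. As a hedged illustration of the idea (not the study's model and not the `shap` library's estimators), the sketch below computes exact Shapley values by enumerating every feature coalition. The scoring function `toy_score`, its coefficients, and the feature values are invented, and their signs carry no information about the paper's findings. Here a coalition's value is the score with absent features set to a single baseline, a simplification of SHAP's estimators, which average over background data.

```python
import math
from itertools import combinations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features in a coalition take their values from x; all other features
    take their values from baseline (a single-baseline simplification).
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1, excluding feature i
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical features: age (years), BMI (kg/m^2), red meat servings/week,
# bulgur > 2 times/week (0/1). Coefficients are invented for illustration.
FEATURES = ["age", "bmi", "red_meat", "bulgur_gt2"]


def toy_score(z):
    age, bmi, red_meat, bulgur = z
    return 0.04 * age + 0.05 * bmi + 0.3 * red_meat - 0.5 * bulgur + 0.01 * age * red_meat


x = [65, 30, 4, 0]
baseline = [50, 25, 2, 1]
phi = shapley_values(toy_score, x, baseline)
for name, contribution in zip(FEATURES, phi):
    print(f"{name:>10}: {contribution:+.2f}")
```

The attributions always sum to `toy_score(x) - toy_score(baseline)` (the efficiency property), and the age × red meat interaction term is split between those two features. Exact enumeration needs on the order of n·2^n model evaluations, so it only suits a handful of features; in practice SHAP relies on sampling approximations such as KernelSHAP or on model-specific algorithms such as TreeSHAP.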

Machine learning algorithms can predict colorectal polyp histopathological types using routine demographic, clinical, and dietary data, enabling more personalized and effective screening beyond age-based protocols.

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** hyperplastic polyps (MESH:D011127), colorectal cancer (MESH:D015179), Colorectal Polyps (MESH:D003111), adenomatous polyp (MESH:D018256)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12520147/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12520147/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12520147/full.md

---
Source: https://tomesphere.com/paper/PMC12520147