# Prediction of lymph node metastasis in T1 colorectal cancer based on machine learning

**Authors:** Suyujie Shi, Xiongwu Li, Linjun Li, Haowen Zhong, Ruoyan Wang, Zhenyu Zhang, Chuyi Liao, Yun Mao, Meijie Yang, Yaying Yang

PMC · DOI: 10.7717/peerj.20500 · 2026-02-11

## TL;DR

This study uses machine learning to predict lymph node metastasis in early-stage colorectal cancer, identifying new risk factors to improve patient treatment.

## Contribution

The study introduces four novel predictive indicators for lymph node metastasis in T1 colorectal cancer using machine learning.

## Key findings

- The random forest algorithm showed the best performance in predicting lymph node metastasis risk.
- Seven key risk factors for lymph node metastasis were identified in T1 colorectal cancer patients.
- Four new predictive indicators were discovered, including tumor submucosal invasion area and serrated lesions.

## Abstract

Colorectal cancer (CRC) ranks as the third most frequently diagnosed cancer. Early diagnosis and precise risk assessment for lymph node metastasis (LNM) of T1 CRC, characterized by tumor confined to the mucosa and submucosa, are essential for enhancing patient outcomes and informing therapeutic strategies. This project aims to use machine learning to refine clinical decision-making for T1 CRC patients, thereby laying the groundwork for more personalized and efficacious treatment protocols.

In this study, we analyzed data from 210 patients with T1 CRC who underwent surgical resection at the First Affiliated Hospital of Chongqing Medical University from 2017 to 2023. The datasets encompassed clinical, endoscopic, and pathological parameters, which were examined to identify potential predictors of LNM. A range of machine learning algorithms, including boosted trees, decision trees, logistic regression, multilayer perceptron (MLP), naïve Bayes, k-nearest neighbors (K-NN), random forest, and support vector machine (SVM), were used to construct a predictive model for LNM in T1 CRC.

Our research demonstrated that the random forest algorithm outperformed other models in predictive performance for the risk of LNM. Furthermore, the model identified seven key risk factors associated with LNM. We found four novel LNM predictive indicators for T1 CRC: tumor submucosal invasion area, percentage of tumors with invasive carcinoma, poorly differentiated tumor cell clusters, and serrated lesions.
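The abstract does not state which metric was used to rank the models. As a minimal sketch, assuming the area under the ROC curve (AUC) is the comparison metric, a ranking of candidate models could be computed as below. The function names (`roc_auc`, `rank_models`), the model labels, and all numbers are hypothetical illustrations, not the study's code or data.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case (LNM = 1)
    receives a higher score than a randomly chosen negative case; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_models(
    labels: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (model name, AUC) pairs sorted from best to worst AUC."""
    aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data: 1 = LNM present, 0 = LNM absent (NOT from the study).
labels = [1, 0, 1, 0, 0, 1]
model_scores = {
    "model_a": [0.9, 0.2, 0.8, 0.3, 0.1, 0.7],
    "model_b": [0.6, 0.4, 0.7, 0.65, 0.2, 0.5],
    "model_c": [0.5, 0.5, 0.4, 0.6, 0.3, 0.55],
}

ranking = rank_models(labels, model_scores)
for name, auc in ranking:
    print(f"{name}: AUC = {auc:.3f}")
```

In practice the scores would come from held-out or cross-validated predictions of each of the eight algorithms, so that the ranking reflects performance on patients the model has not seen.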

This study developed a risk predictive model for LNM in T1 CRC patients by utilizing eight machine learning algorithms. Four novel predictive indicators were identified, improving the accuracy of LNM prediction.

## Linked entities

- **Diseases:** colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** invasive (MESH:D009361), LNM (MESH:D008207), CRC (MESH:D015179), serrated lesions (MESH:D009059), cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12906262/full.md

---
Source: https://tomesphere.com/paper/PMC12906262