Gene Expression-Based Colorectal Cancer Prediction Using Machine Learning and SHAP Analysis
Yulai Yin, Zhen Yang, Xueqing Li, Shuo Gong, Chen Xu

TL;DR
This study developed a machine learning model that predicts colorectal cancer from gene expression data, reaching an AUC of 0.9601 in validation, and identified key genes that could aid early diagnosis.
Contribution
The contribution is a CRC genetic diagnostic model built from ten genes with XGBoost, whose strong predictive performance was validated across datasets.
Findings
A genetic diagnostic model using ten genes achieved an AUC of 0.9875 in training and 0.9601 in validation.
XGBoost outperformed other machine learning algorithms with an AUC of 0.990.
SHAP analysis identified IFITM1 and DBNDD1 as the most influential genes in the model.
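For readers unfamiliar with the metric, the AUC values above can be read as the probability that a randomly chosen tumour sample receives a higher model score than a randomly chosen normal sample. The sketch below computes that quantity directly from pairs. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = tumour, 0 = normal; scores are made up.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

In this toy example, 8 of the 9 tumour/normal pairs are ordered correctly. The pairwise loop is quadratic in sample count, so it is only meant to show the definition; a rank-based implementation would be preferable on real cohorts.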
Abstract
Objective: To develop and validate a genetic diagnostic model for colorectal cancer (CRC). Methods: First, differentially expressed genes (DEGs) between colorectal cancer and normal groups were screened using the TCGA database. Subsequently, a two-sample Mendelian randomization analysis was performed using eQTL data from the IEU OpenGWAS database and colorectal cancer outcomes from the R12 Finnish database to identify associated genes. The genes identified by both methods were selected for the development and validation of the CRC genetic diagnostic model using nine machine learning algorithms: Lasso Regression, XGBoost, Gradient Boosting Machine (GBM), Generalized Linear Model (GLM), Neural Network (NN), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Random Forest (RF), and Decision Tree (DT). Results: A total of 3716 DEGs were identified from the TCGA…
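The gene-selection step described in the Methods, keeping only genes supported by both the differential expression screen and the Mendelian randomization analysis, amounts to a set intersection. A minimal sketch follows, assuming each analysis yields a list of gene symbols; the helper name `intersect_candidates` and the placeholder symbols `GENE_A` to `GENE_D` are hypothetical and do not come from the paper.

```python
def intersect_candidates(deg_genes, mr_genes):
    """Return the gene symbols present in both lists, sorted for reproducibility."""
    deg = {g.strip() for g in deg_genes}
    mr = {g.strip() for g in mr_genes}
    return sorted(deg & mr)


# Placeholder symbols only; the real lists would come from the two analyses.
deg_genes = ["GENE_A", "GENE_B", "GENE_C"]
mr_genes = ["GENE_C", "GENE_D", "GENE_A "]  # trailing space is stripped
candidates = intersect_candidates(deg_genes, mr_genes)
print(candidates)
```

The resulting list would then serve as the feature set passed to the candidate classifiers.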
Taxonomy
Topics: Ferroptosis and cancer prognosis · Genetic factors in colorectal cancer · Colorectal Cancer Surgical Treatments
