In silico tool for identification of colorectal cancer from cell-free DNA biomarkers
Kartavya Mathur (1, 2), Shipra Jain (1), Nisha Bajiya (1), Nishant Kumar (1), Gajendra P. S. Raghava (1) ((1) Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, (2) School of Biotechnology, Gautam Buddha University, Uttar Pradesh)

TL;DR
This study develops an in silico diagnostic tool that applies machine learning to cfDNA methylation data to detect colorectal cancer noninvasively, reporting a best AUROC of 0.89.
Contribution
The paper combines methylation biomarker selection with machine learning models for CRC detection from cfDNA, using a compact panel of 25 CpG features.
Findings
Best model (MLP) achieved AUROC of 0.89
Selected 25 CpG features for optimal performance
Deep learning model achieved AUROC of 0.78
Abstract
Colorectal cancer (CRC) remains a major global health concern, with early detection being pivotal for improving patient outcomes. In this study, we leveraged high-throughput methylation profiling of cell-free DNA (cfDNA) to identify and validate diagnostic biomarkers for CRC. The GSE124600 dataset was downloaded from the Gene Expression Omnibus as the discovery cohort, comprising 142 CRC and 132 normal cfDNA methylation profiles obtained via MCTA-Seq. After preprocessing and filtering, 97,863 CpG sites were retained for further analysis. Differential methylation analysis using statistical tests identified 30,791 CpG sites as significantly altered in CRC samples (p < 0.05). Univariate scoring enabled the selection of top-ranking features, which were further refined using multiple feature selection algorithms, including Recursive Feature Elimination, Sequential Feature Selection,…
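The selection steps named in the abstract (a per-site significance filter followed by Recursive Feature Elimination) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' pipeline: the data are synthetic rather than GSE124600, the Welch t-test stands in for the unspecified "statistical tests", logistic regression is an assumed estimator for both RFE and the final classifier, and the helper names `make_synthetic_cohort` and `select_cpg_sites` are hypothetical. Only the cohort sizes (142 CRC, 132 normal), the p < 0.05 threshold, and the panel size of 25 come from the summary above.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def make_synthetic_cohort(n_crc=142, n_normal=132, n_sites=500,
                          n_informative=30, seed=0):
    """Hypothetical stand-in for a samples x CpG-sites methylation matrix."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0, size=(n_crc + n_normal, n_sites))
    y = np.concatenate([np.ones(n_crc, dtype=int),
                        np.zeros(n_normal, dtype=int)])
    # Shift the first few sites in CRC samples so they carry signal.
    X[y == 1, :n_informative] += 1.0
    return X, y


def select_cpg_sites(X_train, y_train, alpha=0.05, n_keep=25):
    """Filter sites by p-value, then refine the survivors with RFE."""
    # Step 1: per-site Welch t-test between CRC and normal samples
    # (an assumed choice of test).
    _, p_values = ttest_ind(X_train[y_train == 1], X_train[y_train == 0],
                            axis=0, equal_var=False)
    significant = np.flatnonzero(p_values < alpha)
    # Step 2: Recursive Feature Elimination down to n_keep sites,
    # dropping 10% of the remaining sites per iteration.
    rfe = RFE(LogisticRegression(max_iter=1000),
              n_features_to_select=n_keep, step=0.1)
    rfe.fit(X_train[:, significant], y_train)
    # Map the RFE mask back to column indices of the original matrix.
    return significant[rfe.support_]


X, y = make_synthetic_cohort()
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Selection uses the training split only, so the held-out AUROC
# is not inflated by information leaking from the test samples.
selected = select_cpg_sites(X_tr, y_tr)
clf = LogisticRegression(max_iter=1000).fit(X_tr[:, selected], y_tr)
auroc = roc_auc_score(y_te, clf.predict_proba(X_te[:, selected])[:, 1])
print(f"{len(selected)} sites selected, held-out AUROC = {auroc:.2f}")
```

The AUROC printed here reflects the synthetic signal only and says nothing about the figures reported for the paper. Keeping feature selection inside the training split is a design choice of this sketch; the summary does not state how the authors separated selection from evaluation.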
Taxonomy
Topics: Epigenetics and DNA Methylation · Machine Learning in Bioinformatics · Cancer Genomics and Diagnostics
Methods: Logistic Regression · Feature Selection · Rank Flow Embedding
