Identifying Epigenetic Signature of Breast Cancer with Machine Learning
Maxim Vaysburd

TL;DR
This study uses machine learning on methylation data to identify a small set of CpG sites that serve as biomarkers for breast cancer, enabling early diagnosis with high accuracy.
Contribution
It introduces a method to pinpoint key epigenetic biomarkers for breast cancer using regularized machine learning on large methylation datasets.
Findings
Identified 25 CpG sites as biomarkers for breast cancer.
Achieved over 94% classification accuracy with the reduced model.
Validated the importance of selected CpG sites for cancer detection.
Abstract
The research reported in this paper identifies the epigenetic biomarker (methylation beta pattern) of breast cancer. Many cancers are triggered by abnormal gene expression levels caused by aberrant methylation of CpG sites in the DNA. To develop early diagnostics for cancer-causing methylation and to develop a treatment, it is necessary to identify a few dozen key cancer-related CpG methylation sites out of the millions of locations in the DNA. This research used the public TCGA dataset to train a TensorFlow machine learning model to classify breast cancer versus non-breast-cancer tissue samples, based on over 300,000 methylation beta values in each sample. L1 regularization was applied to identify the CpG methylation sites most important for accurate classification. It was hypothesized that CpG sites with the highest learned model weights correspond to DNA locations most relevant…
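The idea of using L1 regularization to single out a few informative CpG sites can be illustrated with a minimal sketch. This is not the paper's TensorFlow model or its TCGA data: it assumes a plain logistic regression fitted by proximal gradient descent in pure Python, on small synthetic beta values in which two hypothetical sites (indices 3 and 17) are hypermethylated in "tumor" samples. All function names, the penalty strength `lam`, the learning rate, and the dataset sizes are illustrative assumptions.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal step for the L1 penalty: shrink toward zero, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def center_columns(X):
    """Subtract each column's mean so the features are centered at zero."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def train_l1_logistic(X, y, lam=0.05, lr=1.0, epochs=200):
    """Fit logistic regression with an L1 penalty on the weights (bias unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding drives weak weights to exactly 0.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def top_sites(w, k):
    """Indices of the k weights with the largest absolute value."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


def make_synthetic(n=200, d=30, informative=(3, 17), seed=0):
    """Synthetic beta values in [0, 1]; only the `informative` sites track the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.random() for _ in range(d)]
        for j in informative:
            beta = (0.8 if label else 0.2) + rng.gauss(0.0, 0.05)
            row[j] = min(1.0, max(0.0, beta))
        X.append(row)
        y.append(label)
    return X, y


X_raw, y = make_synthetic()
X = center_columns(X_raw)
w, b = train_l1_logistic(X, y)
selected = top_sites(w, 2)

predictions = [
    1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X
]
accuracy = sum(int(p == t) for p, t in zip(predictions, y)) / len(y)
print("selected sites:", sorted(selected), "training accuracy:", accuracy)
```

The soft-thresholding step is what produces sparsity: weights whose gradient never exceeds the penalty stay at exactly zero, so ranking sites by absolute weight yields a short candidate list. In a real setting the selected sites would then be validated on held-out samples rather than on the training data, as in this toy example.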
Taxonomy
Topics: Epigenetics and DNA Methylation · Cancer Genomics and Diagnostics · Cancer-related gene regulation
Methods: L1 Regularization
