A DNA Methylation Classification Model Predicts Organ and Disease Site
Keng-Jung Lee, Dharanya Sampath, Konstantinos Mavrommatis

TL;DR
This study presents a machine learning model that classifies tissue and disease origin from cfDNA methylation data across multiple platforms, aiding non-invasive diagnostics.
Contribution
It introduces a cross-platform, methylation-based classification framework that improves tissue origin detection from cfDNA, addressing platform variability and data sparsity.
Findings
Achieved 75-80% accuracy across platforms.
Successfully distinguished tissues such as inflamed synovium and PBMCs.
Accurately estimated tissue proportions in synthetic cfDNA samples.
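As a rough illustration of the synthetic-mixture finding, the sketch below builds an in-silico cfDNA sample from two reference methylation profiles and recovers the mixing fraction by least squares. The summary does not say how the authors estimated proportions, so the two-tissue restriction, the function names `mix` and `estimate_fraction`, and the beta values are all hypothetical.

```python
# Minimal sketch, NOT the authors' method: two-tissue mixture with made-up values.

def mix(ref_a, ref_b, frac_a):
    """Simulate a mixture: each CpG's beta value is a weighted average of the references."""
    return [frac_a * a + (1 - frac_a) * b for a, b in zip(ref_a, ref_b)]

def estimate_fraction(mixture, ref_a, ref_b):
    """Least-squares estimate of the fraction contributed by ref_a, clipped to [0, 1]."""
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    return min(1.0, max(0.0, num / den))

# Hypothetical reference profiles (beta values at four CpGs).
synovium = [0.9, 0.2, 0.7, 0.1]
pbmc = [0.1, 0.8, 0.3, 0.6]

sample = mix(synovium, pbmc, 0.25)   # 25% synovium, 75% PBMC
estimated = estimate_fraction(sample, synovium, pbmc)
print(round(estimated, 3))
```

With more than two candidate tissues, the same idea is usually posed as a non-negative least-squares problem over a reference matrix; the closed form here is only for brevity.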
Abstract
Cell-free DNA (cfDNA) analysis is a powerful, minimally invasive tool for monitoring disease progression, treatment response, and early detection. A major challenge, however, is accurately determining the tissue of origin, especially in complex or heterogeneous disease contexts. To address this, we developed a machine learning framework that leverages tissue-specific DNA methylation signatures to classify both tissue and disease origin from cfDNA data. Our model integrates methylation datasets across diverse epigenomic platforms, including Whole Genome Bisulfite Sequencing (WGBS), Illumina Infinium Bead Arrays, and Enzymatic Methyl-seq (EM-seq). To account for platform variability and data sparsity, we applied imputation strategies and harmonized CpG features to enable cross-platform learning. Dimensionality reduction revealed clear tissue-specific clustering of methylation profiles. A…
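The harmonization and imputation step mentioned in the abstract could look roughly like the sketch below. It keeps CpG sites observed in a minimum share of samples and fills the remaining gaps with the per-CpG mean. The abstract does not specify the imputation strategy, so mean imputation, the `min_coverage` threshold, the function name `harmonize`, and the example values are assumptions for illustration only.

```python
# Minimal sketch under stated assumptions; not the authors' pipeline.
from statistics import mean

def harmonize(samples, min_coverage=0.5):
    """Align samples from different platforms onto a shared CpG feature set.

    samples: list of dicts mapping CpG id -> beta value in [0, 1];
             a CpG not measured on a platform is simply absent.
    Returns (cpg_ids, matrix), where matrix has one row per sample.
    """
    n = len(samples)
    counts = {}
    for s in samples:
        for cpg in s:
            counts[cpg] = counts.get(cpg, 0) + 1
    # Keep CpGs observed in at least `min_coverage` of the samples.
    cpg_ids = sorted(c for c, k in counts.items() if k / n >= min_coverage)
    # Per-CpG mean over the samples that measured it (assumed imputation rule).
    col_means = {c: mean(s[c] for s in samples if c in s) for c in cpg_ids}
    matrix = [[s.get(c, col_means[c]) for c in cpg_ids] for s in samples]
    return cpg_ids, matrix

# Hypothetical samples from three platforms with partially overlapping CpGs.
wgbs = {"cg01": 0.9, "cg02": 0.1, "cg03": 0.5}
array = {"cg01": 0.8, "cg02": 0.2}
emseq = {"cg01": 0.7, "cg03": 0.3, "cg04": 0.6}

cpg_ids, matrix = harmonize([wgbs, array, emseq])
print(cpg_ids)
```

Here `cg04` is dropped because only one of three samples covers it, and the array sample's missing `cg03` value is filled with the mean of the other two. The resulting matrix is the kind of input a downstream classifier or dimensionality-reduction step would consume.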
Taxonomy
Topics: Epigenetics and DNA Methylation
