Genomics and integrative clinical data machine learning scoring model to ascertain likely Lynch syndrome patients
Ramadhani Chambuso, Takudzwa Nyasha Musarurwa, Alessandro Pietro Aldera, Armin Deffur, Hayli Geffen, Douglas Perkins, Raj Ramesar

TL;DR
A machine learning model was developed to accurately identify likely Lynch syndrome cases in colorectal cancer patients using clinical and genetic data, offering a cost-effective alternative to traditional testing.
Contribution
The novel contribution is a machine learning scoring model that integrates clinicopathologic and genomic data to accurately identify likely Lynch syndrome cases.
Findings
The model achieved 100% sensitivity and specificity when using both clinical and genetic data.
Models relying solely on clinical or pathological data had lower accuracy and lower area under the precision-recall curve (AUCPR) values.
The approach reduces reliance on expensive germline testing for Lynch syndrome screening.
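The metrics named in these findings can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and toy labels are hypothetical, and AUCPR is approximated here by average precision (the step-wise area under the precision-recall curve), ignoring tied scores.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = likely-LS."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def average_precision(y_true, scores):
    """Approximate AUCPR: mean of precision at the rank of each true positive."""
    # Rank samples from highest to lowest model score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / hits if hits else float("nan")


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 0, 1]
toy_scores = [0.9, 0.8, 0.7]
toy_preds = [1 if s >= 0.75 else 0 for s in toy_scores]  # assumed threshold
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_preds)
toy_ap = average_precision(toy_labels, toy_scores)
```

A perfect ranking (every likely-LS case scored above every sporadic case) gives an average precision of 1.0, which is the situation a model with 100% sensitivity and specificity corresponds to at a suitable threshold.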
Abstract
Lynch syndrome (LS) screening methods include multistep molecular somatic tumor testing to distinguish likely-LS patients from sporadic cases, which can be costly and complex. In addition, direct germline testing for LS in every patient diagnosed with a solid cancer is challenging in resource-limited settings. We developed a unique machine learning scoring model to ascertain likely-LS cases from a cohort of colorectal cancer (CRC) patients. We used CRC patients from the cBioPortal database (TCGA studies) with complete clinicopathologic and somatic genomics data. We determined the rate of pathogenic/likely pathogenic variants in five LS genes (MLH1, MSH2, MSH6, PMS2, EPCAM) and of BRAF mutations using a pre-designed bioinformatic annotation pipeline. The ANNOVAR, InterVar, Variant Effect Predictor (VEP), and OncoKB software tools were used to functionally annotate and interpret somatic variants…
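The variant-rate step described in the abstract could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout (`patient_id`, `gene`, `classification`), the function name, and the toy records are all hypothetical, and the classification labels are assumed to have been produced upstream by the annotation tools.

```python
# The five LS genes listed in the abstract.
LS_GENES = {"MLH1", "MSH2", "MSH6", "PMS2", "EPCAM"}
# Assumed label vocabulary for already-annotated variants.
PATHOGENIC_LABELS = {"pathogenic", "likely pathogenic"}


def summarize_variants(variants, n_patients):
    """Tally patients with P/LP variants in LS genes and patients with BRAF mutations."""
    ls_carriers = set()
    braf_mutated = set()
    for v in variants:
        gene = v["gene"].upper()
        label = v["classification"].lower()
        if gene in LS_GENES and label in PATHOGENIC_LABELS:
            ls_carriers.add(v["patient_id"])
        elif gene == "BRAF":
            # BRAF status is tracked separately from the LS genes.
            braf_mutated.add(v["patient_id"])
    rate = len(ls_carriers) / n_patients if n_patients else float("nan")
    return {"ls_carriers": ls_carriers, "braf_mutated": braf_mutated, "ls_rate": rate}


# Hypothetical annotated records for a four-patient toy cohort.
toy_variants = [
    {"patient_id": "P1", "gene": "MLH1", "classification": "Pathogenic"},
    {"patient_id": "P1", "gene": "MSH6", "classification": "Likely pathogenic"},
    {"patient_id": "P2", "gene": "BRAF", "classification": "Pathogenic"},
    {"patient_id": "P3", "gene": "PMS2", "classification": "Benign"},
]
toy_summary = summarize_variants(toy_variants, n_patients=4)
```

Counting patients rather than variants keeps a patient with several qualifying variants (P1 above) from inflating the rate.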
Taxonomy
Topics: Genetic factors in colorectal cancer · Cancer Genomics and Diagnostics · Genomics and Rare Diseases
