Precision in prediction: tailoring machine learning models for breast cancer missense variants pathogenicity prediction
Rahaf M Ahmad, Noura AlDhaheri, Mohd Saberi Mohamad, Bassam R Ali

TL;DR
This study improves breast cancer variant predictions using machine learning models tailored to breast cancer genes, offering more accurate and interpretable results than general genome-wide tools.
Contribution
The study introduces a disease-specific machine learning approach for breast cancer missense variant prediction with integrated interpretability techniques.
Findings
The Extra Trees model achieved 99.1% accuracy on an independent ClinGen dataset.
Recursive feature elimination identified key genomic features for efficient prediction.
Interpretability techniques enhanced transparency and highlighted key drivers of predictions.
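The feature-selection finding can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic feature table rather than the study's breast cancer dataset. The sample size, feature counts, and hyperparameters below are hypothetical choices for illustration, not values reported by the authors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a variant feature table (assumption: NOT the paper's data).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination: repeatedly fit the estimator and drop the
# least important feature until only n_features_to_select remain.
# The target of 8 features is an arbitrary illustrative choice.
selector = RFE(
    estimator=ExtraTreesClassifier(n_estimators=100, random_state=0),
    n_features_to_select=8,
    step=1,
)
selector.fit(X_train, y_train)
selected = [i for i, keep in enumerate(selector.support_) if keep]

# Refit an Extra Trees model on the reduced feature set and score it
# on the held-out split.
final_model = ExtraTreesClassifier(n_estimators=100, random_state=0)
final_model.fit(selector.transform(X_train), y_train)
accuracy = accuracy_score(y_test, final_model.predict(selector.transform(X_test)))

print("selected feature indices:", selected)
print("held-out accuracy:", round(accuracy, 3))
```

In a real workflow the number of retained features would normally be tuned, for example with cross-validation, rather than fixed in advance.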
Abstract
Accurate classification of genetic variants is critical for precision medicine, particularly for hereditary diseases such as breast cancer. However, widely used tools like MutPred and Combined Annotation Dependent Depletion (CADD) offer genome-wide pathogenicity predictions that often overlook disease-specific variant behavior, limiting their clinical utility. This study addresses that gap by training and benchmarking nine machine learning (ML) models, including ensemble and baseline classifiers, on a breast cancer gene-specific dataset rich in conservation scores, functional annotations, and allele frequency features. Among all models, the Extra Trees model achieved the highest performance, with an accuracy of 0.999 and a 95% confidence interval of (0.998–1.000). Recursive feature elimination identified the most informative genomic features, enhancing model efficiency. To ensure clinical…
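The abstract reports an accuracy together with a 95% confidence interval but, as excerpted here, does not say how the interval was computed. One common choice for a proportion such as accuracy is the Wilson score interval. The sketch below is an assumption about method, and the counts in the usage lines are hypothetical, not the study's test-set size.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts for illustration only (not taken from the paper).
low, high = wilson_interval(correct=999, total=1000)
print(f"accuracy = {999 / 1000:.3f}, 95% CI = ({low:.4f}, {high:.4f})")
```

The interval narrows as the number of evaluated variants grows, so the same point accuracy can correspond to very different intervals depending on test-set size.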
Taxonomy
Topics: Genomics and Rare Diseases · Genetic Associations and Epidemiology · BRCA gene mutations in cancer
