CYTO-SV-ML: A Machine Learning Tool for Cytogenetic Structural Variant Analysis in Somatic Cell Type Using Genome Sequences
Tao Zhang, Paul Auer, Stephen R. Spellman, Jing Dong, Wael Saber, Yung-Tsi Bolon

TL;DR
CYTO-SV-ML is a machine learning tool that improves the detection of large somatic structural variants (SVs) in whole genome sequencing data. It outperforms conventional methods in accuracy and uncovers new SVs in most patients whose clinical cytogenetic testing was unsuccessful.
Contribution
A high-performance machine learning pipeline for identifying large somatic structural variants in genomic data, with improved accuracy over conventional methods.
Findings
CYTO-SV-ML achieved an AUC-ROC of 0.94 for translocations and 0.92 for non-translocation SVs when classifying somatic SVs.
In clinical validation, the tool identified 207 somatic SVs, compared with 143 identified by a conventional pipeline.
CYTO-SV-ML uncovered novel SVs in 89% of patients with unsuccessful clinical cytogenetic results.
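To make the AUC-ROC values above concrete: AUC-ROC equals the probability that a randomly chosen true somatic SV receives a higher classifier score than a randomly chosen non-somatic call. The sketch below computes it directly from that pairwise definition in plain Python. The function name `auc_roc` and the scores and labels are hypothetical illustrations, not data or code from CYTO-SV-ML.

```python
def auc_roc(scores, labels):
    """Return AUC-ROC: the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a pair
    (equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from an SV classifier (label 1 = somatic, 0 = not somatic)
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(auc_roc(scores, labels))  # 0.8888888888888888 (8 of 9 pairs ranked correctly)
```

Read this way, an AUC-ROC of 0.94 means that about 94% of somatic/non-somatic pairs are ranked in the right order. The quadratic pairwise loop is chosen for clarity. For real datasets, scikit-learn's `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity efficiently.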
Abstract
(1) Background: Although whole genome sequencing (WGS) has enabled comprehensive analyses of structural variants (SVs), more accurate and efficient methods are needed to distinguish large somatic SVs (SV size ≥ 1 Mb), which are traditionally detected through cytogenetic testing, from germline SVs. (2) Methods: A customized machine learning pipeline (CYTO-SV-ML), automated with the Snakemake workflow system and equipped with a user interface, was developed to identify somatic cytogenetic SVs in WGS data. The tool was then applied to characterize structural variation profiles in the whole blood of patients with myelodysplastic syndromes (MDSs). Known SVs mapped from well-established open databases were split into training and validation subsets for an AUTO-ML model within the CYTO-SV-ML pipeline. (3) Results: The benchmarking performance of the CYTO-SV-ML pipeline on somatic cytogenetic SV classification…
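The two data-preparation steps named in the Methods, restricting attention to large (≥ 1 Mb) SVs and splitting labelled SVs into training and validation subsets, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CYTO-SV-ML implementation. The `SVCall` record, the label encoding, the 80/20 split, and the choice to keep every translocation regardless of size are all hypothetical, because this summary does not specify them.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_SIZE_BP = 1_000_000  # ">= 1 Mb" threshold for large SVs, from the abstract


@dataclass
class SVCall:
    """Hypothetical minimal SV record: two breakpoints plus a training label."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    label: int  # 1 = known somatic SV, 0 = not somatic (assumed encoding)

    @property
    def is_translocation(self) -> bool:
        return self.chrom1 != self.chrom2

    @property
    def size(self) -> Optional[int]:
        # Size is undefined when the breakpoints lie on different chromosomes
        return None if self.is_translocation else abs(self.pos2 - self.pos1)


def is_cytogenetic_scale(sv: SVCall) -> bool:
    """Keep all translocations and intra-chromosomal SVs of at least 1 Mb (assumption)."""
    return sv.is_translocation or sv.size >= MIN_SIZE_BP


def train_validation_split(
    svs: List[SVCall], validation_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[SVCall], List[SVCall]]:
    """Shuffle with a fixed seed, then hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = list(svs)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical SV calls (coordinates are illustrative only)
calls = [
    SVCall("chr5", 10_000_000, "chr5", 60_000_000, 1),     # 50 Mb: kept
    SVCall("chr7", 1_000, "chr7", 5_000, 0),               # 4 kb: dropped
    SVCall("chr9", 130_000_000, "chr22", 23_000_000, 1),   # translocation: kept
    SVCall("chr20", 30_000_000, "chr20", 31_500_000, 0),   # 1.5 Mb: kept
    SVCall("chr1", 2_000_000, "chr1", 2_900_000, 0),       # 900 kb: dropped
]
large = [sv for sv in calls if is_cytogenetic_scale(sv)]
train, validation = train_validation_split(large)
print(len(large), len(train), len(validation))  # 3 2 1
```

A plain random split is used here for simplicity. With strongly imbalanced labels, a split stratified by label (and, given that results are reported separately, by translocation versus non-translocation) would keep both subsets representative.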
Figures 1, 2, 3 and 4 (images not reproduced here)
