Utilizing genomic signatures to gain insights into the dynamics of SARS-CoV-2 through Machine and Deep Learning techniques
Ahmed M. A. Elsherbini, Amr Hassan Elkholy, Youssef M. Fadel, Gleb Goussarov, Ahmed Mohamed Elshal, Mohamed El-Hadidi, Mohamed Mysara

TL;DR
This paper introduces GenoSig, a fast alignment-free tool using machine and deep learning to classify SARS-CoV-2 lineages based on nucleotide frequency patterns.
Contribution
GenoSig offers a novel alignment-free method for SARS-CoV-2 lineage classification with high accuracy using Di and Tri nucleotide frequencies.
Findings
GenoSig achieved 87.88% accuracy with deep learning and 86.37% with Random Forest for lineage classification.
Later clades like GRA, GRY, and GK showed better performance than earlier clades G and GH.
Both models favored cytosine and guanine in nucleotide frequency patterns for classification.
Abstract
The global spread of the SARS-CoV-2 pandemic, originating in Wuhan, China, has had profound consequences on both health and the economy. Traditional alignment-based phylogenetic tree methods for tracking epidemic dynamics demand substantial computational power due to the growing number of sequenced strains. Consequently, there is a pressing need for an alignment-free approach to characterize these strains and monitor the dynamics of various variants. In this work, we introduce a swift and straightforward tool named GenoSig, implemented in C++. The tool exploits the Di and Tri nucleotide frequency signatures to delineate the taxonomic lineages of SARS-CoV-2 by employing diverse machine learning (ML) and deep learning (DL) models. Our approach achieved a tenfold cross-validation accuracy of 87.88% (± 0.013) for DL and 86.37% (± 0.0009) for Random Forest (RF) model, surpassing the…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRocket and propulsion systems research · Planetary Science and Exploration · Space Exploration and Technology
