Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences
Sarwan Ali, Murray Patterson

TL;DR
Spike2Vec is a new scalable embedding method designed for COVID-19 spike sequences, enabling efficient analysis of millions of sequences and improving classification accuracy over existing methods.
Contribution
The paper introduces Spike2Vec, a scalable embedding approach for COVID-19 spike sequences that enhances machine learning tasks with high accuracy and efficiency.
Findings
Spike2Vec scales to several million sequences effectively.
It outperforms baseline models in prediction accuracy.
It improves F1 scores in classification tasks.
Abstract
With the rapid global spread of COVID-19, more and more data related to this virus is becoming available, including genomic sequence data. The total number of genomic sequences that are publicly available on platforms such as GISAID is currently several million, and is increasing with every day. The availability of such Big Data creates a new opportunity for researchers to study this virus in detail. This is particularly important with all of the dynamics of the COVID-19 variants which emerge and circulate. This rich data source will give us insights on the best ways to perform genomic surveillance for this and future pandemic threats, with the ultimate goal of mitigating or eliminating such threats. Analyzing and processing the several million genomic sequences is a challenging task. Although traditional methods for sequence classification are proven to be effective, they are…
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Fractal and DNA sequence analysis
