Virus2Vec: Viral Sequence Classification Using Machine Learning
Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Pin-Yu Chen, Imdad Ullah Khan, Murray Patterson

TL;DR
Virus2Vec introduces a novel feature-vector representation for viral sequences that enables efficient host prediction using machine learning, bypassing sequence alignment and outperforming existing methods.
Contribution
It proposes Virus2Vec, a new method for converting viral sequences into numerical vectors for machine learning, improving prediction accuracy and computational efficiency.
Findings
Virus2Vec achieves higher accuracy than baseline methods.
It effectively predicts viral hosts from sequence data.
The method reduces computational costs by avoiding sequence alignment.
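To make the alignment-free idea concrete, here is a minimal sketch of one generic way to turn a sequence into a fixed-length numerical vector: counting k-mers. This is an illustrative stand-in, not the Virus2Vec construction itself, which this summary does not describe. The function name `kmer_count_vector` and its parameters (`k`, `alphabet`) are assumptions introduced for this example.

```python
from itertools import product


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Map a sequence to a fixed-length vector of k-mer counts.

    Hypothetical helper for illustration only; it is NOT the Virus2Vec
    embedding. No alignment is needed because every sequence, whatever
    its length, maps to the same len(alphabet) ** k slots.
    """
    # One slot per possible k-mer, ordered as the alphabet is ordered.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    seq = sequence.upper()
    # Slide a window of width k across the sequence and count each k-mer.
    for start in range(len(seq) - k + 1):
        slot = index.get(seq[start:start + k])
        if slot is not None:  # skip k-mers with symbols outside the alphabet
            vector[slot] += 1
    return vector
```

Vectors produced this way can be passed to any vector-space classifier. For amino acid sequences, the `alphabet` argument would be the 20-letter amino acid alphabet instead of `"ACGT"`, at the cost of a much larger vector for the same `k`.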
Abstract
Understanding the host-specificity of different families of viruses sheds light on the origin of, e.g., SARS-CoV-2, rabies, and other such zoonotic pathogens in humans. It enables epidemiologists, medical professionals, and policymakers to promptly curb existing epidemics and prevent future ones. In the family Coronaviridae (of which SARS-CoV-2 is a member), it is well known that the spike protein is the point of contact between the virus and the host cell membrane. On the other hand, the two traditional mammalian orders, Carnivora (carnivores) and Chiroptera (bats), are recognized to be responsible for maintaining and spreading the Rabies Lyssavirus (RABV). We propose Virus2Vec, a feature-vector representation for viral (nucleotide or amino acid) sequences that enables vector-space-based machine learning models to identify viral hosts. Virus2Vec generates numerical feature vectors for…
Taxonomy
Topics: Rabies epidemiology and control · Virology and Viral Diseases · Anomaly Detection Techniques and Applications
