PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences
Sarwan Ali, Babatunde Bello, Prakash Chourasia, Ria Thazhe Punathil, Yijing Zhou, Murray Patterson

TL;DR
This paper introduces PWM2Vec, a novel embedding method based on position-weight matrices, for classifying coronavirus hosts from spike protein sequences, achieving competitive results and identifying key amino acids involved in host specificity.
Contribution
The paper presents the first use of PWM-based embeddings for host classification from viral sequences, providing a new approach for analyzing coronavirus host specificity.
Findings
PWM2Vec performs comparably to baseline models.
Identifies important amino acids for host prediction.
Effective in classifying diverse coronavirus hosts.
Abstract
The origin of the COVID-19 pandemic is still unknown and is an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona-) viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating and preventing current and future pandemics. In coronaviruses, the surface (S) protein, or spike protein, is an important part of determining host specificity since it is the point of contact between the virus and the host cell membrane. In this paper, we classify the hosts of over five thousand coronaviruses from their spike protein sequences, segregating them into clusters of distinct hosts among avians, bats, camels, swine, humans and weasels, to name a few. We propose a feature embedding…
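To make the idea of a position-weight-matrix (PWM) embedding concrete, here is a minimal Python sketch of one generic way to turn a protein sequence into a numeric vector using a PWM built from its own k-mers. It is an illustration under stated assumptions, not the authors' implementation: the function names, the value of `k`, the Laplace pseudocount, the uniform background frequency of 1/20, and the zero score for non-standard symbols are all hypothetical choices made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def kmers(seq, k):
    """Return all overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def position_weight_matrix(kmer_list, k, pseudocount=1.0, background=1 / 20):
    """Build a PWM as a list of k columns mapping amino acid -> log-odds weight.

    Assumptions for this sketch: Laplace smoothing with `pseudocount`
    and a uniform background frequency over the 20 amino acids.
    """
    total = len(kmer_list) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(k):
        # Count how often each symbol occurs at this position across k-mers.
        counts = Counter(km[pos] for km in kmer_list)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pwm.append(column)
    return pwm


def pwm_embedding(seq, k=3):
    """Score every k-mer of `seq` against the PWM built from those k-mers.

    The result has one entry per k-mer, i.e. len(seq) - k + 1 values.
    Non-standard symbols (e.g. 'X') contribute 0.0, an assumption of this sketch.
    """
    kms = kmers(seq, k)
    pwm = position_weight_matrix(kms, k)
    return [
        sum(pwm[pos].get(aa, 0.0) for pos, aa in enumerate(km))
        for km in kms
    ]


if __name__ == "__main__":
    vec = pwm_embedding("MFVFLVLLPLVSSQ", k=3)
    print(len(vec), [round(v, 3) for v in vec])
```

Because the vector length depends on the sequence length, feeding such vectors to a standard classifier would require sequences of equal length (for example after alignment) or some form of padding; which option applies here is not stated in the text above.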
Taxonomy
Topics: Animal Virus Infections Studies · Machine Learning in Bioinformatics · Identification and Quantification in Food
