A k-mer Based Approach for SARS-CoV-2 Variant Identification

Sarwan Ali; Bikram Sahoo; Naimat Ullah; Alexander Zelikovskiy; Murray Patterson; Imdadullah Khan

arXiv:2108.03465·q-bio.QM·October 13, 2021

TL;DR

This paper introduces a k-mer based method for identifying SARS-CoV-2 variants using spike protein sequences, demonstrating improved accuracy with limited data and highlighting key amino acids relevant to variant classification.

Contribution

It presents a novel k-mer based approach that leverages amino acid order and minimal training data to classify SARS-CoV-2 variants effectively.
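
As a rough illustration of the idea, a sequence can be turned into a fixed-length vector by counting its overlapping k-mers. The sketch below is not the authors' implementation: the function names, the choice of k = 3, the 20-letter amino acid alphabet, and the short example fragment are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids. The alphabet used in the paper
# is not stated on this page and may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=3):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k=3, alphabet=AMINO_ACIDS):
    """Map a sequence to a vector with one count per possible k-mer.

    The vector has len(alphabet) ** k entries, ordered lexicographically
    by k-mer, so every sequence maps to the same feature space.
    """
    counts = kmer_counts(sequence, k)
    return [counts["".join(kmer)] for kmer in product(alphabet, repeat=k)]


# Illustrative toy fragment, not data from the paper.
fragment = "MFVFLVLLPLVSSQ"
vector = kmer_vector(fragment, k=3)
print(len(vector))  # 20 ** 3 possible 3-mers
print(sum(vector))  # number of 3-mers in the fragment: len(fragment) - 3 + 1
```

Vectors built this way can be passed to any standard classifier. Because each k-mer is a run of adjacent residues, local ordering is kept in the features.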

Findings

1. Outperforms baseline algorithms with only 1% training data
2. Preserving amino acid order improves classification accuracy
3. Identifies key amino acids aligned with CDC reports
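
To see why order can matter, consider two sequences with identical amino acid composition but different arrangement. The toy example below is not an experiment from the paper, and the sequences and helper name are made up for illustration. It shows that single-residue counts cannot tell the two apart, while 3-mer counts can.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical toy sequences: seq_b is seq_a reversed, so both contain
# exactly the same residues in a different order.
seq_a = "NLTTRTQ"
seq_b = seq_a[::-1]

# k = 1 ignores order entirely (plain residue composition).
same_composition = kmer_counts(seq_a, 1) == kmer_counts(seq_b, 1)

# k = 3 captures local order, so the two sequences now differ.
same_3mers = kmer_counts(seq_a, 3) == kmer_counts(seq_b, 3)

print(same_composition, same_3mers)  # True False
```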

Abstract

With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2. Identifying particular variants helps to understand and model their spread patterns, design effective mitigation strategies, and prevent future outbreaks. It also plays a crucial role in studying the efficacy of known vaccines against each variant and modeling the likelihood of breakthrough infections. It is well known that the spike protein contains most of the information/variation pertaining to coronavirus variants. In this paper, we use spike sequences to classify different variants of the coronavirus in humans. We show that preserving the order of the amino acids helps the underlying classifiers to achieve better performance. We also show that we can train our model…
