Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants

Zahra Tayebi; Sarwan Ali; Murray Patterson

arXiv:2110.09622·cs.LG·October 20, 2021

TL;DR

This paper introduces a robust feature selection method combined with efficient representation techniques to improve clustering accuracy of SARS-CoV-2 spike protein sequences, aiding in variant analysis and pandemic response.
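A common way to realise the representation step for spike sequences is a k-mer count vector. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the 20 standard amino-acid letters and k = 3 (this excerpt gives neither value), and the function name `kmer_vector` is hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino-acid letters. Real spike sequences may also
# contain ambiguous residues (e.g. 'X'); k-mers containing them are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vector(sequence: str, k: int = 3) -> list[int]:
    """Return counts of every possible k-mer over AMINO_ACIDS, in lexicographic order."""
    # Slide a window of width k over the sequence and count each substring.
    counts = Counter(sequence[i : i + k] for i in range(len(sequence) - k + 1))
    # Fixed vocabulary, so every sequence maps to a vector of the same length.
    vocab = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts.get(kmer, 0) for kmer in vocab]


vec = kmer_vector("MFVFLVLLPLVSS", k=3)
print(len(vec))  # 8000
```

With k = 3 the vector always has 20³ = 8000 entries, so sequences of different lengths map to comparable fixed-size vectors; the price is a large, sparse feature space, which is one motivation for a feature selection step.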

Contribution

The study presents a novel feature selection approach that enhances clustering of SARS-CoV-2 spike sequences, improving variant differentiation accuracy.
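This excerpt does not name the feature selection method. Purely as a stand-in, the sketch below applies a simple unsupervised variance filter to a numeric feature matrix (for example, k-mer counts), keeping the m most variable columns. The function names and the choice of variance as the criterion are assumptions for illustration, not the paper's approach.

```python
from statistics import pvariance


def select_top_variance(rows: list[list[float]], m: int) -> list[int]:
    """Return the (sorted) column indices of the m highest-variance features.

    Columns that never vary, e.g. k-mers absent from every sequence, carry no
    information for clustering and rank last.
    """
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:m])


def project(rows: list[list[float]], keep: list[int]) -> list[list[float]]:
    """Keep only the selected columns of each row."""
    return [[row[j] for j in keep] for row in rows]


rows = [[0, 1, 5], [0, 3, 5], [0, 5, 5]]
keep = select_top_variance(rows, 1)
print(keep)                 # [1]
print(project(rows, keep))  # [[1], [3], [5]]
```

A filter like this is cheap and label-free; supervised selectors, which use variant labels to score features, are a common alternative.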

Findings

1. Higher F1 scores achieved with the proposed feature selection.
2. Effective clustering of spike sequences using k-mers and feature selection.
3. Improved differentiation of SARS-CoV-2 variants.
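The findings report F1 scores for clustering, but the excerpt does not say how cluster ids are matched to variant labels. One common convention, assumed here, is to label each cluster with its majority variant and then compute a support-weighted F1 over the true variants. The helper names below are hypothetical and the toy data is invented for illustration.

```python
from collections import Counter, defaultdict


def majority_label_mapping(clusters, labels):
    """Map each cluster id to the most frequent true label among its members."""
    members = defaultdict(list)
    for c, y in zip(clusters, labels):
        members[c].append(y)
    return {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 = 2TP / (2TP + FP + FN)."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        total += n * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)


# Toy example: two clusters, one sequence assigned to the "wrong" cluster.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["Alpha", "Alpha", "Delta", "Delta", "Delta", "Delta"]
mapping = majority_label_mapping(clusters, labels)
y_pred = [mapping[c] for c in clusters]
print(round(weighted_f1(labels, y_pred), 3))  # 0.838
```

Weighting by support keeps large variant groups from being outvoted by small ones; a macro average (unweighted mean over classes) would instead treat every variant equally.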

Abstract

The widespread availability of large amounts of genomic data on the SARS-CoV-2 virus, as a result of the COVID-19 pandemic, has created an opportunity for researchers to analyze the disease at a level of detail unlike any virus before it. On the one hand, this will help biologists, policymakers and other authorities make timely and appropriate decisions to control the spread of the coronavirus. On the other hand, such studies will help in dealing more effectively with any possible future pandemic. Since SARS-CoV-2 has many variants, each carrying different mutations, analyzing such data is a difficult task. It is well known that much of the variation in the SARS-CoV-2 genome happens disproportionately in the spike region of the genome sequence, the relatively short region that codes for the spike protein(s). Hence, in this paper, we…

Code & Models

Repositories

omadson/fuzzy-c-means
jaxOfficial
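The repository omadson/fuzzy-c-means listed above implements fuzzy c-means clustering. The sketch below is a from-scratch, pure-Python version of the standard Bezdek update rules, written only for illustration: it does not reproduce that package's API, and the defaults (fuzzifier m = 2, tolerance, seed) are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships), where memberships[i][j] is the degree to
    which point i belongs to cluster j; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])

    centers = []
    for _ in range(max_iter):
        # Centers are membership-weighted means with weights u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], v) for v in centers]
            zeros = [j for j, d in enumerate(dists) if d == 0.0]
            if zeros:
                # Point coincides with one or more centers: split membership among them.
                row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                p = 2.0 / (m - 1.0)
                row = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                       for j in range(c)]
            new_u.append(row)

        # Stop once no membership changed by more than tol.
        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if delta < tol:
            break
    return centers, u


pts = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, u = fuzzy_c_means(pts, c=2)
hard = [max(range(2), key=lambda j: r[j]) for r in u]
print(hard)
```

Unlike k-means, each sequence gets a graded membership in every cluster; taking the argmax, as in the last lines, recovers hard assignments when a single label per sequence is needed.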

Taxonomy

Methods: Feature Selection