Efficient Approximate Kernel Based Spike Sequence Classification
Sarwan Ali, Bikram Sahoo, Muhammad Asad Khan, Alexander Zelikovsky, Imdad Ullah Khan, Murray Patterson

TL;DR
This paper introduces improved approximate kernel methods for classifying coronavirus spike protein sequences, leveraging domain knowledge and efficient preprocessing to enhance predictive accuracy over existing approaches.
Contribution
It proposes novel enhancements to approximate kernel algorithms using minimizers and information gain tailored for coronavirus sequence classification.
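To make the notion of a minimizer concrete, here is a minimal sketch. It assumes the common definition in which the minimizer of a k-mer is its lexicographically smallest m-mer (m < k), taken over the k-mer and its reverse. The function names, the parameters `k` and `m`, and the use of the plain reversed string for protein sequences are illustrative assumptions, not details taken from the paper.

```python
def minimizer(kmer: str, m: int) -> str:
    """Return the lexicographically smallest m-mer of a k-mer.

    Assumption: candidates come from both the k-mer and its reversed
    string (not a reverse complement, since proteins have none).
    """
    reversed_kmer = kmer[::-1]
    candidates = [
        s[i:i + m]
        for s in (kmer, reversed_kmer)
        for i in range(len(s) - m + 1)
    ]
    return min(candidates)


def minimizers(seq: str, k: int, m: int) -> list[str]:
    """Slide a window of length k over seq and collect one minimizer per window."""
    return [minimizer(seq[i:i + k], m) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # Toy amino-acid string; k and m are arbitrary illustrative values.
    print(minimizers("ACDEF", k=4, m=2))
```

Because consecutive windows often share the same minimizer, a sequence is typically represented by far fewer distinct m-mers than k-mers, which is what makes this a useful preprocessing step.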
Findings
Improved classification accuracy on coronavirus datasets
Enhanced kernel performance over baseline and state-of-the-art methods
Effective use of domain knowledge and preprocessing techniques
Abstract
Machine learning (ML) models, such as SVM, for tasks like classification and clustering of sequences, require a definition of distance/similarity between pairs of sequences. Several methods have been proposed to compute the similarity between sequences, such as the exact approach that counts the number of matches between k-mers (sub-sequences of length k) and an approximate approach that estimates pairwise similarity scores. Although exact methods yield better classification performance, they pose high computational costs, limiting their applicability to a small number of sequences. The approximate algorithms are proven to be more scalable and perform comparably to (sometimes better than) the exact methods -- they are designed in a "general" way to deal with different types of sequences (e.g., music, protein, etc.). Although general applicability is a desired property of an…
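The exact approach mentioned in the abstract, counting matches between k-mers, can be illustrated with a small sketch. This is a plain exact-match k-mer count (in the style of a spectrum kernel), assumed here for illustration only; it is not the paper's approximate algorithm, and the function names and the default `k=3` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_match_kernel(a: str, b: str, k: int = 3) -> int:
    """Similarity = number of matching k-mer pairs between a and b.

    Equivalent to the dot product of the two k-mer count vectors.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from b.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


if __name__ == "__main__":
    # Toy strings; real spike sequences are far longer.
    print(kmer_match_kernel("ABCD", "ABCE", k=3))
```

Computing this value for every pair of sequences gives a kernel matrix that an SVM can consume directly. The cost grows quadratically with the number of sequences, which is the scalability concern the abstract raises.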
Taxonomy
Topics: Machine Learning in Bioinformatics · COVID-19 diagnosis using AI · Anomaly Detection Techniques and Applications
Methods: Support Vector Machine
