Efficient Approximation Algorithms for String Kernel Based Sequence Classification
Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, Imdad Ullah Khan

TL;DR
This paper introduces efficient approximation algorithms for string kernel-based sequence classification. The algorithms make larger values of the parameters k and m practical and improve accuracy on biological and music sequence data, and they come with theoretical guarantees and empirical validation.
Contribution
The authors develop novel algorithms that accurately estimate pairwise sequence similarity scores, overcoming the computational limits of exact methods. In doing so, they solve an open combinatorial problem, which makes this form of sequence classification scalable.
Findings
Achieves high-quality approximations of the similarity score with theoretical error bounds.
Enables use of larger k and m parameters for better accuracy.
Demonstrates effectiveness on biological and music datasets.
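A rough illustration (ours, not from the paper) of why larger k and m are costly: over an alphabet of size sigma, the number of k-mers within Hamming distance m of one fixed k-mer is the sum over i = 0..m of C(k, i)(sigma - 1)^i, which grows quickly in both parameters. Any method that enumerates these neighborhoods explicitly pays at least that many steps per k-mer. Hamming distance, sigma = 4 (DNA), and the function names below are assumptions for illustration only.

```python
from itertools import product
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions where equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def neighborhood_size(k: int, m: int, sigma: int) -> int:
    """Count k-mers over an alphabet of size sigma within Hamming distance m
    of a fixed k-mer (hypothetical helper).

    Choose which i positions differ (comb(k, i) ways) and, for each of them,
    one of the sigma - 1 other symbols.
    """
    return sum(comb(k, i) * (sigma - 1) ** i for i in range(min(m, k) + 1))


def neighborhood_size_brute(center: str, m: int, alphabet: str) -> int:
    """Brute-force cross-check: enumerate every len(center)-mer over alphabet."""
    k = len(center)
    return sum(
        1
        for t in product(alphabet, repeat=k)
        if hamming(center, "".join(t)) <= m
    )


if __name__ == "__main__":
    print(neighborhood_size(5, 2, 4))                   # 106
    print(neighborhood_size_brute("ACGTA", 2, "ACGT"))  # 106
    # Growth as k and m increase (DNA alphabet assumed).
    for k, m in [(5, 1), (10, 2), (20, 5), (30, 8)]:
        print(k, m, neighborhood_size(k, m, 4))
```

The closed form and the brute-force count agree on small inputs; only the closed form is usable for realistic k and m.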
Abstract
Sequence classification algorithms, such as SVM, require a distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between k-mers (k-length subsequences) in the two sequences. Extending this definition by considering two k-mers to match if their distance is at most m yields better classification performance. This, however, makes the problem computationally much more complex. Known algorithms for computing this similarity have a computational complexity that renders them applicable only for small values of k and m. In this work, we develop novel techniques to efficiently and accurately estimate the pairwise similarity score, which enables us to use much larger values of k and m and to obtain higher predictive accuracy. This opens up a broad avenue of applying this classification approach to audio, images, and…
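The similarity described in the abstract can be made concrete with a small sketch. It follows the abstract's wording literally: count pairs (a, b), where a is a k-mer of one sequence and b a k-mer of the other, whose distance is at most m. We assume Hamming distance between equal-length k-mers, and the function names are hypothetical. This is a quadratic brute-force baseline plus a generic uniform-sampling estimate, not the authors' algorithm, and the paper's precise kernel definition may differ.

```python
import random


def kmers(s: str, k: int) -> list[str]:
    """All contiguous length-k substrings of s, repeats included."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_count_exact(x: str, y: str, k: int, m: int) -> int:
    """Count k-mer pairs (a from x, b from y) with hamming(a, b) <= m.

    Brute force over every pair: O(|x| * |y| * k) time.
    """
    ys = kmers(y, k)
    return sum(1 for a in kmers(x, k) for b in ys if hamming(a, b) <= m)


def match_count_sampled(x: str, y: str, k: int, m: int,
                        samples: int, seed: int = 0) -> float:
    """Unbiased Monte Carlo estimate of match_count_exact.

    Draws `samples` k-mer pairs uniformly with replacement and scales the
    hit rate by the total number of pairs. Generic sketch, not the
    authors' estimator.
    """
    rng = random.Random(seed)
    xs, ys = kmers(x, k), kmers(y, k)
    hits = sum(
        hamming(rng.choice(xs), rng.choice(ys)) <= m
        for _ in range(samples)
    )
    return hits / samples * len(xs) * len(ys)


if __name__ == "__main__":
    rng = random.Random(42)
    x = "".join(rng.choice("ACGT") for _ in range(300))
    y = "".join(rng.choice("ACGT") for _ in range(300))
    exact = match_count_exact(x, y, k=5, m=2)
    approx = match_count_sampled(x, y, k=5, m=2, samples=20_000)
    print(exact, round(approx))
```

The exact count touches every one of the roughly |x|·|y| k-mer pairs, while the sampled version costs time proportional to `samples`. Its relative error shrinks roughly as 1/sqrt(samples) but grows when matching pairs are rare, which is one reason a naive sampler alone does not give the kind of guarantees the abstract describes.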
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Advanced Image and Video Retrieval Techniques
