A Coverage Criterion for Spaced Seeds and its Applications to Support Vector Machine String Kernels and k-Mer Distances
Laurent Noé (LIFL, INRIA Lille - Nord Europe), Donald E. K. Martin

TL;DR
This paper proposes a coverage criterion for measuring the efficiency of spaced seeds, shows how to compute it with an automaton-based approach, and compares it with the single-hit and multiple-hit criteria for phylogenetic distance estimation and SVM classification.
Contribution
It proposes a novel coverage criterion for spaced seeds, with methods for direct measurement and application to alignment-free distances, enhancing seed pattern design.
Findings
Coverage criterion correlates well with classification accuracy
Automaton-based approach enables efficient measurement of seed efficiency
Coverage extension improves alignment-free distance estimation
Abstract
Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances (Boden et al., 2013; Horwege et al., 2014; Leimeister et al., 2014), and to provide a lower misclassification rate when used with Support Vector Machines (SVMs) (Onodera and Shibuya, 2013). We confirm these two results by independent experiments, and propose in this article to use a coverage criterion (Benson and Mak, 2008; Martin, 2013; Martin and Noé, 2014) to measure the seed efficiency in both cases in order to design better seed patterns. We show first how this coverage criterion can be directly measured by a full automaton-based approach. We then illustrate how this criterion performs when compared with two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct…
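To make the three criteria named in the abstract concrete, here is a minimal Python sketch on a single alignment. It rests on assumptions that are not taken from the paper: a seed is written as a string of `1` (must-match) and `0` (don't-care) characters, an alignment is a string of `1` (match) and `0` (mismatch), and coverage is taken to be the number of alignment positions covered by a must-match position of at least one seed hit. The function names `seed_hits` and `coverage` are hypothetical, and this brute-force scan is not the automaton-based approach the paper describes.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Return every offset at which the spaced seed hits the alignment.

    A hit at offset i means that each must-match ('1') position of the seed
    faces a match ('1') in the alignment; don't-care ('0') positions are ignored.
    """
    must = [j for j, c in enumerate(seed) if c == "1"]
    last_offset = len(alignment) - len(seed)
    return [
        i
        for i in range(last_offset + 1)
        if all(alignment[i + j] == "1" for j in must)
    ]


def coverage(seed: str, alignment: str) -> int:
    """Count alignment positions covered by a must-match position of some hit."""
    must = [j for j, c in enumerate(seed) if c == "1"]
    covered = set()
    for i in seed_hits(seed, alignment):
        covered.update(i + j for j in must)
    return len(covered)


if __name__ == "__main__":
    alignment = "1111011111"  # one mismatch at index 4
    for seed in ("1101", "111"):
        hits = seed_hits(seed, alignment)
        print(seed, "single-hit:", bool(hits),
              "multiple-hit count:", len(hits),
              "coverage:", coverage(seed, alignment))
```

Under these assumptions, the single-hit criterion only asks whether the hit list is non-empty, the multiple-hit criterion counts the hits, and coverage counts distinct covered positions, so overlapping hits are not counted twice.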
