TL;DR
The paper introduces OSS, a dynamic programming algorithm that optimizes seed selection in read mapping by minimizing seed frequency, resulting in faster and more sensitive mapping.
Contribution
It presents the first efficient algorithm to jointly optimize seed length and placement, reducing seed frequency significantly compared to existing methods.
Findings
OSS achieves a 3-fold reduction in average seed frequency relative to the best previous seed selection schemes.
OSS operates in polynomial time with respect to read length and seed count.
Compared with state-of-the-art seed selection schemes, OSS selects less frequent seeds by adjusting both seed length and seed placement.
Abstract
Motivation: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the potential of the mapper to select less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds. Results: We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-bp read in operations on average and in …
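The seed selection problem described in the abstract can be illustrated with a small sketch. This is not the OSS algorithm itself: it is a naive dynamic program over the same objective (pick x non-overlapping seeds with minimum total frequency), without any of the paper's optimizations. It assumes the seeds tile the read, which loses nothing because lengthening a seed can never increase its frequency. The names `count_occurrences`, `select_seeds`, and `min_len` are hypothetical, and a brute-force scan of a toy reference string stands in for a real seed frequency database.

```python
from functools import lru_cache


def count_occurrences(reference: str, seed: str) -> int:
    """Count (possibly overlapping) occurrences of seed in reference.

    Brute-force stand-in for a seed frequency database lookup.
    """
    count = 0
    start = reference.find(seed)
    while start != -1:
        count += 1
        start = reference.find(seed, start + 1)
    return count


def select_seeds(read: str, reference: str, x: int, min_len: int = 1):
    """Split read into x contiguous seeds with minimum total frequency.

    Returns (total_frequency, [(start, end), ...]) with half-open intervals.
    Naive DP: best[k][i] is the minimum total frequency of k seeds that
    tile the prefix read[:i].
    """
    L = len(read)
    if x < 1 or min_len < 1 or x * min_len > L:
        raise ValueError("read is too short for x seeds of length min_len")

    @lru_cache(maxsize=None)
    def freq(j: int, i: int) -> int:
        return count_occurrences(reference, read[j:i])

    INF = float("inf")
    best = [[INF] * (L + 1) for _ in range(x + 1)]
    back = [[-1] * (L + 1) for _ in range(x + 1)]
    best[0][0] = 0

    for k in range(1, x + 1):
        for i in range(k * min_len, L + 1):
            # j is where the k-th seed starts; the first k-1 seeds tile read[:j].
            for j in range((k - 1) * min_len, i - min_len + 1):
                if best[k - 1][j] == INF:
                    continue
                cand = best[k - 1][j] + freq(j, i)
                if cand < best[k][i]:
                    best[k][i] = cand
                    back[k][i] = j

    # Walk the back-pointers from the full read to recover seed boundaries.
    seeds = []
    i = L
    for k in range(x, 0, -1):
        j = back[k][i]
        seeds.append((j, i))
        i = j
    seeds.reverse()
    return best[x][L], seeds


if __name__ == "__main__":
    total, seeds = select_seeds("ACGTAAAA", "ACGTACGTAAAA", x=2, min_len=2)
    print(total, [("ACGTAAAA"[s:e], s, e) for s, e in seeds])
```

This sketch performs on the order of x × L² candidate evaluations, which is why it serves only as a baseline for understanding the objective rather than as a practical seed selector.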
