Multiseed Lossless Filtration
Gregory Kucherov (LIFL, INRIA Lille - Nord Europe), Laurent Noé (LIFL, INRIA Lille - Nord Europe), Mikhail A. Roytberg (IMPB)

TL;DR
This paper studies seed-based lossless filtration for approximate string matching using several spaced seeds simultaneously rather than a single seed. It gives algorithms for computing parameters of seed families, techniques for constructing efficient families, and a large-scale application to oligonucleotide selection for an EST sequence database.
Contribution
It presents algorithms for computing important parameters of seed families, studies their combinatorial properties, and describes techniques for constructing efficient families of multiple spaced seeds.
Findings
Algorithms for computing important parameters of seed families
Techniques for constructing efficient seed families
Large-scale application to oligonucleotide selection for an EST sequence database
Abstract
We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt and Kärkkäinen [1]. We present algorithms to compute several important parameters of seed families, study their combinatorial properties, and describe several techniques to construct efficient families. We also report a large-scale application of the proposed technique to the problem of oligonucleotide selection for an EST sequence database.
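To make the notion of lossless filtration concrete, here is a minimal Python sketch of a brute-force check that a family of spaced seeds detects every similarity of length m with at most k mismatches. It is an illustration only, not one of the algorithms from the paper, and it is exponential in k. The function names `seed_hits` and `is_lossless` are hypothetical, and the seed notation (`#` for a position that must match, `-` for a position that is ignored) is an assumption of this sketch.

```python
from itertools import combinations


def seed_hits(seed, similarity):
    """Return True if `seed` matches `similarity` at some offset.

    `seed` is a string over '#' (must match) and '-' (ignored).
    `similarity` is a list of booleans: True = match, False = mismatch.
    """
    required = [i for i, c in enumerate(seed) if c == "#"]
    for offset in range(len(similarity) - len(seed) + 1):
        if all(similarity[offset + i] for i in required):
            return True
    return False


def is_lossless(family, m, k):
    """Check by exhaustive enumeration that `family` is lossless for (m, k).

    Placing exactly k mismatches is enough: removing a mismatch can only
    turn a miss into a hit, so fewer mismatches are covered as well.
    """
    for mismatches in combinations(range(m), k):
        similarity = [True] * m
        for pos in mismatches:
            similarity[pos] = False
        if not any(seed_hits(seed, similarity) for seed in family):
            return False  # this similarity escapes every seed
    return True


if __name__ == "__main__":
    print(is_lossless(["###"], 11, 2))
    print(is_lossless(["####"], 11, 2))
```

For a contiguous seed the result follows from a pigeonhole argument: with m = 11 and k = 2, the 9 matching positions fall into at most 3 runs, so some run has length at least 3 and `###` always hits, while `####` misses the similarity made of three runs of 3 matches separated by single mismatches. A family passes the check as soon as any one of its seeds hits each similarity, which is what allows several seeds together to cover cases that a single seed would miss.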
