A unifying framework for seed sensitivity and its application to subset seeds

Gregory Kucherov (LIFL); Laurent Noé (LIFL); Mikhail Roytberg (LIFL)

arXiv:cs/0601116·cs.DS·January 19, 2010

TL;DR

This paper introduces a unified automaton-based framework for computing seed sensitivity. The framework supports the design of subset seeds that, in the reported experiments, give better similarity-search results than ordinary spaced seeds.

Contribution

The paper presents a general automaton-based method for computing seed sensitivity and introduces the concept of subset seeds, together with an efficient automaton construction for them.

Findings

01. Sensitive subset seeds outperform spaced seeds in similarity search
02. Efficient automaton construction enables practical seed design
03. The framework is adaptable to various seed definitions

Abstract

We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), each specified by a distinct finite automaton. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be designed efficiently using our approach, and that they can then be used in similarity search to produce better results than ordinary spaced seeds.
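
To make the idea of seed sensitivity concrete, the following is a minimal sketch, not the paper's construction. It assumes the simplest possible setting: alignments are binary strings (1 = match, 0 = mismatch) of a fixed length, columns are independent with match probability p (a Bernoulli model), and the seed is an ordinary spaced seed written with '#' for a required match and '-' for a don't-care position. The function names `seed_hits` and `sensitivity` are hypothetical. The "automaton" here is the naive one whose states are the last few alignment symbols; the efficient construction proposed in the paper is not reproduced.

```python
def seed_hits(seed, window):
    """True if every '#' position of the seed faces a match (1) in the window."""
    return all(w == 1 for s, w in zip(seed, window) if s == "#")


def sensitivity(seed, length, p):
    """Probability that `seed` hits a random alignment of `length` columns.

    Assumptions (illustrative, not taken from the paper): binary alignments,
    independent columns with match probability p, '#' = match required,
    '-' = don't care.
    """
    span = len(seed)
    # miss[state] = probability of reaching `state` with no hit so far.
    # A state is the tuple of the last (at most span - 1) alignment symbols.
    miss = {(): 1.0}
    for _ in range(length):
        nxt = {}
        for state, prob in miss.items():
            for symbol, q in ((1, p), (0, 1.0 - p)):
                window = state + (symbol,)
                if len(window) == span and seed_hits(seed, window):
                    continue  # absorbed by the accepting "hit" state
                keep = min(len(window), span - 1)
                new_state = window[len(window) - keep:]
                nxt[new_state] = nxt.get(new_state, 0.0) + prob * q
        miss = nxt
    # Sensitivity is the complement of the total "never hit" probability.
    return 1.0 - sum(miss.values())


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight.
    for seed in ("###", "##-#"):
        print(seed, sensitivity(seed, 20, 0.7))
```

In this sketch the three ingredients named in the abstract are fused into one loop: the target alignments are all binary strings of the given length, the distribution is the per-column match probability, and the seed model is the `seed_hits` test. Keeping them as separate automata, as the paper proposes, is what allows each one to be swapped independently.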
