Motif matching using gapped patterns
Emanuele Giaquinta, Kimmo Fredriksson, Szymon Grabowski, Alexandru I. Tomescu, Esko Ukkonen

TL;DR
This paper introduces new practical algorithms for matching sets of gapped patterns, with applications to finding transcription factor binding sites in DNA sequences.
Contribution
The paper presents novel dynamic programming and bit-parallel algorithms for gapped pattern matching that balance theoretical efficiency and practical performance.
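To give a sense of what bit-parallel matching of a gapped pattern looks like, here is a minimal sketch. It uses the classical Shift-And technique with don't-care positions, not the algorithms of the paper. It assumes a single pattern made of unit-length keywords separated by fixed-length gaps; the function name `shift_and_gapped` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def shift_and_gapped(text: str, chars: Sequence[str], gaps: Sequence[int]) -> List[int]:
    """Report end positions of a gapped pattern with unit-length keywords.

    chars[i] is the i-th single-character keyword and gaps[i] is the fixed
    number of arbitrary characters between chars[i] and chars[i + 1].
    Illustrative only: classical Shift-And with don't-care positions.
    """
    if not chars or len(gaps) != len(chars) - 1:
        raise ValueError("need k >= 1 keywords and exactly k - 1 gaps")

    # Expand the pattern into one slot per text position; None marks a gap slot.
    expanded: List[Optional[str]] = []
    for i, c in enumerate(chars):
        expanded.append(c)
        if i < len(gaps):
            expanded.extend([None] * gaps[i])

    m = len(expanded)
    wild = 0                     # bits of the slots that accept any character
    masks: Dict[str, int] = {}   # masks[c] has bit i set if slot i requires c
    for i, c in enumerate(expanded):
        if c is None:
            wild |= 1 << i
        else:
            masks[c] = masks.get(c, 0) | (1 << i)

    accept = 1 << (m - 1)
    state = 0
    ends: List[int] = []
    for t, ch in enumerate(text):
        # Extend every partial match by one slot and start a new one at bit 0.
        state = ((state << 1) | 1) & (masks.get(ch, 0) | wild)
        if state & accept:
            ends.append(t)
    return ends
```

For example, `shift_and_gapped("ACGTACGA", ["A", "G"], [1])` searches for an `A`, one arbitrary character, then a `G`. Python integers are unbounded, so the sketch ignores the machine word size that a real bit-parallel implementation would have to respect.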
Findings
The algorithms' time complexity is close to the best existing bound.
Experimental results show high practical speed.
The algorithms are especially effective when the keywords of the gapped patterns have unit length.
Abstract
We present new algorithms for the problem of multiple string matching of gapped patterns, where a gapped pattern is a sequence of strings such that there is a gap of fixed length between each two consecutive strings. The problem has applications in the discovery of transcription factor binding sites in DNA sequences when using generalized versions of the Position Weight Matrix model to describe transcription factor specificities. In these models a motif can be matched as a set of gapped patterns with unit-length keywords. The existing algorithms for matching a set of gapped patterns are worst-case efficient but not practical, or vice versa, in this particular case. The novel algorithms that we present are based on dynamic programming and bit-parallelism, and lie in a middle-ground among the existing algorithms. In fact, their time complexity is close to the best existing bound and, yet,…
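The definition of a gapped pattern in the abstract can be made concrete with a small sketch. The code below is a naive reference matcher, not one of the algorithms of the paper: it checks every start position of the text against every pattern in the set. The names `keyword_offsets`, `match_gapped`, and `match_set`, and the representation of a pattern as a (keywords, gaps) pair, are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: keywords S_1..S_k plus the k - 1 fixed gap
# lengths between consecutive keywords.
GappedPattern = Tuple[Sequence[str], Sequence[int]]


def keyword_offsets(keywords: Sequence[str], gaps: Sequence[int]) -> Tuple[List[int], int]:
    """Return each keyword's offset from the pattern start and the total span."""
    if len(gaps) != len(keywords) - 1:
        raise ValueError("need exactly one gap between each two consecutive keywords")
    offsets: List[int] = []
    pos = 0
    for i, kw in enumerate(keywords):
        offsets.append(pos)
        pos += len(kw)
        if i < len(gaps):
            pos += gaps[i]
    return offsets, pos


def match_gapped(text: str, keywords: Sequence[str], gaps: Sequence[int]) -> List[int]:
    """Return all start positions where the gapped pattern occurs in text."""
    offsets, span = keyword_offsets(keywords, gaps)
    hits: List[int] = []
    for start in range(len(text) - span + 1):
        # Every keyword must appear at its fixed offset; gap characters are free.
        if all(text.startswith(kw, start + off) for kw, off in zip(keywords, offsets)):
            hits.append(start)
    return hits


def match_set(text: str, patterns: Sequence[GappedPattern]) -> Dict[int, List[int]]:
    """Match a set of gapped patterns; keys are indices into patterns."""
    return {i: match_gapped(text, kws, gaps) for i, (kws, gaps) in enumerate(patterns)}
```

With the text `"ACGTACGA"`, the pattern with keywords `["A", "G"]` and gap `[1]` has unit-length keywords and matches wherever an `A` is followed two positions later by a `G`. This brute-force scan repeats work across patterns and positions, so it serves only as a baseline for understanding the problem statement.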
