Sublinear Time Motif Discovery from Multiple Sequences
Bin Fu, Yunhui Fu

TL;DR
This paper introduces three algorithms for motif discovery in multiple sequences that, under a probabilistic model, find an implanted motif with high probability in sublinear time, and reports improved detection performance over existing software.
Contribution
The paper presents novel sublinear-time algorithms for motif discovery that trade off running time against the tolerated mutation probability, together with a practical software implementation.
Findings
The algorithms find the implanted motif with high probability
Improved motif detection performance over existing software
Demonstrated efficiency and accuracy under the probabilistic model
Abstract
A natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_1 g_2 \cdots g_m$ is a string of $m$ characters. A probabilistically generated approximate copy of $G$ is implanted into each background sequence. In such a copy $b_1 b_2 \cdots b_m$ of $G$, every character $b_i$ is probabilistically generated such that the probability that $b_i \neq g_i$ is at most $\alpha$. We develop three algorithms that, under this probabilistic model, find the implanted motif with high probability via a tradeoff between computational time and the probability of mutation. The methods developed in this paper have been used in a software implementation. We observed…
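To make the planted-motif model concrete, here is a minimal Python sketch of how such test instances could be generated, plus a brute-force search for contrast. It assumes, beyond what the abstract states, that all sequences share one length `n`, that the approximate copy overwrites the background at a uniformly random offset, and that a mutated character is drawn uniformly from the other letters, so each position mutates with probability exactly `alpha` (which satisfies "at most"). The names `plant_motif`, `naive_motif_search`, `hamming`, and the parameters `k`, `n`, `alpha` are hypothetical. The search routine is an exhaustive baseline taking roughly k·n²·m character comparisons; it is not one of the paper's three sublinear algorithms.

```python
import random
from typing import List, Sequence, Tuple


def plant_motif(
    k: int,
    n: int,
    motif: str,
    alphabet: Sequence[str],
    alpha: float,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Sample k background sequences of length n, each carrying one approximate copy of motif.

    Hypothetical assumptions: equal lengths, the copy overwrites the background at a
    uniformly random offset, and a mutated character is uniform over the other letters.
    """
    m = len(motif)
    sequences, offsets = [], []
    for _ in range(k):
        # Background: independent uniform characters from the alphabet.
        seq = [rng.choice(alphabet) for _ in range(n)]
        # Approximate copy b_1..b_m of the motif g_1..g_m: each position mutates w.p. alpha.
        copy = [
            rng.choice([c for c in alphabet if c != g]) if rng.random() < alpha else g
            for g in motif
        ]
        start = rng.randrange(n - m + 1)
        seq[start:start + m] = copy
        sequences.append("".join(seq))
        offsets.append(start)
    return sequences, offsets


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def naive_motif_search(sequences: List[str], m: int) -> Tuple[str, float]:
    """Exhaustive baseline (not sublinear): try every m-mer of the first sequence and score
    it by the sum over all sequences of its smallest Hamming distance to any window."""
    best, best_score = "", float("inf")
    first = sequences[0]
    for i in range(len(first) - m + 1):
        cand = first[i:i + m]
        score = sum(
            min(hamming(cand, s[j:j + m]) for j in range(len(s) - m + 1))
            for s in sequences
        )
        if score < best_score:
            best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    motif = "ACGTTGCAAGTC"
    seqs, offs = plant_motif(k=8, n=200, motif=motif, alphabet="ACGT", alpha=0.1, rng=rng)
    found, score = naive_motif_search(seqs, len(motif))
    # Print the best candidate, its total score, and its distance to the true motif.
    print(found, score, hamming(found, motif))
```

Setting `alpha=0.0` plants exact copies, in which case the baseline is guaranteed to reach a total score of 0; raising `alpha` shows how mutations blur the signal that any motif finder must recover.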
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Chromatin Dynamics · DNA and Biological Computing
