Sublinear Time Motif Discovery from Multiple Sequences

Bin Fu; Yunhui Fu

arXiv:1007.2618·cs.DS·March 14, 2012

TL;DR

This paper introduces three algorithms for motif discovery in multiple sequences under a probabilistic model, achieving high accuracy with sublinear time complexity and demonstrating improved performance over existing software.

Contribution

The paper presents sublinear-time algorithms for motif discovery that trade running time against the tolerated mutation probability, together with a software implementation.

Findings

01

Algorithms successfully find motifs with high probability

02

Improved motif detection performance over existing software

03

Demonstrated efficiency and accuracy in probabilistic model

Abstract

A natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $Σ$. A motif $G = g_{1} g_{2} \ldots g_{m}$ is a string of $m$ characters. Each background sequence is implanted with a probabilistically generated approximate copy of $G$. For a probabilistically generated approximate copy $b_{1} b_{2} \ldots b_{m}$ of $G$, every character $b_{i}$ is probabilistically generated such that the probability that $b_{i} \neq g_{i}$ is at most $α$. We develop three algorithms that, under this probabilistic model, can find the implanted motif with high probability via a tradeoff between computational time and the probability of mutation. The methods developed in this paper have been used in the software implementation. We observed…
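
The probabilistic model described in the abstract can be illustrated with a small test-instance generator. This is only a sketch under stated assumptions, not the authors' code: the function name `generate_instance`, the sequence length `n`, the default DNA alphabet, the uniform choice of background characters and implant position, and the use of a mutation probability of exactly $α$ (the model only requires at most $α$) are all choices made here for illustration.

```python
import random


def generate_instance(k, n, motif, alpha, alphabet="ACGT", seed=None):
    """Generate k sequences of length n, each with an implanted noisy motif copy.

    Assumptions (not from the paper): background characters and the implant
    position are uniform, and each motif character mutates with probability
    exactly alpha to a different character chosen uniformly at random.
    """
    rng = random.Random(seed)
    m = len(motif)
    if m > n:
        raise ValueError("motif must not be longer than the sequence")

    sequences, positions = [], []
    for _ in range(k):
        # Background: every character is drawn at random from the alphabet.
        background = [rng.choice(alphabet) for _ in range(n)]

        # Approximate copy b_1 ... b_m of the motif g_1 ... g_m.
        copy = []
        for g in motif:
            if rng.random() < alpha:
                # Mutate: pick any alphabet character different from g.
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)

        # Implant the copy at a random position in the background.
        start = rng.randrange(n - m + 1)
        background[start:start + m] = copy

        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions
```

Calling `generate_instance(k=20, n=500, motif="ACGTTGCA", alpha=0.1, seed=0)` returns the sequences together with the implant positions, which can serve as ground truth when checking whether a motif finder recovers the implanted motif.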

Taxonomy

Topics: Algorithms and Data Compression · Genomics and Chromatin Dynamics · DNA and Biological Computing