Map-Reduce Parallelization of Motif Discovery
Umang Vipul

TL;DR
This paper proposes a Map-Reduce-based parallelization approach for motif discovery in DNA sequences using HOMER. The goal is better scalability, but the discovered motifs suffer a significant loss of quality.
Contribution
It introduces a novel parallelization methodology for HOMER using sub-sampling and Map-Reduce, enabling potential scalability improvements in motif discovery.
Findings
Marginal speed gains achieved with parallelization.
Significant quality loss in discovered motifs.
Method demonstrates potential for scalable motif discovery.
Abstract
Motif discovery is one of the most challenging problems in bioinformatics today. DNA sequence motifs are becoming increasingly important in the analysis of gene regulation. Motifs are short, recurring patterns in DNA that have a biological function. For example, they indicate binding sites for Transcription Factors (TFs) and nucleases. A number of motif discovery algorithms run sequentially, and their sequential nature prevents them from being parallelized. HOMER is one such motif discovery tool. To overcome this limitation, we propose a new methodology for motif discovery, using HOMER, that parallelizes the task. A parallelized version can potentially yield better scalability and performance. To achieve this, we use sub-sampling and the Map-Reduce model. At each Map node, a sub-sampled version of the…
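The sub-sampling plus Map-Reduce idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract is truncated before it describes the Reduce step, so the merge-by-summing rule below is hypothetical, and the k-mer counter is a toy stand-in for a motif finder, not HOMER itself. The function names (subsample, map_count_kmers, reduce_counts, discover) and all parameters are invented here.

```python
import random
from collections import Counter
from functools import reduce


def subsample(sequences, fraction, seed):
    """Draw a random subset of the sequences, without replacement."""
    rng = random.Random(seed)
    size = max(1, int(len(sequences) * fraction))
    return rng.sample(sequences, size)


def map_count_kmers(sequences, k):
    """Map step: count every k-mer in one sub-sample.

    Toy stand-in for a real motif finder; this is NOT what HOMER does.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step (assumed): merge per-node results by summing counts."""
    return left + right


def discover(sequences, k=4, n_nodes=3, fraction=0.5):
    """Run one map task per simulated node, then fold the partial results."""
    partials = [
        map_count_kmers(subsample(sequences, fraction, seed), k)
        for seed in range(n_nodes)  # each seed plays the role of one Map node
    ]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    seqs = ["TTACGTTT", "GGACGTGG", "CCACGTCC"]
    merged = discover(seqs, k=4, n_nodes=3, fraction=1.0)
    print(merged.most_common(1))
```

Because each simulated node only sees a fraction of the input, a pattern that is rare in the full data set may be missed or under-counted in some sub-samples, which is one plausible way motif quality could degrade under this kind of scheme.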
Taxonomy
Topics: Genomics and Chromatin Dynamics · Genomics and Phylogenetic Studies · Algorithms and Data Compression
