MADMX: A Novel Strategy for Maximal Dense Motif Extraction
Roberto Grossi, Andrea Pietracaprina, Nadia Pisanti, Geppino Pucci, Eli Upfal, Fabio Vandin

TL;DR
MADMX is a new tool for extracting maximal dense motifs from biological sequences. It uses a novel density measure and a fusion operation to improve both efficiency and motif quality.
Contribution
It introduces density, a measure that bounds the number of don't care characters in a motif, and a fusion operation that builds maximal dense motifs bottom-up without generating nonmaximal ones.
Findings
By extracting only maximal dense motifs, MADMX reduces output size and improves performance.
MADMX enhances the quality of motif discoveries.
Experimental results confirm efficiency and effectiveness.
Abstract
We develop, analyze and experiment with a new tool, called MADMX, which extracts frequent motifs, possibly including don't care characters, from biological sequences. We introduce density, a simple and flexible measure for bounding the number of don't cares in a motif, defined as the ratio of solid (i.e., different from don't care) characters to the total length of the motif. By extracting only maximal dense motifs, MADMX reduces the output size and improves performance, while enhancing the quality of the discoveries. The efficiency of our approach relies on a newly defined combining operation, dubbed fusion, which allows for the construction of maximal dense motifs in a bottom-up fashion, while avoiding the generation of nonmaximal ones. We provide experimental evidence of the efficiency and the quality of the motifs returned by MADMX.
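The density measure defined in the abstract is simple enough to state directly in code. The Python sketch below computes a motif's density and counts the occurrences of a motif with don't cares in a single sequence, which is the kind of frequency test a motif extractor filters on. It is a sketch under stated assumptions, not the MADMX implementation: the don't care symbol '.', the threshold name rho, the quorum parameter, and all function names are hypothetical. The whole-motif threshold test is also an assumption (the paper may impose a stricter density condition), and the sketch does not implement maximality or the fusion operation.

```python
DONT_CARE = "."  # hypothetical symbol for a don't care character


def density(motif: str) -> float:
    """Ratio of solid (non-don't-care) characters to the motif's total length."""
    if not motif:
        raise ValueError("motif must be non-empty")
    solid = sum(1 for c in motif if c != DONT_CARE)
    return solid / len(motif)


def is_dense(motif: str, rho: float) -> bool:
    """Assumed whole-motif test: the motif is dense if density(motif) >= rho."""
    return density(motif) >= rho


def occurrences(motif: str, sequence: str) -> list[int]:
    """Start positions where the motif matches; a don't care matches any character."""
    m = len(motif)
    hits = []
    for i in range(len(sequence) - m + 1):
        window = sequence[i:i + m]
        if all(p == DONT_CARE or p == s for p, s in zip(motif, window)):
            hits.append(i)
    return hits


def is_frequent(motif: str, sequence: str, quorum: int) -> bool:
    """Hypothetical frequency test: at least `quorum` occurrences in the sequence."""
    return len(occurrences(motif, sequence)) >= quorum


if __name__ == "__main__":
    print(density("AC.GT"))                    # 4 solid characters out of 5
    print(occurrences("A.G", "AAGTACGAGG"))    # positions 0, 4 and 7
    print(is_dense("A..G", 0.6))               # density 0.5 is below 0.6
```

For example, "AC.GT" has four solid characters out of five, giving density 0.8, while "A..G" has density 0.5 and fails a threshold of 0.6. The brute-force scan is quadratic in the worst case and only illustrates the definitions; an extractor at genome scale would rely on indexing structures instead.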
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Genomics and Chromatin Dynamics
