Spectral Sequence Motif Discovery
Nicolò Colombo, Nikos Vlassis

TL;DR
This paper introduces a fast spectral algorithm for sequence motif discovery, based on the Method of Moments, that processes large-scale biological datasets efficiently and accurately.
Contribution
It presents a novel spectral decomposition-based motif discovery method that is robust, computationally efficient, and suitable for large high-throughput sequencing data.
Findings
Processes hundreds of thousands of sequences in minutes
Matches motif profiles of state-of-the-art algorithms
Robust under model misspecification
Abstract
Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, motif finding algorithms of increasingly high performance are required to process the big datasets produced by new high-throughput sequencing technologies. Most existing algorithms are computationally demanding and often cannot support the large size of new experimental data. We present a new motif discovery algorithm that is built on a recent machine learning technique, referred to as Method of Moments. Based on spectral decompositions, this method is robust under model misspecification and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. In a few minutes, we can process datasets of hundreds of thousands of sequences and extract motif profiles that…
Taxonomy
Topics: Genomics and Chromatin Dynamics · Genomics and Phylogenetic Studies · Gene expression and cancer classification
