TetRex: a novel algorithm for index-accelerated search of highly conserved motifs
Remy M Schwab, Simon Gene Gottlieb, Knut Reinert

TL;DR
TetRex is a new algorithm that improves the speed of searching for conserved biological sequences using a novel data structure.
Contribution
The novel contribution is a regex search algorithm that leverages hierarchical interleaved bloom filters to reduce search runtime.
Findings
TetRex reduces the search space for regex queries in biological datasets.
The algorithm outperforms state-of-the-art tools when combined with traditional search methods.
Abstract
The scale of modern datasets has necessitated innovations to solve even the most traditional and fundamental of computational problems. Set membership and set cardinality are both examples of simple queries that, for large enough datasets, quickly become challenging using traditional approaches. Interestingly, we find a need for these innovations within the field of biology. Despite the vast differences among living organisms, there exist functions so critical to life that they are conserved in the genomes and proteomes across seemingly unrelated species. Regular expressions (regexes) can serve as a convenient way to represent these conserved sequences compactly. However, despite the strong theoretical foundation and maturity of tools available, the state-of-the-art regex search falls short of what is necessary to quickly scan an enormous database of biological sequences. In this work,…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
