Genoogle: an indexed and parallelized search engine for similar DNA sequences
Felipe Albrecht

TL;DR
Genoogle is a parallelized, indexed search engine that significantly accelerates DNA sequence searches, outperforming traditional tools like BLAST in speed while maintaining comparable result quality.
Contribution
This work introduces a novel combination of data indexing and parallel processing techniques for efficient DNA sequence search, leveraging multi-core processors.
Findings
Searches ran 20 times faster than with parallelized NCBI BLAST.
Parallelization yielded a larger speedup than expected.
Search results maintained high quality compared to existing tools.
Abstract
Searching for similar genetic sequences is one of the main tasks in bioinformatics. Genetic sequence databases are growing exponentially, and search techniques that run in linear time can no longer complete a search in the required time. Another problem is that the clock speed of modern processors is no longer growing as it once did; instead, processing capacity grows through the addition of more processing cores, and techniques that do not use parallel computing gain nothing from these extra cores. This work aims to combine data indexing techniques, which reduce the computational cost of the search, with parallelization of the search techniques, which exploits the computational capacity of multi-core processors. To verify the viability of using these two techniques simultaneously, a software that uses parallelization techniques with inverted…
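To make the combination of indexing and parallelism described in the abstract concrete, here is a minimal Python sketch. It is not Genoogle's implementation: the subsequence length `K`, the function names (`build_index`, `count_hits`, `build_shards`, `parallel_search`), and the shard-per-worker layout are all assumptions made for illustration. It builds an inverted index from fixed-length subsequences to their positions in the database, then queries several index shards concurrently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

K = 4  # hypothetical subsequence length, chosen only for this illustration


def build_index(sequences, k=K):
    """Inverted index: each k-length subsequence -> list of (sequence id, offset)."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def count_hits(index, query, k=K):
    """Count seed matches per sequence: one for every indexed position
    whose k-mer equals some k-mer of the query."""
    hits = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for seq_id, _ in index.get(query[pos:pos + k], ()):
            hits[seq_id] += 1
    return dict(hits)


def build_shards(sequences, n_shards, k=K):
    """Split the database into contiguous shards and index each one once.
    Each shard is (id of its first sequence, index over the shard)."""
    shard_size = max(1, -(-len(sequences) // n_shards))  # ceiling division
    return [
        (start, build_index(sequences[start:start + shard_size], k))
        for start in range(0, len(sequences), shard_size)
    ]


def parallel_search(shards, query, k=K):
    """Query every shard concurrently and merge the per-shard hit counts."""
    def search_shard(shard):
        offset, index = shard
        # Translate shard-local sequence ids back to database-wide ids.
        return {offset + sid: n for sid, n in count_hits(index, query, k).items()}

    merged = {}
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        for partial in pool.map(search_shard, shards):
            merged.update(partial)
    return merged
```

For example, `parallel_search(build_shards(["ACGTACGT", "TTTTGGGG", "GGACGTAA"], 2), "ACGTA")` looks up only the query's own subsequences in each shard rather than scanning every database sequence. A real tool would go on to extend and score these seed matches, which this sketch omits. In CPython, threads illustrate the structure but do not give true multi-core speedup for pure-Python work; a process pool or a language with native threads would be needed for that.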
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
