iSeqSearch: incremental protein search for iBlast/iMMSeqs2/iDiamond
Hyunwoo Yoo, Mohammadsaleh Refahi, Robi Polikar, Bahrad A. Sokhansanj, James R. Brown, Gail L. Rosen

TL;DR
iSeqSearch improves protein sequence searches by efficiently reusing prior results, reducing resource use while maintaining accuracy.
Contribution
iSeqSearch generalizes incremental search to support MMseqs2 and Diamond, extending and improving upon iBlast.
Findings
iMMseqs2 and iDiamond show high concordance (over 0.9) with their non-incremental counterparts.
In some cases, the incremental approach returns more hits than the conventional, non-incremental methods.
iSeqSearch efficiently reuses prior data, reducing resource waste in growing genomic and proteomic databases.
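This summary does not say how concordance is computed. As an illustration only, the sketch below assumes one plausible definition: the Jaccard overlap between the set of (query, target) pairs reported by an incremental run and the set reported by the matching non-incremental run. The function name `concordance` and the pair representation are hypothetical and may differ from the metric used in the paper.

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]  # (query_id, target_id); an assumed representation


def concordance(incremental: Iterable[Pair], conventional: Iterable[Pair]) -> float:
    """Jaccard overlap of two hit sets (assumed metric, not taken from the paper)."""
    a, b = set(incremental), set(conventional)
    if not a and not b:
        # Two empty result sets agree trivially.
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    inc = [("q1", "t1"), ("q1", "t2"), ("q2", "t9")]
    conv = [("q1", "t1"), ("q2", "t9")]
    print(concordance(inc, conv))
```

Under this assumed definition, a value above 0.9 would mean that more than 90% of all reported pairs appear in both runs, and extra hits found only by the incremental run would lower the score slightly.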
Abstract
The advancement of sequencing technology has led to a rapid increase in the amount of DNA and protein sequence data; consequently, the size of genomic and proteomic databases is constantly growing. As a result, database searches need to be continually updated to account for the new data being added. However, continually re-searching the entire existing dataset wastes resources. Incremental database search can address this problem. One recently introduced incremental search method is iBlast, which wraps the BLAST sequence search method with an algorithm to reuse previously processed data and thereby increase search efficiency. The iBlast wrapper, however, must be generalized to support better-performing DNA/protein sequence search methods that have been developed, namely MMseqs2 and Diamond. To address this need, we propose iSeqSearch, which extends iBlast by incorporating support for…
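The reuse idea described in the abstract can be sketched as follows: search only the newly added sequences, then merge those hits with the stored hits from the earlier search. This is a minimal sketch, not the iSeqSearch or iBlast implementation. It assumes BLAST-style statistics in which an E-value scales linearly with database size, so stored E-values are rescaled to the combined size before merging. The names `Hit`, `rescale`, and `incremental_merge`, the size unit (total residues), and the `max_evalue` cutoff are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Hit:
    query: str
    target: str
    bit_score: float
    evalue: float  # relative to the database size used when the hit was found


def rescale(hits: List[Hit], old_size: int, new_size: int) -> List[Hit]:
    """Rescale E-values to a new database size (assumes E is proportional to size)."""
    factor = new_size / old_size
    return [replace(h, evalue=h.evalue * factor) for h in hits]


def incremental_merge(prev_hits: List[Hit], prev_size: int,
                      delta_hits: List[Hit], delta_size: int,
                      max_evalue: float = 1e-3) -> List[Hit]:
    """Merge stored hits with hits from searching only the new sequences."""
    total = prev_size + delta_size
    merged = rescale(prev_hits, prev_size, total) + rescale(delta_hits, delta_size, total)
    # Re-apply the cutoff, since rescaling can push a stored hit over it.
    kept = [h for h in merged if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: (h.query, h.evalue))


if __name__ == "__main__":
    old = [Hit("q1", "t1", 50.0, 1e-5)]   # hits saved from the earlier search
    new = [Hit("q1", "t2", 60.0, 1e-6)]   # hits from searching only the added data
    for hit in incremental_merge(old, 1000, new, 1000):
        print(hit)
```

The point of the design is that the expensive alignment step runs only on the added sequences, while the stored hits need only a cheap arithmetic correction. Whether the same correction is appropriate for MMseqs2 or Diamond output is not stated in this summary.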
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Glycosylation and Glycoproteins Research
