TL;DR
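This paper introduces IBB, an external algorithm for constructing the BWT of DNA datasets with highly diverse sequence lengths. It is 10% to 40% faster than the best existing algorithms on most datasets while keeping memory consumption competitive, which benefits bioinformatics applications such as de novo assembly and read alignment.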
Contribution
The paper presents IBB, a novel external algorithm that efficiently constructs the BWT for length-diverse DNA datasets using a right-aligned approach and tree-based data structures.
Findings
IBB is 10% to 40% faster than the best existing BWT construction algorithms on most datasets.
IBB maintains competitive memory consumption.
IBB effectively handles highly diverse sequence lengths.
Abstract
The Burrows-Wheeler transform (BWT) is integral to the FM-index, which is used extensively in text compression, indexing, pattern search, and bioinformatics problems such as de novo assembly and read alignment. Thus, efficient construction of the BWT in terms of time and memory usage is key to these applications. We present a novel external algorithm called Improved-Bucket Burrows-Wheeler transform (IBB) for constructing the BWT of DNA datasets with highly diverse sequence lengths. IBB uses a right-aligned approach to efficiently handle sequences of varying lengths, a tree-based data structure to manage relative insert positions and ranks, and fine buckets to reduce the necessary amount of input and output to external memory. Our experiments demonstrate that IBB is 10% to 40% faster than the best existing state-of-the-art BWT construction algorithms on most datasets while maintaining…
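For readers unfamiliar with the transform itself, the following is a minimal in-memory sketch of what a BWT computes for a single DNA string. It is not IBB and does not reflect the paper's external-memory, right-aligned, or bucket-based design; it uses the textbook sorted-rotations construction, which is far too slow and memory-hungry for real datasets. The function names `bwt` and `inverse_bwt` and the `$` sentinel are illustrative assumptions, not identifiers from the paper.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text+sentinel, take the last column.

    Illustrative only (roughly O(n^2 log n) time, O(n^2) memory).
    Assumes the sentinel sorts before every symbol in the text,
    which holds for "$" against the DNA alphabet A, C, G, T.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Naive inverse: repeatedly prepend the BWT column and re-sort."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    read = "GATTACA"
    encoded = bwt(read)
    print(encoded)
    print(inverse_bwt(encoded) == read)
```

The round trip shows that the transform is reversible, which is what lets the FM-index support pattern search on top of the BWT. Algorithms such as IBB target the same output for large collections of sequences without ever materialising the rotation table.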
