Nucleotide String Indexing using Range Matching
Alon Rashelbach, Ori Rottensterich, Mark Silberstein

TL;DR
Ranger is a novel nucleotide sequence indexing method that uses range matching with neural networks, offering significant memory savings and speed improvements over traditional methods like FM-indices and hash-tables, with potential for hardware acceleration.
Contribution
We introduce Ranger, a memory-efficient, neural network-based range matching index for nucleotide sequences that matches or exceeds the performance of existing methods.
Findings
Ranger reduces memory usage by up to 1.7x for short reads.
Ranger achieves up to 4.3x speedup with limited memory.
Ranger enables faster alignment with hardware acceleration, reducing memory footprint significantly.
Abstract
The two most common data-structures for genome indexing, FM-indices and hash-tables, exhibit a fundamental trade-off between memory footprint and performance. We present Ranger, a new indexing technique for nucleotide sequences that is both memory efficient and fast. We observe that nucleotide sequences can be represented as integer ranges and leverage a range-matching algorithm based on neural networks to perform the lookup. We prototype Ranger in software and integrate it into the popular Minimap2 tool. Ranger achieves almost identical end-to-end performance as the original Minimap2, while occupying 1.7x and 1.2x less memory for short- and long-reads, respectively. With a limited memory capacity, Ranger achieves up to 4.3x speedup for short reads compared to FM-Index, and up to 4.2x and 1.8x speedups for short- and long-reads, compared to…
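The observation that nucleotide sequences can be represented as integer ranges can be illustrated with a small sketch. It assumes a 2-bit encoding (A=0, C=1, G=2, T=3) and a fixed string length k, neither of which is specified in the abstract, and it uses a plain binary search in place of the neural network range matching that Ranger relies on. All function names are hypothetical and not taken from the paper or from Minimap2.

```python
from bisect import bisect_right

# Assumed 2-bit encoding; the abstract does not state the one Ranger uses.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq):
    """Pack a nucleotide string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def prefix_range(prefix, k):
    """Inclusive integer range of all length-k strings that start with prefix."""
    free_bits = 2 * (k - len(prefix))
    low = encode(prefix) << free_bits
    high = low | ((1 << free_bits) - 1)
    return low, high


def build_index(prefixes, k):
    """Sorted (low, high, prefix) triples; prefixes must not overlap."""
    return sorted(prefix_range(p, k) + (p,) for p in prefixes)


def lookup(index, kmer):
    """Return the prefix whose range contains kmer, or None.

    Binary search stands in for the learned range matcher here.
    """
    key = encode(kmer)
    starts = [low for low, _, _ in index]
    i = bisect_right(starts, key) - 1
    if i >= 0 and index[i][0] <= key <= index[i][1]:
        return index[i][2]
    return None
```

In this sketch, every string sharing a prefix falls into one contiguous interval, so a lookup reduces to finding the interval that contains the encoded query. Any range-matching method could replace the binary search without changing the encoding.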
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
