TL;DR
MetaCache-GPU is a GPU-accelerated tool that enables ultra-fast construction of large genomic reference databases for metagenomic classification, significantly reducing index building time and facilitating real-time analysis pipelines.
Contribution
It introduces a novel hash table variant with minhash fingerprinting and warp-aggregated insertion, optimized for CUDA accelerators, enabling rapid index construction for metagenomics.
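To make the minhash fingerprinting idea concrete, here is a minimal CPU-side sketch in Python. It is not the tool's CUDA code, and it does not show warp-aggregated insertion, which is specific to GPU execution. The function names, the BLAKE2 hash, and the values of `k`, `s`, and `window_len` are all illustrative assumptions. Each reference window is reduced to its `s` smallest k-mer hashes, and those hashes serve as keys in a table that maps features to reference locations.

```python
import hashlib

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def hash64(kmer):
    """Stable 64-bit hash of a k-mer (illustrative choice, not the tool's hash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_sketch(window, k=4, s=3):
    """Return the s smallest distinct k-mer hashes in the window, sorted."""
    return sorted({hash64(km) for km in kmers(window, k)})[:s]

def build_index(references, k=4, s=3, window_len=8):
    """Map each sketch feature to a list of (target_id, window_id) locations."""
    index = {}
    stride = window_len - k + 1  # consecutive windows overlap by k-1 bases
    for target_id, seq in references.items():
        starts = range(0, len(seq) - k + 1, stride)
        for window_id, start in enumerate(starts):
            window = seq[start:start + window_len]
            for feature in minhash_sketch(window, k, s):
                index.setdefault(feature, []).append((target_id, window_id))
    return index

# Hypothetical toy reference; real databases hold many complete genomes.
index = build_index({"t0": "ACGTACGTAC"})
```

Storing only `s` hashes per window keeps the table much smaller than a full k-mer index, at the cost of some sensitivity. A Python dictionary stands in here for the GPU hash table described above.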
Findings
Builds large reference databases in seconds
Outperforms CPU tools like Kraken2 in index construction time
Enables on-demand large-scale reference genome analysis
Abstract
The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their k-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growing amount of available reference genomes establishes the need for index construction and querying at interactive speeds. In this paper, we introduce MetaCache-GPU -- an ultra-fast metagenomic short…
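The general idea of labeling a read from its k-mer histogram can be sketched in a few lines of Python. This illustrates the concept only and is not MetaCache-GPU's method: the function names, the exact-match k-mer index, the `min_hits` threshold, and the toy sequences are all assumptions, and taxonomy handling is omitted.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def build_kmer_index(references, k):
    """Map each k-mer to the set of taxon labels whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index

def classify(read, index, k, min_hits=1):
    """Label a read with the taxon receiving the most k-mer hits, or None."""
    hits = Counter()  # per-taxon histogram of matching k-mers
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            hits[taxon] += 1
    if not hits:
        return None
    taxon, count = hits.most_common(1)[0]
    return taxon if count >= min_hits else None

# Hypothetical toy references keyed by taxon label.
references = {"taxA": "ACGTACGTTGCA", "taxB": "GGGGCCCCAAAATTTT"}
index = build_kmer_index(references, k=4)
label = classify("CGTACGTT", index, k=4)
```

Building `index` is the step that the abstract describes as costly on CPUs. Each query is then a series of table lookups followed by a count.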
