MetaCache-GPU: Ultra-Fast Metagenomic Classification

Robin Kobus (1), André Müller (1), Daniel Jünger (1), Christian Hundt (2), Bertil Schmidt (1) ((1) Johannes Gutenberg University Mainz, Germany; (2) NVIDIA AI Technology Center Luxembourg)

arXiv:2106.08150·q-bio.GN·June 16, 2021

TL;DR

MetaCache-GPU is a GPU-accelerated tool that enables ultra-fast construction of large genomic reference databases for metagenomic classification, significantly reducing index building time and facilitating real-time analysis pipelines.

Contribution

It introduces a novel hash table variant with minhash fingerprinting and warp-aggregated insertion, optimized for CUDA accelerators, enabling rapid index construction for metagenomics.
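To make the idea of minhash fingerprinting concrete, here is a minimal CPU-side sketch in Python. It is not the paper's CUDA implementation: the function names, the BLAKE2 stand-in hash, the default parameters `k`, `s`, and `window_len`, and the plain dictionary used as the hash table are all assumptions chosen for illustration. Reverse-complement handling and the GPU-specific warp-aggregated insertion are not modeled.

```python
import hashlib


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(window, k, s):
    """Return the s smallest distinct k-mer hashes of a window, ascending."""
    return sorted({hash_kmer(km) for km in kmers(window, k)})[:s]


def build_index(references, k=4, s=3, window_len=12):
    """Map each sketch feature to the (reference, window) locations it came from."""
    index = {}
    stride = window_len - k + 1  # consecutive windows overlap by k - 1 bases
    for ref_id, seq in references.items():
        for win_id, start in enumerate(range(0, len(seq), stride)):
            window = seq[start:start + window_len]
            for feature in minhash_sketch(window, k, s):
                index.setdefault(feature, []).append((ref_id, win_id))
    return index


# Hypothetical toy references, far shorter than real genomes.
references = {
    "refA": "ACGTACGTTAGCCGATAGGCTA",
    "refB": "TTTTGGGGCCCCAAAATTTTGG",
}
index = build_index(references)
```

Storing only `s` features per window instead of every k-mer is what keeps such an index compact; the values chosen here are toy-sized and are not taken from the paper.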

Findings

01. Builds large reference databases in seconds

02. Outperforms CPU tools like Kraken2 in index construction time

03. Enables on-demand large-scale reference genome analysis

Abstract

The cost of DNA sequencing has dropped exponentially over the past decade, making genomic data accessible to a growing number of scientists. In bioinformatics, localization of short DNA sequences (reads) within large genomic sequences is commonly facilitated by constructing index data structures which allow for efficient querying of substrings. Recent metagenomic classification pipelines annotate reads with taxonomic labels by analyzing their $k$-mer histograms with respect to a reference genome database. CPU-based index construction is often performed in a preprocessing phase due to the relatively high cost of building irregular data structures such as hash maps. However, the rapidly growing amount of available reference genomes establishes the need for index construction and querying at interactive speeds. In this paper, we introduce MetaCache-GPU -- an ultra-fast metagenomic short…
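The phrase "analyzing their k-mer histograms with respect to a reference genome database" can be illustrated with a small Python sketch. It is a deliberately simplified stand-in, not the pipeline described in the paper: it indexes every exact k-mer, and the names `build_kmer_index` and `classify`, the toy sequences, and the "most hits wins" rule are assumptions made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify(read, index, k):
    """Count k-mer hits per taxon and return (best taxon or None, histogram)."""
    histogram = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            histogram[taxon] += 1
    if not histogram:
        return None, histogram  # no k-mer of the read occurs in any reference
    return histogram.most_common(1)[0][0], histogram


# Hypothetical toy database with two taxa.
references = {
    "taxon_a": "ACGTACGTTAGCCGATAGGCTA",
    "taxon_b": "TTTTGGGGCCCCAAAATTTTGG",
}
kmer_index = build_kmer_index(references, k=5)
label, hits = classify("GTTAGCCGAT", kmer_index, k=5)
```

Here the read is a substring of the `taxon_a` reference, so all of its 5-mers hit that taxon. Real classifiers additionally resolve ties and ambiguous hits using the taxonomy, which this sketch leaves out.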

Code & Models

Repositories

muellan/metacache (Official)
