RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes
Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, Onur Mutlu

TL;DR
RawHash is a novel method that enables fast, accurate, and scalable real-time analysis of nanopore raw signals for large genomes using hash-based similarity search, improving throughput and accuracy over existing tools.
Contribution
RawHash introduces a hash-based similarity search approach that ensures consistent hashing of raw signals despite variations, enabling real-time analysis of large genomes.
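To make the idea of variation-tolerant hashing concrete, here is a minimal sketch in Python. It is not RawHash's actual implementation: it assumes the raw signal has already been reduced to normalized per-event values, and it uses coarse quantization as one plausible way to make slightly different values share a hash key. The function names, bucket range, bucket count, and window size `k` are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, ...]


def quantize(events: Sequence[float], lo: float = -3.0, hi: float = 3.0,
             n_buckets: int = 16) -> List[int]:
    """Map each normalized event value to a coarse integer bucket.

    Values that differ only slightly usually fall into the same bucket,
    which is what lets noisy signals produce identical keys.
    (Range and bucket count are illustrative assumptions.)
    """
    width = (hi - lo) / n_buckets
    buckets = []
    for value in events:
        clipped = min(max(value, lo), hi)
        # The upper edge would map to n_buckets, so clamp to the last bucket.
        buckets.append(min(int((clipped - lo) / width), n_buckets - 1))
    return buckets


def window_keys(buckets: Sequence[int], k: int = 4) -> List[Key]:
    """Form one hashable key from every run of k consecutive buckets."""
    return [tuple(buckets[i:i + k]) for i in range(len(buckets) - k + 1)]


def build_index(reference_events: Sequence[float], k: int = 4) -> Dict[Key, List[int]]:
    """Hash table from key to the reference positions where it occurs."""
    index: Dict[Key, List[int]] = defaultdict(list)
    for pos, key in enumerate(window_keys(quantize(reference_events), k)):
        index[key].append(pos)
    return index


def lookup(index: Dict[Key, List[int]], read_events: Sequence[float],
           k: int = 4) -> List[Tuple[int, int]]:
    """Return (read_position, reference_position) pairs with matching keys."""
    hits = []
    for read_pos, key in enumerate(window_keys(quantize(read_events), k)):
        for ref_pos in index.get(key, []):
            hits.append((read_pos, ref_pos))
    return hits


# Toy data: the read is a noisy copy of reference events 2..6.
reference = [-2.0, -1.1, 0.3, 1.2, 2.1, -0.4, 0.9, -1.7]
noisy_read = [0.35, 1.15, 2.2, -0.45, 0.95]
print(lookup(build_index(reference), noisy_read))
```

In this toy setup the noisy read still matches the reference because each perturbed value stays in the same bucket as its original. A value close to a bucket boundary could land in a neighboring bucket and break the match, so a practical design has to choose the bucket width against the expected noise level.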
Findings
RawHash achieves 25.8x higher throughput than UNCALLED.
RawHash provides significantly better accuracy for large genomes.
RawHash successfully performs real-time read mapping, abundance estimation, and contamination analysis.
Abstract
Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either 1) require powerful computational resources that may not be available for portable sequencers or 2) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
