Achievable Rates of Nanopore-based DNA Storage
Brendon McBain, Emanuele Viterbo

TL;DR
This paper analyzes achievable data rates for nanopore-based DNA storage under a tractable channel model that does not rely on a basecalling algorithm, and reports rates of up to 1.18 bits per base on real sequencing data.
Contribution
The paper derives simplified message passing algorithms for soft decoding with the NNC-Scrappie channel model, and a rate estimation method based on dynamic time-warping (DTW) that can be applied to genomic nanopore sequencing datasets.
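For readers unfamiliar with dynamic time-warping, the following is a minimal sketch of the textbook DTW recursion, which aligns a sampled signal against a shorter sequence of reference levels. It is not the paper's rate derivation: the function name `dtw_cost`, the squared-error local cost, and the toy inputs are assumptions chosen for illustration.

```python
def dtw_cost(signal, reference):
    """Textbook DTW: minimum cumulative squared error over monotone alignments.

    Illustrative only; the local cost and step pattern are assumptions,
    not taken from the paper.
    """
    n, m = len(signal), len(reference)
    inf = float("inf")
    # cost[i][j] = best cost of aligning signal[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (signal[i - 1] - reference[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # signal sample repeats the same reference level
                cost[i][j - 1],      # reference level skipped ahead
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# A signal in which each level is duplicated aligns perfectly with the levels.
perfect = dtw_cost([1.0, 1.0, 2.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# A mismatch in the final level cannot be warped away.
mismatch = dtw_cost([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

The first step option is what lets DTW absorb sample duplications: several consecutive signal samples can be matched to a single reference level at no extra structural cost.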
Findings
Achievable rates range from 0.64 to 1.18 bits per base, depending on channel quality.
Decoding with NNC-Scrappie achieves these rates on real sequencing data without relying on a basecalling algorithm.
The reported rates are conservative: they do not account for calibration or for multiple reads.
Abstract
This paper studies achievable rates of nanopore-based DNA storage when nanopore signals are decoded using a tractable channel model that does not rely on a basecalling algorithm. Specifically, the noisy nanopore channel (NNC) with the Scrappie pore model generates average output levels via i.i.d. geometric sample duplications corrupted by i.i.d. Gaussian noise (NNC-Scrappie). Simplified message passing algorithms are derived for efficient soft decoding of nanopore signals using NNC-Scrappie. Previously, evaluation of this channel model was limited by the lack of DNA storage datasets with nanopore signals included. This is solved by deriving an achievable rate based on the dynamic time-warping (DTW) algorithm that can be applied to genomic sequencing datasets subject to constraints that make the resulting rate applicable to DNA storage. Using a publicly-available dataset from Oxford…
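The channel structure described in the abstract (a pore model producing average output levels, geometric sample duplications, additive Gaussian noise) can be illustrated with a small simulation. This is a toy sketch under stated assumptions, not the authors' implementation: `toy_pore_model` is a hypothetical random lookup table rather than the real Scrappie model, the k-mer length `k=3`, the duplication parameter `p`, and the noise level `sigma` are arbitrary, and the geometric distribution is assumed to be supported on {1, 2, ...}.

```python
import random
from itertools import product


def toy_pore_model(k, rng):
    """Hypothetical stand-in for a pore model: one mean level per k-mer."""
    return {"".join(kmer): rng.uniform(-2.0, 2.0)
            for kmer in product("ACGT", repeat=k)}


def geometric(p, rng):
    """Sample a duration from a geometric distribution on {1, 2, ...} (assumed support)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def simulate_channel(seq, levels, k, p, sigma, rng):
    """Return (samples, durations) for a DNA sequence under the toy channel."""
    samples, durations = [], []
    for i in range(len(seq) - k + 1):
        mean = levels[seq[i:i + k]]   # average output level for this k-mer
        d = geometric(p, rng)         # i.i.d. number of duplicated samples
        durations.append(d)
        # Each duplicated sample gets independent Gaussian noise.
        samples.extend(rng.gauss(mean, sigma) for _ in range(d))
    return samples, durations


rng = random.Random(0)
levels = toy_pore_model(3, rng)
samples, durations = simulate_channel("ACGTTGCA", levels, k=3, p=0.3, sigma=0.1, rng=rng)
```

Because the durations are random and unobserved at the receiver, a decoder has to infer both the base sequence and the alignment of samples to levels, which is why a soft decoder or an alignment method such as DTW is needed.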
Taxonomy
Topics: Nanopore and Nanochannel Transport Studies · DNA and Biological Computing · Genomics and Phylogenetic Studies
