On Coding for an Abstracted Nanopore Channel for DNA Storage

Reyna Hulett; Shubham Chandak; Mary Wootters

arXiv:2102.01839·cs.IT·February 4, 2021

TL;DR

This paper studies the capacity of a highly abstracted, deterministic model of the nanopore sequencing channel for DNA storage, and proposes efficient coding schemes and algorithms for that model.

Contribution

It introduces a highly abstracted, deterministic model of the nanopore sequencer that highlights the key features that make its analysis difficult, and develops methods and theory for understanding the model's capacity.

Findings

01 Derived capacity bounds for the abstracted nanopore model

02 Developed efficient coding schemes for DNA storage

03 Proposed algorithms for encoding and decoding

Abstract

In the emerging field of DNA storage, data is encoded as DNA sequences and stored. The data is read out again by sequencing the stored DNA. Nanopore sequencing is a new sequencing technology that has many advantages over other methods; in particular, it is cheap, portable, and can support longer reads. While several practical coding schemes have been developed for DNA storage with nanopore sequencing, the theory is not well understood. Towards that end, we study a highly abstracted (deterministic) version of the nanopore sequencer, which highlights key features that make its analysis difficult. We develop methods and theory to understand the capacity of our abstracted model, and we propose efficient coding schemes and algorithms.
