DNA data storage, sequencing data-carrying DNA
Jasmine Quah, Omer Sella, Thomas Heinis

TL;DR
This paper explores how combining model compression with error correcting codes in DNA data storage can maintain high accuracy while enabling portable, efficient DNA sequencing read heads.
Contribution
It demonstrates that shrinking the deep learning basecalling model while applying error-correcting codes can still achieve high accuracy in DNA data storage, making portable sequencing devices more feasible.
Findings
Model compression can be compensated by error correction codes.
Reduced model size does not significantly impact accuracy.
Joint use of compression and error correction improves read accuracy.
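The first finding, that error-correcting codes can absorb the extra read errors of a smaller basecaller, can be illustrated with a toy sketch. It assumes that basecalling errors behave like independent bit flips (a binary symmetric channel) and uses a Hamming(7,4) code. Neither choice is taken from the paper: real DNA channels also produce insertions and deletions, and this summary does not say which code the authors use. The functions `hamming74_encode`, `hamming74_decode` and `simulate` are hypothetical names for illustration.

```python
import random
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def simulate(n_words: int, p: float, seed: int = 0) -> Tuple[float, float]:
    """Return (uncoded, coded) bit error rates when each bit flips with probability p.

    p stands in (as an assumption) for the error rate of a compressed basecaller.
    """
    rng = random.Random(seed)
    raw_errors = coded_errors = 0
    for _ in range(n_words):
        data = [rng.randint(0, 1) for _ in range(4)]
        # Uncoded: the 4 data bits go through the noisy read directly.
        noisy_raw = [b ^ (rng.random() < p) for b in data]
        raw_errors += sum(a != b for a, b in zip(data, noisy_raw))
        # Coded: 7 bits go through the same noisy read, then get decoded.
        noisy_code = [b ^ (rng.random() < p) for b in hamming74_encode(data)]
        decoded = hamming74_decode(noisy_code)
        coded_errors += sum(a != b for a, b in zip(data, decoded))
    total = 4 * n_words
    return raw_errors / total, coded_errors / total
```

For example, `simulate(20000, 0.01)` returns the uncoded and decoded bit error rates; at small `p` the decoded rate should sit well below `p`, because any single flipped bit in a 7-bit codeword is corrected. The price is 3 parity bits per 4 data bits, which is the kind of redundancy-versus-read-accuracy trade-off the findings describe.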
Abstract
DNA is a leading candidate for the next archival storage medium due to its density, durability and sustainability. To read (and write) data, DNA storage exploits technology that has been developed over decades to sequence naturally occurring DNA in the life sciences. To achieve higher accuracy on previously unseen, biological DNA, sequencing relies on extending and training deep machine learning models known as basecallers. This growth in model complexity requires substantial resources, both computational resources and training data sets. It also eliminates the possibility of a compact read head for DNA as a storage medium. We argue that we need to depart from blindly using sequencing models from the life sciences for DNA data storage. The difference is striking: for life science applications we have no control over the DNA; however, in the case of DNA data storage, we control how it is written, as well…
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression · Advanced Data Storage Technologies
