A Characterization of the DNA Data Storage Channel
Reinhard Heckel, Gediminas Mikutis, Robert N. Grass

TL;DR
This paper provides a detailed characterization of the error types and molecule loss in DNA data storage, aiding the design of more reliable systems by analyzing experimental data from multiple sources.
Contribution
It offers a comprehensive analysis of error probabilities and molecule loss in DNA storage, combining experimental data to inform future system design.
Findings
Errors mainly due to synthesis and sequencing
Significant sequence loss from handling and storage
Quantitative error and loss models for DNA storage
Abstract
Owing to its longevity and enormous information density, DNA, the molecule encoding biological information, has emerged as a promising archival storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules that are stored in an unordered way, and can only be read by sampling from this DNA pool. Moreover, imperfections in writing (synthesis), reading (sequencing), storage, and handling of the DNA, in particular amplification via PCR, lead to a loss of DNA molecules and induce errors within the molecules. In order to design DNA storage systems, a qualitative and quantitative understanding of the errors and the loss of molecules is crucial. In this paper, we characterize those error probabilities by analyzing data from our own experiments as well as from experiments of two different groups. We find that errors within molecules are mainly…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
