Information-Theoretic Foundations of DNA Data Storage
Ilan Shomorony, Reinhard Heckel

TL;DR
This paper studies the information-theoretic limits of DNA data storage. It models storage on short, unordered, noisy DNA molecules in order to determine how much information can be stored and retrieved reliably.
Contribution
The paper introduces a probabilistic channel model that captures key aspects of DNA storage and analyzes the capacity of that channel, giving limits that apply regardless of the particular coding strategy.
Findings
Proposes a channel model for storage on unordered, noisy DNA molecules
Derives capacity bounds that account for technological constraints such as short molecule length
Provides theoretical tools for analyzing the limits of DNA storage
Abstract
Due to its longevity and enormous information density, DNA is an attractive medium for archival data storage. Thanks to rapid technological advances, DNA storage is becoming practically feasible, as demonstrated by a number of experimental storage systems, making it a promising solution for our society's increasing need of data storage. While in living things, DNA molecules can consist of millions of nucleotides, due to technological constraints, in practice, data is stored on many short DNA molecules, which are preserved in a DNA pool and cannot be spatially ordered. Moreover, imperfections in sequencing, synthesis, and handling, as well as DNA decay during storage, introduce random noise into the system, making the task of reliably storing and retrieving information in DNA challenging. This unique setup raises a natural information-theoretic question: how much information can be…
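The setup the abstract describes (many short molecules kept in an unordered pool and read back with random noise) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model: it assumes reads are drawn uniformly at random with replacement from the pool and that noise consists only of independent substitutions. The function name `noisy_shuffling_channel` and the parameters `num_reads` and `sub_prob` are hypothetical names chosen for illustration.

```python
import random

ALPHABET = "ACGT"


def noisy_shuffling_channel(molecules, num_reads, sub_prob, rng):
    """Simulate reading from an unordered pool of short DNA molecules.

    Assumptions (illustrative only):
      - each read is a molecule drawn uniformly at random, with replacement,
        so the order in which molecules were written is lost;
      - each base is independently replaced by a different base
        with probability sub_prob (substitution noise only).
    """
    reads = []
    for _ in range(num_reads):
        source = rng.choice(molecules)  # sampling discards the original order
        noisy = []
        for base in source:
            if rng.random() < sub_prob:
                # substitute with one of the three other bases
                noisy.append(rng.choice([b for b in ALPHABET if b != base]))
            else:
                noisy.append(base)
        reads.append("".join(noisy))
    return reads


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ["ACGTAC", "TTGACA", "GGCATC"]  # three short stored molecules
    for read in noisy_shuffling_channel(pool, num_reads=5, sub_prob=0.1, rng=rng):
        print(read)
```

Because the reads carry no positional information, a decoder in this sketch would have to recover the ordering from the molecule contents themselves, for example from an index embedded in each molecule. That overhead is one reason short molecule length limits how much data can be stored.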
