DNA Storage in the Short Molecule Regime
Ran Tamir, Nir Weinberger, Albert Guillén i Fàbregas

TL;DR
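This paper completes the proof of the Shomorony and Heckel (2022) conjecture on how many information bits can be reliably stored using short DNA molecules. It derives an achievability bound that matches a recently established converse bound and proposes a second, lower-complexity coding scheme.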
Contribution
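It completes the proof of a conjecture on DNA storage capacity in the short-molecule regime. It analyzes a random-coding scheme that achieves the optimal scaling across the entire regime, and introduces a lower-complexity scheme that achieves the optimal scaling except for a specific range of very short molecules.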
Findings
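Derives an achievability bound, via analysis of the optimal maximum-likelihood decoder, that matches a recently established converse bound across the entire short-molecule regime.
Proposes a second coding scheme with significantly lower computational complexity that achieves the optimal scaling, except for a specific range of very short molecules.
Completes the proof of the Shomorony and Heckel (2022) conjecture on the scaling of the number of information bits that can be reliably stored.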
Abstract
We study the amount of reliable information that can be stored in a DNA-based storage system composed of short DNA molecules. In this regime, Shomorony and Heckel (2022) put forward a conjecture on the scaling of the number of information bits that can be reliably stored. In this paper, we complete the proof of this conjecture. We analyze a random-coding scheme in which each codeword is obtained by quantizing a randomly generated probability mass function drawn from the probability simplex. By analyzing the optimal maximum-likelihood decoder, we derive an achievability bound that matches a recently established converse bound across the entire short-molecule regime. We also propose a second coding scheme, which operates with significantly lower computational complexity but achieves the optimal scaling, except for a specific range of very short molecules.
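The codeword construction mentioned in the abstract (quantizing a randomly generated probability mass function drawn from the simplex) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a 4-letter alphabet, a uniform draw from the simplex, a largest-remainder rounding rule, and the hypothetical names `sample_pmf`, `quantize_pmf`, and `random_codeword`. The codeword is represented as a vector of counts over all possible molecule strings.

```python
import random


def sample_pmf(k, rng):
    """Draw a PMF uniformly from the (k-1)-simplex (assumed distribution).

    Normalizing k independent Exp(1) samples gives a Dirichlet(1, ..., 1) draw.
    """
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def quantize_pmf(pmf, m):
    """Round m * pmf to non-negative integer counts that sum to m.

    Uses the largest-remainder rule (an assumed quantizer): take floors,
    then give the leftover units to the entries with the largest fractions.
    """
    scaled = [p * m for p in pmf]
    counts = [int(s) for s in scaled]  # floor, since all entries are >= 0
    leftover = m - sum(counts)
    by_remainder = sorted(
        range(len(pmf)), key=lambda i: scaled[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def random_codeword(molecule_length, num_molecules, seed=0):
    """Return counts over all 4**molecule_length strings, summing to num_molecules."""
    rng = random.Random(seed)
    k = 4 ** molecule_length  # assumed alphabet {A, C, G, T}
    return quantize_pmf(sample_pmf(k, rng), num_molecules)


codeword = random_codeword(molecule_length=2, num_molecules=50)
```

Each entry of `codeword` says how many copies of the corresponding length-2 string would be written. A random codebook in this sketch would simply call `random_codeword` with a different seed per message; the decoder analysis in the paper is not reproduced here.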
Taxonomy
Topics: DNA and Biological Computing · Diffusion and Search Dynamics · DNA and Nucleic Acid Chemistry
