DNA storage approaching the information-theoretic ceiling
James L. Banal

TL;DR
This paper introduces an advanced error-correction coding scheme for DNA data storage that retains probabilistic sequencing information, significantly increasing storage density and approaching the theoretical Shannon limit.
Contribution
The authors develop an integrated decoding approach that preserves sequencing uncertainty, achieving higher data densities than previous methods in DNA storage.
Findings
Recovered 155.8 and 25.9 exabytes per gram under high- and low-fidelity conditions.
Exceeds the highest prior-art density by 11% and 52% on the high- and low-fidelity channels, respectively.
Projects 282 years of decodable storage at 17.1 exabytes per gram.
Abstract
Synthetic DNA approaches 227.5 exabytes per gram of storage density with stability over millennial timescales. Realising this capacity requires error-correction codes that recover data from substantial synthesis and sequencing errors. Existing codecs convert noisy sequencer output into discrete base calls before error correction, discarding probabilistic information about which positions are reliable. Here we present a coding scheme that retains the sequencer's per-position posterior distributions through an integrated decoder of profile hidden Markov model alignment, log-product fusion across reads, and ordered-statistics decoding. On the DT4DDS channel simulator, the codec recovers 155.8 and 25.9 exabytes per gram of dsDNA under high- and low-fidelity conditions, exceeding the highest prior-art density on each channel by 11 and 52 percent. Under a single-encode-then-degrade protocol…
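The log-product fusion step named in the abstract can be illustrated with a minimal sketch. This is a generic illustration of the technique, not the authors' decoder: it assumes the reads have already been aligned to a common coordinate system (the role the abstract assigns to the profile hidden Markov model), that reads are conditionally independent, and that the prior over bases is uniform. The function name `fuse_posteriors`, the array layout, and the toy probabilities are all hypothetical.

```python
import numpy as np

def fuse_posteriors(read_posteriors, eps=1e-12):
    """Fuse per-position base posteriors from several aligned reads.

    read_posteriors: array-like of shape (n_reads, n_positions, 4); each
    innermost row is a probability distribution over A, C, G, T.
    Returns an array of shape (n_positions, 4) holding the fused distribution.
    Assumes aligned, independent reads and a uniform prior (illustrative only).
    """
    p = np.asarray(read_posteriors, dtype=float)
    # Clip to avoid log(0), then multiply probabilities by summing logs.
    log_p = np.log(np.clip(p, eps, 1.0))
    log_fused = log_p.sum(axis=0)
    # Subtract the per-position maximum for numerical stability.
    log_fused -= log_fused.max(axis=-1, keepdims=True)
    fused = np.exp(log_fused)
    # Renormalise so each position is again a probability distribution.
    return fused / fused.sum(axis=-1, keepdims=True)

# Toy input: three reads, two positions (made-up numbers).
reads = np.array([
    [[0.70, 0.10, 0.10, 0.10], [0.25, 0.25, 0.25, 0.25]],
    [[0.60, 0.20, 0.10, 0.10], [0.10, 0.70, 0.10, 0.10]],
    [[0.40, 0.40, 0.10, 0.10], [0.20, 0.60, 0.10, 0.10]],
])

fused = fuse_posteriors(reads)
bases = "ACGT"
calls = "".join(bases[i] for i in fused.argmax(axis=-1))
confidence = fused.max(axis=-1)
print(calls, confidence)
```

In the toy input, the first read is uninformative at the second position (a uniform distribution), so it contributes nothing to the fused result there, while the other two reads still determine the call. A hard base call made per read before fusion would have lost that distinction, which is the kind of information the abstract says existing codecs discard.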
