DNA-MGC+: A versatile codec for reliable and resource-efficient data storage on synthetic DNA
Ramy Khabbaz, Jérémy Mateos, Marc Antonini, Serge Kas Hanna

TL;DR
DNA-MGC+ is a new DNA storage codec that significantly improves reliability and efficiency across various sequencing methods and error conditions, enabling practical and cost-effective DNA data storage.
Contribution
The paper introduces DNA-MGC+, a versatile codec that improves error correction and resource efficiency in DNA data storage under diverse operating conditions.
Findings
Reliable decoding at up to 24% IDS error rate.
Successful retrieval at sequencing depths below 3x.
Reduced read costs to below 3.5 bits/nt.
Abstract
The biochemical processes underlying DNA data storage, including synthesis, amplification, and sequencing, are inherently noisy. Consequently, base-level insertion, deletion, and substitution (IDS) errors, as well as sequence-level dropouts, occur and pose major challenges for reliable data retrieval. Here we introduce DNA-MGC+, a DNA storage codec designed to enable reliable and resource-efficient data retrieval under diverse operating conditions. We evaluate DNA-MGC+ across a wide range of in silico and in vitro settings, including experiments with both Illumina and Nanopore sequencing, and show that it consistently outperforms existing codecs. In particular, DNA-MGC+ achieves simultaneous gains in sequencing depth requirements, read cost, decoding time, storage density, and error-correction capability under explicit reliability constraints. Notable results include reliable decoding…
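To make the error model in the abstract concrete, the following is a minimal sketch of a toy channel with base-level IDS errors and sequence-level dropouts. It is not the DNA-MGC+ codec or the authors' simulator. The function names `ids_channel` and `sequence_pool`, the i.i.d. per-base error assumption, and the definition of sequencing depth as reads per stored strand are all assumptions made for illustration.

```python
import random

BASES = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass one strand through a toy i.i.d. IDS channel (hypothetical model).

    For each base, exactly one of four events happens: a random base is
    inserted before it, it is deleted, it is substituted by a different
    base, or it is copied unchanged.
    """
    out = []
    for base in strand:
        r = rng.random()
        if r < p_ins:
            # Insertion: emit a random base, then the original base.
            out.append(rng.choice(BASES))
            out.append(base)
        elif r < p_ins + p_del:
            # Deletion: the base is lost.
            continue
        elif r < p_ins + p_del + p_sub:
            # Substitution: replace with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def sequence_pool(strands, depth, p_dropout, p_ins, p_del, p_sub, seed=0):
    """Simulate reading a pool of strands (hypothetical model).

    Each strand is dropped entirely with probability p_dropout. The number
    of reads is depth * len(strands), each drawn uniformly from the
    surviving strands and corrupted by ids_channel.
    """
    rng = random.Random(seed)
    survivors = [s for s in strands if rng.random() >= p_dropout]
    if not survivors:
        return []
    n_reads = round(depth * len(strands))
    return [
        ids_channel(rng.choice(survivors), p_ins, p_del, p_sub, rng)
        for _ in range(n_reads)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    pool = ["".join(rng.choice(BASES) for _ in range(60)) for _ in range(10)]
    # An illustrative split of a 24% total IDS rate into three equal parts.
    reads = sequence_pool(pool, depth=3, p_dropout=0.05,
                          p_ins=0.08, p_del=0.08, p_sub=0.08)
    print(len(reads), "noisy reads from", len(pool), "stored strands")
```

Under these assumptions, a codec has to recover the stored strands from `reads` alone, which is why lower sequencing depth and higher IDS rates both make decoding harder. The equal split of the error rate across insertions, deletions, and substitutions is only an example; the abstract does not state how the 24% figure is composed.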
Taxonomy
Topics: DNA and Biological Computing · Genomics and Phylogenetic Studies · Advanced biosensing and bioanalysis techniques
