Coding for Strand Breaks in Composite DNA
Frederik Walter, Yonatan Yehezkeally

TL;DR
This paper extends the strand-break error model for DNA storage to composite DNA, proposes marker codes to correct strand breaks, and generalizes run-length-limited (RLL) codes for this setting.
Contribution
The paper introduces a marker-based coding scheme for composite DNA that corrects single strand breaks, and it generalizes RLL codes to support that scheme.
Findings
Extended strand-break channel model for composite DNA
Proposed marker codes for single strand break correction
Derived bounds on redundancy for generalized RLL codes
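For context, one common form of the classical run-length-limited constraint bounds the length of any run of identical symbols. The sketch below checks that classical form only; the paper's generalized constraint is not specified in this summary, and the function names `max_run_length` and `satisfies_rll` are illustrative assumptions.

```python
from itertools import groupby


def max_run_length(seq):
    """Length of the longest run of identical symbols (0 for an empty sequence)."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def satisfies_rll(seq, max_run):
    """True if no run of identical symbols in seq is longer than max_run.

    This is the classical run-length limit, not the paper's generalization.
    """
    return max_run_length(seq) <= max_run


# "AACCCGT" has a longest run of three ('CCC').
longest = max_run_length("AACCCGT")
```

Under this constraint, "AACCCGT" is admissible for a run limit of 3 but not for a limit of 2.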
Abstract
Due to their sequential nature, traditional DNA synthesis methods are expensive in terms of time and resources. They also fabricate multiple copies of the same strand, introducing redundancy. This redundancy can be leveraged to enhance the information capacity of each synthesis cycle and DNA storage systems in general by employing composite DNA symbols. Unlike conventional DNA storage, composite DNA encodes information in the distribution of bases across a pool of strands rather than in the individual strands themselves. Consequently, error models for DNA storage must be adapted to account for this unique characteristic. One significant error model for long-term DNA storage is strand breaks, often caused by the decay of individual bases. This work extends the strand-break channel model to the composite DNA setting. To address this challenge, we propose a coding scheme that uses marker…
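The composite-DNA idea in the abstract can be illustrated with a toy model, assuming a small hypothetical pool of equal-length strands. A composite symbol is read as the empirical distribution of bases at one position across the pool, and a strand break splits one strand into two fragments. The names `column_frequencies` and `break_strand` are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

BASES = "ACGT"


def column_frequencies(pool, position):
    """Empirical base distribution at one position across a pool of strands."""
    counts = Counter(strand[position] for strand in pool)
    total = sum(counts.values())
    return {base: counts[base] / total for base in BASES}


def break_strand(strand, break_at):
    """Toy model of a single strand break: split the strand into two fragments."""
    return strand[:break_at], strand[break_at:]


# Hypothetical pool: position 0 is always 'A', position 1 mixes 'C' and 'G' equally.
pool = ["ACT", "AGT", "ACT", "AGT"]
freqs = column_frequencies(pool, 1)

# After a break, the second fragment no longer carries its offset in the strand.
fragments = break_strand(pool[0], 1)
```

In this toy model, the second fragment "CT" could belong to any position in the original strand, so its bases can no longer be attributed to the correct column of the pool. Losing that alignment is the kind of problem a marker-based scheme is meant to address.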
Taxonomy
Topics: DNA and Biological Computing · DNA and Nucleic Acid Chemistry · Genomics and Chromatin Dynamics
