Embracing Errors Is More Efficient Than Avoiding Them Through Constrained Coding for DNA Data Storage
Franziska Weindel, Andreas L. Gimpel, Robert N. Grass, Reinhard Heckel

TL;DR
This paper compares constrained coding and error-embracing strategies in DNA data storage, finding that embracing errors is generally more efficient given current error rates and constraints.
Contribution
It provides a theoretical and empirical analysis showing that constrained coding is inefficient for substitution errors in existing DNA storage systems.
Findings
Constrained coding increases redundancy without significant error reduction.
Embracing errors is more efficient than constrained coding under current error regimes.
Empirical data show only a minimal increase in errors for sequences with homopolymers or unbalanced GC content.
Abstract
DNA is an attractive medium for digital data storage. When data is stored on DNA, errors occur, which makes error-correcting coding techniques critical for reliable DNA data storage. To reduce the errors, a common technique is to include constraints that avoid homopolymers (consecutive repeated nucleotides) and balance the GC content, as sequences with homopolymers and unbalanced GC content are often associated with higher error rates. However, constrained coding comes at the cost of an increase in redundancy. An alternative is to control errors by randomizing the sequences, embracing errors, and paying for them with additional coding redundancy. In this paper, we determine the error regimes in which embracing substitutions is more efficient than constrained coding for DNA data storage. Our results suggest that constrained coding for substitution errors is inefficient for existing DNA…
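To make the two strategies in the abstract concrete, here is a minimal Python sketch of the quantities involved: the GC content of a sequence, its longest homopolymer run, and a seeded randomization step. It is illustrative only and is not the authors' implementation. The function names, the thresholds (`max_run=3`, GC content between 40% and 60%) and the modulo-4 randomization scheme are all assumptions chosen for the example.

```python
import random

NUCLEOTIDES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    longest = 0
    current = 0
    previous = None
    for base in seq:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check the two constraints; the default thresholds are assumed, not from the paper."""
    return max_homopolymer_run(seq) <= max_run and gc_min <= gc_content(seq) <= gc_max


def randomize(seq: str, seed: int) -> str:
    """Add a seeded pseudorandom nucleotide stream to seq, modulo 4 (assumed scheme)."""
    rng = random.Random(seed)
    return "".join(
        NUCLEOTIDES[(NUCLEOTIDES.index(base) + rng.randrange(4)) % 4] for base in seq
    )


def derandomize(seq: str, seed: int) -> str:
    """Invert randomize() by subtracting the same pseudorandom stream."""
    rng = random.Random(seed)
    return "".join(
        NUCLEOTIDES[(NUCLEOTIDES.index(base) - rng.randrange(4)) % 4] for base in seq
    )
```

A constrained code would only ever emit sequences for which `satisfies_constraints` holds, paying for that guarantee with extra redundancy. The error-embracing alternative applies something like `randomize` so that long homopolymers and extreme GC content become unlikely rather than impossible, and relies on the error-correcting code to handle the errors that remain.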
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Algorithms and Data Compression
