Constrained Coding for Composite DNA: Channel Capacity and Efficient Constructions
Tuan Thanh Nguyen, Chen Wang, Kui Cai, Yiwei Zhang, and Zohar Yakhini

TL;DR
This paper develops constrained coding techniques for composite DNA data storage, enhancing capacity and reliability by enforcing biological constraints and designing efficient encoding schemes.
Contribution
It introduces capacity analysis and construction of capacity-approaching codes for composite DNA, addressing sequencing errors and biological constraints.
Findings
Capacity of constrained composite DNA channel computed
Efficient encoders/decoders designed with minimal redundancy
Achieved near-capacity coding with only one redundant symbol for some parameters
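To give a sense of what a capacity computation for a constrained channel involves, here is a minimal sketch for the textbook q-ary runlength-limited constraint (no symbol repeated more than `max_run` times in a row). This is an illustration under stated assumptions, not the paper's method: the constraint the authors define for composite symbols may differ, and the function name `rll_capacity` and its parameters are hypothetical.

```python
from math import log2


def rll_capacity(q: int, max_run: int, iterations: int = 500) -> float:
    """Estimate the capacity (bits/symbol) of q-ary sequences whose runs of
    identical symbols have length at most `max_run`.

    Assumption: the standard runlength-limited constraint, not necessarily
    the exact constraint used in the paper.
    """
    # v[j] is the (normalized) number of valid sequences whose final run
    # has length j + 1.
    v = [1.0] * max_run
    growth = 1.0
    for _ in range(iterations):
        total = sum(v)
        # Appending a different symbol (q - 1 choices) starts a new run of
        # length 1; appending the same symbol extends the run by one, which
        # is only allowed while the run stays within max_run.
        nxt = [(q - 1) * total] + v[:-1]
        new_total = sum(nxt)
        growth = new_total / total  # converges to the largest eigenvalue
        v = [x / new_total for x in nxt]  # renormalize to avoid overflow
    return log2(growth)


if __name__ == "__main__":
    # Plain DNA (q = 4) with homopolymer runs of length at most 3.
    print(round(rll_capacity(4, 3), 4))
```

The estimate is the base-2 logarithm of the growth rate of the number of valid sequences, obtained by power iteration on the run-length transfer recursion. A larger alphabet `q` or a looser `max_run` moves the result closer to the unconstrained value `log2(q)`.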
Abstract
Composite DNA is a recent method for increasing the information capacity of DNA-based data storage above the theoretical limit of 2 bits/symbol. In this method, each composite symbol stores not a single DNA nucleotide but a mixture of the four nucleotides in a predetermined ratio. By using different mixtures and ratios, the alphabet can be extended well beyond the four symbols of the naive approach. While this method enables higher data content per synthesis cycle, potentially reducing the DNA synthesis cost, it also poses significant challenges for accurate DNA sequencing, since base-level errors can easily change the mixture of bases and their ratio, resulting in changes to the composite symbols. With this motivation, we propose efficient constrained coding techniques to enforce the biological constraints, including the runlength-limited constraint and the…
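The alphabet growth described in the abstract can be made concrete with a short sketch. It assumes, purely for illustration, that each mixing ratio is written as four non-negative integer parts for A, C, G, and T that sum to a fixed `resolution`; this parameter and the function names are hypothetical and do not come from the source.

```python
from math import comb, log2


def composite_alphabet_size(resolution: int) -> int:
    """Count the ratios (a, c, g, t) of non-negative integers with
    a + c + g + t == resolution (stars-and-bars count).

    Assumption: ratios are quantized to `resolution` equal parts.
    """
    return comb(resolution + 3, 3)


def bits_per_symbol(resolution: int) -> float:
    """Upper bound on information per composite symbol, ignoring errors
    and any coding constraints."""
    return log2(composite_alphabet_size(resolution))


def enumerate_composite_symbols(resolution: int) -> list[tuple[int, int, int, int]]:
    """List every ratio explicitly as an (A, C, G, T) tuple of parts."""
    return [
        (a, c, g, resolution - a - c - g)
        for a in range(resolution + 1)
        for c in range(resolution + 1 - a)
        for g in range(resolution + 1 - a - c)
    ]


if __name__ == "__main__":
    for k in (1, 2, 6):
        print(k, composite_alphabet_size(k), round(bits_per_symbol(k), 2))
```

With `resolution = 1` the count is 4 and the bound is exactly 2 bits/symbol, which recovers the standard four-letter alphabet; larger resolutions give larger alphabets, at the price of ratios that are harder to tell apart when sequencing.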
Taxonomy
Topics: DNA and Biological Computing · DNA and Nucleic Acid Chemistry · Gene expression and cancer classification
