Reconstruction Codes for DNA Sequences with Uniform Tandem-Duplication Errors
Yonatan Yehezkeally, Moshe Schwartz

TL;DR
This paper studies reconstruction codes for data stored in the DNA of living organisms under uniform tandem-duplication noise, and proves the existence of such codes with greater capacity than known error-correcting codes.
Contribution
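It proposes treating storage in living DNA as a reconstruction problem, where multiple noisy reads of the stored data are available. It exploits the structure of uniform tandem-duplication errors through their relation to constant-weight integer codes in the Manhattan metric, which yields capacities that can be determined analytically.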
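One standard way to expose the structure of tandem-duplication errors, used widely in the duplication-coding literature, is the difference map. It writes a string x over Z_q as its first k symbols together with y_j = x_{j+k} - x_j (mod q). A length-k tandem duplication u v w -> u v v w then leaves the prefix untouched and simply inserts k zeros into y. The Python sketch below checks this numerically. The function names are hypothetical, and treating this map as the paper's exact starting point is an assumption on our part.

```python
import random
from typing import List, Tuple


def tandem_duplicate(x: List[int], i: int, k: int) -> List[int]:
    """Uniform tandem duplication of length k at position i: u v w -> u v v w, with v = x[i:i+k]."""
    if not 0 <= i <= len(x) - k:
        raise ValueError("the duplicated window x[i:i+k] must lie inside x")
    return x[: i + k] + x[i : i + k] + x[i + k :]


def difference_map(x: List[int], k: int, q: int) -> Tuple[List[int], List[int]]:
    """Split x over Z_q into its first k symbols and y[j] = x[j+k] - x[j] (mod q)."""
    prefix = x[:k]
    y = [(x[j + k] - x[j]) % q for j in range(len(x) - k)]
    return prefix, y


def check_duplication_inserts_zeros(n: int = 8, k: int = 2, q: int = 4,
                                    trials: int = 200, seed: int = 0) -> bool:
    """On random strings, confirm that every length-k duplication at position i
    keeps the prefix and inserts exactly k zeros into y at index i."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randrange(q) for _ in range(n)]
        prefix, y = difference_map(x, k, q)
        for i in range(n - k + 1):
            new_prefix, new_y = difference_map(tandem_duplicate(x, i, k), k, q)
            if new_prefix != prefix or new_y != y[:i] + [0] * k + y[i:]:
                return False
    return True


if __name__ == "__main__":
    x = [0, 1, 2, 3, 1, 0, 2]
    print(difference_map(x, 2, 4))                          # ([0, 1], [2, 2, 3, 1, 1])
    print(difference_map(tandem_duplicate(x, 3, 2), 2, 4))  # ([0, 1], [2, 2, 3, 0, 0, 1, 1])
    print(check_duplication_inserts_zeros())                # True
```

Because duplications only lengthen runs of zeros in y, a descendant is described by how many k-blocks were added to each zero run. That is one way the problem turns into a question about nonnegative integer vectors.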
Findings
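Reconstruction codes exist with greater capacity than known error-correcting codes for the same noise model.
The analysis relies on the relation between uniform tandem-duplication errors and constant-weight integer codes in the Manhattan metric.
The capacity of these codes follows from bounds on the intersection of the cross-polytope with hyperplanes and can be determined analytically for any set of parameters.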
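A constant-weight integer code in the Manhattan metric is a set of vectors in Z_{>=0}^n whose coordinates all sum to the same weight w, with distance d(a, b) = |a_1 - b_1| + ... + |a_n - b_n|. For two such vectors, a - b has coordinate sum zero, so its positive and negative parts carry equal mass and the distance is always even. The Python sketch below enumerates these vectors and builds a code by a plain greedy search. The greedy construction and all names are illustrative assumptions, not the codes constructed in the paper.

```python
from itertools import combinations
from math import comb
from typing import Iterator, List, Tuple

Vector = Tuple[int, ...]


def constant_weight_vectors(n: int, w: int) -> Iterator[Vector]:
    """Yield every x in Z_{>=0}^n with x_1 + ... + x_n = w (stars and bars)."""
    slots = w + n - 1
    for bars in combinations(range(slots), n - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        parts.append(slots - prev - 1)  # stars after the last bar
        yield tuple(parts)


def manhattan(a: Vector, b: Vector) -> int:
    """l1 (Manhattan) distance between two integer vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_code(n: int, w: int, d: int) -> List[Vector]:
    """Greedy search: keep each weight-w vector whose Manhattan distance
    to every vector kept so far is at least d."""
    code: List[Vector] = []
    for v in constant_weight_vectors(n, w):
        if all(manhattan(v, c) >= d for c in code):
            code.append(v)
    return code


if __name__ == "__main__":
    n, w, d = 3, 4, 4
    words = list(constant_weight_vectors(n, w))
    assert len(words) == comb(w + n - 1, n - 1)  # 15 vectors of weight 4 in Z_{>=0}^3
    print(greedy_code(n, w, d))
    # [(0, 0, 4), (0, 2, 2), (0, 4, 0), (2, 0, 2), (2, 2, 0), (4, 0, 0)]
```

For these small parameters the greedy search happens to keep exactly the vectors with all-even coordinates; it is a sanity check, not an optimal construction.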
Abstract
DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters.
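The geometric step named in the abstract concerns the cross-polytope, the l1 ball {z : |z_1| + ... + |z_n| <= r}, intersected with hyperplanes. As a small illustration (the paper's bounds are analytic, not enumerative), the Python sketch below counts the integer points of the radius-r cross-polytope that lie on the hyperplane z_1 + ... + z_n = 0, where differences of two equal-weight vectors live. The choice of this particular hyperplane and the function names are our assumptions. The sketch compares brute-force enumeration with a closed-form count obtained by splitting each point into its positive and negative parts.

```python
from itertools import product
from math import comb
from typing import List, Tuple

Point = Tuple[int, ...]


def lattice_points(n: int, r: int) -> List[Point]:
    """Brute force: integer z in [-r, r]^n with ||z||_1 <= r and sum(z) == 0."""
    return [
        z
        for z in product(range(-r, r + 1), repeat=n)
        if sum(z) == 0 and sum(abs(c) for c in z) <= r
    ]


def count_formula(n: int, r: int) -> int:
    """Closed-form count. A nonzero point with sum 0 and l1 norm 2s has i positive
    and j negative coordinates on disjoint supports, each group of absolute mass s:
    C(n, i) * C(n - i, j) supports times C(s-1, i-1) * C(s-1, j-1) compositions.
    Odd norms are impossible, so only s <= r // 2 contributes."""
    total = 1  # the origin
    for s in range(1, r // 2 + 1):
        for i in range(1, n):
            for j in range(1, n - i + 1):
                total += comb(n, i) * comb(n - i, j) * comb(s - 1, i - 1) * comb(s - 1, j - 1)
    return total


if __name__ == "__main__":
    for n in range(1, 5):
        for r in range(0, 7):
            assert len(lattice_points(n, r)) == count_formula(n, r)
    print(count_formula(3, 4))  # 19
```

Only even norms occur on this hyperplane, so the count for an odd radius equals the count for the even radius just below it. Brute force is exponential in n and serves only as a check for small cases.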
