Error-correcting Codes for Noisy Duplication Channels
Yuanyuan Tang, Farzad Farnoud

TL;DR
This paper develops error-correcting codes for DNA data storage that correct any number of exact tandem duplications of a fixed length k together with a single noisy tandem duplication, in which the inserted copy differs from the original in one symbol.
Contribution
It introduces codes, built on recovering the duplication root, that correct any number of exact duplications and a single noisy duplication, and it proves that the construction is asymptotically optimal.
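The duplication root mentioned above can be illustrated for the exact-duplication case. Undoing a tandem duplication of length k replaces a repeat xx with |x| = k by x; repeating this until no such repeat remains yields the root. Prior work on fixed-length tandem duplication shows that this root is unique, so the greedy removal order used below does not affect the result, and a codebook whose codewords have pairwise distinct roots can be decoded from any number of exact duplications by comparing roots. The following is a minimal Python sketch under those assumptions; `duplication_root` and `decode_exact` are hypothetical names, and the sketch does not reproduce the authors' construction for the noisy case.

```python
from typing import List, Optional


def duplication_root(s: str, k: int) -> str:
    """Undo exact tandem duplications of length k until no repeat xx with |x| = k remains."""
    changed = True
    while changed:
        changed = False
        for i in range(len(s) - 2 * k + 1):
            # s[i:i+k] immediately followed by an identical block is a tandem repeat.
            if s[i : i + k] == s[i + k : i + 2 * k]:
                s = s[: i + k] + s[i + 2 * k :]  # drop the second copy
                changed = True
                break
    return s


def decode_exact(received: str, codebook: List[str], k: int) -> Optional[str]:
    """Hypothetical decoder: return the codeword sharing the received string's root.

    Assumes the codewords have pairwise distinct roots and that only exact
    duplications of length k occurred.
    """
    target = duplication_root(received, k)
    for codeword in codebook:
        if duplication_root(codeword, k) == target:
            return codeword
    return None  # no codeword is consistent with exact duplications only
```

For example, `decode_exact("ACGACGACGT", ["ACGT", "AAGT"], 3)` returns `"ACGT"`, because two exact duplications of length 3 reduce back to the root ACGT.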
Findings
Codes correct any number of exact tandem duplications of length k.
Codes correct one noisy duplication whose copy is at Hamming distance 1 from the original (a single substitution).
Construction is asymptotically optimal.
Abstract
Because of its high data density and longevity, DNA is emerging as a promising candidate for satisfying increasing data storage needs. Compared to conventional storage media, however, data stored in DNA is subject to a wider range of errors resulting from various processes involved in the data storage pipeline. In this paper, we consider correcting duplication errors for both exact and noisy tandem duplications of a given length k. An exact duplication inserts a copy of a substring of length k of the sequence immediately after that substring, e.g., ACGT to ACGACGT, where k = 3, while a noisy duplication inserts a copy suffering from substitution noise, e.g., ACGT to ACGATGT. Specifically, we design codes that can correct any number of exact duplication errors and one noisy duplication error, where in the noisy duplication case the copy is at Hamming distance 1 from the original. Our…
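The two error types defined in the abstract can be reproduced with a short Python sketch. Positions are 0-indexed, and the helper names `exact_duplicate`, `noisy_duplicate`, and `hamming` are hypothetical; the noisy variant models the case considered here, where the copy differs from the original in exactly one position.

```python
def exact_duplicate(s: str, i: int, k: int) -> str:
    """Insert a copy of s[i:i+k] immediately after it (exact tandem duplication)."""
    if not 0 <= i <= len(s) - k:
        raise ValueError("s[i:i+k] must lie inside s")
    return s[: i + k] + s[i : i + k] + s[i + k :]


def noisy_duplicate(s: str, i: int, k: int, j: int, symbol: str) -> str:
    """Tandem-duplicate s[i:i+k], then substitute position j of the copy with symbol."""
    if not 0 <= i <= len(s) - k:
        raise ValueError("s[i:i+k] must lie inside s")
    if not 0 <= j < k:
        raise ValueError("j must index into the copy")
    copy = list(s[i : i + k])
    if copy[j] == symbol:
        raise ValueError("the substitution must change the symbol")
    copy[j] = symbol
    return s[: i + k] + "".join(copy) + s[i + k :]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


print(exact_duplicate("ACGT", 0, 3))          # ACGACGT
print(noisy_duplicate("ACGT", 0, 3, 1, "T"))  # ACGATGT
print(hamming("ACG", "ATG"))                  # 1
```

The noisy example duplicates ACG and changes its second symbol to T, so the inserted copy ATG is at Hamming distance 1 from the original ACG, matching the abstract's ACGT to ACGATGT example.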
Taxonomy
Topics: DNA and Biological Computing · Cellular Automata and Applications · Algorithms and Data Compression
