Error-correcting Codes for Short Tandem Duplication and Substitution Errors
Yuanyuan Tang, Farzad Farnoud

TL;DR
This paper develops error-correcting codes for DNA data storage that can simultaneously correct short tandem duplication and substitution errors, addressing the unique challenges posed by DNA's error patterns.
Contribution
It introduces a novel coding scheme capable of correcting both duplication and substitution errors in DNA sequences, with proven bounds and minimal additional redundancy.
Findings
The code corrects an arbitrary number of tandem duplication errors of length at most 3, together with one substitution error.
The additional redundancy needed to correct the substitution is only 0.003 bits/symbol for the DNA alphabet.
With appropriate preprocessing, the effect of a substitution is confined to a substring of finite length, which makes efficient correction possible.
Abstract
Due to its high data density and longevity, DNA is considered a promising medium for satisfying ever-increasing data storage needs. However, the diversity of errors that occur in DNA sequences makes efficient error correction a challenging task. This paper addresses the simultaneous correction of two types of errors, namely, short tandem duplication and substitution errors. We focus on tandem repeats of length at most 3 and design codes for correcting an arbitrary number of duplication errors and one substitution error. Because a substituted symbol can be duplicated many times (as part of substrings of various lengths), a single substitution can affect an unbounded substring of the retrieved word. However, we show that with appropriate preprocessing, the effect may be limited to a substring of finite length, thus making efficient error correction possible. We construct a code for…
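To make the error model in the abstract concrete, the following Python sketch shows a tandem duplication of length at most 3, a substitution, and a simple deduplication step that strips tandem repeats. The function names (`tandem_duplicate`, `substitute`, `dedup_root`) and the example strings are hypothetical and chosen for illustration only. This is not the paper's code construction or its preprocessing step; it only illustrates why duplications alone can be undone by deduplication while a substitution survives it.

```python
def tandem_duplicate(s: str, start: int, length: int) -> str:
    """Insert a copy of s[start:start+length] immediately after itself."""
    if not 1 <= length <= 3:
        raise ValueError("this sketch only models repeats of length 1 to 3")
    if start < 0 or start + length > len(s):
        raise ValueError("duplicated block must lie inside the string")
    block = s[start:start + length]
    return s[:start + length] + block + s[start + length:]


def substitute(s: str, pos: int, symbol: str) -> str:
    """Replace the symbol at position pos with another symbol."""
    return s[:pos] + symbol + s[pos + 1:]


def dedup_root(s: str, max_len: int = 3) -> str:
    """Repeatedly delete one copy of any tandem repeat of length <= max_len.

    Assumption: the result does not depend on the order of deletions for
    max_len <= 3; this sketch simply takes the first repeat it finds.
    """
    changed = True
    while changed:
        changed = False
        for k in range(1, max_len + 1):
            for i in range(len(s) - 2 * k + 1):
                if s[i:i + k] == s[i + k:i + 2 * k]:
                    s = s[:i + k] + s[i + 2 * k:]  # drop the second copy
                    changed = True
                    break
            if changed:
                break
    return s


# Duplication only: deduplication recovers the stored word.
stored = "ACGT"
dup_only = tandem_duplicate(stored, 1, 2)        # "ACGCGT"
recovered = dedup_root(dup_only)                 # "ACGT"

# Substitution followed by duplications of the substituted symbol.
noisy = substitute(stored, 2, "A")               # "ACAT"
noisy = tandem_duplicate(noisy, 2, 1)            # "ACAAT"
noisy = tandem_duplicate(noisy, 1, 2)            # "ACACAAT"
noisy_root = dedup_root(noisy)                   # "ACAT", not "ACGT"
```

In this example the substituted symbol is copied into several positions of the retrieved word, and deduplication removes the copies but not the substitution itself, so a further mechanism is needed to correct it.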
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression · Cellular Automata and Applications
