Low-redundancy codes for correcting multiple short-duplication and edit errors
Yuanyuan Tang, Shuche Wang, Hao Lou, Ryan Gabrys, and Farzad Farnoud

TL;DR
This paper introduces error-correcting codes for DNA storage that can simultaneously correct short tandem duplications and multiple edit errors, achieving high data density with efficient encoding and decoding.
Contribution
It presents the first codes capable of correcting both short duplications and multiple edits in DNA data storage, with near-optimal redundancy and polynomial-time algorithms.
Findings
Codes correct short duplications together with up to p edits, at low redundancy.
Correcting the p edits costs roughly 8p(log_q n) additional symbols of redundancy compared with codes for duplications only.
Encoding and decoding are polynomial-time for constant p.
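To get a feel for the 8p(log_q n) figure above, the leading term can be evaluated for concrete parameters. This is only a back-of-the-envelope sketch: the function name is hypothetical, lower-order terms hidden by "roughly" are ignored, and the default q = 4 assumes the four-letter DNA alphabet.

```python
import math


def extra_redundancy_symbols(p: int, n: int, q: int = 4) -> float:
    """Leading term 8 * p * log_q(n) of the additional redundancy, in q-ary symbols.

    p: number of edits to correct (assumed constant)
    n: code length in symbols
    q: alphabet size (assumption: 4 for DNA)
    Lower-order terms are deliberately ignored.
    """
    if p < 0 or n < 2 or q < 2:
        raise ValueError("expected p >= 0, n >= 2, q >= 2")
    return 8 * p * math.log(n, q)


if __name__ == "__main__":
    # Compare the leading term against the code length for a few settings.
    for p in (1, 2, 3):
        for n in (4 ** 5, 4 ** 10):
            r = extra_redundancy_symbols(p, n)
            print(f"p={p}, n={n}: about {r:.0f} extra symbols ({r / n:.2%} of n)")
```

Because the term grows only logarithmically in n, its share of the code length shrinks as n grows, which is consistent with the claim that the edit-correcting capability is added at a modest cost.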
Abstract
Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current paper constructs error-correcting codes for simultaneously correcting short (tandem) duplications and at most p edits, where a short duplication generates a copy of a substring with length at most 3 and inserts the copy following the original substring, and an edit is a substitution, deletion, or insertion. Compared to the state-of-the-art codes for duplications only, the proposed codes correct up to p edits (in addition to duplications) at the additional cost of roughly 8p(log_q n) symbols of redundancy, thus achieving the same…
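The error model in the abstract can be illustrated with a minimal sketch of the channel operations. This is not the paper's code construction or decoder, only the errors it targets. All function names are hypothetical, and the bound MAX_DUP_LEN = 3 on the duplicated substring is an assumption about what "short" means here.

```python
MAX_DUP_LEN = 3  # assumption: a "short" duplication copies at most 3 symbols


def tandem_duplicate(x: str, start: int, length: int) -> str:
    """Copy x[start:start+length] and insert the copy right after the original."""
    if not 1 <= length <= MAX_DUP_LEN:
        raise ValueError("duplication length must be between 1 and MAX_DUP_LEN")
    end = start + length
    if start < 0 or end > len(x):
        raise ValueError("duplicated substring must lie inside x")
    return x[:end] + x[start:end] + x[end:]


# The three kinds of edit named in the abstract.
def substitute(x: str, i: int, symbol: str) -> str:
    """Replace the symbol at position i."""
    return x[:i] + symbol + x[i + 1:]


def delete(x: str, i: int) -> str:
    """Remove the symbol at position i."""
    return x[:i] + x[i + 1:]


def insert(x: str, i: int, symbol: str) -> str:
    """Insert a symbol before position i."""
    return x[:i] + symbol + x[i:]


if __name__ == "__main__":
    word = "ACGT"
    dup = tandem_duplicate(word, 1, 2)  # duplicates "CG"
    noisy = substitute(dup, 0, "T")     # one edit on top of the duplication
    print(word, "->", dup, "->", noisy)
```

A code for this setting has to recover the stored word when any number of such duplications and up to p edits are applied, in any order.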
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Advanced Data Storage Technologies
