Optimal Codes Correcting a Single Indel / Edit for DNA-Based Data Storage
Kui Cai, Yeow Meng Chee, Ryan Gabrys, Han Mao Kiah, and Tuan Thanh Nguyen

TL;DR
This paper develops efficient coding schemes for correcting single indels or edits in DNA-based data storage, achieving near-optimal redundancy and introducing the first GC-balanced codes capable of correcting such errors.
Contribution
It gives order-optimal linear-time encoders for single-indel and single-edit correction over the quaternary alphabet, the edit-correcting encoder being the first known of its kind, and it adds GC-balanced codes for DNA storage.
Findings
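Linear-time quaternary encoders correct a single edit with log n + O(log log n) redundant bits and a single indel with log n + 2 redundant bits, both order-optimal.
New GC-balanced codes can correct a single indel or edit in DNA sequences.
The single-indel encoder uses at least four fewer redundant bits than the best previously known encoder of Tenengolts (1984).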
Abstract
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion, or substitution. In this paper, we investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with log n + O(log log n) redundant bits, while the other corrects a single indel with log n + 2 redundant bits. These two encoders are order-optimal. The former encoder is the first known order-optimal encoder that corrects a single edit, while the latter encoder (that corrects a single indel) reduces the redundancy of the best known encoder of Tenengolts (1984) by at least four bits. Over the DNA alphabet, we impose an additional constraint, the GC-balanced constraint, and require that exactly…
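The definitions of indel and edit in the abstract can be made concrete with a small brute-force sketch. It is not the paper's encoder. It only enumerates the words reachable from a given word by at most one indel or at most one edit, and then checks whether a toy code keeps those sets disjoint, which is the condition for correcting the error. The names `indel_ball`, `edit_ball`, and `corrects` are hypothetical, chosen here for illustration, and the choice to include the unchanged word in each set is an assumption.

```python
from itertools import combinations

DNA = "ACGT"  # quaternary alphabet, written with DNA symbols


def indel_ball(word, alphabet=DNA):
    """Words reachable from `word` by at most one insertion or deletion."""
    out = {word}  # assumption: the error-free word is included
    for i in range(len(word)):  # single deletions
        out.add(word[:i] + word[i + 1:])
    for i in range(len(word) + 1):  # single insertions
        for s in alphabet:
            out.add(word[:i] + s + word[i:])
    return out


def edit_ball(word, alphabet=DNA):
    """Words reachable by at most one insertion, deletion, or substitution."""
    out = indel_ball(word, alphabet)
    for i in range(len(word)):  # single substitutions
        for s in alphabet:
            out.add(word[:i] + s + word[i + 1:])
    return out


def corrects(code, ball):
    """True if the balls around distinct codewords are pairwise disjoint."""
    return all(ball(u).isdisjoint(ball(v)) for u, v in combinations(code, 2))


toy_code = ["AA", "CC"]  # hypothetical two-word code
print(corrects(toy_code, indel_ball))  # single-indel correction
print(corrects(toy_code, edit_ball))   # single-edit correction
```

In this toy case the code survives a single indel but not a single edit, because one substitution turns either codeword into "AC". That gap is why the edit-correcting encoder needs more redundancy than the indel-correcting one.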
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Modular Robots and Swarm Intelligence
