Error-Correcting Codes for Labeled DNA Sequences
Dganit Hanania, Eitan Yaakobi

TL;DR
This paper develops error-correcting codes for labeled DNA sequences, enabling accurate DNA visualization and analysis by correcting substitution, insertion, and deletion errors, with explicit encoders and bounds for different labeling schemes.
Contribution
It introduces error-correcting codes tailored for labeled DNA sequences, with bounds and explicit systematic encoders for single substitution, insertion, and deletion errors under two sets of length-two labels.
Findings
Established bounds for error correction in labeled DNA sequences.
Constructed explicit systematic encoders for single errors.
Analyzed two labeling schemes for DNA sequence recovery.
Abstract
Labeling of DNA molecules is a fundamental technique for DNA visualization and analysis. This process was mathematically modeled in [1], where the received sequence indicates the positions of the used labels. In this work, we develop error-correcting codes for labeled DNA sequences, establishing bounds and constructing explicit systematic encoders for single substitution, insertion, and deletion errors. We focus on two cases: (1) using the complete set of length-two labels and (2) using the minimal set of length-two labels that ensures the recovery of DNA sequences from their labeling for 'almost' all DNA sequences.
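To make the labeling idea concrete, here is a minimal Python sketch of how a received sequence could indicate label positions. It rests on an assumption that is not stated in the abstract: each position holds the 1-based index of the label that starts there, or 0 if no label starts there. The function name `label_sequence` and this exact output convention are hypothetical, and the model in [1] may differ in detail.

```python
from itertools import product


def label_sequence(dna: str, labels: list[str]) -> list[int]:
    """Illustrative labeling (assumed convention, not taken from the paper).

    Entry i is j + 1 if labels[j] occurs in dna starting at position i,
    and 0 if none of the labels starts there.
    """
    received = [0] * len(dna)
    for i in range(len(dna)):
        for j, label in enumerate(labels):
            if dna.startswith(label, i):
                received[i] = j + 1
                break  # distinct equal-length labels cannot both start at i
    return received


# Case (1) of the abstract: the complete set of 16 length-two labels.
COMPLETE_LENGTH_TWO = ["".join(p) for p in product("ACGT", repeat=2)]

if __name__ == "__main__":
    print(label_sequence("ACGT", ["AC", "GT"]))
    print(label_sequence("ACGT", COMPLETE_LENGTH_TWO))
```

Under this convention, the complete length-two label set marks every position except the last, since every pair of adjacent bases is itself a label.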
Taxonomy
Topics: DNA and Biological Computing · Fractal and DNA sequence analysis · Advanced biosensing and bioanalysis techniques
