
TL;DR
This paper introduces a modular strategy for building DNA data storage codes from weighted finite-state transducers. The resulting codes combine error correction, sequence constraints that avoid short repeats, and reliable decoding despite the errors characteristic of DNA sequencing technologies.
Contribution
A novel modular approach that builds efficient, error-resilient codes for DNA data encoding and decoding by serially composing weighted finite-state transducers, accompanied by a software implementation and simulation results.
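The core idea of serial composition, feeding each symbol emitted by one transducer into the next, can be sketched in Python. This minimal sketch assumes deterministic, unweighted transducers (the paper uses weighted ones), and both components are hypothetical toys: a block code packing 3 bits into 2 trits, and a rotation code that picks each base from the three bases differing from the previous one. The rotation code is a standard homopolymer-avoiding trick, not necessarily the paper's construction.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, Tuple


# A deterministic, unweighted sequential transducer (a simplifying assumption):
# step(state, symbol) returns (next_state, list_of_emitted_symbols).
@dataclass
class Transducer:
    start: Hashable
    step: Callable[[Hashable, object], Tuple[Hashable, List[object]]]

    def run(self, inputs: Iterable[object]) -> List[object]:
        state, out = self.start, []
        for sym in inputs:
            state, emitted = self.step(state, sym)
            out.extend(emitted)
        return out


def compose(t1: Transducer, t2: Transducer) -> Transducer:
    """Serial composition: every symbol emitted by t1 is fed straight into t2.
    The composed machine's state is the pair (t1 state, t2 state)."""
    def step(state, sym):
        s1, s2 = state
        s1, mid = t1.step(s1, sym)
        out = []
        for m in mid:
            s2, emitted = t2.step(s2, m)
            out.extend(emitted)
        return (s1, s2), out
    return Transducer((t1.start, t2.start), step)


# Hypothetical component 1: buffer 3 bits, then emit 2 trits (2^3 = 8 <= 3^2 = 9).
def bits_to_trits_step(buf, bit):
    buf = buf + (bit,)
    if len(buf) < 3:
        return buf, []
    value = buf[0] * 4 + buf[1] * 2 + buf[2]  # integer in 0..7
    return (), [value // 3, value % 3]


# Hypothetical component 2: the state is the previous base; each trit selects
# one of the three bases that differ from it, so no homopolymer run can form.
BASES = "ACGT"


def trits_to_dna_step(prev, trit):
    choices = [b for b in BASES if b != prev]
    base = choices[trit]
    return base, [base]


block_code = Transducer((), bits_to_trits_step)
rotation_code = Transducer("A", trits_to_dna_step)  # "A" is an arbitrary start state
encoder = compose(block_code, rotation_code)
```

For example, `"".join(encoder.run([1, 0, 1, 1, 1, 0, 0, 0, 0]))` yields `"GTGACA"`, the same result as running the two components one after the other. Trailing bits that do not fill a block stay buffered, so a practical encoder would need padding; and this toy rotation code rules out homopolymers only, not the dinucleotide and trinucleotide runs the paper's transducers are described as avoiding.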
Findings
Generated DNA sequences are free of short repeats
The codes can correct substitutions, local duplications, and burst deletions
Decoding remains accurate despite sequencing errors
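One way to make "free of short repeats" testable is to scan for a unit of one to three bases that is immediately repeated, which covers the homopolymer, dinucleotide, and trinucleotide runs named in the abstract. The function name and this exact criterion are assumptions for illustration; the paper's own definition of a short local repeat may be broader.

```python
from typing import Optional, Tuple


def find_short_tandem_repeat(seq: str, max_unit: int = 3) -> Optional[Tuple[int, str]]:
    """Hypothetical checker: return (position, unit) for the first place where
    a unit of length 1..max_unit is immediately repeated (e.g. 'AA', 'ACAC',
    'ACGACG'), or None if the sequence has no such short tandem repeat."""
    for i in range(len(seq)):
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            # Require a full-length unit followed directly by an identical copy.
            if len(unit) == k and seq[i + k:i + 2 * k] == unit:
                return i, unit
    return None
```

For instance, `find_short_tandem_repeat("ACGTT")` reports the homopolymer `(3, "T")`, while `"GTGACA"` passes with `None`.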
Abstract
We describe a strategy for constructing codes for DNA-based information storage by serial composition of weighted finite-state transducers. The resulting state machines can integrate correction of substitution errors; synchronization by interleaving watermark and periodic marker signals; conversion from binary to ternary, quaternary or mixed-radix sequences via an efficient block code; encoding into a DNA sequence that avoids homopolymer, dinucleotide, or trinucleotide runs and other short local repeats; and detection/correction of errors (including local duplications, burst deletions, and substitutions) that are characteristic of DNA sequencing technologies. We present software implementing these codes, available at github.com/ihh/dnastore, with simulation results demonstrating that the generated DNA is free of short repeats and can be accurately decoded even in the presence of…
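Why a block code can make the binary-to-ternary conversion efficient is easiest to see with a worked count. Packing $b$ bits into $t$ trits requires every bit pattern to have a distinct trit pattern, i.e. $2^b \le 3^t$; the block sizes below are illustrative choices, not parameters reported in the paper.

```latex
\[
2^{b} \le 3^{t}, \qquad b = 11,\ t = 7:\quad 2^{11} = 2048 \le 3^{7} = 2187
\]
\[
\text{rate} = \frac{b}{t} = \frac{11}{7} \approx 1.571 \ \text{bits/trit},
\qquad
\text{efficiency} = \frac{b}{t \log_2 3} \approx \frac{11}{7 \times 1.585} \approx 0.991
\]
```

By comparison, a 3-bit to 2-trit block gives an efficiency of only $3 / (2 \log_2 3) \approx 0.946$, so longer blocks approach the $\log_2 3$ bits-per-trit capacity more closely.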
