Coding for Composite DNA to Correct Substitutions, Strand Losses, and Deletions

Frederik Walter; Omer Sabary; Antonia Wachter-Zeh; Eitan Yaakobi

arXiv:2404.12868·cs.IT·October 29, 2025

Open Access

TL;DR

This paper models the synthesis and sequencing of composite DNA and develops codes that correct substitutions, strand losses, and deletions, with non-asymptotic upper bounds on code size and explicit constructions that can achieve them.

Contribution

The paper introduces coding techniques for composite DNA storage, with upper bounds and explicit constructions for correcting substitutions, strand losses, and deletions.

Findings

01 Derived non-asymptotic upper bounds on the size of codes for $t$ occurrences of substitutions, strand losses, or deletions

02 Presented explicit code constructions that can achieve these bounds

03 Provided error-correcting codes aimed at more reliable composite DNA data storage

Abstract

Composite DNA is a recent method to increase the base alphabet size in DNA-based data storage. This paper models synthesizing and sequencing of composite DNA and introduces coding techniques to correct substitutions, losses of entire strands, and symbol deletion errors. Non-asymptotic upper bounds on the size of codes with $t$ occurrences of these error types are derived. Explicit constructions are presented which can achieve the bounds.
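To make the phrase "increase the base alphabet size" concrete, here is a minimal Python sketch. It rests on an assumption not stated in the abstract: a composite letter is taken to be a mixture of the four bases A, C, G, T in a fixed ratio, written as four non-negative integers that sum to a chosen resolution. The names `composite_letters`, `alphabet_size`, and `resolution` are hypothetical, and this is general background rather than the paper's channel model.

```python
from itertools import product
from math import comb

BASES = "ACGT"  # standard DNA alphabet


def composite_letters(resolution):
    """List every ratio vector (a, c, g, t) of non-negative integers
    summing to `resolution` (assumed definition of a composite letter)."""
    return [
        ratios
        for ratios in product(range(resolution + 1), repeat=len(BASES))
        if sum(ratios) == resolution
    ]


def alphabet_size(resolution):
    """Closed form by stars and bars: C(resolution + 3, 3)."""
    return comb(resolution + 3, 3)


if __name__ == "__main__":
    for k in (1, 2, 6):
        letters = composite_letters(k)
        # The enumeration and the closed form should agree.
        assert len(letters) == alphabet_size(k)
        print(f"resolution {k}: {len(letters)} composite letters")
```

Under this assumption, resolution 1 recovers the ordinary four-letter alphabet, while resolution 6 already yields 84 distinct letters.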

Taxonomy

Topics: DNA and Biological Computing · DNA and Nucleic Acid Chemistry · RNA and protein synthesis mechanisms