Coding over Sets for DNA Storage

Andreas Lenz; Paul H. Siegel; Antonia Wachter-Zeh; Eitan Yaakobi

arXiv:1801.04882·cs.IT·May 10, 2018

TL;DR

This paper develops error-correcting codes for DNA data storage, where data is modeled as an unordered set of sequences. The codes correct losses of whole sequences and point errors inside sequences, and many of the constructions come close to sphere-packing upper bounds.

Contribution

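It introduces code constructions for DNA storage that correct losses of whole sequences and point errors within sequences (substitutions, insertions, and deletions), together with sphere-packing upper bounds on code cardinality showing that many of the constructions are close to optimal.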

Findings

01

The codes correct losses of whole sequences and point errors inside sequences, such as substitutions, insertions, and deletions.

02

Many of the proposed codes are close to upper bounds on code cardinality derived from sphere-packing arguments.

03

Efficient encoding and decoding algorithms are developed.

Abstract

In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where data is represented by an unordered set of $M$ sequences, each of length $L$. Errors within that model are losses of whole sequences and point errors inside the sequences, such as substitutions, insertions and deletions. We propose code constructions which can correct these errors with efficient encoders and decoders. By deriving upper bounds on the cardinalities of these codes using sphere packing arguments, we show that many of our codes are close to optimal.
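The storage model in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes a four-letter alphabet and that the $M$ stored sequences are distinct, and the names `num_stored_sets` and `toy_channel` are hypothetical. It counts how many unordered sets exist for given $M$ and $L$, then simulates only two of the error types mentioned (loss of whole sequences and substitutions).

```python
import random
from math import comb

ALPHABET = "ACGT"  # assumption: four-letter DNA alphabet


def num_stored_sets(M, L, q=4):
    """Count unordered sets of M distinct sequences of length L over q symbols."""
    return comb(q ** L, M)


def toy_channel(stored, num_losses, num_substitutions, rng):
    """Hypothetical toy channel: drop whole sequences, then substitute symbols.

    Insertions and deletions are not modeled in this sketch.
    """
    # Losing a sequence removes it from the set; order carries no information.
    survivors = rng.sample(sorted(stored), len(stored) - num_losses)
    received = [list(s) for s in survivors]
    if received:
        for _ in range(num_substitutions):
            seq = rng.choice(received)
            pos = rng.randrange(len(seq))
            # Replace the symbol with a different one from the alphabet.
            seq[pos] = rng.choice([c for c in ALPHABET if c != seq[pos]])
    # The receiver again sees an unordered set, so duplicates collapse.
    return frozenset("".join(s) for s in received)


if __name__ == "__main__":
    rng = random.Random(0)
    stored = frozenset({"ACGT", "TTGA", "CCAG", "GATC"})  # M = 4, L = 4
    print(num_stored_sets(M=4, L=4))
    print(toy_channel(stored, num_losses=1, num_substitutions=2, rng=rng))
```

Because the receiver sees a set rather than an ordered list, the number of storable messages under these assumptions is a binomial coefficient rather than a power of the alphabet size, which is why counting arguments over sets matter for the bounds.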
