# Coding over Sets for DNA Storage

**Authors:** Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

arXiv: 1812.02936 · 2020-02-13

## TL;DR

This paper develops error-correcting codes for DNA data storage, where data is modeled as an unordered set of sequences. It derives bounds on code size and gives explicit constructions, many of them close to optimal, for correcting sequence loss and point errors.

## Contribution

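The paper derives bounds on the achievable cardinality of error-correcting codes in an unordered-set storage model, and gives explicit constructions with efficient encoding and decoding that handle both sequence loss and point errors (insertions, deletions, and substitutions).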

## Key findings

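- Derives Gilbert-Varshamov lower bounds on achievable code cardinalities
- Establishes sphere packing upper bounds on achievable code cardinalities
- Proposes explicit, efficiently encodable and decodable codes, many of which are close to the upper bounds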

## Abstract

In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of $M$ sequences, each of length $L$. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
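
The storage model in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the authors' construction: it assumes a four-letter alphabet `ACGT`, models only substitutions as point errors (insertions and deletions are left out for brevity), and uses hypothetical names (`random_codeword`, `channel`, `max_information_bits`) and parameters (`s` lost sequences, `t` erroneous sequences) chosen purely for illustration. It also computes $\log_2 \binom{q^L}{M}$, the number of bits an unordered set of $M$ distinct length-$L$ sequences over a $q$-ary alphabet can carry before any redundancy is added.

```python
import random
from math import comb, log2


def random_codeword(M, L, alphabet="ACGT", rng=random):
    """Draw an unordered set of M distinct sequences, each of length L."""
    if M > len(alphabet) ** L:
        raise ValueError("fewer than M distinct sequences of length L exist")
    seqs = set()
    while len(seqs) < M:
        seqs.add("".join(rng.choice(alphabet) for _ in range(L)))
    return frozenset(seqs)


def channel(stored, s, t, alphabet="ACGT", rng=random):
    """Illustrative channel: lose s sequences, then substitute one symbol
    in each of t surviving sequences. Ordering carries no information,
    so the output is again a set."""
    survivors = rng.sample(sorted(stored), len(stored) - s)
    hit = set(rng.sample(range(len(survivors)), t))
    received = set()
    for i, seq in enumerate(survivors):
        if i in hit:
            pos = rng.randrange(len(seq))
            new = rng.choice([a for a in alphabet if a != seq[pos]])
            seq = seq[:pos] + new + seq[pos + 1:]
        # An erroneous sequence may collide with another survivor,
        # in which case the received set shrinks further.
        received.add(seq)
    return frozenset(received)


def max_information_bits(M, L, q=4):
    """log2 of the number of M-subsets of the q**L possible sequences."""
    return log2(comb(q ** L, M))


if __name__ == "__main__":
    rng = random.Random(0)
    stored = random_codeword(M=8, L=10, rng=rng)
    received = channel(stored, s=2, t=1, rng=rng)
    print(len(stored), len(received))
    print(max_information_bits(M=8, L=10))
```

Because the received object is a set, a decoder sees neither the original order of the sequences nor which ones were lost or corrupted, which is what separates this model from a conventional array of codewords.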

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.02936/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1812.02936/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1812.02936/full.md

---
Source: https://tomesphere.com/paper/1812.02936