Codes for DNA Sequence Profiles
Han Mao Kiah, Gregory J. Puleo, Olgica Milenkovic

TL;DR
This paper studies how to design DNA sequences that can be told apart by their collections of substrings, and introduces asymmetric coding techniques to combat synthesis and sequencing noise in DNA storage.
Contribution
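It connects the sequence reconstruction problem with DNA synthesis and sequencing, introduces the notion of a DNA storage channel, proposes asymmetric coding methods, and analyzes the number of sequence equivalence classes using restricted de Bruijn graphs and Ehrhart theory for rational polytopes.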
Findings
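Proposes new asymmetric coding techniques to combat synthesis and sequencing noise in DNA storage.
Analyzes the number of sequence equivalence classes under the DNA storage channel mapping.
Uses restricted de Bruijn graphs and Ehrhart theory for rational polytopes in the analysis.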
Abstract
We consider the problem of storing and retrieving information from synthetic DNA media. The mathematical basis of the problem is the construction and design of sequences that may be discriminated based on their collection of substrings observed through a noisy channel. This problem of reconstructing sequences from traces was first investigated in the noiseless setting under the name of "Markov type" analysis. Here, we explain the connection between the reconstruction problem and the problem of DNA synthesis and sequencing, and introduce the notion of a DNA storage channel. We analyze the number of sequence equivalence classes under the channel mapping and propose new asymmetric coding techniques to combat the effects of synthesis and sequencing noise. In our analysis, we make use of restricted de Bruijn graphs and Ehrhart theory for rational polytopes.
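To make the idea of discriminating sequences by their collection of substrings concrete, here is a minimal sketch in Python. It assumes the profile of a sequence is the multiset of its length-`ell` substrings, and that two sequences are equivalent when their profiles coincide; this is one plausible noiseless reading of the abstract, not necessarily the paper's exact definition. The names `profile` and `same_class` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def profile(seq: str, ell: int) -> Counter:
    """Return the multiset of all length-ell substrings of seq.

    Assumption: this models a noiseless "sequence profile"; the paper's
    channel additionally accounts for synthesis and sequencing noise.
    """
    if ell <= 0:
        raise ValueError("ell must be positive")
    # Slide a window of width ell across the sequence and count each substring.
    return Counter(seq[i:i + ell] for i in range(len(seq) - ell + 1))


def same_class(a: str, b: str, ell: int) -> bool:
    """Two sequences are indistinguishable if their profiles are equal."""
    return profile(a, ell) == profile(b, ell)


if __name__ == "__main__":
    # Both sequences contain exactly the 2-grams AC, CA, AG, GA once each.
    print(same_class("ACAGA", "AGACA", 2))
    # With longer substrings the two sequences can be told apart.
    print(same_class("ACAGA", "AGACA", 3))
```

Under this assumption, `ACAGA` and `AGACA` fall into the same equivalence class for `ell = 2` but into different classes for `ell = 3`, which illustrates why counting equivalence classes matters when choosing codewords: only one sequence per class can be used to carry distinct information.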
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression · Cellular Automata and Applications
