Representing Information on DNA using Patterns Induced by Enzymatic Labeling
Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi, Zohar Yakhini

TL;DR
This paper introduces a formal framework for representing information on DNA through patterns induced by enzymatic labeling, and gives upper bounds on code size together with an encoder-decoder pair that is optimal under specific conditions.
Contribution
The paper presents a formal model of DNA labeling for data storage, upper bounds on the maximal code size, and an efficient encoder-decoder pair proven optimal under specific conditions.
Findings
Upper bounds on code sizes for DNA labeling channels
Development of an optimal encoder-decoder pair
Analysis of fixed-length label constraints
Abstract
Enzymatic DNA labeling is a powerful tool with applications in biochemistry, molecular biology, biotechnology, medical science, and genomic research. This paper contributes to the evolving field of DNA-based data storage by presenting a formal framework for modeling DNA labeling in strings, specifically tailored for data storage purposes. Our approach involves a known DNA molecule as a template for labeling, employing patterns induced by a set of designed labels to represent information. One hypothetical implementation can use CRISPR-Cas9 and gRNA reagents for labeling. Various aspects of the general labeling channel, including fixed-length labels, are explored, and upper bounds on the maximal size of the corresponding codes are given. The study includes the development of an efficient encoder-decoder pair that is proven optimal in terms of maximum code size under specific conditions.
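To make the idea of a labeling pattern concrete, here is a minimal Python sketch. It is not the paper's formal channel definition: it assumes, for illustration only, that each label marks every template position covered by one of its occurrences, and that the observed pattern is the union of these marks. The function name `label_pattern` and the example template are hypothetical.

```python
from typing import Iterable, List


def label_pattern(template: str, labels: Iterable[str]) -> List[int]:
    """Return a 0/1 list marking template positions covered by any label occurrence.

    Assumption (not taken from the paper): a label marks all positions of
    every occurrence of it in the template, including overlapping occurrences.
    """
    pattern = [0] * len(template)
    for label in labels:
        if not label:
            continue  # an empty label marks nothing
        start = template.find(label)
        while start != -1:
            for i in range(start, start + len(label)):
                pattern[i] = 1
            # Step by one position so overlapping occurrences are found too.
            start = template.find(label, start + 1)
    return pattern


# Hypothetical known template; the information is carried by the choice of labels.
template = "ACGTACGGAC"
print(label_pattern(template, ["AC"]))        # [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
print(label_pattern(template, ["GG"]))        # [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
print(label_pattern(template, ["AC", "GG"]))  # [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
```

Under these assumptions the template stays fixed and different label sets yield different binary patterns, which is the sense in which a set of designed labels can represent information.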
Taxonomy
Topics: DNA and Biological Computing · Fractal and DNA sequence analysis · Genetics, Bioinformatics, and Biomedical Research
