A Combinatorial Perspective on Random Access Efficiency for DNA Storage
Anina Gruica, Daniella Bar-Lev, Alberto Ravagnani, Eitan Yaakobi

TL;DR
This paper analyzes the fundamental limits of random access in DNA storage, introducing combinatorial techniques for designing generator matrices that minimize the expected number of reads needed to retrieve a requested information strand.
Contribution
It develops new formulas and structural insights for generator matrices, including recovery balanced codes, to improve random access efficiency in DNA data storage.
Findings
Derived formulas for maximum expected reads based on matrix structure
Identified conditions for recovery balanced codes
Optimized matrix designs for improved random access performance
Abstract
We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of k information strands, which are encoded into n strands via a generator matrix G. During the sequencing process, the strands are read uniformly at random, as each strand is available in a large number of copies. In this context, the random access coverage depth problem refers to the expected number of reads (i.e., sequenced strands) required to decode a specific information strand requested by the user. This problem heavily depends on the generator matrix G, and besides computing the expectation for different choices of G, the goal is to construct matrices that minimize the maximum expectation over all possible requested information strands, denoted by T_max(G). In this paper,…
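The quantity described in the abstract can be approximated numerically. The sketch below is a Monte Carlo estimate, not the combinatorial method of the paper, and it rests on assumptions that are not in the source: the field is taken to be GF(2), each encoded strand is a column of the generator matrix stored as an integer bitmask (bit j set means information strand j contributes), and all function names are hypothetical. A requested strand counts as decodable once its unit vector lies in the span of the columns read so far.

```python
import random


def reduce_vector(vec, basis):
    """Reduce vec against an XOR basis {pivot_bit: vector} over GF(2)."""
    while vec:
        pivot = vec.bit_length() - 1
        if pivot not in basis:
            return vec
        vec ^= basis[pivot]  # clears the current highest bit
    return 0


def insert_vector(basis, vec):
    """Add vec to the basis if it is independent of the vectors already stored."""
    residue = reduce_vector(vec, basis)
    if residue:
        basis[residue.bit_length() - 1] = residue


def reads_until_decodable(columns, target, rng):
    """Draw columns uniformly with replacement until target is in their span."""
    basis = {}
    reads = 0
    while reduce_vector(target, basis) != 0:
        insert_vector(basis, rng.choice(columns))
        reads += 1
    return reads


def estimate_expected_reads(columns, strand, trials=20000, seed=0):
    """Monte Carlo estimate of the expected reads needed to decode one strand."""
    target = 1 << strand
    # Guard (assumption of this sketch): the strand must be recoverable at all.
    full_basis = {}
    for col in columns:
        insert_vector(full_basis, col)
    if reduce_vector(target, full_basis) != 0:
        raise ValueError("requested strand is not in the column span")
    rng = random.Random(seed)
    total = sum(reads_until_decodable(columns, target, rng) for _ in range(trials))
    return total / trials


def estimate_t_max(columns, k, trials=20000, seed=0):
    """Estimate the maximum expected number of reads over all k strands."""
    return max(estimate_expected_reads(columns, i, trials, seed) for i in range(k))
```

For example, `estimate_t_max([0b001, 0b010, 0b100], 3)` models an uncoded scheme with k = n = 3. Each read hits the requested strand with probability 1/3, so the estimate should land close to 3.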
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Algorithms and Data Compression
