The Random Variables of the DNA Coverage Depth Problem
Şeyma Bodur, Stefano Lia, Hiram H. López, Rati Ludhani, Alberto Ravagnani, Lisa Seccia

TL;DR
This paper analyzes the statistical properties of coverage depth in DNA data storage, providing new insights into code performance and optimizing code constructions for efficient data retrieval.
Contribution
It offers the first detailed distribution analysis of coverage depth variables and refines performance bounds for specific code constructions in DNA storage.
Findings
Asymptotic performance of a recent code construction is established.
A geometric code based on balanced quasi-arcs is optimized.
Distribution analysis distinguishes between similar code behaviors.
Abstract
DNA data storage systems encode digital data into DNA strands, enabling dense and durable storage. Efficient data retrieval depends on coverage depth, a key performance metric. We study the random access coverage depth problem and focus on minimizing the expected number of reads needed to recover information strands encoded via a linear code. We compute the asymptotic performance of a recently proposed code construction, establishing and refining a conjecture in the field by giving two independent proofs. We also analyze a geometric code construction based on balanced quasi-arcs and optimize its parameters. Finally, we investigate the full distribution of the random variables that arise in the coverage depth problem, of which the traditionally studied expectation is just the first moment. This allows us to distinguish between code constructions that, at first glance, may appear to…
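As a rough illustration of the quantity studied in the abstract, the sketch below estimates by Monte Carlo how many reads are needed to recover one information strand. It assumes reads are drawn uniformly at random, with replacement, from the encoded strands, and it restricts to binary linear codes for simplicity. Each column of the generator matrix is stored as an integer bitmask. The function names and the two toy codes are illustrative assumptions, not taken from the paper.

```python
import random
import statistics


def reduce(basis, v):
    """Reduce bitmask v against an XOR basis keyed by leading bit (GF(2) elimination)."""
    while v:
        h = v.bit_length() - 1
        if h not in basis:
            return v
        v ^= basis[h]
    return 0


def reads_until_recovered(columns, target, rng):
    """Count uniform reads (with replacement) until target lies in the span of the columns read."""
    basis = {}
    reads = 0
    while True:
        reads += 1
        c = reduce(basis, rng.choice(columns))
        if c:
            # The read column is independent of earlier reads: extend the basis.
            basis[c.bit_length() - 1] = c
        if reduce(basis, target) == 0:
            return reads


def sample_reads(columns, target, trials, seed=0):
    """Draw independent samples of the number of reads needed to recover target."""
    rng = random.Random(seed)
    return [reads_until_recovered(columns, target, rng) for _ in range(trials)]


if __name__ == "__main__":
    # Hypothetical toy codes with k = 2 information strands; target is the first strand.
    toy_codes = {
        "identity (n = 2)": [0b01, 0b10],
        "with one parity strand (n = 3)": [0b01, 0b10, 0b11],
    }
    for name, cols in toy_codes.items():
        s = sample_reads(cols, 0b01, trials=20000)
        # The mean estimates the expectation; the variance is one further moment
        # of the same random variable.
        print(name, statistics.mean(s), statistics.pvariance(s))
```

Under this sampling model, the uncoded case with k strands is a geometric waiting time with mean k, which gives a simple sanity check. Keeping the raw samples rather than only their mean makes it possible to compare higher moments or the whole empirical distribution across codes.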
