Making it to First: The Random Access Problem in DNA Storage

Avital Boruchovsky; Ohad Elishco; Ryan Gabrys; Anina Gruica; Itzhak Tamo; and Eitan Yaakobi

arXiv:2501.12274·cs.IT·August 26, 2025

Open Access

TL;DR

This paper investigates the random access problem in DNA storage. It gives an optimal code design for k = 2 information strands and a generalized construction for larger k that improves on previous constructions.

Contribution

It fully solves the case of k = 2 information strands and generalizes an earlier construction, specific to k = 3, to any value of k, improving random access efficiency.

Findings

01. The optimal code for k = 2 achieves a random access expectation of approximately 0.914 · k ≈ 1.83 reads, for q large enough.

02. The generalized construction for k≥4 outperforms previous methods in reducing expected reads.

03. The construction uses B_{k-1} sequences over large finite fields, ensuring existence and efficiency.

Abstract

In this paper, we study the Random Access Problem in DNA storage, which addresses the challenge of retrieving a specific information strand from a DNA-based storage system. In this framework, the data is represented by $k$ information strands, which are encoded into $n$ strands using a linear code. Each sequencing read then returns one encoded strand, chosen uniformly at random. The goal under this paradigm is to design codes that minimize the expected number of reads required to recover an arbitrary information strand. We fully solve the case $k = 2$, showing that the best possible code attains a random access expectation of $1 + \frac{2}{\sqrt{2} + 1} \approx 0.914 \cdot 2$ for $q$ large enough. Moreover, we generalize a construction from \cite{GMZ24}, specific to $k = 3$, to any value of $k$. Our construction uses $B_{k - 1}$ sequences over…
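To make the read model in the abstract concrete, the following is a minimal sketch for $k = 2$ only, and not the paper's construction. It assumes each encoded strand is a nonzero column $(x, y)$ over a prime field GF($p$), that reads are drawn uniformly with replacement, and that an information strand is recovered once its unit vector lies in the span of the columns read so far. The function name `expected_reads_k2` and the column multiplicities (29, 29, 41 over GF(43)) are illustrative choices made here, not values from the paper.

```python
from fractions import Fraction


def expected_reads_k2(columns, p):
    """Exact expected number of reads to recover each of the two information strands.

    columns: nonzero pairs (x, y) over GF(p), p prime, one per encoded strand.
    Assumes the columns span GF(p)^2 (at least two non-parallel columns).
    """
    n = len(columns)

    def parallel(u, v):
        # Nonzero vectors are scalar multiples iff their determinant is 0 mod p.
        return (u[0] * v[1] - u[1] * v[0]) % p == 0

    # m[j] = number of encoded strands parallel to column j (itself included).
    m = [sum(parallel(c, d) for d in columns) for c in columns]

    results = []
    for e in [(1, 0), (0, 1)]:
        # If the first read is parallel to e, one read suffices. Otherwise we
        # wait for any read outside span(first read): a geometric wait with
        # mean n / (n - m). Averaging over the first read gives the sum below.
        total = Fraction(1)
        for c, mc in zip(columns, m):
            if not parallel(c, e):
                total += Fraction(1, n - mc)
        results.append(total)
    return results


# Uncoded storage: n = k = 2, each strand stored once.
uncoded = expected_reads_k2([(1, 0), (0, 1)], p=2)

# Illustrative unbalanced code over GF(43): 29 copies of each information
# strand plus 41 pairwise non-parallel combinations (1, t).
a, b, p = 29, 41, 43
design = [(1, 0)] * a + [(0, 1)] * a + [(1, t) for t in range(1, b + 1)]
coded = expected_reads_k2(design, p)

print([float(x) for x in uncoded])  # 2.0 for each strand
print([float(x) for x in coded])    # 449/245, about 1.8327, for each strand
```

Under these assumptions, uncoded storage needs 2 expected reads per strand, while the unbalanced example needs about 1.833, close to the $0.914 \cdot 2 \approx 1.83$ figure quoted in the abstract. The sketch only illustrates why mixing repeated information strands with many distinct combinations can help. It does not reproduce the paper's optimality proof.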

Taxonomy

Topics: DNA and Biological Computing