Reconstruction from Substrings with Partial Overlap
Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

TL;DR
This paper develops a new class of reconstruction codes for DNA data storage in which consecutive substrings are read with a given minimum overlap. It provides upper bounds on the attainable code rates and efficient, asymptotically optimal constructions that guarantee unique reconstruction.
Contribution
It introduces a novel family of codes for reconstructing sequences from overlapping substrings, extending previous models and achieving asymptotic optimality.
Findings
Derived upper bounds on code rates for unique reconstruction
Constructed asymptotically optimal codes meeting these bounds
Bridged the two previously studied extremes: reading all substrings of a fixed length and reading substrings with no overlap
Abstract
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of some fixed length are read or substrings are read with no overlap, this work considers the setup in which consecutive substrings are read with some given minimum overlap. First, upper bounds are provided on the attainable rates of codes that guarantee unique reconstruction. Then, we present efficient constructions of asymptotically optimal codes that meet the upper bound.
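The read model in the abstract can be illustrated with a minimal sketch. This is not the authors' construction or notation: the function name `read_substrings`, the parameters `length`, `starts`, and `min_overlap`, and the choice to return the reads as an unordered multiset are all assumptions made for illustration. The sketch only shows what it means for consecutive fixed-length reads to share at least a given number of symbols.

```python
from collections import Counter


def read_substrings(x, length, starts, min_overlap):
    """Return the multiset of length-`length` substrings of `x` read at `starts`.

    Hypothetical illustration: consecutive reads (ordered by start position)
    must share at least `min_overlap` symbols, otherwise ValueError is raised.
    """
    starts = list(starts)
    # Reads must fit inside the sequence.
    if any(s < 0 or s + length > len(x) for s in starts):
        raise ValueError("a read falls outside the sequence")
    # Start positions must be strictly increasing.
    if any(b <= a for a, b in zip(starts, starts[1:])):
        raise ValueError("start positions must be strictly increasing")
    # Two reads starting at a < b overlap in (a + length) - b symbols.
    for a, b in zip(starts, starts[1:]):
        if (a + length) - b < min_overlap:
            raise ValueError("consecutive reads overlap too little")
    # Assumption: the reader receives the substrings without their positions.
    return Counter(x[s:s + length] for s in starts)


reads = read_substrings("ACGTACGTAC", length=4, starts=[0, 2, 4, 6], min_overlap=2)
print(reads)
```

Under these assumptions, setting `min_overlap = length - 1` forces every start position to advance by one, which corresponds to the extreme case where all substrings of a fixed length are read, while `min_overlap = 0` permits reads that do not overlap at all.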
Taxonomy
Topics: DNA and Biological Computing · Advanced biosensing and bioanalysis techniques · Algorithms and Data Compression
