Unique Reconstruction of Coded Strings from Multiset Substring Spectra
Ryan Gabrys, Olgica Milenkovic

TL;DR
This paper introduces repeat replacement, a coding method for building binary codebooks whose strings can be uniquely reconstructed from their multiset substring spectra. It provides algorithms and constructive redundancy bounds for the scheme, and is motivated by DNA-based data storage.
Contribution
The paper proposes the repeat replacement technique for constructing codes that enable unique string reconstruction from multiset substring spectra, including in noisy settings where substrings are corrupted by burst and random errors.
Findings
Algorithmic solutions for repeat replacement
Constructive redundancy bounds for coding schemes
Extensions to noisy substring spectra scenarios
Abstract
The problem of reconstructing strings from their substring spectra has a long history and, in its simplest incarnation, asks under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. We also consider extensions of the problem to noisy settings in which substrings are compromised by burst and random errors. The study is motivated by applications in DNA-based data storage systems that use high throughput readout…
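To make the uniqueness question in the abstract concrete, here is a minimal Python sketch. It assumes the spectrum is the multiset of all substrings of one fixed length k, which may differ from the paper's exact definition. The function names `substring_spectrum` and `strings_with_same_spectrum` are hypothetical, and the brute-force search is only an illustration, not the paper's repeat replacement method.

```python
from collections import Counter
from itertools import product


def substring_spectrum(s: str, k: int) -> Counter:
    """Multiset of all length-k substrings of s (assumed spectrum definition)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def strings_with_same_spectrum(s: str, k: int) -> list[str]:
    """Brute force: every binary string of len(s) whose spectrum equals that of s.

    Exponential in len(s); only meant for tiny illustrative inputs.
    """
    target = substring_spectrum(s, k)
    candidates = ("".join(bits) for bits in product("01", repeat=len(s)))
    return [t for t in candidates if substring_spectrum(t, k) == target]


if __name__ == "__main__":
    # "0110" is NOT uniquely determined by its length-2 spectrum {01, 11, 10}.
    print(strings_with_same_spectrum("0110", 2))  # ['0110', '1011', '1101']
    # "0011" IS uniquely determined by its length-2 spectrum {00, 01, 11}.
    print(strings_with_same_spectrum("0011", 2))  # ['0011']
```

Under this assumed definition, a codebook that allows unique reconstruction would have to exclude strings such as "0110" (or at least not contain two strings sharing a spectrum), which is the kind of restriction the coded setting imposes.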
