Reconstructing Mixtures of Coded Strings from Prefix and Suffix Compositions
Ryan Gabrys, Srilakshmi Pattabiraman, Olgica Milenkovic

TL;DR
This paper introduces coding methods for unique reconstruction of string mixtures from prefix and suffix composition data, with bounds on code rates, relevant for DNA and polymer data storage applications.
Contribution
It proposes new coding techniques enabling unique joint reconstruction of string subsets from prefix-suffix compositions, with proven bounds on code rates.
Findings
The largest code rate that still permits unique reconstruction of every subcollection of at most h codestrings is 1/h.
The upper and lower bounds on the asymptotic rate of the codebooks match.
Reconstruction is feasible under mild parameter constraints.
Abstract
The problem of string reconstruction from substring information has found many applications due to its relevance in DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry readouts. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide matching upper and lower bounds on the asymptotic rate of the underlying codebooks. Under certain mild constraints on the problem parameters, one can show that the largest possible rate of a codebook that allows for all subcollections of at most h codestrings to be uniquely reconstructable from the prefix-suffix information equals 1/h.
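To make the readout model concrete, the following is a minimal Python sketch of the prefix-suffix composition data described in the abstract. It rests on assumptions not stated on this page: strings are binary, a composition is the pair (number of 0s, number of 1s), and the "union" over a mixture is taken as a multiset sum. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def composition(s):
    """Composition of a binary string: (number of 0s, number of 1s)."""
    return (s.count("0"), s.count("1"))


def prefix_suffix_compositions(s):
    """Multiset of compositions of all nonempty prefixes and suffixes of s."""
    out = Counter()
    for i in range(1, len(s) + 1):
        out[composition(s[:i])] += 1   # prefix of length i
        out[composition(s[-i:])] += 1  # suffix of length i
    return out


def mixture_readout(strings):
    """Assumed readout for a mixture: multiset sum over all strings in it."""
    total = Counter()
    for s in strings:
        total += prefix_suffix_compositions(s)
    return total


# A string and its reversal give the same readout, because reversal swaps
# prefixes with suffixes and compositions ignore symbol order.
same = prefix_suffix_compositions("001") == prefix_suffix_compositions("100")
```

Under these assumptions `same` evaluates to `True`, which illustrates why unrestricted strings cannot all be told apart from this data and why a restricted codebook is needed for unique reconstruction.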
