Reconstructing Mixtures of Coded Strings from Prefix and Suffix Compositions

Ryan Gabrys; Srilakshmi Pattabiraman; Olgica Milenkovic

arXiv:2010.11116 · cs.IT · October 22, 2020

TL;DR

This paper introduces coding methods that allow mixtures of strings to be uniquely reconstructed from their prefix and suffix compositions, and proves matching bounds on the achievable code rates. The problem is motivated by DNA- and polymer-based data storage.

Contribution

It proposes new coding techniques that enable unique joint reconstruction of subsets of codestrings from the union of their prefix and suffix compositions, and it proves matching upper and lower bounds on the asymptotic code rate.

Findings

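1. The largest possible rate of a code for which every subcollection of at most h codestrings is uniquely reconstructable equals 1/h.

2. The upper and lower bounds on the asymptotic rate of the codebooks match.

3. These results hold under mild constraints on the problem parameters.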

Abstract

The problem of string reconstruction from substring information has found many applications due to its relevance in DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry readouts. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide matching upper and lower bounds on the asymptotic rate of the underlying codebooks. Under certain mild constraints on the problem parameters, one can show that the largest possible rate of a codebook that allows for all subcollections of $\leq h$ codestrings to be uniquely reconstructable from the prefix-suffix information equals $1/h$.
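The observation model in the abstract can be made concrete with a small brute-force sketch. It rests on several assumptions that may not match the paper's exact model. A composition is taken to be the symbol counts of a string. Prefixes and suffixes of every length 1..n are assumed to be observed, and the data for a mixture is assumed to be the multiset union over all its strings. A code is called "h-reconstructable" here when no two distinct subsets of at most h codestrings produce the same multiset. The function names `composition`, `prefix_suffix_compositions` and `is_h_reconstructable` are hypothetical. The check enumerates every subset exhaustively, so it only works for tiny codes and says nothing about the asymptotic 1/h rate.

```python
from collections import Counter
from itertools import combinations


def composition(s):
    """Composition of s: how often each symbol occurs, ignoring order.
    Returned as a sorted tuple of (symbol, count) pairs so it is hashable."""
    return tuple(sorted(Counter(s).items()))


def prefix_suffix_compositions(strings):
    """Multiset union of the compositions of every prefix and every suffix
    (lengths 1..n) of every string in the mixture.
    ASSUMPTION: multiset union, and all prefix/suffix lengths are observed."""
    observed = Counter()
    for s in strings:
        for i in range(1, len(s) + 1):
            observed[composition(s[:i])] += 1   # prefix of length i
            observed[composition(s[-i:])] += 1  # suffix of length i
    return observed


def is_h_reconstructable(code, h):
    """Brute-force check (hypothetical helper): every subset of at most h
    distinct codestrings must yield a distinct prefix-suffix multiset.
    Returns (True, None) or (False, (subset_a, subset_b)) for a collision."""
    seen = {}
    for k in range(1, h + 1):
        for subset in combinations(sorted(set(code)), k):
            # A frozenset of (composition, multiplicity) pairs identifies the multiset.
            key = frozenset(prefix_suffix_compositions(subset).items())
            if key in seen:
                return False, (seen[key], subset)
            seen[key] = subset
    return True, None


# A string and its reversal produce identical prefix-suffix data.
print(prefix_suffix_compositions(["0010"]) == prefix_suffix_compositions(["0100"]))  # True
print(is_h_reconstructable(["0010", "0100"], 1))  # (False, (('0010',), ('0100',)))
```

The collision in the example is expected. The prefixes of a string are the reversals of the suffixes of its reversal, and compositions ignore order, so the two strings give identical data. Subsets of different sizes can never collide when all codestrings have the same length n, because each string contributes exactly 2n compositions. Only equal-size subsets therefore need to be compared.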
