Reconstruction of Sets of Strings from Prefix/Suffix Compositions
Ryan Gabrys, Srilakshmi Pattabiraman, and Olgica Milenkovic

TL;DR
This paper introduces new coding methods for reconstructing sets of strings from the compositions of their prefixes and suffixes, with applications in genomic sequencing and DNA-based data storage. It also provides bounds on the rates of the resulting codes and describes error models.
Contribution
It presents new coding techniques that allow a set of strings to be uniquely reconstructed from its prefix/suffix compositions. The techniques combine properties of binary B_h and Dyck strings, and the paper also corrects prior bounds.
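Dyck strings are named here but not defined. As a reminder, and assuming the usual convention where 1 is an up step and 0 is a down step (the paper may use the opposite convention), a binary string is a Dyck string if it contains equally many 1s and 0s and no prefix contains more 0s than 1s. The following minimal Python check is illustrative only. The function name `is_dyck` is hypothetical.

```python
def is_dyck(s: str, up: str = "1", down: str = "0") -> bool:
    """Return True if s is a Dyck string: equal numbers of up and down
    symbols, and no prefix has more down symbols than up symbols.
    The up/down symbol convention is an assumption."""
    height = 0
    for ch in s:
        if ch == up:
            height += 1
        elif ch == down:
            height -= 1
            if height < 0:
                return False  # some prefix dips below zero
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return height == 0  # balanced overall


print(is_dyck("1100"))  # True
print(is_dyck("1010"))  # True
print(is_dyck("0110"))  # False: the first prefix "0" is unbalanced
print(is_dyck("110"))   # False: more 1s than 0s overall
```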
Findings
Derived the first known bounds on binary B_h sequences for arbitrary even parameters h
Developed codes accommodating missing substrings in mass spectrometry data
Described error models relevant to mass spectrometry analysis
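The findings refer to binary B_h sequences without defining them. The sketch below assumes the standard notion: a set of binary vectors is B_h when every multiset of h codewords (repetition allowed) has a distinct coordinate-wise sum over the integers. The paper's exact definition may differ, for example in whether sums of at most h codewords are also compared. The brute-force search is exponential in h, suits only toy examples, and says nothing about the asymptotic rate bounds the paper derives. The function name `is_binary_bh` is hypothetical.

```python
from itertools import combinations_with_replacement

Vector = tuple[int, ...]


def is_binary_bh(code: list[Vector], h: int) -> bool:
    """Brute-force test of the (assumed) B_h property: every multiset of
    h codewords has a distinct integer vector sum."""
    n = len(code[0])
    seen: set[Vector] = set()
    # Sorted index tuples enumerate each multiset of h codewords exactly once.
    for combo in combinations_with_replacement(range(len(code)), h):
        # Coordinate-wise sum over the integers, not modulo 2.
        total = tuple(sum(code[i][k] for i in combo) for k in range(n))
        if total in seen:
            return False  # two different multisets share a sum
        seen.add(total)
    return True


# (0,0)+(1,1) == (1,0)+(0,1), so the full cube {0,1}^2 is not B_2.
print(is_binary_bh([(0, 0), (1, 0), (0, 1), (1, 1)], 2))  # False
print(is_binary_bh([(0, 0), (1, 0), (0, 1)], 2))          # True
```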
Abstract
The problem of reconstructing strings from substring information has found many applications due to its importance in genomic data sequencing and DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry devices. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide upper and lower bounds on the asymptotic rate of the underlying codebooks. Our code constructions combine properties of binary B_h and Dyck strings and can be extended to accommodate missing substrings in the pool. As auxiliary results, we obtain the first known bounds on binary B_h sequences for arbitrary even parameters h, and also describe various error models inherent to mass…
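To make the data model concrete, here is a minimal Python sketch of an idealized, error-free readout over a binary alphabet. Each string contributes the compositions (counts of 0s and 1s) of its prefixes and suffixes, and a mixture contributes the multiset union over its strings. Several choices here are assumptions for illustration: prefixes and suffixes of every length from 1 to n are included, and the function names are hypothetical. The paper's exact model, and the mass-spectrometry error models it describes, may differ.

```python
from collections import Counter

Composition = tuple[int, int]  # (number of 0s, number of 1s)


def composition(s: str) -> Composition:
    return (s.count("0"), s.count("1"))


def prefix_suffix_compositions(s: str) -> Counter:
    """Multiset of compositions of all prefixes and suffixes of s,
    for lengths 1..len(s) (this length convention is an assumption)."""
    n = len(s)
    out: Counter = Counter()
    for i in range(1, n + 1):
        out[composition(s[:i])] += 1      # prefix of length i
        out[composition(s[n - i:])] += 1  # suffix of length i
    return out


def mixture_compositions(strings: list[str]) -> Counter:
    """Union (multiset sum) of prefix/suffix compositions over a set of strings."""
    total: Counter = Counter()
    for s in strings:
        total += prefix_suffix_compositions(s)
    return total


# A string and its reversal yield identical multisets.
print(prefix_suffix_compositions("01") == prefix_suffix_compositions("10"))    # True
print(prefix_suffix_compositions("001") == prefix_suffix_compositions("010"))  # False
mix = mixture_compositions(["001", "010"])
print(sum(mix.values()))  # 12: two strings, each contributing 2 * 3 compositions
```

The first comparison illustrates why unrestricted strings cannot always be recovered. Composition is invariant under reversal, and the prefixes of a reversed string are the reversals of the original's suffixes, so a string and its reversal are indistinguishable in this model. This is one illustration of why the strings in a mixture are drawn from a structured code.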
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Genomics and Phylogenetic Studies
