Exact Reconstruction from Insertions in Synchronization Codes
Frederic Sala, Ryan Gabrys, Clayton Schoeny, and Lara Dolecek

TL;DR
This paper derives an exact formula for the maximum number of common supersequences shared by sequences at a given edit distance. The formula yields an upper bound on the number of distinct traces needed for exact reconstruction in synchronization codes, with applications to VT codes.
Contribution
It introduces an exact formula for the maximum number of common supersequences shared by sequences at a given edit distance, which clarifies how many traces are required for sequence reconstruction.
Findings
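Derived an exact formula for the maximum number of common supersequences shared by sequences at a given edit distance.
Provided an upper bound on the number of distinct traces needed to guarantee exact reconstruction; the bound is tight when nothing specific is known about the codewords.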
Showed that many VT codeword pairs reach the worst-case trace requirement.
Abstract
This work studies problems in data reconstruction, an important area with numerous applications. In particular, we examine the reconstruction of binary and non-binary sequences from synchronization (insertion/deletion-correcting) codes. These sequences have been corrupted by a fixed number of symbol insertions (larger than the minimum edit distance of the code), yielding a number of distinct traces to be used for reconstruction. We wish to know the minimum number of traces needed for exact reconstruction. This is a general version of a problem tackled by Levenshtein for uncoded sequences. We introduce an exact formula for the maximum number of common supersequences shared by sequences at a certain edit distance, yielding an upper bound on the number of distinct traces necessary to guarantee exact reconstruction. Without specific knowledge of the codewords, this upper bound is tight.…
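To make the counting in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's formula: it simply enumerates every string of length n + t and keeps those that contain a given sequence as a subsequence. The function names (`supersequences`, `common_supersequences`, `traces_needed`) and the toy codebook are assumptions made for illustration. The idea it illustrates is that if two codewords share at most N supersequences of length n + t, then N + 1 distinct traces are enough to tell them apart.

```python
from itertools import combinations, product


def is_subsequence(x, y):
    """Return True if x can be obtained from y by deleting symbols."""
    it = iter(y)
    return all(symbol in it for symbol in x)


def supersequences(x, t, alphabet="01"):
    """All strings obtainable from x by exactly t insertions (brute force)."""
    n = len(x) + t
    return {
        "".join(w)
        for w in product(alphabet, repeat=n)
        if is_subsequence(x, w)
    }


def common_supersequences(x, y, t, alphabet="01"):
    """Length-(n + t) strings that are supersequences of both x and y."""
    return supersequences(x, t, alphabet) & supersequences(y, t, alphabet)


def traces_needed(codebook, t, alphabet="01"):
    """Worst-case pairwise overlap plus one (illustrative helper).

    With this many distinct traces, at most one codeword in the
    codebook can be a subsequence of all of them.
    """
    worst = max(
        len(common_supersequences(x, y, t, alphabet))
        for x, y in combinations(codebook, 2)
    )
    return worst + 1


if __name__ == "__main__":
    # Hypothetical toy codebook: two length-2 binary words.
    print(len(common_supersequences("01", "10", 2)))
    print(traces_needed(["00", "11"], 2))
```

The enumeration is exponential in n + t, so this is only usable for very short sequences; the point of a closed-form expression such as the one the paper derives is to avoid exactly this kind of search.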
