Duplication Distance to the Root for Binary Sequences
Noga Alon, Jehoshua Bruck, Farzad Farnoud, Siddharth Jain

TL;DR
This paper investigates the tandem duplication distance from binary sequences to their roots, establishing linear bounds for exact duplications and a phase transition for approximate duplications at a critical error rate.
Contribution
It provides the first rigorous analysis of the duplication distance to the root for binary sequences, including bounds and phase transition phenomena.
Findings
Maximum exact duplication distance grows linearly with sequence length.
Maximum approximate duplication distance exhibits a sharp transition, from linear to logarithmic in the sequence length, at a 50% error rate.
Results are relevant for understanding genomic tandem duplication mutations.
Abstract
We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form $x = abc \to y = abbc$, where $x$ and $y$ are sequences and $a$, $b$, and $c$ are their substrings, needed to generate a binary sequence of length $n$ starting from a square-free sequence from the set $\{0, 1, 01, 10, 010, 101\}$. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length $n$ by $f(n)$, we prove that…
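To make the exact-duplication setting concrete, the following is a minimal brute-force sketch, not the paper's method. It assumes a tandem duplication copies a substring immediately after itself (abc becomes abbc) and that the distance to the root equals the fewest deduplications (abbc becomes abc) needed to reach a square-free string. The function names `tandem_duplicate`, `deduplications`, and `distance_to_root` are hypothetical, and the breadth-first search is exponential in the worst case, so it is only suitable for short sequences.

```python
from collections import deque


def tandem_duplicate(x, i, k):
    """Duplicate the length-k substring of x starting at index i (abc -> abbc)."""
    return x[:i + k] + x[i:i + k] + x[i + k:]


def deduplications(y):
    """Yield every string reachable from y by one deduplication (abbc -> abc)."""
    n = len(y)
    for k in range(1, n // 2 + 1):          # k = length of the repeated block b
        for i in range(n - 2 * k + 1):      # i = start of the square bb
            if y[i:i + k] == y[i + k:i + 2 * k]:
                yield y[:i + k] + y[i + 2 * k:]


def distance_to_root(y):
    """Return (root, distance) via breadth-first search over deduplications.

    Assumption: the first square-free string reached by BFS is a root of y,
    and its depth is the minimum number of duplications that generate y.
    """
    seen = {y}
    queue = deque([(y, 0)])
    while queue:
        s, d = queue.popleft()
        successors = set(deduplications(s))
        if not successors:                  # s contains no square: it is a root
            return s, d
        for t in successors:
            if t not in seen:
                seen.add(t)
                queue.append((t, d + 1))
```

For example, `tandem_duplicate("010", 1, 1)` duplicates the middle symbol, and `distance_to_root` applied to the result recovers the starting sequence in one step.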
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Coding Theory and Cryptography
