On the Maximum Number of Non-Confusable Strings Evolving Under Short Tandem Duplications
Mladen Kovačević

TL;DR
This paper shows that the set of q-ary strings containing no repeated substrings of length ≤ 3 is an asymptotically rate-optimal code for tandem duplications of length ≤ 3, which settles the zero-error capacity problem for the last remaining tandem-duplication channel with the root-uniqueness property.
Contribution
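It proves that the code consisting of all q-ary strings with no repeated substrings of length ≤ 3 is asymptotically optimal in rate among sets of strings that are non-confusable under tandem duplications of length ≤ 3, resolving the last open zero-error capacity case among tandem-duplication channels with the root-uniqueness property.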
Findings
The code consists of all q-ary strings that contain no substring of the form aa, abab, or abcabc.
The code is asymptotically optimal in rate.
It settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the root-uniqueness property.
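To make the notion of rate concrete, the following is a minimal sketch that counts the codewords of a given length by extending strings one symbol at a time and estimates the finite-length rate as log_q(count)/n. The function names (`ends_with_short_repeat`, `count_codewords`, `finite_rate`) are hypothetical and not taken from the paper; the sketch only illustrates the definition of the code and does not reproduce the paper's proof or its capacity value.

```python
from math import log


def ends_with_short_repeat(s, max_len=3):
    """Return True if s ends with a block uu, where 1 <= len(u) <= max_len."""
    for k in range(1, max_len + 1):
        if len(s) >= 2 * k and s[-k:] == s[-2 * k:-k]:
            return True
    return False


def count_codewords(q, n, max_len=3):
    """Count q-ary strings of length n with no repeated substring of length <= max_len.

    Every prefix of a codeword is itself a codeword, so it is enough to
    extend valid prefixes and check only for a repeat at the new suffix.
    """
    def extend(prefix):
        if len(prefix) == n:
            return 1
        total = 0
        for symbol in range(q):
            candidate = prefix + (symbol,)
            if not ends_with_short_repeat(candidate, max_len):
                total += extend(candidate)
        return total

    return extend(())


def finite_rate(q, n, max_len=3):
    """Finite-length rate estimate log_q(count) / n (None if there are no codewords)."""
    count = count_codewords(q, n, max_len)
    return log(count, q) / n if count > 0 else None
```

For example, `count_codewords(3, 4)` gives 18: of the 24 ternary strings of length 4 with no two equal adjacent symbols, the 6 strings of the form abab are excluded. Over a binary alphabet no codewords of length 4 exist at all, so `finite_rate(2, 4)` returns `None`.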
Abstract
The set of all q-ary strings that do not contain repeated substrings of length ≤ 3 (i.e., that do not contain substrings of the form aa, abab, and abcabc) constitutes a code correcting an arbitrary number of tandem-duplication mutations of length ≤ 3. In other words, any two such strings are non-confusable in the sense that they cannot produce the same string while evolving under tandem duplications of length ≤ 3. We demonstrate that this code is asymptotically optimal in terms of rate, meaning that it represents the largest set of non-confusable strings up to subexponential factors. This result settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the "root-uniqueness" property.
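The definitions in the abstract can be illustrated with a short sketch: a membership test for the code, a single tandem-duplication step, and a reduction that removes short repeats until none remain. The function names (`has_short_repeat`, `tandem_duplicate`, `root`) and the example strings are hypothetical and not taken from the paper. The reduction relies on the root-uniqueness property mentioned above, under which the order of removals is assumed not to affect the result.

```python
def has_short_repeat(s, max_len=3):
    """Return True if s contains a substring uu with 1 <= len(u) <= max_len."""
    for k in range(1, max_len + 1):
        for i in range(len(s) - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return True
    return False


def tandem_duplicate(s, i, k):
    """Apply one tandem duplication: insert a copy of s[i:i+k] right after it."""
    if k < 1 or i < 0 or i + k > len(s):
        raise ValueError("duplicated block must lie inside the string")
    return s[:i + k] + s[i:i + k] + s[i + k:]


def root(s, max_len=3):
    """Delete one copy of a short repeat uu at a time until none is left.

    Assumes root-uniqueness for duplication lengths <= max_len, so the
    scanning order used here should not change the final string.
    """
    changed = True
    while changed:
        changed = False
        for k in range(1, max_len + 1):
            for i in range(len(s) - 2 * k + 1):
                if s[i:i + k] == s[i + k:i + 2 * k]:
                    s = s[:i + k] + s[i + 2 * k:]
                    changed = True
                    break
            if changed:
                break
    return s


codeword = "acgt"                              # contains no aa, abab, or abcabc
mutated = tandem_duplicate(codeword, 1, 2)     # duplicates "cg" -> "acgcgt"
mutated = tandem_duplicate(mutated, 0, 1)      # duplicates "a"  -> "aacgcgt"
recovered = root(mutated)                      # reduces back to "acgt"
```

Under these assumptions, every string obtained from a codeword by such duplications reduces back to that codeword, which is one way to see why two distinct codewords cannot evolve into the same string.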
