On the Maximum Number of Non-Confusable Strings Evolving Under Short Tandem Duplications

Mladen Kovačević

arXiv:1911.06561·cs.IT·July 1, 2022

TL;DR

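This paper shows that the set of q-ary strings containing no repeated substrings of length ≤ 3 (no substrings of the form aa, abab, or abcabc) is asymptotically optimal in rate among sets of non-confusable strings, which settles the zero-error capacity problem for the last remaining tandem-duplication channel with the "root-uniqueness" property.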

Contribution

It takes the code consisting of all q-ary strings with no repeated substrings of length ≤ 3, which corrects an arbitrary number of tandem duplications of length ≤ 3, and proves that it is the largest set of non-confusable strings up to subexponential factors.

Findings

01

The code consists of all q-ary strings that contain no substring of the form aa, abab, or abcabc.

02

The code is asymptotically optimal in rate: no set of non-confusable strings is larger by more than a subexponential factor.

03

It settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the "root-uniqueness" property.
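
The membership condition in the first finding can be checked mechanically. The sketch below is illustrative only and is not taken from the paper: the names `has_short_square` and `count_code` are hypothetical, the alphabet is assumed to be the integers 0..q-1, and the count is a brute-force enumeration that is only practical for small lengths.

```python
from itertools import product


def has_short_square(s, max_len=3):
    """Return True if s contains a substring xx with 1 <= len(x) <= max_len.

    For max_len=3 this covers the forbidden forms aa, abab, and abcabc.
    Works on strings and on tuples of symbols.
    """
    for k in range(1, max_len + 1):
        for i in range(len(s) - 2 * k + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                return True
    return False


def count_code(q, n, max_len=3):
    """Brute-force count of q-ary strings of length n with no short square.

    Assumption: the alphabet is {0, ..., q-1}. Exponential in n, so this is
    meant for sanity checks on small cases only.
    """
    return sum(
        1
        for w in product(range(q), repeat=n)
        if not has_short_square(w, max_len)
    )


if __name__ == "__main__":
    print(has_short_square("abcacb"))  # no repeated block of length <= 3
    print(has_short_square("abcabc"))  # contains the square (abc)(abc)
    print([count_code(3, n) for n in range(1, 5)])
```

For ternary strings of length 4, for example, the 24 strings with no two equal adjacent symbols include 6 of the form abab, so 18 strings remain in the code.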

Abstract

The set of all $q$-ary strings that do not contain repeated substrings of length $\leqslant 3$ (i.e., that do not contain substrings of the form $aa$, $abab$, and $abcabc$) constitutes a code correcting an arbitrary number of tandem-duplication mutations of length $\leqslant 3$. In other words, any two such strings are non-confusable in the sense that they cannot produce the same string while evolving under tandem duplications of length $\leqslant 3$. We demonstrate that this code is asymptotically optimal in terms of rate, meaning that it represents the largest set of non-confusable strings up to subexponential factors. This result settles the zero-error capacity problem for the last remaining case of tandem-duplication channels satisfying the "root-uniqueness" property.
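
To make the notions of tandem duplication, root, and non-confusability concrete, here is a small sketch. It is an illustration under stated assumptions, not the paper's construction: the helper names `tandem_duplicate`, `descendants`, and `root` are hypothetical, a tandem duplication is modeled as inserting a copy of a block immediately after itself, and `descendants` explores only a bounded number of steps, so it can spot-check non-confusability on small cases but cannot prove it.

```python
def tandem_duplicate(s, i, k):
    """Insert a copy of the block s[i:i+k] immediately after that block."""
    if k < 1 or i < 0 or i + k > len(s):
        raise ValueError("block out of range")
    return s[:i + k] + s[i:i + k] + s[i + k:]


def descendants(s, steps, max_len=3):
    """All strings reachable from s by at most `steps` tandem duplications
    of length <= max_len (breadth-first, bounded depth)."""
    seen = {s}
    frontier = {s}
    for _ in range(steps):
        nxt = set()
        for t in frontier:
            for k in range(1, max_len + 1):
                for i in range(len(t) - k + 1):
                    nxt.add(tandem_duplicate(t, i, k))
        frontier = nxt - seen
        seen |= nxt
    return seen


def root(s, max_len=3):
    """Repeatedly collapse a square xx (len(x) <= max_len) to x until none
    is left. Under the root-uniqueness property, the order in which squares
    are collapsed should not affect the result."""
    changed = True
    while changed:
        changed = False
        for k in range(1, max_len + 1):
            for i in range(len(s) - 2 * k + 1):
                if s[i:i + k] == s[i + k:i + 2 * k]:
                    s = s[:i + k] + s[i + 2 * k:]
                    changed = True
                    break
            if changed:
                break
    return s


if __name__ == "__main__":
    d1 = descendants("abc", 2)
    d2 = descendants("acb", 2)
    print(d1.isdisjoint(d2))               # two codewords share no descendant
    print({root(t) for t in d1})           # every descendant collapses to "abc"
```

The two starting strings `"abc"` and `"acb"` both belong to the code, so their bounded descendant sets are expected to be disjoint, and every descendant of `"abc"` should reduce back to `"abc"`.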
