Two strings at Hamming distance 1 cannot be both quasiperiodic
Amihood Amir, Costas S. Iliopoulos, and Jakub Radoszewski

TL;DR
This paper extends a known property of periodic strings to quasiperiodic strings, proving that two strings differing at one position cannot both be quasiperiodic, and offers new insights into quasiperiodic combinatorics.
Contribution
It generalizes a classical fact from periodicity to quasiperiodicity, providing a new theoretical result and insights in combinatorics on words.
Findings
Two strings differing at one position cannot both be quasiperiodic
New theoretical insights into quasiperiodic structures
Extension of known periodicity properties to quasiperiodic strings
Abstract
We present a generalization of a known fact from combinatorics on words related to periodicity into quasiperiodicity. A string is called periodic if it has a period which is at most half of its length. A string $S$ is called quasiperiodic if it has a non-trivial cover, that is, there exists a string $C$ that is shorter than $S$ and such that every position in $S$ is inside one of the occurrences of $C$ in $S$. It is a folklore fact that two strings that differ at exactly one position cannot be both periodic. Here we prove a more general fact that two strings that differ at exactly one position cannot be both quasiperiodic. Along the way we obtain new insights into combinatorics of quasiperiodicities.
Amihood Amir
Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel
Costas S. Iliopoulos
Department of Informatics, King’s College London, London, UK
Jakub Radoszewski (Newton International Fellow)
Department of Informatics, King’s College London, London, UK
Institute of Informatics, University of Warsaw, Warsaw, Poland
1 Introduction
A string $S$ is a finite sequence of letters over an alphabet $\Sigma$. If $S$ is a string, then by $|S| = n$ we denote its length, by $S[i]$ for $i \in \{1, \ldots, n\}$ we denote its $i$-th letter, and by $S[i..j]$ we denote a factor of $S$ being a string composed of the letters $S[i] \ldots S[j]$ (if $i > j$, then it is the empty string). A factor $S[i..j]$ is called a prefix if $i = 1$ and a suffix if $j = n$.
An integer $p$ is called a period of $S$ if $S[i] = S[i+p]$ for all $i = 1, \ldots, n-p$. A string $B$ is called a border of $S$ if it is both a prefix and a suffix of $S$. It is a fundamental fact of string periodicity that a string $S$ has a period $p$ if and only if it has a border of length $n - p$; see [4, 8]. If $p$ is a period of $S$, $S[1..p]$ is called a string period of $S$. If $S$ has a period $p$ such that $2p \le n$, then $S$ is called periodic. In this case $S$ has a border of length at least $\lceil n/2 \rceil$.
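The definitions of period, border and periodic string can be checked mechanically on small inputs. The following is a minimal Python sketch, not taken from the paper: the function names are illustrative assumptions, indices are 0-based (the text uses 1-based positions), and the trivial period $n$ and the empty border are included in the output.

```python
def periods(s: str) -> list[int]:
    """All p in 1..len(s) with s[i] == s[i + p] whenever both letters exist."""
    n = len(s)
    return [p for p in range(1, n + 1)
            if all(s[i] == s[i + p] for i in range(n - p))]


def border_lengths(s: str) -> list[int]:
    """All b in 0..len(s)-1 such that the prefix and the suffix of length b coincide."""
    n = len(s)
    return [b for b in range(n) if s[:b] == s[n - b:]]


def is_periodic(s: str) -> bool:
    """True if some period is at most half of the length."""
    return any(2 * p <= len(s) for p in periods(s))


word = "abaababaab"
# Period p corresponds to a border of length n - p.
print(periods(word), border_lengths(word), is_periodic(word))
```

For every input, the list of border lengths should be exactly the list of values $n - p$ over all periods $p$, which is the period-border correspondence quoted above.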
For two strings $S$ and $T$ of the same length $n$, we write $S =_j T$ if $S[i] = T[i]$ for all $i \in \{1, \ldots, n\} \setminus \{j\}$ and $S[j] \ne T[j]$. This means that $S$ and $T$ are at Hamming distance 1, where the Hamming distance counts the number of positions at which two equal-length strings differ. The following fact states a folklore property of string periodicity that we generalize in this work into string quasiperiodicity. For completeness we provide its proof in Section 4.
Fact 1.
Let $S$ and $T$ be two strings of length $n$ and $j \in \{1, \ldots, n\}$ be an index. If $S =_j T$, then at most one of the strings $S$, $T$ is periodic.
We say that a string $C$ covers a string $S$ ($|C| \le |S|$) if for every position $k \in \{1, \ldots, n\}$ there exists a factor $S[i..j] = C$ such that $i \le k \le j$. Then $C$ is called a cover of $S$; see Fig. 1. A string $S$ is called quasiperiodic if it has a cover of length smaller than $n$.
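The cover definition translates directly into a scan over the occurrences of the candidate string. The sketch below is an illustration under assumptions (hypothetical function names, 0-based indices, a quadratic-time check rather than the linear-time algorithms from the literature); it uses the fact that every cover is a prefix of the covered string.

```python
def occurrences(c: str, s: str) -> list[int]:
    """Start indices (0-based) of all occurrences of c in s."""
    return [i for i in range(len(s) - len(c) + 1) if s.startswith(c, i)]


def is_cover(c: str, s: str) -> bool:
    """True if every position of s lies inside some occurrence of c in s."""
    if not c or len(c) > len(s):
        return False
    covered_up_to = 0  # positions 0 .. covered_up_to - 1 are covered so far
    for i in occurrences(c, s):
        if i > covered_up_to:  # a gap: position covered_up_to is uncovered
            return False
        covered_up_to = i + len(c)
    return covered_up_to == len(s)


def is_quasiperiodic(s: str) -> bool:
    """True if s has a cover shorter than s (a cover must be a prefix of s)."""
    return any(is_cover(s[:k], s) for k in range(1, len(s)))


print(is_cover("aba", "abaababa"))   # occurrences at 0, 3 and 5 leave no gap
print(is_quasiperiodic("abaab"))     # no proper prefix covers this string
```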
A significant amount of work has been devoted to the computation of covers in a string. A linear-time algorithm finding the shortest cover of a string was proposed by Apostolico et al. [1]. Later a linear-time algorithm computing all the covers of a string was proposed by Moore and Smyth [9]. Breslauer [2] gave an on-line $O(n)$-time algorithm computing the cover array of a string of length $n$, that is, an array specifying the lengths of shortest covers of all the prefixes of the string. Li and Smyth [7] provided a linear-time algorithm for computing the array of longest covers of all the prefixes of a string. All these papers employ various combinatorial properties of covers.
Our main contribution is stated as the following theorem. As noted above, a periodic string has a border long enough to be the string's cover. Hence, a periodic string is also quasiperiodic, and Theorem 1 generalizes Fact 1.
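Theorem 1.
Let $S$ and $T$ be two strings of length $n$ and $j \in \{1, \ldots, n\}$ be an index. If $S =_j T$, that is, $S$ and $T$ differ exactly at position $j$, then at most one of the strings $S$, $T$ is quasiperiodic.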
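Theorem 1 can be sanity-checked by exhaustive search on short strings. The sketch below is only an illustrative experiment under assumptions (hypothetical helper names, small alphabets, short lengths, a naive cover test); it does not replace the proof.

```python
from itertools import product


def is_cover(c: str, s: str) -> bool:
    """True if the occurrences of c in s leave no position of s uncovered."""
    reach = 0  # positions 0 .. reach - 1 are covered so far
    for i in range(len(s) - len(c) + 1):
        if i > reach:
            return False
        if s.startswith(c, i):
            reach = i + len(c)
    return reach == len(s)


def is_quasiperiodic(s: str) -> bool:
    """True if some proper prefix of s covers s."""
    return any(is_cover(s[:k], s) for k in range(1, len(s)))


def counterexamples(n: int, alphabet: str = "ab") -> list[tuple[str, str]]:
    """Pairs of quasiperiodic strings of length n that differ at exactly one position."""
    quasi = {w for w in map("".join, product(alphabet, repeat=n))
             if is_quasiperiodic(w)}
    found = []
    for s in quasi:
        for j in range(n):
            for ch in alphabet:
                if ch != s[j]:
                    t = s[:j] + ch + s[j + 1:]  # t differs from s only at index j
                    if t in quasi:
                        found.append((s, t))
    return found


print(counterexamples(8))  # expected to be empty, in line with Theorem 1
```

An empty result for every tested length and alphabet is consistent with the theorem; pairs at Hamming distance 2, such as `aaaa` and `abab`, are not excluded.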
The proof of Theorem 1 is divided into three sections. In Section 2 we restate several simple preliminary observations. Then, Section 3 contains a proof of a crucial auxiliary lemma which shows a combinatorial property of seeds that we use extensively in the main result. Finally, Section 4 contains the main proof.
2 Preliminaries
We say that a string $C$ is a seed of a string $S$ if $|C| \le |S|$ and $S$ is a factor of some string $U$ covered by $C$; see Fig. 2. Furthermore, $C$ is called a left seed of $S$ if $C$ is both a prefix and a seed of $S$. Thus a cover of $S$ is always a left seed of $S$, and a left seed of $S$ is a seed of $S$. The notion of seed was introduced in [5] and efficient computation of seeds was further considered in [3, 6].
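The seed definition quantifies over an unknown covered superstring, but it can be tested directly: a position of the string is covered either by a full occurrence of the candidate, by a suffix of the candidate overhanging on the left, or by a prefix of the candidate overhanging on the right. The following sketch rests on that reformulation; it is a naive illustration with hypothetical names and 0-based indices, not the efficient algorithms of [3, 6].

```python
def is_seed(c: str, s: str) -> bool:
    """True if s is a factor of some string covered by c (with len(c) <= len(s))."""
    m, n = len(c), len(s)
    if m == 0 or m > n:
        return False
    covered = [False] * n
    # Full occurrences of c inside s.
    for i in range(n - m + 1):
        if s.startswith(c, i):
            for k in range(i, i + m):
                covered[k] = True
    # Longest proper suffix of c that is a prefix of s (left overhang).
    left = max(a for a in range(m) if c[m - a:] == s[:a])
    # Longest proper prefix of c that is a suffix of s (right overhang).
    right = max(b for b in range(m) if c[:b] == s[n - b:])
    for k in range(left):
        covered[k] = True
    for k in range(n - right, n):
        covered[k] = True
    return all(covered)


# "baabab" is a factor of "abaababa", which is covered by "aba".
print(is_seed("aba", "baabab"))
```

Taking the longest overhang on each side is enough, since a longer overhang covers every position that a shorter one would.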
In the proof of our main result we use the following easy observations that are immediate consequences of the definitions of cover and seed.
Observation 1.
Consider strings $S$ and $C$.
(a) If $C$ is a cover of $S$ and $|S|/2 \le |C| < |S|$, then $S$ is periodic with a period $|S| - |C|$.
(b) If $C$ is a cover of $S$, then any cover of $C$ is also a cover of $S$.
(c) If $C$ is a seed of $S$, then $C$ is a seed of every factor of $S$ of length at least $|C|$.
(d) If $S$ has a period $p$ and a prefix of length at least $p$ that has a cover $C$, then $C$ is a left seed of $S$.
A string $T$ is called a cyclic shift of a string $S$, both of length $n$, if there is a position $i \in \{1, \ldots, n\}$ such that $T = S[i+1..n]\,S[1..i]$. We denote this relation as $S \approx T$. The following obviously holds.
Observation 2.
If $T$ is a cyclic shift of $S$, then $T$ is a seed of $S$.
3 Auxiliary Lemma
In the following lemma we observe a new property of the notion of seed. As we will see in Section 4, this lemma encapsulates the hardness of multiple cases in the proof of the main result.
Before we proceed to the lemma, however, let us introduce an additional notion lying in between periodicity and quasiperiodicity. We say that a string $S$ of length $n$ is almost periodic with period $p$ if there exists an index $j \in \{1, \ldots, n-p\}$ such that:
$$S[i] = S[i+p] \ \text{ for all } i \in \{1, \ldots, n-p\} \setminus \{j\}, \qquad S[j] \ne S[j+p].$$
In this case we refer to $j$ as the mismatch position. Furthermore, if the factors $S[1..b]$ and $S[n-b+1..n]$ differ exactly at their $j$-th letter, i.e., $S[1..b] =_j S[n-b+1..n]$ for an integer $b$, we say that each of these factors is an almost border of $S$ of length $b$ (and again refer to $j$ as the mismatch position). We immediately observe the following.
Observation 3.
A string $S$ of length $n$ is almost periodic with period $p$ and mismatch position $j$ if and only if $S$ has an almost border of length $n - p$ with mismatch position $j$.
Example 1.
The following string of length 19:
abaab abaab abbab abba
is almost periodic with period 5 and mismatch position 8 (the mismatching letters are at positions 8 and 13). Hence, it has an almost border of length 14, formed by its prefix and its suffix of that length:
abaab abaab abba and abaab abbab abba.
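Lemma 1.
Let $S$ and $T$ be two strings of length $n$ and $j \in \{1, \ldots, n\}$ be an index. If $S =_j T$, that is, $S$ and $T$ differ exactly at position $j$, then $T$ is not a seed of $S$.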
Proof.
Assume to the contrary that is a seed of . Let be a string covered by that has as a factor. Obviously, it suffices to consider two occurrences of in to cover all positions of the factor : the leftmost one that covers and the rightmost one that covers . Let be the length of the longest suffix of that is a prefix of , and let be the length of the longest prefix of that is a suffix of (these are the so-called longest overlaps between and , and between and ). Thus we have and ; see Fig. 3. From now on we assume that . The other case (i.e., ) is symmetric by reversing the strings and . Let us denote .
First consider the case when satisfies . Then we have:
[TABLE]
by the definitions of and , respectively. Consequently, . This means that has an almost border of length with mismatch position . By Observation 3, is almost periodic with period and the same mismatch position.
The latter can be written equivalently as follows:
[TABLE]
Recall that . This means that the same cyclic-shift relations hold for all corresponding factors of that do not contain the symbol . Moreover, , so (and ). This concludes that:
[TABLE]
Moreover, the inequalities satisfied by in this case imply that and . Hence, the conditions (1) and (2) conclude that there is no suffix of of length at least that would be a prefix of . Consequently, , a contradiction.
We are left with two cases:
(A) . In this case has a border of length :
[TABLE]
Consequently, is periodic with period . On the other hand, does not have the period , since . Moreover, for all . In conclusion, there cannot exist a suffix of of length at least that would be a prefix of , i.e. , a contradiction.
(B) . In this case has a border of length :
[TABLE]
Consequently, is periodic with period . On the other hand, does not have the period , since . Moreover,
[TABLE]
for all . In conclusion, there cannot exist a suffix of of length at least that would be a prefix of , i.e. , a contradiction. ∎
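The proof works with two overlaps: the longest suffix of the candidate seed that is a prefix of the other string, and the longest prefix of the candidate seed that is a suffix of the other string. For two distinct strings of equal length, the candidate is a seed exactly when these two overlaps together span the whole length. The sketch below uses this criterion to test Lemma 1 exhaustively on short strings; the names are hypothetical and the experiment is an illustration only.

```python
from itertools import product


def overlap(u: str, v: str) -> int:
    """Length of the longest suffix of u that is a prefix of v."""
    limit = min(len(u), len(v))
    return max(k for k in range(limit + 1) if u[len(u) - k:] == v[:k])


def is_equal_length_seed(t: str, s: str) -> bool:
    """For distinct t, s of equal length: is t a seed of s?"""
    x = overlap(t, s)  # suffix of t matching a prefix of s (left overhang)
    y = overlap(s, t)  # prefix of t matching a suffix of s (right overhang)
    return x + y >= len(s)


def lemma1_violations(n: int, alphabet: str = "ab") -> list[tuple[str, str]]:
    """Pairs (t, s) at Hamming distance 1 such that t is a seed of s."""
    bad = []
    for letters in product(alphabet, repeat=n):
        s = "".join(letters)
        for j in range(n):
            for ch in alphabet:
                if ch != s[j]:
                    t = s[:j] + ch + s[j + 1:]
                    if is_equal_length_seed(t, s):
                        bad.append((t, s))
    return bad


print(lemma1_violations(8))  # expected to be empty, in line with Lemma 1
```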
The following example illustrates the main case of the proof of the above lemma.
Example 2.
Consider the following two strings of length 19:
$S$ = abaab abbab abbab abba, $T$ = abaab abaab abbab abba.
We have $S =_8 T$. The longest suffix of $T$ that is a prefix of $S$ has length 14 (abaab abbab abba). Hence, $T$ is almost periodic with period 5 and mismatch position 8. Moreover, $S$ is almost periodic with the same period and mismatch position 3. We see that no prefix of $T$ of length at least 5 can be a suffix of $S$.
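The claims of Example 2 can be recomputed as follows. This is a verification sketch with hypothetical helper names; the variables `s` and `t` stand for the first and the second string of the example, and positions are reported 1-based.

```python
def overlap(u: str, v: str) -> int:
    """Length of the longest suffix of u that is a prefix of v."""
    limit = min(len(u), len(v))
    return max(k for k in range(limit + 1) if u[len(u) - k:] == v[:k])


def period_mismatches(w: str, p: int) -> list[int]:
    """1-based positions i with w[i] != w[i + p]."""
    return [i + 1 for i in range(len(w) - p) if w[i] != w[i + p]]


s = "abaab" "abbab" "abbab" "abba"   # first string of Example 2
t = "abaab" "abaab" "abbab" "abba"   # second string of Example 2

differing = [i + 1 for i in range(len(s)) if s[i] != t[i]]
x = overlap(t, s)   # longest suffix of t that is a prefix of s
y = overlap(s, t)   # longest prefix of t that is a suffix of s

print(differing, x, y)
print(period_mismatches(t, 5), period_mismatches(s, 5))
```

Since the second overlap is shorter than 5, the two overlaps together do not span all 19 positions, so the second string cannot be a seed of the first.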
We use Lemma 1 as our key tool throughout the proof of the main result. As a consequence of Lemma 1 we obtain the following lemma, which will also be useful in the main proof.
Lemma 2.
Let $S$ and $T$ be two strings of length $n$ and $j \in \{1, \ldots, n\}$ be an index. If $S =_j T$, then there does not exist a string $C$ that would be both a cover of $S$ and a seed of $T$.
Proof.
Consider an occurrence of $C$ in $S$, $S[i..i+|C|-1]$, that covers the position $j$. Due to Observation 1c, the string $T[i..i+|C|-1]$ has $C$ as a seed. We have $C = S[i..i+|C|-1] =_{j-i+1} T[i..i+|C|-1]$, which contradicts Lemma 1. ∎
4 Main Result
In this section we first present a proof of the folklore property of string periodicity (Fact 1) for completeness, and then proceed to the proof of our main result being a generalization of that fact (Theorem 1).
Proof (of Fact 1).
Assume to the contrary that and both strings are periodic. Let and () be the shortest periods of and . Assume w.l.o.g. that . It suffices to prove the lemma in the case that is a square of length and . Let us define and . By the periodicity of , we see that .
We may assume that , as otherwise we may reverse both strings , . Both and have period , as they are factors of (or the reversal of ), and their string periods of length , further denoted by and , are cyclic shifts. Now consider any such that (it exists by the upper bound on the value of ). Then and . This concludes that and differ at exactly one position , i.e., . However, is a cyclic shift of , hence a seed of by Observation 2. This contradicts Lemma 1. ∎
Proof (of Theorem 1).
Assume to the contrary that and both strings are quasiperiodic. Let and be the shortest covers of and . W.l.o.g. we can assume that . We consider a few cases depending on the lengths of the covers:
(A) . By Observation 1a, the strings , are both periodic. This contradicts Fact 1.
(B) . Again by Observation 1a, is periodic with the period . Assume w.l.o.g. that . This means that the first half of and , , has period and is its left seed. There are three subcases:
  (B1) . By Observation 1d, is a left seed of . Therefore, and contradict Lemma 2.
  (B2) and . In this case the strings and differ only at position . By Observation 1c applied to , has a seed . This contradicts Lemma 1.
  (B3) and . Then is a cover of , as it is a left seed of and a suffix due to the period . Hence, by Observation 1d, is a left seed of . Therefore, and contradict Lemma 2.
From now on we assume that .
(C) . This immediately contradicts Lemma 2.
(D) but . Let ; . Then is a border of , and is a border of . As , it is not possible to change a single position in such that both its prefix and its suffix of length become .
(E) . We consider three final subcases.
  (E1) . This means that is a border of , consequently a border of . However, is not a cover of . Otherwise, by Observation 1b, would be a cover of shorter than .
  Consider the factors and ; note that they cover disjoint sets of positions.
  If , then . The string is a border of and a cover of . Hence, by Observation 1c, is a cover of . This contradicts the opposite observation that we have just made. Otherwise (if ) we see that similarly is a cover of , again a contradiction.
  (E2) . This case is symmetric to the following case (E3) by reversing the strings and .
  (E3) . As is a prefix of , this means that the prefix of of length is a string such that .
  Note that . The string is a cover of , therefore, by Observation 1c, is a seed of the prefix of . This, however, contradicts Lemma 1.
The above cases include all the possibilities. This concludes the proof. ∎
5 Conclusions
In this note we have proved that every two distinct quasiperiodic strings of the same length differ at more than one position. This bound is tight, as, for instance, for every even the strings and are both quasiperiodic and differ at exactly two positions.
Acknowledgements
The authors thank Maxime Crochemore and Solon P. Pissis for helpful discussions.
References
[1] Alberto Apostolico, Martin Farach, and Costas S. Iliopoulos. Optimal superprimitivity testing for strings. Inf. Process. Lett., 39(1):17–20, 1991.
[2] Dany Breslauer. An on-line string superprimitivity test. Inf. Process. Lett., 44(6):345–347, 1992.
[3] Michalis Christou, Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, and Tomasz Waleń. Efficient seeds computation revisited. In Raffaele Giancarlo and Giovanni Manzini, editors, Combinatorial Pattern Matching - 22nd Annual Symposium, CPM 2011. Proceedings, volume 6661 of Lecture Notes in Computer Science, pages 350–363. Springer, 2011.
[4] Maxime Crochemore, Christophe Hancart, and Thierry Lecroq. Algorithms on Strings. Cambridge University Press, New York, NY, USA, 2007.
[5] Costas S. Iliopoulos, D. W. G. Moore, and Kunsoo Park. Covering a string. Algorithmica, 16(3):288–297, 1996.
[6] Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, and Tomasz Waleń. A linear time algorithm for seeds computation. In Yuval Rabani, editor, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, pages 1095–1112. SIAM, 2012.
[7] Yin Li and William F. Smyth. Computing the cover array in linear time. Algorithmica, 32(1):95–106, 2002.
[8] M. Lothaire. Combinatorics on Words. Addison-Wesley, Reading, MA, USA, 1983.
