Binary patterns in the Prouhet-Thue-Morse sequence
Jorge Almeida, Ondřej Klíma

TL;DR
This paper characterizes binary patterns in the Prouhet-Thue-Morse sequence, showing most are segments of the sequence except for two specific patterns, and identifies patterns arising from non-trivial endomorphisms.
Contribution
It provides a comprehensive classification of binary patterns in the sequence, including those generated by non-trivial endomorphisms, clarifying previous attributions.
Findings
All binary patterns except two are segments of the sequence.
Identified finite patterns generated by non-trivial endomorphisms.
Clarified historical attribution of the pattern classification.
Abstract
We show that, with the exception of the words $a^2ba^2$ and $b^2ab^2$, all (finite or infinite) binary patterns in the Prouhet-Thue-Morse sequence can actually be found in that sequence as segments (up to exchange of letters in the infinite case). This result was previously attributed to unpublished work by D. Guaiana and may also be derived from publications of A. Shur only available in Russian. We also identify the (finitely many) finite binary patterns that appear non-trivially, in the sense that they are obtained by applying an endomorphism that does not map the set of all segments of the sequence into itself.
\publicationdetails
232021365460
Jorge Almeida (CMUP, Dep. Matemática, Faculdade de Ciências, Universidade do Porto, Portugal). Work partially supported by CMUP (UID/MAT/00144/2019), which is funded by FCT (Portugal) with national (MCTES) and European structural funds (FEDER) under the partnership agreement PT2020. The work was carried out at Masaryk University, whose hospitality is gratefully acknowledged, with the support of the FCT sabbatical scholarship SFRH/BSAB/142872/2018.
Ondřej Klíma (Dept. of Mathematics and Statistics, Masaryk University, Brno, Czech Republic). Work supported by Grant 19-12790S of the Grant Agency of the Czech Republic.
(2019-05-15; 2021-07-27; 2021-08-02)
Keywords: Prouhet-Thue-Morse sequence, pattern, infinite word, special word
1 Introduction
Let $\mu$ be the endomorphism of the free semigroup $\{a,b\}^+$ defined by $\mu(a)=ab$ and $\mu(b)=ba$. Since $a$ is a prefix of $\mu(a)$, $\mu^n(a)$ is also a prefix of $\mu^{n+1}(a)$. Hence, the sequence $(\mu^n(a))_{n\ge 0}$ determines a sequence $\mathbf{t}$ of letters, or infinite word, whose prefix of length $2^n$ is $\mu^n(a)$; we say that the infinite word thus obtained is generated by $\mu$. It is called the Prouhet-Thue-Morse sequence and it has been the object of extensive studies and applications. It was first considered by Prouhet (1851) in connection with a problem in number theory, five decades later by Thue (1906, 1912) to exhibit infinite words avoiding cubes and squares, and another two decades later by Morse (1921) as a discretized description of non-periodic recurrent geodesics in surfaces of negative curvature. See Allouche and Shallit (1999) for a survey on this topic, including several further connections with other branches of Mathematics. The first author and other collaborators have previously studied the sequence $\mathbf{t}$ in the framework of symbolic dynamics and its connections with free profinite semigroups (see Almeida and Costa (2013) and Almeida et al. (2020)). It was in fact an attempt to construct a profinite semigroup with certain properties that prompted this work, although no further references to profinite semigroups will be made in this paper.
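The construction by iteration can be sketched in a few lines of Python. This is only an illustration, assuming the standard Thue-Morse substitution a → ab, b → ba; the function names `mu`, `mu_power`, and `thue_morse_prefix` are hypothetical and not taken from the paper. The last function uses the well-known parity-of-binary-digits description as an independent cross-check.

```python
def mu(word: str) -> str:
    """Apply the substitution a -> ab, b -> ba to every letter of word."""
    images = {"a": "ab", "b": "ba"}
    return "".join(images[letter] for letter in word)


def mu_power(n: int, word: str = "a") -> str:
    """Apply mu n times, starting from word (by default the letter a)."""
    for _ in range(n):
        word = mu(word)
    return word


def thue_morse_prefix(length: int) -> str:
    """Prefix of the given length, computed from the parity of the
    number of 1s in the binary expansion of each index."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


print(mu_power(4))  # abbabaabbaababba
```

Each iterate has length $2^n$ and extends the previous one, which is what makes the limit word well defined.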
This paper concerns the study of binary patterns of , that is, finite or infinite words over the alphabet for which there exists an endomorphism of the semigroup (naturally extended to infinite words) such that the word can be found as a block of consecutive letters of (which we call a segment of ). Since we need to identify concrete finite segments of , a simple and efficient algorithm on how to compute them is presented in Section 2.
Characterizations of binary patterns of are due to Shur (1996a) and D. Guaiana (unpublished work announced in Restivo and Salemi (2002b, a)). Our first main contribution is a proof of the characterization attributed to D. Guaiana (but also, independently, obtained by Shur (1997) in his thesis) using results from Shur (1996a): with the exception of and , the binary patterns of are its finite segments. Section 3 presents our proof of this result.
The endomorphism and the endomorphism exchanging the letters and are easily seen to transform finite segments of into other such segments. In Section 4, we consider the problem of determining which finite segments may only be transformed into other segments by endomorphisms of that may be obtained by composition of and . Such words are said to be typical since we show that all but finitely many finite segments of are typical. We further determine all atypical words. As an application of our results, we also determine all infinite binary patterns of .
We conclude the paper with Section 5, where we propose the investigation of the properties we established for for arbitrary infinite words.
2 Segments of
By a word we always mean a finite sequence of letters of an alphabet , that is a member of the free monoid . A word is a factor of a word if there exist words and such that . In spite of the terminology, an infinite word is not a word but rather an infinite sequence of letters.
Note that is a code, in the sense that it generates a free subsemigroup of and, therefore, is injective.
For an infinite word , by the segments of we mean the words of the form with and the infinite words of the form . Note that, since , all factors of the words are segments of . It follows that a word is a segment of if and only if so is the word that is obtained from by interchanging the letters and .
A word $w$ is said to be avoided by $\mathbf{t}$ if there is no homomorphism $\varphi$ into $\{a,b\}^+$ such that $\varphi(w)$ is a segment of $\mathbf{t}$. We also say that $w$ is unavoidable in $\mathbf{t}$ if it is not avoided by $\mathbf{t}$; we then also say that $w$ is a pattern of $\mathbf{t}$. For instance, it is well known that $a^3$ and $ababa$ are avoided by $\mathbf{t}$, which is also expressed by saying that $\mathbf{t}$ is, respectively, cube-free and overlap-free (Lothaire, 1983).
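To make the notion of pattern concrete, here is a minimal sketch of a bounded brute-force search for a witnessing homomorphism. It is not a decision procedure: the prefix length and the bound on the length of letter images are arbitrary assumptions, so a `None` result only means that no witness exists within those bounds. The names `tm_prefix`, `factors`, and `find_witness` are hypothetical.

```python
from itertools import product


def tm_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b} (binary digit parity)."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


def factors(text: str, max_len: int) -> set:
    """All nonempty factors of text of length at most max_len."""
    return {text[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(text) - k + 1)}


def find_witness(pattern: str, prefix_len: int = 256, max_image_len: int = 6):
    """Look for nonempty images of the letters of pattern such that the
    image of pattern occurs in a prefix of the Thue-Morse word.
    Returns a dict letter -> image, or None if the bounded search fails."""
    text = tm_prefix(prefix_len)
    candidates = sorted(factors(text, max_image_len), key=lambda u: (len(u), u))
    letters = sorted(set(pattern))
    for images in product(candidates, repeat=len(letters)):
        phi = dict(zip(letters, images))
        if "".join(phi[c] for c in pattern) in text:
            return phi
    return None


print(find_witness("aabaa"))  # some witness, although aabaa itself never occurs
print(find_witness("aaa"))    # None, consistent with cube-freeness
```

Restricting candidate images to factors of the prefix loses nothing, since the image of each letter must itself occur in the word.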
The preceding notions are extended to infinite words by saying how endomorphisms of are applied to infinite words. Given an infinite word over the alphabet and an endomorphism of , we let be the infinite word obtained by concatenating the : .
For a nonempty word , let denote its last letter.
The computation of the segments of may be carried out easily in view of the following proposition. The first part is an improved version of Shur (1996b, Corollary 1), although the same conclusion is in fact already established in the proof of the cited statement. We present a proof for the sake of completeness.
Proposition 2.1
A word is a segment of if and only if it is a factor of , where if , if , and otherwise. Moreover, for every integer , the value is minimum for to admit as factors all segments of of length .
Proof.
Since is cube-free, the cases where are easily verified by inspection. Suppose that is a segment of which we may assume to be of length at least 4. Then, is a factor of for some positive integer . Take to be minimum with that property. If is the minimum positive integer such that is a factor of for some letter , then either only can play that role and , or may play it and . We need to show, respectively, that or . Since , we may assume that for, otherwise, the inequality holds trivially.
Let . As and is minimum, there must be a nontrivial factorization with a suffix of and a prefix of .
If one of the factors or has length greater than , then we must have , which implies that and fulfills our aim. Thus, we may assume that both and have length at most . Now, we have and is a factor of . If is a factor of either or , then it is also a factor of and, therefore, also of , so that we are in the case . On the other hand, by the minimality of the word cannot be a factor of or , and so we have , which implies that , as claimed. It remains to consider the case where is a factor of neither nor . Then there must be a factorization with and nontrivial words, so that , which yields . This completes the proof of the first part of the proposition.
To prove the last part of the proposition, first note that, for , the value of is at least 3. We claim that, for , there is a word of length that is a segment of but not a factor of . Noting that , the result follows.
To establish the claim, consider the word . It is a segment of , in fact a factor of since is the first letter of . It remains to show that is not a factor of . Otherwise, since there are no overlaps in , must be a factor of . For , this is clearly impossible since not even is a factor of . For , we have
[TABLE]
Since there are no overlaps in , the only place where is found as a factor of is as the product of the two middle factors in the factorization given in (2). Hence, the word is not a factor of since, for instance, is not the first letter of .
For example, the segments of lengths 4 and 5 of the infinite word are the factors of those lengths of . But, for instance, is a segment of but not a factor of ; it is precisely the segment considered in the last part of the proof of Proposition 2.1.
Throughout the remainder of the paper, when we need to check whether a concrete finite word is a segment of , without any further reference we simply apply the algorithm given by Proposition 2.1, which is linear in the length of the given word. We proceed similarly when we need to compute all the segments of of a given length.
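A minimal sketch of such a membership test follows. It does not use the sharper exponent of Proposition 2.1; instead it assumes, as a deliberately generous bound, that every segment of length $n$ already occurs in the prefix of length $10n$, which follows from the recurrence values quoted from Allouche and Shallit (2003) as recalled here. The names `tm_prefix`, `is_segment`, and `segments_of_length` are hypothetical.

```python
def tm_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b} (binary digit parity)."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


def is_segment(word: str) -> bool:
    """Test whether word occurs in the Thue-Morse word.
    Assumption: a prefix of length 10 * len(word) contains every
    segment of that length (a rough bound, not the paper's)."""
    if not word:
        return True
    return word in tm_prefix(10 * len(word))


def segments_of_length(n: int) -> set:
    """All segments of length n >= 1, read off the same prefix."""
    text = tm_prefix(10 * n)
    return {text[i:i + n] for i in range(len(text) - n + 1)}


print(sorted(segments_of_length(3)))
# ['aab', 'aba', 'abb', 'baa', 'bab', 'bba']
```

The prefix has length linear in the length of the input, in line with the linear cost mentioned above, although Python's substring search does not by itself guarantee a linear worst case.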
Now, we take into account also the dual version of Proposition 2.1, where is considered instead of , which is a direct consequence of Proposition 2.1 using the fact that the set of all segments of of fixed length is closed under taking images under . Since every segment of of length must contain the factor or , it follows from Proposition 2.1 that, for , every segment of length of is a factor of every segment of of some length which is at most
[TABLE]
The existence of such an is the property known as uniform recurrence of and holds for every sequence generated by iterating a primitive endomorphism of a free semigroup (Queffélec, 2010, Proposition 5.2). In the case of the Prouhet-Thue-Morse sequence, the optimum value of is presented in Allouche and Shallit (2003, Example 10.9.3): for , we have , where is the integer determined by the inequalities . Note that using the first inequality determining , one gets the upper bound , which is better than our rough upper bound .
3 Finite binary patterns
The following result plays a key role below.
Theorem 3.1 (Shur (1996a))
The set of words of that are avoided by is the fully invariant ideal generated by the set
[TABLE]
Moreover, the above is a minimal generating set for the fully invariant ideal of the words avoided by .
The generators corresponding to and are, respectively, and ; the generator corresponding to is while, for , the word given by is avoided by but may be obtained for instance from the generator by mapping to and to .
Another useful ingredient in our arguments is the following “synchronization” result.
Lemma 3.2 (de Luca and Varricchio (1989, Lemma 3.9))
Let and consider with . If and are words such that and is odd then has an overlap.
Since has no overlaps, we conclude that if , with as in Lemma 3.2 and a finite word, then has even length.
Corollary 3.3
If there is a factorization where , , and , then and for some word and infinite word .
Proof.
We proceed by induction on , the case being trivial as is interpreted to be the identity function. Suppose that and, by symmetry, assume that . Since , by the induction hypothesis we know that and , for some finite word and infinite word , and so . Lemma 3.2 then implies that and belong to the image of . Hence, and belong to the image of .
We say that a segment of is special if both and are segments of . The special segments of have been investigated by de Luca and Varricchio (1989) with the purpose of counting the number of segments of each given length. For our purposes, it suffices to observe the following much simpler result.
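The definition can be explored numerically. The sketch below reads "special" as: a segment that can be followed by either letter and remain a segment. It reuses the assumed bound that all segments of length $n$ occur in the prefix of length $10n$; the names `tm_prefix`, `segments_of_length`, and `special_segments` are hypothetical.

```python
def tm_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b} (binary digit parity)."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


def segments_of_length(n: int) -> set:
    """Segments of length n, assuming they all occur in the prefix of
    length 10 * n (a rough bound used only for this illustration)."""
    text = tm_prefix(10 * n)
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def special_segments(n: int) -> list:
    """Segments w of length n such that both w + 'a' and w + 'b'
    are again segments."""
    longer = segments_of_length(n + 1)
    return sorted(w for w in segments_of_length(n)
                  if w + "a" in longer and w + "b" in longer)


print(special_segments(3))  # ['aab', 'aba', 'bab', 'bba']
```

Since every segment can be extended to the right by at least one letter, the number of special segments of length $n$ equals the difference between the numbers of segments of lengths $n+1$ and $n$, which is why special segments are the natural tool for counting segments.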
Lemma 3.4 (de Luca and Varricchio (1989, Lemma 3.6))
If the word is special, then so is .
We say that two words are suffix comparable if at least one of them is a suffix of the other. The following lemma is the core of our arguments.
Lemma 3.5
Suppose that is a finite word such that is unavoidable in and is a segment of but is not. If and is suffix comparable with , then is also suffix comparable with .
Proof.
Since is a suffix of , we may assume that is a suffix of , say . Consider a concrete occurrence of in : , where . Since is recurrent, we may assume that . By Corollary 3.3, we know that and for some word and infinite word (cf. Figure 1). Since starts with , so does .
Suppose first that ends with the letter and . If ends in then the word is a suffix of which, in view of Theorem 3.1, contradicts the assumption that is unavoidable in . On the other hand, if ends with then, taking into account that starts with , we conclude that for some finite word and infinite word . Since is a fixed point of the injective endomorphism , it follows that is a segment of , which we know is not the case.
If , then is a suffix of for a letter . By Lemma 3.4, as it is easy to check that is special, so is . Hence, is special, contradicting the assumption that is not a segment of .
Thus, must end with , so that ends with . Since both and are suffixes of , they must be suffix comparable, thereby concluding the proof of the lemma.
Similarly, one can prove the following lemma.
Lemma 3.6
Suppose that is a finite word such that is unavoidable in and is a segment of but is not. If and is suffix comparable with then is also suffix comparable with .
Shur also observed in Shur (1996a) that the word $a^2ba^2$ (and, therefore, also $b^2ab^2$) is unavoidable in $\mathbf{t}$ but it is not a segment of $\mathbf{t}$. Our first main result is that there are no other such examples, thus providing an alternative characterization of the words over a two-letter alphabet that are unavoidable in $\mathbf{t}$.
According to Restivo and Salemi (2002a, Theorem 2) and Restivo and Salemi (2002b, Theorem 3), the following theorem, which is considered to be surprising, was first proved by D. Guaiana in 1996 but, through private communication with A. Restivo, we learned that the proof was never published and the manuscript appears to be lost. On the other hand, we later learned from A. M. Shur that the next theorem also appears in his Ph.D. thesis (1997), which has never been published other than as a document in the Russian State Library. Moreover, Shur observed that the result can also easily be drawn from Theorem 3.1 using a characterization of the finite words on the alphabet that are not segments of , which is given in Shur (2005, Statement 1), a paper also in Russian. Since all the proofs seem to be either lost or somewhat inaccessible in the Russian literature, again we present our own proof for the sake of completeness.
Theorem 3.7
A word over $\{a,b\}$ is unavoidable in $\mathbf{t}$ if and only if it is one of the words $a^2ba^2$ and $b^2ab^2$, or it is a segment of $\mathbf{t}$.
Proof.
We proceed by induction on the length of . In view of Theorem 3.1, it is easy to check that the theorem holds for words of length at most 5. Assuming inductively that the result holds for words of length , let be a word of length that is unavoidable in . Since interchanging the letters and does not affect either of the properties of being unavoidable in and being a segment of , we may as well assume that is the last letter of .
Let . Since is unavoidable in , so is . Hence, by induction hypothesis, may be found somewhere as a segment of . Take such an occurrence of in and let be the letter immediately after it. We wish to show that there is such an occurrence of in with . Aiming at a contradiction, we may assume that there is no such occurrence, that is, we always have . Since is recurrent, the segment may be found in as far as we wish, so that we may continue prolonging it on the left as much as may be convenient. Thus, we are assuming that is unavoidable in and that is a segment of as long as desired but is not a segment of . However, we have to be careful because there is in principle no assurance that such an extension of to the left retains the property that is unavoidable in .
Since is avoided by and we are assuming that , cannot end with . We distinguish several cases according to the termination of the word .
If ends with then, by the above, it ends with . Suppose, more precisely, that ends with . Since ends with and is avoided by , in fact must end with . This situation is impossible since we know that the suffix of is not a segment of .
Alternatively, assuming that ends with , it must end with . We may then apply successively Lemmas 3.5 and 3.6 to deduce that there is some such that is a suffix of . By Lemma 3.4, it follows that is special, which contradicts the assumption that is not a segment of .
The next case we consider is that where ends with . Note that cannot end with for, otherwise, ends with and, therefore, it cannot be unavoidable in . Also, cannot end with since is not a segment of . Hence, is not a suffix of .
Thus, assuming that ends with , it must end with . We are then again led to a contradiction as above using Lemmas 3.5, 3.6 and 3.4.
4 Typical finite binary patterns
Recall that the endomorphism of switching the letters and is denoted . Since the finite segments of are the finite words that are factors of for all sufficiently large and is a factor of , we may replace by in that characterization of the segments of . Moreover, as commutes with , we conclude that the set of finite segments of is closed under applying the substitutions and . By induction on , the words are palindromic, in the sense that they coincide with the words read in the reverse order; this entails the well known fact that the set of segments of is closed under reversal.
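These closure properties are easy to check on small cases. The sketch below assumes that the palindromic words in question are the even iterates $\mu^{2n}(a)$ of the substitution a → ab, b → ba, and tests closure of the segments of one fixed length under letter exchange and reversal. The names `mu_power`, `exchange`, and `segments_of_length` are hypothetical, and the prefix length used to collect segments is a rough assumed bound.

```python
def mu_power(n: int, word: str = "a") -> str:
    """Apply the substitution a -> ab, b -> ba n times to word."""
    images = {"a": "ab", "b": "ba"}
    for _ in range(n):
        word = "".join(images[letter] for letter in word)
    return word


def exchange(word: str) -> str:
    """Swap the letters a and b."""
    return word.translate(str.maketrans("ab", "ba"))


def segments_of_length(n: int) -> set:
    """Segments of length n, read off a prefix assumed long enough."""
    text = mu_power(10)  # prefix of length 1024
    return {text[i:i + n] for i in range(len(text) - n + 1)}


# Even iterates read the same in both directions.
print(all(mu_power(2 * n) == mu_power(2 * n)[::-1] for n in range(5)))  # True

# The set of segments of length 6 is closed under exchange and reversal.
segs = segments_of_length(6)
print({exchange(w) for w in segs} == segs, {w[::-1] for w in segs} == segs)  # True True
```

Odd iterates are not palindromes (for instance `ab`), which is why only even powers are considered in this reading.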
We say that a word is atypical if it is a segment of and there is an endomorphism of such that is also a segment of and is not of one of the forms or with . Segments of that are not atypical are said to be typical.
We say that a word is a variant of another word if it may be obtained from by applying reversal or or both. Note that the set of atypical words is closed under taking factors and, by the above discussion, it is also closed under taking variants.
The following result appears explicitly as Brlek (1989, Proposition 3.3) but may already be extracted from Thue (1912) (see Berstel (1995, Chapter 3, Proposition 2.13)).
Proposition 4.1
If is a segment of then is one of the words , , or for some .
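The statement can be checked experimentally on a long prefix. The sketch below reads the proposition as saying that the roots of square segments are exactly the images of a, b, aba, and bab under powers of the substitution a → ab, b → ba; this reading and the names `tm_prefix`, `square_roots`, and `predicted_roots` are assumptions made for illustration.

```python
def tm_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b} (binary digit parity)."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


def mu(word: str) -> str:
    """Apply the substitution a -> ab, b -> ba."""
    return "".join({"a": "ab", "b": "ba"}[letter] for letter in word)


def square_roots(text: str, max_root_len: int) -> set:
    """Words u with len(u) <= max_root_len such that uu occurs in text."""
    roots = set()
    for k in range(1, max_root_len + 1):
        for i in range(len(text) - 2 * k + 1):
            if text[i:i + k] == text[i + k:i + 2 * k]:
                roots.add(text[i:i + k])
    return roots


def predicted_roots(max_root_len: int) -> set:
    """Iterated images of a, b, aba, bab, up to the given length."""
    roots = set()
    for seed in ("a", "b", "aba", "bab"):
        word = seed
        while len(word) <= max_root_len:
            roots.add(word)
            word = mu(word)
    return roots


found = square_roots(tm_prefix(2048), 48)
print(found == predicted_roots(48))
print(sorted({len(u) for u in found}))
```

In particular, the root lengths that show up are of the form $2^n$ or $3\cdot 2^n$.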
Yet another property of the Prouhet-Thue-Morse infinite word is the following result which explains the above terminology.
Theorem 4.2
Let be a segment of containing at least one of the segments and along with all other segments of of length 3. Then is typical.
Proof.
Suppose is an endomorphism of such that is a segment of . By Proposition 4.1, since and are square segments of , each of the words and must be obtained by applying a power of to one of the words . Let then and , where . We may assume that since, otherwise, we would consider the pair instead of . Then, we have the factorization , where is the endomorphism of defined by and . Since is injective and is a fixed point of , from the fact that is a segment of , we conclude that so is . On the other hand, since and commute, if is a product of and then so is . Hence, we may assume that .
The mapping has also the property that is a segment of . Since , we may further assume that is one of the words or . Since is a factor of and neither nor is a factor of , then cannot start with and, therefore, it must be either or .
Consider first the case where . Since is a factor of but is not a factor of , cannot end in , which implies that is even. If , then starts with . But, since is a factor of , this implies is a factor of and, therefore, a segment of , which we know is not the case. Hence, we must have . It remains to rule out the case , which results from noting that in that case, from the assumption that either or is a factor of , it follows that either or is a factor of while we know that is not a segment of .
Next, consider the case where . Since is a factor of and is not a segment of , cannot start with . As is a prefix of , it follows that and . This leads to a similar situation as that considered at the end of the preceding paragraph, with the letters and interchanged which is, therefore, excluded. This completes the proof of the theorem.
The assumption of Theorem 4.2 that a segment of contains as factors at least one of the words and along with all other segments of length 3 of holds for all segments of of length , as may be easily checked by examining all segments of that length. Hence, by Theorem 4.2 there are only finitely many atypical words. Since there are 5 different segments of length 3 of that are supposed to appear in the word, no word with length shorter than 7 satisfies the criterion of Theorem 4.2 and is a word of length 7 that does satisfy it. On the other hand, the segment , of length 9, fails to have the segment as a factor.
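The criterion can be tested mechanically on all segments of a given length. The sketch below rests on an interpretation: it reads the criterion as "contains aba or bab, together with each of aab, abb, baa, bba", which is consistent with the count of five required segments of length 3 mentioned above. The names `tm_prefix`, `segments_of_length`, `satisfies_criterion`, and `failing_segments` are hypothetical, and segments are collected from a prefix of length $10n$, an assumed rough bound.

```python
def tm_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b} (binary digit parity)."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


def segments_of_length(n: int) -> set:
    """Segments of length n, read off the prefix of length 10 * n."""
    text = tm_prefix(10 * n)
    return {text[i:i + n] for i in range(len(text) - n + 1)}


REQUIRED = ("aab", "abb", "baa", "bba")  # assumed reading of the criterion


def satisfies_criterion(word: str) -> bool:
    """True if word contains aba or bab and every word in REQUIRED."""
    return (("aba" in word or "bab" in word)
            and all(f in word for f in REQUIRED))


def failing_segments(n: int) -> list:
    """Segments of length n that do not meet the criterion."""
    return sorted(w for w in segments_of_length(n) if not satisfies_criterion(w))


for n in range(6, 13):
    print(n, len(segments_of_length(n)), len(failing_segments(n)))
```

Under this reading, the output shows from which length on every segment meets the criterion, and lists how many shorter segments still need to be examined by hand.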
The following result completes the above observations by giving the full identification of atypical words.
Theorem 4.3
Up to taking variants, the atypical words are the factors of the words , , and .
Proof.
To check that all relevant words have been duly considered, the reader may wish to refer to the diagram in Figure 2 later in the paper, where all atypical words are represented.
The following is the complete list of segments of of length 5:
$$aabab,\ aabba,\ abaab,\ ababb,\ abbaa,\ abbab,\ baaba,\ baabb,\ babaa,\ babba,\ bbaab,\ bbaba.$$
Note that all these words are variants of factors of at least one of the three words in the statement of the theorem. Hence, by showing that those three words are atypical, we obtain that so are all words of length up to 5.
We next indicate for each of the words in the statement of the theorem an endomorphism of not of the forms and that maps it to a segment of :
- : , ;
- : , ;
- : , .
The verification of all these statements amounts to routine calculations.
Showing that there are no other atypical words requires more work. Note that a word is typical if it has a typical factor. Hence, also excluding variants and words that satisfy the criterion of Theorem 4.2, we obtain the following reduced list of words remaining to be treated:
[TABLE]
We proceed to show that each word in the list (3) is typical. For that purpose, assume that is an endomorphism of such that is a segment of .
In the first three cases, since and are factors of , we may start the argument using Proposition 4.1 as in the proof of Theorem 4.2, assuming that is either or .
Consider first the case . Since in the three cases, is a factor of but is cube-free, must start with . Therefore, we may assume that with . In all three cases, since is a factor of , we get that is a factor of and this would provide an overlap in if ends with . Hence ends with . Since is of the form for some , we conclude that either starts with or it is . The first case is excluded since is then a factor of , whence also of , while it is not a segment of . The case is also excluded if is either or since it leads to the overlap in the factor of . In case , one can simply check directly that is not a segment of .
Still treating for the moment only the first three of the words in the list (3), suppose next that . Again, as is a factor of and cannot be a factor of , must start with . If it ends with , then would be an overlap in since is a factor of . Hence, starts and ends with . This is impossible in case has the factor since it would lead to the overlap in . This excludes the cases where is the first or the second word in the list (3). So, we have . Then is a square segment of . By Proposition 4.1, is one of the words with . Since gives a word that is too short to be , we must have , in which case a simple calculation shows that cannot start with . This ends the verification that the first three words in the list (3) are typical.
It remains to consider the word . Here, we have two square factors of , namely the squares of and . By Proposition 4.1 we know that there are words and nonnegative integers such that and . In case , comparing the lengths of the word and its factor , we obtain the inequality , so that . It follows that , and . From the equalities we then deduce that , where is the first letter of , and . Since is a factor of , is a segment of , which contradicts being overlap-free. Thus, we must have . Then must be of the form where is a prefix of . It follows that, as in the proof of Theorem 4.2, we may then assume that and that is either or . Consider first the case where . Since is a factor of but is not a factor of , the word must start and end with the letter and we may assume that it is not reduced to . Since , we conclude that must start with . Since is a factor of , this yields the factor of , which is not possible since is a segment of . Finally, the case is excluded since cannot end with . This concludes the proof of the theorem.
To facilitate the visualization of the set of atypical words, we give a semigroup theoretical formulation. Although we do not go deep into it, the reader unfamiliar with semigroup theory may prefer to skip these considerations or refer to a standard textbook in the area such as Clifford and Preston (1961); Howie (1995).
Let be the set of atypical words. We may define a multiplication on the set as follows: for , is if is atypical and 0 otherwise; for all , . Note that is the Rees quotient of by the ideal consisting of the typical words together with the words that are not segments of .
The diagram in Figure 2 represents as a partially ordered set for the Green -order, in which an element lies above if and only if is a factor of . The words in bold are the lexicographic minima among their variants; note that those that are atoms (which are underlined) are precisely the words that were shown directly to be atypical in Theorem 4.3.
We conclude this section with another application of Theorem 4.2, this one concerning infinite patterns of .
Corollary 4.4
Let be an infinite word and suppose that there is an endomorphism of such that is a suffix of either or . Then is itself a suffix of either or .
Proof.
Since all segments of are unavoidable in and they are all extendable on the right, by Theorem 3.7 they are segments of . Since the language of the segments of defines a minimal subshift (Queffélec, 2010, Proposition 5.2), it follows that and have the same segments. In particular, the word is a segment of and it satisfies the assumption of Theorem 4.2. It follows that there is such that or . Again, since is injective and both and are fixed by , the result follows.
The somewhat different formulation for finite and infinite segments (compare Theorem 3.7 with Corollary 4.4) is fully justified by the following result, which entails that the infinite words and have no common suffix.
Proposition 4.5
If is an infinite word over and is a common infinite suffix of and , then is periodic.
Proof.
By assumption, there are finite words and such that , . Since and start with different letters, the words and have different lengths. Replacing by , if needed, we may assume that is shorter than . As is a prefix of , it follows that for some word . From , we deduce that and so , thereby showing that is periodic.
5 Final remarks and problems
For an infinite word over a finite alphabet , let be the language consisting of its finite segments. Note that the automorphisms of the semigroup permute the letters of ; we call them letter exchanges. The language obtained from by applying all possible letter exchanges is denoted . Let denote the set of all endomorphisms of such that . The set is similarly defined using instead of . Note that both and are submonoids of the monoid of all endomorphisms of the semigroup .
The following is an immediate consequence of Theorem 4.2.
Corollary 5.1
The monoid is generated by the set . In particular, it is finitely generated.∎
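Membership in such a monoid can only be refuted, not proved, by a finite computation, but a bounded check is still instructive. The sketch below tests whether an endomorphism, given as a dictionary of letter images, maps every segment up to a chosen length to a segment. The names `tm_prefix`, `is_segment`, `segments_of_length`, and `preserves_segments` are hypothetical, the length bound is arbitrary, and segments are looked up in a prefix of length ten times the word length, an assumed rough bound.

```python
def tm_prefix(length: int) -> str:
    """Prefix of the Thue-Morse word over {a, b} (binary digit parity)."""
    return "".join("ab"[bin(i).count("1") % 2] for i in range(length))


def is_segment(word: str) -> bool:
    """Membership test against a prefix assumed long enough."""
    return word in tm_prefix(10 * max(len(word), 1))


def segments_of_length(n: int) -> set:
    """Segments of length n, read off the prefix of length 10 * n."""
    text = tm_prefix(10 * n)
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def preserves_segments(phi: dict, max_len: int = 8) -> bool:
    """Bounded necessary condition: every segment of length at most
    max_len is mapped by phi to a segment."""
    for n in range(1, max_len + 1):
        for word in segments_of_length(n):
            if not is_segment("".join(phi[c] for c in word)):
                return False
    return True


print(preserves_segments({"a": "ab", "b": "ba"}))  # True: the substitution itself
print(preserves_segments({"a": "b", "b": "a"}))    # True: the letter exchange
print(preserves_segments({"a": "ab", "b": "a"}))   # False: bba is sent to aaab
```

A `True` answer only says that no counterexample exists up to the chosen length; a `False` answer is conclusive under the stated prefix assumption.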
Corollary 5.1 is intimately related with a result of Thue (see Berstel (1995, Chapter 3, Theorem 2.16)) that characterizes the set of the so-called overlap-free morphisms, that is, endomorphisms of that map the set of all overlap-free words into itself, namely as the monoid generated by . In fact, in view of another result of Thue (see Berstel (1995, Chapter 3, Theorem 2.15)), all (overlap-free) words that can be arbitrarily prolonged in both directions to overlap-free words are segments of . It follows that overlap-free morphisms belong to and so Corollary 5.1 immediately yields Thue’s necessary condition for overlap-free morphisms. That the condition is also sufficient is given by another result of Thue (see Berstel (1995, Chapter 3, Lemma 2.2)). It does not appear to be immediately obvious how to deduce Corollary 5.1 from Thue’s results.
Corollary 5.1 is also related with a result of Pansiot (1981) characterizing the endomorphisms of that generate some infinite word obtained from by dropping a finite prefix as precisely the powers of . Since is recurrent, all such infinite words have the same language . Hence, the endomorphisms considered by Pansiot belong to , whence they are products of and . Since and commute, it follows from Corollary 5.1 that is either or for some , the latter possibility being excluded because is assumed to be a fixed point of . This gives Pansiot’s result. Again, it is not clear how to deduce Corollary 5.1 from Pansiot’s results.
Theorems 3.7 and 4.2, together with Corollary 5.1 may be regarded as three finiteness properties of the Prouhet-Thue-Morse sequence. It is natural to ask which infinite words possess such finiteness properties. More precisely, we propose the following problems.
Problem 1
Which infinite words have the property that, up to finitely many exceptions, the patterns of on the same alphabet are obtained from its segments up to an exchange of letters?
Problem 2
For which infinite words is the monoid finitely generated? Similar question for .
We say that a finite segment of is -atypical if there is some endomorphism of such that is also a segment of .
Problem 3
Which infinite words have only finitely many -atypical segments?
A negative example for Problem 1 is provided by the Fibonacci infinite word, which is the only fixed point of the endomorphism of $\{a,b\}^+$ defined by $a\mapsto ab$ and $b\mapsto a$. That there are infinitely many finite binary patterns of that are not segments of was proved in Restivo and Salemi (2002a) (see also Restivo and Salemi (2002b)), where it is also shown that there are Sturmian infinite words that admit as patterns all segments of all Sturmian infinite words. Recall that an infinite word is Sturmian if it has exactly $n+1$ segments of each length $n$. We do not know whether is generated by and is generated by and . We also do not know whether the set of -atypical words is finite.
Problem 1 was raised in Restivo and Salemi (2002a) for binary infinite words that are either fixed points of endomorphisms or of linear complexity. In the same paper, it is observed that if is an infinite word with all elements of as segments (which may be obtained for instance by concatenating all the words in a sequence enumerating the elements of ), then obviously is a positive example for Problem 1. Note that and it is easy to see that is not finitely generated: for the endomorphism that maps each letter to itself, except for one letter that is mapped to , where is prime, the only elements of that are factors of it are the letter exchanges and the factors of which it is also a factor. From the preceding observation it also follows that there are no -atypical words. Thus, is a negative example for Problem 2 and a positive example for Problem 3.
Acknowledgements.
We thank the anonymous referee for pointing out the connection with Pansiot’s work and for comments that led to improved readability of this paper.
- [1] Allouche and Shallit (1999). J.-P. Allouche and J. Shallit. The ubiquitous Prouhet-Thue-Morse sequence. In Sequences and their applications (Singapore, 1998), Springer Ser. Discrete Math. Theor. Comput. Sci., pages 1–16. Springer, London, 1999.
- [2] Allouche and Shallit (2003). J.-P. Allouche and J. Shallit. Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press, 2003.
- [3] Almeida and Costa (2013). J. Almeida and A. Costa. Presentations of Schützenberger groups of minimal subshifts. Israel J. Math., 196:1–31, 2013.
- [4] Almeida et al. (2020). J. Almeida, A. Costa, R. Kyriakoglou, and D. Perrin. Profinite semigroups and symbolic dynamics, volume 2274 of Lect. Notes in Math. Springer, Cham, 2020.
- [5] Berstel (1995). J. Berstel. Axel Thue’s papers on repetitions in words: a translation. Technical report, Université du Québec à Montréal, 1995. Publications du LaCIM 20, available at http://www-igm.univ-mlv.fr/~berstel/Articles/1994 Thue Translation.pdf .
- [6] Brlek (1989). S. Brlek. Enumeration of factors in the Thue-Morse word. Discrete Appl. Math., 24:83–96, 1989.
- [7] Clifford and Preston (1961). A. H. Clifford and G. B. Preston. The Algebraic Theory of Semigroups, volume I. Amer. Math. Soc., Providence, R.I., 1961.
- [8] de Luca and Varricchio (1989). A. de Luca and S. Varricchio. Some combinatorial properties of the Thue-Morse sequence and a problem in semigroups. Theor. Comp. Sci., 63(3):333–348, 1989.
