How far away must forced letters be so that squares are still avoidable?
Matthieu Rosenfeld

TL;DR
This paper introduces a non-constructive method to determine how far apart forced letters must be in an infinite word to still avoid squares, providing bounds for different alphabet sizes and demonstrating the technique's potential for broader applications.
Contribution
The paper presents a novel non-constructive approach to avoid squares in infinite words with forced letter positions, establishing new distance bounds for various alphabet sizes.
Findings
Squares are avoidable with forced letters separated by at least 19, 3, or 2 positions depending on alphabet size.
Exponential lower bounds are established on the number of solutions.
The technique relies on computer-verified existence of certain languages.
Abstract
We describe a new non-constructive technique to show that squares are avoidable by an infinite word even if we force some letters from the alphabet to appear at certain occurrences. We show that as long as forced positions are at distance at least 19 (resp. 3, resp. 2) from each other then we can avoid squares over 3 letters (resp. 4 letters, resp. 6 or more letters). We can also deduce exponential lower bounds on the number of solutions. For our main Theorem to be applicable, we need to check the existence of some languages and we explain how to verify that they exist with a computer. We hope that this technique could be applied to other avoidability questions where the good approach seems to be non-constructive (e.g., the Thue-list coloring number of the infinite path).
| 9 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
| 4 | 19 | 22 | 28 | 36 | 50 | 63 | 88 | 118 | 148 |
1 Introduction
A square is a word of the form uu where u is a non-empty word. We say that a word is square-free (or avoids squares) if none of its factors is a square. For instance, is a square while is square-free. In 1906, Thue showed that there are arbitrarily long ternary words avoiding squares [8]. This result is often regarded as the starting point of combinatorics on words, and generalizations of this particular question have received a lot of attention. The authors of [2] study three such questions asked by Harju [5]. They also introduced a stronger version of the third problem.
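The definition above can be checked mechanically. The following is a minimal sketch, not taken from the paper's code; the function names `has_square` and `is_square_free` are chosen here for illustration only.

```python
def has_square(word: str) -> bool:
    """Return True if some factor of `word` is a square uu with u non-empty."""
    n = len(word)
    for period in range(1, n // 2 + 1):          # period = |u|
        for start in range(n - 2 * period + 1):  # first position of the candidate square
            first_half = word[start:start + period]
            second_half = word[start + period:start + 2 * period]
            if first_half == second_half:
                return True
    return False


def is_square_free(word: str) -> bool:
    """A word is square-free if none of its factors is a square."""
    return not has_square(word)
```

This brute-force test is quadratic in the number of candidate positions and periods, which is enough for small experiments.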
**Problem 1 ([2, Problem 4]).**
Let be an integer and let be any infinite ternary word. Does there exist an infinite ternary square-free word such that for all , ?
They give a partial solution to this question and show that the answer is yes for any if . In fact, they showed something slightly stronger. Let be the smallest integer such that for all and for all sequences of indices such that , , there is an infinite square-free word such that . They showed that . Moreover, the fact that squares are avoidable over 3 letters can be used to show that . We show that , , for .
The main theorem of this paper gives sufficient conditions for the existence of square-free languages that fulfill some constraints. Kolpakov showed that there are more than square-free words of length over a ternary alphabet using a new non-constructive technique [6]. One of the ideas behind Kolpakov’s result is roughly to approximate (using a computer) the language of square-free words by the language of words avoiding squares of period less than for large , and to show that we do not lose too many words if we remove the larger squares from this language. We use a similar idea in this paper. We also use ideas from the power series method (see for instance [1, 7]) even if we do not explicitly manipulate any power series. It seems to be a good approach to show that the Thue-list number of paths is (see [3, 4] for definitions and conjectures on this topic) or to tackle other problems that might require a non-constructive approach.
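To make the approximation idea concrete, the sketch below counts ternary words that avoid only squares of period less than a given bound and compares them with fully square-free words. It is an illustrative brute-force enumeration, not Kolpakov's method or the paper's program; `ends_with_square`, `count_words` and the parameter `period_bound` are names assumed here.

```python
def ends_with_square(word, max_period=None):
    """True if `word` has a suffix uu with 1 <= |u| <= max_period (no limit if None)."""
    n = len(word)
    limit = n // 2 if max_period is None else min(max_period, n // 2)
    return any(word[n - 2 * q:n - q] == word[n - q:] for q in range(1, limit + 1))


def count_words(length, alphabet="abc", period_bound=None):
    """Count words of the given length with no square of period < period_bound.

    With period_bound=None every square is forbidden (square-free words).
    """
    max_period = None if period_bound is None else period_bound - 1
    words = [""]
    for _ in range(length):
        # A word avoids the forbidden squares iff no prefix ends with one,
        # so it suffices to test the suffixes after each added letter.
        words = [w + a for w in words for a in alphabet
                 if not ends_with_square(w + a, max_period)]
    return len(words)
```

For example, `count_words(5)` counts the ternary square-free words of length 5, while `count_words(5, period_bound=2)` forbids only factors of the form xx with x a single letter; the second language contains the first.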
This paper is organized as follows. We start by fixing some notations in Section 2. In Section 3, we give a weaker version of Theorem 4 to present the ideas of the theorem without some of the technicalities. Then in Section 4, we give the proof of Theorem 4, our main theorem. In Section 5, we explain how to verify with a computer the existence of some languages that are required to apply Theorem 4. Finally, in Section 6, we use Theorem 4 to bound the values of for different alphabet sizes.
2 Definitions and notations
We denote the set of non-negative integers (resp. positive integers) by (resp. ). For any word , we denote the th letter of by and the length of by . Then for any , . For any set of non-empty words , we let (resp. ) be the set of words obtained by catenation of finitely many (resp. infinitely many) elements of . A language over an alphabet is a set of finite words over this alphabet. We use the convention that and (we could use for the second one, but it is slightly less convenient for the implementation).
A partial word over is a (possibly infinite) word over the alphabet . For any partial word and word , we say that is compatible with if and for all such that and are defined. We denote by the set of square-free words that are compatible with the partial word .
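As a small illustration of compatibility, the sketch below represents a finite partial word as a string in which a hole symbol marks the unconstrained positions. The choice of `"."` as the hole symbol, the name `is_compatible`, and the requirement that the word be no longer than the partial word are assumptions made for this example.

```python
HOLE = "."  # assumed placeholder for an unconstrained position


def is_compatible(word: str, partial: str, hole: str = HOLE) -> bool:
    """Check that `word` agrees with `partial` on every forced position.

    Assumption of this sketch: a word longer than the partial word is
    rejected, since its extra letters have no position to be compared with.
    """
    if len(word) > len(partial):
        return False
    return all(p == hole or p == c for c, p in zip(word, partial))
```

With this convention, `is_compatible("abc", "a.c.")` holds because the first and third letters match the forced letters, while `is_compatible("abb", "a.c.")` fails at the third position.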
3 Idea of Theorem 4
The main theorem of this paper is Theorem 4. Its main idea is that if a language avoids short squares and is large enough, then it contains square-free words of any length. The statement and proof of this theorem are rather technical, so in this section we give a version of the theorem for the case where the set is a singleton . We hope that this helps to convey the ideas of the proof of Theorem 4. This is in fact very similar to the ideas of [7], but instead of building the word letter by letter, we construct it factor by factor. For that, we fix one size of a factor and look at the number of words whose length is a multiple of this size.
**Theorem 2.**
Let be an alphabet, be a finite partial word and such that divides . Suppose that there are and a language such that:
- (I) .
- (II) For all , avoids squares of period less than .
- (III) For any there are at least different words compatible with such that .
- (IV) There exists such that:
[TABLE]
Then is infinite.
Proof.
Let . Let be a set of words from that are compatible with such that, for any of length divisible by , there are exactly different words compatible with with . Conditions (I), (II) and (III) imply that such a set can be obtained by removing words from . For all non-negative , let be the number of square-free words of of length .
We will show by induction on that for all positive , . Let be a positive integer such that:
[TABLE]
By definition of , for any word of there are exactly different factors of length such that is in . Let be the set of words in of length whose prefix of length is in . Then by definition:
[TABLE]
In order to bound , let us introduce for all , . That is, is the set of words of that contain a square whose midpoint (the middle of the square) is located between the positions and in the word. Clearly , so our next task is to compute bounds on for all .
**Lemma 3.**
We have the following inequalities:
- for all , ,
- for all , .
Proof.
If , then . Since does not contain squares of period less than , .
Now, let . For any and let be the set of words of that admit as a prefix.
By definition of , any word contains a square whose second half starts in position and ends in position where and . Given , and , we know the first half of the square and thus the word is known at least up to position . By definition of there are at most possible values for the remaining letters. By summing over all the values of and one gets: . By summing over all the values of , we finally get .∎
Now, by (IH1), for all , and thus:
[TABLE]
We can use this bound in inequality (1) and we get:
[TABLE]
This concludes the proof that for all positive , . Since , we deduce that is unbounded and thus is infinite.∎
4 The main theorem
This section is devoted to the proof of the main theorem. As already mentioned, the ideas of the proof are the same as for the proof of Theorem 2. However, this proof is more technical because is not a singleton anymore. Moreover, we want the equivalent of condition (IV) to be as general as possible, and for that we need to bound the size of as tightly as possible. Thus the equivalent of Lemma 3 (Lemma 5) is much more technical and we delay its proof to a later subsection.
**Theorem 4.**
Let be an alphabet, be a finite set of finite partial words, be an integer. Suppose that there is a language and a function such that:
- (I) .
- (II) For any and there are at least different words compatible with and such that .
- (III) For all , avoids squares of period less than .
- (IV) For all and integer , let
[TABLE]
and
[TABLE]
There exist and solution of the following system:
[TABLE]
Then for any infinite partial word , is infinite.
Proof.
Let and be a sequence of elements of such that . For any integer , let . Let be a set of words from that are compatible with such that, for any of length , there are exactly different words compatible with with . That is, we remove words from in order to replace the “at least ” by “exactly ”. For all non-negative , let be the number of square-free words of of length .
We will show by induction on that for all positive , .
Let be a positive integer such that:
[TABLE]
By definition of , for any word of of length there are exactly different factors of length such that is in . Let be the set of words in of length whose prefix of length is in . Then by definition:
[TABLE]
In order to bound , let us introduce for all , . That is, is the set of words of that contain a square whose midpoint (the middle of the square) is located between the positions and in the word. Clearly , so our next task is to compute the values of for all i. Let be the smallest integer such that and let . Remark that .
**Lemma 5.**
We have the following inequalities:
- for all , ,
- ,
- for all , .
The proof of this Lemma is not really informative and is mostly a rather technical counting argument, so we moved it to Section 4.1.
We can use the bounds on the sizes of the s to bound :
**Lemma 6.**
We have
[TABLE]
Proof.
First, let us show by induction on that for all :
[TABLE]
Let us first show that this is true with , using the fact that .
[TABLE]
Now, let be an integer such that (IH2) is true for .
[TABLE]
Thus equation (IH2) is true for all and in particular for and we get:
[TABLE]
This concludes the proof of this Lemma. ∎
By induction hypothesis (IH1) . Let us bound the product on the right hand side:
[TABLE]
Now, using this equation with Lemma 6 gives
[TABLE]
Now recall that and thus by definition of , . We deduce:
[TABLE]
We can finally replace by this bound in inequality (2) and we get:
[TABLE]
Moreover and thus for all , . For all , , so we conclude that is infinite. ∎
Remark that Theorem 4 is far from sharp. One could improve the bounds given by Lemma 5. This could be done by lowering and or by introducing a third coefficient for the second non-empty . However, we were not able to obtain significant improvements that were worth the additional technicalities.
In Section 5 we explain how to verify with a computer that there exists a language that satisfies conditions (I), (II) and (III). We also need a way to verify condition (IV). In order to compute , we can use that and, for all , . Thus, given the values of the , one can compute using a dynamic programming algorithm, and all the rest is straightforward to compute. It is therefore easy to verify with a computer whether or not a given set of values of is a solution. We provide a C++ program that takes as input , , , and and verifies whether this is a solution of the equations of condition (IV).
4.1 Proof of Lemma 5
This subsection is dedicated to the proof of Lemma 5. Remark that the statement and proof are not self-contained since some of the notations are defined in the proof of Theorem 4.
See Lemma 5.
Proof.
If then by definition . Moreover, does not contain squares of period less than and thus .
Now, let . By definition, any word from can be written with . For any and , let be the set of words of that admit as a prefix. Clearly .
Let (resp. ) be integers such that there is an element that contains a square starting at (resp. ) and of period (resp. ) with and . Because of the square in , we know that for all , . If then contains a square which is not possible. Hence
[TABLE]
Let be a word that contains a square starting at and of period then we know its suffix of size . Thus there are at most possibilities, moreover since the size of the unknown suffix is , it is square-free and there are at most possibilities. Thus for a fixed and value of , the number of ways to add a suffix to to obtain an element of that contains a square of period starting at is:
[TABLE]
But since is a non-increasing function in , the maximum is reached when we pack the as much as we can on the lowest values of . Thus this quantity is in fact equal to:
[TABLE]
Now we can sum over all the values of and we get:
[TABLE]
Summing over all the possible yields .
The remaining case is . Once again, let (resp. ) be integers such that there is an element of that contains a square starting at (resp. ) and of period (resp. ) with and . By definition of , . We can use equation (3) again and we get:
[TABLE]
This is a contradiction with the fact that . Thus given the value of there is at most one possible value for and . The number of ways for a fixed and value of to complete with a suffix into an element of is at most:
[TABLE]
Then by summing over all the possible values of , we get:
[TABLE]
We can use the variable substitution and remark that and we get:
[TABLE]
We conclude that by summing over all .∎
5 Finding a set that satisfies Theorem 4
In this section, we explain how to verify the existence of a language that fulfills conditions (I), (II) and (III) of Theorem 4.
We consider some particular directed labeled graphs: is a set of vertices together with a set of labeled arcs. For any and , is an arc from to with label . These graphs could also be seen as finite state machines where all the states are initial and final.
The Rauzy graph of length of a factorial language over is the graph where and . For any graph and any set , we denote by the subgraph induced by .
Let be the Rauzy graph of length of the square-free words over . Remark that the factors of length of any walk on correspond to edges of and by definition they are square-free. Thus, the sequence of labels of any walk on avoids squares of period less than , but can contain longer squares. We let be the set of words that contain no square of period less than (from the previous remark can also be seen as the set of walks on ).
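The construction can be sketched for small parameters as follows. This is an illustrative brute-force version, not the paper's C++ implementation; the representation of the graph as a dictionary mapping each vertex to its outgoing arcs (label to target), and the names `rauzy_graph` and `follow`, are assumptions of this example.

```python
def ends_with_square(word):
    """True if some non-empty suffix of `word` is a square."""
    n = len(word)
    return any(word[n - 2 * q:n - q] == word[n - q:] for q in range(1, n // 2 + 1))


def square_free_words(length, alphabet):
    """All square-free words of the given length, built letter by letter."""
    words = [""]
    for _ in range(length):
        words = [w + a for w in words for a in alphabet if not ends_with_square(w + a)]
    return words


def rauzy_graph(n, alphabet="abc"):
    """Vertices: square-free words of length n. Each square-free word w of
    length n + 1 gives an arc from w[:-1] to w[1:] labelled by its last letter."""
    graph = {v: {} for v in square_free_words(n, alphabet)}
    for w in square_free_words(n + 1, alphabet):
        graph[w[:-1]][w[-1]] = w[1:]
    return graph


def follow(graph, start, labels):
    """Follow the arcs labelled by `labels` from `start`; None if an arc is missing."""
    vertex = start
    for a in labels:
        vertex = graph[vertex].get(a)
        if vertex is None:
            return None
    return vertex
```

For n = 2 over three letters, the walk from `"ab"` reading `"ab"` returns to `"ab"` and spells the word abab: every factor of length 3 is square-free, yet the walk contains a square of period 2, which illustrates that walks only exclude short squares.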
As an illustration, we give in Fig. 1 without the arc labels.
For this section, we abuse notation and allow ourselves to identify words and sequences.
For any graph and partial word , we define inductively for any integer and vertex :
[TABLE]
Intuitively, gives the number of walks of length starting from that are compatible with the last letters of . Indeed, there is one walk of length 0 and we always take the transition that is labeled by the current letter of and any transition if this letter is . Remark that in the third case, there are in fact either 0 or summands in the sum.
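The intuition above translates into a short dynamic program. The sketch below is written from the informal description only (the exact case distinction of the definition is not reproduced here), and it assumes a graph stored as a dictionary from vertices to outgoing arcs (label to target) and a hole symbol `"."`; the name `count_walks` is hypothetical.

```python
def count_walks(graph, partial, hole="."):
    """For every vertex v, count the walks of length len(partial) starting at v
    whose label sequence agrees with `partial` on all forced positions.

    graph: dict mapping each vertex to a dict {label: target vertex}.
    """
    # Walks of length 0: exactly one per vertex.
    counts = {v: 1 for v in graph}
    # Process the partial word from its last letter to its first, so that after
    # i steps counts[v] refers to the last i letters of `partial`.
    for letter in reversed(partial):
        counts = {
            v: sum(counts[target]
                   for label, target in arcs.items()
                   if letter == hole or label == letter)
            for v, arcs in graph.items()
        }
    return counts
```

Each step costs time proportional to the number of arcs, so the whole computation is linear in the length of the partial word times the size of the graph.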
**Lemma 7.**
Let , , and a non-empty set . If for all and , , then there exists a language such that:
- ,
- for all , avoids squares of period less than ,
- for any and there are at least different words compatible with such that .
Proof.
Let be the set of sequences of labels that correspond to a walk in . By definition, the first two conditions on are fulfilled.
Let and . If we let be the suffix of length of . Otherwise, we let such that and is a suffix of (there is such an element in ). Each walk of length starting in gives a unique sequence of labels such that contains no square of period less than .
We easily deduce, by induction on , that for all the number of walks of length starting at that are compatible with is at least . So, in particular, the number of walks of length starting at and compatible with is at least . This concludes the proof. ∎
In fact, we need something stronger because, for the values of that we use, the graphs are too large to fit in a computer's memory. We can exploit symmetries of to work on a smaller equivalent graph.
For any square-free word , we let be the shortest suffix of such that for all there exists with . If , then is such a suffix of itself (since is empty) and thus there is always a shortest suffix.
For instance, with we have . For any letter , the word is square-free if and only if is square-free. Indeed, a new square is necessarily a suffix of ; it is enough to look at the two letters in bold in to deduce that there is no square of length , and all the other possible suffixes of even length of are also suffixes of . In fact, for any of size , the word avoids squares if and only if avoids squares (this is proven in the next lemma) and this is the main motivation behind the definition of (in particular, words with the same image by can be extended in the same way).
For any graph , let be the graph such that . The next lemma tells us that we only need to consider the walks on instead of the walks on .
**Lemma 8.**
Let be a positive integer and , and . Let . Then for all , .
Proof.
Let us first show that for any and with if then . Let us show that under these assumptions, for any , avoids squares of period . Since is square-free, we only need to show that no suffix of is a square. We have to distinguish between two cases:
- . Suppose for the sake of contradiction that there is a square of period in . We deduce that the suffix of length of contains a square of period . That is, contains a square of period which is a contradiction.
- . Since is an integer we get . Moreover and thus . Thus by definition of , there exists such that . Thus there is such that . Remark that . We conclude that there is such that . This implies that the suffix of length of is not a square of period .
We deduce that for any and with if then .
Let , and such that . By definition of this implies that there is with and . Thus and . From the previous paragraph, it implies that is square-free. Let us show that . By definition, for all there exists with . We easily deduce that for all there exists with . This implies that is a suffix of . Since , we deduce that is also a suffix of . Since and is a suffix of , we get that . We showed that if there are , and such that , then there exists such that and .
We deduce that for all , . The other inclusion is clear from the definition of and we get for all :
[TABLE]
By definition of , for any there is at most one outgoing arc for every label in the set on the right. Since the two sets are equal, we deduce that every vertex of the graph has at most one outgoing arc for any label. Intuitively, (4) implies (by induction on the length of the walk) that for any the set of labeled walks from to in is equal to the set of labeled walks from to in . We are now ready to show by induction on that for all and , . By definition .
Let be a positive integer such that for all ,
[TABLE]
Then, for all , if we get:
[TABLE]
The case where is similar. ∎
Using Lemma 7 together with Lemma 8, we get the following lemma:
**Lemma 9.**
Let , , and be a non-empty set. If for all and , , then there exists a language such that:
- ,
- for all , avoids squares of period less than ,
- for any and there are at least different words compatible with and such that .
The graph is much smaller than and we can use a computer to check the conditions of this lemma for the values of that we used. One should first find the graph . The following fact allows us to easily compute the set of vertices of without computing :
**Lemma 10.**
Let . Then if and only if is the smallest non-empty suffix of such that .
Moreover, given a graph , the definition of gives a trivial dynamic algorithm that computes in time . Starting with and inductively removing from all the vertices for which gives the largest subgraph that meets the conditions of Lemma 9. As long as this subgraph is not empty one can then apply Lemma 9. Algorithm 1 computes the largest subgraph of with the required property.
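The pruning procedure described above can be sketched as a fixpoint computation. The code below is not Algorithm 1 as given in the paper (its pseudocode is not reproduced here); it is an assumed reading in which each partial word comes with a required minimum number of compatible walks, and the names `largest_good_subgraph` and `thresholds` are hypothetical. The graph is a dictionary from vertices to outgoing arcs (label to target) and `"."` is the assumed hole symbol.

```python
def count_walks(graph, alive, partial, hole="."):
    """Count, for each vertex of `alive`, the walks of length len(partial) that
    stay inside `alive` and agree with `partial` on its forced positions."""
    counts = {v: 1 for v in alive}
    for letter in reversed(partial):
        counts = {
            v: sum(counts[t] for a, t in graph[v].items()
                   if t in alive and (letter == hole or a == letter))
            for v in alive
        }
    return counts


def largest_good_subgraph(graph, thresholds, hole="."):
    """Repeatedly delete every vertex that has too few compatible walks.

    thresholds: dict {partial word: minimum number of walks required}.
    Returns the vertex set of the largest induced subgraph in which every
    remaining vertex meets every threshold (possibly the empty set).
    """
    alive = set(graph)
    changed = True
    while changed and alive:
        changed = False
        for partial, minimum in thresholds.items():
            counts = count_walks(graph, alive, partial, hole)
            bad = {v for v in alive if counts[v] < minimum}
            if bad:
                alive -= bad      # removing vertices can invalidate others,
                changed = True    # so another pass is needed
    return alive
```

Because removing vertices can only decrease the walk counts of the remaining ones, the deletions can be applied in any order and the loop still ends on the same maximal vertex set.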
6 Application of Theorem 4
In this section we apply Theorem 4. We provide a C++ implementation of Algorithm 1 that verifies the existence of a language that fulfills conditions (I), (II) and (III) of Theorem 4. Condition (IV) can be easily verified (as long as solutions are given) and we also provide C++ code to do that. (The code can be found in the ancillary files of https://arxiv.org/abs/1903.04214.)
**Theorem 11.**
For any alphabet , let be the smallest integer such that for all and for all sequences such that , , there is an infinite square-free word such that . Then:
- ,
- ,
- ,
- if , .
Proof.
For any alphabet , we have . Moreover, is a decreasing function of the size of the alphabet. Thus the third statement can easily be deduced from the second one. We show the remaining statements independently of each other. We will show the upper bounds using Algorithm 1, Lemma 9 and Theorem 4. The lower bounds are verified by exhaustive search.
If : Let and . We can use Algorithm 1 to check that we can apply Lemma 9 with , , . Thus conditions (I), (II) and (III) of Theorem 4 are fulfilled. We can check with a computer that condition (IV) of Theorem 4 is also fulfilled with and . This implies that for any there are infinite square-free words over compatible with . Since , we deduce that for any there are infinite square-free words over compatible with . We get .
: Let . An exhaustive search confirms that there are only square-free words over compatible with . Thus .
Let and . We can use Algorithm 1 to check that we can apply Lemma 9 with , and , . We can then apply Theorem 4 with , and and we deduce that for any there are infinite square-free words over compatible with . Moreover, . We deduce that for any there are infinite square-free words over compatible with . Thus .
: Let . An exhaustive search confirms that there are only square-free words over compatible with . Thus .
Let and . We can use Algorithm 1 to check that we can apply Lemma 9 with and the values of given in Table 1. Thus conditions (I), (II) and (III) of Theorem 4 are fulfilled.
We can also check that the values of given in Table 1 fulfill condition (IV) of Theorem 4. We deduce that for any there are infinite square-free words over compatible with . Moreover, . We deduce that for any there are infinite square-free words over compatible with . Thus .∎
The three applications of Algorithm 1 require between 30 and 100 GB of RAM (and around 5 hours of computation). We had to optimize the way strings are stored in memory in order to be able to compute the graphs for large enough values of . The rest of the computations (finding the solution to the system and the exhaustive search) easily run on a laptop in a few milliseconds. Remark that we showed something slightly stronger, since the results would still hold if an adversary were to tell us at every choice of letter only the next forced letters with their positions (that is, we know the next element of ).
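An exhaustive search of the kind used for the lower bounds can be sketched as a breadth-first enumeration of the square-free words compatible with a finite partial word. This is an illustration only: the partial words used in the proof are not reproduced here, and the hole symbol `"."` and the function names are assumptions of this example.

```python
def ends_with_square(word):
    """True if some non-empty suffix of `word` is a square."""
    n = len(word)
    return any(word[n - 2 * q:n - q] == word[n - q:] for q in range(1, n // 2 + 1))


def compatible_square_free_counts(partial, alphabet, hole="."):
    """Return, for each prefix length of `partial`, the number of square-free
    words of that length that agree with the forced letters of `partial`.

    The enumeration stops at the first length for which no word survives,
    which certifies that only finitely many compatible words exist.
    """
    words = [""]
    counts = []
    for symbol in partial:
        choices = alphabet if symbol == hole else symbol
        words = [w + a for w in words for a in choices if not ends_with_square(w + a)]
        counts.append(len(words))
        if not words:
            break
    return counts
```

A final count of zero means that no infinite square-free word can be compatible with any partial word that has `partial` as a prefix.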
Experimental computations suggest that is closer to 7 than to and that .
Acknowledgement
Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region.
References
- [1] J. P. Bell and T. L. Goh. Exponential lower bounds for the number of words of uniform length avoiding a pattern. Information and Computation, 205(9):1295–1306, 2007.
- [2] J. Currie, T. Harju, P. Ochem, and N. Rampersad. Some further results on squarefree arithmetic progressions in infinite words. Theoretical Computer Science, 799:140–148, 2019.
- [3] S. Czerwiński and J. Grytczuk. Nonrepetitive colorings of graphs. Electronic Notes in Discrete Mathematics, 28:453–459, 2007. 6th Czech-Slovak International Symposium on Combinatorics, Graph Theory, Algorithms and Applications.
- [4] J. Grytczuk, J. Kozik, and P. Micek. New approach to nonrepetitive sequences. Random Structures & Algorithms, 42(2):214–225, 2013.
- [5] T. Harju. On square-free arithmetic progressions in infinite words. Theoretical Computer Science, 2018.
- [6] R. M. Kolpakov. On the number of repetition-free words. Journal of Applied and Industrial Mathematics, 1(4):453–462, 2007.
- [7] P. Ochem. Doubled patterns are 3-avoidable. Electronic Journal of Combinatorics, 23(1), 2016.
- [8] A. Thue. Über unendliche Zeichenreihen. Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania, 7:1–22, 1906.
