TL;DR
This paper investigates repetitions in infinite rich words, establishing lower bounds on repetition thresholds and constructing a specific example over a binary alphabet with a notably small critical exponent, advancing open problems in the field.
Contribution
It provides new lower bounds on repetition thresholds and constructs an infinite rich word over a binary alphabet with a small critical exponent, conjectured to be the binary threshold, making the first progress on a 2017 open problem.
Findings
Established lower bounds on repetition thresholds for rich words over 2 and 3-letter alphabets.
Constructed an infinite rich word over binary alphabet with critical exponent $2+\frac{\sqrt{2}}{2}$.
First progress on Vesti's open problem from 2017.
School of Computer Science, University of Waterloo
Waterloo, ON N2L 3G1, Canada
Repetitions in infinite palindrome-rich words
Aseem R. Baranwal
Jeffrey Shallit
Abstract
Rich words are characterized by containing the maximum possible number of distinct palindromes. Several characteristic properties of rich words have been studied; yet the analysis of repetitions in rich words still involves some interesting open problems. We address lower bounds on the repetition threshold of infinite rich words over 2- and 3-letter alphabets, and construct a candidate infinite rich word over the binary alphabet $\{0,1\}$ with a small critical exponent of $2+\frac{\sqrt{2}}{2}$. This represents the first progress on an open problem of Vesti from 2017.
Keywords:
Critical exponent · Repetitions · Rich words · Palindrome
1 Introduction
Palindromes—words equal to their reversal—are among the most widely studied repetitions in words. The class of palindrome-rich words, or simply rich words—those words containing the maximum possible number of palindromes—was introduced in the papers [3, 8, 10]. Since then, rich words have received much attention in the combinatorics on words literature; see, for example, [4, 12, 21].
1.1 Preliminaries
In this section we provide the preliminary definitions and results that we use throughout the paper, along with the motivation behind our work.
Definition 1
A finite word $w$ is rich if it contains $|w|$ distinct nonempty palindromes. An infinite word is rich if all its factors are rich.
We say that a word $w = x^e$ has exponent $e$ and period $|x|$, where $e$ is a positive rational number that denotes the number of times $x$ is repeated. We say $w$ is primitive if its only integer exponent is $1$. The word $w$ is an overlap if $w = xxx'$, where $x'$ is a nonempty prefix of $x$.
Example 1
The word is rich, because it has 8 distinct nonempty palindromes as factors, while the word is not rich. The word has period 4 and exponent 2, since , where and .
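As an illustration of Definition 1, the following Python sketch counts distinct nonempty palindromic factors by brute force and compares the count with the length of the word. The function names and the two sample words (`abaab` and `00101100`) are chosen here for illustration and are not taken from the paper.

```python
def distinct_palindromes(w: str) -> set:
    """Return the set of distinct nonempty palindromic factors of w (brute force)."""
    found = set()
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n + 1):
            factor = w[i:j]
            if factor == factor[::-1]:
                found.add(factor)
    return found


def is_rich(w: str) -> bool:
    """A finite word is rich when it has exactly |w| distinct nonempty palindromes."""
    return len(distinct_palindromes(w)) == len(w)


for word in ("abaab", "00101100"):
    print(word, len(distinct_palindromes(word)), is_rich(word))
```

The brute-force count takes cubic time in the length of the word, so it is only suitable for checking small examples by hand.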
Definition 2
For a given alphabet $\Sigma$, a mapping $\Theta$ on $\Sigma^*$ is an antimorphism if $\Theta(uv) = \Theta(v)\Theta(u)$ for all $u, v \in \Sigma^*$.
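A small sketch may help fix the idea: an antimorphism is determined by its action on letters, and reverses the order of concatenation. The two letter maps below (plain reversal, and reversal combined with exchanging 0 and 1) are illustrative assumptions, as is the helper name `make_antimorphism`.

```python
def make_antimorphism(letter_map: dict):
    """Build the antimorphism that maps each letter via letter_map and reverses the word."""
    def theta(w: str) -> str:
        return "".join(letter_map[c] for c in reversed(w))
    return theta


# Two antimorphisms on {0,1}: reversal, and reversal with the letters exchanged.
reversal = make_antimorphism({"0": "0", "1": "1"})
exchange = make_antimorphism({"0": "1", "1": "0"})

u, v = "001", "10"
for theta in (reversal, exchange):
    # Defining property: theta(uv) = theta(v) theta(u).
    print(theta(u + v) == theta(v) + theta(u))
    # Both maps are involutive: applying them twice gives the word back.
    print(theta(theta(u + v)) == u + v)
```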
Definition 3
The critical exponent of an infinite word $\mathbf{w}$ is defined to be the supremum of the set of all rational numbers $e$ such that there exists a finite nonempty factor of $\mathbf{w}$ with exponent $e$.
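For a finite word, the analogous quantity is the largest exponent of any factor, which can be computed by brute force over all periods. The sketch below is an illustration under that finite-word assumption (the function name `max_exponent` is hypothetical); for an infinite word one would take the supremum of this value over all prefixes.

```python
from fractions import Fraction


def max_exponent(w: str) -> Fraction:
    """Largest exponent |f|/p over all factors f of w having period p."""
    if not w:
        return Fraction(0)
    best = Fraction(1)  # a single letter has exponent 1
    n = len(w)
    for p in range(1, n):
        run = 0  # number of consecutive positions i with w[i] == w[i + p]
        for i in range(n - p):
            if w[i] == w[i + p]:
                run += 1
                # a run of matches of this length gives a factor of length run + p
                best = max(best, Fraction(run + p, p))
            else:
                run = 0
    return best


for word in ("aaa", "abab", "abcab"):
    print(word, max_exponent(word))
```

Exact rational arithmetic is used so that exponents such as $5/3$ are compared without rounding.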
Definition 4
The repetition threshold on an alphabet of size $k$ is the infimum of the set of exponents $e$ such that there exists an infinite word that avoids greater than $e$-powers.
In other words, the repetition threshold is the smallest possible critical exponent of a word over an alphabet of size $k$. Dejean gave a famous conjecture about this threshold in [9], which was proven by Currie and Rampersad [7], and independently by Rao [18]. The repetition threshold can also be studied for a limited class of infinite words. For example, Rampersad et al. studied this threshold for infinite balanced words in [17]. In this paper, we study the repetition threshold for infinite rich words over an alphabet of size $k$.
1.2 Previous work
Let $\Theta$ be a given involutive antimorphism. We say a word $w$ is a $\Theta$-palindrome if it is a fixed point of $\Theta$, that is, $w = \Theta(w)$. The set of $\Theta$-palindromic factors of a word $w$ is denoted by $\mathrm{Pal}_\Theta(w)$. In 2013, Pelantová and Starosta introduced the idea of $\Theta$-palindromic defect.
Definition 5
The $\Theta$-palindromic defect of a finite word $w$, denoted by $D_\Theta(w)$, is defined as
$$D_\Theta(w) = |w| + 1 - \gamma_\Theta(w) - |\mathrm{Pal}_\Theta(w)|,$$
where $\gamma_\Theta(w) = \big|\big\{\{a,\Theta(a)\} : a\in\Sigma\text{, $a$ occurs in }w\text{ and }a\neq\Theta(a)\big\}\big|$.
Further, they proved that all recurrent words with a finite $\Theta$-palindromic defect contain infinitely many overlapping factors [16]. This result leads to the following theorem [16].
Theorem 1.1
All infinite rich words contain a square.
Theorem 1.1 provides a lower bound on the repetition threshold for infinite rich words over a $k$-letter alphabet; namely, this threshold is at least $2$. In [22], Vesti gives both upper and lower bounds on the length of the longest square-free rich words, and proposes the open problem of determining the repetition threshold for infinite rich words.
2 Results over the binary alphabet
We construct an infinite binary rich word and determine the value of its critical exponent. We further conjecture that this value is the repetition threshold for the binary alphabet, based on supporting evidence from computation. We define the word as the image of a fixed point, , where the morphisms and are defined as follows:
[TABLE]
2.1 Automatic theorem-proving
We utilize the automatic theorem-proving software Walnut, written by Hamoon Mousavi, to constructively decide first-order predicates concerning the word [14]. To enable Walnut to work with the word , we require an automaton with output that produces . Computing the lengths for , we note that
[TABLE]
Since the Pell numbers are defined by the recurrence $P_0 = 0$, $P_1 = 1$, and $P_n = 2P_{n-1} + P_{n-2}$ for $n \geq 2$, this suggests that the word is Pell-automatic, meaning that there exists an automaton that takes as input an integer $n$ represented in the Pell number system, and outputs the symbol of the word at index $n$. The Pell number system is a non-standard positional number system in the family of Ostrowski numeration systems [15]. We utilize the Pell adder constructed in [2] to enable writing predicates in this number system. The Walnut version equipped with the adder is available on GitHub (repository: https://github.com/aseemrb/Walnut/).
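To make the Pell number system concrete, the following sketch computes the greedy representation of a nonnegative integer over the basis $1, 2, 5, 12, \ldots$, most significant digit first. It is a plain Python illustration of the standard greedy Ostrowski-style expansion, not the adder from [2] or Walnut's internal encoding; the function names are hypothetical.

```python
def pell_basis(limit: int) -> list:
    """Pell numbers P_1 = 1, P_2 = 2, P_3 = 5, ... that do not exceed limit."""
    basis = [1]
    a, b = 1, 2  # P_1, P_2
    while b <= limit:
        basis.append(b)
        a, b = b, 2 * b + a  # P_{n+1} = 2 P_n + P_{n-1}
    return basis


def pell_repr(n: int) -> list:
    """Greedy representation of n >= 0, most significant digit first."""
    if n == 0:
        return [0]
    digits = []
    for base in reversed(pell_basis(n)):
        digits.append(n // base)
        n %= base
    return digits


def pell_value(digits: list) -> int:
    """Evaluate a most-significant-digit-first digit list back to an integer."""
    value = 0
    a, b = 1, 2
    for d in reversed(digits):
        value += d * a
        a, b = b, 2 * b + a
    return value


for n in (4, 11, 12):
    print(n, pell_repr(n))
```

The greedy expansion uses only the digits 0, 1, 2, never ends in 2, and always places a 0 directly after a 2, which is what makes the representation unique.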
2.2 Constructing the automaton
Using the methods of Angluin [1], we construct an automaton with output for the word . Figure 1 represents the automaton. Note that this automaton consists of 4 states, and we have not restricted the Pell representations to be unique for each integer, meaning that the input may end with a 2, and a non-zero digit may follow a 2. The node labels in the figure represent the state and the corresponding output symbol.
Before we proceed, we prove that this automaton produces the same word as given by . To do this, we restrict the automaton in Figure 1 to only consider unique integer representations in the Pell number system. Thus, the least significant digit is at most $1$, and a 2 is always followed by a 0. This gives the automaton in Figure 2, which represents for morphisms and , given by
[TABLE]
2.3 Proof of equivalence of the morphisms
In this section, we prove that the automaton in Figure 2 produces the same infinite word as that produced by morphisms and . We need two lemmas to prove this equivalence.
Lemma 1
For all , we have .
Proof
We prove this by induction on . For , we have that
[TABLE]
So the base case holds. Next, we construct the induction hypothesis,
[TABLE]
For the inductive step, consider . Using the definition of the morphisms and , we have that,
[TABLE]
Using the induction hypothesis in Eq. (1), we get
[TABLE]
This completes the proof.
Lemma 2
For all , .
Proof
The proof is similar to that of Lemma 1, by induction on . For , we have
[TABLE]
So the base case holds. We have the induction hypothesis,
[TABLE]
For the inductive step, consider . Using the definition of the morphisms and , we have that
[TABLE]
Using the induction hypothesis in Eq. (2), we get
[TABLE]
This completes the proof.
Now we prove the following equivalence theorem about the words produced by the automaton in Figure 2 and the word given by morphisms and .
Theorem 2.1
The infinite words and are equal.
Proof
We prove this by a simultaneous induction on with 3 hypotheses.
[TABLE]
The base case can be checked by hand. Assume that the hypotheses hold for . Next, we consider the following inductive steps using the definitions of and .
[TABLE]
[TABLE]
[TABLE]
This proves that the hypotheses are true. From Eq. (3), we have . Letting , we get . This completes the proof.
2.4 Proof of palindromic richness
We claim that the infinite word is rich. The proof is carried out using Walnut by constructing a set of predicates based on Theorem 2.2, as done in [20]. We say that a word $w$ has a unioccurrent suffix $s$ if $s$ is not a factor of any proper prefix of $w$.
Theorem 2.2
(Glen et al. [10]) A word $w$ is rich if and only if every prefix of $w$ has a unioccurrent palindromic suffix.
In the following predicates, R denotes the automaton in Figure 2. First, we introduce the fundamental predicates that form the building blocks for verification of the richness property.
1. The predicate FactorEq takes 3 parameters $i, j, n$ and evaluates to true if the length-$n$ factors of the word starting at indices $i$ and $j$ are equal.
2. The predicate Occurs takes 4 parameters $i, j, m, n$ and evaluates to true if the length-$m$ factor of the word starting at index $i$ occurs in the length-$n$ factor starting at index $j$, i.e., $R[i..i+m-1]$ is a factor of $R[j..j+n-1]$.
3. The predicate Palindrome takes 2 parameters $i, n$ and evaluates to true if the length-$n$ factor of the word starting at index $i$ is a palindrome.
def FactorEq "?msd_pell Ak (k < n) => (R[i + k] = R[j + k])";
def Occurs "?msd_pell (m <= n) &
    (Ek (k + m <= n) & $FactorEq(i, j + k, m))";
def Palindrome "?msd_pell Aj,k ((k < n) & (j + k + 1 = n)) =>
    (R[i + k] = R[i + j])";
By Theorem 2.2, for a finite word to be rich, it is sufficient to check that all its prefixes have a unioccurrent palindromic suffix. We use this property to construct the predicate RichFactor, which takes two parameters $i, n$ and evaluates to true if the length-$n$ factor of the word starting at index $i$ is rich. Figure 3 shows the representation of variables in the predicate.
def RichFactor "?msd_pell
    Am ((m >= 1) & (m < n)) =>
    (Ej (i <= j) & (j < i + m) &
    $Palindrome(j, i + m - j) &
    ~$Occurs(j, i, i + m - j, m - 1))";
Now, we simply check that all prefixes of the word are rich to show that the infinite word is rich. The following predicate, R_Is_Rich, evaluates to true, which completes the proof.
eval R_Is_Rich "?msd_pell An $RichFactor(0, n)";
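For readers without Walnut, the logic of these predicates can be mirrored on an ordinary finite string. The sketch below is an assumption-laden illustration: it works on a Python string `w` rather than on the automaton R, uses hypothetical function names, and lets the prefix length `m` run up to `n` inclusive so that the whole factor is checked.

```python
def factor_eq(w: str, i: int, j: int, n: int) -> bool:
    """True if the length-n factors of w at indices i and j are equal."""
    return all(w[i + k] == w[j + k] for k in range(n))


def occurs(w: str, i: int, j: int, m: int, n: int) -> bool:
    """True if w[i..i+m-1] occurs inside w[j..j+n-1]."""
    return m <= n and any(factor_eq(w, i, j + k, m) for k in range(n - m + 1))


def palindrome(w: str, i: int, n: int) -> bool:
    """True if the length-n factor of w at index i is a palindrome."""
    return all(w[i + k] == w[i + n - 1 - k] for k in range(n))


def rich_factor(w: str, i: int, n: int) -> bool:
    """Every prefix of w[i..i+n-1] must have a unioccurrent palindromic suffix."""
    for m in range(1, n + 1):
        if not any(
            palindrome(w, j, i + m - j) and not occurs(w, j, i, i + m - j, m - 1)
            for j in range(i, i + m)
        ):
            return False
    return True


print(rich_factor("abaab", 0, 5), rich_factor("00101100", 0, 8))
```

Here `j` marks the start of a candidate palindromic suffix of the length-`m` prefix, and the `occurs` call tests whether that suffix already appears in the prefix of length `m - 1`.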
2.5 Determining the critical exponent
To determine the critical exponent, we first compute the periods $p$ such that a repetition with exponent at least $5/2$ and period $p$ occurs in the word.
eval HighPowPeriods "?msd_pell (p >= 1) &
    (Ei Aj (2*j <= 3*p) => R[i + j] = R[i + j + p])":
The language accepted by the produced automaton is , which is the Pell-base representation of numbers of the form , for . Next, we compute pairs of integers $(n, p)$ such that the word has a factor of length $n + p$ with period $p$, and this factor cannot be extended to a longer factor of length $n + p + 1$ with the same period.
def MaximalReps "?msd_pell Ei
    (Aj (j < n) => R[i + j] = R[i + j + p]) &
    (R[i + n] != R[i + n + p])";
Finally, we compute the pairs $(n, p)$ where $p$ matches the regular expression in the Pell base representation, and $n + p$ is the maximum possible length of any factor with period $p$.
eval HighestPowers "?msd_pell
    $HighPowPeriods(p) &
    $MaximalReps(n, p) &
    (Am $MaximalReps(m, p) => m <= n)";
Figure 4 shows the automaton produced by the predicate HighestPowers. It accepts pairs of the following forms:
[TABLE]
Here, the length of the words is $n + p$ and the period is $p$. Eq. (6) corresponds to and . Thus we have
[TABLE]
Eq. (7) corresponds to
[TABLE]
Eq. (8) corresponds to
[TABLE]
Putting for (7), and for (8), we notice that the expressions for and coincide.
[TABLE]
Since the Pell numbers are the denominators of the convergents of $\sqrt{2}$, and the ratio $P_{n+1}/P_n$ converges to $1+\sqrt{2}$, we have that
[TABLE]
For , as , the value in Eq. (9) is increasing, and tends to $2+\frac{\sqrt{2}}{2}$. Thus, the critical exponent of the word is $2+\frac{\sqrt{2}}{2}$. The Walnut commands for verifying richness and computing the critical exponent are available on GitHub (https://github.com/aseemrb/Walnut/blob/master/CommandFiles/rich2.txt).
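The limiting behaviour of the Pell numbers can be sanity-checked numerically. The sketch below is an illustration and not part of the Walnut computation: it generates Pell numbers from the standard recurrence, prints a few ratios of consecutive terms next to $1+\sqrt{2}$, and evaluates $2+\frac{\sqrt{2}}{2} \approx 2.7071$. The helper name `pell` is hypothetical.

```python
import math


def pell(n: int) -> int:
    """Return P_n, where P_0 = 0, P_1 = 1 and P_n = 2 P_{n-1} + P_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, 2 * b + a
    return a


# Ratios of consecutive Pell numbers approach 1 + sqrt(2).
for n in (2, 5, 10, 20):
    print(n, pell(n + 1) / pell(n))

print("1 + sqrt(2)   =", 1 + math.sqrt(2))
print("2 + sqrt(2)/2 =", 2 + math.sqrt(2) / 2)
```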
2.6 Optimality of the critical exponent
A backtracking computation shows that the longest rich binary word with critical exponent is of length 1339. Combining this with the result above, we obtain the following bounds.
[TABLE]
2.7 Larger alphabets
For an alphabet of size , backtracking search shows that . The longest word that has a critical exponent is of length 114. For and the exponent threshold , our search program has reached words of length 3800 and has not terminated.
3 Faster backtracking
In this section, we discuss some methods to optimize our backtracking algorithm. The most obvious optimization is to consider the following.
1. Without loss of generality, we assume that the word starts with a 0.
2. We impose the restriction that the first occurrence of the symbol $i$ occurs before the first occurrence of symbol $j$ if $i < j$.
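The two restrictions above fit naturally into a depth-first search that extends a word one symbol at a time. The following sketch is a minimal illustration, not the search program used for the reported bounds: it assumes that "below the threshold" means every factor has exponent strictly less than `threshold`, it caps the word length at `max_len`, and all function names are hypothetical.

```python
from fractions import Fraction


def has_unioccurrent_palindromic_suffix(w: str) -> bool:
    """Check the longest palindromic suffix of w; only it can be unioccurrent."""
    for start in range(len(w)):
        s = w[start:]
        if s == s[::-1]:
            return w.find(s) == start  # first occurrence is the suffix itself
    return False


def max_suffix_exponent(w: str) -> Fraction:
    """Largest exponent of a suffix of w (only suffixes can be new repetitions)."""
    n = len(w)
    best = Fraction(1)
    for p in range(1, n):
        k = 0  # number of matching positions counted from the end
        while k < n - p and w[n - 1 - k] == w[n - 1 - k - p]:
            k += 1
        if k:
            best = max(best, Fraction(k + p, p))
    return best


def longest_rich_word(alphabet: str, threshold: Fraction, max_len: int) -> str:
    """Longest word found (up to max_len) that is rich with all exponents < threshold."""
    best = ""

    def extend(w: str) -> None:
        nonlocal best
        if len(w) > len(best):
            best = w
        if len(w) >= max_len:
            return
        used = len(set(w))
        # Symmetry breaking: only already-used letters or the next unused one.
        for a in alphabet[: used + 1]:
            cand = w + a
            if (has_unioccurrent_palindromic_suffix(cand)
                    and max_suffix_exponent(cand) < threshold):
                extend(cand)

    extend("")
    return best


print(longest_rich_word("01", Fraction(2), 20))
```

Because every prefix was validated before being extended, each step only needs to examine suffixes of the new word, both for richness and for repetitions.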
3.1 Lyndon method
Since our goal is to check if there is an infinite rich word with critical exponent less than a preset threshold, we can utilize the Lyndon method to prune certain branches of the backtracking search tree. A Lyndon word is a primitive nonempty word that is strictly smaller in lexicographic order than all of its rotations. If a word satisfies the properties of richness and the critical exponent being less than some threshold, then all factors of the word also satisfy these properties. This fact helps us by pruning those paths in the search tree that lead to a suffix that is lexicographically smaller than the word itself.
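One plausible reading of this pruning rule is sketched below: a word is discarded when some proper suffix is lexicographically smaller than the prefix of the same length. This is an interpretation for illustration only, with a hypothetical function name, and is not taken from the search program itself. Such pruning targets the existence question for infinite words and may discard long finite words.

```python
def has_smaller_suffix(w: str) -> bool:
    """True if some proper suffix of w is smaller than the prefix of equal length."""
    n = len(w)
    return any(w[i:] < w[: n - i] for i in range(1, n))


for word in ("0100", "0101", "0010"):
    print(word, "prune" if has_smaller_suffix(word) else "keep")
```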
3.2 Counting palindromes
To check for richness, Groult et al. give a linear-time algorithm to count the number of distinct palindromes in a word [11]. Their algorithm is based on two major ideas: a linear-time algorithm by Gusfield to compute all maximal palindromes in a word [13], and a linear-time algorithm by Crochemore and Ilie to compute the LPF (longest previous factor) array [6]. However, their approach is not helpful for our problem, since it requires linear preprocessing time.
What we require is a fast online algorithm such that, given the number of distinct palindromes for a word $w$ over an alphabet $\Sigma$, we can find the number of distinct palindromes in the word $wa$ for all $a \in \Sigma$ in constant amortized time. Such an algorithm is given by Rubinchik and Shur [19]. Their primary idea is to construct a graph where each node represents a unique palindrome. There are two types of edges in this graph:
1. Border edge: This is a directed edge from $u$ to $v$ labeled $a$, if $v = aua$ for some $a \in \Sigma$.
2. Suffix edge: This is an unlabeled directed edge from $u$ to $v$, if $v$ is the longest proper palindromic suffix of $u$.
Whenever we append a new symbol to an already processed word, it takes amortized constant time to maintain this graph. The C++ implementation of the algorithm can be found on GitHub (https://github.com/aseemrb/research-scripts/blob/master/scripts/palin.cpp).
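The structure can be sketched compactly in Python. The code below is an independent illustration of the graph of [19] and is not a port of the C++ implementation linked above; class and method names are hypothetical. Node 0 plays the role of the imaginary palindrome of length $-1$, node 1 is the empty word, `edges` holds the border edges, and `link` holds the suffix edges.

```python
class PalindromicTree:
    """Online graph of distinct palindromes of a growing word."""

    def __init__(self):
        self.text = []
        self.length = [-1, 0]   # node 0: imaginary word, node 1: empty word
        self.link = [0, 0]      # suffix edges
        self.edges = [{}, {}]   # border edges, labeled by a symbol
        self.last = 1           # node of the longest palindromic suffix

    def _find(self, node: int, i: int) -> int:
        """Follow suffix edges until text[i] can border the palindrome at node."""
        c = self.text[i]
        while True:
            j = i - self.length[node] - 1
            if j >= 0 and self.text[j] == c:
                return node
            node = self.link[node]

    def add(self, c: str) -> bool:
        """Append symbol c; return True if a new palindrome appeared."""
        self.text.append(c)
        i = len(self.text) - 1
        parent = self._find(self.last, i)
        if c in self.edges[parent]:
            self.last = self.edges[parent][c]
            return False
        new_len = self.length[parent] + 2
        if new_len == 1:
            suffix = 1  # a single symbol has the empty word as suffix
        else:
            suffix = self.edges[self._find(self.link[parent], i)][c]
        self.length.append(new_len)
        self.link.append(suffix)
        self.edges.append({})
        self.last = len(self.length) - 1
        self.edges[parent][c] = self.last
        return True

    def count(self) -> int:
        """Number of distinct nonempty palindromes seen so far."""
        return len(self.length) - 2


def is_rich_online(w: str) -> bool:
    """A word is rich exactly when every appended symbol creates a new palindrome."""
    tree = PalindromicTree()
    return all([tree.add(c) for c in w])


print(is_rich_online("abaab"), is_rich_online("00101100"))
```

Each call to `add` creates at most one node, so richness of the extended word reduces to checking whether that call returned `True`.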
Example 2
Figure 5 shows the graph construction for the rich word . The number of nonempty palindromes is equal to 7. Note that we have an imaginary word that has length $-1$ and is a palindrome. The suffix edges are shown by dashed lines, while the border edges are shown with solid lines having labels. We say that a palindrome consisting of a single symbol borders this imaginary word, which makes the implementation of the algorithm easy.
3.3 Computing maximal runs
In [5], Chen et al. present a survey of fast space-efficient algorithms for computing all maximal runs in a string. They also propose some new and faster algorithms for the same. In future work, we aim to understand and implement these algorithms in our backtracking search, so that we are able to compute tighter lower bounds on the repetition threshold more efficiently.
4 Future prospects
An obvious direction for further research is to develop novel ideas and methods that may help us prove lower bounds on the repetition threshold of infinite rich words. Another possible direction is to construct infinite rich words over larger alphabets that may serve as candidates for the repetition threshold.
- [1] Angluin, D.: Learning regular sets from queries and counterexamples. Information and Computation 75(2), 87–106 (1987)
- [2] Baranwal, A.R., Shallit, J.: Critical exponent of infinite balanced words via the Pell number system. Preprint: https://arxiv.org/abs/1902.00503 (2019)
- [3] Brlek, S., Hamel, S., Nivat, M., Reutenauer, C.: On the palindromic complexity of infinite words. Internat. J. Found. Comp. Sci. 15, 293–306 (2004)
- [4] Bucci, M., De Luca, A., Glen, A., Zamboni, L.Q.: A new characteristic property of rich words. Theoret. Comput. Sci. 410, 2860–2863 (2009)
- [5] Chen, G., Puglisi, S.J., Smyth, W.F.: Fast & practical algorithms for computing all the runs in a string. In: Ma, B., Zhang, K. (eds.) CPM 07, LNCS, vol. 4580, pp. 307–315. Springer-Verlag (2007)
- [6] Crochemore, M., Ilie, L.: Computing longest previous factor in linear time and applications. Inform. Process. Lett. 106(2), 75–80 (2008)
- [7] Currie, J., Rampersad, N.: A proof of Dejean’s conjecture. Math. Comp. 80(274), 1063–1070 (2011)
- [8] de Luca, A., Glen, A., Zamboni, L.Q.: Rich, Sturmian, and trapezoidal words. Theoret. Comput. Sci. 407, 569–573 (2008)
