
TL;DR
This paper proves that the number of rich words of length n over any alphabet grows subexponentially, refining understanding of their combinatorial complexity and extending previous bounds.
Contribution
It establishes that the number of rich words grows subexponentially for any alphabet size, improving on the exponential upper bound previously known for the binary alphabet.
Findings
The number of rich words grows subexponentially with length.
The limit of the nth root of the count of rich words is 1 for any alphabet.
Consequently, the number of rich words of length n grows more slowly than any exponential function of n.
On Number of Rich Words
Josef Rukavicka, Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague.
January 25, 2017
Mathematics Subject Classification: 68R15
Abstract
Any finite word $w$ of length $n$ contains at most $n+1$ distinct palindromic factors. If the bound $n+1$ is reached, the word $w$ is called rich. The number of rich words of length $n$ over an alphabet of cardinality $q$ is denoted $R_n(q)$. For the binary alphabet, Rubinchik and Shur deduced that $R_n(2) \le c\,1.605^n$ for some constant $c$. We prove that $\lim_{n\to\infty}\sqrt[n]{R_n(q)} = 1$ for any $q$, i.e. $R_n(q)$ has subexponential growth on any alphabet.
1 Introduction
The study of palindromes is a frequent topic and many diverse results may be found. In recent years, some of the papers deal with so-called rich words, also known as words having palindromic defect $0$. They are words that have the maximum number of palindromic factors. As noted in [6], a finite word $w$ can contain at most $|w|+1$ distinct palindromic factors, where $|w|$ is the length of $w$. The rich words are exactly those that attain this bound. It is known that on the binary alphabet the set of rich words contains factors of Sturmian words, factors of complementary symmetric Rote words, factors of the period-doubling word, etc., see [6, 4, 1, 13]. On a multiliteral alphabet, the set of rich words contains, for example, factors of Arnoux–Rauzy words and factors of words coding symmetric interval exchange.
Rich words can be characterized using various properties, see for instance [8, 5, 2]. The concept of rich words can also be generalized to respect so-called pseudopalindromes, see [10]. In this paper we focus on the unsolved question of computing the number of rich words of length $n$ over an alphabet with $q$ letters. This number is denoted $R_n(q)$.
This question is investigated in [15], where J. Vesti gives a recursive lower bound on the number of rich words of length $n$, and an upper bound on the number of binary rich words. Both these estimates seem to be very rough. In [9], C. Guo, J. Shallit and A.M. Shur constructed for each $n$ a large set of rich words of length $n$. Their construction gives, currently, the best lower bound on the number of binary rich words, namely $R_n(2) \ge \frac{C^{\sqrt{n}}}{p(n)}$, where $p(n)$ is a polynomial and $C$ is a constant. On the other hand, the best known upper bound is exponential. As mentioned in [9], a calculation performed recently by M. Rubinchik provides the upper bound $R_n(2) \le c\,1.605^n$ for some constant $c$, see [11].
Our main result, stated as Theorem 4.3, shows that $R_n(q)$ has subexponential growth on any alphabet. More precisely, we prove
$$\lim_{n\to\infty}\sqrt[n]{R_n(q)} = 1.$$
In [14], Shur calls languages with the above property small. Our result is an argument in favor of a conjecture formulated in [9] saying that for some infinitely growing function $g(n)$ the following holds true: $R_n(2) = \mathcal{O}\bigl(\frac{n}{g(n)}\bigr)^{\sqrt{n}}$.
To derive our result we consider a specific factorization of a rich word into distinct rich palindromes, here called UPS-factorization (Unioccurrent Palindromic Suffix factorization), see Definition 3.2. Let us mention that other palindromic factorizations have already been studied, see [3, 7]: minimal (minimal number of palindromes), maximal (no palindrome can be extended at its position) and diverse (all palindromes are distinct). Note that only the minimal palindromic factorization has to exist for every word.
The article is organized as follows: Section 2 recalls notation and known results. In Section 3 we study a relevant property of UPS-factorization. The last section is devoted to the proof of our main result.
2 Preliminaries
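Let us start with a couple of definitions: Let $A$ be an alphabet of $q$ letters, where $q \ge 1$ and $q \in \mathbb{N}$ ($\mathbb{N}$ denotes the set of nonnegative integers). A finite sequence $u_1u_2\cdots u_n$ with $u_i \in A$ is a finite word. Its length is $n$ and is denoted $|u_1u_2\cdots u_n| = n$. Let $A^n$ denote the set of words of length $n$. We define that $A^0$ contains just the empty word. It is clear that the size of $A^n$ is equal to $q^n$.
Given $u = u_1u_2\cdots u_n \in A^n$ and $v = v_1v_2\cdots v_k \in A^k$ with $0 \le k \le n$, we say that $v$ is a factor of $u$ if there exists $i$ such that $1 \le i \le n-k+1$, and $u_i = v_1$, $u_{i+1} = v_2$, $\dots$, $u_{i+k-1} = v_k$.
A word $u = u_1u_2\cdots u_n$ is called a palindrome if $u_1u_2\cdots u_n = u_nu_{n-1}\cdots u_1$. The empty word is considered to be a palindrome and a factor of any word.
A word $u$ of length $n$ is called rich if $u$ has $n+1$ distinct palindromic factors. Clearly, $u = u_1u_2\cdots u_n$ is rich if and only if its reversal $u_nu_{n-1}\cdots u_1$ is rich as well.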
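The definition of richness can be checked directly on small examples. The following is a minimal brute-force sketch (the function names are illustrative and not taken from the paper); it enumerates all factors, so it is only meant for short words.

```python
def palindromic_factors(word):
    """Return the set of distinct palindromic factors of `word`.

    The empty word is included, following the convention that it is a
    palindrome and a factor of every word.
    """
    found = {""}
    n = len(word)
    for start in range(n):
        for end in range(start + 1, n + 1):
            factor = word[start:end]
            if factor == factor[::-1]:
                found.add(factor)
    return found


def is_rich(word):
    """A word of length n is rich iff it has n + 1 distinct palindromic factors."""
    return len(palindromic_factors(word)) == len(word) + 1
```

For instance, `is_rich("abaab")` holds, while `is_rich("abca")` does not: the only palindromic factors of `abca` are the empty word and the three letters.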
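Any factor of a rich word is rich as well, see [8]. In other words, the language of rich words is factorial. In particular it means that $R_{n+m}(q) \le R_n(q)R_m(q)$ for any $m, n, q \in \mathbb{N}$. Therefore, Fekete's lemma implies the existence of the limit of $\sqrt[n]{R_n(q)}$ and moreover
$$\lim_{n\to\infty}\sqrt[n]{R_n(q)} = \inf\Bigl\{\sqrt[n]{R_n(q)} : n \ge 1\Bigr\}.$$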
For a fixed $n_0$, one can find the number of all rich words of length $n_0$ and obtain an upper bound on the limit. Using a computer, Rubinchik counted $R_n(2)$ for $n \le 60$ (see the sequence A216264 in OEIS). As $\sqrt[60]{R_{60}(2)} < 1.605$, he obtained the upper bound given in the Introduction.
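For very short lengths the counting can be reproduced by brute force. The sketch below is not Rubinchik's method; it simply uses the fact that the language of rich words is factorial, so every rich word of length $n$ is a one-letter extension of a rich word of length $n-1$. The function names are illustrative.

```python
def count_palindromic_factors(word):
    """Number of distinct palindromic factors of `word`, empty word included."""
    factors = {""}
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            f = word[i:j]
            if f == f[::-1]:
                factors.add(f)
    return len(factors)


def rich_word_counts(max_len, alphabet="01"):
    """Return the list [R_0, R_1, ..., R_max_len] for the given alphabet."""
    level = [""]          # all rich words of the current length
    counts = [1]          # R_0 = 1 (the empty word)
    for n in range(1, max_len + 1):
        # A rich word of length n has a rich prefix of length n - 1,
        # so it is enough to extend the previous level by one letter.
        level = [u + a for u in level for a in alphabet
                 if count_palindromic_factors(u + a) == n + 1]
        counts.append(len(level))
    return counts
```

Over the binary alphabet every word of length at most 7 is rich, and the first non-rich words appear at length 8, where the count is 252. The values $\sqrt[n]{R_n(2)}$ computed from this list give upper bounds on the limit, as described above.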
As shown in [8], any rich word $u$ over an alphabet $A$ is richly prolongable, i.e., there exist letters $x, y \in A$ such that $xuy$ is also rich. Thus a rich word is a factor of an arbitrarily long rich word. But the question whether two rich words can appear simultaneously as factors of a longer rich word may have a negative answer. It means that the language of rich words is not recurrent. This fact makes the enumeration of rich words hard.
3 Factorization of rich words into rich palindromes
Let us recall one important property of rich words (see [6]): the longest palindromic suffix of a rich word $w$ has exactly one occurrence in $w$ (we say that the longest palindromic suffix of $w$ is unioccurrent in $w$). It implies that $w = w^{(1)}v_1$, where $v_1$ is a palindrome which is not a factor of $w^{(1)}$. Since every factor of a rich word is a rich word as well, it follows that $w^{(1)}$ is a rich word and thus $w^{(1)} = w^{(2)}v_2$, where $v_2$ is a palindrome which is not a factor of $w^{(2)}$. Obviously $v_1 \ne v_2$. We can repeat the process until $w^{(p)}$ is the empty word for some $p \in \mathbb{N}$, $p \ge 1$. We express these ideas by the following lemma:
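Lemma 3.1.
Let $w$ be a rich word. There exist distinct non-empty palindromes $w_1, w_2, \dots, w_p$ such that
$$w = w_1w_2\cdots w_p \quad\text{and}\quad w_i \text{ is the longest palindromic suffix of } w_1w_2\cdots w_i \text{ for } i = 1, 2, \dots, p. \tag{1}$$
Definition 3.2.
We define the UPS-factorization (Unioccurrent Palindromic Suffix factorization) to be the factorization of a rich word $w$ into the form (1).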
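The construction behind the UPS-factorization, repeatedly cutting off the longest palindromic suffix, can be sketched as follows. This is an illustrative implementation with hypothetical function names, not code from the paper. It runs on any word, but the palindromes are only guaranteed to be distinct when the input is rich.

```python
def longest_palindromic_suffix(word):
    """Return the longest suffix of `word` that is a palindrome."""
    for start in range(len(word)):
        suffix = word[start:]
        if suffix == suffix[::-1]:
            return suffix
    return ""  # only reached when `word` is empty


def ups_factorization(word):
    """Return [w_1, ..., w_p] with word == w_1 + ... + w_p, where each w_i
    is the longest palindromic suffix of w_1 + ... + w_i."""
    factors = []
    rest = word
    while rest:
        suffix = longest_palindromic_suffix(rest)  # non-empty: a single letter is a palindrome
        factors.append(suffix)
        rest = rest[:len(rest) - len(suffix)]
    factors.reverse()  # suffixes were collected right to left
    return factors
```

For the rich word `abaab` this yields the two palindromes `a` and `baab`. For the non-rich word `abca` it yields `a`, `b`, `c`, `a`, where the palindrome `a` repeats, which shows that richness is needed for distinctness.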
Since the palindromes $w_i$ in the factorization (1) are non-empty, it is clear that $p \le n = |w|$. From the fact that the palindromes in the factorization (1) are distinct we can derive a better upper bound for $p$. The aim of this section is to prove the following theorem:
Theorem 3.3.
There is a constant $c$ such that for any rich word $w$ of length $n \ge 2$ the number $p$ of palindromes in the UPS-factorization of $w$ satisfies
$$p \le c\,\frac{n}{\ln n}. \tag{2}$$
Before proving the theorem, we show two auxiliary lemmas:
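Lemma 3.4.
Let $q, n, t \in \mathbb{N}$ such that
$$\sum_{i=1}^{t} i\,q^{\lceil i/2\rceil} \ge n. \tag{3}$$
The number $p$ of palindromes in the UPS-factorization $w = w_1w_2\cdots w_p$ of any rich word $w$ with $n = |w|$ satisfies
$$p \le \sum_{i=1}^{t} q^{\lceil i/2\rceil}. \tag{4}$$
Proof.
Let $f_1, f_2, f_3, \dots$ be an infinite sequence of all non-empty palindromes over an alphabet $A$ with $q$ letters, where the palindromes are ordered in such a way that $i < j$ implies that $|f_i| \le |f_j|$. In consequence $f_1, \dots, f_q$ are palindromes of length $1$, $f_{q+1}, \dots, f_{2q}$ are palindromes of length $2$, etc. Since $w_1, \dots, w_p$ are distinct non-empty palindromes we have $\sum_{i=1}^{p}|f_i| \le \sum_{i=1}^{p}|w_i| = n$. The number of palindromes of length $i$ over the alphabet $A$ with $q$ letters is equal to $q^{\lceil i/2\rceil}$ (just consider that the "first half" of a palindrome determines the second half). The number $\sum_{i=1}^{t} i\,q^{\lceil i/2\rceil}$ equals the length of a word concatenated from all palindromes of length less than or equal to $t$. Since $\sum_{i=1}^{p}|f_i| \le n \le \sum_{i=1}^{t} i\,q^{\lceil i/2\rceil}$, it follows that the number of palindromes $p$ is less than or equal to the number of all palindromes of length at most $t$; this explains the inequality (4). ∎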
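The counting argument of Lemma 3.4 is easy to evaluate numerically. The sketch below assumes that condition (3) reads $\sum_{i=1}^{t} i\,q^{\lceil i/2\rceil} \ge n$ and that the bound (4) reads $p \le \sum_{i=1}^{t} q^{\lceil i/2\rceil}$, which is how the proof describes the two quantities (total length and number of all palindromes of length at most $t$). The function name is illustrative.

```python
def palindrome_count_bound(n, q):
    """Return (t, bound), where t is the least integer such that the total
    length of all palindromes of length <= t over q letters is at least n,
    and bound is the number of those palindromes."""
    total_length = 0   # sum of i * q^ceil(i/2) for i = 1..t
    number = 0         # sum of q^ceil(i/2) for i = 1..t
    t = 0
    while total_length < n:
        t += 1
        per_length = q ** ((t + 1) // 2)   # palindromes of length exactly t
        total_length += t * per_length
        number += per_length
    return t, number
```

For example, over a binary alphabet the palindromes of length at most 3 have total length 18, so any 7 to 18 letters can be split into at most 8 distinct non-empty palindromes.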
Lemma 3.5.
Let , , such that . We have
[TABLE]
Proof.
The sum of the first terms of a geometric series with the quotient is equal to . Taking the derivative of this formula with respect to with we obtain: . It follows that the right inequality of (5) holds for all and . The condition implies that , which explains the left inequality of (5). ∎
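The differentiation step in this proof can be checked with exact integer arithmetic. The sketch below assumes that the geometric sum is indexed from $i = 1$, so that $\sum_{i=1}^{N} x^i = \frac{x^{N+1}-x}{x-1}$ and its derivative is $\sum_{i=1}^{N} i\,x^{i-1} = \frac{N x^{N+1} - (N+1)x^{N} + 1}{(x-1)^2}$; this indexing and the function names are assumptions made for illustration.

```python
def weighted_geometric_sum(N, q):
    """Direct evaluation of sum_{i=1}^{N} i * q^(i-1)."""
    return sum(i * q ** (i - 1) for i in range(1, N + 1))


def closed_form(N, q):
    """Derivative of (x^(N+1) - x) / (x - 1) evaluated at x = q, for integer q >= 2.

    The numerator is an exact multiple of (q - 1)^2, so integer division is exact.
    """
    numerator = N * q ** (N + 1) - (N + 1) * q ** N + 1
    return numerator // (q - 1) ** 2
```

Writing the closed form as $\frac{N q^{N}}{q-1} - \frac{q^{N}-1}{(q-1)^2}$ shows that, under the same assumption, the sum never exceeds $\frac{N q^{N}}{q-1}$.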
We can start the proof of Theorem 3.3:
Proof of Theorem 3.3.
Let be a minimal nonnegative integer such that the inequality (3) in Lemma 3.4 holds. It means that:
[TABLE]
where for the last inequality we exploited (5) with and . If , then the condition is fulfilled (it is the condition from Lemma 3.5) for any . Hence let us suppose that and . From (6) we obtain:
[TABLE]
Since is such that the inequality (3) holds and for any and , we can write:
[TABLE]
We apply a logarithm on the previous inequality:
[TABLE]
An upper bound for the number of palindromes in UPS-factorization follows from (4), (7), and (9):
[TABLE]
The previous inequality supposes that and . If then we can easily derive from (3) that and consequently . Thus the inequality holds as well for this case. Since every rich word over an alphabet with the cardinality is also a rich word over the alphabet with the cardinality , the estimate (2) in Theorem 3.3 holds if we set the constant as follows: . ∎
Remark 3.6.
Theorem 3.3 implies that the average length of a palindrome in the UPS-factorization of a rich word of length $n$ is $\Omega(\ln n)$. Note that in [12] it is shown that most palindromic factors of a random word of length $n$ are of length close to $\ln n$.
4 Rich words form a small language
The aim of this section is to show that the set of rich words forms a small language, see Theorem 4.3.
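We present a recurrent inequality for $R_n(q)$. To ease our notation we omit the specification of the cardinality of the alphabet and write $R_n$ instead of $R_n(q)$.
Denote $\kappa_n = \bigl\lceil c\,\frac{n}{\ln n}\bigr\rceil$, where $c$ is the constant from Theorem 3.3 and $n \ge 2$.
Theorem 4.1.
Let $n \ge 2$, then
$$R_n \le \sum_{p=1}^{\kappa_n}\ \sum_{\substack{n_1+n_2+\cdots+n_p = n\\ n_1, n_2, \dots, n_p \ge 1}} R_{\lceil n_1/2\rceil}R_{\lceil n_2/2\rceil}\cdots R_{\lceil n_p/2\rceil}. \tag{11}$$
Proof.
Given $p, n_1, n_2, \dots, n_p$, let $R(n_1, n_2, \dots, n_p)$ denote the number of rich words with UPS-factorization $w = w_1w_2\cdots w_p$, where $|w_i| = n_i$ for $i = 1, 2, \dots, p$. Note that any palindrome $w_i$ is uniquely determined by its prefix of length $\lceil n_i/2\rceil$; obviously this prefix is rich. Hence the number of words that appear in the UPS-factorization as $w_i$ cannot be larger than $R_{\lceil n_i/2\rceil}$. It follows that $R(n_1, n_2, \dots, n_p) \le R_{\lceil n_1/2\rceil}R_{\lceil n_2/2\rceil}\cdots R_{\lceil n_p/2\rceil}$. The sum of this result over all possible $p$ (see Theorem 3.3) and $n_1, n_2, \dots, n_p$ completes the proof. ∎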
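Proposition 4.2.
If $h > 1$ and $K \ge 1$ are such that $R_n \le K h^n$ for all $n$, then $\lim_{n\to\infty}\sqrt[n]{R_n} \le \sqrt{h}$.
Proof.
For any integers $n_1, n_2, \dots, n_p \ge 1$ with $n_1+n_2+\cdots+n_p = n$, the assumption implies that
$R_{\lceil n_1/2\rceil}R_{\lceil n_2/2\rceil}\cdots R_{\lceil n_p/2\rceil} \le K^p h^{\lceil n_1/2\rceil+\cdots+\lceil n_p/2\rceil} \le K^p h^{\frac{n+p}{2}}$. Exploiting (11) we obtain:
$$R_n \le \sum_{p=1}^{\kappa_n} K^p h^{\frac{n+p}{2}} \sum_{\substack{n_1+n_2+\cdots+n_p = n\\ n_1, n_2, \dots, n_p \ge 1}} 1. \tag{12}$$
The sum
$$\sum_{\substack{n_1+n_2+\cdots+n_p = n\\ n_1, n_2, \dots, n_p \ge 1}} 1$$
can be interpreted as the number of ways to distribute $n$ coins among $p$ people in such a way that everyone has at least one coin. That is why this sum equals $\binom{n-1}{p-1}$.
It is known (see the Appendix for the proof) that
$$\sum_{k=0}^{L}\binom{N}{k} \le \Bigl(\frac{eN}{L}\Bigr)^{L}. \tag{13}$$
From (12) we can write, for all $n$ large enough that $\kappa_n \le n$: $R_n \le h^{\frac{n}{2}}(K\sqrt{h})^{\kappa_n}\sum_{p=1}^{\kappa_n}\binom{n-1}{p-1} \le h^{\frac{n}{2}}(K\sqrt{h})^{\kappa_n}\bigl(\frac{en}{\kappa_n}\bigr)^{\kappa_n}$. To evaluate $\lim_{n\to\infty}\sqrt[n]{R_n}$, just recall that $\kappa_n = \bigl\lceil c\,\frac{n}{\ln n}\bigr\rceil$, so that $\lim_{n\to\infty}\sqrt[n]{\alpha^{\kappa_n}} = 1$ for any constant $\alpha > 0$ and moreover $\lim_{n\to\infty}\sqrt[n]{\bigl(\frac{n}{\kappa_n}\bigr)^{\kappa_n}} = 1$. ∎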
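The coin-distribution count used in the proof above is the classical number of compositions of an integer into a fixed number of positive parts. The following sketch assumes that the quantities in question are $n$ coins and $p$ people, and compares a brute-force count with $\binom{n-1}{p-1}$ for small values. The function name is illustrative.

```python
from itertools import product
from math import comb


def count_compositions(n, p):
    """Count tuples (n_1, ..., n_p) of positive integers with n_1 + ... + n_p == n."""
    return sum(1 for parts in product(range(1, n + 1), repeat=p)
               if sum(parts) == n)
```

The closed form follows from the usual "stars and bars" argument: placing the $n$ coins in a row, a distribution corresponds to choosing $p-1$ of the $n-1$ gaps between neighbouring coins.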
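The main theorem of this paper is a simple consequence of the previous proposition.
Theorem 4.3.
Let $R_n$ denote the number of rich words of length $n$ over an alphabet with $q$ letters. We have $\lim_{n\to\infty}\sqrt[n]{R_n} = 1$.
Proof.
Let us suppose that $\lim_{n\to\infty}\sqrt[n]{R_n} = \lambda > 1$. We are going to find $\epsilon > 0$ such that $\lambda + \epsilon < \lambda^2$. The definition of a limit implies that there is $n_0$ such that $\sqrt[n]{R_n} < \lambda + \epsilon$ for any $n > n_0$, i.e. $R_n < (\lambda+\epsilon)^n$. Let $K = \max\{R_1, R_2, \dots, R_{n_0}\}$. It holds for any $n \in \mathbb{N}$ that $R_n \le K(\lambda+\epsilon)^n$. Using Proposition 4.2 we obtain $\lim_{n\to\infty}\sqrt[n]{R_n} \le \sqrt{\lambda+\epsilon} < \lambda$, and this is a contradiction to our assumption that $\lim_{n\to\infty}\sqrt[n]{R_n} = \lambda$. ∎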
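5 Appendix
For the reader's convenience, we provide a proof of the well-known inequality we used in the proof of Proposition 4.2.
Lemma 5.1.
$\sum_{k=0}^{L}\binom{N}{k} \le \bigl(\frac{eN}{L}\bigr)^{L}$, where $1 \le L \le N$ and $L, N \in \mathbb{N}$.
Proof.
Consider $x \in (0, 1]$. The binomial theorem states that
$$(1+x)^N = \sum_{k=0}^{N}\binom{N}{k}x^k \ge \sum_{k=0}^{L}\binom{N}{k}x^k.$$
By dividing by the factor $x^L$ we obtain
$$\sum_{k=0}^{L}\binom{N}{k}x^{k-L} \le \frac{(1+x)^N}{x^L}.$$
Since $x \in (0, 1]$ and $k - L \le 0$, then $x^{k-L} \ge 1$, it follows that
$$\sum_{k=0}^{L}\binom{N}{k} \le \frac{(1+x)^N}{x^L}.$$
Let us substitute $x = \frac{L}{N}$ and let us exploit the inequality $1 + x < e^x$, that holds for all $x > 0$:
$$\sum_{k=0}^{L}\binom{N}{k} \le \frac{\bigl(1+\frac{L}{N}\bigr)^N}{\bigl(\frac{L}{N}\bigr)^L} < \frac{e^{L}}{\bigl(\frac{L}{N}\bigr)^L} = \Bigl(\frac{eN}{L}\Bigr)^{L}.$$
∎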
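As a sanity check of the inequality proved in this appendix, the bound can be compared with the exact partial binomial sums for small parameters. The sketch assumes the inequality has the standard form $\sum_{k=0}^{L}\binom{N}{k} \le (eN/L)^L$ for $1 \le L \le N$; the function names are illustrative.

```python
from math import comb, e


def binomial_prefix_sum(N, L):
    """Exact value of sum_{k=0}^{L} C(N, k)."""
    return sum(comb(N, k) for k in range(L + 1))


def exponential_bound(N, L):
    """The upper bound (e * N / L)^L, evaluated in floating point."""
    return (e * N / L) ** L
```

A numerical check of this kind is of course no substitute for the proof; it only helps to catch a misread exponent or index.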
Acknowledgments
The author wishes to thank Edita Pelantová and Štěpán Starosta for their useful comments. The author acknowledges support by the Czech Science Foundation grant GAČR 13-03538S and by the Grant Agency of the Czech Technical University in Prague, grant No. SGS14/205/OHK4/3T/14.
- [1] L. Balková, Beta-integers and Quasicrystals, PhD thesis, Czech Technical University in Prague and Université Paris Diderot-Paris 7, 2008.
- [2] L. Balková, E. Pelantová, and Š. Starosta, Sturmian jungle (or garden?) on multiliteral alphabets, RAIRO-Theor. Inf. Appl., 44 (2010), pp. 443–470.
- [3] H. Bannai, T. Gagie, S. Inenaga, J. Kärkkäinen, D. Kempa, M. Piątkowski, S. J. Puglisi, and S. Sugimoto, Diverse palindromic factorization is NP-complete, in Developments in Language Theory: 19th International Conference, DLT 2015, Liverpool, UK, July 27-30, 2015, Proceedings, I. Potapov, ed., Springer International Publishing, 2015, pp. 85–96.
- [4] A. Blondin Massé, S. Brlek, S. Labbé, and L. Vuillon, Palindromic complexity of codings of rotations, Theor. Comput. Sci., 412 (2011), pp. 6455–6463.
- [5] M. Bucci, A. De Luca, A. Glen, and L. Q. Zamboni, A new characteristic property of rich words, Theor. Comput. Sci., 410 (2009), pp. 2860–2863.
- [6] X. Droubay, J. Justin, and G. Pirillo, Episturmian words and some constructions of de Luca and Rauzy, Theor. Comput. Sci., 255 (2001), pp. 539–553.
- [7] A. Frid, S. Puzynina, and L. Zamboni, On palindromic factorization of words, Adv. Appl. Math., 50 (2013), pp. 737–748.
- [8] A. Glen, J. Justin, S. Widmer, and L. Q. Zamboni, Palindromic richness, Eur. J. Combin., 30 (2009), pp. 510–531.
