On generalized Lyndon words
Francesco Dolce, Antonio Restivo, Christophe Reutenauer

TL;DR
This paper introduces a generalized framework for Lyndon words based on position-dependent lexicographical orders, providing new characterizations and factorization properties that unify and extend classical results.
Contribution
It defines generalized Lyndon words using position-specific total orders and offers new characterizations and factorizations, extending classical Lyndon word theory.
Findings
New characterizations of Lyndon words and their factorizations.
Extension of Lyndon word theory to generalized lexicographical orders.
Specific results for classical and alternating lexicographical orders.
Francesco Dolce
IRIF, Université Paris Diderot (France), [email protected]
Antonio Restivo
Dipartimento di Matematica e Informatica, Università degli Studi di Palermo (Italy), [email protected]
Christophe Reutenauer
LaCIM, Université du Québec à Montréal (Québec, Canada), [email protected]
Abstract
A generalized lexicographical order on infinite words is defined by choosing for each position a total order on the alphabet. This allows one to define generalized Lyndon words. Every word in the free monoid can be factorized in a unique way as a nonincreasing factorization of generalized Lyndon words. We give new characterizations of the first and the last factor in this factorization as well as a new characterization of generalized Lyndon words. We also give more specific results on two special cases: the classical one and the one arising from the alternating lexicographical order.
Keywords: Generalized Lyndon words, nonincreasing Lyndon factorization, Alternating lexicographical order
1 Introduction
Let $A$ be a totally ordered alphabet. A word $w$ is called a Lyndon word if for each nontrivial factorization $w = uv$, one has $w < vu$ (here $<$ is the lexicographical order). Lyndon words were introduced in [15]. It is easy to see that this property can be expressed in an equivalent way using infinite words, namely $w^\omega < (vu)^\omega$ (where $x^\omega = xxx\cdots$) for each nontrivial factorization $w = uv$.
A well-known theorem of Lyndon states that every finite word can be decomposed in a unique way as a nonincreasing product of Lyndon words. This theorem, which is a combinatorial counterpart of the famous theorem of Poincaré-Birkhoff-Witt, provides an example of a factorization of the free monoid (see [13]). It also has many algorithmic applications and can be computed efficiently. Indeed, Duval proposed in [10] a linear-time algorithm to compute it, while Apostolico and Crochemore proposed in [1] an $O(\log n)$-time parallel algorithm. This factorization is also used in string processing algorithms (see [4]) and for the computation of runs in a word (see, e.g., [9]).
In this paper we consider a variant of this family of words: generalized Lyndon words. These words were first introduced by the third author in [18]. Given a generalized order , i.e., an order in which the comparison between two words depends on the length of their common prefix (see Section 3 for the formal definition), a finite word is called a generalized Lyndon word if for each nontrivial factorization we have .
In this paper we present both new results and new proofs of already published results concerning this family of words. In [18] it is proved that the family of generalized Lyndon words is a Hall set, and thus that they provide a factorization of the free monoid. As a consequence, the associated Lie polynomials form a basis of the free Lie algebra (see [17, 18]). In the present paper, we give a new proof of this factorization theorem (Theorem 16) using only combinatorial techniques instead of the heavy machinery of Hall set theory.
Note that Nyldon words, introduced by Grinberg in [11], also provide a factorization of the free monoid (see [7]), but they are not generalized Lyndon words. Inverse Lyndon words, introduced in [4], are not generalized Lyndon words either, while anti-Lyndon words (introduced in the same paper) with respect to a lexicographical order can be viewed as classical Lyndon words with respect to the order (see also Example 9 later).
With our new combinatorial approach we are able, on one hand, to simplify several of the proofs from [18] and, on the other hand, to obtain new interesting results. In particular, we deduce a new characterization of the last factor of the unique nonincreasing factorization in Lyndon words (Corollary 18). We also simplify a result of [18], stating that generalized Lyndon words are characterized by their suffixes, and show a new characterization by the prefixes (Theorem 14). This last result is new even for classical Lyndon words.
Next, we focus on two particular cases of generalized orders: the classical and the alternating one.
In Theorem 20 we give two new characterizations of the first factor of the nonincreasing factorization into classical Lyndon words. For a different characterization of the first factor see also [10] and [17, Lemma 7.14 (iii)].
From Theorem 20 we deduce a new proof of a result from Ufnarovskij (Corollary 22) which characterizes Lyndon words by their prefixes.
The second case we focus on, related to continued fractions, is given by Galois words. These are generalized Lyndon words with respect to the alternating lexicographical order , that is the order comparing two words in an opposite way depending on the parity of the length of the common prefix (see Section 6 for the formal definition). The link with continued fractions is that we have that if and only if
[TABLE]
where for all (see also [18]).
In Theorem 32 we give a characterization which generalizes Ufnarovskij’s Theorem to Galois words. Moreover, we also characterize the first factor of the nonincreasing factorization in Galois words (Theorem 33). This is the analogue for Galois words of Theorem 20.
We conclude in Section 7 with some remarks and open problems.
Acknowledgement. We thank the anonymous referee for his/her very detailed report with copious and useful suggestions.
Dedication. Maurice Nivat has been one of the main figures of the French school of Theoretical Computer Science. Some of his contributions are in the field of combinatorics on words, for instance the ones related to discrete geometry, where paths are coded by words (see [BN]). Lyndon words, whose generalization we focus on in this paper, have a significant role in discrete geometry (see, e.g., [6]).
We want to dedicate this paper, in this very international journal that he founded, to his memory.
2 Definitions and notations
For undefined notation we refer to [12] and [13]. We denote by a finite alphabet, by the free monoid and by the free semigroup. Elements of are called words and the identity element, denoted by is called the empty word. We say that a word is a factor of the word if for some words ; is a prefix (resp. suffix) if (resp. ); it is nontrivial if and proper if . We say that is a nontrivial factorization of whenever are both nonempty. The length of a word , where for all , is equal to and it is denoted by .
A period of a word is a natural integer such that for any such that ; it is called a nontrivial period if . A word having a nontrivial period is called periodic.
We say that is a fractional power of if and for some nonnegative integer . In this case, one writes , where is a positive rational. Note that for (or ) this means that is a prefix of . Fractional powers are also known as sesquipowers (see, e.g., [16]).
We say that is a strict fractional power of if is a fractional power of and, with the notations above, or, equivalently, that . In this case is a prefix of .
Example 1.
Let . Then and . The last one is, in particular, a strict fractional power of .
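The two notions above can be tested mechanically. The following Python sketch is only an illustration: the function names are hypothetical, it uses the fact (recalled in the proof of Lemma 3) that a word is a fractional power of another exactly when it is a prefix of the periodic infinite word generated by the latter, and it assumes that "strict" means that the first word is at least as long as the second.

```python
from fractions import Fraction


def is_fractional_power(u: str, v: str) -> bool:
    """Return True when u is a prefix of the periodic infinite word v^omega."""
    if not v:
        raise ValueError("v must be nonempty")
    copies = len(u) // len(v) + 1          # enough copies of v to cover u
    return (v * copies).startswith(u)


def is_strict_fractional_power(u: str, v: str) -> bool:
    """Assumed reading of 'strict': u is a fractional power of v and |u| >= |v|."""
    return len(u) >= len(v) and is_fractional_power(u, v)


def exponent(u: str, v: str) -> Fraction:
    """The rational exponent |u| / |v| attached to a fractional power."""
    return Fraction(len(u), len(v))
```

For instance, `abaab` is a strict fractional power of `aba` with exponent 5/3, while `ab` is a nonstrict one (a proper prefix of `aba`).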
We denote by the set of sequences over , also called infinite words; such a sequence is also written . If is a (finite) word of length , denotes the infinite word having as a prefix and of period .
We denote by the set of finite and infinite words.
A border of a word of length is a word which is simultaneously a nontrivial proper prefix and suffix of . A word is called unbordered if it has no border. It is well-known that a word has a border if and only if it is periodic.
Suppose that are finite nonempty words. The following fact is well-known: one has if and only if are powers of a common word, and this is true if and only if and commute (see, for instance, [13, Corollary 6.2.5]).
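This equivalence is easy to check on small words. The Python sketch below is illustrative only: the helper names are not from the paper, and the two periodic infinite words are compared on their prefixes of length equal to the sum of the two word lengths, which is enough by the Fine and Wilf theorem.

```python
def primitive_root(w: str) -> str:
    """Shortest word z such that w is a power of z."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]
    raise ValueError("w must be nonempty")


def commute(u: str, v: str) -> bool:
    """True when uv = vu."""
    return u + v == v + u


def same_infinite_power(u: str, v: str) -> bool:
    """True when u^omega = v^omega, tested on the first |u| + |v| letters."""
    n = len(u) + len(v)
    return all(u[i % len(u)] == v[i % len(v)] for i in range(n))
```

On `abab` and `ab` the three tests agree positively (common root `ab`); on `aba` and `ab` they agree negatively.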
Observe also that if for two nonempty words , one has , then by the Fine and Wilf theorem, their prefixes of length differ (see, for instance, [12]).
Given an order on the alphabet , we can define the lexicographic order (or simply when it is clear from the context) on in the following way: if either is a proper prefix of (in which case must be in ) or we may write , for some words , and some letters such that .
Definition 2.
Let be two distinct elements of such that we have a factorization with finite nonempty words and is an infinite word. We say that the comparison between and takes place within if is a prefix of , but is not. If moreover are letters we say that the comparison takes place at position .
Note that, when the comparison takes place within , one may write , for some and such that .
Lemma 3.
Let be nonempty words such that . Then the comparison between and takes place within the first factor of if and only if is not a fractional power of .
Proof.
The comparison between the two infinite words takes place within the first if and only if the two prefixes of length of and are different. The conclusion follows from the fact that is a fractional power of if and only if is a prefix of . ∎
3 Generalized lexicographical order
Definition 4.
For each , let be a total order on . To the sequence we associate a total order on , that we still denote by when it is clear from the context, called generalized lexicographical order, as follows: if either is a proper prefix of (in which case must be in ) or we can write for some , some and some letters such that .
Example 5.
Let be the generalized order on defined by if is a prime number and otherwise. Then we have and .
Note that, as for the classical order, when is a prefix of we could have , as shown in the next example.
Example 6.
Let , and as in Example 5. The word is a prefix of but .
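To make Definition 4 concrete, here is a minimal Python sketch of the comparison of two finite words under a position-dependent family of orders. Several choices are assumptions made for illustration: positions are numbered from 1, each order is encoded by a hypothetical function `rank(n, c)` giving the rank of the letter `c` at position `n`, and the family `prime_flip` (natural order of characters, reversed at prime positions) is an invented example that is not meant to reproduce Example 5.

```python
from typing import Callable

# rank(n, c): rank of letter c in the total order used at position n (1-based).
Rank = Callable[[int, str], int]


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def prime_flip(n: int, c: str) -> int:
    """Hypothetical family: natural order, reversed at prime positions."""
    return -ord(c) if is_prime(n) else ord(c)


def gen_less(x: str, y: str, rank: Rank) -> bool:
    """x < y for the generalized lexicographical order given by rank."""
    for n, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            # The first mismatch is decided by the order attached to position n.
            return rank(n, a) < rank(n, b)
    return len(x) < len(y)  # a proper prefix is smaller
```

With this family, `ab` is smaller than `aa` because the mismatch occurs at position 2, which is prime, while `aaaa` is smaller than `aaab` because position 4 uses the natural order.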
Let us consider a generalized lexicographical order on .
Lemma 7.
Let be as in Definition 2 (and the sentence following it). Then (resp. ) implies (resp. ) for any infinite words .
Lemma 8.
Let be nonempty finite words such that and let be two finite words. Then
- (i) if neither nor is a prefix of the other, then ;
- (ii) if is not a fractional power of , then , where is the largest integer such that is a prefix of . In particular .
Proof.
In case (i), the comparison between the two infinite words takes place within the prefix of length . Hence we conclude using Lemma 7.
Suppose now that the hypothesis of (ii) holds. Then we can write and , with , such that , and . Let . Since and since , we have that . The two infinite words and share the same prefix of length , and so do and . Thus the comparison between and takes place at position . Since , we can conclude. ∎
We use several times the following observation: the opposite order of a generalized order is also a generalized lexicographical order, obtained by reversing all the orders .
Example 9.
Let be the usual lexicographical order on , that is such that for all . Then is defined by for all .
Example 10.
Let be the generalized order defined in Example 5. Then we have and .
Part of the following lemma is stated in [18].
Lemma 11.
The following conditions are equivalent for nonempty words :
- (1) ;
- (2) ;
- (3) ;
- (4) .
Proof.
The four conditions obtained by replacing in the lemma by are equivalent (see again [13, Corollary 6.2.5]). We may therefore assume that none of these equalities holds.
Let us assume first that condition (1) holds and prove that the other conditions hold too.
If is not a fractional power of , then conditions (2), (3) and (4) hold, by point (ii) of Lemma 8, with and for case (2), and for case (3), and and for case (4).
Let us suppose that is a strict fractional power of . We may therefore write , for some , and . Using the observation in the previous section, deduced from the Fine and Wilf theorem, we see that the prefixes of length of and are distinct, i.e., . Since both and begin with and both and begin with , and since the comparison in all four cases is done in the prefix of length , we can conclude that (2), (3) and (4) hold.
Let us now consider the case when is a nonstrict fractional power of , i.e., is a proper prefix of . That implies that either is not a fractional power of or is a strict fractional power of . Let us consider the opposite order of . Since , from what we have seen above, we have that , and . Thus, conditions (2), (3) and (4) hold.
Finally, let us suppose that the negation of (1) holds, that is, that (recall that we are supposing ). Then, using the same reasoning as above, we have , and , i.e., the negations of the last three conditions. This shows that each of the conditions (2), (3) and (4) implies (1). ∎
4 Generalized Lyndon words
In this section we introduce generalized Lyndon words.
Definition 12.
Given an alphabet and a generalized order on we say that a finite word is a generalized Lyndon word if for any nontrivial factorization one has .
Example 13.
Let and be the order defined in Example 5. The word is a generalized Lyndon word for the order . Indeed, one can easily check that .
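Definition 12 can be turned into a direct (and deliberately naive) test. The Python sketch below assumes that the defining condition compares $w^\omega$ with $(vu)^\omega$ for every nontrivial factorization $w = uv$, that positions are numbered from 1, and that each order is given by a hypothetical function `rank(n, c)`. The two families `classical` and `alternating` are illustrative choices on letters ordered as characters. Two distinct periodic words $x^\omega$ and $y^\omega$ already differ within their first $|x| + |y|$ letters, so only that prefix is inspected.

```python
from typing import Callable

Rank = Callable[[int, str], int]  # rank(n, c), with n a 1-based position


def classical(n: int, c: str) -> int:
    """Same natural order at every position."""
    return ord(c)


def alternating(n: int, c: str) -> int:
    """Natural order at odd positions, reversed order at even positions."""
    return ord(c) if n % 2 == 1 else -ord(c)


def omega_less(x: str, y: str, rank: Rank) -> bool:
    """x^omega < y^omega, decided on the first |x| + |y| letters."""
    for i in range(len(x) + len(y)):
        a, b = x[i % len(x)], y[i % len(y)]
        if a != b:
            return rank(i + 1, a) < rank(i + 1, b)
    return False  # the two infinite words are equal


def is_generalized_lyndon(w: str, rank: Rank) -> bool:
    """w^omega < (vu)^omega for every nontrivial factorization w = uv."""
    return len(w) > 0 and all(
        omega_less(w, w[i:] + w[:i], rank) for i in range(1, len(w))
    )
```

With `classical` this is the usual Lyndon test (`aab` passes, `aba` does not); with `alternating` the roles are exchanged inside that conjugacy class, since `aba` passes and `aab` does not.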
4.1 Characterization of generalized Lyndon words
In the next theorem we give two characterizations of generalized Lyndon words. Recall that a classical result due to Lyndon states that a word is a classical Lyndon word if and only if for any nontrivial proper suffix of (see [12, Proposition 5.1.2]).
The second part of the next result has already been proved in [18, Proposition 2.1]. We give here a shorter proof.
Theorem 14.
Let us consider a generalized lexicographical order on .
1. A word is a generalized Lyndon word if and only if for any nontrivial factorization , one has .
2. A word is a generalized Lyndon word if and only if for any nontrivial factorization , one has .
Footnote to point 1. One may find on Wikipedia the following characterization (without proof or references): is a classical Lyndon word if and only if for each nontrivial factorization one has . Our condition is not an extension of this condition to generalized Lyndon words. Indeed, if one takes the usual order , one has but .
Proof.
By definition, is a generalized Lyndon word if and only if for each nontrivial factorization , one has . By Lemma 11, this is equivalent both to and to , i.e. . ∎
Example 15.
Let and as in Example 13. Let us consider the nontrivial factorization with and . One has and .
4.2 Factorization into generalized Lyndon words
The following result is already proved in [18, Theorem 2.1] using the theory of Hall sets. We give here an independent proof, especially for the uniqueness part, using only combinatorial arguments.
Theorem 16.
Each word in can be factorized in a unique way as a nonincreasing product of generalized Lyndon words.
Proof.
Let us consider a generalized lexicographical order on .
To prove the existence of such a nonincreasing factorization we follow the proof of [18, Corollary 2.2]. Let (if there is nothing to prove). We define as the shortest among all nontrivial suffixes of such that is minimum. By Theorem 14, is a generalized Lyndon word. If , we have found our factorization. Otherwise, we can write and, by induction on the length of , we may assume that , where the are generalized Lyndon words with . Moreover, we have . Indeed, by construction of we have , and thus, using Lemma 11, .
Let us now prove the uniqueness of this factorization. Suppose that we have , where the are generalized Lyndon words with . Let us show that is uniquely determined by the following condition: it is the shortest among all nontrivial suffixes of such that is minimum. To prove this it is enough to show that if is a nontrivial proper suffix of , then ; and if is a suffix of longer than , then . The first inequality follows from point 2 of Theorem 14 and the fact that is a generalized Lyndon word. Suppose now that the second one is not true. Thus there exists some , with , and some factorization with nonempty, such that , and
[TABLE]
From this last inequality and from Lemma 11 we deduce that . Since , we thus have
[TABLE]
Continuing recursively, we find that , therefore and, since , that . This gives us a contradiction to Theorem 14, since is a generalized Lyndon word. Thus is uniquely determined and, by proceeding recursively we prove the uniqueness of the factorization. ∎
From the proof of the previous theorem we obtain the two following results.
Corollary 17.
Let , with generalized Lyndon words such that . Then is the shortest among all nontrivial suffixes of such that is minimum.
Corollary 18.
With the same hypothesis as in Corollary 17, we have that is the longest suffix of which is a generalized Lyndon word.
Note that this result is known for classical Lyndon words (see [17, Lemma 7.14 (ii)] and [10]).
Proof of Corollary 18.
Indeed, if there exists a suffix longer than which is a generalized Lyndon word, then, since has as a proper suffix, we would have by point 2 of Theorem 14, contradicting Corollary 17. ∎
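Corollary 17 suggests a direct, non-optimized way to compute the factorization of Theorem 16: repeatedly strip the shortest nonempty suffix $s$ whose infinite power $s^\omega$ is minimum. The Python sketch below follows this idea. It is not an algorithm taken from the paper, it makes no claim of efficiency, and the encoding of the orders by a hypothetical function `rank(n, c)` with 1-based positions is an assumption.

```python
from typing import Callable, List

Rank = Callable[[int, str], int]  # rank(n, c), with n a 1-based position


def classical(n: int, c: str) -> int:
    return ord(c)


def alternating(n: int, c: str) -> int:
    return ord(c) if n % 2 == 1 else -ord(c)


def omega_cmp(x: str, y: str, rank: Rank) -> int:
    """Compare x^omega and y^omega: -1, 0 or 1 (first |x| + |y| letters suffice)."""
    for i in range(len(x) + len(y)):
        a, b = x[i % len(x)], y[i % len(y)]
        if a != b:
            return -1 if rank(i + 1, a) < rank(i + 1, b) else 1
    return 0


def last_factor(w: str, rank: Rank) -> str:
    """Shortest nonempty suffix s of w such that s^omega is minimum."""
    best = w[-1:]
    for i in range(len(w) - 2, -1, -1):
        s = w[i:]
        if omega_cmp(s, best, rank) < 0:  # strict: ties keep the shorter suffix
            best = s
    return best


def factorize(w: str, rank: Rank) -> List[str]:
    """Nonincreasing factorization of w, built from right to left."""
    factors = []
    while w:
        f = last_factor(w, rank)
        factors.append(f)
        w = w[: len(w) - len(f)]
    return factors[::-1]
```

With the classical order, `factorize("banana", classical)` gives `['b', 'an', 'an', 'a']`, the usual Lyndon factorization. With the alternating order, `factorize("aab", alternating)` gives `['a', 'ab']`.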
Example 19.
Let us consider the word with the order defined in Example 5. The unique nonincreasing factorization of in generalized Lyndon words is .
Note that every factor of the factorization in Lyndon words is primitive, i.e., if with a finite word and an integer, then and .
5 Classical Lyndon words
In this section, we take as generalized lexicographical order the usual lexicographical order , simply denoted by . Clearly, a generalized Lyndon word for this order is a usual one, since for two finite words of the same length, one has if and only if (see [5, Theorem 8]).
5.1 Factorization into Lyndon words
The nonincreasing factorization of a word into Lyndon words, as in Theorem 16, is the usual nonincreasing factorization into Lyndon words (see, for instance, [12]).
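For the classical order, this factorization can be computed in linear time by Duval's algorithm [10]. The Python sketch below is a standard textbook formulation of that algorithm, given here only as an illustration; letters are compared with the natural order of characters, and the function name is hypothetical.

```python
from typing import List


def duval(s: str) -> List[str]:
    """Nonincreasing factorization of s into classical Lyndon words."""
    n, i = len(s), 0
    factors = []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while s[i:j] stays a prefix of a power
        # of a Lyndon word (a sesquipower).
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        # The run is a repetition of a Lyndon word of length j - k,
        # possibly followed by a proper prefix that is processed next.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors
```

For example, `duval("banana")` returns `['b', 'an', 'an', 'a']` and `duval("abaab")` returns `['ab', 'aab']`.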
While at the end of Section 4 we gave two characterizations of the last element of the factorization, here we focus on the first factor. This result is motivated by point 1 of Theorem 14: the fact that a word is not a Lyndon word implies the existence of a prefix such that , where is the corresponding suffix. If one chooses the shortest prefix satisfying this property, this turns out to be the first factor in the Lyndon factorization. In the same vein, it is motivated by Ufnarovskij’s Theorem (Corollary 22 below).
Theorem 20.
Let be the nonincreasing factorization into Lyndon words of a finite nonempty word .
1. The word is the shortest nontrivial prefix of such that, when writing , one has either or .
2. The word is the shortest nontrivial prefix of such that .
In order to prove Theorem 20 we need a preliminary result which refines Lemma 11 in the case of the usual lexicographical order.
Note that, for any infinite words such that , with the classical order, and for any finite word , one has .
Lemma 21.
Let be two nonempty words. Then each of the two following conditions is equivalent to each of the four conditions in Lemma 11:
- (5) ;
- (6) .
Proof.
Condition (3) in Lemma 11 is equivalent to condition (5): indeed Similarly, condition (2) is equivalent to condition (6): indeed, ∎
Note that the previous lemma implies that if , then
[TABLE]
a result proved by Bergman in [2, Lemma 5.1] (see also [20, p.34 and pp.101–102]).
The following result is [20, Theorem 2, p.35].
Corollary 22 (Ufnarovskij).
A word is a Lyndon word if and only if for any nontrivial factorization , one has .
Proof.
It follows from point 1 of Theorem 14 and from Lemma 21. ∎
Example 23.
The word is a Lyndon word. We have .
Corollary 24.
If , with , are Lyndon words such that , then .
Proof.
The case is trivial. Let us consider the case . By the induction hypothesis we have . From Lemma 21 it follows that . Hence, . ∎
It is well-known that all (classical) Lyndon words are unbordered (see, for instance, [8]).
Proof of Theorem 20.
Let us prove the first assertion. When , then is a Lyndon word and the result is true by point 1 of Theorem 14.
Suppose now that . Then, by Corollary 24, we have . Let be a nontrivial prefix of shorter than . Thus, we have a nontrivial factorization for some . By Theorem 14, we know that . Since is unbordered, cannot be a fractional power of . Thus, by point (ii) of Lemma 8, one has , which proves the first part of the theorem.
The second assertion just follows from the first one. Indeed, using Lemma 21, we have that if , then is equivalent to . ∎
Example 25.
Let . Its nonincreasing factorization into Lyndon words is . One can check that while .
6 Galois words
In this section we consider a particular generalized lexicographical order.
Definition 26.
Let be an order on . The alternating lexicographical order with respect to is the generalized lexicographical order defined by the sequence with if is odd, and if is even.
Example 27.
Let us consider as the usual order on . Then one has .
This order is relevant when one orders real numbers through their continued fractions, see for example [18, p.1-2].
The terminology in the following definition is justified in [18, p.2].
Definition 28.
A Galois word is a generalized Lyndon word for an alternating lexicographical order.
Example 29.
Let us consider the usual order on . The following are Galois words: , , , , , , .
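Short Galois words can be enumerated by brute force. The Python sketch below is illustrative only. It assumes that the alternating order uses the given order at odd positions and its opposite at even positions (positions numbered from 1), that the defining condition of a generalized Lyndon word compares $w^\omega$ with $(vu)^\omega$ for each nontrivial factorization $w = uv$, and that letters are ordered as characters. The function names are hypothetical.

```python
from itertools import product
from typing import List


def alt_rank(n: int, c: str) -> int:
    """Natural order at odd positions, reversed order at even positions."""
    return ord(c) if n % 2 == 1 else -ord(c)


def omega_less(x: str, y: str) -> bool:
    """x^omega < y^omega for the alternating order (first |x| + |y| letters)."""
    for i in range(len(x) + len(y)):
        a, b = x[i % len(x)], y[i % len(y)]
        if a != b:
            return alt_rank(i + 1, a) < alt_rank(i + 1, b)
    return False


def is_galois(w: str) -> bool:
    """w^omega is strictly smaller than the infinite power of every other rotation."""
    return len(w) > 0 and all(
        omega_less(w, w[i:] + w[:i]) for i in range(1, len(w))
    )


def galois_words(alphabet: str, max_len: int) -> List[str]:
    """All Galois words of length at most max_len, by increasing length."""
    return [
        "".join(t)
        for k in range(1, max_len + 1)
        for t in product(alphabet, repeat=k)
        if is_galois("".join(t))
    ]
```

Under these assumptions, `galois_words("ab", 4)` returns `['a', 'b', 'ab', 'aba', 'abb', 'abaa', 'abba', 'abbb']`: one word per primitive conjugacy class, as for classical Lyndon words, but with different representatives.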
6.1 Characterization of Galois words
Similarly to what we saw in Section 5.1 for the classical order, for any infinite words such that , and any finite word , one has if is even, and if is odd.
Symmetrically, when one has if is even and if is odd.
Example 30.
Let us consider the order of Example 27. One has and .
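The parity rule stated at the beginning of this section can be observed on finite words. The Python sketch below is a small illustration under two assumptions: the alternating order uses the natural order of characters at odd positions and its opposite at even positions (positions numbered from 1), and the two words compared differ at some position, so that the prefix rule for finite words plays no role. The function names are hypothetical.

```python
def alt_less(x: str, y: str) -> bool:
    """x < y for the alternating lexicographical order on finite words."""
    for n, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            return a < b if n % 2 == 1 else a > b
    return len(x) < len(y)  # a proper prefix is smaller


def parity_rule_holds(p: str, x: str, y: str) -> bool:
    """Check that prepending p keeps x < y when |p| is even and reverses it when |p| is odd.

    Assumes x < y and that x and y differ at some position.
    """
    if len(p) % 2 == 0:
        return alt_less(p + x, p + y)
    return alt_less(p + y, p + x)
```

For instance, `ab` is smaller than `ba`; prepending `a` (odd length) reverses the comparison, while prepending `ab` (even length) preserves it.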
Using the previous observation we can prove the next lemma using the same techniques as in Lemma 21.
Lemma 31.
Let be nonempty words. Then each of the two following conditions is equivalent to each of the four conditions in Lemma 11 when considering the order :
- (5) if is even and if is odd;
- (6) if is even and if is odd.
Proof.
Let us first suppose that is even. By the remark at the beginning of the section, one has . Using the same remark we have, in the case is odd, . Thus condition (3) of Lemma 11 is equivalent to condition (5).
Similarly, condition (2) of Lemma 11 is equivalent to condition (6). Indeed, whenever is even one has , and whenever is odd one has . ∎
The following characterization of Galois words can be seen as a generalization of Ufnarovskij’s Theorem (Corollary 22).
Theorem 32.
A word is a Galois word if and only if for any nontrivial factorization , one has the following condition: if is even and if is odd.
Proof.
The result immediately follows from point 1 in Theorem 14 and from Lemma 31. ∎
6.2 Factorization into Galois words
Suppose that is the nonincreasing factorization of in Galois words. We call multiplicity of the number . In other words , with .
The following result is a generalization of Theorem 20 to Galois words. This result is motivated by Theorems 14 and 32.
Theorem 33.
Let with Galois words satisfying . Let be the multiplicity of . Let be the shortest nontrivial prefix of such that
[TABLE]
Then
- (i) if is odd, is even, and , then ;
- (ii) otherwise, .
Note that we can give an equivalent condition on in the previous statement.
Lemma 34.
Let be a finite word, with . Then satisfies condition if and only if is such that one has either or .
Proof.
From Lemma 31 it follows that one has when is even and when is odd. ∎
In order to prove Theorem 33 we need several lemmata.
Recall from Section 5.1 that classical Lyndon words have no border. This is no longer true for Galois words, as shown in the next lemma.
Lemma 35.
([18, Proposition 3.1]) If a Galois word has a border, then the border has odd length.
Lemma 36.
Let be Galois words with and a prefix of . Then is even.
Proof.
Let be a prefix of with odd. Then by Theorem 32, one has . ∎
Lemma 37.
Let be Galois words with . If is a strict fractional power of then is even.
Proof.
Let and a prefix of such that . The factorization is not trivial since . Both and have as a prefix, and since , we have , where and .
Let us suppose by contradiction that is odd. By the remark at the beginning of Section 6.1, we have . Since but , the comparison of the last inequality takes place within the prefix of length . Thus, by Lemma 7, one has , which is impossible since is a Galois word. ∎
Lemma 38.
Let with Galois words satisfying . Let be the multiplicity of and assume that and that . Then:
- (i) if is odd and is even, then ;
- (ii) otherwise, .
Proof.
If then we have and thus , i.e., is odd and condition (ii) holds.
Suppose now that . Let us first suppose that is even. If , then necessarily we have , since . Therefore, and is odd, so we are in case (ii). If are not all equal, we can argue by induction on . Thus since is even we have . By applying Lemmata 11 and 31 we find that . Finally, since , we have ; moreover, either is odd, or is even and then and is even, so that we are in case (ii).
Suppose now that is odd. We assume first that , i.e., .
We have . We show that is not a fractional power of . Indeed, is not a prefix of by Lemma 36; moreover, by Lemma 37, is not a strict fractional power of ; since being a fractional power is equivalent to being a prefix or a strict fractional power, we are done.
Hence, by Lemma 8 (ii) (applied to the opposite order), it follows that .
Finally, let us consider the case and odd. We have and , with . By induction applied to we have , i.e., condition (ii) holds. Since is odd, one deduces, by using Lemmata 11 and 31, that when is even and when is odd. ∎
An interesting consequence of the previous lemma is the following.
Corollary 39.
Let with Galois words satisfying . Let be the multiplicity of . One has if is even and if is odd.
Proof.
This follows from Lemma 31 with and . ∎
We can now prove the main result of the section.
Proof of Theorem 33.
Let us prove first that the two prefixes, for the case (ii) and for the case (i) satisfy condition . If we are under the hypotheses of case (ii), i.e., if is even or is odd (the two conditions are not mutually exclusive), then by Lemma 38. If we are under the hypotheses of case (i), then we have and the multiplicity of in is odd. Hence, applying Lemma 38 we find that . In both cases the result follows from Lemma 34.
Let us now prove that any nontrivial proper prefix of in case (ii) and of in case (i) does not satisfy condition .
Let be a nontrivial factorization of and . By Theorem 14, we have . If the comparison in the last inequality takes place within the first of , then , and we can conclude by using Lemma 34.
Otherwise, by Lemma 3, is a fractional power of , i.e., we can write , (since is primitive). We claim that is odd. Indeed, if we suppose that , then is a prefix of , hence of , so that is odd by Lemma 35. If we suppose that , then we can write , with and . Thus and are both borders of . This implies by Lemma 35 that is odd and even, hence is odd. This implies that is even if and only if is odd.
Let us suppose first that is odd. Then by Theorem 32. If we are in case (ii), we have by Corollary 39, hence , as we wanted to prove. If we are in case (i), then . If the comparison between and takes place within the first of , we conclude that . Otherwise, by Lemma 3, we can write , with , and . Since is a Galois word, we have . Since , the comparison in the previous (strict) inequality takes place in the prefix of their common length, hence in their prefix of length . Thus, for all infinite words .
Since has as a prefix, we deduce that . Therefore, since is odd, we have
[TABLE]
Let us suppose now that is even (and thus is odd). From Theorem 32 it follows that and from Corollary 39 it follows that . Thus .
We have proved that no nontrivial proper prefix of satisfies condition . It remains to prove that, under the hypotheses of case (i), no proper prefix of of length at least satisfies condition .
Since we are in case (i), is odd, is even and . Using Lemma 38 we have that . Thus it follows from Lemma 34 that does not satisfy condition .
Let us now consider with a nontrivial factorization of . If is even, and thus is odd, we have by Theorem 32. Using Lemma 31 and Corollary 39 we find
[TABLE]
Finally, let us suppose that is odd, and thus and are even. By Theorem 14 we have . Since is not a prefix of (being of even length, it cannot be a border of ), the comparison is within the first of . Thus we have . Since is even and , we deduce that
[TABLE]
Therefore, in both cases does not satisfy condition . ∎
Example 40.
Let us consider the word using the alternating order of Example 27. The nonincreasing factorization of in Galois words is . One can check that , and that each nontrivial proper prefix of does not satisfy condition of Theorem 33.
7 Remarks and open problems
Generalized Lyndon words defined by orders other than the usual lexicographical one behave differently from classical Lyndon words.
For instance, we have seen that generalized Lyndon words are not, in general, unbordered, whereas classical Lyndon words always are. Moreover, it is known that for each primitive word , its unique conjugate which is a classical Lyndon word is one of the elements in the nonincreasing factorization of into Lyndon words (see, e.g., [17, Section 7.4.1] and [14], where algorithms to compute this unique conjugate are given). This is no longer true for generalized Lyndon words, as shown in the next example.
Example 41.
Let us consider the primitive word . Let us consider the alternating order on with . The unique Galois word conjugate to is . The nonincreasing factorization of into Galois words is .
In the classical case it is easy to show that we have a result symmetric to Corollary 17, namely that is maximum among all , with a nontrivial prefix of . This is not true for general orders, as shown in the next example.
Example 42.
Let us consider the word and the alternating order of Example 41. Its nonincreasing factorization in Galois words is . If we consider the nontrivial proper prefix , we have .
Moreover, using the same notation as before, it is not true in general that is the longest among all prefixes of which are generalized Lyndon words (this is true for classical Lyndon words, see, e.g., [17, Lemma 7.14 (iii)]).
Example 43.
Let as in Example 42. The prefix is longer than and it is also a Galois word.
For classical Lyndon words, it is known that the unique nonincreasing factorization of a word in Lyndon words is also the factorization into Lyndon words which has the least number of factors (this follows easily from the property: if are Lyndon words and , then is a Lyndon word; see [12, Proposition 5.1.3]). This is no longer true for generalized Lyndon words, as shown in the next example.
Example 44.
Let us consider the alternating lexicographical order of Example 41. Then is a word with its nonincreasing factorization in Galois words. The word admits a shorter factorization into Galois words, namely .
In [10], Duval shows that given a finite word , it is possible to compute in linear time its nonincreasing factorization into classical Lyndon words.
Open Problem 1.
Generalize Duval’s algorithm to generalized Lyndon words.
In the same paper, Duval also proposed a second algorithm generating all Lyndon words of length . A consequence of this algorithm is that the number of Lyndon words of length at most is equal to the number of words of length that are prefixes of a Lyndon word plus . This property is no longer true for a generalized order, as shown in the next example.
Example 45.
Let us consider with the usual order. The only Lyndon words of length at most are and . It is easy to check that there are exactly words of length which are prefixes of a Lyndon word, namely and .
Let us now consider with the alternating order given by . There are Galois words of length at most (namely and ) but only words of length which are prefixes of Galois words (namely and ).
Finally, throughout the paper we only considered finite Lyndon words. Infinite Lyndon words are defined in [19]: these are the infinite words which have infinitely many prefixes that are (finite) Lyndon words. The authors of [19] then prove that is an infinite Lyndon word if and only if is smaller than any of its nontrivial proper suffixes (Proposition 2.2). They also prove that each infinite word is equal to a nonincreasing product of finite Lyndon words and perhaps one infinite one (Proposition 2.3). This means that either , with finite Lyndon words and an infinite one, or is an infinite product of finite Lyndon words; in both cases, .
Thus, following [3] and [19], we say that an infinite word is a generalized infinite Lyndon word if is smaller than any of its nontrivial proper suffixes. It would be interesting to generalize this result to Galois words and other generalized Lyndon words.
Open Problem 2.
Prove that each infinite word can be factorized in a unique way as a nonincreasing product of finite and infinite generalized Lyndon words.
- [1] Alberto Apostolico and Maxime Crochemore. Fast parallel Lyndon factorization with applications. Math. Systems Theory, 28(2):89–108, 1995.
- [2] George M. Bergman. Centralizers in free associative algebras. Trans. Amer. Math. Soc., 137:327–344, 1969.
- [3] Luc Boasson and Olivier Carton. Transfinite Lyndon words. In Developments in language theory, volume 9168 of Lecture Notes in Comput. Sci., pages 179–190. Springer, Cham, 2015.
- [4] Paola Bonizzoni, Clelia De Felice, Rocco Zaccagnino, and Rosalba Zizza. Inverse Lyndon words and inverse Lyndon factorizations of words. Adv. in Appl. Math., 101:281–319, 2018.
- [5] Silvia Bonomo, Sabrina Mantaci, Antonio Restivo, Giovanna Rosone, and Marinella Sciortino. Sorting conjugates and suffixes of words in a multiset. Internat. J. Found. Comput. Sci., 25(8):1161–1175, 2014.
- [6] Srečko Brlek, Jacques-Olivier Lachaud, Xavier Provençal, and Christophe Reutenauer. Lyndon + Christoffel = digitally convex. Pattern Recognition, 42(10):2239–2246, 2009.
- [7] Émilie Charlier, Manon Philibert, and Manon Stipulanti. Nyldon words. 2018. URL: https://arxiv.org/pdf/1804.09735.pdf, arXiv:1804.09735.
- [8] Christian Choffrut and Juhani Karhumäki. Combinatorics of words. In Handbook of formal languages, Vol. 1, pages 329–438. Springer, Berlin, 1997.
