Generalized Lyndon Factorizations of Infinite Words
Amanda Burcroff, Eric Winsor

TL;DR
This paper extends Lyndon factorization concepts to infinite words under a generalized lexicographic order, proving a unique factorization theorem and characterizing infinite Lyndon words by their prefixes.
Contribution
It introduces a generalized Lyndon factorization for infinite words, proving its uniqueness and characterizing infinite Lyndon words through their prefixes.
Findings
Every infinite word has a unique nonincreasing generalized Lyndon factorization.
The factorization has finitely many terms if and only if the word has an infinite generalized Lyndon suffix; the last term is then the first such suffix.
Infinite generalized Lyndon words have infinitely many generalized Lyndon prefixes.
Abstract
A generalized lexicographic order on words is a lexicographic order where the total order of the alphabet depends on the position of the comparison. A generalized Lyndon word is a finite word which is strictly smallest among its class of rotations with respect to a generalized lexicographic order. This notion can be extended to infinite words: an infinite generalized Lyndon word is an infinite word which is strictly smallest among its class of suffixes. We prove a conjecture of Dolce, Restivo, and Reutenauer: every infinite word has a unique nonincreasing factorization into finite and infinite generalized Lyndon words. When this factorization has finitely many terms, we characterize the last term of the factorization. Our methods also show that the infinite generalized Lyndon words are precisely the words with infinitely many generalized Lyndon prefixes.
University of Michigan, Ann Arbor MI 48109, USA
{burcroff,rcwnsr}@umich.edu
Keywords:
generalized lexicographic order, infinite generalized Lyndon word, unique nonincreasing Lyndon factorization
1 Introduction
A rotation of a finite word $w$ is a word of the form $vu$, where $w = uv$ is a factorization of $w$. A finite word is called Lyndon if it is strictly smallest among its class of rotations with respect to the standard lexicographic order. In particular, every finite word is a conjugate of some power of a Lyndon word. Lyndon words were introduced in 1953 by Shirshov in [12] and studied by Lyndon in [8]. Lyndon words have been given various names throughout their history, including standard lexicographic sequences, regular words, and prime words. These names hint at their significant role in the factorization of words.
Let $A^*$ denote the free monoid on a totally ordered (possibly infinite) alphabet $A$, where $A^*$ is ordered lexicographically. The Chen-Fox-Lyndon factorization theorem for words states that the Lyndon words form a basis for $A^*$ [2]. Put more concretely, any finite word on $A$ can be written uniquely as a nonincreasing product of Lyndon words.
About 40 years later, infinite Lyndon words were introduced in [13]. There are several equivalent definitions, but we use the definition which focuses on the idea of rotation. An infinite word is called Lyndon if it is strictly smallest among its suffixes with respect to the standard lexicographic order. If $w$ is an infinite word with a nontrivial factorization $w = uv$, the suffix $v$ can be viewed as the rotation with respect to this factorization. Let $A^{\omega}$ denote the set of sequences, or infinite words, over $A$. These too yielded deep factorization properties; Siromoney et al. showed that every sequence in $A^{\omega}$ has a unique factorization as a nonincreasing product of finite and infinite Lyndon words.
The extension of the Lyndon property to generalized lexicographic orders was introduced about 10 years later by Reutenauer [11]. A generalized lexicographic order is a modified lexicographic order where the total order of the alphabet depends on the index of comparison. This naturally induces a notion of finite and infinite generalized Lyndon words under a generalized lexicographic order. (See Section 2.) Reutenauer showed that the finite generalized Lyndon words form a basis for $A^*$ using Hall set theory, and Dolce et al. provided a combinatorial proof in 2018 [11, 3]. Generalized Lyndon words are studied further by Dolce et al. in [4].
An example of a generalized lexicographic order is the alternating order $<_{alt}$, where the alphabet is given its standard order when the index of comparison is odd and its opposite order when the index is even. This order can be connected with continued fractions by noting that the map $f : \mathbb{N}^{\omega} \to \mathbb{R}$ defined by
$$f(a_1 a_2 a_3 \cdots) = a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}$$
satisfies $f(u) < f(v)$ in $\mathbb{R}$ if and only if $u <_{alt} v$ in $\mathbb{N}^{\omega}$. Generalized Lyndon words with respect to the alternating order are called Galois words, and Galois factorizations were given further characterization in [4]. Another special case is the anti-Lyndon words, introduced in [5], which are generalized Lyndon words with respect to the opposite lexicographic order.
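The link between the alternating order and continued fractions can be checked numerically on finite truncations. The sketch below is illustrative only: it assumes the alphabet is the positive integers and that the map sends a word $a_1 a_2 \cdots a_n$ to the finite continued fraction $a_1 + 1/(a_2 + 1/(\cdots + 1/a_n))$; the names `cf_value` and `alt_less` are hypothetical helpers, not notation from the paper.

```python
from fractions import Fraction
from itertools import product

def cf_value(word):
    """Exact value of a1 + 1/(a2 + 1/(... + 1/an)) for positive integers a_i."""
    value = Fraction(word[-1])
    for a in reversed(word[:-1]):
        value = a + 1 / value
    return value

def alt_less(u, v):
    """u <_alt v for two words of the same length (assumed setting)."""
    for n, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            # standard order at odd indices, opposite order at even indices
            return a < b if n % 2 == 1 else a > b
    return False  # equal words are not strictly smaller

# Compare every pair of length-4 words over {1, 2, 3}.
words = list(product([1, 2, 3], repeat=4))
agree = all((cf_value(u) < cf_value(v)) == alt_less(u, v)
            for u in words for v in words)
print(agree)
```

The check is restricted to words of equal length, where the truncated continued fraction is strictly monotone in the first differing letter; the infinite case in the text is the limit of this behaviour.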
Dolce et al. conjectured that the finite and infinite generalized Lyndon words provide a unique nonincreasing (with respect to $\omega$-powers) factorization of all infinite words. Our main result is to show that this is indeed the case.
In Section 3, we focus on words with a generalized Lyndon suffix. Theorem 3.3 shows that these are precisely the words with finitely many terms in their nonincreasing generalized Lyndon factorization. Moreover, we characterize the last term as the first generalized Lyndon suffix (with respect to the index).
Sections 4 and 5 focus on the existence and uniqueness, respectively, of nonincreasing generalized Lyndon factorizations for words which have no generalized Lyndon suffix. In the process we develop powerful machinery to take advantage of the strong properties of these factorizations. A product of this machinery is presented briefly in Section 6, where we show that an infinite word is generalized Lyndon if and only if it has infinitely many generalized Lyndon prefixes. This is the generalized analogue of the result of Siromoney et al. showing that infinite Lyndon words are precisely the words with infinitely many Lyndon prefixes.
2 Preliminaries
Let $\mathbb{N} = \{1, 2, 3, \ldots\}$. Words are finite or infinite (to the right) sequences of letters from a fixed (possibly infinite) alphabet $A$. For $i \leq j$, the contiguous substring of a word $w$ beginning at the $i$th letter and ending with the $j$th (inclusive on both ends) is denoted $w[i, j]$. A word $v$ is a factor of $w$ if $w = uvx$ for (possibly empty) words $u$ and $x$. In the case that $u$ is empty, $v$ is a prefix of $w$, and if $x$ is empty, then $v$ is a suffix of $w$. If in addition $x$ (resp. $u$) is nonempty, we say that the prefix (resp. suffix) is proper. If $w$ is an infinite word, the suffix of $w$ beginning at the $i$th index of $w$ is denoted $w[i, \infty)$. The length of a finite word $w$ is denoted by $|w|$.
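Let $A^{\infty} = A^* \cup A^{\omega}$. Given a total order $<$ on an alphabet $A$, the lexicographic ordering on $A^{\infty}$ is defined such that $u < v$ if and only if $u$ is a proper prefix of $v$, or $u = pas$ and $v = pbt$ with $a < b$, for words $p, s, t$ and letters $a, b$. We are primarily interested in a generalization of this order.
For each $n \in \mathbb{N}$, let $<_n$ be a total order on $A$. The generalized lexicographic order induced by $(<_n)_{n \in \mathbb{N}}$ is defined such that $u < v$ if and only if $u$ is a proper prefix of $v$, or $u = pas$ and $v = pbt$ with $a <_{|p|+1} b$, for words $p, s, t$ and letters $a, b$.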
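As a minimal sketch of this definition, the comparison can be coded by scanning for the first index where two finite words differ and consulting the order attached to that index. The representation is an assumption made for illustration: letters are integers, and a family of orders is encoded by a hypothetical function `rank(n, a)` giving the position of letter `a` in the order used at index `n` (1-based).

```python
from typing import Callable, Sequence

# rank(n, a): position of letter a in the total order used at index n (1-based).
Rank = Callable[[int, int], int]

def standard(n: int, a: int) -> int:
    # the same (natural) order at every index: ordinary lexicographic order
    return a

def alternating(n: int, a: int) -> int:
    # natural order at odd indices, reversed order at even indices
    return a if n % 2 == 1 else -a

def gen_less(u: Sequence[int], v: Sequence[int], rank: Rank) -> bool:
    """True when u is strictly smaller than v in the order induced by rank."""
    for n, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            # first differing index n decides, using the order at index n
            return rank(n, a) < rank(n, b)
    # no difference found: u < v exactly when u is a proper prefix of v
    return len(u) < len(v)

print(gen_less([1, 2], [1, 3], standard))     # compared at index 2, natural order
print(gen_less([1, 2], [1, 3], alternating))  # compared at index 2, reversed order
```

Encoding each order by a rank function keeps the comparison independent of the alphabet; any family of total orders can be plugged in the same way.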
If is a prefix of or is a prefix of , we write . Note that if , then implies . We will use the operator “transitively”, where the expression implies that the shortest of the words is a prefix of the rest. We also define a modified comparison operator such that if the prefixes of having length satisfy , where is the generalized lexicographic order. The same property of only comparing the prefixes up to the length of the shortest word in a chain also applies when the operators and are applied together in a chain.
A finite word is called a power of a finite word if for some integer . Let the -power of , denoted by , be the infinite word . An infinite word is called a power of a finite word if ; we also say that is periodic. If is infinite, we use the convention . An infinite word with a periodic suffix is called eventually periodic, and an infinite word which is not eventually periodic is called aperiodic. A word which is not a power is called primitive. A finite word is called a fractional power of a finite word if . We write , e.g., . See [6], [7], and [10] for more on the combinatorics of words.
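Comparisons of $\omega$-powers of finite words can be decided on a finite prefix: by the Fine and Wilf periodicity theorem, two such infinite words that agree on their first $|u| + |v|$ letters are equal. The following sketch, with hypothetical helper names and letters encoded as integers, uses that bound; the `rank(n, a)` encoding of the orders is an assumption made for illustration.

```python
from itertools import cycle, islice

def alternating(n, a):
    # natural order at odd indices, reversed order at even indices
    return a if n % 2 == 1 else -a

def omega_prefix(u, length):
    """First `length` letters of the infinite word u u u ..."""
    return list(islice(cycle(u), length))

def omega_less(u, v, rank):
    """Decide whether the omega-power of u is strictly smaller than that of v."""
    m = len(u) + len(v)
    pairs = zip(omega_prefix(u, m), omega_prefix(v, m))
    for n, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return rank(n, a) < rank(n, b)
    return False  # agreement on |u| + |v| letters forces equal omega-powers

def is_primitive(w):
    """True when w is not a k-th power of a shorter word for any k >= 2."""
    n = len(w)
    return not any(n % d == 0 and w == w[:d] * (n // d) for d in range(1, n))

print(omega_less([1, 2], [1], alternating))
print(is_primitive([1, 2, 1, 2]))
```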
A word $w$ is a finite generalized Lyndon word if it is strictly smallest among its class of rotations with respect to a generalized lexicographic order. That is, for any nontrivial factorization $w = uv$, we have $w < vu$. An infinite word is an infinite generalized Lyndon word if it is strictly smallest among its class of suffixes with respect to a generalized lexicographic order. A nonincreasing generalized Lyndon factorization of a word $w$ is a product of the form $w = \prod_{i=1}^{k} u_i$ where $k \in \mathbb{N} \cup \{\infty\}$, each $u_i$ is generalized Lyndon, and $u_i^{\omega} \geq u_{i+1}^{\omega}$ for all $1 \leq i < k$.
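The finite definition translates directly into a brute-force test: compare the word against each of its nontrivial rotations. This is a small sketch under the assumptions that letters are integers and that the family of orders is given by a hypothetical `rank(n, a)` function; it is meant to illustrate the definition, not to be efficient.

```python
def standard(n, a):
    return a

def alternating(n, a):
    # natural order at odd indices, reversed order at even indices
    return a if n % 2 == 1 else -a

def gen_less(u, v, rank):
    """Strict comparison in the generalized lexicographic order given by rank."""
    for n, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            return rank(n, a) < rank(n, b)
    return len(u) < len(v)

def is_gen_lyndon(w, rank):
    """True when the nonempty finite word w is strictly smaller than every
    nontrivial rotation of itself."""
    return all(gen_less(w, w[i:] + w[:i], rank) for i in range(1, len(w)))

for w in ([1, 1, 2], [1, 2, 1], [1, 2, 1, 2]):
    print(w, is_gen_lyndon(w, standard), is_gen_lyndon(w, alternating))
```

With these conventions, `[1, 1, 2]` is Lyndon for the standard order but not for the alternating order, while `[1, 2, 1]` behaves the other way around, and the non-primitive word `[1, 2, 1, 2]` fails both tests because it equals one of its rotations.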
3 Existence and Uniqueness of Finite Factorizations
In this section, we show that the words admitting a unique finite nonincreasing generalized Lyndon factorization are precisely the words that have a generalized Lyndon suffix.
Lemma 1
([3], Lemma 31) Let be nonempty finite words. Then the following four conditions are equivalent:
(1) (2) (3) (4) .
We will also make use of a result by Lyndon and Schützenberger [9] concerning commuting words, which can easily be strengthened when one of the words is generalized Lyndon.
Lemma 2
([9]) Two finite words commute if and only if they are powers of a common word.
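Lemma 2 can be sanity-checked exhaustively on short words. The sketch below uses the equivalent formulation that two nonempty words are powers of a common word exactly when they have the same primitive root; `primitive_root` is a hypothetical helper, and the two-letter alphabet is chosen only to keep the search small.

```python
from itertools import product

def primitive_root(w):
    """Shortest word r such that w is a power of r (w must be nonempty)."""
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w[:d] * (n // d) == w:
            return w[:d]

# All nonempty words of length at most 4 over the alphabet {a, b}.
words = ["".join(p) for k in range(1, 5) for p in product("ab", repeat=k)]

lemma_holds = all(
    (u + v == v + u) == (primitive_root(u) == primitive_root(v))
    for u in words for v in words
)
print(lemma_holds)
```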
Corollary 1
Suppose $u$ is a finite generalized Lyndon word, $v$ is any finite word, and $uv = vu$. Then $v$ is a power of $u$.
Proof
This follows from Lemma 2 and the fact that generalized Lyndon words are primitive.
Lemma 3
Suppose and are finite words satisfying (resp. ) for some . Then .
Proof
Suppose there exists a maximum nonnegative integer, , such that ; note that . Then
[TABLE]
Thus , a contradiction to our choice of . The proof proceeds analogously for the case where the inequalities are reversed.
Theorem 3.1
Suppose $w$ is an infinite word. If $w$ is a nonincreasing product of finite generalized Lyndon words, then $w$ has no generalized Lyndon suffixes.
Proof
Suppose has a generalized Lyndon suffix . Without loss of generality, we can assume and , where each is a generalized Lyndon word, , and is a suffix of . Since is generalized Lyndon, Lemma 1 implies . Furthermore, since is generalized Lyndon, we have for all . Thus, for all we have , hence .
Suppose that there exists some such that . Note that each such is a prefix of . By the nonincreasing property of the generalized Lyndon factors, either there exist finitely many such , or there exists some and such that for all , we have . The latter case holds because there are only finitely many prefixes of , so one prefix must appear infinitely many times. By the nonincreasing property of the factorization, this means that all terms in the factorization after the first term equal to this prefix must also equal this prefix. Observe that in the latter case, we have
[TABLE]
hence .
We conclude that there exists a minimal such that for some and . Since , then . Thus, , so Lemma 3 implies . Suppose . Then is a prefix of , so
[TABLE]
hence is a power of by Corollary 1. Thus, is not generalized Lyndon, so , a contradiction.
Thus, we must have that . By the minimality of , we have that for , which implies that for . As , we have that . Hence
[TABLE]
In particular, by Lemma 3, we have and . We repeat this process, showing that and for all . Hence . However, since , Lemma 1 implies . Thus and commute, so Corollary 1 implies is a power of . In particular, is a proper suffix of , so , contradicting our nonincreasing assumption.
Thus, we must have for all . has a generalized Lyndon suffix, so it cannot be periodic. We can fix to be the smallest index such that . By Lemma 3, the inequality
[TABLE]
implies that . Hence for some and . On the one hand, we have , hence . On the other hand, since is a prefix of and is a factor, we have because is generalized Lyndon. Hence, Lemma 2 implies , contradicting that by the generalized Lyndon property of .
Lemma 4
If is a finite word and is an infinite word, then (resp. ) if and only if (resp. ).
Proof
Suppose . Let be the largest integer such that . Hence for some infinite word . Thus, the comparison between and happens between index and index , inclusive. In particular, .
Now suppose . Let be the largest index such that . Thus, the comparison between and happens between index and index , inclusive. In particular, .
The proof with the reverse inequalities proceeds analogously.
In order to show the existence and uniqueness of generalized Lyndon factorizations of infinite words, we will invoke a theorem of Reutenauer which gives the analogous result for finite words [11].
Theorem 3.2
([11, 3]) Any finite word has a unique nonincreasing factorization into generalized Lyndon words.
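For short words, the factorization of Theorem 3.2 can be found by exhaustive search over all ways of cutting the word. The sketch below is a brute-force illustration, exponential in the word length and not the method of [11] or [3]. It assumes integer letters, a hypothetical `rank(n, a)` encoding of the orders, and the nonincreasing condition taken with respect to $\omega$-powers, decided on prefixes of length $|u| + |v|$.

```python
from itertools import cycle, islice, product

def alternating(n, a):
    # natural order at odd indices, reversed order at even indices
    return a if n % 2 == 1 else -a

def gen_less(u, v, rank):
    for n, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            return rank(n, a) < rank(n, b)
    return len(u) < len(v)

def is_gen_lyndon(w, rank):
    return all(gen_less(w, w[i:] + w[:i], rank) for i in range(1, len(w)))

def omega_geq(u, v, rank):
    """True when the omega-power of u is at least the omega-power of v."""
    m = len(u) + len(v)
    pu = tuple(islice(cycle(u), m))
    pv = tuple(islice(cycle(v), m))
    return not gen_less(pu, pv, rank)

def factorizations(w, rank):
    """All nonincreasing factorizations of the tuple w into generalized
    Lyndon words, found by trying every set of cut positions."""
    n = len(w)
    found = []
    for cuts in product([False, True], repeat=n - 1):
        bounds = [0] + [i + 1 for i, c in enumerate(cuts) if c] + [n]
        factors = [w[a:b] for a, b in zip(bounds, bounds[1:])]
        lyndon = all(is_gen_lyndon(f, rank) for f in factors)
        ordered = all(omega_geq(f, g, rank) for f, g in zip(factors, factors[1:]))
        if lyndon and ordered:
            found.append(factors)
    return found

print(factorizations((1, 1, 2), alternating))
print(factorizations((2, 1, 2), alternating))
```

Under the alternating order the search returns exactly one factorization for each input, for instance `(1)(1, 2)` for `(1, 1, 2)`, in line with the uniqueness statement.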
Theorem 3.3
An infinite word with an infinite generalized Lyndon suffix has a unique nonincreasing factorization into generalized Lyndon words, and this factorization is finite. Furthermore, the last term in this factorization is the first generalized Lyndon suffix by index.
Proof
We first show existence. Let be the first generalized Lyndon suffix of by index, that is, where the length of is minimum such that is generalized Lyndon. Let be the unique nonincreasing factorization of from Theorem 3.2. It is enough to show that , as this will yield as a nonincreasing generalized Lyndon factorization of .
Suppose that . By Lemma 4, this implies . Let be the shortest (not necessarily proper) suffix of such that is minimal. Note that we have , so is nonempty. However, by construction we have for every suffix of . Notably, for any suffix of because is generalized Lyndon. Thus is generalized Lyndon. This contradicts our choice of to be the first generalized Lyndon suffix of . Therefore , so we have produced a nonincreasing factorization of .
By Theorem 3.1, any factorization of must have only finitely many terms. Let be a nonincreasing factorization of into generalized Lyndon words. Suppose, seeking a contradiction, that is not the longest generalized Lyndon suffix of , i.e., there is a suffix of of the form where is a suffix of . From the nonincreasing property of the factorization and the generalized Lyndon property of , we know . By Lemma 1, . Inductively, we find . Thus, by Lemma 4, we have , contradicting that is generalized Lyndon.
Now that we have uniquely determined , the other factors are uniquely determined. This follows because the prefix of has a unique nonincreasing factorization into generalized Lyndon words. Thus by our initial assumption that is a nonincreasing factorization of , the unique factorization of must be .
4 Existence of Infinite Factorizations
In this section, we describe a method to construct an infinite factorization of a word with no generalized Lyndon suffix by taking a limit of the finite factorizations of some of its prefixes.
Lemma 5
If a primitive infinite word has infinitely many generalized Lyndon prefixes, then it is a generalized Lyndon word.
Proof
Let be a primitive word which is not infinite generalized Lyndon, and let be minimal such that . Let be the index of comparison between and . Then for any , we have with a comparison at index . Thus is not generalized Lyndon for any , so we can conclude that has finitely many generalized Lyndon prefixes.
Lemma 6
If is a finite word that is not generalized Lyndon, then has finitely many generalized Lyndon prefixes.
Proof
If is not generalized Lyndon, then we can write where for some prefix . Observe that will be a factor of any prefix of having length at least , and will be a prefix of any such prefix of . Thus any prefix of having length at least is not generalized Lyndon.
Theorem 4.1
Every infinite word has a nonincreasing factorization into generalized Lyndon words.
Proof
Fix an infinite word . Theorem 3.3 completes the proof in the case that has an infinite generalized Lyndon suffix. So we can assume that has no infinite generalized Lyndon suffix. In particular, is not generalized Lyndon.
We will first consider the case where is not eventually periodic. Since is not generalized Lyndon, Lemma 5 implies that has finitely many generalized Lyndon prefixes. Thus one of its generalized Lyndon prefixes must appear in the factorization of yielded by Theorem 3.2 for infinitely many . Let be such a prefix, and let .
We will now inductively construct a factorization of . Suppose we can write such that each is a finite generalized Lyndon word, , and has infinitely many prefixes whose factorizations begin with . Since has no generalized Lyndon suffixes, is not generalized Lyndon, so it must have finitely many generalized Lyndon prefixes. Since infinitely many prefixes of have factorizations beginning with , one of the generalized Lyndon prefixes of , which we label , must be such that infinitely many prefixes of have factorizations beginning with . We can then write . Note that by construction, . By induction, we get a nonincreasing generalized Lyndon factorization of .
Now suppose that is eventually periodic. If is a power of a generalized Lyndon word , we can use the factorization . Otherwise, is a power of a finite word that is not generalized Lyndon or is primitive. In either case, Lemmas 5 and 6 imply that has finitely many generalized Lyndon prefixes. We can thus apply the construction from the previous paragraph, in each step yielding a factorization of starting with . This process will halt only if has infinitely many generalized Lyndon prefixes such that is the factorization of a prefix of . By Lemmas 5 and 6, this implies that is a power of a generalized Lyndon word . Moreover, since the ’s have unbounded length and , we must have . Therefore is a factorization of . Thus, in any case, this construction yields a nonincreasing factorization of into generalized Lyndon words.
5 Uniqueness of Infinite Factorizations
We will determine the uniqueness of the factorization constructed in Section 4, handling first the eventually periodic words and then aperiodic words with no generalized Lyndon suffix.
Theorem 5.1
An eventually periodic infinite word has a unique nonincreasing factorization into generalized Lyndon words.
Proof
Fix an infinite word with a periodic suffix. Observe that this implies we can write as where is a (possibly empty) finite word and is a nonempty finite generalized Lyndon word. We may assume has no generalized Lyndon suffix, as this case is handled by Theorem 3.3.
We first claim that the factorization (from Theorem 4.1) of must terminate with . Since is generalized Lyndon hence not equal to any of its rotations, we have that if and only if is an integer multiple of . Moreover, if is not a multiple of , then is a power of a word which is not generalized Lyndon and hence has finitely many generalized Lyndon prefixes by Lemma 6. If one of these generalized Lyndon prefixes, , appears infinitely many times in the factorization of , then is a suffix of . Since and are suffixes of , they are powers of rotations of and , respectively. Because and are generalized Lyndon, this means that . That is, only finitely many terms of the factorization are not equal to . Thus, we can conclude for sufficiently large .
Now suppose and are two distinct factorizations of . Note that must be an integer multiple of , as is a generalized Lyndon word and hence not equal to any of its rotations. Without loss of generality, assume . In this case, there exists such that , which violates the uniqueness of the nonincreasing generalized Lyndon factorization for finite words from Theorem 3.2. Thus, the nonincreasing factorization of into generalized Lyndon words is unique.
Lemma 7
Let be a finite generalized Lyndon word where , is a finite generalized Lyndon word for all , is a suffix of a finite generalized Lyndon word , is a prefix of a finite generalized Lyndon word , and . Then and for all .
Proof
The generalized Lyndon property of implies . The nonincreasing property of the factors implies . Combining these inequalities, we have .
Suppose . The generalized Lyndon property of and the nonincreasing property furthermore implies
[TABLE]
Hence Lemma 3 implies that , so . Repeating this process, we can conclude .
Similarly, suppose . The generalized Lyndon property of and the nonincreasing property implies
[TABLE]
Hence Lemma 3 implies that , so . Repeating this process, we can conclude .
Lemma 8
Let satisfy the hypotheses of Lemma 7. If , then there exists some with such that
[TABLE]
Proof
We assume that for with and proceed by induction on . Note that the base case of is automatic. Furthermore, we suppose there exists with that satisfies the property of Lemma 8 when we restrict to considering with . Note that we can have . By Lemma 7, we have and , hence . Thus, if , then we are done.
Suppose and , so is a prefix of . Let for , where each . Let . Thus . By Lemma 7, we have , so is a factor of . Since is a prefix of , by the generalized Lyndon property we have . On the other hand, we have , which implies by Lemma 1. In particular, we have . Combining inequalities yields , implying is a power of by Corollary 1. Thus
[TABLE]
so Corollary 1 and Lemma 1 imply . This contradicts our choice of , hence and , as desired.
In the other case, we need to consider is and . Let for and , noting that since is generalized Lyndon, and hence primitive. If or if , then is a factor of . Note by our inductive hypothesis that is a prefix of . By the generalized Lyndon property, we have . However, we also have
[TABLE]
Combining inequalities yields , implying is a power of by Corollary 1. This means is a power of , contradicting the primitiveness of , so we must have and . Since we assume , there must exist some such that and . Notably, the largest value of such that works. Then is a prefix of , and is a factor of . By the generalized Lyndon property of and our inductive hypothesis, we have . However, we also have , hence by Lemma 1. In particular, . Again, we combine inequalities and use Corollary 1 and Lemma 1 to conclude , our final contradiction.
Lemma 9
If satisfies the hypotheses of Lemma 7, then .
Proof
It is enough to show that , since . If , then by Lemma 8, we have for some . Moreover, by Lemma 7 and our assumption , we have , hence . Thus, is a factor of and is a prefix, so the generalized Lyndon property of implies . Since we have Lemma 3 implies . In particular, . Combining inequalities, we have . This implies is a power of by Corollary 1. Thus . In particular , which implies by the generalized Lyndon property of . Therefore , contradicting our choice of .
Corollary 2
Let be as in the statement of Lemma 9. If we additionally assume that is a proper suffix of and , then .
Proof
By Lemma 9, we have . Since is a proper suffix of , we have . Thus , which is a contradiction.
Theorem 5.2
An aperiodic infinite word with no generalized Lyndon suffix has a unique nonincreasing factorization into finite generalized Lyndon words.
Proof
Suppose is an aperiodic word with no generalized Lyndon suffix such that has two distinct nonincreasing factorizations into generalized Lyndon words. Note that each factor in both factorizations must be finite. We can remove any initial common factors, so without loss of generality where for all , for all , and . Since we know finite words have unique nonincreasing generalized Lyndon factorizations from Theorem 3.2, we have for any .
Define , , and . We define to be the unique integer such that can be written as , where is a prefix of . We define to be the unique integer such that can be written as , where is a prefix of . This construction is illustrated in Figure 1. Observe that for each we have that is a proper prefix of , is a proper suffix of , is a proper prefix of , and is a proper suffix of .
We aim to show that for each , and since this reduction can only be applied finitely many times, we will reach a contradiction. Assume not, that for a certain .
First suppose that , and note that we cannot have be a power of or , or else is not primitive by Lemma 9. Thus Lemmas 9 and 7 imply that for some and . Moreover, Corollary 2 implies that . Lemma 7 implies that and , hence implies . Furthermore, Lemma 7 yields , so . Thus
[TABLE]
However, by the generalized Lyndon property of and Lemma 8, we also have
[TABLE]
Combining inequalities yields , which implies that and are powers of a common word by Lemma 2. Thus is not primitive, contradicting that it is generalized Lyndon. So in this case we have .
Now suppose . By the generalized Lyndon property of , we have . The nonincreasing property of our factors implies Hence , so for some . Since , Corollary 2 applies to . In particular, . By the generalized Lyndon property of and along with Lemma 1, we have
[TABLE]
Hence . Note that , so we also have . On the one hand, we have
[TABLE]
However, the generalized Lyndon property of and Lemma 7 imply
[TABLE]
Therefore , which implies and are powers of a common word. However, we reach our final contradiction by noting that is not primitive, contradicting that it is generalized Lyndon.
Theorem 5.3
Every infinite word has a unique factorization into a nonincreasing product of generalized Lyndon words.
Proof
This follows directly from Theorems 3.3, 5.1, and 5.2.
6 Characterization of Infinite Generalized Lyndon Words
Siromoney et al. showed in [13] that the infinite Lyndon words are precisely the limits of prefix-preserving increasing sequences of finite Lyndon words. We show that this result still holds when Lyndon words are replaced with generalized Lyndon words provided that the infinite word is primitive.
Theorem 6.1
A primitive infinite word is generalized Lyndon if and only if it has infinitely many generalized Lyndon prefixes.
Proof
Lemma 5 handles the reverse direction. Suppose that there exists an infinite generalized Lyndon word with finitely many generalized Lyndon prefixes. Since has infinitely many prefixes, one of its generalized Lyndon prefixes must appear in the unique nonincreasing generalized Lyndon factorizations (from Theorem 3.2) of infinitely many of the prefixes of .
We will now use the method presented in the proof of Theorem 4.1 to construct a nontrivial factorization of , contradicting the result of Theorem 3.3. Suppose that where is a finite generalized Lyndon word and . Further suppose that has infinitely many prefixes with factorizations beginning with . If is not generalized Lyndon, the process proceeds as in Theorem 4.1.
Suppose is generalized Lyndon. If we can choose a generalized Lyndon prefix of such that infinitely many prefixes of have factorizations beginning with , then the process can continue. Otherwise, there must be infinitely many prefixes of such that is a factorization of a prefix of . In particular, we have that for infinitely many prefixes of . Taking the limit of these prefixes, we find that . Thus, is a nontrivial factorization of , contradicting Theorem 3.3.
Therefore either the process terminates and produces a nontrivial finite generalized Lyndon factorization of , or it continues indefinitely and produces a nonincreasing generalized Lyndon factorization of . Either case contradicts Theorem 3.3, so must have infinitely many generalized Lyndon prefixes.
We cannot hope this result extends to the case where the infinite word is not primitive. For example, consider under the alternating order. It has infinitely many Galois prefixes, namely the prefixes of the form for any , but is not Galois.
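The failure for non-primitive words can be observed on a concrete periodic word. The sketch below uses $(12)^{\omega}$ under the alternating order, a word chosen here for illustration and not necessarily the example intended above, and lists which of its first nine prefixes are Galois. Helper names and the `rank(n, a)` encoding of the orders are assumptions.

```python
from itertools import cycle, islice

def alternating(n, a):
    # natural order at odd indices, reversed order at even indices
    return a if n % 2 == 1 else -a

def gen_less(u, v, rank):
    for n, (a, b) in enumerate(zip(u, v), start=1):
        if a != b:
            return rank(n, a) < rank(n, b)
    return len(u) < len(v)

def is_gen_lyndon(w, rank):
    return all(gen_less(w, w[i:] + w[:i], rank) for i in range(1, len(w)))

# First nine letters of the periodic word 1 2 1 2 1 2 ...
prefix = list(islice(cycle([1, 2]), 9))

galois_lengths = [k for k in range(1, 10)
                  if is_gen_lyndon(prefix[:k], alternating)]
print(galois_lengths)  # [1, 2, 3, 5, 7, 9]
```

Every odd-length prefix is Galois, and the same argument works for all odd lengths, so this word has infinitely many Galois prefixes. The infinite word itself is equal to its own suffix starting at index 3, so it is not strictly smaller than all of its suffixes.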
7 Further Directions
Theorem 6.1 shows that infinite generalized Lyndon words have infinitely many generalized Lyndon prefixes. But which finite generalized Lyndon words can arise as such prefixes? It is straightforward to see that if the alphabet is finite, then the maximum $1$-letter word will not arise as a prefix of any infinite generalized Lyndon word. However, we conjecture that every other finite generalized Lyndon word is extendable to an infinite generalized Lyndon word.
Conjecture 1
Every finite generalized Lyndon word of length at least $2$ is a prefix of an infinite generalized Lyndon word.
Observe that every finite Lyndon word can be extended to an infinite Lyndon word by appending an -power of the maximum letter. However, in the generalized case there is no notion of a maximal letter appearing in a word. Moreover, there exist finite generalized Lyndon words which cannot be extended to infinite generalized Lyndon words by appending a power of a letter. For example, is a Galois word but and are not Galois. The Galois word is still extendable to an infinite Galois word by appending a slightly more complicated suffix, e.g., . Given an infinite word , it may be interesting to characterize which finite generalized Lyndon words are extendable to an infinite generalized Lyndon word by appending .
Given that every word has a unique nonincreasing factorization into generalized Lyndon words, one may wish to characterize or compute this factorization. For example, given a simple representation (e.g., a finite expression of products and powers) of an infinite word and a generalized lexicographic order, one may wish to compute the factorization of the word in polynomial time.
In a different direction, the existence and uniqueness of a factorization of a general transfinite (ordinally indexed) word into Lyndon words is proved in [1]. It remains to be seen whether this factorization theorem still holds when using generalized Lyndon words. Lastly, one may seek a general characterization of the first factor in a generalized Lyndon factorization along the lines of [14]. While simple characterizations such as longest Lyndon prefix fail, there may be a more clever characterization lurking in the background.
8 Acknowledgements
We extend our thanks to the anonymous referees for their detailed comments. The paper was greatly improved by their suggestions.
References
[1] L. Boasson and O. Carton. Transfinite Lyndon Words. International Conference on Developments in Language Theory, Lecture Notes in Comput. Sci., 9168:179-190. Springer, (2015).
[2] K.-T. Chen, R. H. Fox, and R. C. Lyndon. Free differential calculus, IV. Annals of Mathematics, 68:81-95, (1958).
[3] F. Dolce, A. Restivo, and C. Reutenauer. On generalized Lyndon words. Theoretical Computer Science, doi: 10.1016/j.tcs.2018.12.015, (2018).
[4] F. Dolce, A. Restivo, and C. Reutenauer. Some variations on Lyndon words. arXiv:1904.00954 [math.DM], (2019).
[5] D. A. Gewurz and F. Merola. Numeration and enumeration. European Journal of Combinatorics, 33(7):1547-1556, (2012).
[6] M. Lothaire. Combinatorics on words. Cambridge Mathematical Library, Cambridge University Press, Cambridge, (1997).
[7] M. Lothaire. Algebraic combinatorics on words. Volume 90 of Encyclopedia of Mathematics and Its Applications, Cambridge University Press, Cambridge, (2002).
[8] R. C. Lyndon. On Burnside’s problem. Trans. Amer. Math. Soc., 77:202-215, (1954).
