On winning shifts of marked uniform substitutions
Jarkko Peltomäki, Ville Salo

Abstract
The second author introduced with I. Törmä a two-player word-building game [Playing with Subshifts, Fund. Inform. 132 (2014), 131–152]. The game has a predetermined (possibly finite) choice sequence $\alpha_1$, $\alpha_2$, $\ldots$ of integers such that on round $n$ the player $A$ chooses a subset $S_n$ of size $\alpha_n$ of some fixed finite alphabet and the player $B$ picks a letter from the set $S_n$. The outcome is determined by whether the word obtained by concatenating the letters $B$ picked lies in a prescribed target set $X$ (a win for player $A$) or not (a win for player $B$). Typically, we consider $X$ to be a subshift. The winning shift $W(X)$ of a subshift $X$ is defined as the set of choice sequences for which $A$ has a winning strategy when the target set is the language of $X$. The winning shift $W(X)$ mirrors some properties of $X$. For instance, $W(X)$ and $X$ have the same entropy. Virtually nothing is known about the structure of the winning shifts of subshifts common in combinatorics on words. In this paper, we study the winning shifts of subshifts generated by marked uniform substitutions, and show that these winning shifts, viewed as subshifts, also have a substitutive structure. In particular, we give an explicit description of the winning shift for the generalized Thue-Morse substitutions. It is known that $W(X)$ and $X$ have the same factor complexity. As an example application, we exploit this connection to give a simple derivation of the first difference and factor complexity functions of subshifts generated by marked substitutions. We describe these functions in particular detail for the generalized Thue-Morse substitutions.
Keywords: two-player game, winning shift, marked substitution, factor complexity, generalized Thue-Morse word
Turku Centre for Computer Science TUCS, Turku, Finland
University of Turku, Department of Mathematics and Statistics, Turku, Finland
1 Introduction
In the paper [15], the second author introduced with I. Törmä a two-player word-building game. The two players, Alice and Bob, agree on a finite alphabet $S$, a target set $X$ of words over $S$, game length $n$, and a choice sequence (a word) $\alpha_1 \alpha_2 \cdots \alpha_n$ of integers in $\{1, 2, \ldots, |S|\}$. On the round $i$ of the game, $1 \leq i \leq n$, Alice first chooses a subset $S_i$ of $S$ of size $\alpha_i$ and then Bob picks a letter $a_i$ from the subset $S_i$. During the game, Alice and Bob thus together build the word $a_1 a_2 \cdots a_n$ (finite or infinite). If this built word is in the target set $X$, then Alice wins, otherwise Bob does. In other words, Alice aims to build a valid word of $X$ while her adversary Bob attempts to introduce a forbidden word.
In studying games of this sort, it would be typical to fix a choice sequence $\alpha$ and see what conditions on $X$ guarantee the existence of a winning strategy for one of the players. The work of [15] adopts the opposite point of view: fix a set $X$ and see for which choice sequences Alice has a winning strategy. This set of choice sequences, dubbed the winning set $W(X)$ of $X$, turns out to be a very interesting object. First of all, if $X$ is a subshift, then $W(X)$, now called the winning shift of $X$, is also a subshift, and the set of factors of $W(X)$ of length $n$ is exactly the winning set of factors of $X$ of length $n$. Actually the winning set $W(X)$ inherits many properties of $X$. For instance, if $X$ is a regular language, so is $W(X)$, and if $X$ is computable, then so is $W(X)$. The most interesting result, which sparked the research in this paper, is the fact that for a finite set $X$ of words of equal length the sets $X$ and $W(X)$ have the same cardinality so, for a subshift $X$, the winning shift $W(X)$ has the same entropy and factor complexity function as $X$. Now the winning set $W(X)$ is in a sense simpler than $X$ because it is downward closed: if any letter of a choice sequence in $W(X)$ is downgraded to a smaller letter, then the resulting word is still in $W(X)$. The winning set $W(X)$ is thus a rearrangement of $X$ to a downward closed set. Indeed, the winning set can be significantly simpler: for instance, the winning set of a Sturmian subshift is the subshift over $\{1, 2\}$ whose words contain at most one letter $2$.
Descriptions of the winning shifts for particular subshifts remain largely unknown. In this work, we provide such descriptions for the winning shifts of subshifts generated by marked uniform substitutions. A marked substitution is a substitution such that all images of letters begin with distinct letters and end with distinct letters. We prove that all long enough choice sequences in such a winning shift are obtained from a few core choice sequences by substitution (Theorem 4.9). Let us make this more precise. Let be a marked uniform substitution of length , and let be a short choice sequence in the language of the winning shift of the subshift generated by . Write for letters and . Then is in the language of ; here is the substitution defined by and the word is in the winning set of certain suffixes of the -images of a subset of of size . All long enough choice sequences in the language of are essentially obtained in this way. In general, the short choice sequences and possible words can be very complex and they elude any simple description, but they can be efficiently computed. This together with Theorem 4.9 allows us to rapidly compute the language of the winning shift . If we make additional assumptions on , then the situation can be simplified. For instance, if is permutive (letters at a fixed position of the -images form a permutation of the alphabet ), then is simply of the form for some such that (4.10). This class of permutive uniform substitutions includes the generalized Thue-Morse substitutions. For them, we compute all involved parameters and give full description of the whole winning shift (Section 5).
The structure of the winning shift of a marked uniform substitution is quite easy to comprehend, and we apply our results to give a simple derivation of the first difference function of such a substitution (Theorem 4.12). This function can in turn be used to derive the factor complexity function. A. Frid has derived these functions previously with other methods [7]; see also [12]. Our arguments and Frid’s arguments, which by the way apply in a more general setting, in the end reduce to the same fundamental observations, but the high-level view is completely different. We prefer gaming and feel that analyzing Alice and Bob’s match is fresh and, more importantly, fun. The aim of this paper is to describe the winning shift; the connection to factor complexity is more of a motive for the study, a curiosity. We do, however, derive the factor complexity function in full detail for the generalized Thue-Morse words, just as we describe their winning shifts completely (Section 5). These complexity functions have been derived in full generality previously by Š. Starosta in [14] using an intriguing connection to so-called -rich words. Results in specialized cases were known before Starosta, see [5, 6, 16]. A short version of this paper with results applying only to the generalized Thue-Morse words was presented in the proceedings of RuFiDiM IV [13].
The paper is organized as follows. In the next section, we give the necessary definitions and results needed. After this in Section 3, we outline the structure of the winning shift of the Thue-Morse substitution and use it as a motivating example to introduce our ideas. Section 4 contains the main results. We show that generally short choice sequences can be substituted to obtain longer choice sequences, but the additional assumption of markedness is needed for desubstitution. We end Section 4 by deriving a recurrence for the first difference function of a marked uniform substitution. The final section is devoted to the generalized Thue-Morse substitutions. We completely describe their winning shifts and, as an application, derive formulas for their factor complexity functions.
2 Notation and Preliminary Results
2.1 Standard Definitions
Here we briefly define word-combinatorial notions; further details are found in, e.g., [10]. An alphabet $S$ is a nonempty finite set of letters, and we denote by $S^*$ the set of finite words over $S$. The set of words over $S$ of length $n$ is denoted by $S^n$, and by $S^{\leq n}$ we denote the set of words over $S$ with length at most $n$. Infinite words over $S$ are sequences in $S^{\mathbb{N}}$. The length of a finite word $w$ is denoted by $|w|$, and the empty word $\varepsilon$ is the unique word of length $0$. Suppose that $w$ is a word (finite or infinite) such that $w = uzv$ for some words $u$, $z$, and $v$. Then we say that $z$ is a factor of $w$. If $u = \varepsilon$ (respectively $v = \varepsilon$), then we call the factor $z$ a prefix (respectively suffix) of $w$. If $z$ is a prefix of $w$ and $z \neq w$, then $z$ is a proper prefix of $w$; similarly we define a proper suffix of $w$. We say that $z$ occurs at position $|u|$ of $w$; the position $|u|$ is an occurrence of the factor $z$. Thus we index letters from $0$. The word $\partial_{i,j}(w)$, where $i + j \leq |w|$, is obtained from the word $w$ by deleting $i$ letters from the beginning and $j$ letters from the end. An infinite word is ultimately periodic if it is of the form $uv^\omega$; otherwise it is aperiodic.
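A subshift $X$ is a subset of $S^{\mathbb{N}}$ defined by some set $F \subseteq S^*$ of forbidden words:
\[ X = \{ w \in S^{\mathbb{N}} : \text{no word of } F \text{ occurs in } w \}. \]
We denote by $\mathcal{L}_X(n)$ the set of words of length $n$ occurring in words of $X$ and define the language $\mathcal{L}(X)$ of $X$ as the set $\bigcup_{n \geq 0} \mathcal{L}_X(n)$. The subshift $X$ is uniquely defined by its language. The function $\rho$ defined by letting $\rho(n) = |\mathcal{L}_X(n)|$ is called the factor complexity function of $X$ (we assume that $X$ is known from context), and it counts the number of words of length $n$ in the language of $X$. We define the first difference function $\Delta$ by setting $\Delta(0) = 1$ and $\Delta(n) = \rho(n) - \rho(n-1)$ for $n \geq 1$. This function measures the growth of the factor complexity function.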
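As a small illustration of these definitions, the following Python sketch counts the factors of each length inside a finite sample of an infinite word and takes first differences. It assumes the convention that the first difference at $0$ equals $1$ and that otherwise it is the difference of the complexities at $n$ and $n-1$. The function names and the Thue-Morse sample are illustrative choices, and the counts are exact only when the sample is long enough to contain every factor of the lengths considered.

```python
def factors(word: str, n: int) -> set[str]:
    """Return the set of length-n factors occurring in a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def complexity(word: str, max_n: int) -> list[int]:
    """Number of distinct factors of length n = 0..max_n inside `word`."""
    return [len(factors(word, n)) for n in range(max_n + 1)]


def first_difference(rho: list[int]) -> list[int]:
    """Assumed convention: value 1 at n = 0, then rho[n] - rho[n - 1]."""
    return [1] + [rho[n] - rho[n - 1] for n in range(1, len(rho))]


# Sample word: a prefix of the Thue-Morse word, whose k-th letter is the
# parity of the number of 1 bits in the binary expansion of k.
sample = "".join(str(bin(k).count("1") % 2) for k in range(1 << 10))
rho = complexity(sample, 6)
delta = first_difference(rho)
print(rho, delta)
```

Counting inside a finite sample is only a heuristic for a general subshift; for a substitutive subshift one would normally generate the language from the substitution itself.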
2.2 Substitutions
A function is called a substitution if for all . In this paper, we typically select . If has the same length for every , then we say that is uniform. In this paper, we assume that for uniform substitutions we have for all . We call the images of letters, the words , -images. If begins with and for a letter , then the infinite word obtained by repeatedly applying to , denoted by , is a fixed point of the substitution . Consider the language defined as the set
[TABLE]
consisting of the factors of the words obtainable by applying repeatedly to the letters of . Let
[TABLE]
The subshift generated by is simply the subshift with the language (i.e., we forbid the complement of ). The substitution is primitive if there is an integer such that contains all letters of for every . The substitution is aperiodic if the subshift generated by does not contain ultimately periodic infinite words. We assume that all substitutions considered are aperiodic.
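The notions of this subsection are easy to experiment with. The sketch below represents a substitution as a Python dictionary from letters to their images, iterates it to obtain a prefix of a fixed point, approximates the language by collecting factors of iterated images of letters, and tests primitivity. All function names are hypothetical, the language routine is only a finite approximation controlled by the assumed parameter `rounds`, and the primitivity test relies on Wielandt's bound for primitive matrices, which is a standard fact not taken from this paper.

```python
def apply_substitution(sub: dict[str, str], word: str) -> str:
    """Apply the substitution letter by letter."""
    return "".join(sub[a] for a in word)


def fixed_point_prefix(sub: dict[str, str], letter: str, length: int) -> str:
    """Prefix of the fixed point obtained by iterating on `letter`.

    Assumes sub[letter] begins with `letter` and has length at least 2.
    """
    word = letter
    while len(word) < length:
        word = apply_substitution(sub, word)
    return word[:length]


def language(sub: dict[str, str], n: int, rounds: int = 10) -> set[str]:
    """Length-n factors of the k-fold images of letters for k = 0..rounds.

    Finite approximation: exact only if `rounds` is large enough for n.
    """
    found: set[str] = set()
    for a in sub:
        word = a
        for _ in range(rounds + 1):
            found.update(word[i:i + n] for i in range(len(word) - n + 1))
            word = apply_substitution(sub, word)
    return found


def is_primitive(sub: dict[str, str]) -> bool:
    """Check whether some power maps every letter to a word with all letters."""
    letters = set(sub)
    reach = {a: {a} for a in letters}  # letters occurring in the 0-fold image
    bound = len(letters) ** 2 - 2 * len(letters) + 2  # Wielandt's bound
    for _ in range(bound):
        reach = {a: set().union(*(set(sub[b]) for b in reach[a])) for a in letters}
        if all(r == letters for r in reach.values()):
            return True
    return False


thue_morse = {"0": "01", "1": "10"}
print(fixed_point_prefix(thue_morse, "0", 16))
print(sorted(language(thue_morse, 3)))
```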
We call a substitution left-marked if all of its -images begin with distinct letters. In other words, there exists a permutation such that for . Analogously we define right-marked substitutions. If a substitution is left-marked and right-marked, then it is simply called a marked substitution. Observe also that marked substitutions have an obvious but important property: if a single letter of a -image is changed, then the resulting word is no longer a valid -image. A substitution is permutive if there exist permutations , , , from to such that for . A permutive substitution is uniform and marked.
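These properties can be checked mechanically. The following sketch, with hypothetical function names and a substitution again given as a dictionary from letters to images over the same alphabet, tests uniformity, left- and right-markedness, and permutivity (every column of the images is a permutation of the alphabet).

```python
def is_uniform(sub: dict[str, str]) -> bool:
    """All images have the same length."""
    return len({len(image) for image in sub.values()}) == 1


def is_left_marked(sub: dict[str, str]) -> bool:
    """The images begin with pairwise distinct letters."""
    first_letters = [image[0] for image in sub.values()]
    return len(set(first_letters)) == len(first_letters)


def is_right_marked(sub: dict[str, str]) -> bool:
    """The images end with pairwise distinct letters."""
    last_letters = [image[-1] for image in sub.values()]
    return len(set(last_letters)) == len(last_letters)


def is_marked(sub: dict[str, str]) -> bool:
    return is_left_marked(sub) and is_right_marked(sub)


def is_permutive(sub: dict[str, str]) -> bool:
    """Uniform, and the letters at each fixed position permute the alphabet."""
    if not is_uniform(sub):
        return False
    alphabet = sorted(sub)
    images = list(sub.values())
    return all(
        sorted(image[i] for image in images) == alphabet
        for i in range(len(images[0]))
    )


thue_morse = {"0": "01", "1": "10"}
print(is_marked(thue_morse), is_permutive(thue_morse))
```

For instance, the images `001` and `100` give a marked substitution that is not permutive, because the middle column repeats the letter `0`.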
We say that a word in admits an interpretation for letters , , by if , , , and . The word is called an ancestor of the word . We say that is a synchronization point of (for ) if and whenever for some and some words and , then and for some words and such that . We say that has synchronization delay if every word in of length at least has at least one synchronization point and is minimal. Observe that if is marked, then all words in of length at least have a unique ancestor. We assume that all substitutions considered in this paper have a synchronization delay. It follows from a theorem of Mossé [11, Corollaire 3.2] that the synchronization delay of a uniform, primitive, and aperiodic substitution always exists. (Mossé's theorem applies to any primitive and aperiodic substitution.)
Let be a uniform substitution of length with synchronization delay . Let in be a word such that . Suppose that has an ancestor , so that with . While might have several ancestors, the uniformity of and the fact that has at least one synchronization point ensure that the numbers and are independent of the chosen ancestor . In fact, the positions and mark a synchronization point of . All in all, the number uniquely identifies the positions of where the -images of the letters of any ancestor of begin at, and we say that has decomposition .
2.3 Word Games
Next we define precisely the word game in which two players, Alice and Bob, build a finite or infinite word. A word game is a quadruple $(S, n, X, \alpha)$, where $S$ is an alphabet, $n \in \mathbb{N} \cup \{\mathbb{N}\}$, the target set $X$ is a subset of $S^n$, and the choice sequence $\alpha$ is a word of length $n$ (an infinite word if $n = \mathbb{N}$) over the alphabet $\{1, 2, \ldots, |S|\}$. We may allow the target set to contain words of distinct lengths by using $X \cap S^n$ in place of $X$; this will always be clear from context.
Denote by $G$ the word game $(S, n, X, \alpha)$ with $n \in \mathbb{N}$, and write $\alpha = \alpha_1 \alpha_2 \cdots \alpha_n$ for letters $\alpha_i$. During the round $i$, $1 \leq i \leq n$, of this game, first Alice chooses a subset $S_i$ of $S$ of size $\alpha_i$. Then Bob picks a letter $a_i$ from the set $S_i$. After $n$ rounds, Alice and Bob have together built a word $a_1 a_2 \cdots a_n$. If $a_1 a_2 \cdots a_n \in X$, then Alice wins the game and otherwise Bob does. An example is provided at the beginning of Section 3, and more examples are found in [15]. The notions presented in this paragraph extend to the case $n = \mathbb{N}$ in a natural way.
Alice’s strategy for is a function that specifies which subset of size she should choose next given the word of length constructed so far. Similarly we define Bob’s strategy as a partial function specifying which letter Bob should pick given the word constructed so far and the subset chosen by Alice. Let and respectively be Alice’s strategy and Bob’s strategy for the game . The play of the strategy pair is the word defined inductively by with (if , then the play is simply infinite). We say that Alice’s strategy is winning if for all Bob’s strategies (Alice wins no matter how Bob plays). Analogously Bob’s strategy is winning if for all Alice’s strategies . If or is a closed set in the product topology of (in particular, if is a subshift), then a winning strategy always exists for one of the players [8]. In this paper, we consider Bob’s strategies only indirectly. Thus whenever we talk about a winning strategy we mean that it is Alice’s winning strategy. Similarly by a winning play we mean a play by a strategy pair where is Alice’s winning strategy.
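As mentioned in the introduction, we are interested in the choice sequences for which Alice has a winning strategy. Given a subset $X$ of $S^n$, where $n \in \mathbb{N} \cup \{\mathbb{N}\}$, we define the winning set $W_S(X)$ of $X$ as the set
\[ W_S(X) = \{ \alpha \in \{1, 2, \ldots, |S|\}^n : \text{Alice has a winning strategy for the word game } (S, n, X, \alpha) \}. \]
Notice that in general Alice has several winning strategies for a choice sequence in $W_S(X)$. We often omit the alphabet $S$; it will be clear from the context. For a language $L \subseteq S^*$, we set
\[ W(L) = \bigcup_{n \in \mathbb{N}} W(L \cap S^n) \]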
and call also this set the winning set of . If and is a subshift, then we call the winning shift of ; if the subshift is generated by a substitution , then we denote its winning shift by . Indeed, in [15, Proposition 3.4], the following result was obtained.
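Proposition 2.1.
If $X$ is a subshift, then $W(X)$ is a subshift and $\mathcal{L}(W(X)) = W(\mathcal{L}(X))$, where $\mathcal{L}$ denotes the language of a subshift.
We abuse notation and write $W(X)$ for $\mathcal{L}(W(X))$; it is always clear from context whether we consider finite words or infinite words. In addition, we have the following observation.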
Lemma 2.2.
Let $X$ and $Y$ be sets containing words of equal length. If $X \subseteq Y$, then $W(X) \subseteq W(Y)$.
Proof.
Alice's winning strategy for a word game with target set $X$ and choice sequence in $W(X)$ is sufficient as it is for her to win in the game with the same choice sequence and target set $Y$. ∎
We endow the alphabet $\{1, 2, \ldots, |S|\}$ with the natural order $1 < 2 < \cdots < |S|$. Suppose that $\alpha$ and $\beta$ are words over this alphabet (finite or infinite), and write $\alpha = \alpha_1 \alpha_2 \cdots$ and $\beta = \beta_1 \beta_2 \cdots$ for letters $\alpha_i$, $\beta_i$. Then we write $\alpha \leq \beta$ if and only if $|\alpha| = |\beta|$ and $\alpha_i \leq \beta_i$ for all $i$. The winning set is downward closed with respect to this partial ordering: if $\beta \in W(X)$ and $\alpha \leq \beta$, then $\alpha \in W(X)$. This is simply because downgrading a letter from the choice sequence only makes Bob's chances of winning slimmer.
Observe that the winning strategies for finite choice sequences ending with the letter $1$ are just trivial extensions of winning strategies of shorter choice sequences ending with a letter greater than $1$. Thus we say that a finite choice sequence is reducible if it ends with $1$ and irreducible otherwise. The infinite words of the winning shift are obtainable from irreducible choice sequences by appending infinitely many letters $1$ and by taking closure. A rule of thumb for the rest of the paper is that to describe the structure of the winning sets it is enough to study only irreducible choice sequences.
Finally, we need the next proposition [15, Proposition 5.7] that motivates the presented results.
Proposition 2.3.
If $n \in \mathbb{N}$ and $X \subseteq S^n$, then $|W(X)| = |X|$.
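For small targets, the winning set can be computed by brute force directly from the rules of the game, which makes it possible to check the cardinality statement of Proposition 2.3 experimentally. The sketch below is one straightforward way to do this and is not the authors' algorithm: Alice wins from a given position exactly when some subset of the prescribed size has the property that she still wins after every letter Bob may pick from it. The function names are hypothetical, the target is assumed to be a nonempty finite set of words of equal length, and choice sequences are represented as tuples of integers.

```python
from functools import lru_cache
from itertools import combinations, product


@lru_cache(maxsize=None)
def alice_wins(target: frozenset, alphabet: str, choices: tuple) -> bool:
    """Decide whether Alice has a winning strategy.

    `target` holds the words that may still be completed; every word is
    assumed to have length len(choices).
    """
    if not target:
        return False  # no completion keeps the built word in the target
    if not choices:
        return True  # all rounds played and the built word is in the target
    for subset in combinations(alphabet, choices[0]):
        # After Bob picks a, the remaining target is {w : a w in target}.
        if all(
            alice_wins(
                frozenset(w[1:] for w in target if w[0] == a),
                alphabet,
                choices[1:],
            )
            for a in subset
        ):
            return True
    return False


def winning_set(words, alphabet: str) -> set:
    """All choice sequences for which Alice wins on a nonempty target."""
    target = frozenset(words)
    n = len(next(iter(target)))  # common length of the target words
    sizes = range(1, len(alphabet) + 1)
    return {
        alpha
        for alpha in product(sizes, repeat=n)
        if alice_wins(target, alphabet, alpha)
    }


print(sorted(winning_set({"00", "01", "11"}, "01")))
# → [(1, 1), (1, 2), (2, 1)]
```

In this toy example the target and its winning set both have three elements, and the winning set is downward closed. The search is exponential in the length of the choice sequence, so it is only suitable for short words.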
We note that a subset of can be interpreted as a family of subsets of (a so-called set system) by considering a word as the characteristic function of a subset. Proposition 2.3 has been proven in relation to set systems in [3]. (Formally, the result of [3] corresponds to the binary case of Proposition 2.3. Their order-shattered sets for the set system whose characteristic functions are correspond to the choice sequences in , where is word reversal, that is, their games are played from right to left.)
3 The Motivating Example of the Thue-Morse Substitution
In this section, we consider the winning shift of the Thue-Morse substitution. Through examples, we describe the substitutive structure of this winning shift and outline how it can be used to compute the factor complexity of the subshift generated by the Thue-Morse substitution. Our claims are rigorously derived in the subsequent sections in a more general setting.
Let be the Thue-Morse substitution: $0 \mapsto 01$, $1 \mapsto 10$. The substitution is uniform, primitive, and marked, and it is readily proven that it is aperiodic. With an exhaustive search, it is easily established that its synchronization delay is (see also 5.2). The fixed point
\[ 0110100110010110100101100110100110010110\cdots \]
is the famous Thue-Morse word, which is overlap-free (i.e., it does not contain a factor of the form $auaua$ for a word $u$ and a letter $a$). For more details on the substitution , see for example [9, Section 2.2].
In Table 1, we list irreducible choice sequences of for lengths to . (Here we indeed abuse notation, and we should write for .) Remember that reducible choice sequences of length are obtained by padding shorter irreducible choice sequences with the letter $1$. For the choice sequence , Alice has the following winning strategy:
[TABLE]
the other arguments being irrelevant. This strategy is depicted in Figure 1 as a strategy tree; this tree representation is used throughout this paper. Whenever Alice has more than one choice according to her strategy, the tree branches to several nodes that correspond to Alice’s possible choices of letters. We omit edges from the tree when there are no branchings.
Table 1 contains many patterns. By Proposition 2.3, the number of irreducible choice sequences of length is counted by the first difference function . Based on the data, it seems that for all and only if for and . This is of course readily observed when looking at the factor complexity function; here we see much more: the rule described next confirms the preceding observations.
We observe that a choice sequence in the winning shift always seems to contain at most three occurrences of . Moreover, if contains exactly three occurrences of , then the distance between the two final occurrences is for some , and the middle occurrence is preceded by at most occurrences of the letter . The rule seems to be the following. If , then the only irreducible choice sequence of length (up to the difference at the very beginning) is . Then the number of s increases until there are of them. Next a third occurrence of can be introduced: the choice sequences of length are and (the former choice sequence downgraded). Then the number of s before the second to last occurrence of starts to grow one by one until the choice sequences considered are of length , and then the pattern repeats. The observed rule suggests that irreducible choice sequences of of lengths to are related to irreducible choice sequences of lengths to . Indeed, these choice sequences look identical: the latter ones are just “blown up” by a factor of . Since the substitution also “blows up” words by a factor of , we proceed to look at -images of the strategy trees of short choice sequences.
Consider the strategy tree for the choice sequence depicted in Figure 1. Substitute all letters of this tree with while preserving the branch structure to obtain the right tree of Figure 1. The obtained strategy tree gives a winning strategy for Alice in a word game with choice sequence . Let us next give an intuitive explanation for the strategy from Alice’s point of view. Alice can beat Bob in the word game with choice sequence by imagining that she plays the word game with choice sequence , for which she has a winning strategy. On her first turn, Alice lets Bob choose between $0$ and $1$. Since Alice wins this game of length , Alice can also win the game of length with choice sequence played on the -images and (choice sequence is also possible but less interesting). Continuing, Alice lets Bob again choose between $0$ and $1$. The win on this play of length ensures Alice winning the game of length with choice sequence played on the -images , , , . Next, Alice gives Bob only one choice to ensure a win, so Bob, having no options, loses in the game of length with choice sequence played on the respective -images. Overall, we see that the short winning strategy for the choice sequence enables Alice to always win the game with choice sequence . This longer choice sequence is constructed in such a way that all occasions of Bob having a real choice (branches of the strategy tree) correspond to Bob having a choice of two letters in the shorter game with choice sequence ; Alice just imagines playing a short game with choice sequence filling the suffixes of the -images by not letting Bob choose. Alice’s method can indeed be viewed as a branch-preserving substitution of the strategy tree.
The method described above does not explain if it is possible for Alice to obtain a winning strategy for, e.g., the choice sequence from some shorter winning strategy. Let us see how she could do this. Alice again imagines playing the word game with choice sequence using her winning strategy of Figure 1. Now, however, during the first turn Alice lets Bob pick a suffix of length of the -images of the letters $0$ and $1$ (which Bob is allowed to play on the first turn of the shorter game). Continuing as above, the played word will be a suffix of a word played in the word game with choice sequence and a suffix of a -image of a word played in the word game with choice sequence . Therefore also . Similarly the play on the -images does not have to complete the final image; the play can be restricted to a proper prefix of the -images. In this particular case of the Thue-Morse substitution, it is easy to be convinced that all long enough winning strategies are obtainable by substitution by working out some example desubstitutions on strategy trees.
In the next section, we will prove that the above methods always produce longer winning strategies from short winning strategies, even in the case of a general uniform substitution. We will show that not all long enough winning strategies are necessarily obtainable from short ones by substitution, but we will show that this holds for marked uniform substitutions. In essence, Alice can derive winning strategies for all long enough choice sequences in from a few core strategies. Moreover, we are able to deduce the first difference function of a marked uniform substitution, which makes it possible to derive a formula for the factor complexity function.
Knowing that winning strategies are obtained by substitution is not enough to give a complete description of the winning shift . There is typically some ambiguity on short prefixes of words in due to the fact that they are related to the winning sets of word games played on suffixes of -images. The winning sets of proper suffixes of -images of a marked substitution can be very complicated—nothing general can be stated about their form. Thus at the end of Section 4, we introduce additional assumptions that simplify these winning sets. We show that the winning sets of proper suffixes of the -images of permutive uniform substitutions are trivial, so that admits a complete description. In this case, it can be shown that also the winning shift , not only the winning strategies, has a substitutive structure.
Let us conclude this section by describing the substitutive structure of in our example case of the Thue-Morse substitution. Let be a substitution defined by and , and let be an irreducible choice sequence in for a letter . The result is that the words and are in and that all irreducible choice sequences of length at least are obtained in this manner. Thus in our particular example it is sufficient to know all irreducible choice sequences of of length at most to completely describe .
4 Main Results
In general, for a uniform substitution , substituting short winning strategies yields longer winning strategies in a manner similar to what was outlined in the previous section. To figure out the longer choice sequence obtained from a substituted short winning strategy, we need to identify the positions of the -images of a subset of where Bob can make choices without compromising the chances of Alice winning; in other words, we need to identify the winning set of . Notice that in general we obtain many possible choice sequences as the winning set of might contain several words. We also want to consider the winning sets of prefixes and suffixes of these -images since we want to include plays where in the beginning Bob plays a proper suffix of a -image of a letter and in the end he plays a proper prefix of a -image of a letter, just like in the examples of the previous section. Throughout this section, we assume that is a uniform and aperiodic substitution of length with synchronization delay .
Before formalizing the ideas in the following lemma, we introduce some notation. Let be Alice’s strategy for a word game . We define its language to consist of all possible plays with this strategy, that is, it is the set containing all words for Bob’s strategies . Here, we let denote the set , that is, contains the words that are playable in rounds when Alice uses the strategy .
Lemma 4.1.
Let be Alice’s winning strategy for a word game with . Then
[TABLE]
for all integers and such that .
Proof.
Let be in the set on the left side of the inclusion in the statement of the lemma. Notice that this set is indeed nonempty as the intersected sets all contain the word or . We can factorize as , where , , and for . We define a strategy for Alice for the word game as follows:
- first Alice plays according to a winning strategy for the game (such a strategy exists as was chosen to be in the winning set of );
- after rounds have been played, Alice plays according to a winning strategy for the game , where is a word in such that has the word of length played so far as a suffix (the winning strategy exists because is in for all );
- finally, after rounds, Alice plays according to a winning strategy for the game , where is a word in such that has the word of length played so far as a suffix (again, the winning strategy exists because is in for all ).
The described procedure clearly defines a strategy for Alice. What is left is to prove that the strategy is a winning strategy for Alice in order to conclude that .
We show that Bob cannot produce a forbidden word during any round. During the first rounds Alice plays according to a winning strategy for the word game , so a forbidden word cannot be produced. Suppose then that rounds have been played without producing a forbidden word. The word played so far is a suffix of length of the word where . Alice plays next according to a winning strategy for the word game , so the word played during the first rounds is a suffix of length of the word for some . Since is a winning strategy, we see that , so also . This means that no forbidden words are produced during the first rounds. Similarly we see that no forbidden word is produced during the final rounds. We conclude that is a winning strategy for Alice. ∎
Example 4.2.
In general, not all choice sequences in are obtainable from shorter ones as in Lemma 4.1. Consider for instance the left-marked substitution
[TABLE]
with synchronization delay . (This is quite tedious to find by hand; we used a computer.) Its fixed point is
[TABLE]
The left strategy of Figure 2 is a winning strategy of Alice for the choice sequence . Let us show that this strategy is not obtainable from a shorter strategy by substitution. If it were, then, by desubstituting the words on the four paths of the strategy tree, we would obtain a winning strategy for Alice. This desubstituted strategy is depicted on the right in Figure 2. The letter stands for one of the letters $0$, $1$, and $2$; as is not right-marked, it is not immediately obvious what should be. Consider the words and corresponding to the two top paths of this desubstituted tree. It is straightforward to see that in the factor is extended to the left only by the letter $0$, but is not extended to the left by $0$. This means that there is no choice for , so no desubstituted strategy is winning for Alice. Observe that this happens essentially due to the fact that the -images of $0$ and have a common suffix of length . Notice also that the right tree of Figure 2 corresponds by its branch structure to the choice sequence , which can be checked not to be in .
Next we turn our attention to substitutions whose winning shifts consist essentially only of choice sequences as in Lemma 4.1. We begin with a definition.
Definition 4.3.
Let in be a (finite) choice sequence such that . If the winning strategies of are obtainable from the winning strategies of shorter choice sequences in by substitution as in Lemma 4.1, then we call substitutive.
Our first step towards desubstituting long enough winning strategies is to consider left-marked substitutions for which we can prove the following lemma.
Lemma 4.4.
Suppose that is left-marked. Let in be an irreducible choice sequence such that . Then all winning plays of the game have decomposition .
Proof.
Let be any winning strategy for Alice for the word game . We will prove that the last branching at the end of the strategy tree of marks a synchronization point of any winning play, that is, we claim that all winning plays by Alice with strategy have decomposition . Let be a word in for some word and letter , and suppose that has decomposition (the decomposition is well-defined as ). Let be the largest integer such that . Consider the suffix of of length , so that is a prefix of for some . Since is irreducible, the word is a winning play for some letter such that . Now the word must also have decomposition as otherwise deleting the last letter from the words and would yield two different decompositions for the word contradicting the assumption . Thus by repeating the preceding arguments, we see that is a prefix of for some . Since is left-marked, the only option is that is empty. Consequently, we have . Since was an arbitrary winning strategy, the claim follows. ∎
Example 4.5.
Continuing 4.2, consider the winning plays of the word game with choice sequence , depicted on the left in Figure 2. All four possible plays , , , and indeed have decomposition .
Lemma 4.4 lets us define the notion of decomposition for long enough irreducible choice sequences.
Definition 4.6.
Suppose that is left-marked, and let be an irreducible choice sequence in such that . We say that has decomposition where is the unique number such that all winning plays of the game have decomposition .
Example 4.7.
Let us show that without assuming that is left-marked, the claim of 4.4 is not always true. Consider the primitive substitution
[TABLE]
which is not left-marked, nor are any of its conjugates since does not have any. (The conjugate of a substitution is the substitution obtained by cyclically shifting the common prefix of the -images. A substitution and its conjugate have the same language.) The substitution has synchronization delay , and its fixed point is
[TABLE]
The strategy tree of Figure 3 shows that . Now not all plays with this winning strategy have the same decomposition because in the -images only two distinct letters may occur at a fixed position. In fact, we conjecture something stronger: for infinitely many .
Before we begin desubstituting long strategies, we prove the following lemma, which gives a description of the form of the choice sequences in . Let be the substitution defined by for .
Lemma 4.8.
Suppose that is left-marked. If is an irreducible choice sequence in such that with ( has decomposition ), then .
Proof.
If , then there is nothing to prove, so we assume that . Consider the positions , , …, of . Among these positions only the position may contain a letter that is greater than . Otherwise in some play Bob could make a choice inside a -image; recall that the decomposition of the plays is fixed before the game even starts, see 4.4. This is impossible as is left-marked. Thus the letters at positions to spell out a word of the form with . Thus by repeating this argument more times, the claim follows. ∎
Next we consider only marked substitutions and show that then desubstitution is possible.
Theorem 4.9.
Suppose that is marked. Let in be an irreducible choice sequence such that . Then is substitutive and if has decomposition , then there exists an irreducible choice sequence in with winning strategy such that
[TABLE]
where is the largest integer such that .
Proof.
Let in be an irreducible choice sequence having decomposition such that . Let be Alice’s winning strategy for the word game with choice sequence . By definition, the strategy tree of branches at positions where contains a letter that is greater than . Let us show how to perform a branch-preserving desubstitution on to obtain a shorter winning strategy .
Consider first a leaf of the strategy tree of . Since by 4.4, the last letter of the play corresponding to this leaf is the first letter of some -image. Since is left-marked, there is a unique letter such that begins with . We replace the leaf corresponding to with a leaf corresponding to .
Next we show how to desubstitute the factors between two branchings in the middle of the strategy tree . Say and are consecutive positions of containing letters that are greater than such that . By 4.8, the factor of starting at position and ending at position is of the form for some . Let be any winning play with the strategy . Since has decomposition , it follows that the factor of starting at position and ending at position is a -image of some shorter word in . This means that after rounds have been played, any time Alice’s strategy branches, Bob has just completed a -image on his previous turn. This means that it is possible to do a branch-preserving desubstitution on the subtrees of length of the strategy tree of : the factor played between two branchings is a -image of a shorter word in and can be directly desubstituted (since is injective). If there are no branchings before the final branching, then we can directly desubstitute the factor of any play starting at position and ending at position (which could be empty).
Now if , then we have desubstituted the whole strategy tree of , and we are done. Suppose that . As is right-marked, the letter at position of uniquely identifies a letter in such that the prefix of of length is a suffix of . We modify by replacing the first choices by a single choice of on a path corresponding to the play . In other words, we let and set to contain the desubstituted subtree obtained above for the suffix of of length . Now is a strategy and it has the same branch structure as save for the initial part of rounds. By construction, all plays by are ancestors of the plays with the winning strategy , so must also be a winning strategy. The strategy is clearly obtained from the strategy by substitution as in 4.1. Therefore is substitutive. The desubstitution process described clearly indicates that has the claimed form. ∎
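The block-by-block desubstitution used in the proof can be sketched in code. This is a minimal illustration, not the construction of the proof itself: it only inverts a left-marked uniform substitution on words that are exact images, and it ignores the strategy tree and the initial partial image. The names `substitute`, `desubstitute`, and `THUE_MORSE` are hypothetical, and the Thue-Morse substitution is used only as a convenient marked example.

```python
def substitute(phi, word):
    """Apply the substitution phi (a dict letter -> image) to a word."""
    return "".join(phi[c] for c in word)


def desubstitute(phi, word):
    """Return the unique u with substitute(phi, u) == word, or None.

    Assumes phi is uniform and left-marked: all images have the same
    length and pairwise distinct first letters, so the first letter of
    each block identifies its preimage.
    """
    lengths = {len(image) for image in phi.values()}
    if len(lengths) != 1:
        raise ValueError("substitution must be uniform")
    (k,) = lengths
    by_first_letter = {image[0]: a for a, image in phi.items()}
    if len(by_first_letter) != len(phi):
        raise ValueError("substitution must be left-marked")
    if len(word) % k != 0:
        return None
    preimage = []
    for i in range(0, len(word), k):
        block = word[i:i + k]
        a = by_first_letter.get(block[0])
        # The block must be exactly the image of the identified letter.
        if a is None or phi[a] != block:
            return None
        preimage.append(a)
    return "".join(preimage)


# Illustrative example (an assumption, not taken from the text above).
THUE_MORSE = {"0": "01", "1": "10"}
image = substitute(THUE_MORSE, "0110")
print(image, desubstitute(THUE_MORSE, image))
```

A word that is not cut along image boundaries, such as `"0011"` here, is rejected with `None`; this mirrors the role of the decomposition in the proof, which fixes where the images begin before desubstituting.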
The essential message of Theorem 4.9 is that knowing all winning strategies for irreducible choice sequences in up to length is enough to derive winning strategies for all irreducible choice sequences: Alice does not need to learn much to beat Bob. Notice also that we can effectively enumerate when is marked, since the sets in the statement of Theorem 4.9 are easily found by exhaustive search.
Notice that substituting a strategy tree by preserves its branch structure. Conversely, desubstituting, as in Theorem 4.9, preserves most of the branch structure. Indeed, supposing that is marked, then the subtree of the winning strategy of a word in , as in the third paragraph of the proof of Theorem 4.9, has the same branch structure as the desubstituted subtree. The initial part of the tree comes from a winning set played on suffixes of -images. As there are finitely many of these, we conclude that there can be only finitely many different branch structures in the winning trees associated to the winning shift . This means that in any choice sequence the number of letters greater than is bounded. In essence, Bob can almost never make a difference: on most turns, he has no options but to play what Alice wants. Compared to real life games, this makes our game somewhat degenerate. We emphasize that a priori it is not clear if Bob gets to play often or not.
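For small lengths, the claim that Bob rarely gets a real choice can be explored by brute force. The following sketch solves the finite word game directly, assuming that the letters of a choice sequence are the subset sizes 1, ..., |A| (so a branching corresponds to a letter greater than 1) and that the target set is the set of factors of a given length. The function names are hypothetical, the Thue-Morse word is used only as an example of a marked uniform substitution, and the factor set is approximated from a finite prefix.

```python
def factors(word, n):
    """Set of factors of length n occurring in a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def thue_morse_prefix(length):
    """Prefix of the Thue-Morse word (binary digit sums modulo 2)."""
    return "".join(str(bin(i).count("1") % 2) for i in range(length))


def winning_sequences(target, alphabet, n):
    """Choice sequences of length n for which Alice can force a word in target.

    Alice wins the sequence (a,) + tail after a given prefix exactly when
    at least a letters c exist such that she wins tail after prefix + c.
    """
    def solve(prefix, rounds_left):
        if rounds_left == 0:
            return {()} if prefix in target else set()
        children = [solve(prefix + c, rounds_left - 1) for c in alphabet]
        result = set()
        for tail in set().union(*children):
            support = sum(tail in child for child in children)
            for a in range(1, support + 1):
                result.add((a,) + tail)
        return result

    return solve("", n)


word = thue_morse_prefix(4096)
for n in range(1, 9):
    target = factors(word, n)
    wins = winning_sequences(target, "01", n)
    most_branchings = max(sum(a > 1 for a in seq) for seq in wins)
    print(n, len(target), len(wins), most_branchings)
```

In each row, the second and third columns should agree, reflecting the known fact that a subshift and its winning shift have the same factor complexity. The last column reports the largest number of letters greater than 1 seen in a winning choice sequence of that length.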
Observe that substituting two short winning strategies for two distinct choice sequences of the same length could yield the same longer choice sequence. For instance, if and are in , then cutting a branch of length from the winning strategy for the choice sequence yields a winning strategy for the choice sequence . It follows that , so all choice sequences obtained by substituting the winning strategy are already obtained by substituting the winning strategy . This is further elaborated in the proof of Theorem 4.12. Moreover, it is possible that substituting two distinct winning strategies for a fixed choice sequence produces distinct long choice sequences.
Notice that the prefix of of length , as in the statement of Theorem 4.9, can be very complicated: we only assume that is aperiodic and marked and that it has synchronization delay, so the interior parts of the -images can be chosen almost arbitrarily. To simplify the situation, assume that is permutive. It is now clear that the suffix games related to the -images are trivial: , where is a subset of of elements. To put it in other words: . Thus by Theorem 4.9, we see that the winning shift has the following substitutive structure.
Proposition 4.10.
Suppose that is permutive. If in is an irreducible choice sequence such that with letters and , then is in for , and all choice sequences of length at least are obtained in this way.
Since is injective, the relation of the preceding proposition is a bijection from irreducible choice sequences of length to irreducible choice sequences of length . Such a bijection exists also in the case where is only marked as we shall see next in Theorem 4.12. For its proof, we need the following lemma.
Lemma 4.11.
Let for a set , a letter , and a word , and suppose that is maximal (for ). Then there exists a unique subset of of size such that for all Alice’s winning strategies for a choice sequence with .
Proof.
Let and be two different winning strategies for the choice sequence . If , then there would be a letter in, say, . By removing the subtree of length associated to this letter from the strategy tree of and attaching it to the strategy tree of , we obtain a new strategy. This new strategy clearly is a winning strategy for Alice for the choice sequence contradicting the maximality of the letter . Thus the set is the same for all Alice’s winning strategies for the choice sequence , and we may denote it by .
Consider then a choice sequence with , and let be Alice’s arbitrary winning strategy for it. It must be that as otherwise there would be a letter in , and we could attach the subtree associated to it to the strategy tree of Alice’s winning strategy for the choice sequence , like above, to obtain a contradiction with the maximality of the letter . ∎
The next theorem states the same result as [7, Corollary 3]. For the statement, we define to be the least integer such that .
Theorem 4.12.
Assume that is marked. Suppose that , and write with , , and . Then
Proof.
Consider irreducible choice sequences in of length ending with a word of length . Let be the largest letter such that . When a winning strategy for the choice sequence is substituted, as in 4.1, we obtain a winning strategy for an irreducible choice sequence of length , where with . Moreover, the final letters of such a choice sequence are independent of the prefix by Theorem 4.9. Further, as , Theorem 4.9 implies that all irreducible choice sequences of length are obtained by substitution. Now there are a total of irreducible choice sequences of length with suffix , so if we show that a total of distinct irreducible choice sequences of length are obtainable from them by substitution, then we have shown that there are equally many irreducible choice sequences of length and .
Let be as in 4.11. Consider a choice sequence , , with winning strategy . The choice sequences of length obtained from by substitution are determined by the words in . Lemmas 4.11 and 2.2 imply that , so what is relevant is the size of . 4.11 and 2.3 show that the size of is . Therefore a total of irreducible choice sequences of length are obtainable from choice sequences with suffix . As mentioned in the previous paragraph, we have proved that . The claim follows by a straightforward computation. ∎
Example 4.13.
Theorem 4.12 is not true if is only left-marked. Consider for instance the substitution of 4.2. Now , so Theorem 4.12 would predict that . However, by a direct computation, it can be seen that in this case but .
Theorem 4.12 can be used to derive the factor complexity function of a marked uniform substitution because . As the precise details in finding the exact formula do not involve word games, we omit the details and refer the reader to [7, Theorem 2].
Notice also that Theorem 4.12 proves that the first difference function is a -automatic sequence, so the factor complexity function is a -regular sequence; see [2]. This holds for an arbitrary uniform substitution.
5 Winning Shifts of Generalized Thue-Morse Words
In this section, we describe the winning shifts of generalized Thue-Morse words and, using our results, derive the known formulas for their factor complexity functions. For more on generalized Thue-Morse words, see e.g. [1]. Our notation largely follows [14].
Let denote the sum of digits in the base- representation of the integer . For and , the generalized Thue-Morse word is defined as the infinite word whose letter at position equals . It is straightforward to prove that is the fixed point, beginning with the letter [math], of the primitive substitution defined by
[TABLE]
for , where the letters are interpreted modulo . The word is ultimately periodic if and only if [1]. We make the assumption that is aperiodic.
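The two descriptions of the generalized Thue-Morse word can be compared numerically. The sketch below assumes the standard form of the substitution, in which the image of a letter k is the word k, k+1, ..., k+b-1 with letters taken modulo m; the symbols `b` (base) and `m` (modulus) and all function names are assumptions of this illustration.

```python
def digit_sum(n, b):
    """Sum of the digits of n in base b."""
    total = 0
    while n:
        n, r = divmod(n, b)
        total += r
    return total


def gtm_by_digits(b, m, length):
    """Prefix of the generalized Thue-Morse word via digit sums modulo m."""
    return [digit_sum(n, b) % m for n in range(length)]


def gtm_substitution(b, m):
    """Assumed form of the substitution: k -> k, k+1, ..., k+b-1 (mod m)."""
    return {k: [(k + i) % m for i in range(b)] for k in range(m)}


def gtm_by_substitution(b, m, length):
    """Prefix of the fixed point beginning with 0, by iterating the substitution."""
    phi = gtm_substitution(b, m)
    word = [0]
    while len(word) < length:
        word = [x for k in word for x in phi[k]]
    return word[:length]


print(gtm_by_digits(2, 2, 16))
print(gtm_by_substitution(2, 2, 16))
print(gtm_by_digits(2, 3, 16))
```

The agreement rests on the identity that appending a digit i to the base-b expansion of n increases the digit sum by i, which is exactly what one application of the substitution does to each letter.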
To clarify the notation, from now on we assume that letters are elements of the group , so that we can naturally add letters. Moreover, we keep and fixed and simply write for .
Let denote the permutation defined by setting . In other words, the permutation maps to the final letter of the word . We set to be the order of , that is, the least positive integer such that .
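As a small sanity check, the permutation and its order can be computed directly. The sketch assumes that the image of a letter k ends with k+b-1 modulo m, so that the permutation is a shift by b-1; the names `final_letter_permutation` and `permutation_order` are hypothetical. The comparison with m / gcd(b-1, m) is an elementary observation about shifts on a cyclic group and is not quoted from the text.

```python
from math import gcd


def final_letter_permutation(b, m):
    """Permutation sending k to the assumed final letter k + b - 1 (mod m)."""
    return [(k + b - 1) % m for k in range(m)]


def permutation_order(perm):
    """Least positive q such that applying perm q times gives the identity."""
    identity = list(range(len(perm)))
    current = list(perm)
    q = 1
    while current != identity:
        current = [perm[x] for x in current]
        q += 1
    return q


for b, m in [(2, 2), (2, 3), (3, 4), (4, 6)]:
    perm = final_letter_permutation(b, m)
    print(b, m, perm, permutation_order(perm), m // gcd(b - 1, m))
```

Since the final letters of the images are pairwise distinct, the same computation also confirms that the substitution is right-marked under the stated assumption.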
To describe the winning shift of , it is crucial to know words of of length and . Our proof is almost verbatim from [14].
Lemma 5.1.
We have
- and
- .
Proof.
Set . Clearly . Let be the set of factors of length of the words in . By the definition of , we have . Since , we have .
By the form of , either the first two letters of a factor of length are equal or its last two letters are. The claim thus follows from the form of the factors of length . ∎
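The short factors can also be enumerated experimentally. The sketch below lists the factors of length 2 and 3 occurring in a finite prefix, assuming the digit-sum definition of the word; a finite prefix can in principle miss factors, so the output is evidence rather than a proof. It also checks one structural consequence of the assumed form of the substitution: in every factor of length 3, two adjacent letters lie inside a single image, so either the second letter is the first plus one or the third is the second plus one (modulo m). All names are hypothetical.

```python
def gtm_prefix(b, m, length):
    """Prefix of the generalized Thue-Morse word via base-b digit sums mod m."""
    word = []
    for n in range(length):
        total = 0
        while n:
            n, r = divmod(n, b)
            total += r
        word.append(total % m)
    return word


def factors(word, n):
    """Set of factors of length n (as tuples) occurring in a finite word."""
    return {tuple(word[i:i + n]) for i in range(len(word) - n + 1)}


def consecutive_inside(factor, m):
    """True if the first two or the last two letters of a length-3 factor are consecutive mod m."""
    x, y, z = factor
    return (y - x) % m == 1 or (z - y) % m == 1


b, m = 2, 3
prefix = gtm_prefix(b, m, 5000)
length2 = sorted(factors(prefix, 2))
length3 = sorted(factors(prefix, 3))
print(len(length2), length2)
print(len(length3), length3)
print(all(consecutive_inside(f, m) for f in length3))
```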
The following lemma concerning the synchronization delay of is proven in [4]; we repeat the proof here.
Lemma 5.2.
The substitution has synchronization delay .
Proof.
Consider a word of of length . If contains a factor with , then the factor cannot occur inside a -image, so the position where occurs marks a synchronization point. If such a factor does not occur in , then the word is of the form , that is, . Suppose for a contradiction that has ancestor . Due to the form of , we have and , that is, . This is impossible as and due to our assumption that . Thus the only ancestor of is . We have thus shown that .
Fix . Since , we see that by 5.1. Consider the prefix of of length . This prefix has as a suffix, and its prefix of length is a suffix of . Because , the word has two ancestors proving that . ∎
Since is permutive, it now follows that every choice sequence in having length at least is obtainable by substitution from a shorter choice sequence. Next we describe the choice sequences of length at most .
Proposition 5.3.
Let in be an irreducible choice sequence of length .
(i) If , then with and .
(ii) If , then or with and .
Moreover, each word of such form is in .
Proof.
Consider first the case . Write with letters and , and let be a winning play in the game with choice sequence . First we argue that the prefix of of length is of the form for some , that is, it equals . If this were not the case, then this prefix equals for some words and and letters and such that . Thus has decomposition . Since is a winning play, Bob cannot choose inside a -image, and it must thus be that is a positive multiple of . This is impossible as now . Due to the restricted form of the prefix of of length , we see that Bob cannot make any choices between his first and last turns, so . Suppose for a contradiction that . Now Bob can pick a letter such that . It follows that is an ancestor of the played word . This is however a contradiction with 5.1. Therefore . It is now clear that any word of the form with and is in : after Bob has chosen , Alice forces him to play after which she lets him choose among the letters such that is in .
Suppose then that . If contains exactly two letters that are greater than , one at the beginning and one at the end, then must again be of the form with and (after Bob has chosen , Alice forces him to play after which she lets him choose among the letters such that ; see 5.1). Otherwise write with letters , , and such that , and let again be a winning play in the game with choice sequence . Analogous to the arguments of the preceding paragraph, we see that unless the prefix of of length is of the form for some . Again, we have and, further, . Assume for a contradiction that . After rounds Bob can choose a letter such that . Clearly the word played so far has decomposition , so during her next turns Alice must let Bob complete the -image beginning with . During his final turn Bob can pick a letter such that . It follows that the played word has the word as an ancestor. By 5.1, this ancestor is not in , so Bob wins. This is a contradiction, so . The preceding arguments also show that must have or as a prefix. Let us consider the former case. Since Bob wins if he can choose inside a -image, Alice must now force Bob to play to ensure that the word played so far has multiple ancestors. If , then as his ultimate move Bob can pick a letter such that . Then has unique ancestor . Our assumption that implies by 5.1 that , which is impossible by the choice of . Thus , that is, . It is now straightforward to derive a winning strategy for Alice for any . The subtree of length of such a strategy is depicted in Figure 4; it is readily verified that the corresponding strategy is winning for Alice using 5.1. The claim follows. ∎
Since is permutive, all long enough choice sequences in are of the form , where for letters and . Combining this with 5.3, we see that the winning shift indeed has the same form as described in Section 3. Either is of the form with and or , where , is the largest such that and .
Proposition 5.3 together with Theorem 4.12 implies that for the first difference function for takes only two values: and . Using induction, we can derive the values of and (the factor complexity function of ) for any ; see Table 2. These functions have been derived by Š. Starosta with other methods [14].
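The two-valued behaviour of the first difference function can be observed numerically on small cases. The sketch below estimates the factor complexity from a finite prefix, which is an approximation that is reliable only for lengths much smaller than the prefix; it assumes the digit-sum definition of the word, and the function names are hypothetical. For the classical Thue-Morse word (b = 2, m = 2) the first values of the complexity are the well-known 2, 4, 6, 10, 12, 16, 20, 22.

```python
def gtm_prefix(b, m, length):
    """Prefix of the generalized Thue-Morse word via base-b digit sums mod m."""
    word = []
    for n in range(length):
        total = 0
        while n:
            n, r = divmod(n, b)
            total += r
        word.append(total % m)
    return word


def complexity(word, n):
    """Number of distinct factors of length n occurring in a finite word."""
    return len({tuple(word[i:i + n]) for i in range(len(word) - n + 1)})


def first_differences(word, max_n):
    """List of p(n + 1) - p(n) for n = 1, ..., max_n, estimated from the prefix."""
    values = [complexity(word, n) for n in range(1, max_n + 2)]
    return [values[i + 1] - values[i] for i in range(max_n)]


for b, m in [(2, 2), (2, 3), (3, 2)]:
    prefix = gtm_prefix(b, m, 20000)
    values = [complexity(prefix, n) for n in range(1, 9)]
    print(b, m, values, first_differences(prefix, 8))
```

The printed differences can be compared against Table 2; the threshold from which only two values occur depends on the parameters and is not asserted by this sketch.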
Acknowledgments
The work of the first author was supported by the Finnish Cultural Foundation by a personal grant.
- [1] Jean-Paul Allouche and Jeffrey Shallit “Sums of digits, overlaps, and palindromes” In Discrete Mathematics and Theoretical Computer Science 4.1, 2000, pp. 1–10
- [2] Jean-Paul Allouche and Jeffrey Shallit “Automatic Sequences” Cambridge University Press, 2003
- [3] R. P. Anstee, Lajos Rónyai and Attila Sali “Shattering News” In Graphs and Combinatorics 18.1, 2002, pp. 59–73 DOI: 10.1007/s003730200003
- [4] L’ubomíra Balková “Factor frequencies in generalized Thue-Morse words” In Kybernetika 48.3, 2012, pp. 371–385
- [5] Srečko Brlek “Enumeration of factors in the Thue-Morse word” In Discrete Applied Mathematics 24, 1989, pp. 83–96 DOI: 10.1016/0166-218X(92)90274-E
- [6] Aldo de Luca and Stefano Varricchio “Some combinatorial properties of the Thue–Morse sequence and a problem in semigroups” In Theoretical Computer Science 63.3, 1989, pp. 333–348 DOI: 10.1016/0304-3975(89)90013-3
- [7] Anna Frid “On uniform D0L words” In 15th Symposium on Theoretical Aspects of Computer Science. STACS’98, Lecture Notes in Computer Science 1373 Springer, 1998, pp. 544–554 URL: http://iml.univ-mrs.fr/~frid/Papers/Frid_3.ps
- [8] David Gale and Frank M. Stewart “Infinite games with perfect information” In Contributions to the Theory of Games 2, Annals of Mathematics Studies, no 28 Princeton: Princeton University Press, 1953, pp. 245–266
