An Efficient Shift Rule for the Prefer-Max De Bruijn Sequence
Gal Amram, Yair Ashlagi, Amir Rubin, Yotam Svoray, Moshe Schwartz, Gera Weiss

TL;DR
This paper introduces a new, efficient shift rule for generating prefer-max De Bruijn sequences applicable to all sequence orders and alphabets, with an algorithm that is linear in sequence order.
Contribution
It formulates a universal shift rule for prefer-max De Bruijn sequences and provides a linear-time, memory-efficient algorithm for its implementation.
Findings
The shift rule is applicable to all sequence orders and alphabets.
The algorithm operates in linear time and memory relative to sequence order.
The rule closes a gap in the literature: an efficient shift rule for this sequence was previously known only in the binary case.
Abstract
A shift rule for the prefer-max De Bruijn sequence is formulated, for all sequence orders, and over any finite alphabet. An efficient algorithm for this shift rule is presented, which has linear (in the sequence order) time and memory complexity.
Taxonomy
Topics: Algorithms and Data Compression · Coding theory and cryptography · Cellular Automata and Applications
An Efficient Shift Rule for the Prefer-Max De Bruijn Sequence
(This work was supported in part by the Israeli Science Foundation (ISF) under grant no. 130/14, and grant no. 856/16.)
Gal Amram
Yair Ashlagi
Amir Rubin
Yotam Svoray
Moshe Schwartz
Gera Weiss
Department of Computer Science, Ben-Gurion University of The Negev
Department of Electrical and Computer Engineering, Ben-Gurion University of The Negev
keywords: De Bruijn sequence, Ford sequence, prefer-max sequence, shift rule
MSC: [2010] 94A55, 05C45, 05C38
journal: Discrete Mathematics
1 Introduction
A $k$-ary De Bruijn sequence of order $n$ (denoted $(n,k)$-DB) is a word $w = w_0 w_1 \dots w_{k^n-1}$ over the alphabet $\{0, 1, \dots, k-1\}$, i.e., $w_i \in \{0, 1, \dots, k-1\}$ for all $i$, such that all $n$-subwords $w_i w_{i+1} \dots w_{i+n-1}$ are distinct (indices are taken modulo $k^n$). Of the many $(n,k)$-DB sequences that exist, a particular sequence stands out, featuring in many past works. Consider first the binary case, $k=2$: start the sequence with $0^n$, and add bit by bit, always preferring to append a $1$, unless it creates an $n$-word that has already been seen. After obtaining a sequence of length $2^n$, move the prefix $0^n$ to the end of the sequence. The result is an $(n,2)$-DB sequence dubbed the prefer-one sequence or Ford sequence [10, 6]. By complementing all the bits, we obtain the prefer-min $(n,2)$-DB sequence. In the non-binary case, we can replace the prefer-one rule by a prefer-max rule (assuming a lexicographical order of the alphabet), resulting in the lexicographically largest $(n,k)$-DB sequence, and symmetrically (by complementation), a prefer-min $(n,k)$-DB sequence, which is the lexicographically smallest $(n,k)$-DB sequence.
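The greedy rule just described can be sketched in a few lines of Python. This is an illustration only, not code from the paper: the function name `prefer_max_greedy` is hypothetical, letters are assumed to be the integers 0, ..., k-1, and the start word is assumed to be n copies of the smallest letter.

```python
def prefer_max_greedy(n, k):
    """Greedy prefer-max construction (assumed setup: letters 0..k-1, start word 0^n).

    Keeps every n-word seen so far, so memory grows with k**n.
    """
    seq = [0] * n                      # start with n copies of the smallest letter
    seen = {tuple(seq)}                # all n-words produced so far
    while len(seq) < k ** n:
        tail = tuple(seq[len(seq) - n + 1:])   # last n-1 letters
        for letter in range(k - 1, -1, -1):    # try the largest letter first
            cand = tail + (letter,)
            if cand not in seen:
                seen.add(cand)
                seq.append(letter)
                break
        else:
            raise RuntimeError("greedy construction got stuck")
    return seq[n:] + seq[:n]           # move the prefix 0^n to the end


print(prefer_max_greedy(3, 2))  # [1, 1, 1, 0, 1, 0, 0, 0]
```

The set `seen` is what makes this approach expensive: it stores every n-word generated so far, which is exactly the cost a shift rule avoids.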
The greedy bit-by-bit algorithm of [10] is certainly an inefficient way of generating the prefer-max sequence, running in time (integer operations), and requiring memory. Several suggestions have been made since to improve the efficiency of the sequence construction. Fredricksen and Kessler [8], and Fredricksen and Maiorana [9] showed that the prefer-max sequence is in fact a concatenation of certain (Lyndon) words. While seemingly an inefficient way to generate the prefer-max sequence, a later careful analysis [11] has shown that this decomposition allows us to generate the sequence of length in time.
However, another equally important way of generating sequences is of interest, namely, by using a shift rule. It is well known that any -DB sequence can be generated by a feedback shift register (FSR) of order , i.e., there exists a shift-rule function such that for all . Several efficiently computable shift rules for De Bruijn sequences are known, requiring time and memory to generate the next letter in the sequence, given the preceding letters (see [5], as well as [12] for a comprehensive list). We also mention the recent [3], which describes an efficient shift rule for the $k$-ary “grandmama” sequence (which is defined by a co-lexicographic order, compared with the lexicographic order of the prefer-max sequence). However, with a single exception, these rules generate only non-prefer-max sequences, and the only exception, [7], addresses only the generation of binary prefer-one sequences.
The main contribution of this paper is an efficient shift-rule function for the prefer-max De Bruijn sequence, (the case of is trivial). The shift rule, given in Algorithm 1, is based on the decomposition of the prefer-max sequence found by [9], and operates in time and memory linear in the sequence order. This closes a gap in the literature: while efficient constructions for the entire prefer-max sequence are known, an efficient shift rule was previously known only in the binary case.
The paper is organized as follows. In Section 2 we provide the necessary notation used throughout the paper, and recall some relevant results. In Section 3 we provide a mathematical expression for the shift rule. We proceed in Section 4 to devise an efficient algorithm that implements the shift rule. We conclude in Section 5 with a short discussion of the results.
2 Preliminaries
Let us start by reviewing the necessary definitions and previous results, before presenting the new results. To avoid trivialities, we assume throughout the paper that . With our alphabet letters we associate a lexicographical order, . This order is extended in the natural way to all finite words from by defining if either is a prefix of , or there exist (possibly empty) and two letters , , such that and .
Given a word , with , we define the rotation operator, as . We say that two words are cyclically equivalent if there exists such that . The equivalence classes under are called necklaces. The number of necklaces, denoted by , is known to be $\frac{1}{n}\sum_{d \mid n} \phi(d)\, k^{n/d}$, where $\phi$ is Euler’s totient function, $n$ is the word length, and $k$ is the alphabet size (this is also the number of cycles in the pure cycling FSR, see [13]). Let be a word. The cyclic order of , denoted , is the smallest positive integer such that or, alternatively, it is the number of elements in the necklace containing . If we say that is primitive. For any there is a unique primitive word such that .
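As a small sanity check of these notions, the following Python sketch computes rotations, the cyclic order, and the number of necklaces, both by the closed-form totient sum and by brute force. All function names are hypothetical, words are represented as tuples of integers, and rotating to the left is an assumption (the direction does not affect the cyclic order or the necklace count).

```python
from itertools import product
from math import gcd


def rotate(w, i=1):
    """Rotate the tuple w left by i positions (the direction is an assumption)."""
    i %= len(w)
    return w[i:] + w[:i]


def cyclic_order(w):
    """Smallest positive i with rotate(w, i) == w, i.e. the size of w's necklace."""
    return next(i for i in range(1, len(w) + 1) if rotate(w, i) == w)


def necklace_count(n, k):
    """Closed form: (1/n) * sum over divisors d of n of phi(d) * k**(n/d)."""
    def phi(d):
        return sum(1 for j in range(1, d + 1) if gcd(j, d) == 1)
    return sum(phi(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0) // n


def necklace_count_brute(n, k):
    """Count necklaces by collecting the least rotation of every length-n word."""
    return len({min(rotate(w, i) for i in range(n))
                for w in product(range(k), repeat=n)})


print(cyclic_order((0, 1, 0, 1)))                          # 2
print(necklace_count(4, 2), necklace_count_brute(4, 2))    # 6 6
```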
A primitive word that is lexicographically least in its necklace is called a Lyndon word. If is a Lyndon word, we shall also find it useful to define as an expanded Lyndon word, for all . (Footnote: In some places, by abuse of notation, a lexicographically least representative of a necklace, which coincides with our definition of an expanded Lyndon word, is also called a necklace. We shall not do the same, since we shall later require a different representative of a necklace, which might cause confusion.) Additionally, we can arrange all the expanded Lyndon words of length in increasing lexicographical order
[TABLE]
where . The main result of [8, 9] (rephrased to simplify the presentation) is that the prefer-min -DB sequence is in fact the concatenation . We shall use this fact later on, and call it the FKM factorization. We also comment that by complementing all the letters via , , for all , the prefer-min -DB sequence becomes the prefer-max -DB sequence, and vice versa. We extend to operate on words in the natural way, i.e., applying it to all letters of the word.
Example 1.
Fix and . We then have the following lexicographical order of expanded Lyndon words,
[TABLE]
hence
[TABLE]
and indeed the prefer-min -DB sequence is . After complementing each letter we obtain , which is the prefer-max -DB sequence.
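The FKM factorization can be illustrated with a short Python sketch. It lists Lyndon words with the standard iterative generation procedure, keeps those whose length divides the order, and concatenates them; complementing each letter then gives the prefer-max sequence. The function names are hypothetical, the letters are assumed to be 0, ..., k-1, and the complement is assumed to map a letter x to k-1-x.

```python
def lyndon_words(n, k):
    """Yield all Lyndon words over {0..k-1} of length <= n in lexicographic order."""
    w = [-1]
    while w:
        w[-1] += 1
        yield tuple(w)
        m = len(w)
        while len(w) < n:          # extend periodically up to length n
            w.append(w[-m])
        while w and w[-1] == k - 1:  # drop trailing maximal letters
            w.pop()


def fkm_prefer_min(n, k):
    """Concatenate, in lexicographic order, the Lyndon words whose length divides n."""
    seq = []
    for word in lyndon_words(n, k):
        if n % len(word) == 0:
            seq.extend(word)
    return seq


def prefer_max_via_fkm(n, k):
    """Complement every letter of the prefer-min sequence (assumed map: x -> k-1-x)."""
    return [k - 1 - x for x in fkm_prefer_min(n, k)]


print(fkm_prefer_min(3, 2))       # [0, 0, 0, 1, 0, 1, 1, 1]
print(prefer_max_via_fkm(2, 3))   # [2, 2, 1, 2, 0, 1, 1, 0, 0]
```

Generating the whole sequence this way is efficient overall, but it does not by itself answer the question a shift rule answers: which letter follows a given window of letters.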
3 Shift-Rule Construction
In this section we state and prove our shift-rule construction. For ease of presentation, we work with the prefer-min sequence, while remembering that by simply complementing letters with , this is equivalent to working with the prefer-max sequence.
We first require a definition that distinguishes another necklace member that is not necessarily the expanded Lyndon word defined above.
Definition 2.
A word is a necklace head, tested by the predicate , if we can write as , where , (i.e., ), and is an expanded Lyndon word.
We briefly note that the necklace containing the single word does not formally have a necklace head, whereas all other necklaces have a unique necklace head. Additionally, by the above definition, if is a necklace head, either is empty, or it does not start with the letter .
We now define our shift rule. Traditionally, a shift rule is a function that takes consecutive letters in the sequence (i.e., the current state of an FSR generating the sequence) and its output is the next letter. However, we will find it more convenient to define a shift rule as providing the next state of the FSR. Specifically, let be the prefer-min -DB sequence. A shift rule for the sequence is a function such that , for all .
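To make the state-to-state view of a shift rule concrete, the sketch below tabulates the rule for a given cyclic De Bruijn sequence by brute force. This is not the rule of Definition 3 and is not efficient (the table has as many entries as the sequence has letters); it only shows what such a function maps to what. The name `shift_rule_table` and the hard-coded binary prefer-min sequence of order 3 are assumptions made for illustration.

```python
def shift_rule_table(seq, n):
    """Map each length-n window of the cyclic sequence seq to the next window."""
    size = len(seq)

    def state(i):
        return tuple(seq[(i + j) % size] for j in range(n))

    return {state(i): state(i + 1) for i in range(size)}


# Binary prefer-min sequence of order 3 (lexicographically smallest), hard-coded.
seq = [0, 0, 0, 1, 0, 1, 1, 1]
table = shift_rule_table(seq, 3)

# Iterating the rule from the first window regenerates the sequence.
state, out = (0, 0, 0), []
for _ in range(len(seq)):
    out.append(state[0])
    state = table[state]

print(out)               # [0, 0, 0, 1, 0, 1, 1, 1]
print(table[(0, 1, 1)])  # (1, 1, 1)
```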
Definition 3.
Let be defined by
[TABLE]
where and .
The main result of this section is the following theorem.
Theorem 4.
The function of Definition 3 is a shift rule for the prefer-min -DB sequence.
Before proceeding, we provide an example.
Example 5.
Continuing our running example from Example 1, consider again the prefer-min -DB sequence . Take as an example the subword , i.e., and . In this case is computed using the second case of Definition 3, and since is true but is false. Thus, , which is consistent with the sequence.
In order to prove Theorem 4 we state a sequence of lemmas, building up to the main result.
Lemma 6.
A word is an expanded Lyndon word if and only if for all (i.e., it is lexicographically least in its necklace).
Proof.
Consider the (unique) decomposition , where is primitive. Note that . Thus, if and only if , which holds for all if and only if is a Lyndon word. ∎
A first step we take is showing that increasing the rightmost letter that is not in an expanded Lyndon word maintains the expanded Lyndon property.
Lemma 7.
If is an expanded Lyndon word and then is also an expanded Lyndon word.
Proof.
If starts with then it is equal to and the claim follows. Otherwise, write and we shall prove that . If the claim trivially holds. Otherwise, for some word , and . By assumption, . Since , as well. ∎
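Both lemmas can be checked exhaustively on small cases. The Python sketch below tests the least-rotation characterization of Lemma 6 directly, and checks our reading of Lemma 7, namely that increasing by one the rightmost non-maximal letter of an expanded Lyndon word again yields an expanded Lyndon word. The helper names are hypothetical, and the letters are assumed to be 0, ..., k-1 with k-1 the maximal letter.

```python
from itertools import product


def is_least_rotation(w):
    """True if the tuple w is lexicographically least among all its rotations."""
    return all(w <= w[i:] + w[:i] for i in range(1, len(w)))


def bump_rightmost(w, k):
    """Increase by one the rightmost letter of w that is smaller than k - 1."""
    j = max(i for i, a in enumerate(w) if a < k - 1)
    return w[:j] + (w[j] + 1,) + w[j + 1:]


def check_small_case(n, k):
    """Exhaustively test the bump property on all length-n words over {0..k-1}."""
    for w in product(range(k), repeat=n):
        if is_least_rotation(w) and any(a < k - 1 for a in w):
            if not is_least_rotation(bump_rightmost(w, k)):
                return False
    return True


print(is_least_rotation((0, 1, 0, 1)))   # True
print(bump_rightmost((0, 1, 0, 2), 3))   # (0, 1, 1, 2)
print(check_small_case(4, 3))            # True
```

An exhaustive check of this kind is no substitute for the proofs, but it is a quick way to catch a misreading of a statement.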
We now turn, in the following lemmas, to consider connections between successive expanded Lyndon words, and .
Lemma 8.
Let be the th expanded Lyndon word in increasing lexicographical order where . Then, the th expanded Lyndon word, , is where is the lexicographically smallest word for which is an expanded Lyndon word.
Proof.
By Lemma 7, is an expanded Lyndon word, i.e., for some . It then follows that must be of the form as claimed. ∎
The following lemma combines the shift rule function, , and the lexicographical order of expanded Lyndon words. We use the notation , , to denote the composition of with itself times.
Lemma 9.
, for all .
Proof.
Since , is not the lexicographically last expanded Lyndon word and not the one before it, i.e.,
[TABLE]
We can therefore write , , so . Using these notations
[TABLE]
Our proof proceeds by establishing the following three facts:
2.
3. , such that is the lexicographically smallest word for which is an expanded Lyndon word.
Combining the three facts together, we get that , and by Lemma 8, we obtain the desired claim.
We first prove step (1). We contend that this step’s claim holds since in the first applications of only the third case of the definition of (cf. Definition 3) takes place. To prove this, we need to show that for any decomposition , , , we have
[TABLE]
Hence, we need to show that is not a necklace head, and that if then there is no such that is a necklace head.
For the first condition, assume to the contrary that the predicate is true. By definition, there should exist an integer such that and
[TABLE]
is an expanded Lyndon word. However, we note that
[TABLE]
Since , this contradicts the cyclic order of .
As for the second condition, where , assume to the contrary that for some , the word is a necklace head. Again, there should exist an integer , such that , and
[TABLE]
is an expanded Lyndon word. Thus, on the right-hand side of (2), the rightmost letter that is not , is . By repeated applications of Lemma 7, we get that we can replace by and still have an expanded Lyndon word, i.e.,
[TABLE]
is an expanded Lyndon word. As in the previous case, this contradicts the cyclic order of .
The proof of step (2) is simpler. We need to show that we fall under the first case in the definition of (cf. Definition 3), i.e., that is a necklace head. That is indeed true since
[TABLE]
is an expanded Lyndon word.
Finally, we address step (3), where we need to prove that , such that is the lexicographically smallest word for which is an expanded Lyndon word. Note that by (1), , so by Lemma 8 such an exists. Additionally, for any we have that , thus we never fall within the first case of .
Next, we show that for any , , , , , we have that
[TABLE]
We distinguish between two cases depending on . For the first case, let . We contend that we do not fall within the second case of . Assume to the contrary that there is some such that is a necklace head. Thus, is an expanded Lyndon word. Looking at its suffix of length , we get
[TABLE]
which is a contradiction to the minimality of .
Now, for the case where . By the definition of we know that is an expanded Lyndon word. Using Lemma 7, we get that is also an expanded Lyndon word. Therefore, is a necklace head. Left to be shown is that
[TABLE]
Assuming to the contrary that , then is a necklace head, implying that is an expanded Lyndon word. As in the previous case, since
[TABLE]
we get a contradiction to the minimality of . ∎
Lemma 9 does not apply to the penultimate expanded Lyndon word, for which, by simple inspection of the definition of we state
[TABLE]
We are now in a position to prove the main result.
Proof of Theorem 4.
As a first technical step it is easy to verify that is a shift rule generating some sequence, since indeed for every , , , we have for some .
In the next step, let us examine an unknown sequence , that is initialized with , and whose following letters are generated using . We define the numbers , for all . We prove by induction that for all , . The proof is immediate since the induction base is our initialization of , and the induction step is provided by Lemma 9, since
[TABLE]
By this induction, we already have the prefix of the generated sequence to be , but we are missing the last Lyndon word, . This is easily taken care of, since by (3),
[TABLE]
namely, the last letter is the last Lyndon word,
[TABLE]
We also observe that the shift rule wraps around the end of the sequence. Indeed, by a simple inspection of Definition 3, for every ,
[TABLE]
As the final step in the proof, by FKM [8, 9] this sequence is exactly the prefer-min -DB sequence. ∎
We conclude this section by reminding the reader that in order to generate the prefer-max -DB sequence (instead of the prefer-min one), all that is required is to start the FSR with , and to use the shift rule , where is the complement function defined in Section 2, and denotes function composition.
4 Efficient Shift-Rule Algorithm
Algorithms for implementing shift rules for the prefer-min (or prefer-max) -DB sequences are known [10, 6]. These greedy algorithms require memory, and time in the worst case (since they in fact need to generate the sequence until the position of the desired next letter). The main result of this section is an efficient algorithm, requiring time and memory linear in the sequence order, that implements the shift rule we presented in the previous section. By quick inspection, the claim hinges on an efficient implementation of the predicate, as well as finding in the second case of .
Our algorithm uses two key components. The first is the renowned factorization due to Chen, Fox, and Lyndon [2], namely, that every word has a unique decomposition , such that is a Lyndon word for all , and . We shall call this the CFL factorization of . The second key component is due to Duval [4], who showed that this unique decomposition may be computed for all in linear time and memory.
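For reference, a common textbook formulation of Duval's factorization algorithm is sketched below in Python. It is not taken from the paper; the function name `cfl_factorization` is hypothetical, and the input may be any sequence with comparable letters (a string, list, or tuple).

```python
def cfl_factorization(w):
    """Return the Lyndon factors l1 >= l2 >= ... >= lm with w = l1 l2 ... lm.

    Standard formulation of Duval's algorithm: linear time, and constant
    extra memory apart from the output list.
    """
    factors = []
    i, n = 0, len(w)
    while i < n:
        j, p = i + 1, i            # j scans ahead, p trails inside the current period
        while j < n and w[p] <= w[j]:
            p = i if w[p] < w[j] else p + 1
            j += 1
        while i <= p:              # emit copies of the Lyndon word of length j - p
            factors.append(w[i:i + j - p])
            i += j - p
    return factors


print(cfl_factorization("banana"))           # ['b', 'an', 'an', 'a']
print(cfl_factorization((0, 1, 0, 0, 1)))    # [(0, 1), (0, 0, 1)]
```

Each factor is a Lyndon word and the factors are lexicographically non-increasing from left to right, which is the property the uniqueness statement above refers to.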
First, we address the efficiency of computing the predicate .
Lemma 10.
For any it is possible to compute in linear time and memory.
Proof.
Let be the largest integer such that is a prefix of . We apply Duval’s algorithm [4] to to obtain its CFL factorization . Then is true if and only if . ∎
Next we recall some useful results already known in the literature. A word is called a pre-necklace if there exists such that is an expanded Lyndon word. By [1, Lemma 2.3], a pre-necklace must necessarily be a fractional power of a Lyndon word, i.e., , with being a Lyndon word, , and a proper prefix of . Since the part is a prefix of a CFL decomposition for , this decomposition is unique and it is efficiently computable in linear time and memory. Finally, we recall [1, Theorem 2.1], which its authors dubbed the “fundamental theorem of necklaces”.
Theorem 11 (Theorem 2.1 of [1]).
Let , with , be a pre-necklace with fractional-power decomposition . Then, , , is a pre-necklace if and only if . Furthermore, is a Lyndon word if and only if .
We are now in a position to describe the algorithm for and prove its correctness.
Theorem 12.
Algorithm 1 correctly computes the shift rule from Definition 3 in linear time and memory.
Proof.
We argue that Algorithm 1 computes the function . We consider the three cases of Definition 3 separately. First, if and , the algorithm returns in line 2 as required by the first case of Definition 3.
Now, assume the input falls within the third case of Definition 3. If the claim is obvious as the condition in line 1 does not hold. If , then we have . By Lemma 7, if and only if holds. Thus, line 5 correctly checks whether the second case of Definition 3 applies. We therefore reach line 15 exactly when the third case of Definition 3 applies, and correctly return .
We are left with the second case of Definition 3, where and . First, the special case of , is handled correctly in line 4. Otherwise, contains some letter other than , and is well defined.
We now contend that . Since holds, then is an expanded Lyndon word, hence is a pre-necklace. Also, note that if then is a pre-necklace. By Theorem 11, if then is not a pre-necklace. Hence, . However, also by Theorem 11, is a Lyndon word, thus and . This leaves only two possible values for , and consequently, the algorithm terminates in line 10 or in line 12, and returns the desired word.
Finally, as already noted, CFL factorization, , as well as the fractional-power decomposition of line 7, may be computed in linear time and memory (all relying on the CFL factorization algorithm). Thus, the entire algorithm takes linear time and memory. ∎
5 Discussion
In this paper we studied the well-known prefer-min and prefer-max -DB sequences. We closed a gap in the literature by presenting a shift rule for these sequences, as well as an efficient algorithm computing this shift rule. The algorithm receives as input a subsequence of letters, and determines the next letter in time and memory linear in the sequence order.
The shift rule we presented may be seen as an extension of the binary shift rule presented in [7]. Indeed, if we set $k=2$ in our algorithm, the second case of Definition 3 becomes degenerate, and we are left with the algorithm of [7]. This also explains the main difficulty in our solution, which is finding efficiently. The crux of solving this difficulty is the proof that we only need to choose between two carefully chosen values.
References
- [1] K. Cattell, F. Ruskey, J. Sawada, M. Serra, Fast algorithms to generate necklaces, unlabeled necklaces, and irreducible polynomials over GF(2), J. of Algorithms 37 (2000) 267–282.
- [2] K. T. Chen, R. H. Fox, R. C. Lyndon, Free differential calculus, IV, Annals of Math. 68 (1958) 81–95.
- [3] P. B. Dragon, O. I. Hernandez, J. Sawada, A. Williams, D. Wong, Constructing de Bruijn sequences with co-lexicographic order: The k-ary Grandmama sequence, European J. of Combin. 72 (2018) 1–11.
- [4] J. P. Duval, Factorizing words over an ordered alphabet, J. of Algorithms 4 (1983) 363–381.
- [5] T. Etzion, An algorithm for constructing m-ary de Bruijn sequences, J. of Algorithms 7 (3) (1986) 331–340.
- [6] L. R. Ford, A cyclic arrangement of m-tuples, Tech. Rep. P-1071, RAND Corp. (1957).
- [7] H. M. Fredricksen, Generation of the Ford sequence of length 2^n, n large, J. Combin. Theory 12 (1972) 153–154.
- [8] H. M. Fredricksen, I. J. Kessler, Lexicographic compositions and de Bruijn sequences, J. Combin. Theory 22 (1977) 17–30.
