Extending de Bruijn sequences to larger alphabets
Verónica Becher, Lucas Cortés

TL;DR
This paper introduces a method to extend de Bruijn sequences to larger alphabets by embedding existing sequences into higher-order sequences with additional symbols, ensuring no long runs without the new symbol.
Contribution
It presents a novel approach using auxiliary graphs and maximum flow algorithms to extend de Bruijn sequences to larger alphabets.
Findings
Successful embedding of de Bruijn sequences into larger alphabets.
Ensured no long runs without the new symbol in extended sequences.
Method applicable to various alphabet sizes and sequence orders.
Abstract
A de Bruijn sequence of order n over a k-symbol alphabet is a circular sequence where each length-n sequence occurs exactly once. We present a way of extending de Bruijn sequences by adding a new symbol to the alphabet: the extension is performed by embedding a given de Bruijn sequence into another one of the same order, but over the alphabet with one more symbol, while ensuring that there are no long runs without the new symbol. Our solution is based on auxiliary graphs derived from the de Bruijn graph and solving a problem of maximum flow.
Departamento de Computación, Facultad de Ciencias Exactas y Naturales & ICC
Universidad de Buenos Aires & CONICET, Argentina
Keywords: de Bruijn sequences, Eulerian cycle, maximum flow, combinatorics on words.
1 Introduction and statement of results
A circular sequence is the equivalence class of a sequence under rotations. A de Bruijn sequence of order $n$ over a $k$-symbol alphabet is a circular sequence of length $k^n$ in which every length-$n$ sequence occurs exactly once [6, 11], see [4] for a fine presentation and history. For example, writing $[w]$ to denote the circular sequence formed by the rotations of $w$, $[0011]$ is de Bruijn of order $2$ over the alphabet $\{0,1\}$.
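The definition can be checked mechanically. The following is a minimal sketch, not part of the paper; the function name `is_de_bruijn` and the choice of strings to represent sequences are assumptions made for illustration.

```python
from itertools import product

def is_de_bruijn(seq, alphabet, n):
    """Return True if the circular sequence `seq` contains every
    length-n word over `alphabet` exactly once."""
    k = len(alphabet)
    if len(seq) != k ** n:
        return False
    # Unroll the circle so that windows wrapping around the end are visible.
    doubled = seq + seq[:n - 1]
    windows = [doubled[i:i + n] for i in range(len(seq))]
    expected = {"".join(w) for w in product(alphabet, repeat=n)}
    # There are k**n windows, so covering all k**n words means each occurs once.
    return set(windows) == expected

print(is_de_bruijn("0011", "01", 2))
print(is_de_bruijn("00010111", "01", 3))
print(is_de_bruijn("0101", "01", 2))
```

The first two calls test sequences that satisfy the definition; the third repeats the word 01 and therefore fails.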
A subsequence of a sequence $a_1 a_2 \ldots a_n$ is a sequence $b_1 b_2 \ldots b_m$ defined by $b_i = a_{n_i}$ for $i = 1, 2, \ldots, m$, where $n_1 < n_2 < \cdots < n_m$. The same applies to circular sequences, assuming any starting position. For example, for the alphabet of digits from $0$ to $9$, , and $[5612]$ are subsequences of .
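A small sketch of the circular subsequence test may help. It is an illustration only: the function names are hypothetical, sequences are non-empty strings, and the container sequence `0123456789` is chosen here for the example rather than taken from the paper. It relies on the observation that it suffices to try every starting position of the larger circular sequence.

```python
def is_subsequence(small, big):
    """True if `small` can be obtained from `big` by deleting symbols."""
    remaining = iter(big)
    # `symbol in remaining` advances the iterator, so order is respected.
    return all(symbol in remaining for symbol in small)

def is_circular_subsequence(small, big):
    """True if the circular word [small] is a subsequence of [big],
    trying every starting position of `big`."""
    return any(
        is_subsequence(small, big[i:] + big[:i])
        for i in range(len(big))
    )

print(is_circular_subsequence("5612", "0123456789"))
print(is_circular_subsequence("5162", "0123456789"))
```

The first call succeeds by starting the larger sequence at the digit 5; the second fails because no single turn around the circle meets 5, 1, 6, 2 in that order.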
Clearly, for any given de Bruijn sequence over a $k$-symbol alphabet there is another one over the alphabet enlarged with one new symbol, such that the two sequences have the same order, and the first is a subsequence of the second. This is immediate from the characterization of de Bruijn sequences as Eulerian cycles on de Bruijn graphs: the de Bruijn graph for the original alphabet is a subgraph of the de Bruijn graph for the enlarged alphabet, and any cycle in an Eulerian graph can be embedded into a full Eulerian cycle. For instance, such an extension can be constructed with Hierholzer’s algorithm for joining cycles together to create an Eulerian cycle of a graph. However, this gives no guarantee that the new symbol is fairly distributed along the resulting de Bruijn sequence.
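The unconstrained extension can be sketched in a few lines in the style of Hierholzer's algorithm. This is an illustrative sketch under stated assumptions, not the construction of this note: the function name `extend_by_splicing` is hypothetical, sequences are strings, edges of the de Bruijn graph are represented by words of length `n`, and unused edges are picked greedily in alphabet order.

```python
def extend_by_splicing(seq, alphabet, new_symbol, n):
    """Embed the de Bruijn sequence `seq` (order n over `alphabet`) into a
    de Bruijn sequence of order n over `alphabet + new_symbol` by splicing
    closed walks of unused edges into the given Eulerian cycle."""
    symbols = alphabet + new_symbol
    doubled = seq + seq[:n - 1]
    # Each edge is a word of length n; its tail is edge[:-1], its head edge[1:].
    tour = [doubled[i:i + n] for i in range(len(seq))]
    used = set(tour)

    def closed_walk_from(vertex):
        # Follow unused edges until stuck. In-degree equals out-degree among
        # the unused edges, so the walk can only get stuck back at `vertex`.
        walk, current = [], vertex
        while True:
            for s in symbols:
                edge = current + s
                if edge not in used:
                    used.add(edge)
                    walk.append(edge)
                    current = edge[1:]
                    break
            else:
                return walk

    i = 0
    while i < len(tour):
        walk = closed_walk_from(tour[i][:-1])
        if walk:
            tour[i:i] = walk      # splice the detour in before edge i
        else:
            i += 1                # no unused edge leaves this vertex
    return "".join(edge[-1] for edge in tour)

extended = extend_by_splicing("0011", "01", "2", 2)
print(extended)
```

Because the original edges keep their relative order in the tour, the input remains a circular subsequence of the output. Nothing in this procedure controls where the new symbol lands, which is the shortcoming the note addresses.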
In this note we consider the problem of extending a de Bruijn sequence over a $k$-symbol alphabet to another one of the same order $n$ over the alphabet enlarged with a new symbol, such that the first is a subsequence of the second and there are no long runs without the new symbol. If in between every two successive occurrences of the new symbol there were fewer than $n$ symbols, it would be impossible to accommodate all words of length $n$ lacking the new symbol. If there were exactly $n$ symbols, to accommodate all words of length $n$ lacking the new symbol we would need symbols. But this would be impossible because for all sufficiently large values of $k$ this quantity exceeds $(k+1)^n$, the length of a de Bruijn sequence of order $n$ over a $(k+1)$-symbol alphabet. Theorem 1 proves that there is an extension such that in between any two successive occurrences of the new symbol there can be at most other symbols.
Theorem 1.
For any de Bruijn sequence $v$ over a $k$-symbol alphabet of order $n$ there is another one $w$ over that alphabet enlarged with a new symbol, of the same order $n$, such that $v$ is a subsequence of $w$ and for any consecutive symbols in $w$ there is at least one occurrence of the new symbol.
For example, for this de Bruijn sequence of order over the alphabet ,
[TABLE]
the following de Bruijn sequence of order over the alphabet satisfies the conditions of the theorem:
[TABLE]
because is a subsequence of and given any consecutive symbols in there is at least one occurrence of the symbol .
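The fairness condition of the theorem can be tested on any candidate sequence by measuring the longest circular run that avoids the new symbol. The sketch below is an illustration only: the function name is hypothetical, and the sample sequence `212200110` (a de Bruijn sequence of order 2 over the digits 0, 1, 2) is chosen here and is not the example of the paper.

```python
def longest_run_without(seq, symbol):
    """Length of the longest circular run of consecutive positions of
    `seq` that do not contain `symbol`."""
    if symbol not in seq:
        return len(seq)
    # Rotate so the sequence starts at an occurrence of `symbol`;
    # then no run can wrap around the end.
    start = seq.index(symbol)
    rotated = seq[start:] + seq[:start]
    best = run = 0
    for s in rotated:
        run = 0 if s == symbol else run + 1
        best = max(best, run)
    return best

# Every window of length L contains `symbol` exactly when
# longest_run_without(seq, symbol) < L.
print(longest_run_without("212200110", "2"))
```

For this sample the run `00110` has length 5, so only windows of length 6 or more are guaranteed to contain the symbol 2.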
To prove Theorem 1, in addition to classical elements from graph theory such as de Bruijn graphs, Eulerian cycles and graph transformations, we pose the fairness condition on the new symbol as a problem of maximum flow and solve it with Edmonds-Karp algorithm [7, 5]. The following is a crude upper bound of the complexity of the construction.
Proposition 1.
For order $n$ and every $k$-symbol alphabet there is a construction that proves Theorem 1 in mathematical operations.
It is possible to conceive this extension problem in variants of de Bruijn sequences defined in terms of Eulerian cycles in appropriate graphs. For instance, the semi-perfect de Bruijn sequences of Repke and Rytter [10], which satisfy that each of the prefixes (large enough) has the largest possible number of distinct words. Or the perfect sequences [1] which, for order , contain each word of length exactly times but each one starting at different positions modulo . Or the subtler nested perfect sequences [3], which originated in Mordechay Levin’s [9, Theorem 2].
The extension to a larger alphabet without the fairness condition on the new symbol is particularly simple for the lexicographically greatest de Bruijn sequence: the one over the original alphabet is the suffix of the one over the enlarged alphabet [13], assuming the new symbol is the lexicographically greatest. The extension can be done with an efficient greedy algorithm, see [12].
The extension problem to a larger alphabet that we consider in the present note is dual to the extension problem studied by Becher and Heiber in [2], where they considered extending a de Bruijn sequence of order $n$ over a $k$-symbol alphabet to another one of order $n+1$ over the same alphabet such that the first is a prefix of the second.
2 Proof of Theorem 1
In the sequel we use the terms word and sequence interchangeably. A de Bruijn graph $G(k,n)$ is a directed graph whose vertices are the words of length $n$ over a $k$-symbol alphabet and whose edges are the pairs $(v,w)$ where $v = au$ and $w = ub$, for some word $u$ of length $n-1$ and possibly two different symbols $a, b$. Thus, the graph has $k^n$ vertices and $k^{n+1}$ edges, it is strongly connected and every vertex has the same in-degree and out-degree. Each de Bruijn sequence of order $n$ over a $k$-symbol alphabet can be constructed by taking a Hamiltonian cycle in $G(k,n)$. Since the line graph of $G(k,n-1)$ is $G(k,n)$, each de Bruijn sequence of order $n$ over a $k$-symbol alphabet can be constructed as an Eulerian cycle in $G(k,n-1)$.
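As a sanity check of these counts and of the correspondence with Eulerian cycles, the following sketch builds a small de Bruijn graph explicitly. The function names and the representation of vertices as strings are assumptions made for illustration.

```python
from itertools import product

def de_bruijn_graph(alphabet, n):
    """Vertices are the words of length n; there is an edge from a+u to u+b."""
    vertices = ["".join(w) for w in product(alphabet, repeat=n)]
    edges = [(v, v[1:] + b) for v in vertices for b in alphabet]
    return vertices, edges

def eulerian_cycle_from_sequence(seq, n):
    """Edges traversed in the graph on words of length n-1 by a
    de Bruijn sequence of order n, in order of appearance."""
    doubled = seq + seq[:n - 1]
    words = [doubled[i:i + n] for i in range(len(seq))]
    return [(w[:-1], w[1:]) for w in words]

vertices, edges = de_bruijn_graph("01", 2)
print(len(vertices), len(edges))          # k**n vertices, k**(n+1) edges

cycle = eulerian_cycle_from_sequence("00010111", 3)
# The cycle uses every edge of the graph on words of length 2 exactly once.
print(sorted(cycle) == sorted(edges))
```

Each length-3 window of the sequence is one edge of the graph on words of length 2, which is why a de Bruijn sequence of order 3 traverses that graph as an Eulerian cycle.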
2.1 Graph of circular words
Our main tool is the factorization of the set of edges in the de Bruijn graph $G(k,n)$, whose vertices are the words of length $n$, into convenient sets of pairwise disjoint cycles. We say that two cycles are disjoint if they have no common edges.
Proposition 2.
For every $k$ and $n$, the set of edges in $G(k,n)$ can be partitioned into a disjoint set of cycles identified by the circular words of length $n+1$.
Proof.
As usual, we identify an edge in $G(k,n)$ by concatenating the starting vertex label with the edge label. Thus, each edge in $G(k,n)$ is identified with a word of length $n+1$. The set of all rotations of a word of length $n+1$ identifies consecutive edges that form a simple cycle in $G(k,n)$. And each circular word of length $n+1$ corresponds exactly to one simple cycle in $G(k,n)$. The partition of the set of words of length $n+1$ into the equivalence classes given by their rotations determines a partition of the set of edges in $G(k,n)$ into disjoint simple cycles, see Figure 3. ∎
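The partition in the proof can be computed directly. In this sketch, which is only an illustration with a hypothetical function name, an edge is represented by its word, the tail of an edge is the word without its last symbol, and the head is the word without its first symbol.

```python
from itertools import product

def rotation_classes(alphabet, length):
    """Partition all words of the given length into classes of rotations,
    each class listed in rotation order."""
    seen, classes = set(), []
    for letters in product(alphabet, repeat=length):
        word = "".join(letters)
        if word in seen:
            continue
        rotations, r = [], word
        while r not in rotations:
            rotations.append(r)
            r = r[1:] + r[0]          # rotate left by one symbol
        seen.update(rotations)
        classes.append(rotations)
    return classes

classes = rotation_classes("01", 3)
for cycle in classes:
    # Consecutive rotations are consecutive edges: the head of one edge
    # is the tail of the next, and the last edge closes the cycle.
    closes = all(
        cycle[i][1:] == cycle[(i + 1) % len(cycle)][:-1]
        for i in range(len(cycle))
    )
    print(cycle, closes)
```

For words of length 3 over two symbols this yields four classes, that is, four edge-disjoint cycles covering all eight edges of the graph on words of length 2.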
We define the graph of circular words. Figure 3 shows it for word length over .
Definition 1 (Graph of circular words).
For every and , is the graph whose vertices are the circular words of length over the -symbol alphabet and two vertices and are connected if there is a word of length and symbols such that , .
The fact that $G(k,n)$ is a subgraph of $G(k+1,n)$ motivates the following definition.
Definition 2 (Augmenting graph).
The augmenting graph is the directed graph where is the set of length- words over the alphabet enlarged by a new symbol , and is the set of pairs such that , for some word of length and symbols , and either or have at least one occurrence of the symbol .
Figure 3 illustrates . Observe that in each of the vertices in has exactly one incoming edge and exactly one outgoing edge. This outgoing edge is always labelled with the new symbol . To prove Theorem 1 we plan to construct an Eulerian cycle in by joining the given Eulerian cycle in with disjoint cycles of the augmenting graph that we call petals. Since the edges in are exactly the edges in minus those in , the edges in can also be partitioned into a disjoint set of cycles which are identified by the circular words of length that have at least one occurrence of the new symbol . To define petals we consider the restriction of to the simple cycles in .
Definition 3 (Petal for a vertex in ).
Let be the subgraph of whose set of vertices are the circular words of length with at least one occurrence of symbol . A petal for a vertex in is a subgraph of that seen as a cycle in , traverses exactly one vertex in , the vertex .
There is exactly one petal for each vertex in and this petal starts at the circular word , where is the new symbol. Now there are two difficulties. One is to determine where to insert the petals so that we obtain a fair distribution of the new symbol . The other difficulty is that petals must exhaust the augmenting graph . Figures 4 and 5 illustrate petals for vertices in .
2.2 Fair distribution of the new symbol
A pointed cycle is a cycle with a specified starting edge.
Definition 4 (Section of a cycle).
For a pointed Eulerian cycle in given by the sequence of edges and a non-negative integer such that , the sequence of vertices , where each is the head of , is a section of the cycle.
Figure 7 exemplifies the four sections of an Eulerian cycle in . The de Bruijn graph has vertices and edges. An Eulerian cycle in has sections with $k$ vertices each. Since there are as many vertices as sections, we would like to choose one vertex from each section to place a petal. The problem is that each vertex occurs $k$ times in the Eulerian cycle, but not necessarily in different sections. We pose it as a matching problem.
Definition 5 (Distribution graph).
Given a pointed Eulerian cycle in , the Distribution graph is a $k$-regular bipartite graph where the two vertex classes are the vertices in and the sections of the Eulerian cycle, and there is an edge between a vertex and a section if the vertex belongs to that section.
A matching in a graph is a set of edges such that no two edges share a common vertex. A vertex is matched if it is an endpoint of one of the edges in the matching. A perfect matching is a matching that matches all vertices in the graph.
Lemma 1.
For every Distribution graph there is a perfect matching.
Proof.
Let $G$ be a finite bipartite graph consisting of two disjoint sets of vertices $X$ and $Y$ with edges that connect a vertex in $X$ to a vertex in $Y$. For a subset $W$ of $X$, let $N(W)$ be the set of all vertices in $Y$ adjacent to some element of $W$. Hall’s marriage theorem [8] states that there is a matching that entirely covers $X$ if and only if for every subset $W$ of $X$, $|W| \le |N(W)|$. Consider a Distribution graph and call $X$ the set of vertices and $Y$ the set of sections. For any $W \subseteq X$ such that $|W| = m$, the sum of the out-degrees of these vertices is $km$. Given that the in-degree of any vertex in $Y$ is $k$, we have that $|N(W)| \ge km/k = m$. Then, there is a matching that entirely covers $X$. Furthermore, since the number of vertices is equal to the number of sections, $|X| = |Y|$ and the matching is perfect. ∎
To obtain a perfect matching in a Distribution graph we can use any method to compute the maximum flow in a network. We define the flow network by adding two vertices to the Distribution graph, the source and the sink. Add an edge from the source to each vertex of the de Bruijn graph and an edge from each section to the sink. Assign capacity $1$ to each of the edges of the flow network. The maximum flow of the network then equals the number of vertices of the de Bruijn graph. The edges between vertices and sections that carry this flow form a perfect matching. Figure 7 shows a Distribution graph , a possible perfect matching, and the flow network used to obtain it.
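The reduction can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the cycle is given by its list of head vertices, sections consist of `k` consecutive heads, and the sample cycle is the one spelled by the binary sequence 00010111, chosen here as an example.

```python
from collections import deque

def edmonds_karp(capacity, source, sink):
    """Maximum flow by shortest augmenting paths (breadth-first search).
    `capacity` maps each node to a dict {neighbour: capacity}."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse edges
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

def match_vertices_to_sections(heads, k):
    """Choose one distinct vertex per section of k consecutive heads."""
    sections = [heads[i:i + k] for i in range(0, len(heads), k)]
    vertices = sorted(set(heads))
    capacity = {"source": {("v", v): 1 for v in vertices}}
    for v in vertices:
        capacity[("v", v)] = {
            ("s", j): 1 for j, sec in enumerate(sections) if v in sec
        }
    for j in range(len(sections)):
        capacity[("s", j)] = {"sink": 1}
    flow, residual = edmonds_karp(capacity, "source", "sink")
    # A saturated vertex-to-section edge is a matching edge.
    matching = {
        v: j
        for v in vertices
        for (_, j) in capacity[("v", v)]
        if residual[("v", v)][("s", j)] == 0
    }
    return flow, matching

# Heads of the Eulerian cycle spelled by 00010111 (order 3, two symbols).
heads = ["00", "01", "10", "01", "11", "11", "10", "00"]
flow, matching = match_vertices_to_sections(heads, 2)
print(flow, matching)
```

In this sample the vertex 11 occurs twice in the same section, so it can only be matched to that section; the flow computation handles such constraints without any special casing.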
2.3 Partition of the augmenting graph
We must partition the set of edges in into petals. We define a Petals tree as a root that branches out in a subgraph of . It has height , the vertices at distance to the root have exactly occurrences of the new symbol , for .
Definition 6 (Petals tree).
Let be a circular word corresponding to an Eulerian cycle in . We define the Petals tree given by the root and all the vertices in . Every vertex where has exactly one occurrence of the symbol is a child of the root . And for every pair of vertices , there is an edge between them exactly when there is an edge between them in and has one more occurrence of the new symbol than .
Figure 8 shows a Petals tree. The root branches out into the petal for vertex , which has the circular word ; the petal for vertex , which has ; the petal for vertex , which has , , ,; and the petal for which has .
Given an Eulerian cycle in and a starting vertex, divide it into sections. Choose one vertex in each section according to a perfect matching. Fix a Petals tree as a subgraph of . The construction considers all the sections, one after the other, starting at section $0$. At each section the construction inserts the petal for a chosen vertex, guided by the Petals tree. Each traversed edge is added to the construction. The construction starts at the vertex that is the head of the first edge of section $0$. Let be the current vertex.
Case is a vertex in : If is a chosen vertex in the current section and the petal for has not been inserted yet then traverse the edge labelled with symbol and continue traversing the petal for (which starts with ). If the petal for has already been traversed or is not a chosen vertex then continue with the traversal of edges in the current section.
Case is not a vertex in : If the edge labelled with has not been traversed yet, is a child of the current node in the tree and has not been traversed yet, then traverse it. Otherwise continue with the traversal of the petal that was already part of.
For example, consider this de Bruijn sequence of order over alphabet and the corresponding Eulerian cycle in . Suppose we start this cycle at vertex and consider the four consecutive sections and . Assume a perfect matching yields for section $0$ the vertex and for section $1$ the second instance of the vertex . Figure 9 illustrates part of the construction of the extended Eulerian cycle for a given Petals tree, inserting the petal for the vertex and the petal for the vertex .
2.4 Actual proof of Theorem 1
Proof of Theorem 1.
Let be the list of edges visited by the Eulerian cycle determined by in , and let be the list of the respective head vertices. Divide these vertices into consecutive sections, each with $k$ vertices. We use the Edmonds-Karp algorithm to determine a vertex from each section. Consider the petals for . If we place one petal in each section, two consecutive petals can be at most edges away. Consider now the petals as pointed cycles in . A petal for vertex starts with the outgoing edge labelled . Inside the petal, for any consecutive edges there is one edge labelled with . The last edge of the petal is where and . Thus, for any consecutive edges there is at least one labelled with . ∎
2.5 Proof of Proposition 1
Proof of Proposition 1.
We must consider the Eulerian cycle in and the extended Eulerian cycle in . The search for the maximum flow is the most expensive part of the construction. The Edmonds-Karp algorithm has running time $O(|V|\,|E|^2)$ for a flow graph with vertex set $V$ and edge set $E$, see [7, 5]. In our case the flow graph has a source, a sink, vertices of the original de Bruijn graph and vertices for the sections. So . There is an edge from the source to each vertex in , there are $k$ outgoing edges from each vertex in to sections, and there is one outgoing edge from each section to the sink. So, . Then the time complexity of the Edmonds-Karp algorithm in our graph is
[TABLE]
This completes the proof. ∎
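The count behind this bound can be worked out symbolically. The derivation below is a sketch under an assumption: $N$ is a symbol introduced here for the number of vertices of the de Bruijn graph carrying the given Eulerian cycle, so that there are also $N$ sections. The last step further assumes $N = k^{n-1}$, which is the case when a sequence of order $n$ is read as an Eulerian cycle in the graph on words of length $n-1$.

```latex
\begin{align*}
|V| &= N + N + 2 = 2N + 2, \\
|E| &\le N + kN + N = (k+2)\,N, \\
|V|\,|E|^2 &\le (2N+2)\,(k+2)^2 N^2 = O\!\left(k^2 N^3\right), \\
N = k^{n-1} \;&\Longrightarrow\; O\!\left(k^2 N^3\right) = O\!\left(k^{3n-1}\right).
\end{align*}
```

The inequality for $|E|$ allows for a vertex that occurs several times in the same section, in which case the parallel edges collapse into one.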
References
- [1] Nicolás Álvarez, Verónica Becher, Pablo Ferrari, and Sergio Yuhjtman. Perfect necklaces. Advances in Applied Mathematics, 80:48–61, 2016.
- [2] Verónica Becher and Pablo Ariel Heiber. On extending de Bruijn sequences. Information Processing Letters, 111(18):930–932, 2011.
- [3] Verónica Becher and Olivier Carton. Normal numbers and nested perfect necklaces. Journal of Complexity, 54:101403, 2019.
- [4] Jean Berstel and Dominique Perrin. The origins of combinatorics on words. European Journal of Combinatorics, 28(3):996–1022, 2007.
- [5] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, 2009.
- [6] Nicolaas G. de Bruijn. A combinatorial problem. Nederl. Akad. Wetensch., Proc., 49:758–764, 1946 (= Indagationes Math. 8:461–467, 1946).
- [7] Jack Edmonds and Richard M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM, 19(2):248–264, 1972.
- [8] Philip Hall. On representatives of subsets. Journal of the London Mathematical Society, 10, 1935.
