Algorithmic classification of noncorrelated binary pattern sequences
Jakub Konieczny

TL;DR
This paper presents an algorithm to verify noncorrelation in binary pattern sequences, computes the number of such sequences up to length 4, and proposes a conjecture with partial verification for longer sequences.
Contribution
It introduces an algorithmic method for verifying noncorrelation and provides exact counts for sequences of certain lengths, along with a new sufficient condition for specific pattern classes.
Findings
Exactly 2272 noncorrelated sequences of length ≤ 4
A sufficient condition for noncorrelation when patterns do not end with 0
Conjecture on the necessity of the condition verified for lengths ≤ 5
Camille Jordan Institute, Claude Bernard University Lyon 1, 43 Boulevard du 11 novembre 1918, 69622 Villeurbanne Cedex, France
Faculty of Mathematics and Computer Science, Jagiellonian University in Kraków, Łojasiewicza 6, 30-348 Kraków, Poland
Abstract.
The main subject of this paper is the class of binary pattern sequences, that is, sequences whose n-th term is (−1) raised to the total number of times patterns from a fixed set of strings of 0s and 1s appear in the binary expansion of n. A sequence is said to be noncorrelated if the corresponding spectral measure is equal to the Lebesgue measure.
We show that it is possible to algorithmically verify if a given binary pattern sequence is noncorrelated. As an application, we compute that there are exactly 2272 noncorrelated binary pattern sequences of length ≤ 4. If we restrict our attention to patterns that do not end with 0, we put forward a sufficient condition for a pattern sequence to be noncorrelated. We conjecture that this condition is also necessary, and verify this conjecture for lengths ≤ 5.
2010 Mathematics Subject Classification:
Primary: 47B15; Secondary: 11B50
1. Introduction
Uniformity properties of sequences defined in terms of digital expansions have long been studied. Consider, for instance, the Thue–Morse sequence , where denotes the sum of binary digits of , discussed at length by Allouche and Shallit in the survey paper [AS99]. It was shown by Gelfond [Gel68] (see also [MS98]) that is equidistributed in arithmetic progressions:
[TABLE]
for all and , and the rate of convergence can be made explicit. Analogous results hold also for other bases, with mild additional assumptions to account for the fact that . Mauduit and Sárközy [MS98] also observed that the Thue–Morse sequence admits large self-correlations. Here, the (self-)correlation coefficients of a sequence are defined by
[TABLE]
and a simple computation shows that (see Section 3 for details). By the same token,
[TABLE]
meaning in particular that as . On the other hand, the coefficients tend to be rather small; in particular
[TABLE]
which follows e.g. from results in [Coq76]. The spectral measure on associated to a sequence is characterised by the identity , and (4) is equivalent to absolute continuity of .
Many other notions of uniformity have been investigated for the Thue–Morse sequence. In an influential paper, Mauduit and Rivat showed that the Thue–Morse sequence and its analogues in different bases are equidistributed along the primes [MR10]. Drmota, Mauduit and Rivat [DMR19] showed that the Thue–Morse sequence along the squares is a normal sequence, meaning that each finite sequence of ±1s appears with the expected frequency. Spiegelhofer [Spi18] proved that the Thue–Morse sequence has level of distribution 1, which is a far-reaching quantitative generalisation of (1) and can be used to show equidistribution along Piatetski–Shapiro sequences ⌊n^c⌋ with 1 < c < 2 (see [FM96] for analogous, but somewhat weaker, results in different bases). It was also shown by the author [Kon19] that the Thue–Morse sequence has small Gowers norms, meaning that it is uniform from the point of view of higher order Fourier analysis.
Another oft-studied sequence carries the name of Rudin–Shapiro; its n-th term is (−1) raised to the number of times the pattern 11 appears in the binary expansion of n, allowing overlaps. Similarly to the Thue–Morse sequence, the Rudin–Shapiro sequence is equidistributed in arithmetic progressions and along the primes [MR15], and has small Gowers norms [Kon19]. However, in contrast to (3), the Rudin–Shapiro sequence is noncorrelated, by which we mean that its correlation coefficients vanish for all nonzero shifts or, equivalently, that the spectral measure is the Lebesgue measure. Intuitively, noncorrelated sequences are free of any sort of periodic behaviour.
The Thue–Morse and the Rudin–Shapiro sequences are special cases of what we call binary pattern sequences. In general, a binary pattern sequence takes the form
[TABLE]
where is a finite set of patterns over the alphabet and denotes the total number of appearances of patterns from in the binary expansion of (see Section 2 for details). Pattern sequences were studied in a more general context by Morton and Mourant [MM89, Mor90], Coquet, Kamae and Mendès France [CKMF77], and Boyd, Cook and Morton [BCM89]. Generalised Rudin–Shapiro sequences and their correlation coefficients were studied by Allouche and Liardet [AL91]. Finally, Zheng, Peng and Kamae [ZPK18] studied correlation coefficients of binary pattern sequences, and obtained a complete classification of noncorrelated sequences corresponding to sets of patterns of length . Examples of sets that give rise to noncorrelated sequences include:
- (then is the Rudin–Shapiro sequence);
- (then ) and (then );
- , or more generally for a set .
In this paper, we extend the result of [ZPK18] to patterns of length ≤ 4 and put the findings in a wider context provided by the theory of automatic and regular sequences. Many of the ideas we use have their analogues and prototypes in [ZPK18]; throughout the paper we give references to the relevant results therein.
Unfortunately, there does not appear to be a simple criterion that determines if a given pattern sequence is noncorrelated (except for the partial information suggested by Conjecture 1.2 below). Due to practical limitations we only state a counting result here, as opposed to a complete list. (The list, together with the code which can be used to produce it, is available from the author.)
Theorem A.
There are precisely 2272 noncorrelated binary pattern sequences corresponding to patterns of length ≤ 4.
As a key step towards obtaining the above result, we reduce the task of verifying whether a given binary pattern sequence is noncorrelated to a finite computation, which can then be automated. The time complexity of the resulting algorithm is polynomial in , where denotes the length of patterns under consideration. Since it takes approximately bits to specify a binary pattern sequence, this is optimal up to improvements in the exponent.
Theorem B.
There exists an algorithm which, given a finite set of patterns , performs operations and decides if the corresponding pattern sequence is noncorrelated.
While we keep the exposition fairly self-contained, we also wish to emphasize that the above problem can be seen as a part of a larger theory. We note that binary pattern sequences are 2-automatic (see Section 2 for the relevant definitions). A crucial component of our reasoning is Theorem 3.5, which asserts that the correlation sequences coming from automatic sequences are regular. While this result will not come as a surprise to the experts in the field, to the best of our knowledge it does not appear in print elsewhere. Its importance stems from the fact that a regular sequence admits a simple recursive description, and hence many properties are easily verified for such a sequence. In our particular application, we reduce the task of determining if a pattern sequence is noncorrelated to the ostensibly simpler task of determining if a 2-regular sequence is identically zero.
The problem of classifying noncorrelated pattern sequences becomes more tractable if we impose additional assumptions on the set of patterns under consideration. Let us call a binary pattern sequence dilation-invariant if for all , or equivalently, if for a set that contains only patterns that begin and end with (see Section 2.4 for details). In the dilation-invariant case, we have a conjectural classification, which we are able to confirm in one direction in full generality, and in the opposite direction for patterns of length .
Theorem C.
Let be a set of patterns over the alphabet , all of which begin and end with . Let be the length of the longest word in and let be the corresponding binary pattern sequence. If and then is noncorrelated. Conversely, if and is noncorrelated then .
Conjecture 1.1.
Let , and be as in Theorem C. If is noncorrelated then and .
If then the fact that is noncorrelated follows from [ZPK18]. More generally, Theorem 1.3 in [ZPK18] (see also [AL91]) provides a classification of all noncorrelated binary pattern sequences for sets of patterns of the form where , are words over the alphabet and . Conjecture 1.1 is consistent with said classification.
Returning to the general case, we notice that each binary pattern sequence can be written as the product of a periodic sequence and a dilation-invariant pattern sequence (Lemma 2.9). The correlation coefficients of and are closely related (see also Remark 5.6), and in all cases that we were able to check (i.e., ), if is noncorrelated then so is . This motivates us to put forward the following conjecture.
Conjecture 1.2.
Let be a noncorrelated binary pattern sequence. Then is the product of a periodic sequence and a dilation-invariant noncorrelated binary pattern sequence.
Above we restricted our attention to base for the sake of brevity. In the remaining part of the paper, we work in arbitrary base . In particular, the natural base- variant of Theorem B holds true. The same applies to the first part of Theorem C, except that it is less clear what the base- variant should be and the resulting statement is vacuous for many values of (see Proposition 5.3). When it comes to computations, we only consider base since for larger bases the number of distinct pattern sequences becomes so large that merely listing them all is already infeasible even for modest pattern lengths.
Acknowledgements
While writing this paper, the author was supported by the ERC grant ErgComNum 682150 at the Hebrew University of Jerusalem. During the review process, the author was working within the framework of the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR). The author also acknowledges support from the Foundation for Polish Science (FNP).
The author wishes to express his gratitude to Boris Adamczewski, Jakub Byszewski, Aihua Fan and Tamar Ziegler for helpful conversations and to the anonymous Referee for the careful reading of this paper and constructive suggestions.
2. Background and definitions
Convention: Throughout the paper, denotes the base and is considered to be fixed. In particular, all constructions and constants are allowed to depend on unless explicitly stated otherwise.
2.1. Pattern sequences
We let denote the set of digits in base . For a set , we let denote the monoid consisting of words over the alphabet , equipped with the operation of concatenation and neutral element , the empty word. For , denotes the length of . For , denotes the expansion of in base (without leading zeros). Conversely, for , denotes the integer encoded by .
Let be a set. We say that a word appears in another word , or that is a factor of , if there exist such that . We call a prefix (resp. suffix) of if we may take (resp. ). We further define to be the number of times appears in , that is, the number of distinct pairs such that . We note that this definition allows for overlaps, so for instance . More generally, for a finite set , we define .
Accordingly, for and , denotes the number of times that appears in the base- expansion of padded with sufficiently many leading zeros, that is, . The inclusion of the leading zeros in the expansion of ensures better behaviour of the map ; in particular, for each and sufficiently large we have . The assumption that is not a string of zeros ensures that is well-defined, in the sense that for fixed , the sequence () is eventually constant.
We will call a set admissible if is finite and , so that we may define . For any admissible set , the corresponding pattern sequence is defined by (cf. [ZPK18, Definition 1.1])
[TABLE]
If additionally for all then we say that is a pattern sequence of length , or equivalently we define the length of as the least possible value of among all representations of in the form (5), where is an admissible set. Note that one pattern sequence can have multiple representations of the aforementioned form.
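To make the definitions above concrete, the following Python sketch counts (possibly overlapping) occurrences of a pattern in the binary expansion of n padded with leading zeros, and evaluates the corresponding ±1-valued pattern sequence. It is only an illustration under stated assumptions: the base is fixed to 2, the n-th term is taken to be (−1) raised to the total pattern count, and the names `count_pattern` and `pattern_sequence` are ours rather than the paper's.

```python
def count_pattern(v: str, n: int) -> int:
    """Overlapping occurrences of the binary word v in the base-2
    expansion of n, padded with leading zeros."""
    if not v or "1" not in v:
        raise ValueError("a pattern must contain at least one nonzero digit")
    digits = format(n, "b") if n > 0 else ""  # the expansion of 0 is the empty word
    # len(v) - 1 leading zeros suffice: every occurrence of v covers a
    # nonzero digit, so it cannot start any further to the left.
    padded = "0" * (len(v) - 1) + digits
    return sum(padded.startswith(v, i) for i in range(len(padded) - len(v) + 1))


def pattern_sequence(patterns, n: int) -> int:
    """Value at n of the +1/-1 sequence attached to a finite set of patterns."""
    total = sum(count_pattern(v, n) for v in patterns)
    return -1 if total % 2 else 1


# The single pattern "1" gives the Thue-Morse sequence,
# the single pattern "11" gives the Rudin-Shapiro sequence.
thue_morse_prefix = [pattern_sequence({"1"}, n) for n in range(8)]
rudin_shapiro_prefix = [pattern_sequence({"11"}, n) for n in range(8)]
```

Padding with one fewer zero than the pattern length is a design choice that keeps the count equal to its eventual value without having to guess how many zeros are "sufficiently many".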
For two sets , we let denote the symmetric difference .
Lemma 2.1.
The class of pattern sequences is closed under multiplication.
Proof.
It is enough to note that for any admissible sets we have
[TABLE]
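The mechanism behind Lemma 2.1 can be checked numerically: patterns that belong to both sets contribute an even number to the exponent, so the product of the two pattern sequences is the pattern sequence of the symmetric difference. The sketch below assumes base 2 and ±1-valued sequences; the helper `seq` is a hypothetical name introduced only for this illustration.

```python
def seq(patterns, n: int) -> int:
    """+1/-1 pattern sequence in base 2 (occurrences counted with overlaps,
    expansion padded with leading zeros)."""
    total = 0
    digits = format(n, "b") if n > 0 else ""
    for v in patterns:
        padded = "0" * (len(v) - 1) + digits
        total += sum(padded.startswith(v, i)
                     for i in range(len(padded) - len(v) + 1))
    return -1 if total % 2 else 1


def product_matches_symmetric_difference(A, B, bound: int = 1024) -> bool:
    """Check seq(A) * seq(B) == seq(A symmetric-difference B) for n < bound."""
    return all(seq(A, n) * seq(B, n) == seq(A ^ B, n) for n in range(bound))
```

For instance, with A = {"11", "101"} and B = {"11", "1"}, the product is the pattern sequence of {"101", "1"}.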
It will usually be convenient to impose further restrictions on the set of patterns . Depending on the context we require that either has no leading zeros (in the sense that 0 is not a prefix of for any ) or that has constant length (in the sense that there is some such that for all ).
Lemma 2.2.
Let and let be a pattern sequence of length . Then there exist admissible sets such that has no leading zeros, has constant length , and . Moreover, and are uniquely determined by .
Proof.
Pick any admissible set with . Note that for each and each we have
[TABLE]
To construct , begin with and as long as contains at least one word starting with , say , replace with . Because of (6), this operation does not change the sequence . Since each iteration decreases the total number of leading zeros in the patterns in , after a finite number of steps this procedure must terminate and the resulting set of patterns has no leading zeros.
To construct , likewise, begin with and as long as contains at least one word with length , pick the shortest such word and replace with . Like before, this operation does not change the sequence . Each iteration either decreases the number of words in with least possible length, or increases the length of the shortest word in . At the same time, no words of length larger than are introduced. Hence, after a finite number of steps this procedure must terminate and the resulting set of patterns has constant length equal to .
It remains to show uniqueness. Using Lemma 2.1, we may assume that . For the sake of contradiction, suppose that one of and is non-empty. Consider first the case when and let be the shortest word in . Then , since contains exactly one pattern from , namely . Hence, we have reached a contradiction. Consider next the case when and choose the word where is largest possible. Then we again reach the contradiction: . ∎
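The first construction in the proof can be sketched in code for base 2. We assume here that the identity used is that every occurrence of a word v in the zero-padded expansion is preceded by exactly one digit, so the count of v equals the count of 0v plus the count of 1v; modulo 2 this lets us trade a pattern 0v for the pair v, 1v. The function names are illustrative and not taken from the paper.

```python
def seq(patterns, n: int) -> int:
    """+1/-1 pattern sequence in base 2 with zero-padded expansions."""
    total = 0
    digits = format(n, "b") if n > 0 else ""
    for v in patterns:
        padded = "0" * (len(v) - 1) + digits
        total += sum(padded.startswith(v, i)
                     for i in range(len(padded) - len(v) + 1))
    return -1 if total % 2 else 1


def remove_leading_zeros(patterns) -> set:
    """Return a set of patterns without leading zeros that defines the same
    pattern sequence (base 2 only)."""
    current = set(patterns)
    if any("1" not in w for w in current):
        raise ValueError("every pattern must contain a nonzero digit")
    while True:
        word = next((w for w in sorted(current) if w.startswith("0")), None)
        if word is None:
            return current
        v = word[1:]
        # Symmetric difference: removes 0v and toggles v and 1v, which does
        # not change the parity of the total count.
        current ^= {word, v, "1" + v}
```

Each pass removes a word with some number of leading zeros and toggles at most one word with fewer leading zeros, so the loop terminates.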
Remark 2.3.
We focus our attention on -valued sequences for two basic reasons. The first one is practical: The noncorrelation phenomenon that we are interested in relies on occurrence of certain arithmetic coincidences, which become less likely as the number of possible values increases; accordingly, the computational part of the problem becomes increasingly resource-intensive as sequences under consideration become more complicated. The second reason is conceptual: For a -valued sequence with mean , noncorrelation is tantamount to equidistribution of the pairs . More precisely, for each , if we additionally assume that the limits mentioned above exist then if and only if
[TABLE]
The analogous characterisation is false if the sequence is allowed to take more than two values.
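The link between noncorrelation and equidistribution of pairs can be observed empirically. For a ±1-valued sequence, the frequency of a pair of signs (e1, e2) among the pairs of values at n and n + m equals one quarter of 1 + e1·(mean) + e2·(shifted mean) + e1·e2·(correlation), so vanishing mean and vanishing correlation force all four frequencies to be 1/4. The sketch below, with hypothetical helper names and a finite cutoff N standing in for the limit, tabulates these frequencies for the Rudin–Shapiro sequence.

```python
from collections import Counter


def rudin_shapiro(n: int) -> int:
    # n & (n >> 1) has one set bit for every pair of adjacent ones in n,
    # i.e. for every (overlapping) occurrence of the pattern 11.
    return -1 if bin(n & (n >> 1)).count("1") % 2 else 1


def pair_frequencies(a, m: int, N: int) -> dict:
    """Empirical frequencies of the pairs (a(n), a(n + m)) for n < N."""
    counts = Counter((a(n), a(n + m)) for n in range(N))
    return {pair: counts[pair] / N
            for pair in [(1, 1), (1, -1), (-1, 1), (-1, -1)]}
```

Calling `pair_frequencies(rudin_shapiro, 1, 2**14)` returns four values close to 0.25; a finite computation of this kind is of course only suggestive and does not replace the exact criterion developed later.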
2.2. Automatic sequences
In this section we briefly discuss the basics of the theory of automatic sequences; for extensive background see [AS03a]. For , we define the operators acting on sequences by
[TABLE]
The -kernel of a sequence consists of all sequences that can be obtained from by repeated application of ’s, that is,
[TABLE]
It will also be convenient to introduce the shift operator acting on sequences by . For future reference, we record how the introduced operators interact.
Lemma 2.4.
For each we have . Moreover, .
Proof.
Direct computation. ∎
A sequence is -automatic (or just automatic, if is clear from the context) if and only if is finite. Many equivalent definitions of automaticity are possible, and we briefly mention some of them to provide context. Details and terminology can be found in [AS03a]. As the name suggests, a sequence is -automatic if and only if it is computed by a deterministic finite -automaton with output. Any fixed point of a -uniform morphism is -automatic, and conversely any -automatic sequence can be obtained as a letter-to-letter coding of a fixed point of a -uniform morphism. When is a prime and is a sequence taking values in a finite field of characteristic , yet another criterion due to Christol shows that is automatic if and only if the associated formal power series is algebraic over .
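As an illustration of the kernel-based definition, the following sketch explores the kernel of a sequence by repeatedly passing to subsequences along the arithmetic progressions kn + i, and identifies kernel elements by a finite prefix of their values. Comparing prefixes is a heuristic assumption (two distinct sequences could in principle share a long prefix), and the function names as well as the cutoffs `prefix` and `max_size` are ours.

```python
def thue_morse(n: int) -> int:
    return -1 if bin(n).count("1") % 2 else 1


def rudin_shapiro(n: int) -> int:
    return -1 if bin(n & (n >> 1)).count("1") % 2 else 1


def kernel_prefixes(a, k: int = 2, prefix: int = 64, max_size: int = 1000) -> dict:
    """Map each kernel element, identified by its first `prefix` values, to a
    pair (q, r) such that the element is n -> a(q * n + r)."""
    found = {}
    todo = [(1, 0)]  # (q, r) encodes the subsequence n -> a(q * n + r)
    while todo:
        q, r = todo.pop()
        values = tuple(a(q * n + r) for n in range(prefix))
        if values in found:
            continue  # already seen (up to the prefix heuristic)
        if len(found) >= max_size:
            raise RuntimeError("kernel looks infinite or prefix is too short")
        found[values] = (q, r)
        # Replacing n by k * n + i turns a(q * n + r) into a(q*k * n + r + i*q).
        todo.extend((q * k, r + i * q) for i in range(k))
    return found
```

With these conventions the Thue–Morse sequence yields two kernel elements (the sequence and its negative) and the Rudin–Shapiro sequence yields four.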
It is a well-known fact that the class of -automatic complex-valued sequences is closed under addition, multiplication, conjugation and restriction to subsequences, that is, if are -automatic, then so are , , and for any , . More generally, if are -automatic and is arbitrary, then the sequence is -automatic.
For a sequence , we define the mean and the logarithmic mean:
[TABLE]
We note that are not guaranteed to exist, even when the sequence is automatic. (Consider, for instance, the sequence defined by and if , .) On the other hand, we have the following positive result for logarithmic means.
Theorem 2.5 ([AS03a, Thm. 8.4.8]).
Let be a -automatic sequence. Then exists.
We also record the fact that if is a bounded sequence and exists then also exists and , see e.g. [AS03a, Prop. 8.4.4 (a)].
Pattern sequences are, unsurprisingly, automatic. In fact, we have the following characterisation of pattern sequences in terms of their -kernels (cf. [ZPK18, Lemma 2.2]).
Lemma 2.6.
Let be a sequence with and . Then the following conditions are equivalent:
- (i) There exists a set with .
- (ii) For each , the sequence has period .
Proof.
(i) (ii): Let and . Then each factor of is also a factor of and conversely each factor of that is not a suffix is a factor of . More precisely, for each we have
[TABLE]
Consequently, if the suffix of of length belongs to and otherwise. It follows that is -periodic. Since for each , the operator maps -periodic sequences to -periodic sequences (or constant sequences, if ), it follows that all sequences in the -kernel of take the form where is -periodic.
(ii) (i): For each , let . Note that and that the sequences take values in . Conversely, given any -tuple of -periodic -valued sequences () with , we can inductively construct a sequence with and for all . Hence, the number of sequences that satisfy (ii) is
[TABLE]
On the other hand, the number of subsets of is also equal to , and by the previously proven implication and Lemma 2.2, each of these choices gives rise to a different sequence satisfying (ii). It follows that each sequence satisfying (ii) has a representation as in (i). ∎
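The periodicity used in the implication from (i) to (ii) can be tested numerically in base 2. We assume a set of patterns of constant length ℓ; then the quotient of the values at 2n + i and at n is determined by whether the last ℓ digits of the padded expansion of 2n + i form a pattern from the set, which depends only on i and on n modulo 2^(ℓ−1). The exact period stated in the lemma is not reproduced here, and the helper names are hypothetical.

```python
def seq(patterns, n: int) -> int:
    """+1/-1 pattern sequence in base 2 with zero-padded expansions."""
    total = 0
    digits = format(n, "b") if n > 0 else ""
    for v in patterns:
        padded = "0" * (len(v) - 1) + digits
        total += sum(padded.startswith(v, i)
                     for i in range(len(padded) - len(v) + 1))
    return -1 if total % 2 else 1


def quotient_is_periodic(patterns, length: int, bound: int = 256) -> bool:
    """Check that n -> seq(2n + i) * seq(n) has period 2 ** (length - 1)
    on the range n < bound, for both i = 0 and i = 1."""
    period = 2 ** (length - 1)
    for i in (0, 1):
        quotient = [seq(patterns, 2 * n + i) * seq(patterns, n)
                    for n in range(bound + period)]
        if any(quotient[n] != quotient[n + period] for n in range(bound)):
            return False
    return True
```

For example, the set {"111"} passes the check with `length=3` but fails with `length=2`, in line with the fact that the period is governed by the pattern length.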
2.3. Regular sequences
The class of -regular sequences was introduced by Allouche and Shallit [AS92, AS03b] as a natural generalization of the class of -automatic sequences.
Let be a ring contained in . A sequence is -regular if is contained in a finitely generated -module. Note that if is another ring and then any -regular sequence is also -regular. In our context, the choice of the ring does not play a major role: For the sake of brevity, we set throughout the paper and omit from the notation. (Strictly speaking we could have worked with , making some results marginally stronger.) The fact that the ring under consideration is in fact a field leads to a slightly more succinct definition of regularity: A sequence is -regular if and only if its -kernel spans a finite dimensional vector space over : .
The class of -regular sequences enjoys closure properties analogous to -automatic sequences: If are -regular, then so are , , , () and (). In particular, -regular sequences form an involutive algebra over (with addition and multiplication defined pointwise).
We will need a method to verify if a given regular sequence is identically zero. The following lemma provides a simple criterion.
Lemma 2.7.
Let be -regular and non-zero. Then there exists with .
Proof.
For the sake of contradiction, suppose that for all . We show by induction on that for all and . If then , so there is nothing to prove. If and then where and . Hence, by the inductive assumption. ∎
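The induction in the proof translates into a finite test once the sequence is given by a linear representation. The sketch below is one standard way to organise it and is not claimed to be the implementation used for the paper. We assume basis sequences b_1, ..., b_d whose images under the i-th kernel operator are given by the rows of a matrix M_i, known values b_j(0), and a sequence a equal to the combination of the b_j with coefficient vector c. Kernel elements of a then have coefficient vectors c·M_{i_1}···M_{i_r}, and a vanishes identically exactly when all vectors in their span annihilate the vector of values at 0.

```python
from fractions import Fraction


def _reduce(basis, w):
    """Reduce w against an incrementally built echelon basis."""
    w = list(w)
    for pivot, b in basis:
        if w[pivot] != 0:
            factor = w[pivot] / b[pivot]
            w = [x - factor * y for x, y in zip(w, b)]
    return w


def is_identically_zero(c, matrices, values_at_zero) -> bool:
    """Zero test for the sequence with coefficient vector c.

    matrices[i][j][l] is the coefficient of b_l in the image of b_j under
    the i-th kernel operator; values_at_zero[j] is b_j(0)."""
    d = len(c)
    basis = []  # list of (pivot index, vector), spanning the orbit of c
    todo = [[Fraction(x) for x in c]]
    while todo:
        w = _reduce(basis, todo.pop())
        pivot = next((j for j, x in enumerate(w) if x != 0), None)
        if pivot is None:
            continue  # w already lies in the span found so far
        basis.append((pivot, w))
        for M in matrices:
            todo.append([sum(w[j] * M[j][l] for j in range(d))
                         for l in range(d)])
    return all(sum(x * v for x, v in zip(w, values_at_zero)) == 0
               for _, w in basis)
```

As a usage example, take the basis (1, n) in base 2, where the operators send 1 to 1 and n to 2n + i. The sequence n vanishes at 0, yet the test reports that it is non-zero because the kernel element 2n + 1 does not vanish at 0.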
2.4. Invariant sequences
We will say that a sequence is dilation-invariant if for all . The dilation-invariant pattern sequences admit a simple description. Following the convention in Section 2.1, we will say that a set has no trailing zeros if is not a suffix of any .
Lemma 2.8.
Let be a pattern sequence. Then is dilation-invariant if and only if there exists a set that has no leading and no trailing zeros and such that .
Proof.
If has no trailing zeros then for all and , so is dilation-invariant.
Conversely, suppose that is dilation-invariant and let be a set of patterns without leading zeros such that , which exists by Lemma 2.2. Suppose for the sake of contradiction that contains a pattern ending with , say for some , and let be as short as possible. Since is dilation-invariant, we have
[TABLE]
On the other hand, each either ends in a non-zero digit (in which case ), or ends in and is not a factor of (in which case ), or is equal to (in which case and ). As a consequence,
[TABLE]
which contradicts (10) and finishes the argument. ∎
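Dilation-invariance is easy to probe numerically in base 2, where we take it to mean that the values at 2n and at n agree. The sketch below uses hypothetical helper names and a finite range of n, so a positive answer is only evidence and not a proof.

```python
def seq(patterns, n: int) -> int:
    """+1/-1 pattern sequence in base 2 with zero-padded expansions."""
    total = 0
    digits = format(n, "b") if n > 0 else ""
    for v in patterns:
        padded = "0" * (len(v) - 1) + digits
        total += sum(padded.startswith(v, i)
                     for i in range(len(padded) - len(v) + 1))
    return -1 if total % 2 else 1


def looks_dilation_invariant(patterns, bound: int = 1024) -> bool:
    """Check seq(2n) == seq(n) for all n < bound."""
    return all(seq(patterns, 2 * n) == seq(patterns, n) for n in range(bound))
```

Sets without trailing zeros, such as {"11", "101"}, pass the check, while {"10"} already fails at n = 1, since appending a zero to the expansion of 1 creates a new occurrence of the pattern.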
We also record the fact that every pattern sequence is the product of a dilation-invariant sequence and a periodic sequence. As we will see (cf. Remark 5.6) the introduction of the multiplicative factor affects the correlation coefficients in a relatively simple way, which motivates our focus on dilation-invariant sequences.
Lemma 2.9.
Let and let be a pattern sequence of length . Then there exists a unique dilation-invariant pattern sequence of length such that is -periodic.
Proof.
By Lemma 2.2, we may assume that for a set without leading zeros. Reasoning along similar lines as in the proof of Lemma 2.2, we note that for any word , we have
[TABLE]
In particular, letting , we see that the sequence is -periodic. We construct a sequence of sets , where if contains the word for some and is the first index such that no word in ends with . This construction is guaranteed to terminate because each step decreases the total length of words in that end with . Letting we observe that is the product of -periodic sequences and hence -periodic. ∎
3. Correlation coefficients
In this section we study correlation coefficients of -automatic sequences and show that they are -regular (Theorem 3.5). This allows us to reduce the task of verifying if a given -automatic sequence is noncorrelated to checking if a given -regular sequence is identically zero on , which can be accomplished with the help of Lemma 2.7.
3.1. Definitions
For two sequences , we define the correlation coefficients:
[TABLE]
if the limit exists (otherwise, is considered undefined). We are often interested in the case where , when we write in place of . Unfortunately, the limit defining is not guaranteed to converge even if and are automatic. This motivates us to consider the logarithmic correlation coefficients, defined by
[TABLE]
If and are automatic and , then the sequence is also automatic. Since Theorem 2.5 guarantees existence of logarithmic means of automatic sequences, we have the following fact.
Corollary 3.1.
Let be -automatic sequences. Then the coefficients are well-defined for all . Moreover, if the coefficient is well-defined for some then .
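For orientation, the correlation coefficients can be approximated by truncating the defining average at a finite N. The sketch below does this for the Thue–Morse and Rudin–Shapiro sequences; the function names and the cutoff are our choices, and since both sequences are real-valued the complex conjugation in the definition plays no role.

```python
def thue_morse(n: int) -> int:
    return -1 if bin(n).count("1") % 2 else 1


def rudin_shapiro(n: int) -> int:
    # One set bit of n & (n >> 1) per occurrence of the pattern 11.
    return -1 if bin(n & (n >> 1)).count("1") % 2 else 1


def correlation_estimate(a, m: int, N: int) -> float:
    """Finite-N approximation of the self-correlation of a at shift m."""
    return sum(a(n + m) * a(n) for n in range(N)) / N
```

With N = 2**14 the estimate for the Thue–Morse sequence at shift 1 is close to −1/3, while the estimates for the Rudin–Shapiro sequence at small non-zero shifts are close to 0, in line with the discussion in the introduction.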
3.2. Recurrence
Our next goal is to obtain a recursive description of the correlation coefficients discussed above. Recall that for a -automatic sequence , the kernel is finite and closed under the operators defined in (7) for all .
Lemma 3.2.
Let be a finite set of sequences , closed under the operators for all . Then for all it holds that
[TABLE]
where and are given by
[TABLE]
Proof.
Rescaling if necessary, we assume that all sequences in are -bounded (that is, for all and ). For each , splitting into residue classes modulo we obtain
[TABLE]
where and are given by (15), and we use the estimate together with the fact that is summable. Dividing by and recalling that we obtain
[TABLE]
Letting , we obtain (14). ∎
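For the Thue–Morse sequence t, whose kernel consists of t and −t, splitting the sum into residue classes modulo 2 as in the proof above gives the classical recurrence γ(2m) = γ(m) and γ(2m + 1) = −(γ(m) + γ(m + 1))/2, where γ is our notation for the correlation coefficients of t. Taking m = 0 yields the self-referential equation γ(1) = −(1 + γ(1))/2, hence γ(1) = −1/3. The following sketch evaluates the recurrence in exact rational arithmetic; it is a worked special case and not the general formula of the lemma.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def gamma_thue_morse(m: int) -> Fraction:
    """Exact correlation coefficient of the Thue-Morse sequence at shift m >= 0."""
    if m == 0:
        return Fraction(1)
    if m == 1:
        # Solution of gamma(1) = -(gamma(0) + gamma(1)) / 2.
        return Fraction(-1, 3)
    if m % 2 == 0:
        return gamma_thue_morse(m // 2)
    half = m // 2  # m = 2 * half + 1 with half >= 1, so half + 1 < m
    return -(gamma_thue_morse(half) + gamma_thue_morse(half + 1)) / 2
```

In particular the coefficients at powers of 2 all equal −1/3 and therefore do not tend to zero, which is the large self-correlation phenomenon mentioned in the introduction.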
While the coefficients are better-behaved in general, our original motivation concerns the coefficients (where additionally ). Fortunately, existence of the latter is easy to ensure under mild additional assumptions.
Lemma 3.3.
Let be a finite set of sequences , closed under the operators for all . Suppose that exists for each . Then also exists for all and and, using the notation from (15), satisfy
[TABLE]
Proof.
Rescaling if necessary, we may assume that all sequences in are -bounded. Generalizing the definition of slightly, for let us put
[TABLE]
Then, following the same reasoning as in Lemma 3.2, we find the recursive relation
[TABLE]
where and are given by (15).
In particular, for we obtain
[TABLE]
Iterating (19) times, we conclude that there exist weights () with and sequences such that
[TABLE]
Since as for each , letting in (20) we conclude that there exists a number such that
[TABLE]
It follows that the sequence () is Cauchy, and is well-defined:
[TABLE]
We are now ready to prove by induction on that the coefficients are well-defined for all and . The case is included in the assumptions, and we have dealt with above. Suppose now that . For each , since for all , we have
[TABLE]
Hence, existence of follows from (19) and the inductive assumption. Finally, to obtain (16) it remains to pass to the limit in (18) (or use Lemma 3.2 combined with the remark after Theorem 2.5). ∎
3.3. Regularity
We are now ready to show that the logarithmic correlation sequences coming from -automatic sequences are -regular. In fact, bearing in mind applications in Section 4 we record a slightly more precise statement. Recall that for a sequence , the sequence is given by . Similar ideas can be seen in [AS03b, Thm. 6].
Proposition 3.4.
Let be a finite set of sequences , closed under the operators for all . Let . Then is closed under the operators for all .
Proof.
Pick any (, ) and . It follows from Lemma 3.2 that
[TABLE]
where for each , and
[TABLE]
It remains to note that each of the functions of appearing under the sum on the right hand side of (24) belongs to . ∎
Theorem 3.5.
If is -automatic then the sequence is -regular and .
4. Verifying noncorrelation
We now discuss the practical details of how one can check if a given pattern sequence is noncorrelated. We begin by setting up the notation and adapting the general results from previous sections to the situation at hand; this is done in subsections 4.1 and 4.2. Then, in subsections 4.3 and 4.4 we discuss how the relevant computations can be performed. Finally, in subsection 4.5 we discuss the complexity of the resulting algorithm, which finishes the proof of Theorem B. Implementation of this algorithm allows us to verify Theorem A by direct computation.
4.1. Setup
Throughout this section, denotes an admissible set and denotes the corresponding pattern sequence:
[TABLE]
We also introduce the sequence given by
[TABLE]
Our task amounts to verifying that is well-defined (i.e., that the limits defining exist for all ) and determining whether it is identically zero. The existence question is easily accounted for (cf. [ZPK18, Section 3]).
Lemma 4.1.
For each and , the coefficient exists.
Proof.
By Lemma 2.6, all sequences in are products of and -periodic sequences. Hence, there is a -periodic sequence such that for all , and consequently
[TABLE]
exists. Existence of for now follows from Lemma 3.3. ∎
Recall that is -regular by Theorem 3.5. In principle, in order to decide if is identically zero, it is now enough to follow the arguments in Section 3 to describe the structure of the -kernel of and then apply Lemma 2.7. In practice, we essentially follow this route, but we also take advantage of the fact that is a -regular sequence of a rather specific form.
4.2. Recursive relations
As a first step towards describing the recursive relations that define , we introduce a set that spans , in analogy to Proposition 3.4. It will be convenient to introduce the restricted averages
[TABLE]
Note that these averages are well-defined thanks to Theorem 2.5. Additionally, it follows from Lemma 2.6 and Lemma 4.1 that the logarithmic averages can be replaced with unweighted averages:
[TABLE]
As a direct consequence of the relevant definitions, we have
[TABLE]
Proposition 4.2.
Each sequence in is a linear combination of the sequences , where , and . In particular, .
The proof of the above proposition will follow directly once we describe the behaviour of the base sequences under the operators (). To simplify this description, it will be convenient to introduce the auxiliary sequence , given by
[TABLE]
The following basic fact is analogous to [ZPK18, Lemma 2.1].
Lemma 4.3.
The sequence given by (28) is -periodic.
Proof.
Follows immediately from Lemma 2.6. ∎
Lemma 4.4.
Let , , and . If then . If then
[TABLE]
where the value of and the ranges of the summations are given by
[TABLE]
Proof.
The case follows by a standard adaptation of the proof of Lemma 3.2. Then, the case is derived using Lemma 2.4. ∎
4.3. Small shifts
Bearing in mind that we hope to apply Lemma 2.7, we need to be able to compute the values for and . This can, in principle, be accomplished by straightforward adaptations of the arguments in Lemma 3.3 and Lemma 4.1. Here, we discuss the practical details of how the computations are performed. Recall that , so we only need to compute .
For , let denote the first position where a digit distinct from appears in the base- expansion ; if for some then . We consider in nondecreasing order with respect to . We have three ranges to consider: , and
If then it follows from Lemma 4.4 that
[TABLE]
here and elsewhere, the summation over runs through . Since we can readily compute and , we can compute .
If then another application of Lemma 4.4 yields
[TABLE]
For all appearing in the above sum we have , and hence has been previously computed. Hence, again, we can directly compute .
Finally, if (meaning that ) then (31) continues to hold, and we have for all summands on the right-hand-side except for the one corresponding to . Hence, we can compute as
[TABLE]
4.4. Basis construction
Recall that our general strategy calls for a construction of a spanning set of . For technical reasons, it appears to be slightly more convenient and efficient to instead work with the potentially larger space
[TABLE]
It remains true that if and only if , and that is closed under for all . Additionally, admits a decomposition
[TABLE]
where the sequences are given by
[TABLE]
By Lemma 2.7, to show that it suffices to verify that for each , which is trivially satisfied for for all .
We proceed to construct a list of sequences which spans . Additionally, we ensure that for each , the sequence belongs to for some and we keep track of the value of . By Proposition 4.2, each has a decomposition
[TABLE]
for some coefficients , which we also keep track of. While we cannot ensure that are linearly independent (in fact, we are primarily interested in the case when ), we will ensure that for each , the (multi-)set of coefficient vectors is linearly independent.
We start by setting for ,
[TABLE]
and accordingly (, ).
Suppose next that at a certain stage we have constructed and that for all we have ensured that for all . (Initially, and .) If then is a subset of that is closed under () and under multiplication by (), hence and the construction is complete.
Let us next consider the case when . Put , and . Recall that the only value of for which could be non-zero is . If then . Hence, either , in which case is not noncorrelated and we are done; or , in which case and so as well. Suppose now that . Applying Lemma 4.4, we obtain a representation of in the form
[TABLE]
where the ranges of summation are given by , and , and the coefficients are given by explicit formulae coming from (29). Bearing in mind that , we find the decomposition
[TABLE]
where the coefficients are given by:
[TABLE]
For each , we append to the list if (and only if)
[TABLE]
If (38) holds then we also record (that is, we append to the list ) and that the decomposition of as the sum of basis sequences is given by (36) (that is, we append to the list ). Each time a new sequence is added, increases by and after all have been processed, increases by .
The linear independence condition (38) ensures that for each , there are at most values of with , and hence the construction needs to terminate after a bounded number of steps. As a result, we either find, for some , a sequence with (in which case is not noncorrelated) or we construct a finite list of sequences that spans and satisfies for all (in which case is noncorrelated). In either case, we are able to determine whether is noncorrelated.
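The append-only-if-independent rule behind the termination argument can be illustrated as follows. This is a minimal sketch, not the author's code: the class `IndependentList` and its methods are hypothetical names, and the coefficient vectors are assumed to be given as plain lists of rationals of a fixed dimension. It keeps an echelon basis in exact rational arithmetic and accepts a new vector only when it lies outside the span of the vectors accepted so far, so at most `dim` vectors are ever kept.

```python
from fractions import Fraction


class IndependentList:
    """Keeps only vectors that are linearly independent of those kept so far."""

    def __init__(self, dim):
        self.dim = dim
        self.kept = []      # accepted vectors, in order of arrival
        self._pivots = {}   # pivot column -> reduced row with leading entry 1

    def _reduce(self, vec):
        # Subtract multiples of the stored rows, in increasing pivot order,
        # so that the result vanishes at every pivot column.
        row = [Fraction(x) for x in vec]
        for col in sorted(self._pivots):
            if row[col] != 0:
                factor = row[col]
                pivot_row = self._pivots[col]
                row = [a - factor * b for a, b in zip(row, pivot_row)]
        return row

    def try_append(self, vec):
        """Append vec if it is outside the current span; report what happened."""
        row = self._reduce(vec)
        lead = next((c for c, x in enumerate(row) if x != 0), None)
        if lead is None:
            return False    # vec is a combination of the kept vectors
        inverse = 1 / row[lead]
        self._pivots[lead] = [x * inverse for x in row]
        self.kept.append(list(vec))
        return True
```

For example, in dimension 2 the vectors `[1, 2]` and `[0, 1]` are accepted, while `[2, 4]` and `[5, 7]` are rejected, so the list stops growing after two entries.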
4.5. Complexity
We now provide quantitative estimates for the amount of computational power needed to verify if the pattern sequence is noncorrelated using the method described above. Throughout, we treat as fixed, and hence are interested in the regime . It will be convenient to introduce, for a function , the shorthand to denote . Thus, for instance, addition or multiplication of two integers of size can be performed using operations.
At several points, we need to compute the values of where . For a word with length , computing directly from the definition requires operations. Since , the values and can be computed in time . Consequently, we can also compute in time .
Following the steps in subsection 4.3, we compute for all . It takes operations to write the values of () in an order consistent with . Note that each of the formulae (30), (31), (32) produces the corresponding value of using arithmetic operations on rational numbers. One can also check by a simple inductive argument that all denominators and numerators that appear in these computations are bounded by , and hence each arithmetic operation takes only basic operations. We also note that all the denominators take the form .
We next proceed to the computation of the sequences () in subsection 4.4. Strictly speaking, we compute the sequence , which uniquely determines via (33), and the auxiliary sequence . For , the explicit formula (34) allows us to compute and with operations (note that $w^{(t)}=\big(w^{(t)}_{r,e}\big)_{r,e}$ has entries, so this is the least number of operations possible).
Let us now consider the amount of computation required to compute for . Consider any , as in the iterative procedure in the second half of subsection 4.4. We note that the application of Lemma 4.4 used to compute in (35) requires no more than arithmetic operations (for each of summands in the decomposition of , we substitute a sum of size ). Once is computed, it only takes operations to compute . Then, for each of values of , in order to verify if should be appended to the list , we need to verify if the corresponding vector of coefficients belongs to a certain linear subspace of , see (38). Keeping track of how much the complexity increases in each step of the construction, we see that for each , the entries of are rational numbers whose numerators are , and whose denominators are and divide for some integer . Thus, in (38) we may scale all of the relevant vectors by a factor of , leaving us with the task of verifying if an integer-valued vector belongs to the span of other integer-valued vectors. The latter task is well-known to have polynomial complexity (with respect to dimensions and lengths of representations of entries), see e.g. [BCS97, Chpt. 16]. Hence, for each in order to decide if should be appended, we perform operations. Consequently, the number of operations needed to process the step corresponding to the index is .
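The reduction to an integer problem described above can be sketched as follows. The code clears denominators with a common multiple and then compares ranks using fraction-free (Bareiss) elimination. The choice of Bareiss elimination is an assumption made for this illustration, not the procedure of [BCS97] or of the paper, and the function names `clear_denominators`, `integer_rank` and `in_span` are hypothetical.

```python
from fractions import Fraction
from math import gcd


def clear_denominators(vectors):
    """Return (scale, integer vectors), where scale clears every denominator."""
    scale = 1
    for vec in vectors:
        for x in vec:
            d = Fraction(x).denominator
            scale = scale * d // gcd(scale, d)   # least common multiple
    scaled = [[(Fraction(x) * scale).numerator for x in vec] for vec in vectors]
    return scale, scaled


def integer_rank(rows):
    """Rank of an integer matrix via fraction-free (Bareiss) elimination."""
    m = [list(r) for r in rows]
    ncols = len(m[0]) if m else 0
    rank, prev = 0, 1
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            for j in range(col + 1, ncols):
                # The division by the previous pivot is exact in Bareiss' scheme.
                m[i][j] = (m[i][j] * m[rank][col] - m[i][col] * m[rank][j]) // prev
            m[i][col] = 0
        prev = m[rank][col]
        rank += 1
    return rank


def in_span(target, generators):
    """Decide whether target is a rational combination of the generators."""
    _, scaled = clear_denominators(list(generators) + [target])
    return integer_rank(scaled) == integer_rank(scaled[:-1])
```

Here `in_span` scales the target and the generators by the same factor, so membership in the span is unchanged, and all intermediate quantities stay integers.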
Because of the linear independence conditions discussed at the end of subsection 4.4, the total number of the sequences we construct is at most . It follows that in total, we perform at most operations.
5. Dilation-invariant sequences
We now turn to the classification of dilation-invariant pattern sequences. Throughout, let be a set of patterns with no leading or trailing zeros, and let be the corresponding pattern sequence. We also retain the notation from Section 4, specifically the coefficients defined in (25). We let denote the length of , and we assume that .
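For orientation, the objects under discussion can be experimented with numerically. The sketch below makes several assumptions that should be checked against the definitions earlier in the paper: the pattern sequence is taken to be $(-1)$ raised to the number of (possibly overlapping) occurrences of patterns in the binary expansion, all patterns start with the digit 1 so that no padding convention is needed, and correlations are only estimated by a finite average. The function names are hypothetical. A small empirical correlation at one shift is merely consistent with noncorrelation and proves nothing.

```python
def count_occurrences(n, patterns):
    """Count occurrences (overlaps allowed) of the patterns in the binary expansion of n."""
    digits = format(n, "b") if n > 0 else ""
    return sum(digits.startswith(p, i) for p in patterns for i in range(len(digits)))


def pattern_sequence(n, patterns):
    """Value (+1 or -1) of the pattern sequence at n, assuming patterns start with '1'."""
    return -1 if count_occurrences(n, patterns) % 2 else 1


def empirical_correlation(patterns, shift, length):
    """Average of a(n) * a(n + shift) over 0 <= n < length."""
    values = [pattern_sequence(n, patterns) for n in range(length + shift)]
    return sum(values[n] * values[n + shift] for n in range(length)) / length
```

With the single pattern `"11"` this gives the Rudin-Shapiro sequence, whose empirical correlation at shift 1 is close to 0, whereas the single pattern `"1"` gives the Thue-Morse sequence, whose correlation at shift 1 is close to $-1/3$, so that sequence is not noncorrelated.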
The following condition turns out to be closely connected to the question of whether is noncorrelated:
[TABLE]
Above, using the standard notation from semigroup theory, for a word and a set , we let .
Remark 5.1.
The condition ( ‣ 5) can be stated in simpler terms when . Then, necessarily, and since has no trailing zeros, . Hence, ( ‣ 5) says that for all . Because all patterns in have length , ; and because has no leading zeros, . Thus, ( ‣ 5) reduces to the statement that for all , that is, . This is precisely the assumption that appears in Theorem C.
Remark 5.2.
For general , it is not a priori clear if there exists a set of patterns such that ( ‣ 5) holds. Fix and consider the matrix $M=\big(M_{i,j}^{(u)}\big)_{i,j=0}^{k-1}$ where if and otherwise. Then ( ‣ 5) says that , where denotes the identity matrix, meaning that is a Hadamard matrix. Additionally, if or , meaning that is normalized. Conversely, given any normalized Hadamard matrix , one can easily reconstruct so that for each choice of . Thus, it is possible to satisfy the condition ( ‣ 5) if and only if there is at least one Hadamard matrix of dimension .
The question of existence of Hadamard matrices of a given dimension has long been investigated. They are easily constructed when is a power of through a tensor-power construction. More generally, given Hadamard matrices of dimensions and one can construct a Hadamard matrix of dimension . It is conjectured that Hadamard matrices exist for and all divisible by . So far, this has been confirmed for . See e.g. [CD07, Chpt. V] for further discussion.
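The tensor-power construction and the product construction mentioned above can be sketched as follows. The function names are hypothetical, and "normalized" is taken here to mean that the first row and the first column consist entirely of $+1$, which is the usual convention and is assumed to match the one intended in Remark 5.2.

```python
def kronecker(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def sylvester(m):
    """Hadamard matrix of dimension 2**m obtained as a tensor power."""
    h = [[1]]
    for _ in range(m):
        h = kronecker([[1, 1], [1, -1]], h)
    return h


def is_hadamard(h):
    """Check that h is a square +-1 matrix whose rows are pairwise orthogonal."""
    k = len(h)
    if any(len(row) != k or any(x not in (1, -1) for x in row) for row in h):
        return False
    return all(
        sum(h[i][t] * h[j][t] for t in range(k)) == (k if i == j else 0)
        for i in range(k)
        for j in range(k)
    )


def is_normalized(h):
    """Check that the first row and the first column consist of +1 only."""
    return all(x == 1 for x in h[0]) and all(row[0] == 1 for row in h)
```

For instance, `sylvester(3)` is a normalized Hadamard matrix of dimension 8, and `kronecker(sylvester(1), sylvester(2))` gives a Hadamard matrix of dimension $2 \cdot 4 = 8$ from matrices of dimensions 2 and 4.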
The main goal of this section is to prove a slightly more general variant of Theorem C. The second part of this theorem asserts that if is noncorrelated, and then ( ‣ 5) holds. This is verified by exhaustive search (code available from the author), using the methods developed in Section 4. The remaining part of Theorem C follows from the following result, whose proof will occupy the remainder of this section.
Proposition 5.3.
Suppose that ( ‣ 5) holds. Then the sequence is noncorrelated.
From this point onwards, assume that ( ‣ 5) holds. Proceeding along similar lines as in Lemma 3.3 (or Section 4.3), we will compute for small values of (). The following lemma is the main consequence of ( ‣ 5) that we use.
Lemma 5.4.
Let and , . Then
[TABLE]
Proof.
Multiplying by , we see that (39) is equivalent to
[TABLE]
For each pattern in of length and each , by considering the different positions where can appear, one can check that
[TABLE]
Conversely, if and then
[TABLE]
since , and for each
[TABLE]
Substituting the above identities into the sum on the left-hand side of (39) and applying ( ‣ 5) we conclude that
[TABLE]
Lemma 5.5.
Let and . Put . Then
[TABLE]
Proof.
Let us write with and . Then by Lemma 4.4 (or, equivalently, by Lemma 3.2) we have
[TABLE]
where as usual and . We consider several different cases.
Case 0: . It follows directly from the definition of that
[TABLE]
Case 1: and . Applying (41) and noticing that , , and , we obtain
[TABLE]
where the second equality holds because .
In all of the remaining cases, we will show that . We start with the simplest situation where .
Case 2: and , meaning that . Let denote the first position where a digit distinct from appears in the expansion of , allowing if . By (41),
[TABLE]
If then from the previously considered cases and Lemma 5.4 it follows that
[TABLE]
If then for all that enter the sum (43). Hence, reasoning by induction on we conclude that . Finally, if then , and for all that appear in the sum (43) except for . It follows that
[TABLE]
which is only possible if .
Case 3: and . By (41) and Case 2,
[TABLE]
Case 4: . By (41),
[TABLE]
Let and . Note that for all in the sum in (45), where we are using the fact that . We have several subcases to consider. If then
[TABLE]
by Cases 0 and 1 and Lemma 5.4. If while (i.e. or ) then for all by Cases 2 and 3, and consequently also . Finally, if (i.e. and ) then
[TABLE]
by the previously considered subcases.
Case 5: . We reason by induction on . By (41) and the inductive assumption,
[TABLE]
since . ∎
Now that we have computed the values of the coefficients , the remainder of the argument is straightforward.
Proof of Proposition 5.3.
We need to show that
[TABLE]
for all . If there is nothing to prove since . Suppose now that . We may write arbitrary in the form where and . Then, if and otherwise. It follows that
[TABLE]
where the inner-most sum vanishes by Lemma 5.4. ∎
Remark 5.6.
Let be a sequence such that is -periodic. Then is a pattern sequence by Lemma 2.2. Defining and in analogy to and , with in place of , by a direct computation we show for all and that
[TABLE]
It follows that for all . In particular, for all .
We check by exhaustive search that all noncorrelated binary pattern sequences of length can arise as in the construction outlined above. It seems plausible that the same holds for all lengths. If this is the case, and if Conjecture 1.1 holds true, then the task of verifying if a given binary pattern sequence is noncorrelated can be split into two independent steps: First, check if the dilation-invariant sequence obtained from in Lemma 2.9 satisfies ( ‣ 5); if not, then is not noncorrelated (for the sake of simplicity, we work under the additional assumption that and have equal lengths, which is not true in general). Second, check if the signs in (the analogue of) (47) align in a way that ensures . While the condition from the first step is quite conceptual, it appears that the second step relies mostly on arithmetic coincidence. This would provide an intuitive explanation for why the results in the dilation-invariant case are considerably more concise.
- [AL91] J.-P. Allouche and P. Liardet. Generalized Rudin-Shapiro sequences. Acta Arith., 60(1):1–27, 1991.
- [AS92] J.-P. Allouche and J. Shallit. The ring of $k$-regular sequences. Theoret. Comput. Sci., 98(2):163–197, 1992.
- [AS99] J.-P. Allouche and J. Shallit. The ubiquitous Prouhet-Thue-Morse sequence. In Sequences and their applications (Singapore, 1998), Springer Ser. Discrete Math. Theor. Comput. Sci., pages 1–16. Springer, London, 1999.
- [AS03a] J.-P. Allouche and J. Shallit. Automatic sequences. Cambridge University Press, Cambridge, 2003. Theory, applications, generalizations.
- [AS03b] J.-P. Allouche and J. Shallit. The ring of $k$-regular sequences. II. Theoret. Comput. Sci., 307(1):3–29, 2003. Words.
- [BCM89] D. W. Boyd, J. Cook, and P. Morton. On sequences of $\pm 1$'s defined by binary patterns. Dissertationes Math. (Rozprawy Mat.), 283:64, 1989.
- [BCS97] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1997. With the collaboration of Thomas Lickteig.
- [CD07] C. J. Colbourn and J. H. Dinitz, editors. Handbook of combinatorial designs. Discrete Mathematics and its Applications (Boca Raton). Chapman & Hall/CRC, Boca Raton, FL, second edition, 2007.
