The Kolakoski sequence and related conjectures about orbits
Bobby Shen

Abstract.
The Kolakoski sequence is the unique infinite sequence with values in $\{1,2\}$ and first two terms $1, 2$ which equals the sequence of run-lengths of itself; we call this $K(1,2)$. We define $K(m,n)$ similarly for $m+n$ odd. The focus of this paper is not on well-known conjectures about limiting densities but rather on conjectures which are more discrete in nature.
We define two functions, and which are naturally encountered when studying iterated run-length expansion. (These were probably introduced by V. Chvatal in [2], but we currently cannot find the paper.) We conjecture that a certain doubly infinite family of finite sequences has odd length for all and even We prove that this statement is equivalent to orbits of certain functions being as large as possible. We empirically verify this for all even and
Bobby Shen, Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, [email protected]
2010 Mathematics Subject Classification: 05A99
Keywords: Kolakoski sequence, Recurrence
1. Introduction
The Kolakoski sequence is the unique infinite sequence with values in $\{1,2\}$ and first two terms $1, 2$ which equals the sequence of run-lengths in the run-length encoding of itself. See [3] for one definition of run-length encoding. Existence and uniqueness are relatively easy to prove. The Kolakoski sequence begins $1, 2, 2, 1, 1, 2, 1, 2, 2, 1, \ldots$ There are many open problems associated with the Kolakoski sequence. Perhaps the most famous conjecture is that the limiting density of “1” in the Kolakoski sequence equals one-half.
The Kolakoski sequence can be defined for other “alphabets” $\{m,n\}$ where $m$ and $n$ are distinct positive integers. In particular, we define $K(m,n)$ to be the unique infinite sequence with values in $\{m,n\}$ whose first term is $m$ and which equals the sequence of run-lengths of itself. When $m+n$ is even, the sequence is much easier to understand because $K(m,n)$ is realized as a fixed point of substitution rules. Therefore, we focus on the case in which $m+n$ is odd.
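As a concrete illustration of this definition, the following Python sketch generates a prefix of the self-describing sequence on a two-letter alphabet and checks that its run lengths reproduce the sequence. The function names `kolakoski` and `run_lengths` are illustrative choices, not notation from this paper.

```python
from itertools import groupby


def run_lengths(seq):
    """Return the lengths of the maximal runs of equal terms in seq."""
    return [len(list(group)) for _, group in groupby(seq)]


def kolakoski(m, n, length):
    """Return the first `length` terms of the sequence on {m, n} that
    starts with m and equals its own sequence of run lengths."""
    seq = []
    symbol = m  # letter written in the current run
    i = 0       # index of the term that dictates the current run length
    while len(seq) < length:
        # If term i is not written yet, the run about to be written starts
        # at index i, so term i is the current letter itself.
        run = seq[i] if i < len(seq) else symbol
        seq.extend([symbol] * run)
        symbol = m + n - symbol  # switch to the other letter
        i += 1
    return seq[:length]


prefix = kolakoski(1, 2, 10)
print(prefix)
```

Because the requested prefix may cut the final run short, a self-consistency check should compare all run lengths except the last one against the start of the sequence.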
In 1993, V. Chvatal proved that the superior and inferior limits of the density of in the first terms are bounded by Unfortunately, we currently cannot find the paper online. However, we believe that he proved the intermediate result that in an infinite sequence with values in and whose first run-length encodings all have values in , the superior and inferior limits of the density of in the first terms are bounded by (We believe this because the author did a math research project with Yongyi Chen and Michael Yan in Spring 2015 at MIT. We independently used this intermediate idea and found that the density is bounded by which we think is unlikely to coincide with Chvatal’s bound by pure chance.) The density is also bounded by if one replaces “20” by “26.” These results require a fair amount of computational power and memory. See a semi-formal summary of our methods at [1].
We think Chvatal proved this result by introducing functions at least for These functions are naturally encountered when discussing the Kolakoski sequence and iterated run-length encoding. We reproduce their definitions here. The functions have recursive definitions, and the functions may be interesting in their own right.
One remarkable conjecture regarding the function is the following. First, we note that for all sequences the function is a length-preserving bijection over the set Thus, it makes sense to discuss the orbit of this function. We conjecture that for all and even , the orbit of the sequence with ones under the function has size On the other hand, we prove that the size of the orbit of a sequence with length under the function divides In this sense, the sequence has the maximum possible orbit length under We also prove that this conjecture is equivalent to the following: For all and even the sequence has odd length. In this sense, this conjecture states that a certain doubly infinite family of sequences all have odd length. This conjecture is fundamentally different from the usual conjectures about limiting densities in the Kolakoski sequence.
Using a lot of computational power and a lemma, we have empirically verified this conjecture for all and even . (The lemma reduces this particular infinite set of cases to a finite set.)
1.1. Outline
Section 2 introduces two auxiliary functions and which are naturally encountered when discussing the Kolakoski sequence and iterated run-length encoding. In Section 3, we prove basic facts about these orbits, formulate our three main conjectures, and prove that two of them are equivalent. Section 4 discusses our algorithms for verifying many small cases of these conjectures.
1.2. Notation
All numbers in this paper are positive integers except when otherwise specified. If is a sequence, then its terms are The length of is Sometimes, we will use etc. to denote different sequences. If and are sequences with finite, then is the concatenation of (first) and If is a finite sequence, is a number, and is a positive integer, then is the concatenation of copies of and is the sequence of length , all of whose terms are The complement of a sequence with values in means the unique sequence of the same length with values in which is termwise different. The notation means the set of sequences of length with values in The notation means the set of finite sequences with values in
Henceforth, a subsequence of a sequence will always mean a subsequence of consecutive elements.
2. Iterated run-length encoding and the functions and
Let be distinct positive integers. Recall that is the unique infinite sequence with values in and first term which equals the sequence of run lengths of itself. It is clear that the infinite sequence has local structure. Therefore, it is natural to try to express as an infinite concatenation of finite sequences in a meaningful way. In this section, we develop one way of achieving this, and in doing so, we will define the functions and
We define the function (the “run-length” function) such that for any possibly infinite sequence is the sequence of run lengths of Note that
Because we are focusing on “alphabets” of size the function almost has an inverse. To be precise, given and a possibly infinite sequence of positive integers there are exactly two different possible values of with values in such that The function also almost has an inverse. To be precise, given a possibly infinite sequence of positive integers there are exactly four different possible values of such that both and have values in and These four values of are realized by first choosing a “starting point” for , either “m” or “n,” then choosing a starting point for again either “m” or “n” and unrelated. This motivates the following definition.
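The two counting claims above can be checked directly on a small example. The sketch below builds the two preimages of a sequence under the run-length function and the four preimages under its second iterate; the names `run_lengths` and `expand_once` are illustrative and not notation from this paper.

```python
from itertools import groupby, product


def run_lengths(seq):
    """Return the lengths of the maximal runs of equal terms in seq."""
    return [len(list(group)) for _, group in groupby(seq)]


def expand_once(s, start, m, n):
    """Return the sequence on {m, n} whose first term is `start`
    and whose run lengths are the positive integers in s."""
    out = []
    symbol = start
    for run in s:
        out.extend([symbol] * run)
        symbol = m + n - symbol  # consecutive runs use different letters
    return out


m, n = 1, 2
s = [1, 2, 1]

# One choice of starting point gives one preimage under the run-length map.
one_step = [expand_once(s, a, m, n) for a in (m, n)]

# Two independent choices of starting point give the preimages under
# the run-length map applied twice.
two_step = [
    expand_once(expand_once(s, a, m, n), b, m, n)
    for a, b in product((m, n), repeat=2)
]

print(one_step)
print(len({tuple(x) for x in two_step}))
```

The two one-step preimages are termwise complements of each other, which is why their run lengths coincide.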
Definition 2.1.
Let We define the function (the “expansion” function) which maps pairs of sequences to sequences as follows. Let be a possibly infinite sequence of positive integers, usually but not necessarily with values in Let be a finite, nonempty sequence of positive integers with values in
If has length then we define to be the unique sequence with values in and whose first term equals and such that
If has length greater than then let be without its first term. Then we define to equal where we regard the number as a length 1 sequence. In other words, is the unique sequence such that
- Applying the run-length function once for each starting point recovers the original sequence.
- For has values in .
- For the first term of equals . (Note the reversal.)
Remark.
Informally, we call the “expansion of the sequence with starting points in .” Note that is the first starting point used when expanding, is the second, etc. We insist on having the subscripts in because later, we will discuss can be an arbitrary integer greater than and this expression depends on
Remark.
Recall that For any we have and the first term of equals Therefore,
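The expansion with several starting points, and the remark that the self-describing sequence is reproduced by expanding it with starting points all equal to its first term, can be sketched in Python as follows. The function names are illustrative, and the order in which starting points are consumed (first entry used first) follows the remark after Definition 2.1.

```python
from itertools import groupby


def run_lengths(seq):
    """Return the lengths of the maximal runs of equal terms in seq."""
    return [len(list(group)) for _, group in groupby(seq)]


def expand_once(s, start, m, n):
    """Sequence on {m, n} with first term `start` and run lengths s."""
    out = []
    symbol = start
    for run in s:
        out.extend([symbol] * run)
        symbol = m + n - symbol
    return out


def expand(s, starts, m, n):
    """Expand s once per entry of `starts`, using starts[0] first."""
    current = list(s)
    for a in starts:
        current = expand_once(current, a, m, n)
    return current


def kolakoski(m, n, length):
    """First `length` terms of the self-describing sequence on {m, n}."""
    seq, symbol, i = [], m, 0
    while len(seq) < length:
        run = seq[i] if i < len(seq) else symbol
        seq.extend([symbol] * run)
        symbol = m + n - symbol
        i += 1
    return seq[:length]


# Expanding a prefix with starting points (1, 1, 1) gives a longer prefix.
short = kolakoski(1, 2, 10)
longer = expand(short, [1, 1, 1], 1, 2)
print(len(short), len(longer))
```

Undoing the expansion one run-length step at a time exposes the starting points in reverse order, which is the reversal noted in the definition.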
It is natural to consider what happens when we expand a concatenation of two sequences with a single set of starting points: First suppose that has length It’s easy to see that the function is the concatenation of two sequences, such that for The first term of equals the first term of which is The first term of is different from the last term of ; specifically, one term is and the other is in an unspecified order. Note that the last term of only depends on and ; specifically, the last term of equals if is odd, and this last term equals if is even. Combining these observations, we have
[TABLE]
Next consider where has length (where is regarded as a sequence of length ) can be characterized as in the previous paragraph. Let be as in the previous paragraph. It is easy to see that is the concatenation of two sequences, such that The first term of equals the first term of which equals The first term of is different from the last term of Note that and Therefore, Also, the last term of only depends on and . (To be very precise, the last term of doesn’t depend on but we don’t need to be that precise.)
The discussion in the previous two paragraphs motivates the following definition and proposition.
Definition 2.2.
We define the “torsion” function which maps pairs of finite sequences to finite sequences as follows. Let be a finite sequence with values in and be a finite sequence, usually with values in as well. For let be the sequence (of length ) which is the first terms of in order. We define to be a sequence with values in of the same length as such that for the term of equals minus the last term of In other words, is the complement of the sequence of the last terms in the intermediate expansions of
Informally, we call the “torsion of the sequence by the sequence .”
Proposition 2.3.
Let be a finite sequence with values in be a finite sequence of positive integers, and be an arbitrary sequence of positive integers. Then
[TABLE]
In other words, the expansion of the concatenation by equals the expansion of by concatenated with the expansion of by the torsion of by
The proof of this proposition is basically induction on the length of
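Proposition 2.3 can be sanity-checked by brute force on short sequences. The Python sketch below implements the torsion as described in Definition 2.2 (the complement of the last terms of the intermediate expansions) and compares both sides of the concatenation rule. The function names, the argument order, and the test alphabet sizes are assumptions made for illustration.

```python
from itertools import product


def expand_once(s, start, m, n):
    """Sequence on {m, n} with first term `start` and run lengths s."""
    out = []
    symbol = start
    for run in s:
        out.extend([symbol] * run)
        symbol = m + n - symbol
    return out


def expand(s, starts, m, n):
    """Expand s once per entry of `starts`, using starts[0] first."""
    current = list(s)
    for a in starts:
        current = expand_once(current, a, m, n)
    return current


def torsion(s, t, m, n):
    """Torsion of t by the nonempty sequence s: entry i is m + n minus the
    last term of the expansion of s by the first i + 1 terms of t."""
    out = []
    current = list(s)
    for a in t:
        current = expand_once(current, a, m, n)
        out.append(m + n - current[-1])
    return out


def check_concatenation_rule(m, n, max_len=2):
    """Compare expand(s + u, t) with expand(s, t) + expand(u, torsion(s, t))
    for all short s, u with entries in {1, 2, 3} and all short t."""
    for ls, lu, lt in product(range(1, max_len + 1), repeat=3):
        for s in product((1, 2, 3), repeat=ls):
            for u in product((1, 2, 3), repeat=lu):
                for t in product((m, n), repeat=lt):
                    left = expand(s + u, t, m, n)
                    right = expand(s, t, m, n) + expand(
                        u, torsion(s, t, m, n), m, n)
                    if left != right:
                        return False
    return True


print(check_concatenation_rule(1, 2), check_concatenation_rule(2, 3))
```

The torsion is computed in a single pass because each intermediate expansion is the one-step expansion of the previous one.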
Here are some further useful propositions involving and where we repeat the previous proposition for convenience.
Proposition 2.4.
Let and be a finite sequence, be an arbitrary sequence, and be finite sequences with values in
[TABLE]
We omit the proofs of these statements. There are some semi-formal proofs provided in [1].
3. A conjecture about orbits of
For a fixed finite sequence and integer the function maps to itself. In this section, we discuss the nature of this function, such as bijectivity. We also formulate a conjecture about the orbit lengths of this function.
Proposition 3.1.
Let be any finite sequence, and Let Then the sequences agree in their first terms iff the two sequences agree in their first terms.
Proof.
The terms of the sequences are determined by the intermediate expansions of The first digits of are determined by the first intermediate expansions of Therefore, if agree in their first terms, then agree in their first terms.
On the other hand, suppose that agree in their first terms but not their first terms. As in the previous paragraph, agree in their first terms. The intermediate expansions of and are expansions of the same sequence, namely the common intermediate expansion of but with different starting points: One starting point is and the other starting point is in an arbitrary order. Therefore, the two expansions are exactly “complement sequences” i.e. the two sequences have the same length and elementwise add to Therefore, these two sequences must have different last elements, and the two sequences have different elements at index ∎
Setting we see that is a bijection on Therefore, it makes sense to discuss the orbits and orbit lengths of The following proposition shows that the orbit lengths must be powers of
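Both the prefix statement of Proposition 3.1 and the resulting bijectivity can be observed on a small example. The Python sketch below enumerates all starting-point sequences of length 4 over the alphabet {1, 2} for one hypothetical choice of the fixed sequence; the names `torsion` and `common_prefix_length` are illustrative.

```python
from itertools import product


def expand_once(s, start, m, n):
    """Sequence on {m, n} with first term `start` and run lengths s."""
    out = []
    symbol = start
    for run in s:
        out.extend([symbol] * run)
        symbol = m + n - symbol
    return out


def torsion(s, t, m, n):
    """Complement of the last terms of the intermediate expansions of s."""
    out = []
    current = list(s)
    for a in t:
        current = expand_once(current, a, m, n)
        out.append(m + n - current[-1])
    return out


def common_prefix_length(a, b):
    """Number of leading positions in which a and b agree."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


m, n = 1, 2
s = [2, 1, 3]   # hypothetical fixed sequence
length = 4
inputs = list(product((m, n), repeat=length))
images = {t: tuple(torsion(s, t, m, n)) for t in inputs}

is_bijection = sorted(images.values()) == sorted(inputs)
prefixes_preserved = all(
    common_prefix_length(t, u) == common_prefix_length(images[t], images[u])
    for t in inputs for u in inputs
)
print(is_bijection, prefixes_preserved)
```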
Proposition 3.2.
Let be any finite sequence, and The orbit lengths of the map on the set are all powers of
Proof.
For shorthand, let be the function Suppose that is one orbit under of length . Write in the form where is odd. Assume, for the sake of contradiction, that is not a power of Then and
Consider the two sequences Since these two sequences are different elements of so they do not agree in their first elements. Let be maximal so that these two sequences agree in their first terms. Then By Proposition 3.1, the two sequences agree in their first terms but not their first terms. These two sequences equal by construction. By induction, we have that for the two sequences agree in their first terms but not their first terms. In particular, consider
We have the sequences with values in Any two consecutive elements agree in their first terms but not their first terms. Therefore, all terms have the same first terms, but their index terms alternate. The quantity is odd, so there is an odd number of alternations from to This is a contradiction because these two sequences are actually equal (hence have the same index term). Therefore, must be a power of ∎
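Proposition 3.2 is easy to observe numerically. The following Python sketch computes orbit lengths by direct iteration and checks that each is a power of two; the helper names and the small ranges of test sequences are assumptions chosen to keep the run short.

```python
from itertools import product


def expand_once(s, start, m, n):
    """Sequence on {m, n} with first term `start` and run lengths s."""
    out = []
    symbol = start
    for run in s:
        out.extend([symbol] * run)
        symbol = m + n - symbol
    return out


def torsion(s, t, m, n):
    """Complement of the last terms of the intermediate expansions of s."""
    out = []
    current = list(s)
    for a in t:
        current = expand_once(current, a, m, n)
        out.append(m + n - current[-1])
    return out


def orbit_length(s, t, m, n):
    """Smallest k >= 1 such that applying the torsion by s k times fixes t.
    The loop terminates because the map is a bijection on a finite set."""
    start = tuple(t)
    current = tuple(torsion(s, start, m, n))
    steps = 1
    while current != start:
        current = tuple(torsion(s, current, m, n))
        steps += 1
    return steps


def is_power_of_two(k):
    return k >= 1 and k & (k - 1) == 0


lengths_seen = set()
for ls in (1, 2):
    for s in product((1, 2, 3), repeat=ls):
        for l in range(1, 5):
            for t in product((1, 2), repeat=l):
                lengths_seen.add(orbit_length(s, t, 1, 2))
print(sorted(lengths_seen))
```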
The following proposition uses another characterization of orbit lengths to give an upper bound.
Proposition 3.3.
Let be any sequence, and The size of the orbit of under the map is at most
Proof.
We proceed by induction on Our base cases are If then the set has size so an orbit has size at most as desired.
If then we must show that By identity (3), this is equivalent to showing that Let To show that the first terms of and are equal, observe that has an even number of terms, so has an even number of runs, so For the second terms, observe that Both subsequences on the right hand side have the same number of terms (since they are either equal or complements), so the left hand side has an even number of terms, so has an even number of runs, so
Now assume that and that the proposition is true for smaller Let We must show that (This is a repeated application of identity (3).) Let be the sequence excluding its last two terms so that By the inductive hypothesis, Therefore, and the sequence agrees with in all terms except possibly the last two. To verify that the second-to-last terms are equal, we need to check that has even length. Indeed,
[TABLE]
To verify that the last terms are equal, we need to check that has even length, where is the sequence excluding just its last term. Indeed,
[TABLE]
and as in the case, an expansion of the form has even length if .
∎
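The proof above treats lengths 1 and 2 directly and then removes two terms per inductive step, which suggests the bound $2^{\lceil l/2 \rceil}$ for starting-point sequences of length $l$. The Python sketch below tests divisibility by that quantity on small cases. The exponent is an assumption read off from the structure of the proof, and the function names are illustrative.

```python
from itertools import product


def expand_once(s, start, m, n):
    """Sequence on {m, n} with first term `start` and run lengths s."""
    out = []
    symbol = start
    for run in s:
        out.extend([symbol] * run)
        symbol = m + n - symbol
    return out


def torsion(s, t, m, n):
    """Complement of the last terms of the intermediate expansions of s."""
    out = []
    current = list(s)
    for a in t:
        current = expand_once(current, a, m, n)
        out.append(m + n - current[-1])
    return out


def orbit_length(s, t, m, n):
    """Smallest k >= 1 such that applying the torsion by s k times fixes t."""
    start = tuple(t)
    current = tuple(torsion(s, start, m, n))
    steps = 1
    while current != start:
        current = tuple(torsion(s, current, m, n))
        steps += 1
    return steps


def assumed_bound(l):
    """Assumed upper bound 2 ** ceil(l / 2) for sequences of length l."""
    return 2 ** ((l + 1) // 2)


def bound_holds(m, n, max_s_len=2, max_l=5):
    """Check that every orbit length divides the assumed bound."""
    for ls in range(1, max_s_len + 1):
        for s in product((1, 2, 3), repeat=ls):
            for l in range(1, max_l + 1):
                for t in product((m, n), repeat=l):
                    if assumed_bound(l) % orbit_length(s, t, m, n) != 0:
                        return False
    return True


print(bound_holds(1, 2), bound_holds(2, 3))
```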
The next natural question is whether there are stronger bounds on the sizes of orbits in Unfortunately, lower bounds on orbits remain quite mysterious, but we have startling conjectures that imply that our upper bound is tight.
Conjecture 3.4.
Let be even, and Then the orbit of under the map has length
The following proposition provides a somewhat more concrete equivalent statement.
Proposition 3.5.
Let be even. Conjecture 3.4 for the case is equivalent to the statement that the sequence has odd length for all
Proof.
Fix even. First, we show that Conjecture 3.4 implies that has odd length for all
By Conjecture 3.4, the orbit of under has length and the orbit of has length
The orbit of must be at least as big as the orbit of which has length By Proposition 3.3, the orbit of has length at most Therefore, the length is exactly
We have because the orbit has length but by the previous paragraph, the left hand side is a sequence which begins with at least ones, so The fact that the last term is flipped means that the penultimate expansion has odd length, as desired.
Now, we show the converse. Assume that has odd length for all We now prove Conjecture 3.4 by induction on The base case is which is trivial.
Suppose that and that Conjecture 3.4 is true for We must show that the orbit of has length The orbit of is at least as large as the orbit of which has length by the inductive hypothesis. On the other hand, Proposition 3.3 states that the orbit has length at most Since the orbit length is a power of it suffices to show that the orbit length does not divide
Assume for the sake of contradiction that this is not true. Equivalently, we are assuming that Then begins with at least ones. We compute
[TABLE]
Since is equal to either or the two subsequences on the right hand side are either identical or complements. In either case, the left hand side has even length. This contradicts our assumption that has odd length. Therefore, in fact the orbit length of is exactly as desired. ∎
Here is a conjecture similar to Conjecture 3.4 which involves a generalization to To formulate the case, we must first explain in what sense negative values of and/or terms in the sequence are well-defined.
Let be arbitrary integers. We will only be considering the function , not We wish to define where is a finite sequence with values in We can first define for possibly negative integers (or sequences of length ) then use identity (3).
If has length then we define Otherwise, suppose that is nonempty, and let be the sequence without its first term, so that By identity (4),
[TABLE]
The first subsequence, , is easily seen to be the single term or the complement of . In the case, it makes sense to write Then by identity (3), we have
[TABLE]
Observe that the right hand side makes sense for arbitrary provided that we define inductively based on the length of Indeed, this is our definition.
Definition 3.6.
Let be distinct integers, possibly nonpositive, be a finite sequence of integers, possibly nonpositive, and be a finite sequence with values in We define as follows.
We first define for integers then use identity (3) to define We define inductively based on the length of If has length then the output is also the empty sequence. Otherwise, write Then we define
[TABLE]
Note that if is negative, we are using the fact that is a length-preserving bijection.
With this definition, we now formulate the following conjecture.
Conjecture 3.7.
Let be even and possibly nonpositive. Let Let be the sequence Let be either the sequence or the sequence which are both length sequences. Then the orbit of under the map has length
One can show that the upper bound in Proposition 3.3 applies to the nonpositive integer case. This conjecture is not associated with an odd-expansion analog like Proposition 3.5.
4. Empirical evidence for the conjectures for
The two conjectures stated above are quite mysterious to us, and in fact our only compelling reason for believing them is our extensive empirical evidence. A much less compelling reason is that both conjectures are true for In the case, the map is the identity map, so the computations become much simpler, and the claims reduce to straightforward induction arguments. We omit the details.
First, we show that, within each of Conjectures 3.4 and 3.7, certain cases are equivalent to one another.
Proposition 4.1.
Let and be even. Then for all even integers such that divides the cases of Conjecture 3.4 are equivalent to each other. Also, for all such the cases of Conjecture 3.7 are equivalent to each other. We make no claim of equivalence between the two conjectures nor any claim that the conjectures are true in these cases.
Proof.
Fix Let be the set of even integers such that In the rest of this proof, is restricted to be in
Let be a function such that for any sequence replaces all occurrences of terms which are not equal to with Let be a function such that for any sequence with values in and replaces all occurrences of [math] with
We claim that if , or and is in , then the value is independent of We proceed by induction on The base case, is easy since switches and for all
Suppose that is independent of over the set Let be in and be in The first term of is independent of because always “flips” the first term of It remains to check that terms after the first are independent of
Let be without its first term. By identity (4), the sequence , excluding the first term, equals
[TABLE]
The sequence , excluding the first term, equals
[TABLE]
Note that either for all or for all Also, either for all or for all
If for all then the function becomes which is independent of when applied to because has length and the “base” is independent of over length sequences.
If for all we must prove that is independent of when applied to length sequences, but we have more work to do because the number of iterations depends on We now use Proposition 3.3, which states that an orbit has length dividing In this case, we are concerned with length sequences, and so orbits have length dividing Therefore, is the identity. Since values of in all have the same residue is independent of over length sequences. This completes the proof that is independent of over length sequences.
In the above claim, we now specialize to . Note that for all and are inverses between and We can intertwine these bijections with the set automorphism over to get some other automorphism on the set These two set automorphisms are essentially equivalent. In particular, the length of the orbit of under the first automorphism equals the length of the orbit of under the second automorphism. Since is independent of , the orbit lengths of are independent of over length sequences, so the statement of Conjecture 3.4 (if ) or Conjecture 3.7 (if ) is independent of ∎
4.1. Explicit computation to verify the conjectures
Everything in this section is implemented in C++.
Fix an integer in In this subsection, we discuss how we explicitly compute and for in order to verify Conjectures 3.4 and 3.7.
We recursively compute and completely store the functions and for arguments of length at most . This data is stored as two vectors of vectors of unsigned integers, and with the following convention.
Definition 4.2.
We define the function as follows. The domain of is pairs with and For and , is given by converting to a binary string padded to length reversing to form the binary string turning into the sequence and replacing the values with the values respectively in to form the sequence
Thus is a bijection from to We define to be a sort of inverse: If is in then where
Example. Let The integer as a binary string of length is so Then and is the sequence Finally, is the sequence Likewise,
Remark.
Note that there is a reversal of bits. Of course, the algorithm would work fine if one consistently did not reverse bits.
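The encoding of Definition 4.2 can be sketched in Python as follows. This is an illustration rather than the C++ implementation used for the computations: the names `index_to_sequence` and `sequence_to_index` are hypothetical, and the sketch assumes that the bit 0 is replaced by m and the bit 1 by n.

```python
def index_to_sequence(i, l, m, n):
    """Decode the integer i (0 <= i < 2**l, l >= 1) into a length-l
    sequence on {m, n}: write i in binary padded to l digits, reverse the
    digits, then map 0 -> m and 1 -> n (assumed convention)."""
    if l < 1 or not 0 <= i < 2 ** l:
        raise ValueError("need l >= 1 and 0 <= i < 2**l")
    bits = format(i, "b").zfill(l)[::-1]
    return [n if b == "1" else m for b in bits]


def sequence_to_index(seq, m, n):
    """Inverse of index_to_sequence: term j contributes 2**j when it is n."""
    return sum(1 << j for j, x in enumerate(seq) if x == n)


example = index_to_sequence(6, 4, 1, 2)
print(example, sequence_to_index(example, 1, 2))
```

With the reversal, the first term of the sequence is the least significant bit, so dropping the first term of a sequence corresponds to an integer division by 2.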
For and equals and equals Note that both of these values are integers.
The recursion formulas are essentially identity (4). In particular, let be a sequence of length at most , and let be without its first term. We have
[TABLE]
The relations among sequences are readily converted to relations among integers. For example, is an integer which is given by the integer quotient The hardest part is dealing with Evaluating this the naive way introduces a factor of into the runtime, which is too much. Instead, we decompose the permutation induced by on into cycles in time We can then exponentiate a cyclic permutation in time which grows negligibly with
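The cycle-decomposition idea can be sketched in Python as follows. This is a generic illustration, not the authors' C++ code: a permutation of {0, ..., N-1} is stored as a list, decomposed into cycles in time linear in N, and raised to an arbitrary integer power by rotating each cycle. The names `cycles_of` and `permutation_power` are hypothetical.

```python
def cycles_of(perm):
    """Decompose a permutation, given as a list with perm[i] the image
    of i, into its disjoint cycles."""
    seen = [False] * len(perm)
    cycles = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle = []
        j = start
        while not seen[j]:
            seen[j] = True
            cycle.append(j)
            j = perm[j]
        cycles.append(cycle)
    return cycles


def permutation_power(perm, k):
    """Return perm composed with itself k times. Negative k gives powers
    of the inverse, because Python's % always returns a value in
    range(len(cycle))."""
    result = [0] * len(perm)
    for cycle in cycles_of(perm):
        size = len(cycle)
        for idx, x in enumerate(cycle):
            result[x] = cycle[(idx + k) % size]
    return result


perm = [1, 2, 0, 4, 3]  # one 3-cycle and one 2-cycle
print(permutation_power(perm, 3))
print(permutation_power(perm, -1))
```

The cost of one exponentiation is linear in the size of the permutation and does not grow with the exponent beyond the cost of the modular reductions, and the same routine yields inverses by passing a negative exponent.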
We then compute the length of the orbit of for under by repeatedly calling As such, we have empirically verified Conjecture 3.4 for all even with and all In view of Proposition 4.1, we have also verified the conjecture for all even and all We have also performed the same empirical verifications for Conjecture 3.7. Note that when computing for Conjecture 3.7, we must compute inverses of maps. This is readily done using the cycle decomposition.
We have also used ad-hoc methods to verify the conjecture for and Essentially, we expand for some small as an intermediate step. This is not feasible for because the intermediate sequences would be too long to be useful.
We conclude by admitting that verifying the conjectures for larger even would use all of our random access memory.
5. Acknowledgements
The author originally researched this problem with Yongyi Chen and Michael Yan as a part of MIT’s class Math Project Lab in Spring 2015. Our main paper for this project is at [1]. Conjecture 3.4 was first observed by Yongyi Chen for
References
- [1] Chen, Yongyi, Bobby Shen, and Michael Yan. https://www.overleaf.com/4401114 cgstjy
- [2] Chvatal, Vasek. “Notes on the Kolakoski Sequence.” Technical Report 93-84, DIMACS. http://dimacs.rutgers.edu/Technical Reports/abstracts/1993/93-84.html . Unfortunately, we currently cannot find the actual paper.
- [3] http://mathworld.wolfram.com/Run-Length Encoding.html
