On the longest common subsequence of Thue-Morse words

Joakim Blikstad

arXiv:1904.00248·cs.DM·December 3, 2019


TL;DR

This paper investigates the length of the longest common subsequence between Thue-Morse words and their bitwise complements, establishing that the ratio of this length to the word length tends to 1 as the words grow.

Contribution

The paper provides new lower bounds on the longest common subsequence length, showing that its ratio to the word length tends to 1, and generalizes the results to every prefix of the Thue-Morse sequence.

Findings

1. The ratio of the longest common subsequence length to the word length tends to 1 as n increases

2. Constructed explicit common subsequences for lower bounds

3. Generalized bounds to any prefix of the Thue-Morse sequence

Abstract

The length $a (n)$ of the longest common subsequence of the $n$ 'th Thue-Morse word and its bitwise complement is studied. An open problem suggested by Jean Berstel in 2006 is to find a formula for $a (n)$ . In this paper we prove new lower bounds on $a (n)$ by explicitly constructing a common subsequence between the Thue-Morse words and their bitwise complement. We obtain the lower bound $a (n) = 2^{n} (1 - o (1))$ , saying that when $n$ grows large, the fraction of omitted symbols in the longest common subsequence of the $n$ 'th Thue-Morse word and its bitwise complement goes to $0$ . We further generalize to any prefix of the Thue-Morse sequence, where we prove similar lower bounds.



Full text

On the longest common subsequence of Thue-Morse words

Joakim Blikstad


University of Waterloo, Canada

Abstract

The length $a(n)$ of the longest common subsequence of the $n$’th Thue-Morse word and its bitwise complement is studied. An open problem suggested by Jean Berstel in 2006 is to find a formula for $a(n)$. In this paper we prove new lower bounds on $a(n)$ by explicitly constructing a common subsequence between the Thue-Morse words and their bitwise complement. We obtain the lower bound $a(n)=2^{n}(1-o(1))$, saying that when $n$ grows large, the fraction of omitted symbols in the longest common subsequence of the $n$’th Thue-Morse word and its bitwise complement goes to $0$. We further generalize to any prefix of the Thue-Morse sequence, where we prove similar lower bounds.

Keywords: Thue-Morse sequence, Longest common subsequence, Combinatorial problems

Journal: Information Processing Letters

1 Introduction

The Thue-Morse sequence is a well-known sequence in mathematics and computer science, with many interesting properties.

The Thue-Morse sequence exhibits a great deal of self-symmetry, yet it is cube-free and overlap-free (for a more in-depth introduction to the Thue-Morse sequence, see, for instance, Allouche and Shallit [1]).

In 2006, Jean Berstel [2] formulated the problem of finding the length $a(n)$ of the longest common subsequence between the $n$’th Thue-Morse word and its bitwise complement. By bitwise complement we mean replacing $0$ with $1$ and $1$ with $0$. This paper primarily studies $a(n)$ (sequence A297618 in the Online Encyclopedia of Integer Sequences [3]). Since the Thue-Morse words are the prefixes of length $2^{k}$ (for some $k$) of the Thue-Morse sequence, a natural generalization is to consider prefixes of other lengths. This paper also studies $b(n)$, the length of the longest common subsequence between the length-$n$ prefix of the Thue-Morse sequence and its bitwise complement (sequence A320847).

Example 1.1.

The first few values of $a(n)$ and $b(n)$ are:

[TABLE]

To prove a lower bound on $a(n)$, it suffices to construct a common subsequence of the Thue-Morse words and their bitwise complements. This paper does exactly that, exploiting the symmetries of the sequence. In particular, we provide a recursive construction of such a common subsequence whose length is at least $2^{n}(1-\mathcal{O}(n^{-\log_{2}3}))=2^{n}(1-o(1))$.

This new lower bound is interesting because it means that $\frac{a(n)}{2^{n}}$ tends to $1$; that is, as $n$ grows large, the longest common subsequence omits only a vanishingly small fraction of the symbols.

2 Setup

There are many equivalent definitions of the Thue-Morse sequence and Thue-Morse words. We will define them using morphisms.

Definition 2.1.

A morphism over an alphabet $\Sigma$ is a function $m:\Sigma^{*}\to\Sigma^{*}$ that satisfies $m(xy)=m(x)m(y)$ (concatenation) for all $x,y\in\Sigma^{*}$ . Note that this means $m$ is uniquely defined by its behaviour on $\Sigma$ .

Definition 2.2.

Let $\mu$ denote the morphism on $\{0,1\}$ defined by $\mu(0)=01$ and $\mu(1)=10$ .

There are some basic properties that follow directly from the definition.

Proposition 2.1.

If $m,n\geq 0$ then

(i) $\mu^{n}(1)=\overline{\mu^{n}(0)}$, where $\overline{z}$ denotes the bitwise complement of $z$ (i.e., swapping 0s and 1s);

(ii) $\mu^{m+n}(0)=\mu^{m}(\mu^{n}(0))$;

(iii) $\left|\mu^{n}(0)\right|=2^{n}$;

(iv) $\mu^{n+1}(0)=\mu^{n}(0)\mu^{n}(1)$ and $\mu^{n+1}(1)=\mu^{n}(1)\mu^{n}(0)$.

Proof.

(i) follows from the symmetry (between $0$ and $1$) in the definition of $\mu$. (ii) holds for all morphisms. (iii) follows by induction, since $|\mu(x)|=2|x|$ for every binary string $x$. (iv) can be seen from $\mu^{n+1}(0)=\mu^{n}(\mu(0))=\mu^{n}(01)=\mu^{n}(0)\mu^{n}(1)$, and similarly $\mu^{n+1}(1)=\mu^{n}(10)=\mu^{n}(1)\mu^{n}(0)$. ∎
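As a quick sanity check (our own illustration, not part of the paper), the four properties of Proposition 2.1 can be verified mechanically for small $m$ and $n$. The Python sketch below represents binary words as strings; the helper names `mu`, `mu_power`, and `complement` are hypothetical.

```python
def mu(word: str) -> str:
    """Apply the Thue-Morse morphism letter by letter: 0 -> 01, 1 -> 10."""
    return "".join("01" if c == "0" else "10" for c in word)


def mu_power(n: int, word: str) -> str:
    """Return mu^n(word) by applying mu n times."""
    for _ in range(n):
        word = mu(word)
    return word


def complement(word: str) -> str:
    """Bitwise complement: swap 0s and 1s."""
    return word.translate(str.maketrans("01", "10"))


# Check properties (i)-(iv) of Proposition 2.1 for small m and n.
for n in range(8):
    assert mu_power(n, "1") == complement(mu_power(n, "0"))              # (i)
    for m in range(4):
        assert mu_power(m + n, "0") == mu_power(m, mu_power(n, "0"))     # (ii)
    assert len(mu_power(n, "0")) == 2 ** n                               # (iii)
    assert mu_power(n + 1, "0") == mu_power(n, "0") + mu_power(n, "1")  # (iv)
    assert mu_power(n + 1, "1") == mu_power(n, "1") + mu_power(n, "0")  # (iv)

print(mu_power(3, "0"))  # 01101001
```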

Definition 2.3.

We call $\mu^{n}(0)$ the $n$’th Thue-Morse word. The Thue-Morse sequence, denoted by $\mathbf{t}$, is the unique fixed point of $\mu$ (extended to the domain of infinite binary strings) beginning with a $0$. See Allouche et al. [1] for why such a fixed point exists and is unique.

Definition 2.4.

Denote by $a(n)$ the length of the longest common subsequence of $\mu^{n}(0)$ and $\mu^{n}(1)$ . Similarly, denote by $b(n)$ the length of the longest common subsequence of the prefix of length $n$ of the Thue-Morse sequence and its bitwise complement.
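For small $n$, $a(n)$ and $b(n)$ can be computed by brute force with the standard quadratic-time dynamic program for the longest common subsequence. The following Python sketch is a hypothetical helper for experimentation, not the method of this paper; it relies on the fact (noted in the remark below) that each Thue-Morse word is a prefix of $\mathbf{t}$, and it becomes impractical once $2^{n}$ exceeds a few thousand symbols.

```python
def mu_power(n: int, start: str) -> str:
    """Return mu^n(start) for the Thue-Morse morphism 0 -> 01, 1 -> 10."""
    word = start
    for _ in range(n):
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence, via the O(|x||y|) dynamic program."""
    prev = [0] * (len(y) + 1)          # row for the empty prefix of x
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y):
            # cur[j + 1] = LCS length of (current prefix of x) and y[:j + 1]
            cur.append(prev[j] + 1 if cx == cy else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def a(n: int) -> int:
    """a(n): LCS length of mu^n(0) and mu^n(1)."""
    return lcs_length(mu_power(n, "0"), mu_power(n, "1"))


def b(n: int) -> int:
    """b(n): LCS length of the length-n prefix of t and its complement (n >= 1)."""
    k = (n - 1).bit_length()           # smallest k with 2^k >= n
    prefix = mu_power(k, "0")[:n]      # mu^k(0) is a prefix of t
    return lcs_length(prefix, prefix.translate(str.maketrans("01", "10")))


print([a(n) for n in range(1, 5)])
print([b(n) for n in range(1, 17)])
```

For instance, `a(4)` can be compared with the length $12$ reported in the remark after Example 3.1, and `b(2 ** n)` should agree with `a(n)`.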

Example 2.1.

The first few Thue-Morse words are

$\mu^{0}(0)=0,\quad\mu^{1}(0)=01,\quad\mu^{2}(0)=0110,\quad\mu^{3}(0)=01101001.$

The Thue-Morse sequence starts as follows: $\mathbf{t}=0110100110010110\ldots$

Remark.

The Thue-Morse words are sometimes defined by the recurrence relation in Proposition 2.1 part (iv), and the Thue-Morse sequence then as the limit of repeatedly applying this rule. In particular, the $n$’th Thue-Morse word is the prefix of length $2^{n}$ of the Thue-Morse sequence, so $b(2^{n})=a(n)$.

We also need the following proposition, for which the proof can be found in [1].

Proposition 2.2.

If $\mathbf{t}=t_{0}t_{1}t_{2}\ldots$ are the symbols of the Thue-Morse sequence we have $t_{2n}=t_{n}$ and $t_{2n+1}=\overline{t_{n}}$ for all $n\geq 0$ . Moreover, $t_{n}$ equals the parity of the number of “1” bits in the binary representation of $n$ .

Corollary 2.3.

The $(2i)$ ’th digit of $\mu^{n}(0)$ is the same as the $(2i+1)$ ’th digit of $\mu^{n}(1)$ (where we use zero-indexing).

Proof.

The $(2i)$ ’th digit of $\mu^{n}(0)$ is $t_{2i}=t_{i}$ , and the $(2i+1)$ ’th digit of $\mu^{n}(1)$ is $\overline{t_{2i+1}}=t_{i}$ , by the above proposition. ∎
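Proposition 2.2 gives a direct formula for each digit, which makes Corollary 2.3 easy to check numerically. In this sketch (our own; `t` and `mu_power` are hypothetical helper names) we compute $t_{i}$ as the parity of the number of 1 bits of $i$ and compare it with the morphism-based definition for $n=10$.

```python
def t(i: int) -> int:
    """t_i as the parity of the number of 1 bits of i (Proposition 2.2)."""
    return bin(i).count("1") % 2


def mu_power(n: int, start: str) -> str:
    """Return mu^n(start) for the Thue-Morse morphism 0 -> 01, 1 -> 10."""
    word = start
    for _ in range(n):
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


N = 10
word0, word1 = mu_power(N, "0"), mu_power(N, "1")
# The N'th Thue-Morse word spells out t_0 t_1 ... t_{2^N - 1}.
assert word0 == "".join(str(t(i)) for i in range(2 ** N))
for i in range(2 ** (N - 1)):
    assert t(2 * i) == t(i) and t(2 * i + 1) == 1 - t(i)   # Proposition 2.2
    assert word0[2 * i] == word1[2 * i + 1]                  # Corollary 2.3
print("Proposition 2.2 and Corollary 2.3 hold for n =", N)
```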

3 Construction of a common subsequence

We are now ready to construct a common subsequence between $\mu^{n}(0)$ and $\mu^{n}(1)$ when $n=2^{k}$ is a power of $2$. We call this common subsequence $CS(k)$ and define it recursively.

1. When $k=0$, we have $n=2^{0}=1$, and we define $CS(0)=0$, a common subsequence of $\mu(0)=\underline{0}1$ and $\mu(1)=1\underline{0}$.

2. For $k\geq 1$, $CS(k)$ is defined recursively as follows.

Let $n=2^{k}$ and $m=2^{k-1}$ . Say $X=\mu^{n}(0)$ and $Y=\mu^{n}(1)$ , that is, we are constructing $CS(k)$ as a common subsequence of $X$ and $Y$ . Write $X$ and $Y$ as concatenations of $2^{m}$ blocks of size $2^{m}$ (since $|X|=|Y|=2^{n}=(2^{m})^{2}$ this is possible), say

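$X=x_{0}x_{1}\cdots x_{2^{m}-1},\qquad Y=y_{0}y_{1}\cdots y_{2^{m}-1},\qquad\text{where }|x_{i}|=|y_{i}|=2^{m}.$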

Since $X=\mu^{2^{k}}(0)=\mu^{2^{k-1}}(\mu^{2^{k-1}}(0))$ , each $x_{i}$ is one of $\mu^{m}(0)$ or $\mu^{m}(1)$ . Similarly each $y_{i}$ is one of $\mu^{m}(0)$ or $\mu^{m}(1)$ . It is also worth noting that $x_{i}=\mu^{m}(d)$ if the $i$ ’th digit of $\mu^{m}(0)$ is $d$ , and similarly $y_{i}=\mu^{m}(d)$ if the $i$ ’th digit of $\mu^{m}(1)$ is $d$ .

Now we compare $x_{i}$ to $y_{i+1}$ for $0\leq i<2^{m}-1$ , and find a common subsequence $cs_{i}$ between them.

(a)

When $i$ is even, $x_{i}=y_{i+1}$ by Corollary 2.3, so we take $cs_{i}=x_{i}$ .

(b)

When $i$ is odd, either $x_{i}$ and $y_{i+1}$ are equal, or one is $\mu^{m}(0)$ and the other is $\mu^{m}(1)$. If they are equal we take $cs_{i}=x_{i}$; otherwise we take $cs_{i}=CS(k-1)$, which is a common subsequence of $\mu^{m}(0)$ and $\mu^{m}(1)$ since $m=2^{k-1}$.

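We then let $CS(k)$ be the concatenation $cs_{0}cs_{1}\cdots cs_{2^{m}-2}$. Each $cs_{i}$ is a subsequence of both $x_{i}$ and $y_{i+1}$, and these blocks occur in the same order in $X$ and $Y$, so $CS(k)$ is a common subsequence of $X$ and $Y$ (it omits the block $x_{2^{m}-1}$ of $X$ and the block $y_{0}$ of $Y$ entirely).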
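The recursion translates directly into code. The following Python sketch (our own illustration; `mu_power`, `cs`, and `is_subsequence` are hypothetical helper names, not from the paper) builds $CS(k)$ as a string and checks that it is a common subsequence of $\mu^{2^{k}}(0)$ and $\mu^{2^{k}}(1)$. Since $|CS(k)|$ grows like $2^{2^{k}}$, it is only practical for $k\leq 4$.

```python
from functools import lru_cache


def mu_power(n: int, start: str) -> str:
    """Return mu^n(start) for the Thue-Morse morphism 0 -> 01, 1 -> 10."""
    word = start
    for _ in range(n):
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


@lru_cache(maxsize=None)
def cs(k: int) -> str:
    """Common subsequence CS(k) of mu^(2^k)(0) and mu^(2^k)(1), built recursively."""
    if k == 0:
        return "0"                      # base case: common to 01 and 10
    m = 2 ** (k - 1)
    # X = mu^m(mu^m(0)), so block x_i is mu^m(d) where d is digit i of mu^m(0);
    # likewise y_i is mu^m(d) where d is digit i of mu^m(1).
    outer_x, outer_y = mu_power(m, "0"), mu_power(m, "1")
    block = {d: mu_power(m, d) for d in "01"}
    parts = []
    for i in range(2 ** m - 1):         # pair x_i with y_{i+1}
        if outer_x[i] == outer_y[i + 1]:
            # Always true for even i (Corollary 2.3); for odd i the blocks may agree.
            parts.append(block[outer_x[i]])
        else:
            # The blocks are mu^m(0) and mu^m(1) in some order: recurse.
            parts.append(cs(k - 1))
    return "".join(parts)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting symbols."""
    it = iter(t)
    return all(c in it for c in s)


for k in range(5):
    n = 2 ** k
    sub = cs(k)
    assert is_subsequence(sub, mu_power(n, "0")) and is_subsequence(sub, mu_power(n, "1"))
    print(k, len(sub), 2 ** n - len(sub))   # k, |CS(k)|, number of omitted symbols
```

The last column is the quantity $f(k)$ analysed in Section 4.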

Example 3.1.

The common subsequence $CS(0),CS(1)$ , and $CS(2)$ are underlined below:

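$CS(0)$: $\mu^{1}(0)=\underline{0}1$, $\quad\mu^{1}(1)=1\underline{0}$

$CS(1)$: $\mu^{2}(0)=\underline{01}\,10$, $\quad\mu^{2}(1)=10\,\underline{01}$

$CS(2)$: $\mu^{4}(0)=\underline{0110}\;10\underline{01}\;\underline{1001}\;0110$, $\quad\mu^{4}(1)=1001\;\underline{0110}\;\underline{01}10\;\underline{1001}$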

Remark.

$CS(k)$ is not necessarily the longest common subsequence. For example

[TABLE]

is a longest common subsequence between $\mu^{4}(0)$ and $\mu^{4}(1)$; it has length $12$, while $|CS(2)|=10$.

4 Analysis of length

In this section we analyse the length of the common subsequence $CS(k)$ constructed in the previous section.

Definition 4.1.

For an integer $k\geq 0$ , let $f(k)=|\mu^{2^{k}}(0)|-|CS(k)|=2^{2^{k}}-|CS(k)|$ be the number of symbols omitted by the common subsequence $CS(k)$ .

Remark.

$f(0)=1$ , as $|CS(0)|=1$ .

When constructing $CS(k+1)$ , all the even indexed blocks (of size $2^{2^{k}}$ ) in $\mu^{2^{k+1}}(0)$ are chosen to be in $CS(k+1)$ . So only the odd indexed blocks can contribute to $f(k+1)$ . The last block will be completely omitted, and for the other blocks in odd positions we either miss $f(k)$ if matching $\mu^{2^{k}}(0)$ with $\mu^{2^{k}}(1)$ recursively, or miss nothing if choosing to include the complete block. This leads us to the following lemma.

Lemma 4.4.

For every integer $k\geq 0$

$f(k+1)\leq 2^{2^{k}}+\left(2^{2^{k}-1}-1\right)f(k).$

Proof.

The last block has size $2^{2^{k}}$ , and there are $(2^{2^{k}-1}-1)$ other odd indexed blocks, and in each we miss at most $f(k)$ . So the lemma follows from the above discussion. ∎

We are now ready to prove an upper bound on $f(k)$ .

Lemma 4.5.

For every integer $k\geq 0$ , $f(k)\leq 2^{2^{k}-k+1}-2$ .

Proof.

We proceed by induction on $k$ .

The inequality clearly holds for $k=0$, since $f(0)=1\leq 4-2=2^{2^{0}-0+1}-2$.

Now suppose the inductive assertion holds for $k=s\geq 0$ , that is $f(s)\leq 2^{2^{s}-s+1}-2$ . Using Lemma 4.4 and the induction hypothesis we have

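$f(s+1)\leq 2^{2^{s}}+\left(2^{2^{s}-1}-1\right)\left(2^{2^{s}-s+1}-2\right)$

$\qquad=2^{2^{s}}+2^{2^{s}-1+2^{s}-s+1}-2^{2^{s}-1}\cdot 2-2^{2^{s}-s+1}+2$

$\qquad=2^{2^{s+1}-(s+1)+1}-2^{2^{s}-s+1}+2.$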

Note that $2^{2^{s}-s+1}\geq 4$ for all integers $s\geq 0$ , since $2^{s}-s\geq 1$ for all integers $s\geq 0$ . Thus

$f(s+1)\leq 2^{2^{s+1}-(s+1)+1}-2.$

This concludes the induction proof. ∎
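Lemma 4.5 can be sanity-checked numerically by iterating the recurrence of Lemma 4.4 with equality (which can only overestimate $f$, since the right-hand side is increasing in $f(k)$) and comparing with the closed form. Python integers have arbitrary precision, so the comparison is exact; the function name `g` is our own.

```python
def g(k: int) -> int:
    """Iterate Lemma 4.4 with equality from f(0) = 1; by induction g(k) >= f(k)."""
    value = 1
    for j in range(k):
        value = 2 ** (2 ** j) + (2 ** (2 ** j - 1) - 1) * value
    return value


for k in range(8):
    bound = 2 ** (2 ** k - k + 1) - 2      # Lemma 4.5
    assert g(k) <= bound, (k, g(k), bound)

print([g(k) for k in range(4)])  # [1, 2, 6, 58]
```

For $k\leq 2$ the two sides coincide, so Lemma 4.5 is tight there as a bound on the Lemma 4.4 recurrence.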

By Lemma 4.5 it follows that $f(k)\leq 2^{2^{k}-k+1}-2\leq 2^{2^{k}-(k-1)}$ for all $k\geq 0$. This means that the length of our constructed common subsequence $CS(k)$ of $\mu^{n}(0)$ and $\mu^{n}(1)$ where $n=2^{k}$ must be at least $2^{n}-f(k)\geq 2^{2^{k}}-2^{2^{k}-(k-1)}=2^{2^{k}}(1-2^{-(k-1)})=2^{n}(1-\frac{1}{n/2})$. This proves the following theorem.

Theorem 4.6.

For $k\geq 0$ and $n=2^{k}$ :

$|CS(k)|\geq 2^{n}\left(1-\frac{1}{n/2}\right)=2^{2^{k}}\left(1-\frac{1}{2^{k-1}}\right).$

5 Extension to all $n$

Up to this point we have only considered the common subsequence of $\mu^{n}(0)$ and $\mu^{n}(1)$ where $n=2^{k}$ for some $k\geq 0$ . We wish to extend our construction to work for arbitrary $n$ .

If $n\geq 1$ is not a power of $2$, let $k\geq 0$ be the integer with $2^{k}<n<2^{k+1}$, and write

$\mu^{n}(x)=\mu^{2^{k}}\left(\mu^{n-2^{k}}(x)\right).$

This says that $\mu^{n}(x)$ ($x\in\{0,1\}$) can be written as $2^{n-2^{k}}$ blocks, each of which is either $\mu^{2^{k}}(0)$ or $\mu^{2^{k}}(1)$. Concatenating $2^{n-2^{k}}$ copies of $CS(k)$ therefore gives a common subsequence of $\mu^{n}(0)$ and $\mu^{n}(1)$; that is, we apply our previous construction to each block independently. By Theorem 4.6, the length of this common subsequence is at least $2^{n-2^{k}}\left(2^{2^{k}}\left(1-\frac{1}{2^{k-1}}\right)\right)\geq 2^{n}\left(1-\frac{1}{n/4}\right)$, since $\frac{n}{4}\leq 2^{k-1}$ by the choice of $k$. When $n=2^{k}$, Theorem 4.6 directly gives $2^{n}(1-\frac{1}{n/2})\geq 2^{n}(1-\frac{1}{n/4})$. We thus obtain a result similar to Theorem 4.6 for arbitrary $n$.

Theorem 5.7.

For every $n\geq 1$ , there exists a common subsequence between $\mu^{n}(0)$ and $\mu^{n}(1)$ with length at least

$2^{n}\left(1-\frac{1}{n/4}\right).$

Corollary 5.8.

$a(n)=2^{n}(1-\mathcal{O}(n^{-1}))$ , or more generally $a(n)=2^{n}(1-o(1))$ .
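The construction behind Theorem 5.7 is short enough to state in code. The following Python sketch is our own illustration (the names `cs`, `common_subsequence`, and `is_subsequence` are hypothetical); it repeats $CS(k)$ once per block of size $2^{2^{k}}$ and checks both the subsequence property and the bound of Theorem 5.7 for $1\leq n\leq 11$.

```python
from functools import lru_cache


def mu_power(n: int, start: str) -> str:
    """Return mu^n(start) for the Thue-Morse morphism 0 -> 01, 1 -> 10."""
    word = start
    for _ in range(n):
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


@lru_cache(maxsize=None)
def cs(k: int) -> str:
    """CS(k) from Section 3 (compact form of the recursive construction)."""
    if k == 0:
        return "0"
    m = 2 ** (k - 1)
    outer_x, outer_y = mu_power(m, "0"), mu_power(m, "1")
    block = {d: mu_power(m, d) for d in "01"}
    return "".join(
        block[outer_x[i]] if outer_x[i] == outer_y[i + 1] else cs(k - 1)
        for i in range(2 ** m - 1)
    )


def common_subsequence(n: int) -> str:
    """With 2^k <= n < 2^(k+1), mu^n(x) consists of 2^(n - 2^k) blocks, each
    mu^(2^k)(0) or mu^(2^k)(1); keep one copy of CS(k) per block."""
    k = n.bit_length() - 1
    return cs(k) * 2 ** (n - 2 ** k)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting symbols."""
    it = iter(t)
    return all(c in it for c in s)


for n in range(1, 12):
    sub = common_subsequence(n)
    assert is_subsequence(sub, mu_power(n, "0")) and is_subsequence(sub, mu_power(n, "1"))
    assert len(sub) >= 2 ** n * (1 - 4 / n)     # Theorem 5.7
```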

We can generalize the result further to all prefixes of the Thue-Morse sequence. Let $\mathbf{t}_{n}$ be the prefix of length $n$ of the Thue-Morse sequence, and $\overline{\mathbf{t}}_{n}$ its bitwise complement. Based on the binary representation of $n$, $\mathbf{t}_{n}$ and $\overline{\mathbf{t}}_{n}$ can be split into at most $\lfloor\log_{2}(n)\rfloor+1$ blocks, each of whose sizes is a power of $2$. We take the blocks in order of decreasing size; then every block of size $2^{k}$ starts at a multiple of $2^{k}$, so (since $\mathbf{t}=\mu^{k}(\mathbf{t})$) it is either $\mu^{k}(0)$ or $\mu^{k}(1)$. Common subsequences satisfying the inequality in Theorem 5.7 for these blocks can then be concatenated to form a common subsequence between $\mathbf{t}_{n}$ and $\overline{\mathbf{t}}_{n}$. To bound the length of this common subsequence we use the following lemma:

Lemma 5.9.

$\sum_{k=1}^{s}\frac{2^{k}}{k}\leq\frac{2^{s+2}}{s}-1$ for all $s\geq 1$.

Proof.

We prove the inequality by induction on $s$ .

For $s=1$ we have $\sum_{k=1}^{s}\frac{2^{k}}{k}=2\leq 7=\frac{2^{s+2}}{s}-1$ , and for $s=2$ we have $\sum_{k=1}^{s}\frac{2^{k}}{k}=4\leq 7=\frac{2^{s+2}}{s}-1$ .

Now suppose $s\geq 2$ and $\sum_{k=1}^{s}\frac{2^{k}}{k}\leq\frac{2^{s+2}}{s}-1$. This means that

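$\sum_{k=1}^{s+1}\frac{2^{k}}{k}=\sum_{k=1}^{s}\frac{2^{k}}{k}+\frac{2^{s+1}}{s+1}\leq\frac{2^{s+2}}{s}-1+\frac{2^{s+1}}{s+1}=\frac{2^{s+1}(3s+2)}{s(s+1)}-1\leq\frac{2^{s+1}(4s)}{s(s+1)}-1=\frac{2^{s+3}}{s+1}-1,$

where the last inequality uses $3s+2\leq 4s$ for $s\geq 2$. This concludes the induction proof. ∎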
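Lemma 5.9 can also be spot-checked with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction`; the function names are our own, and a finite check is of course no substitute for the induction proof.

```python
from fractions import Fraction


def partial_sums(s_max: int):
    """Yield (s, sum_{k=1}^{s} 2^k / k) exactly, for s = 1, ..., s_max."""
    total = Fraction(0)
    for s in range(1, s_max + 1):
        total += Fraction(2 ** s, s)
        yield s, total


def lemma_5_9_holds(s_max: int) -> bool:
    """Check sum_{k=1}^{s} 2^k / k <= 2^(s+2) / s - 1 for every 1 <= s <= s_max."""
    return all(total <= Fraction(2 ** (s + 2), s) - 1 for s, total in partial_sums(s_max))


print(lemma_5_9_holds(200))  # True
```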

Now we continue to analyse the common subsequence between $\mathbf{t}_{n}$ and $\overline{\mathbf{t}}_{n}$. By Theorem 5.7, this subsequence omits at most $2^{k}\cdot\frac{4}{k}=\frac{2^{k+2}}{k}$ symbols in the block of size $2^{k}$. There is at most one block of size $2^{k}$ for each $1\leq k\leq\lfloor\log_{2}(n)\rfloor$, and the potential block of size $1=2^{0}$ misses at most one symbol. Hence at most

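$1+\sum_{k=1}^{\lfloor\log_{2}(n)\rfloor}\frac{2^{k+2}}{k}=1+4\sum_{k=1}^{\lfloor\log_{2}(n)\rfloor}\frac{2^{k}}{k}$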

symbols are omitted, which by Lemma 5.9 is at most

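$1+4\left(\frac{2^{\lfloor\log_{2}(n)\rfloor+2}}{\lfloor\log_{2}(n)\rfloor}-1\right)=\frac{2^{\lfloor\log_{2}(n)\rfloor+4}}{\lfloor\log_{2}(n)\rfloor}-3\leq\frac{n}{\lfloor\log_{2}(n)\rfloor/16}.$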

This proves the following theorem.

Theorem 5.10.

For all $n\geq 2$, there exists a common subsequence between $\mathbf{t}_{n}$ and $\overline{\mathbf{t}}_{n}$ with length at least

$n\left(1-\frac{1}{\lfloor\log_{2}(n)\rfloor/16}\right).$

Corollary 5.11.

$b(n)=n(1-\mathcal{O}(\frac{1}{\log n}))$ , or more generally $b(n)=n(1-o(1))$ .
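The prefix construction can be sketched in the same way. In this Python illustration (our own; `block_cs`, `prefix_blocks`, and `prefix_cs` are hypothetical names), the blocks of $\mathbf{t}_{n}$ are read off from the binary digits of $n$, largest first; the sketch checks that each block is $\mu^{j}(0)$ or $\mu^{j}(1)$ and that the concatenated result is a common subsequence of $\mathbf{t}_{n}$ and $\overline{\mathbf{t}}_{n}$.

```python
from functools import lru_cache


def mu_power(n: int, start: str) -> str:
    """Return mu^n(start) for the Thue-Morse morphism 0 -> 01, 1 -> 10."""
    word = start
    for _ in range(n):
        word = "".join("01" if c == "0" else "10" for c in word)
    return word


@lru_cache(maxsize=None)
def cs(k: int) -> str:
    """CS(k) from Section 3, a common subsequence of mu^(2^k)(0) and mu^(2^k)(1)."""
    if k == 0:
        return "0"
    m = 2 ** (k - 1)
    outer_x, outer_y = mu_power(m, "0"), mu_power(m, "1")
    block = {d: mu_power(m, d) for d in "01"}
    return "".join(
        block[outer_x[i]] if outer_x[i] == outer_y[i + 1] else cs(k - 1)
        for i in range(2 ** m - 1)
    )


def block_cs(j: int) -> str:
    """Common subsequence of mu^j(0) and mu^j(1) as in Theorem 5.7.
    For j = 0 the block is a single symbol (0 versus 1), so nothing is kept."""
    if j == 0:
        return ""
    k = j.bit_length() - 1                 # 2^k <= j < 2^(k+1)
    return cs(k) * 2 ** (j - 2 ** k)       # one copy of CS(k) per sub-block


def prefix_blocks(n: int) -> list:
    """Exponents j of the power-of-2 blocks of t_n, largest first (binary digits of n)."""
    return [j for j in range(n.bit_length() - 1, -1, -1) if n >> j & 1]


def prefix_cs(n: int) -> str:
    """Common subsequence of t_n and its complement: concatenate block_cs over the blocks."""
    return "".join(block_cs(j) for j in prefix_blocks(n))


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting symbols."""
    it = iter(t)
    return all(c in it for c in s)


for n in range(1, 300):
    tn = "".join(str(bin(i).count("1") % 2) for i in range(n))   # prefix t_n
    tn_bar = tn.translate(str.maketrans("01", "10"))
    pos = 0
    for j in prefix_blocks(n):
        # Each block starts at a multiple of its size, so it is mu^j(0) or mu^j(1).
        assert tn[pos:pos + 2 ** j] in (mu_power(j, "0"), mu_power(j, "1"))
        pos += 2 ** j
    s = prefix_cs(n)
    assert is_subsequence(s, tn) and is_subsequence(s, tn_bar)
```

The bound of Theorem 5.10 is positive only once $\lfloor\log_{2}(n)\rfloor>16$, so for the small $n$ tested here the sketch checks the subsequence property rather than the length bound.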

6 Strengthening the analysis

The constructed common subsequence $CS(k)$, and the generalizations in the previous section, do in fact have a slightly better asymptotic behaviour than what was proven in Section 4.

The previous length analysis was based on Lemma 4.4, which states that $f(k+1)\leq 2^{2^{k}}+\left(2^{2^{k}-1}-1\right)f(k)$. This inequality is tight only when $x_{i}\neq y_{i+1}$ for all odd $0\leq i<2^{m}-1$, using the notation of Section 3. However, we can get a better bound on $f(k+1)$ in terms of $f(k)$ by counting how many of the pairs $x_{i}$ and $y_{i+1}$ are equal for odd $i$.

Lemma 6.12.

If $\mathbf{t}=t_{0}t_{1}t_{2}\ldots$ are the digits of the Thue-Morse sequence, then $t_{n}=t_{n+1}$ if and only if $n$ written in binary ends with a block of $1$ ’s with odd length.

Proof.

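We use Proposition 2.2: $t_{n}=t_{n+1}$ if and only if $n$ and $n+1$ have the same number of “$1$” bits modulo 2 when written in binary. If $n$ ends with a block of exactly $j\geq 0$ “$1$”s, then adding $1$ turns these $j$ ones into zeros and the preceding (possibly leading) “$0$” into a “$1$”, so the number of “$1$” bits changes by $1-j$. This change is even exactly when $j$ is odd. ∎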
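Lemma 6.12 is easy to test exhaustively for small $n$. In this sketch (our own; `t` and `trailing_ones` are hypothetical helper names), a number ending in 0 is treated as ending with a block of 1s of length 0, which is even.

```python
def t(i: int) -> int:
    """t_i as the parity of the number of 1 bits of i (Proposition 2.2)."""
    return bin(i).count("1") % 2


def trailing_ones(i: int) -> int:
    """Length of the final block of 1s in the binary representation of i (0 if i is even)."""
    count = 0
    while i & 1:
        count += 1
        i >>= 1
    return count


for i in range(2 ** 14):
    assert (t(i) == t(i + 1)) == (trailing_ones(i) % 2 == 1)   # Lemma 6.12
```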

Lemma 6.13.

Let $eq(n)=|\{i:0\leq i<2^{n}-1\text{ and }t_{i}=t_{i+1}\}|$ . Then

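$eq(n)=\begin{cases}\frac{1}{3}(2^{n}-1)&\text{if }n\text{ is even},\\ \frac{1}{3}(2^{n}-2)&\text{if }n\text{ is odd}.\end{cases}$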

Proof.

For a fixed $n$, we count the $n$-bit numbers (other than $2^{n}-1$) that end with a block of “$1$”s of odd length. Such a number ends with a “$0$” followed by exactly $2k-1$ “$1$”s for some $k\geq 1$; this fixes its last $2k$ digits and leaves $2^{n-2k}$ possibilities for the leading digits. Requiring the “$0$” excludes $2^{n}-1$, the unique $n$-bit binary number consisting only of “$1$”s, which we do not wish to count.

1. If $n=2m$ is even, then $eq(n)=\sum_{k=1}^{m}2^{n-2k}=\frac{1}{3}(2^{n}-1)$.

2. If $n=2m+1$ is odd, then $eq(n)=\sum_{k=1}^{m}2^{n-2k}=\frac{1}{3}(2^{n}-2)$. ∎
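The closed form of Lemma 6.13 can be compared against a direct count. Both functions below are our own illustration; `eq_closed` uses integer division, which is exact because $2^{n}\equiv 1\pmod 3$ for even $n$ and $2^{n}\equiv 2\pmod 3$ for odd $n$.

```python
def t(i: int) -> int:
    """t_i as the parity of the number of 1 bits of i (Proposition 2.2)."""
    return bin(i).count("1") % 2


def eq_brute(n: int) -> int:
    """eq(n) straight from the definition: count 0 <= i < 2^n - 1 with t_i = t_{i+1}."""
    return sum(t(i) == t(i + 1) for i in range(2 ** n - 1))


def eq_closed(n: int) -> int:
    """Closed form of Lemma 6.13 (the divisions by 3 are exact)."""
    return (2 ** n - 1) // 3 if n % 2 == 0 else (2 ** n - 2) // 3


assert all(eq_brute(n) == eq_closed(n) for n in range(16))
print([eq_closed(n) for n in range(8)])  # [0, 0, 1, 2, 5, 10, 21, 42]
```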

By Proposition 2.2 we see that

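$x_{2i+1}=y_{2i+2}\iff t_{2i+1}=\overline{t_{2i+2}}\iff\overline{t_{i}}=\overline{t_{i+1}}\iff t_{i}=t_{i+1}.$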

When constructing $CS(k+1)$ we have $m=2^{k}$, so the odd indices are $i=2j+1$ with $0\leq j<2^{2^{k}-1}-1$. By Lemma 6.13 we thus know that exactly $eq(2^{k}-1)$ of the pairs $(x_{i},y_{i+1})$ with odd $i$ are already equal. Hence exactly $(2^{2^{k}-1}-1)-eq(2^{k}-1)$ of the $(x_{i},y_{i+1})$ pairs need to be matched recursively using $CS(k)$. This leads to the following improved version of Lemma 4.4:

Lemma 6.14.

For every integer $k\geq 1$ ,

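$f(k+1)=2^{2^{k}}+\left(2^{2^{k}-1}-1-eq(2^{k}-1)\right)f(k)=2^{2^{k}}+\left(2^{2^{k}-1}-1-\frac{1}{3}\left(2^{2^{k}-1}-2\right)\right)f(k).$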

Remark.

From the above lemma, we can compute $f(k)$ exactly. The first few values for $k\geq 0$ are:

$f(k)=1,\,2,\,6,\,46,\,4166,\,91071806,\,130383480383828886,\ldots$
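These values can be reproduced by iterating the recurrence of Lemma 6.14 with exact integers. This sketch is ours; it applies the first form of the recurrence (with $eq$) also at $k=0$, where there are no odd-indexed blocks besides the last one, so it gives $f(1)=2^{2^{0}}=2$.

```python
def eq(n: int) -> int:
    """eq(n) from Lemma 6.13."""
    return (2 ** n - 1) // 3 if n % 2 == 0 else (2 ** n - 2) // 3


def f(k: int) -> int:
    """Symbols omitted by CS(k): f(0) = 1, then the exact recurrence of Lemma 6.14."""
    value = 1
    for j in range(k):
        recursive_pairs = 2 ** (2 ** j - 1) - 1 - eq(2 ** j - 1)   # odd pairs that differ
        value = 2 ** (2 ** j) + recursive_pairs * value          # last block + recursions
    return value


print([f(k) for k in range(7)])
# [1, 2, 6, 46, 4166, 91071806, 130383480383828886]
```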

Corollary 6.15.

Let $w=\log_{2}(3)\approx 1.58$ . For every integer $k\geq 1$ , $f(k+1)\leq 2^{2^{k}}+2^{2^{k}-w}f(k)$ .

Proof.

If $k\geq 1$ , we have by the lemma

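$f(k+1)=2^{2^{k}}+\left(2^{2^{k}-1}-1-\frac{1}{3}\left(2^{2^{k}-1}-2\right)\right)f(k)\leq 2^{2^{k}}+\frac{2}{3}\,2^{2^{k}-1}f(k)=2^{2^{k}}+2^{2^{k}-w}f(k).$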

∎

By a similar induction proof as in Lemma 4.5 we get a new upper bound on $f$ .

Theorem 6.16.

Let $w=\log_{2}(3)\approx 1.58$ . For every integer $k\geq 0$ , $f(k)\leq 2^{2^{k}-wk+3}-6$ .

Proof.

We proceed by induction on $k$ .

It is easy to verify that the inequality holds for $k\leq 2$ .

Now suppose the inductive assertion holds for $k=s\geq 2$ , that is $f(s)\leq 2^{2^{s}-ws+3}-6$ . Using Corollary 6.15 and the induction hypothesis we have

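$f(s+1)\leq 2^{2^{s}}+2^{2^{s}-w}\left(2^{2^{s}-ws+3}-6\right)$

$\qquad=2^{2^{s+1}-w(s+1)+3}+2^{2^{s}}-6\cdot 2^{2^{s}-w}$

$\qquad=2^{2^{s+1}-w(s+1)+3}-2^{2^{s}}$

$\qquad\leq 2^{2^{s+1}-w(s+1)+3}-6,$

where we used $6\cdot 2^{-w}=2$, and the last step holds since $2^{2^{s}}\geq 6$ when $s\geq 2$. This concludes the induction proof. ∎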

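This means that the length of the common subsequence $CS(k)$ is at least

$2^{2^{k}}-f(k)\geq 2^{2^{k}}-2^{2^{k}-wk+3}+6>2^{n}\left(1-\frac{8}{n^{w}}\right),\quad\text{where }n=2^{k}\text{ and }2^{wk}=3^{k}=n^{w}.$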

This asymptotic behaviour propagates through the other generalizations, and we obtain slightly better versions of Corollaries 5.8 and 5.11.

Theorem 6.17.

$a(n)=2^{n}\left(1-\mathcal{O}\left(\frac{1}{n^{w}}\right)\right)$ and $b(n)=n\left(1-\mathcal{O}\left(\frac{1}{(\log n)^{w}}\right)\right)$, where $w=\log_{2}(3)\approx 1.58$.
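Because $2^{wk}=3^{k}$ for $w=\log_{2}(3)$, the bound of Theorem 6.16 is equivalent to the integer inequality $3^{k}(f(k)+6)\leq 2^{2^{k}+3}$, which can be checked exactly without floating-point error. The sketch below (our own; function names are hypothetical) computes $f(k)$ from the recurrence of Lemma 6.14 and performs this check for $k\leq 11$.

```python
def eq(n: int) -> int:
    """eq(n) from Lemma 6.13."""
    return (2 ** n - 1) // 3 if n % 2 == 0 else (2 ** n - 2) // 3


def f(k: int) -> int:
    """f(k) via the exact recurrence of Lemma 6.14, starting from f(0) = 1."""
    value = 1
    for j in range(k):
        value = 2 ** (2 ** j) + (2 ** (2 ** j - 1) - 1 - eq(2 ** j - 1)) * value
    return value


def theorem_6_16_holds(k: int) -> bool:
    """f(k) <= 2^(2^k - w k + 3) - 6 with w = log2(3), multiplied through by 3^k = 2^(w k)."""
    return 3 ** k * (f(k) + 6) <= 2 ** (2 ** k + 3)


print(all(theorem_6_16_holds(k) for k in range(12)))  # True
```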

7 Acknowledgment

I thank Jeffrey Shallit for telling me about the problem.

Bibliography


[1] J.-P. Allouche, J. Shallit, The ubiquitous Prouhet-Thue-Morse sequence, in: Sequences and Their Applications: Proceedings of SETA ’98, Springer-Verlag, 1999, pp. 1-16.
[2] J. Berstel, Combinatorics on Words: Examples and Problems, 2006. http://www-igm.univ-mlv.fr/~berstel/Exposes/2006-05-24Turku Cow.pdf
[3] N. J. A. Sloane, Online Encyclopedia of Integer Sequences. http://oeis.org
