The non-tightness of the reconstruction threshold of a 4 states symmetric model with different in-block and out-block mutations

Wenjian Liu; Ning Ning

arXiv:1906.09479·stat.ML·June 25, 2019

TL;DR

This paper investigates the reconstruction problem in a 4-state symmetric stochastic block model with varying transition probabilities, establishing conditions under which the reconstruction threshold is not tight, revealing a complex phase where information is theoretically recoverable but computationally hard.

Contribution

It gives rigorous conditions for the non-tightness of the reconstruction threshold in a 4-state symmetric model with different in-block and out-block transition probabilities.

Findings

01

Identifies conditions for non-tightness of the reconstruction threshold.

02

Extends understanding of phase transitions in multi-state stochastic block models.

03

Highlights the complexity of the hybrid-hard phase in 4-state models.

Equations

\limsup_{n\to\infty} d_{TV}(\sigma^{i}(n),\sigma^{j}(n))>0,

\mathbf{M}=\frac{1}{2}\left(\begin{array}[]{cc}1+\theta&\;1-\theta\\ 1-\theta&\;1+\theta\\ \end{array}\right)+\frac{\Delta}{2}\left(\begin{array}[]{cc}-1&\;1\\ -1&\;1\\ \end{array}\right),\quad\quad|\theta|+|\Delta|\leq 1,

\mathbf{M}=\left(\begin{array}[]{cccccccc}p_{0}&p_{1}&\cdots&p_{1}\\ p_{1}&p_{0}&\cdots&p_{1}\\ \vdots&\vdots&\ddots&\vdots\\ p_{1}&p_{1}&\cdots&p_{0}\end{array}\right)_{q\times q}.

M_{ij}=\left\{\begin{array}[]{ll}p_{0}&\quad\textrm{if}\ i=j,\\ p_{1}&\quad\textrm{if}\ i\neq j\ \textrm{and}\ i,j\ \textrm{are in the same category},\\ p_{2}&\quad\textrm{if}\ i\neq j\ \textrm{and}\ i,j\ \textrm{are in different categories}.\end{array}\right.

\mathbf{P}=\left(\begin{array}[]{@{}cc|cc@{}}p_{0}&\;p_{1}&\;p_{1}^{\prime}&\;p_{1}^{\prime}\\ p_{1}&\;p_{0}&\;p_{1}^{\prime}&\;p_{1}^{\prime}\\ \hline\cr p_{1}&\;p_{1}&\;p_{0}^{\prime}&\;p_{1}^{\prime}\\ p_{1}&\;p_{1}&\;p_{1}^{\prime}&\;p_{0}^{\prime}\end{array}\right),

\mathbf{P}=\left(\begin{array}[]{@{}cc|cc@{}}p_{0}&\;p_{1}&\;p_{2}&\;p_{2}\\ p_{1}&\;p_{0}&\;p_{2}&\;p_{2}\\ \hline\cr p_{2}&\;p_{2}&\;\overline{p}_{0}&\;\overline{p}_{1}\\ p_{2}&\;p_{2}&\;\overline{p}_{1}&\;\overline{p}_{0}\end{array}\right).

\lim_{n\to\infty}x_{n}=\lim_{n\to\infty}\overline{x}_{n}=0.

\left\{\begin{array}[]{ll}\mathcal{X}_{n+1}=d\lambda_{1}^{2}\mathcal{X}_{n}+\frac{d(d-1)}{2}\left(-4\lambda_{1}^{4}\mathcal{X}_{n}^{2}+8\lambda_{1}^{2}\lambda_{2}^{2}\mathcal{X}_{n}\mathcal{Z}_{n}\right)+R_{x}+R_{z}+V_{x}\\ \\ \mathcal{Z}_{n+1}=d\lambda_{2}^{2}\mathcal{Z}_{n}+\frac{d(d-1)}{2}\left[\lambda_{1}^{4}\mathcal{X}_{n}^{2}-8\lambda_{2}^{4}\mathcal{Z}_{n}^{2}+\frac{1}{4}\lambda_{3}^{4}(\overline{x}_{n}-\overline{y}_{n})^{2}\right]-R_{z}+V_{z}.\end{array}\right.

f_{n}(i,A)=\mathbf{P}(\sigma_{\rho}=i\mid\sigma(n)=A)=\mathbf{P}(\sigma_{u_{j}}=i\mid\sigma_{j}(n+1)=A),

X_{i}(n)=f_{n}(i,\sigma(n)),\quad i=1,2,3,4.

X_{1}(n)+X_{2}(n)+X_{3}(n)+X_{4}(n)=1.

\pi_{1}=\pi_{2}=\pi_{3}=\pi_{4}=\frac{1}{4},

\mathbf{E}(X_{1}(n))=\mathbf{E}(X_{2}(n))=\mathbf{E}(X_{3}(n))=\mathbf{E}(X_{4}(n))=\frac{1}{4}.

f_{n}(i,\sigma^{j}(n))=f_{n}(j,\sigma^{i}(n)),\quad\textrm{for}\ i\neq j,\ i,j\in\{1,2\}\ \textrm{or}\ \{3,4\},

f_{n}(1,\sigma^{3}(n))=f_{n}(1,\sigma^{4}(n)).

Y_{ij}(n)=f_{n}(i,\sigma_{j}^{1}(n+1)),\quad\textrm{for}\ i=1,2,3,4,\ j=1,\dots,d,

Y_{1j}(n)+Y_{2j}(n)+Y_{3j}(n)+Y_{4j}(n)=1.

x_{n}=\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right),\quad y_{n}=\mathbf{E}\left(f_{n}(2,\sigma^{1}(n))-\frac{1}{4}\right),

z_{n}=\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right),\quad u_{n}=\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right)^{2},

v_{n}=\mathbf{E}\left(f_{n}(2,\sigma^{1}(n))-\frac{1}{4}\right)^{2},\quad w_{n}=\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right)^{2},

\overline{x}_{n}=\mathbf{E}\left(f_{n}(3,\sigma^{3}(n))-\frac{1}{4}\right),\quad\overline{y}_{n}=\mathbf{E}\left(f_{n}(4,\sigma^{3}(n))-\frac{1}{4}\right),

\overline{z}_{n}=\mathbf{E}\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right),\quad\overline{u}_{n}=\mathbf{E}\left(f_{n}(3,\sigma^{3}(n))-\frac{1}{4}\right)^{2},

\overline{v}_{n}=\mathbf{E}\left(f_{n}(4,\sigma^{3}(n))-\frac{1}{4}\right)^{2},\quad\overline{w}_{n}=\mathbf{E}\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right)^{2}.

\mathbf{E}f_{n}(1,\sigma^{1}(n))

x_{n}=4\left(\mathbf{E}(X_{1}(n))^{2}-\left(\frac{1}{4}\right)^{2}\right)=4\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)^{2}.

x_{n}

z_{n}=4\mathbf{E}(X_{1}(n)X_{3}(n))-\frac{1}{4}=\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right)=\overline{z}_{n},

y_{n}+\frac{1}{4}=\sum_{A}f_{n}(2,A)\mathbf{P}(\sigma(n)=A\mid\sigma_{\rho}=1)=4\mathbf{E}(X_{1}(n)X_{2}(n)),

y_{n}=4\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)\left(X_{2}(n)-\frac{1}{4}\right).

\left[\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)\left(X_{2}(n)-\frac{1}{4}\right)\right]^{2}\leq\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)^{2}\mathbf{E}\left(X_{2}(n)-\frac{1}{4}\right)^{2},

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Topological and Geometric Data Analysis · Bayesian Methods and Mixture Models

Full text

Wenjian Liu, Dept. of Mathematics and Computer Science, Queensborough Community College, City University of New York

Ning Ning, Dept. of Applied Mathematics, University of Washington, Seattle

Abstract

The tree reconstruction problem is to collect and analyze massive data at the $n$th level of the tree, to identify whether there is non-vanishing information of the root, as $n$ goes to infinity. Its connection to the clustering problem in the setting of the stochastic block model, which has wide applications in machine learning and data mining, has been well established. For the stochastic block model, an “information-theoretically-solvable-but-computationally-hard” region, also called the “hybrid-hard phase”, appears whenever the reconstruction bound is not tight for the corresponding reconstruction problem on the tree. Although the problem has been studied in numerous contexts, the existing literature with rigorously established reconstruction thresholds is very limited, and the problem becomes extremely challenging when the model under investigation has $4$ states (the stochastic block model with $4$ communities). In this paper, inspired by the newly proposed $q_{1}+q_{2}$ stochastic block model, we study a $4$-state symmetric model with different in-block and out-block transition probabilities, and rigorously give conditions for the non-tightness of the reconstruction threshold.

Keywords: Reconstruction, Markov random fields on trees, Deep generative hierarchical model, Unsupervised learning, Phase transition

AMS subject classifications: 60K35, 62F15, 82B20, 68R01

1 Introduction

1.1 The tree reconstruction problem

The tree reconstruction problem, as an interdisciplinary subject, has been studied in numerous contexts including statistical physics, information theory, and computational biology. The reconstructability plays a crucial role in phylogenetic reconstruction in evolutionary biology (see, for instance, [18, 8]), communication theory in the study of noisy computation (see, for instance, [9]), analogous investigations in the realm of network tomography (see, for instance, [3]), reconstructability and distinguishability in the clustering problem of the stochastic block model (see, for instance, [21, 22, 23, 1, 6]), etc.

The tree reconstruction model has two building blocks: an irreducible aperiodic Markov chain on a finite character set $\mathcal{C}$, and a rooted $d$-ary tree (every vertex having exactly $d$ offspring). The tree is denoted by $\mathbb{T}=(\mathbb{V},\mathbb{E},\rho)$, where $\mathbb{V}$ stands for the vertices, $\mathbb{E}$ stands for the edges, and $\rho\in\mathbb{V}$ stands for the root. Denote by $\sigma_{v}$ the state assigned to vertex $v$, and by $\sigma_{\rho}$ the state of the root $\rho$, which is chosen according to an initial distribution $\pi$ on $\mathcal{C}$. The root signal propagates in the tree according to a transition matrix $\mathbf{M}$, also called a noisy channel, in such a way that for each vertex $v$ having $u$ as its parent, the spin/configuration at $v$ is assigned according to the probability $M_{ij}=\mathbf{P}(\sigma_{v}=j\mid\sigma_{u}=i)$ for $i,j\in\mathcal{C}$.

The reconstruction problem on an infinite tree asks whether, given the configuration realized at the $n$th layer of the tree, denoted by $\sigma(n)$, there exists non-vanishing information on the letter transmitted by the root as $n$ goes to infinity. Based on $\sigma^{i}(n)$, defined as $\sigma(n)$ conditioned on $\sigma_{\rho}=i$, the following definition gives one mathematical formulation of reconstructibility:

Definition 1.1.

We say that a model is reconstructible on an infinite tree $\mathbb{T}$ , if for some $i,j\in\mathcal{C}$

$$\limsup_{n\to\infty}d_{TV}(\sigma^{i}(n),\sigma^{j}(n))>0,$$

where $d_{TV}$ is the total variation distance. When the $\limsup$ is $0$, we say that the model is non-reconstructible on $\mathbb{T}$.
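For small depths, the total variation distance in Definition 1.1 can be computed by exact enumeration. The following is a minimal Python sketch under stated assumptions: the helper names `level_likelihoods` and `tv_between_roots` are hypothetical, states are indexed from 0, and the binary symmetric channel with $\theta=0.5$ is only an illustrative choice. The enumeration grows like $q^{d^{n}}$, so it says nothing about the limit $n\to\infty$ itself.

```python
import numpy as np


def level_likelihoods(M, d, n):
    """L[i, a] = P(sigma(n) is the a-th configuration | root state i) on the d-ary tree."""
    q = M.shape[0]
    L = np.eye(q)  # level 0: the root itself is observed
    for _ in range(n):
        child = M @ L  # law of one subtree's leaves given the parent state
        joint = child
        for _ in range(d - 1):
            # the d subtrees are conditionally independent given the parent
            joint = np.einsum("ia,ib->iab", joint, child).reshape(q, -1)
        L = joint
    return L


def tv_between_roots(M, d, n, i, j):
    """Total variation distance between sigma^i(n) and sigma^j(n)."""
    L = level_likelihoods(M, d, n)
    return 0.5 * np.abs(L[i] - L[j]).sum()


# Illustrative binary symmetric channel (theta = 0.5 is an assumption).
theta = 0.5
M = 0.5 * np.array([[1 + theta, 1 - theta], [1 - theta, 1 + theta]])
for n in range(4):
    print(n, tv_between_roots(M, d=2, n=n, i=0, j=1))
```

Since level $n+1$ depends on the root only through level $n$, the printed distances cannot increase with $n$; reconstructibility asks whether they stay bounded away from zero.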

1.2 Existing results with states other than $4$

The reconstructibility is closely related to the second largest eigenvalue by absolute value of the transition matrix $\mathbf{M}$, denoted by $\lambda$. It is well known that the reconstruction problem is solvable when $d\lambda^{2}>1$, which is the Kesten-Stigum bound ([10, 11]). However, when $d\lambda^{2}<1$ the problem becomes much more challenging and its solvability depends strongly on the channel.

The binary model with $2$ states corresponds to the Ising model in statistical physics, whose transition matrix is given by

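$$\mathbf{M}=\frac{1}{2}\left(\begin{array}{cc}1+\theta&\;1-\theta\\ 1-\theta&\;1+\theta\\ \end{array}\right)+\frac{\Delta}{2}\left(\begin{array}{cc}-1&\;1\\ -1&\;1\\ \end{array}\right),\quad\quad|\theta|+|\Delta|\leq 1,$$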

where $\Delta$ describes the deviation from the symmetric channel, i.e., the channel is asymmetric when $\Delta\neq 0$. For the binary symmetric channel, [4] showed that the reconstruction problem is solvable if and only if $d\lambda^{2}>1$. For the binary asymmetric channel with sufficiently large asymmetry, [17, 19] showed that the Kesten-Stigum bound is not the reconstruction bound. When the asymmetry is sufficiently small, [5] established the first tightness result for the Kesten-Stigum reconstruction bound in roughly a decade, and later [15] gave a complete answer to the question of how small the asymmetry must be for the reconstruction threshold to be tight.

For non-binary models, the simplest case is the $q$ -state symmetric channel which corresponds to the Potts model in statistical physics, with the following transition matrix

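$$\mathbf{M}=\left(\begin{array}{cccc}p_{0}&p_{1}&\cdots&p_{1}\\ p_{1}&p_{0}&\cdots&p_{1}\\ \vdots&\vdots&\ddots&\vdots\\ p_{1}&p_{1}&\cdots&p_{0}\end{array}\right)_{q\times q}.$$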

[25] showed that the Kesten-Stigum bound is tight for the $3$-state Potts model on regular trees of large degree and that it is not tight when $q\geq 5$. Motivated by the K80 model ([12]), one of the most classical Markov models of DNA evolution, [13] proposed the following model to distinguish between transitions and transversions, whose transition matrix has two mutation classes with $q$ states in each class:

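$$M_{ij}=\left\{\begin{array}{ll}p_{0}&\quad\textrm{if}\ i=j,\\ p_{1}&\quad\textrm{if}\ i\neq j\ \textrm{and}\ i,j\ \textrm{are in the same category},\\ p_{2}&\quad\textrm{if}\ i\neq j\ \textrm{and}\ i,j\ \textrm{are in different categories}.\end{array}\right.\tag{1}$$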

When the number of states is at least $8$, [13] showed that the Kesten-Stigum bound is not tight.

1.3 Existing results with $4$ states and the importance of non-tightness

It is well known that the $2$-state and $4$-state cases give the most important tree reconstruction models, especially for applications in phylogenetic reconstruction, since they correspond to some of the most basic phylogenetic evolutionary models (see, for instance, the discussions in Section 2.5.1 of [20]). However, the $4$-state case is much more challenging and remained open until a few results were established recently. For the symmetric model with $4$ states, [24] showed that in the assortative (ferromagnetic) case the Kesten-Stigum bound is always tight, while in the disassortative (antiferromagnetic) case the Kesten-Stigum bound is tight in a large degree regime and not tight in a low degree regime. Later, [14] investigated a $4$-state asymmetric model whose transition matrix is of the form

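$$\mathbf{P}=\left(\begin{array}{@{}cc|cc@{}}p_{0}&\;p_{1}&\;p_{1}^{\prime}&\;p_{1}^{\prime}\\ p_{1}&\;p_{0}&\;p_{1}^{\prime}&\;p_{1}^{\prime}\\ \hline p_{1}&\;p_{1}&\;p_{0}^{\prime}&\;p_{1}^{\prime}\\ p_{1}&\;p_{1}&\;p_{1}^{\prime}&\;p_{0}^{\prime}\end{array}\right),$$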

and gave specific conditions under which the Kesten-Stigum bound is not tight.

The stochastic block model has wide applications in statistics, machine learning, and data mining, to name a few. The connection between the reconstruction problem on the tree and the clustering problem in the setting of the stochastic block model has been well established in recent years (see, for instance, [21, 22, 23, 24]). Specifically, the technique used in handling balanced two-cluster models is to transfer the clustering problem to reconstructability on trees. For the stochastic block model, an “information-theoretically-solvable-but-computationally-hard” region appears whenever the Kesten-Stigum bound is not tight for the corresponding reconstruction problem on the tree. Further information can be found in [24] under the name “hybrid-hard phases”.

1.4 Motivation and main result

While the reconstructability of the $4$-state case of the model in equation (1) is still an open problem, in this paper we give a rigorous answer to the reconstructibility question for the $4$-state case of a more complicated and general model. Inspired by the $q_{1}+q_{2}$ stochastic block model proposed in [24] (see Fig. 5 therein for an illustration), we extend the model in equation (1) to incorporate different in-block transition probabilities. That is, in this paper, we focus on a $4$-state model with the transition matrix

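$$\mathbf{P}=\left(\begin{array}{@{}cc|cc@{}}p_{0}&\;p_{1}&\;p_{2}&\;p_{2}\\ p_{1}&\;p_{0}&\;p_{2}&\;p_{2}\\ \hline p_{2}&\;p_{2}&\;\overline{p}_{0}&\;\overline{p}_{1}\\ p_{2}&\;p_{2}&\;\overline{p}_{1}&\;\overline{p}_{0}\end{array}\right).\tag{2}$$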

Besides different out-block transition probabilities ( $p_{2}$ ) characterized in [13], the model under investigation has different in-block transition probabilities ( $p_{0}$ and $p_{1}$ in one block, $\overline{p}_{0}$ and $\overline{p}_{1}$ in the other block).

It is easy to see that $\mathbf{P}$ has $4$ eigenvalues: $1$ , $\lambda_{1}=p_{0}-p_{1}$ , $\lambda_{2}=p_{0}+p_{1}-2p_{2}$ , and $\lambda_{3}=\overline{p}_{0}-\overline{p}_{1}$ . Let $\lambda$ be the second largest eigenvalue by absolute value. Considering that $d|\lambda|^{2}>1$ always implies reconstruction, we only investigate $d|\lambda|^{2}\leq 1$ in the following context. Our main result is the following theorem, whose rigorous proof is given in Section 5.

Main Theorem.

If $|\lambda_{1}|\neq|\lambda_{3}|$ and $0<|\lambda_{2}|<\max\left\{|\lambda_{1}|,|\lambda_{3}|\right\}$, the Kesten-Stigum bound is not tight for every $d$, i.e., the reconstruction is solvable for some $\lambda$ even if $d\lambda^{2}<1$.

Since $\lambda_{1}$ and $\lambda_{3}$ play symmetric roles in this symmetric model (2), without loss of generality, we presume $|\lambda_{1}|>|\lambda_{3}|$ in the sequel.
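The eigenvalue formulas for $\mathbf{P}$ and the hypothesis of the Main Theorem can be checked numerically. The following is a minimal Python sketch under stated assumptions: the function names `block_channel` and `eigen_summary` are hypothetical, `pb0` and `pb1` stand for $\overline{p}_{0}$ and $\overline{p}_{1}$, and the numerical parameter values are illustrative only. Both rows must sum to one, which forces $p_{0}+p_{1}=\overline{p}_{0}+\overline{p}_{1}$.

```python
import numpy as np


def block_channel(p0, p1, p2, pb0, pb1):
    """Transition matrix P of the 4-state model; pb0, pb1 are the barred entries."""
    P = np.array([
        [p0, p1, p2, p2],
        [p1, p0, p2, p2],
        [p2, p2, pb0, pb1],
        [p2, p2, pb1, pb0],
    ])
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to 1")
    return P


def eigen_summary(p0, p1, p2, pb0, pb1, d):
    """Compare numerical eigenvalues with the closed forms and evaluate d * lambda^2."""
    P = block_channel(p0, p1, p2, pb0, pb1)
    numeric = np.sort(np.linalg.eigvalsh(P))  # P is symmetric
    lam1, lam2, lam3 = p0 - p1, p0 + p1 - 2 * p2, pb0 - pb1
    closed_form = np.sort([1.0, lam1, lam2, lam3])
    lam = max(abs(lam1), abs(lam2), abs(lam3))  # second largest eigenvalue in absolute value
    hypothesis = (not np.isclose(abs(lam1), abs(lam3))) and (
        0 < abs(lam2) < max(abs(lam1), abs(lam3))
    )
    return numeric, closed_form, d * lam**2, hypothesis


# Illustrative parameters (assumptions): lambda_1 = 0.4, lambda_2 = 0.2, lambda_3 = 0.1.
numeric, closed_form, ks, hypothesis = eigen_summary(0.5, 0.1, 0.2, 0.35, 0.25, d=5)
print(numeric, closed_form, ks, hypothesis)
```

With these values $d\lambda^{2}$ is below one while the hypothesis of the Main Theorem holds, which is the regime the paper is concerned with.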

1.5 Structure of the paper and proof sketch

The technique used here was initiated in [7] in the context of spin glasses. In Section 2, we give detailed definitions and interpretations, conduct preliminary analyses, and then provide an equivalent condition for non-reconstruction:

$$\lim_{n\to\infty}x_{n}=\lim_{n\to\infty}\overline{x}_{n}=0.$$

Here, $x_{n}$ and $\overline{x}_{n}$ represent the probability of correctly guessing the root given the spins $\sigma(n)$ at distance $n$ from the root, minus the probability $1/4$ of guessing the root at random, for the root being in block $1$ and block $2$ respectively. Non-reconstruction means that the mutual information between the root and the spins at distance $n$ goes to $0$ as $n$ tends to infinity. One standard way to distinguish reconstruction from non-reconstruction is therefore to analyze the quantity $x_{n}$, while in this paper we also need to consider the limiting behavior of $\overline{x}_{n}$.

In Section 3, after in-depth investigation of the recursive relationship, we develop a two dimensional dynamical system of the linear diagonal canonical form regarding quantities $x_{n+1}$ and $\overline{z}_{n+1}$ through two new variables $\mathcal{X}_{n}=x_{n}+\overline{z}_{n}$ and $\mathcal{Z}_{n}=-\overline{z}_{n}$ :

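$$\left\{\begin{array}{ll}\mathcal{X}_{n+1}=d\lambda_{1}^{2}\mathcal{X}_{n}+\frac{d(d-1)}{2}\left(-4\lambda_{1}^{4}\mathcal{X}_{n}^{2}+8\lambda_{1}^{2}\lambda_{2}^{2}\mathcal{X}_{n}\mathcal{Z}_{n}\right)+R_{x}+R_{z}+V_{x}\\ \\ \mathcal{Z}_{n+1}=d\lambda_{2}^{2}\mathcal{Z}_{n}+\frac{d(d-1)}{2}\left[\lambda_{1}^{4}\mathcal{X}_{n}^{2}-8\lambda_{2}^{4}\mathcal{Z}_{n}^{2}+\frac{1}{4}\lambda_{3}^{4}(\overline{x}_{n}-\overline{y}_{n})^{2}\right]-R_{z}+V_{z}.\end{array}\right.$$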

Here, $\overline{z}_{n}$ represents the opposite case of $x_{n}$, namely giving a wrong guess in the other block. By symmetry, we can also obtain the dynamical system involving $\overline{x}_{n}$ simply by replacing $\lambda_{1}$ with $\lambda_{3}$. In Section 4, we show that $R_{x}$, $R_{z}$, $V_{x}$, and $V_{z}$ are just small perturbations in the above dynamical system in order to study its stability, ensure that the decrease from $x_{n}$ to $x_{n+1}$ is never too large to lose reconstruction, and establish crucial concentration results, by fully taking advantage of the Markov random field property and the symmetries in the probability transition matrix and the network structure. In Section 5, arguing by contradiction, we show that $x_{n}$ and $\overline{x}_{n}$ cannot simultaneously converge to zero as $n$ goes to $\infty$, and then establish the non-tightness of the Kesten-Stigum bound.
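To make the structure of the two-dimensional recursion concrete, the following minimal Python sketch evaluates only its explicit leading-order terms. It is an illustration under stated assumptions, not the paper's analysis: the perturbations $R_{x}$, $R_{z}$, $V_{x}$, $V_{z}$ are set to zero, the quantity $\overline{x}_{n}-\overline{y}_{n}$ is passed in as a free parameter `gap`, and the function name `leading_order_step` and the numerical values are hypothetical.

```python
def leading_order_step(X, Z, d, lam1, lam2, lam3, gap=0.0):
    """One step of the recursion with R_x, R_z, V_x, V_z set to zero.

    X and Z play the roles of the calligraphic X_n and Z_n;
    gap stands for (xbar_n - ybar_n), which this sketch does not track.
    """
    pairs = d * (d - 1) / 2
    X_next = d * lam1**2 * X + pairs * (
        -4 * lam1**4 * X**2 + 8 * lam1**2 * lam2**2 * X * Z
    )
    Z_next = d * lam2**2 * Z + pairs * (
        lam1**4 * X**2 - 8 * lam2**4 * Z**2 + 0.25 * lam3**4 * gap**2
    )
    return X_next, Z_next


# Illustrative values (assumptions): d * lam1^2 = 0.75 < 1, so the linear part contracts.
X, Z = 1e-3, 0.0
for n in range(5):
    X, Z = leading_order_step(X, Z, d=3, lam1=0.5, lam2=0.2, lam3=0.1)
    print(n + 1, X, Z)
```

The linear part is diagonal with rates $d\lambda_{1}^{2}$ and $d\lambda_{2}^{2}$, which is why the quadratic terms and the perturbations decide the behavior near the Kesten-Stigum threshold.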

2 Preparation

2.1 Notations

Let $u_{1},\ldots,u_{d}$ be the children of the root $\rho$ and $\mathbb{T}_{v}$ be the subtree of descendants of $v\in\mathbb{V}$ . Denote the $n$ th level of the tree by $L_{n}=\{v\in\mathbb{V}:d(\rho,v)=n\}$ with $d(\cdot,\cdot)$ being the graph distance on $\mathbb{T}$ . Denote $\sigma(n)$ as the spins on $L_{n}$ , $\sigma^{i}(n)$ as $\sigma(n)$ conditioned on $\sigma_{\rho}=i$ , and $\sigma_{j}(n)$ as the spins on $L_{n}\cap\mathbb{T}_{u_{j}}$ where $u_{j}$ is one of the children of the root $\rho$ . For the notations involving $\sigma(n)$ in the sequel, we consistently use superscript to denote the conditional on a specific configuration of the root, and use the subscript to denote the conditional on a specific offspring of the root.

For a configuration $A$ on the spins of $L_{n}$ , define the posterior function by

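$$f_{n}(i,A)=\mathbf{P}(\sigma_{\rho}=i\mid\sigma(n)=A)=\mathbf{P}(\sigma_{u_{j}}=i\mid\sigma_{j}(n+1)=A),$$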

for $i=1,2,3,4$ and $j=1,\cdots,d$ , where the second equality holds by the recursive nature of the tree. Define $X_{i}(n)$ as the posterior probability that the root $\rho$ is taking the configuration $i$ given the random configuration $\sigma(n)$ on the spins in $L_{n}$ , i.e.,

$$X_{i}(n)=f_{n}(i,\sigma(n)),\quad i=1,2,3,4.$$

Apparently one has

$$X_{1}(n)+X_{2}(n)+X_{3}(n)+X_{4}(n)=1.$$

By the block characteristic of the model, we know that regarding the first (resp. second) block, $X_{1}(n)$ and $X_{2}(n)$ (resp. $X_{3}(n)$ and $X_{4}(n)$ ) have the same distribution. Considering that the stationary distribution $\pi=(\pi_{1},\pi_{2},\pi_{3},\pi_{4})$ of $\mathbf{P}$ is given by

$$\pi_{1}=\pi_{2}=\pi_{3}=\pi_{4}=\frac{1}{4},$$

we further have

$$\mathbf{E}(X_{1}(n))=\mathbf{E}(X_{2}(n))=\mathbf{E}(X_{3}(n))=\mathbf{E}(X_{4}(n))=\frac{1}{4}.$$

From the symmetry and the block characteristic of the model, we know that

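$$f_{n}(i,\sigma^{j}(n))=f_{n}(j,\sigma^{i}(n)),\quad\textrm{for}\ i\neq j,\ i,j\in\{1,2\}\ \textrm{or}\ \{3,4\},$$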

and

$$f_{n}(1,\sigma^{3}(n))=f_{n}(1,\sigma^{4}(n)).$$

Define $Y_{ij}(n)$ as the posterior probability that $\sigma_{u_{j}}=i$ given the random configuration $\sigma^{1}_{j}(n+1)$ on spins in $L(n+1)\cap\mathbb{T}_{u_{j}}$ , i.e.,

$$Y_{ij}(n)=f_{n}(i,\sigma_{j}^{1}(n+1)),\quad\textrm{for}\ i=1,2,3,4,\ j=1,\dots,d,$$

where the random variables $\{Y_{ij}(n)\}$ are independent and identically distributed and satisfy

$$Y_{1j}(n)+Y_{2j}(n)+Y_{3j}(n)+Y_{4j}(n)=1.$$

We define the following moment variables to analyze the differences between different inferences of $\sigma_{\rho}$ given the spins $\sigma(n)$ at distance $n$ from the root $\rho$ and the probability of guessing the root randomly:

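$$\begin{aligned}x_{n}&=\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right),&y_{n}&=\mathbf{E}\left(f_{n}(2,\sigma^{1}(n))-\frac{1}{4}\right),\\ z_{n}&=\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right),&u_{n}&=\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right)^{2},\\ v_{n}&=\mathbf{E}\left(f_{n}(2,\sigma^{1}(n))-\frac{1}{4}\right)^{2},&w_{n}&=\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right)^{2},\\ \overline{x}_{n}&=\mathbf{E}\left(f_{n}(3,\sigma^{3}(n))-\frac{1}{4}\right),&\overline{y}_{n}&=\mathbf{E}\left(f_{n}(4,\sigma^{3}(n))-\frac{1}{4}\right),\\ \overline{z}_{n}&=\mathbf{E}\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right),&\overline{u}_{n}&=\mathbf{E}\left(f_{n}(3,\sigma^{3}(n))-\frac{1}{4}\right)^{2},\\ \overline{v}_{n}&=\mathbf{E}\left(f_{n}(4,\sigma^{3}(n))-\frac{1}{4}\right)^{2},&\overline{w}_{n}&=\mathbf{E}\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right)^{2}.\end{aligned}$$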

2.2 Preliminary analyses

We first establish some important lemmas which will be used frequently in the sequel.

Lemma 2.1.

For any $n\in\mathbb{N}\cup\{0\}$ , we have

(a) $\displaystyle x_{n}=4\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)^{2}=u_{n}+v_{n}+2w_{n}\geq 0$.

(b) $\displaystyle-\frac{x_{n}+y_{n}}{2}=z_{n}=\overline{z}_{n}=-\frac{\overline{x}_{n}+\overline{y}_{n}}{2}\leq 0$.

(c) $\displaystyle x_{n}+z_{n}\geq 0,\quad\overline{x}_{n}+z_{n}\geq 0$.

Proof 2.2.

(a)

By the law of total probability and Bayes’ theorem, we have

[TABLE]

Recall that $x_{n}$ is defined as $x_{n}=\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right)$ , and then by the fact that $\mathbf{E}(X_{1}(n))=\frac{1}{4}$ we have

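$$x_{n}=4\left(\mathbf{E}(X_{1}(n))^{2}-\left(\frac{1}{4}\right)^{2}\right)=4\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)^{2}.$$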

Furthermore, by the law of total expectation, we have

[TABLE]

(b)

Similarly, we have

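$$z_{n}=4\mathbf{E}(X_{1}(n)X_{3}(n))-\frac{1}{4}=\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right)=\overline{z}_{n},$$

$$y_{n}+\frac{1}{4}=\sum_{A}f_{n}(2,A)\mathbf{P}(\sigma(n)=A\mid\sigma_{\rho}=1)=4\mathbf{E}(X_{1}(n)X_{2}(n)),$$

and then

$$y_{n}=4\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)\left(X_{2}(n)-\frac{1}{4}\right).$$

It follows from the Cauchy-Schwarz inequality that

$$\left[\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)\left(X_{2}(n)-\frac{1}{4}\right)\right]^{2}\leq\mathbf{E}\left(X_{1}(n)-\frac{1}{4}\right)^{2}\mathbf{E}\left(X_{2}(n)-\frac{1}{4}\right)^{2},$$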

which implies

[TABLE]

By the definitions of $x_{n}$, $y_{n}$ and $z_{n}$, we know that $z_{n}=-\frac{x_{n}+y_{n}}{2}$, and thus equation (5) implies $z_{n}\leq 0$.

(c)

An analogous proof of

$$x_{n}+z_{n}\geq 0,\quad\overline{x}_{n}+z_{n}\geq 0$$

can be easily carried out.
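The identities of Lemma 2.1 can be sanity-checked by exact enumeration on a shallow tree. The following is a minimal Python sketch under stated assumptions: the helper names (`block_channel`, `level_likelihoods`, `moment_variables`) are hypothetical, states $1,\ldots,4$ are indexed $0,\ldots,3$, the root has the uniform prior, and the parameter values are illustrative with all entries of $\mathbf{P}$ positive. It is a numerical check for small $d$ and $n$, not a substitute for the proof.

```python
import numpy as np


def block_channel(p0, p1, p2, pb0, pb1):
    """Transition matrix P of the 4-state model; pb0, pb1 are the barred entries."""
    return np.array([
        [p0, p1, p2, p2],
        [p1, p0, p2, p2],
        [p2, p2, pb0, pb1],
        [p2, p2, pb1, pb0],
    ])


def level_likelihoods(P, d, n):
    """L[i, a] = P(sigma(n) is the a-th configuration | root state i)."""
    L = np.eye(4)  # level 0: the root itself is observed
    for _ in range(n):
        child = P @ L  # law of one subtree's leaves given the parent state
        joint = child
        for _ in range(d - 1):
            # the d subtrees are conditionally independent given the parent
            joint = np.einsum("ia,ib->iab", joint, child).reshape(4, -1)
        L = joint
    return L


def moment_variables(P, d, n):
    """Exact x_n, y_n, z_n, zbar_n and 4 E(X_1(n) - 1/4)^2 by enumeration."""
    L = level_likelihoods(P, d, n)
    prob = L.mean(axis=0)    # P(sigma(n) = A) under the uniform root law
    post = L / (4.0 * prob)  # f_n(i, A) by Bayes' theorem
    return {
        "x": post[0] @ L[0] - 0.25,     # E f_n(1, sigma^1(n)) - 1/4
        "y": post[1] @ L[0] - 0.25,     # E f_n(2, sigma^1(n)) - 1/4
        "z": post[0] @ L[2] - 0.25,     # E f_n(1, sigma^3(n)) - 1/4
        "zbar": post[2] @ L[0] - 0.25,  # E f_n(3, sigma^1(n)) - 1/4
        "four_second_moment": 4.0 * (prob @ (post[0] - 0.25) ** 2),
    }


P = block_channel(0.5, 0.1, 0.2, 0.35, 0.25)  # illustrative values
m = moment_variables(P, d=2, n=2)
print(m)
```

Up to floating-point error, the output should satisfy `x == four_second_moment`, `z == zbar == -(x + y) / 2`, `z <= 0`, and `x + z >= 0`, matching parts (a) to (c).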

Lemma 2.3.

For any $n\in\mathbb{N}\cup\{0\}$ , we have

(a) $\displaystyle\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right)\left(f_{n}(2,\sigma^{1}(n))-\frac{1}{4}\right)=\frac{1}{4}y_{n}+\left(v_{n}-\frac{1}{4}x_{n}\right)$.

(b) $\begin{aligned} &\displaystyle\mathbf{E}\left(f_{n}(1,\sigma^{1}(n))-\frac{1}{4}\right)\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right)\\ =&\frac{1}{4}z_{n}-\frac{1}{2}\left(u_{n}-\frac{1}{4}x_{n}\right)-\frac{1}{2}\left(v_{n}-\frac{1}{4}x_{n}\right).\end{aligned}$

(c) $\displaystyle\mathbf{E}\left(f_{n}(2,\sigma^{1}(n))-\frac{1}{4}\right)\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right)=\frac{1}{4}z_{n}-\left(v_{n}-\frac{1}{4}x_{n}\right)$.

(d) $\begin{aligned} &\displaystyle\mathbf{E}\left(f_{n}(3,\sigma^{1}(n))-\frac{1}{4}\right)\left(f_{n}(4,\sigma^{1}(n))-\frac{1}{4}\right)\\ =&\frac{1}{4}\overline{y}_{n}+\frac{1}{2}\left(u_{n}-\frac{1}{4}x_{n}\right)+\frac{3}{2}\left(v_{n}-\frac{1}{4}x_{n}\right)-\left(\overline{w}_{n}-\frac{1}{4}\overline{x}_{n}\right).\end{aligned}$

(e) $\displaystyle\mathbf{E}\left(f_{n}(1,\sigma^{3}(n))-\frac{1}{4}\right)\left(f_{n}(2,\sigma^{3}(n))-\frac{1}{4}\right)=\frac{1}{4}y_{n}-\left(v_{n}-\frac{1}{4}x_{n}\right)$.

Proof 2.4.

We only prove (a) and (b) and the others can be shown analogously.

(a)

By the law of total probability, one has

[TABLE]

therefore

[TABLE]

(b)

By the fact that $f_{n}(3,\sigma^{1}(n))$ and $f_{n}(4,\sigma^{1}(n))$ have the same distribution, and the equation that

[TABLE]

plugging in the result of (a), we can obtain that

[TABLE]

as desired.

Recall that $Y_{ij}(n)$ is defined as the posterior probability that $\sigma_{u_{j}}=i$ given the random configuration $\sigma^{1}_{j}(n+1)$ on spins in $L(n+1)\cap\mathbb{T}_{u_{j}}$, i.e., $Y_{ij}(n)=f_{n}(i,\sigma_{j}^{1}(n+1))$, for $i\in\{1,2,3,4\}$ and $j\in\{1,\cdots,d\}$. The random vectors $(Y_{ij}(n))_{i=1}^{4}$, $j=1,\cdots,d$, are independent and, by the symmetry of the model, identically distributed; their central moments are investigated in the following lemma.

Lemma 2.5.

For each $1\leq j\leq d$ , we have

(a) $\begin{aligned} \mathbf{E}\left(Y_{1j}(n)-\frac{1}{4}\right)=\lambda_{1}x_{n}+(\lambda_{1}-\lambda_{2})z_{n}\end{aligned}$.

(b) $\begin{aligned} \mathbf{E}\left(Y_{2j}(n)-\frac{1}{4}\right)=-\lambda_{1}x_{n}-(\lambda_{1}+\lambda_{2})z_{n}\end{aligned}$.

(c) $\begin{aligned} \mathbf{E}\left(Y_{ij}(n)-\frac{1}{4}\right)=\lambda_{2}z_{n},\quad i=3,4.\end{aligned}$

(d) $\begin{aligned} \mathbf{E}\left(Y_{1j}(n)-\frac{1}{4}\right)^{2}=\frac{1}{4}x_{n}+\lambda_{1}\left(u_{n}-\frac{1}{4}x_{n}\right)+(\lambda_{1}-\lambda_{2})\left(w_{n}-\frac{1}{4}x_{n}\right).\end{aligned}$

(e) $\begin{aligned} \mathbf{E}\left(Y_{2j}(n)-\frac{1}{4}\right)^{2}=\frac{1}{4}x_{n}-\lambda_{1}\left(u_{n}-\frac{1}{4}x_{n}\right)-(\lambda_{1}+\lambda_{2})\left(w_{n}-\frac{1}{4}x_{n}\right).\end{aligned}$

(f) $\begin{aligned} \mathbf{E}\left(Y_{ij}(n)-\frac{1}{4}\right)^{2}=\frac{1}{4}\overline{x}_{n}+\lambda_{2}\left(\overline{w}_{n}-\frac{1}{4}\overline{x}_{n}\right),\quad i=3,4.\end{aligned}$

(g) $\begin{aligned} \mathbf{E}\left(Y_{1j}(n)-\frac{1}{4}\right)\left(Y_{2j}(n)-\frac{1}{4}\right)=\frac{1}{4}y_{n}+\lambda_{2}\left(v_{n}-\frac{1}{4}x_{n}\right).\end{aligned}$

(h) $\begin{aligned} &\mathbf{E}\left(Y_{1j}(n)-\frac{1}{4}\right)\left(Y_{ij}(n)-\frac{1}{4}\right)\\ =&\frac{z_{n}}{4}+\frac{\lambda_{1}-\lambda_{2}}{2}\left(v_{n}-\frac{1}{4}x_{n}\right)+\frac{\lambda_{1}+\lambda_{2}}{2}\left(w_{n}-\frac{1}{4}x_{n}\right),\quad i=3,4.\end{aligned}$

(i) $\begin{aligned} &\mathbf{E}\left(Y_{2j}(n)-\frac{1}{4}\right)\left(Y_{ij}(n)-\frac{1}{4}\right)\\ =&\frac{z_{n}}{4}-\frac{\lambda_{1}+\lambda_{2}}{2}\left(v_{n}-\frac{1}{4}x_{n}\right)-\frac{\lambda_{1}-\lambda_{2}}{2}\left(w_{n}-\frac{1}{4}x_{n}\right),\quad i=3,4.\end{aligned}$

(j) $\begin{aligned} \mathbf{E}\left(Y_{3j}(n)-\frac{1}{4}\right)\left(Y_{4j}(n)-\frac{1}{4}\right)=&\frac{1}{4}\overline{y}_{n}-\lambda_{2}\left(\overline{v}_{n}-\frac{1}{4}\overline{x}_{n}\right).\end{aligned}$

Proof 2.6.

We only prove (a), (b), and (c) and the others can be shown analogously.

(a)

Conditioning on $\sigma_{u_{j}}=i$ for $i\in\{1,2,3,4\}$ , we have

[TABLE]

(b)

Similarly, we can obtain

[TABLE]

(c)

It follows immediately from the identity $\sum_{i=1}^{4}Y_{ij}(n)=1$ that, for $i=3,4,$

[TABLE]
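The first-moment identities (a), (b), and (c) of Lemma 2.5 can be recovered by hand. The following is a sketch of one possible derivation, not a transcription of the displays in the proof. It assumes the root is in state $1$, so that $\sigma_{u_{j}}$ equals $1,2,3,4$ with probabilities $p_{0},p_{1},p_{2},p_{2}$, and it uses the symmetry relations of Section 2.1, the row-sum identity $p_{0}+p_{1}+2p_{2}=1$, and $y_{n}=-x_{n}-2z_{n}$ from Lemma 2.1 (b).

```latex
\begin{aligned}
\mathbf{E}\left(Y_{1j}(n)-\tfrac{1}{4}\right)
  &= p_{0}x_{n}+p_{1}y_{n}+2p_{2}z_{n} \\
  &= (p_{0}-p_{1})x_{n}+2(p_{2}-p_{1})z_{n}
   = \lambda_{1}x_{n}+(\lambda_{1}-\lambda_{2})z_{n}, \\
\mathbf{E}\left(Y_{2j}(n)-\tfrac{1}{4}\right)
  &= p_{1}x_{n}+p_{0}y_{n}+2p_{2}z_{n} \\
  &= -(p_{0}-p_{1})x_{n}-2(p_{0}-p_{2})z_{n}
   = -\lambda_{1}x_{n}-(\lambda_{1}+\lambda_{2})z_{n}, \\
\mathbf{E}\left(Y_{ij}(n)-\tfrac{1}{4}\right)
  &= -\tfrac{1}{2}\left[\mathbf{E}\left(Y_{1j}(n)-\tfrac{1}{4}\right)
     +\mathbf{E}\left(Y_{2j}(n)-\tfrac{1}{4}\right)\right]
   = \lambda_{2}z_{n},\quad i=3,4.
\end{aligned}
```

The last line uses $\sum_{i=1}^{4}Y_{ij}(n)=1$ together with the fact that $Y_{3j}(n)$ and $Y_{4j}(n)$ have the same mean. The identities $\lambda_{1}-\lambda_{2}=2(p_{2}-p_{1})$ and $\lambda_{1}+\lambda_{2}=2(p_{0}-p_{2})$ follow directly from the definitions of $\lambda_{1}$ and $\lambda_{2}$.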

2.3 An equivalent condition for non-reconstruction

If the reconstruction problem is solvable, $\sigma(n)$ contains significant information of the root variable. This can be expressed in several equivalent ways (see [17, 19]).

Lemma 2.7.

The non-reconstruction is equivalent to

$$\lim_{n\to\infty}x_{n}=\lim_{n\to\infty}\overline{x}_{n}=0.$$

3 Recursive formulas

3.1 Distributional recursion

Consider $A$ as a configuration on $L(n+1)$ , and let $A_{j}(j=1,\cdots,d)$ be its restriction to $\mathbb{T}_{u_{j}}\bigcap L(n+1)$ where $u_{j}$ is the $j$ th child of the root $\rho$ . Then from the Markov random field property, we have

[TABLE]

where $N_{k}(n)$ is given by

[TABLE]

Recall that $Y_{ij}(n)=f_{n}(i,\sigma_{j}^{1}(n+1))$ . Setting $A=\sigma^{1}(n+1)$ , we have

[TABLE]

where

[TABLE]

i.e., $Z_{i}(n)=\frac{N_{i}(n)}{\prod_{j=1}^{d}\mathbf{P}(\sigma_{j}(n+1)=A_{j})}.$

Lemma 3.1.

For any nonnegative $n\in\mathbb{Z}^{+}$ , we have

[TABLE]

Proof 3.2.

For any configuration $A=(A_{1},\ldots,A_{d})$ with $A_{j}$ denoting the spins on $L_{n+1}\cap\mathbb{T}_{u_{j}}$ , we have

[TABLE]

By the symmetry of the tree, we have

[TABLE]

as desired.

By Lemma 2.5, the means and variances of monomials of $Z_{i}(n)$ can be approximated as follows:

Lemma 3.3.

One has

(i)

$\begin{aligned} \mathbf{E}Z_{1}(n)=&1+d\lambda_{1}^{2}4(x_{n}+z_{n})-d\lambda_{2}^{2}4z_{n}\\ &+\frac{d(d-1)}{2}\left[4\lambda_{1}^{2}(x_{n}+z_{n})-4\lambda_{2}^{2}z_{n}\right]^{2}+O(x_{n}^{3}).\end{aligned}$

(ii)

$\begin{aligned} \mathbf{E}Z_{2}(n)=&1-d\lambda_{1}^{2}4(x_{n}+z_{n})-d\lambda_{2}^{2}4z_{n}\\ &+\frac{d(d-1)}{2}\left[4\lambda_{1}^{2}(x_{n}+z_{n})+4\lambda_{2}^{2}z_{n}\right]^{2}+O(x_{n}^{3}).\end{aligned}$

(iii)

$\begin{aligned} \mathbf{E}Z_{i}(n)=1+d\lambda_{2}^{2}4z_{n}+\frac{d(d-1)}{2}\left(4\lambda_{2}^{2}z_{n}\right)^{2}+O(x_{n}^{3}),\quad i=3,4.\end{aligned}$

(iv)

$\begin{aligned} \mathbf{E}Z_{1}^{2}(n)=1+d\Pi_{1}+\frac{d(d-1)}{2}\Pi_{1}^{2}+O(x_{n}^{3}),\end{aligned}$ where

[TABLE]

(v)

$\begin{aligned} \mathbf{E}Z_{2}^{2}(n)=\mathbf{E}Z_{1}(n)Z_{2}(n)=1+d\Pi_{2}+\frac{d(d-1)}{2}\Pi_{2}^{2}+O(x_{n}^{3}),\end{aligned}$ where

[TABLE]

(vi)

$\begin{aligned} \mathbf{E}Z_{i}^{2}(n)=1+d\Pi_{3}+\frac{d(d-1)}{2}\Pi_{3}^{2}+O(x_{n}^{3}),\end{aligned}$ for $i=3,4$, where

[TABLE]

(vii)

$\begin{aligned} \mathbf{E}Z_{1}(n)Z_{i}(n)=1+d\Pi_{4}+\frac{d(d-1)}{2}\Pi_{4}^{2}+O(x_{n}^{3}),\end{aligned}$ for $i=3,4$, where

[TABLE]

(viii)

$\begin{aligned} \mathbf{E}Z_{2}(n)Z_{i}(n)=1+d\Pi_{5}+\frac{d(d-1)}{2}\Pi_{5}^{2}+O(x_{n}^{3}),\end{aligned}$ for $i=3,4$, where

[TABLE]

(ix)

$\begin{aligned} \mathbf{E}Z_{3}(n)Z_{4}(n)=1+d\Pi_{6}+\frac{d(d-1)}{2}\Pi_{6}^{2}+O(x_{n}^{3}),\end{aligned}$ where

[TABLE]
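Each expansion in Lemma 3.3 has the shape $1+da+\frac{d(d-1)}{2}a^{2}$, which is the second-order truncation of $(1+a)^{d}$; this is consistent with $Z_{i}(n)$ being a ratio of products over the $d$ subtrees. The following Python sketch checks the truncation numerically and confirms that the first-order increments read off items (i)-(iii) cancel when summed over $i=1,\ldots,4$. The helper names `second_order_mean` and `first_order_terms` and the sample values are hypothetical, chosen only for illustration.

```python
from math import comb


def second_order_mean(d: int, a: float) -> float:
    """Second-order truncation 1 + d*a + C(d,2)*a^2 of (1 + a)^d."""
    return 1.0 + d * a + comb(d, 2) * a * a


def first_order_terms(lam1: float, lam2: float, x: float, z: float):
    """Per-child first-order increments taken from items (i)-(iii)."""
    a1 = 4 * lam1**2 * (x + z) - 4 * lam2**2 * z    # increment for Z_1
    a2 = -4 * lam1**2 * (x + z) - 4 * lam2**2 * z   # increment for Z_2
    a3 = 4 * lam2**2 * z                            # increment for Z_3 and Z_4
    return a1, a2, a3, a3


if __name__ == "__main__":
    d, a = 3, 0.01
    # The truncation error is of order a^3.
    print(abs((1 + a) ** d - second_order_mean(d, a)))
    # The four first-order increments sum to zero (up to rounding).
    print(sum(first_order_terms(0.5, 0.4, 0.01, -0.002)))
```

The cancellation reflects that the four means add up to $4$ at first order, so the leading corrections to $Z_{1}(n)+\cdots+Z_{4}(n)$ are quadratic in $x_{n}$.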

3.2 Main expansions of $x_{n+1}$ and $\overline{z}_{n+1}$

In this section, we investigate the second order recursive relations associated with $x_{n+1}$ and $\overline{z}_{n+1}$ , with the assistance of the following identity

[TABLE]
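A minimal sketch, assuming the identity has the form commonly used for second-order expansions of a ratio, namely $\frac{a}{s+r}=\frac{a}{s}-\frac{ar}{s^{2}}+\frac{r^{2}}{s^{2}}\frac{a}{s+r}$. This form is an assumption rather than a quotation, and the function name `ratio_expansion` is hypothetical. The check below uses exact rational arithmetic.

```python
from fractions import Fraction


def ratio_expansion(a: Fraction, r: Fraction, s: Fraction) -> Fraction:
    """Right-hand side a/s - a*r/s^2 + (r^2/s^2) * a/(s + r) of the assumed identity."""
    return a / s - a * r / s**2 + (r**2 / s**2) * (a / (s + r))


if __name__ == "__main__":
    a, r, s = Fraction(3, 7), Fraction(-1, 5), Fraction(1)
    # Both sides agree exactly whenever s != 0 and s + r != 0.
    print(ratio_expansion(a, r, s) == a / (s + r))
```

Under this assumed form, the choice $a=Z_{1}(n)$, $s=1$, $r=\sum_{i}Z_{i}(n)-1$ makes the left side equal to $Z_{1}(n)/\sum_{i}Z_{i}(n)$, and the last term carries the factor $r^{2}$ that is treated as a remainder.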

Plugging $a=Z_{1}(n)$ , $r=Z_{1}(n)+Z_{2}(n)+Z_{3}(n)+Z_{4}(n)-1$ , and $s=1$ into equation (8), by the definition of $x_{n}$ and equation (7), we have

[TABLE]

Next, plugging $a=Z_{3}(n)$ , $r=Z_{1}(n)+Z_{2}(n)+Z_{3}(n)+Z_{4}(n)-1$ , and $s=1$ in equation (8), by the definition of $\overline{z}_{n}$ and an analogous derivation as equation (7), we can obtain

[TABLE]

Finally, plugging the results of Section 3.1 into the expansions of $x_{n+1}$ and $\overline{z}_{n+1}$ derived above, and then making the substitutions

[TABLE]

we obtain a two-dimensional recursive formula of the linear diagonal canonical form:

[TABLE]

where

[TABLE]

where $C_{V}$ is an absolute constant.
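The role of the linear diagonal part can be illustrated by iterating it alone. The sketch below assumes the linear coefficients are $d\lambda_{1}^{2}$ for $\mathcal{X}_{n}$ and $d\lambda_{2}^{2}$ for $\mathcal{Z}_{n}$, and drops the quadratic and remainder terms ($R_{x}$, $R_{z}$, $V_{x}$, $V_{z}$) entirely, so it is not the full system (11). The function name `iterate_linear_part` and the sample parameters are hypothetical.

```python
def iterate_linear_part(d, lam1, lam2, x0, z0, steps):
    """Iterate X <- d*lam1^2 * X and Z <- d*lam2^2 * Z (linear part only)."""
    a, b = d * lam1**2, d * lam2**2
    X, Z = x0, z0
    path = [(X, Z)]
    for _ in range(steps):
        X, Z = a * X, b * Z
        path.append((X, Z))
    return path


if __name__ == "__main__":
    # Sample values with d*lam2^2 < d*lam1^2 < 1.
    path = iterate_linear_part(d=3, lam1=0.55, lam2=0.45, x0=0.1, z0=-0.1, steps=20)
    for X, Z in path[::5]:
        print(f"X={X:.3e}  Z={Z:.3e}  |Z/X|={abs(Z / X):.3e}")
```

When $|\lambda_{1}|>|\lambda_{2}|$, the ratio $|\mathcal{Z}_{n}/\mathcal{X}_{n}|$ shrinks by the factor $\lambda_{2}^{2}/\lambda_{1}^{2}$ at each step of this linearized map, so the linearized trajectory flattens toward the $\mathcal{X}$-axis as it approaches the origin.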

4 Concentration analysis

In order to study the stability of the dynamical system (11), we show that $R_{x}$ , $R_{z}$ , $V_{x}$ , and $V_{z}$ are just small perturbations, in the following two lemmas. The proof of Lemma 4.1 resembles that of Lemma $9$ in [14] and is skipped for conciseness.

Lemma 4.1.

Assume $|\lambda_{2}|\geq\varrho>0$ and $|\lambda_{1}|/|\lambda_{2}|\geq\kappa$ for some $\kappa>1$ . For any $\varepsilon>0$ , there exist $N=N(\kappa,\varepsilon)$ and $\delta=\delta(\kappa,\varrho,\varepsilon)>0$ , such that if $n\geq N$ and $\overline{x}_{n}\leq x_{n}\leq\delta$ , then

[TABLE]

The following lemma improves the result of Lemma 2.1 (c) by establishing the strict positivity of the sum of $x_{n}$ and $z_{n}$ .

Lemma 4.2.

Assume $\lambda_{1}\neq 0$. For any nonnegative integer $n$, we always have

[TABLE]

Proof 4.3.

In Lemma 2.1 we proved that $x_{n}+z_{n}\geq 0$, so it suffices to exclude equality. Arguing by contradiction, assume that $x_{n}+z_{n}=0$ for some $n\in\mathbb{N}$. Similar to the derivation in Lemma 2.1 (a) and (b), one can obtain that

[TABLE]

For any configuration set $A$ on the $n$ th level, we always have

[TABLE]

Denote the leftmost vertex on the $n$ th level by $v_{n}(1)$ , and it follows that

[TABLE]

Define the transition matrices at distance $s$ by $U_{s}=M_{1,1}^{s}$ , $V_{s}=M_{1,2}^{s}$ , and $W_{s}=M_{1,3}^{s}$ , and then we have the following recursive system

[TABLE]

The difference of the above two equations evolves as

[TABLE]

and then considering that $U_{0}=1$ and $V_{0}=W_{0}=0$ , we have

[TABLE]

Finally, from the reversible property of the channel, we can conclude that

[TABLE]

i.e., $\lambda_{1}=0$, a contradiction to the assumption that $\lambda_{1}\neq 0$.
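The closed form of such $s$-step transition probabilities can be checked numerically. The sketch below assumes a $4\times 4$ channel with diagonal entry `p0`, in-block entry `p1`, and out-block entry `p2` (so `p0 + p1 + 2*p2 = 1`); this matrix layout and the helper names are illustrative assumptions, and the code does not assert which combination corresponds to $\lambda_{1}$ or $\lambda_{2}$. It verifies that $U_{s}-V_{s}$ and $U_{s}+V_{s}-2W_{s}$ are pure $s$-th powers.

```python
from fractions import Fraction


def channel(p0, p1, p2):
    """Assumed block-symmetric 4x4 channel: states {1,2} and {3,4} form the blocks."""
    return [[p0, p1, p2, p2],
            [p1, p0, p2, p2],
            [p2, p2, p0, p1],
            [p2, p2, p1, p0]]


def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(M, s):
    """M raised to the nonnegative integer power s (identity for s = 0)."""
    n = len(M)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(s):
        P = matmul(P, M)
    return P


if __name__ == "__main__":
    p0, p1, p2 = Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)
    M = channel(p0, p1, p2)
    for s in range(5):
        P = matpow(M, s)
        U, V, W = P[0][0], P[0][1], P[0][2]
        print(s, U - V == (p0 - p1) ** s, U + V - 2 * W == (p0 + p1 - 2 * p2) ** s)
```

Both differences start at $1$ for $s=0$ (since $U_{0}=1$, $V_{0}=W_{0}=0$) and are multiplied by a fixed eigenvalue at each step, which is the mechanism the proof uses to force an eigenvalue to vanish.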

The following lemma ensures that $x_{n}$ does not drop too fast.

Lemma 4.4.

Suppose that there exists an integer $N>0$ , such that $x_{n}\geq\overline{x}_{n}$ when $n\geq N$ . For any $\varrho>0$ , if $\min\{|\lambda_{1}|,|\lambda_{2}|\}\geq\varrho$ , then there exists a constant $\gamma=\gamma(\varrho,N)>0$ such that

[TABLE]

Proof 4.5.

In contrast to $Y_{ij}(n)=f_{n}(i,\sigma_{j}^{1}(n+1))$, which is the posterior probability that $\sigma_{u_{j}}$ takes value $i$ given the random configuration $\sigma^{1}_{j}(n+1)$ on spins in $\mathbb{T}_{u_{j}}\cap L(n+1)$, we consider a configuration set $A$ on $\mathbb{T}_{u_{1}}\cap L(n+1)$ and define the posterior function $g_{n+1}(1,A)$ as

[TABLE]

Setting $A=\sigma_{1}^{1}(n+1)$ , by Lemma 2.5, we have

[TABLE]

Clearly, we have the following inequalities (see [16]) regarding the estimator $g_{n+1}(1,\sigma_{1}^{1}(n+1))$ and the maximum-likelihood estimator:

[TABLE]

where the last inequality follows from the condition that $\overline{x}_{n+1}\leq x_{n+1}$ . Therefore,

[TABLE]

If $\lambda_{1}^{2}\geq\lambda_{2}^{2}$ , then it is concluded from $x_{n}\geq-z_{n}\geq 0$ in Lemma 2.1 that

[TABLE]

If $\lambda_{1}^{2}\leq\lambda_{2}^{2}$ , then $\lambda_{1}^{2}x_{n}\leq x_{n+1}^{1/2}$ , since $z_{n}\leq 0$ . To sum up, we always have

[TABLE]

Under the condition that $x_{n+1}\geq\overline{x}_{n+1}$ , it can be concluded from the dynamical system (11), Lemma 4.1, and the following inequalities achieved in Lemma 2.1

[TABLE]

that there exists a $\delta=\delta(q,\varepsilon)>0$ such that when $x_{n}<\delta$ one has

[TABLE]

Under the condition that $\min\{|\lambda_{1}|,|\lambda_{2}|\}\geq\varrho$ for any $\varrho>0$ , set $\varepsilon=\varrho^{2}$ and then we further obtain

[TABLE]

On the other hand, if $x_{n}\geq\delta$ , by equation (14), one has

[TABLE]

Finally, by Lemma 4.2, it follows that $x_{n}\geq x_{n}+z_{n}>0$ , and thus $\frac{x_{n+1}}{x_{n}}>0$ for all $n$ . Therefore, taking

[TABLE]

completes the proof.

The following lemma provides the crucial concentration estimates of $u_{n}-\frac{x_{n}}{4}$ and $w_{n}-\frac{x_{n}}{4}$ , when $x_{n}$ is small.

Lemma 4.6.

Assume $|\lambda_{2}|\geq\varrho>0$ and $|\lambda_{1}|/|\lambda_{2}|\geq\kappa$ for some $\kappa>1$ . For any $\varepsilon>0$ , there exist $N=N(\kappa,\varepsilon)$ and $\delta=\delta(\kappa,\varrho,\varepsilon)>0$ , such that if $n\geq N$ and $\overline{x}_{n}\leq x_{n}\leq\delta$ , one has

[TABLE]

As a result, we have the estimates

[TABLE]

Proof 4.7.

It follows from Lemma 2.3 (d) and (e) that

[TABLE]

and

[TABLE]

Then by Lemma 2.1 (a) we have

[TABLE]

By the definitions of $v_{n}$ , $w_{n}$ , $\overline{v}_{n}$ , and $\overline{w}_{n}$ , and by symmetry, it follows that

[TABLE]

Plugging $a=\left(Z_{1}(n)-\frac{1}{4}\sum_{i=1}^{4}Z_{i}(n)\right)^{2}$ , $r=\left(\left(\sum_{i=1}^{4}Z_{i}(n)\right)^{2}-16\right)$ , and $s=\frac{1}{16}$ into equation (8), we have

[TABLE]

The first expectation of equation (18) will contribute to the major terms of the expansion:

[TABLE]

where Lemma 3.3 is used in the last equality and in the following derivations. Similarly, we can bound both the second and third terms of equation (18) by $O(x_{n}^{2})$:

[TABLE]

and

[TABLE]

Considering that $\mathcal{X}_{n}=x_{n}+\overline{z}_{n}$ and $\mathcal{Z}_{n}=-\overline{z}_{n}$ , the dynamical system (11) yields that

[TABLE]

Equation (18) gives

[TABLE]

and then

[TABLE]

Next, we carry out the discussion in the $\mathcal{X}O\mathcal{Z}$ plane. First consider the case $|\lambda_{1}|/|\lambda_{2}|\geq\kappa$ for some $\kappa>1$. In a small neighborhood of $(0,0)$, since $d\lambda_{2}^{2}<\kappa^{2}d\lambda_{2}^{2}\leq d\lambda_{1}^{2}<1$ and $\mathcal{X}_{n}>0$, the discrete trajectory approaches the origin in a way that is “tangential” to the $\mathcal{X}$-axis when $x_{n}$ is small enough (see [2]). Furthermore, the conclusion of Lemma 4.2 excludes the possibility that the trajectory moves along the $\mathcal{Z}$-axis. Then for some $M>1$, there exist constants $N_{1}=N_{1}(\kappa,M)$ and $\delta_{1}=\delta_{1}(\kappa,M)$, such that if $n\geq N_{1}$ and $x_{n}\leq\delta_{1}$, we have

[TABLE]

where the remainder term $O(x_{n}^{2})$ comes from the expansion of $x_{n+1}$ . Consequently, it follows

[TABLE]

and by the fact that $z_{n}\leq 0$ then

[TABLE]

For fixed $k$, the term $\frac{1}{4}\lambda_{3}^{4}(\overline{x}_{n}-\overline{y}_{n})^{2}$ is bounded by $O(x_{n}^{2})$, because $|\overline{x}_{n}|>|\overline{y}_{n}|$ by Lemma 2.1 (b) and (c). Hence it follows from the dynamical system (11) that

[TABLE]

Furthermore, one has

[TABLE]

and then there exists $\delta_{2}=\delta_{2}(\kappa,M,k)<\delta_{1}$ , such that if $x_{n}<\delta_{2}$ then for any $1\leq\ell\leq k$ one has $x_{n+\ell}<2\delta_{2}$ . Therefore, for any positive integer $k$ , equation (20) yields

[TABLE]

where, by equation (20) and with $C$ denoting the $O$ constant therein,

[TABLE]

and by equation (21)

[TABLE]

Firstly, from Lemma 2.1 (a) one has $0\leq\frac{u_{n}}{x_{n}}\leq 1$ , which implies that $\left|\frac{u_{n}}{x_{n}}-\frac{1}{4}\right|<1$ . Secondly, by the fact that $|\lambda_{2}|\leq|\lambda_{1}|\leq d^{-1/2}\leq 1/\sqrt{2}$ , it is possible to achieve $\frac{M}{M-1}|\lambda_{2}|<1$ by choosing $M=4$ . Therefore, we can conclude that it is feasible to take $k=k(\varepsilon)$ sufficiently large and $\delta_{3}=\delta_{3}(\kappa,k,\varepsilon)=\delta_{3}(\kappa,\varepsilon)<\delta_{2}$ sufficiently small to guarantee that

[TABLE]
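The choice $M=4$ above can be checked with a line of arithmetic: the worst case is $|\lambda_{2}|=1/\sqrt{2}$, and $\frac{M}{M-1}|\lambda_{2}|<1$ then requires $\frac{M}{M-1}<\sqrt{2}$. The short Python sketch below evaluates this factor; the function name `contraction_factor` is hypothetical.

```python
from math import sqrt


def contraction_factor(M: float, lam2: float) -> float:
    """The factor M/(M-1) * |lam2| that must stay below 1."""
    return M / (M - 1) * abs(lam2)


if __name__ == "__main__":
    worst_lam2 = 1 / sqrt(2)  # largest |lambda_2| allowed by |lambda_2| <= d^(-1/2) with d >= 2
    for M in (2, 3, 4, 5):
        print(M, contraction_factor(M, worst_lam2))
```

In this worst case the factor exceeds $1$ for $M=3$ and falls below $1$ for $M=4$, so $M=4$ is the smallest integer that works uniformly over $|\lambda_{2}|\leq 1/\sqrt{2}$.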

Finally, under the condition that $|\lambda_{2}|\geq\varrho>0$ , by Lemma 4.4, we know that there exists $\gamma=\gamma(\varrho)$ such that $x_{n-k}\leq\gamma^{-k}x_{n}$ . Thus, we can choose $N=N(\kappa,\varepsilon,k)=N(\kappa,\varepsilon)>N_{1}+k$ and $\delta=\gamma^{k}\delta_{3}$ , such that if $x_{n}\leq\delta$ and $n\geq N$ then

[TABLE]

The second part of the lemma can be shown similarly as above.

5 Proof of the Main Theorem

First, consider $\varrho\leq|\lambda_{2}|\leq|\lambda_{1}|$ for any fixed $\varrho>0$ . To investigate the non-tightness, it would be convenient to assume that $1>d\lambda_{1}^{2}\geq d\lambda_{2}^{2}\geq\frac{1}{2}$ , say, $|\lambda_{1}|\geq\frac{1}{\sqrt{2d}}$ . We take $\varrho=\frac{1}{\sqrt{2d}}$ in the following context. Consider $|\lambda_{2}|>\varrho$ fixed and just $\lambda_{1}$ varying, and without loss of generality, assume $d\lambda_{1}^{2}>\frac{1+d\lambda_{2}^{2}}{2}$ . Consequently choose $\kappa=\kappa(d,\lambda_{2})=\left(\frac{1+d\lambda_{2}^{2}}{2d\lambda_{2}^{2}}\right)^{1/2}>1$ and thus $|\lambda_{1}|/|\lambda_{2}|\geq\kappa$ .
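The parameter choices in this paragraph can be sanity-checked numerically. The sketch below computes $\kappa(d,\lambda_{2})$ from the formula above and verifies, for one sample triple, that $d\lambda_{1}^{2}>\frac{1+d\lambda_{2}^{2}}{2}$ indeed gives $|\lambda_{1}|/|\lambda_{2}|\geq\kappa>1$. The function name `kappa` and the sample values $d=3$, $\lambda_{2}=0.5$, $\lambda_{1}=0.55$ are hypothetical.

```python
from math import sqrt


def kappa(d: int, lam2: float) -> float:
    """kappa(d, lambda_2) = sqrt((1 + d*lam2^2) / (2*d*lam2^2))."""
    t = d * lam2**2
    return sqrt((1 + t) / (2 * t))


if __name__ == "__main__":
    d, lam2, lam1 = 3, 0.5, 0.55
    t1, t2 = d * lam1**2, d * lam2**2
    print("d*lam2^2 =", t2, " d*lam1^2 =", t1)
    print("assumption d*lam1^2 > (1 + d*lam2^2)/2:", t1 > (1 + t2) / 2)
    print("kappa =", kappa(d, lam2), " ratio =", abs(lam1) / abs(lam2))
```

Since $\kappa^{2}=\frac{1+d\lambda_{2}^{2}}{2d\lambda_{2}^{2}}$, the condition $\kappa>1$ is equivalent to $d\lambda_{2}^{2}<1$, and dividing $d\lambda_{1}^{2}>\frac{1+d\lambda_{2}^{2}}{2}$ by $d\lambda_{2}^{2}$ gives $\lambda_{1}^{2}/\lambda_{2}^{2}>\kappa^{2}$.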

By the definition of non-reconstruction in equation (2.7), it suffices to show that when $d\lambda_{1}^{2}$ is close enough to $1$, $\mathcal{X}_{n}$ does not converge to $0$, since this implies that $x_{n}$ does not converge to $0$ in view of $0\leq\mathcal{X}_{n}=x_{n}+z_{n}\leq x_{n}$. We argue by contradiction, assuming that

[TABLE]

Therefore, there exists $\mathcal{N}_{1}=\mathcal{N}_{1}(d)$ , such that whenever $n>\mathcal{N}_{1}$ , we have $x_{n}\leq\delta$ . Next, recalling that $\mathcal{X}_{n}=x_{n}+\overline{z}_{n}$ , we further define $\overline{\mathcal{X}}_{n}=\overline{x}_{n}+\overline{z}_{n}$ . Then by the symmetry of the model, we can obtain the dynamical form for $\overline{\mathcal{X}}_{n}$ analogously as the dynamical form for $\mathcal{X}_{n}$ in equation (11) :

[TABLE]

where $R_{\overline{x}}$ and $V_{\overline{x}}$ are counterparts of $R_{x}$ and $V_{x}$ simply by replacing $x$ by $\overline{x}$ .

Next, we carry out the discussion in the $\mathcal{X}O\overline{\mathcal{X}}$ plane. Since $|\lambda_{1}|>|\lambda_{3}|$ and $\mathcal{X}_{n},\overline{\mathcal{X}}_{n}\to 0$ as $n\to\infty$ from equation (23), in a small neighborhood of $(0,0)$, the discrete trajectory approaches the origin in a way that is “tangential” to the $\mathcal{X}$-axis. Furthermore, the conclusion of Lemma 4.2 excludes the possibility that the trajectory moves along the $\overline{\mathcal{X}}$-axis. Therefore, there exists $\mathcal{N}=\mathcal{N}(d)>\mathcal{N}_{1}$, such that whenever $n>\mathcal{N}$,

[TABLE]

From the proof of Lemma 4.6, we know that in the $\mathcal{X}O\mathcal{Z}$ plane there exist $N=N(\kappa,\varrho)>\mathcal{N}$ and $\delta=\delta(d,\kappa,\varrho)>0$ , such that if $n\geq N$ and $x_{n}\leq\delta$ , then in the small neighborhood of $(0,0)$ , we have

[TABLE]

By equation (24), applying Lemma 4.1, and taking $\varepsilon=\frac{4}{25}\frac{d(d-1)}{4}\lambda_{1}^{4}$ , one can obtain

[TABLE]

Next by the result of Lemma 4.6 that $\left|\frac{u_{n}}{x_{n}}-\frac{1}{4}\right|<\varepsilon^{\prime}$ and $\left|\frac{w_{n}}{x_{n}}-\frac{1}{4}\right|<\varepsilon^{\prime}$ for any $\varepsilon^{\prime}>0$ , now we take $\varepsilon^{\prime}=\frac{1}{12C_{V}}\frac{d(d-1)}{4}\lambda_{1}^{4}.$ Therefore, by equation (11) and the condition that $\lambda_{1}\geq\lambda_{2}$ , we have

[TABLE]

Note that the initial point is $x_{0}=1-\frac{1}{4}=\frac{3}{4}>0$, and Lemma 4.4 implies that there exists $\gamma=\gamma(\varrho,\mathcal{N})=\gamma(d)$ such that $x_{n}\geq x_{0}\gamma^{n}$. Define $\varepsilon=\varepsilon(d)=\left(\frac{x_{0}\gamma^{N}}{10}\right)^{2}>0$. Because $\varepsilon$ is independent of $\lambda_{1}$, considering that $d\lambda_{2}^{2}$ is sufficiently close to $1$, we can choose $|\lambda_{1}|<d^{-1/2}$ such that

[TABLE]

Noting that $\frac{d(d-1)}{2}\lambda_{1}^{4}\geq\left(\frac{d\lambda_{1}^{2}}{2}\right)^{2}\geq\frac{1}{16}$ , equation (5) implies that

[TABLE]
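The chain $\frac{d(d-1)}{2}\lambda_{1}^{4}\geq\left(\frac{d\lambda_{1}^{2}}{2}\right)^{2}\geq\frac{1}{16}$ used above rests on $d\geq 2$ (so that $2(d-1)\geq d$) and on the standing assumption $d\lambda_{1}^{2}\geq\frac{1}{2}$. A minimal Python sketch with exact rationals, using the hypothetical helper `quadratic_chain`, checks both inequalities at the boundary value $\lambda_{1}^{2}=\frac{1}{2d}$, where the second one is an equality.

```python
from fractions import Fraction


def quadratic_chain(d: int, lam1_sq: Fraction):
    """Return (d(d-1)/2 * lam1^4, (d*lam1^2/2)^2) for a given lam1^2."""
    left = Fraction(d * (d - 1), 2) * lam1_sq**2
    mid = (d * lam1_sq / 2) ** 2
    return left, mid


if __name__ == "__main__":
    for d in range(2, 8):
        left, mid = quadratic_chain(d, Fraction(1, 2 * d))  # boundary case d*lam1^2 = 1/2
        print(d, left, mid, left >= mid >= Fraction(1, 16))
```

For $d=2$ the first inequality is also an equality, so the constant $\frac{1}{16}$ cannot be improved without strengthening the assumptions.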

Suppose $\mathcal{Z}_{n}\geq\varepsilon$ for some $n>N$ , and it follows from equations (5) and (27) that

[TABLE]

Therefore, by induction we have $x_{n}\geq\mathcal{Z}_{n}\geq\varepsilon$ for all $n>N$, which contradicts the assumption imposed in equation (23). Thus, the proof is completed.

Bibliography


[1] J. Banks, C. Moore, J. Neeman, and P. Netrapalli, Information-theoretic thresholds for community detection in sparse networks, in Conference on Learning Theory, 2016, pp. 383–416.
[2] J. Bernussou and J.-L. Abatut, Point mapping stability, Pergamon, 1977.
[3] S. Bhamidi, R. Rajagopal, and S. Roch, Network delay inference from additive metrics, Random Structures & Algorithms, 37 (2010), pp. 176–203.
[4] P. M. Bleher, J. Ruiz, and V. A. Zagrebnov, On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice, Journal of Statistical Physics, 79 (1995), pp. 473–482.
[5] C. Borgs, J. Chayes, E. Mossel, and S. Roch, The Kesten-Stigum reconstruction bound is tight for roughly symmetric binary channels, in Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on, IEEE, 2006, pp. 518–530.
[6] G. Brito, I. Dumitriu, S. Ganguly, C. Hoffman, and L. V. Tran, Recovery and rigidity in a regular stochastic block model, in Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, 2016, pp. 1589–1601.
[7] J. Chayes, L. Chayes, J. P. Sethna, and D. Thouless, A mean field spin glass with short-range interactions, Communications in Mathematical Physics, 106 (1986), pp. 41–89.
[8] C. Daskalakis, E. Mossel, and S. Roch, Optimal phylogenetic reconstruction, in Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, ACM, 2006, pp. 159–168.
