The non-tightness of the reconstruction threshold of a 4 states symmetric model with different in-block and out-block mutations
Wenjian Liu, Ning Ning

TL;DR
This paper investigates the reconstruction problem on trees for a 4-state symmetric model with different in-block and out-block transition probabilities, and establishes conditions under which the reconstruction threshold is not tight. For the corresponding stochastic block model, non-tightness signals a hybrid-hard phase in which recovery is information-theoretically possible but computationally hard.
Contribution
It provides a rigorous analysis of the non-tightness of the reconstruction threshold for a 4-state symmetric model with different in-block and out-block transition probabilities.
Findings
Identifies conditions for non-tightness of the reconstruction threshold.
Extends understanding of phase transitions in multi-state stochastic block models.
Highlights the complexity of the hybrid-hard phase in 4-state models.
Wenjian Liu, Dept. of Mathematics and Computer Science, Queensborough Community College, City University of New York.
Ning Ning, Dept. of Applied Mathematics, University of Washington, Seattle.
Abstract
The tree reconstruction problem is to collect and analyze massive data at the $n$th level of the tree, in order to identify whether there is non-vanishing information about the root as $n$ goes to infinity. Its connection to the clustering problem in the setting of the stochastic block model, which has wide applications in machine learning and data mining, has been well established. For the stochastic block model, an “information-theoretically-solvable-but-computationally-hard” region, also called the “hybrid-hard phase”, appears whenever the reconstruction bound is not tight for the corresponding reconstruction problem on the tree. Although the problem has been studied in numerous contexts, the existing literature with rigorously established reconstruction thresholds is very limited, and the problem becomes extremely challenging when the model under investigation has $4$ states (the stochastic block model with $4$ communities). In this paper, inspired by the newly proposed stochastic block model, we study a $4$ states symmetric model with different in-block and out-block transition probabilities, and rigorously give the conditions for the non-tightness of the reconstruction threshold.
keywords:
Reconstruction, Markov random fields on trees, Deep generative hierarchical model, Unsupervised learning, Phase transition
AMS subject classifications
60K35 62F15 82B20 68R01
1 Introduction
1.1 The tree reconstruction problem
The tree reconstruction problem, as an interdisciplinary subject, has been studied in numerous contexts including statistical physics, information theory, and computational biology. The reconstructability plays a crucial role in phylogenetic reconstruction in evolutionary biology (see, for instance, [18, 8]), communication theory in the study of noisy computation (see, for instance, [9]), analogous investigations in the realm of network tomography (see, for instance, [3]), reconstructability and distinguishability in the clustering problem of the stochastic block model (see, for instance, [21, 22, 23, 1, 6]), etc.
The tree reconstruction model has two building blocks: an irreducible aperiodic Markov chain on a finite character set $\mathcal{C}$, and a rooted $d$-ary tree (every vertex having exactly $d$ offspring). The tree is denoted by $\mathbb{T}=(\mathbb{V},\mathbb{E},\rho)$, where $\mathbb{V}$ stands for the vertices, $\mathbb{E}$ for the edges, and $\rho$ for the root. Denote by $\sigma_v$ the state assigned to vertex $v$, and in particular by $\sigma_\rho$ the state of the root, which is chosen according to an initial distribution $\pi$ on $\mathcal{C}$. The root signal propagates in the tree according to a transition matrix $\mathbf{M}=(M_{ij})$, also called the noisy channel, in such a way that for each vertex $v$ having $u$ as its parent, the spin/configuration at $v$ is assigned according to the probability $\mathbf{P}(\sigma_v=j\mid\sigma_u=i)=M_{ij}$ for $i,j\in\mathcal{C}$.
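The propagation rule described above can be simulated directly. The following is a minimal Python sketch, assuming NumPy is available; the function name `broadcast`, its arguments, and the integer labelling of states as `0, ..., q-1` are illustrative choices, not notation from the paper. It samples the root from the initial distribution and then draws each child's spin from the row of the transition matrix indexed by its parent's spin.

```python
import numpy as np

def broadcast(M, pi, d, n, rng):
    """Sample the spins on levels 0..n of a rooted d-ary tree.

    M   : (q, q) row-stochastic matrix, M[i, j] = P(child = j | parent = i)
    pi  : length-q initial distribution of the root
    Returns a list of integer arrays; entry k holds the d**k spins of level k,
    ordered so that the d children of a vertex are adjacent.
    """
    M = np.asarray(M, dtype=float)
    q = M.shape[0]
    cdf = np.cumsum(M, axis=1)                  # row-wise CDF of the channel
    levels = [np.array([rng.choice(q, p=pi)])]  # level 0: the root
    for _ in range(n):
        parents = np.repeat(levels[-1], d)      # each vertex has d children
        u = rng.random(parents.size)
        # inverse-CDF sampling: count how many CDF entries lie below u
        children = (u[:, None] > cdf[parents]).sum(axis=1)
        children = np.minimum(children, q - 1)  # guard against round-off
        levels.append(children)
    return levels
```

For example, `broadcast(M, [0.5, 0.5], d=3, n=4, rng=np.random.default_rng(0))` with a 2-state channel `M` returns five arrays, the last of which is one realization of the leaf configuration at level 4.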
The reconstruction problem on an infinite tree asks whether, given the configuration realized at the $n$th layer of the tree, denoted by $\sigma(n)$, there exists non-vanishing information on the letter transmitted by the root as $n$ goes to infinity. With $\sigma^i(n)$ defined as $\sigma(n)$ conditioned on $\sigma_\rho=i$, the following definition gives one mathematical formulation of reconstructibility:
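Definition 1.1.
We say that a model is reconstructible on an infinite tree $\mathbb{T}$ if, for some $i,j\in\mathcal{C}$,
\[
\limsup_{n\to\infty} d_{TV}\left(\sigma^i(n),\sigma^j(n)\right)>0,
\]
where $d_{TV}$ is the total variation distance. When the $\limsup$ is $0$, we say that the model is non-reconstructible on $\mathbb{T}$.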
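For very small trees, the total variation distance in the definition can be computed exactly by enumerating every leaf configuration. The sketch below is an illustration under stated assumptions: it uses NumPy, the helper names `leaf_law` and `tv` are hypothetical, and the cost grows like $q^{d^n}$, so it is only usable for tiny $d$ and $n$. It builds the law of the leaf configuration conditioned on each root state by using the independence of the subtrees below the root.

```python
import numpy as np

def leaf_law(M, d, n):
    """Exact law of the level-n configuration given each root state.

    Returns an array of shape (q, q**(d**n)); row i is the distribution of
    the leaf configuration conditioned on the root being in state i.
    """
    M = np.asarray(M, dtype=float)
    q = M.shape[0]
    law = np.eye(q)                  # level 0: the configuration is the root itself
    for _ in range(n):
        child = M @ law              # law of one subtree's leaves given the parent
        new = child
        for _ in range(d - 1):
            # subtrees are independent given the parent: take product laws
            new = np.einsum('ia,ib->iab', new, child).reshape(q, -1)
        law = new
    return law

def tv(p, r):
    """Total variation distance between two probability vectors."""
    return 0.5 * np.abs(p - r).sum()
```

Calling `tv(law[i], law[j])` on the rows of `law = leaf_law(M, d, n)` gives the distance between the level-$n$ laws for root states `i` and `j`; since the level-$(n+1)$ configuration is obtained from level $n$ through the channel, this distance cannot increase with `n`.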
1.2 Existing results with states other than $4$
Reconstructibility is closely related to the second largest eigenvalue by absolute value of the transition matrix $\mathbf{M}$, denoted by $\lambda$. It is well known that the reconstruction problem is solvable when $d\lambda^2>1$, which is the Kesten-Stigum bound ([10, 11]); below this bound, however, the problem becomes much more challenging and its solvability depends heavily on the channel.
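Whether a given channel lies above the Kesten-Stigum bound is a one-line eigenvalue computation. The following is a minimal sketch, assuming NumPy; the function names and the example channel in the usage note are illustrative and not taken from the paper.

```python
import numpy as np

def second_eigenvalue(M):
    """Second largest eigenvalue of M in absolute value."""
    ev = np.linalg.eigvals(np.asarray(M, dtype=float))
    ev = ev[np.argsort(-np.abs(ev))]   # sort by decreasing modulus
    return ev[1]

def above_kesten_stigum(M, d):
    """True when d * |lambda|^2 > 1 for the d-ary tree with channel M."""
    lam = second_eigenvalue(M)
    return d * abs(lam) ** 2 > 1
```

For the binary symmetric channel with rows `(0.9, 0.1)` and `(0.1, 0.9)`, the second eigenvalue is `0.8`, so the check succeeds on the binary tree (`d = 2`), whereas rows `(0.8, 0.2)` and `(0.2, 0.8)` give `0.6` and the check fails for `d = 2`.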
The binary model with states corresponds to the Ising model in statistical physics, whose transition matrix is given by
[TABLE]
where a deviation parameter describes the departure from the symmetric channel, i.e., the channel is asymmetric when this parameter is nonzero. For the binary symmetric channel, [4] showed that the reconstruction problem is solvable if and only if $d\lambda^2>1$. For the binary asymmetric channel with sufficiently large asymmetry, [17, 19] showed that the Kesten-Stigum bound is not the bound for reconstruction. When the asymmetry is sufficiently small, [5] established the first tightness result of the Kesten-Stigum reconstruction bound in roughly a decade, and later [15] gave a complete answer to the question of how small the asymmetry must be for the reconstruction threshold to be tight.
For non-binary models, the simplest case is the $q$-state symmetric channel, which corresponds to the Potts model in statistical physics, with the following transition matrix
[TABLE]
[25] established the Kesten-Stigum bound for the $3$-state Potts model on regular trees of large degree and showed that the Kesten-Stigum bound is not tight when $q\geq 5$. Motivated by the K80 model ([12]), which is one of the most classical Markov DNA evolution models, [13] proposed the following model to distinguish between transitions and transversions, whose transition matrix has two mutation classes with $q$ states in each class
[TABLE]
When the number of states are more than or equal to , [13] showed that the Kesten-Stigum bound is not tight.
1.3 Existing results with $4$ states and the importance of non-tightness
It is well known that the $2$-state and $4$-state cases give the most important models for reconstruction on the tree, especially for applications in phylogenetic reconstruction, since they correspond to some of the most basic phylogenetic evolutionary models (see, for instance, the discussions in [20]). However, the $4$-state case is much more challenging and remained open until a few results were established recently. For the symmetric model with $4$ states, [24] showed that in the assortative (ferromagnetic) case the Kesten-Stigum bound is always tight, while in the disassortative (antiferromagnetic) case the Kesten-Stigum bound is tight in a large degree regime and not tight in a low degree regime. Later, [14] investigated a $4$-state asymmetric model whose transition matrix is of the form
[TABLE]
and gave specific conditions under which the Kesten-Stigum bound is not tight.
The stochastic block model has wide applications in statistics, machine learning, and data mining, to name a few. The connection between the reconstruction on the tree problem and the clustering problem in the setting of the stochastic block model, has been well established in recent years (see, for instance, [21, 22, 23, 24]). Specifically, the technique used in handling balanced two clusters models is to transfer the problem of clustering to the reconstructability on trees. For the stochastic block model, an “information-theoretically-solvable-but-computationally-hard” region appears, whenever the Kesten-Stigum bound is not tight for the corresponding reconstruction on the tree problem. Further information can be seen in [24] under the name “hybrid-hard phases”.
1.4 Motivation and main result
While the reconstructability of the $4$-state case of the model in equation (1) is still an open problem, in this paper we give a rigorous answer to the reconstructibility question for the $4$-state case of a more complicated and generalized model. Inspired by the stochastic block model proposed in [24] (see the illustrating figure therein), we extend the model in equation (1) to incorporate different in-block transition probabilities. That is, in this paper, we focus on a $4$-state model with the transition matrix
[TABLE]
Besides different out-block transition probabilities () characterized in [13], the model under investigation has different in-block transition probabilities ( and in one block, and in the other block).
It is easy to see that has eigenvalues: , , , and . Let be the second largest eigenvalue by absolute value. Considering that always implies reconstruction, we only investigate in the following context. Our main result is the following theorem, whose rigorous proof is given in Section 5.
Main Theorem**.**
*If and , the Kesten-Stigum bound is not tight for every , i.e. the reconstruction is solvable for some even if . *
Since and play symmetric roles in this symmetric model (2), without loss of generality, we presume in the sequel.
1.5 Structure of the paper and proof sketch
The technique used here was initiated in [7] in the context of spin glasses. In Section 2, we give detailed definitions and interpretations, conduct preliminary analyses, and then provide an equivalent condition for non-reconstruction:
[TABLE]
Here, and represent the probabilities of giving a correct guess of the root given the spins at distance from the root minus the probability of guessing the root randomly which is in this case, for the root being in block and block respectively. Nonreconstruction means that the mutual information between the root and the spins at distance goes to [math] as tends to infinity, therefore one standard to classify reconstruction and nonreconstruction is to analyze the quantity while in this paper we also need to consider the limiting behavior of .
In Section 3, after in-depth investigation of the recursive relationship, we develop a two dimensional dynamical system of the linear diagonal canonical form regarding quantities and through two new variables and :
[TABLE]
Here, represents the opposite case of as giving a wrong guess in another block. By symmetry, we can also obtain the dynamical system involving simply through replacing by . In Section 4, we show that , , , and are just small perturbations in the above dynamical system in order to study its stability, ensure that the decrease from to is never too large to lose construction, and establish crucial concentration results, by fully taking advantage of the Markov random field property and the symmetries in the probability transition matrix and the network structure. In Section 5, by means of the method of reductio ad absurdum, we show that and can not simultaneously converge to zero as goes to , and then establish the nontightness of Kesten-Stigum bound.
2 Preparation
2.1 Notations
Let be the children of the root and be the subtree of descendants of . Denote the th level of the tree by with being the graph distance on . Denote as the spins on , as conditioned on , and as the spins on where is one of the children of the root . For the notations involving in the sequel, we consistently use superscript to denote the conditional on a specific configuration of the root, and use the subscript to denote the conditional on a specific offspring of the root.
For a configuration on the spins of , define the posterior function by
[TABLE]
for and , where the second equality holds by the recursive nature of the tree. Define as the posterior probability that the root is taking the configuration given the random configuration on the spins in , i.e.,
[TABLE]
Apparently one has
[TABLE]
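The recursive structure of the posterior can be made concrete with a standard upward message-passing (belief propagation) computation on a complete $d$-ary tree. The sketch below is a generic illustration, assuming NumPy; the function name `root_posterior`, the left-to-right leaf ordering (siblings adjacent), and the assumption that all entries of the channel are positive (so no normalization divides by zero) are choices made here, not taken from the paper.

```python
import numpy as np

def root_posterior(M, pi, d, leaves):
    """Posterior distribution of the root spin given the leaf spins.

    M      : (q, q) row-stochastic channel, M[i, j] = P(child = j | parent = i)
    pi     : length-q prior on the root
    leaves : integer array of length d**n, siblings stored next to each other
    """
    M = np.asarray(M, dtype=float)
    q = M.shape[0]
    # message at a leaf: indicator of its observed spin
    msg = np.eye(q)[np.asarray(leaves)]            # shape (d**n, q)
    while msg.shape[0] > 1:
        # likelihood of each subtree as seen from a parent in state i:
        # sum_j M[i, j] * msg[v, j]
        up = msg @ M.T
        # a parent combines its d independent subtrees by multiplication
        msg = up.reshape(-1, d, q).prod(axis=1)
        msg /= msg.sum(axis=1, keepdims=True)      # rescale for numerical stability
    post = np.asarray(pi, dtype=float) * msg[0]
    return post / post.sum()
```

The returned vector sums to one, and averaging its entry at the true root state over random leaf configurations, minus the uniform guess, gives a Monte Carlo counterpart of the moment quantities introduced in this section.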
By the block characteristic of the model, we know that regarding the first (resp. second) block, and (resp. and ) have the same distribution. Considering that the stationary distribution of is given by
[TABLE]
we further have
[TABLE]
From the symmetry and the block characteristic of the model, we know that
[TABLE]
and
[TABLE]
Define as the posterior probability that given the random configuration on spins in , i.e.,
[TABLE]
where the random variables are independent and identically distributed and satisfy
[TABLE]
We define the following moment variables to analyze the differences between different inferences of given the spins at distance from the root and the probability of guessing the root randomly:
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
2.2 Preliminary analyses
We first establish some important lemmas that will be used frequently in the sequel.
Lemma 2.1**.**
For any , we have
- (a)
. 2. (b)
. 3. (c)
.
Proof 2.2**.**
- (a)
By the law of total probability and Bayes’ theorem, we have
[TABLE]
Recall that is defined as , and then by the fact that we have
[TABLE]
Furthermore, by the law of total expectation, we have
[TABLE] 2. (b)
Similarly, we have
[TABLE]
[TABLE]
and then
[TABLE]
It follows from the Cauchy-Schwarz inequality that
[TABLE]
which implies
[TABLE]
By the definitions of , and , we know that , and thus equation (5) implies . 3. (c)
An analogous proof of
[TABLE]
can be easily carried out.
Lemma 2.3**.**
For any , we have
- (a)
. 2. (b)
** 3. (c)
. 4. (d)
** 5. (e)
.
Proof 2.4**.**
We only prove (a) and (b) and the others can be shown analogously.
- (a)
By the law of total probability, one has
[TABLE]
therefore
[TABLE] 2. (b)
By the fact that and have the same distribution, and the equation that
[TABLE]
plugging in the result of (a), we can obtain that
[TABLE]
as desired.
Recall that is defined as the posterior probability that given the random configuration on spins in , i.e., , for and . The random vectors are independent by the symmetry of the model, and its central moments are investigated in the following lemma.
Lemma 2.5**.**
For each , we have
- (a)
. 2. (b)
. 3. (c)
** 4. (d)
** 5. (e)
** 6. (f)
** 7. (g)
** 8. (h)
** 9. (i)
** 10. (j)
**
Proof 2.6**.**
We only prove (a), (b), and (c) and the others can be shown analogously.
- (a)
Conditioning on for , we have
[TABLE] 2. (b)
Similarly, we can obtain
[TABLE] 3. (c)
It follows immediately from the identity that, for
[TABLE]
2.3 An equivalent condition for non-reconstruction
If the reconstruction problem is solvable, contains significant information of the root variable. This can be expressed in several equivalent ways (see [17, 19]).
Lemma 2.7**.**
The non-reconstruction is equivalent to
[TABLE]
3 Recursive formulas
3.1 Distributional recursion
Consider as a configuration on , and let be its restriction to where is the th child of the root . Then from the Markov random field property, we have
[TABLE]
where is given by
[TABLE]
Recall that . Setting , we have
[TABLE]
where
[TABLE]
i.e.,
Lemma 3.1**.**
For any nonnegative , we have
[TABLE]
Proof 3.2**.**
For any configuration with denoting the spins on , we have
[TABLE]
By the symmetry of the tree, we have
[TABLE]
as desired.
By Lemma 2.5, the means and variances of monomials of can be approximated as follows:
Lemma 3.3**.**
One has
- (i)
** 2. (ii)
** 3. (iii)
** 4. (iv)
* where*
[TABLE] 5. (v)
* where*
[TABLE] 6. (vi)
* for , where*
[TABLE] 7. (vii)
* for , where*
[TABLE] 8. (viii)
, for , where
[TABLE] 9. (ix)
* where*
[TABLE]
3.2 Main expansions of and
In this section, we investigate the second order recursive relations associated with and , with the assistance of the following identity
[TABLE]
Plugging , , and into equation (8), by the definition of and equation (7), we have
[TABLE]
Next, plugging , , and in equation (8), by the definition of and an analogous derivation as equation (7), we can obtain
[TABLE]
Finally, plugging the results of Section 3.1 into equation (9) and equation (10), and then making the substitutions
[TABLE]
we obtain a two-dimensional recursive formula of the linear diagonal canonical form:
[TABLE]
where
[TABLE]
[TABLE]
[TABLE]
where is an absolute constant.
4 Concentration analysis
In order to study the stability of the dynamical system (11), we show that , , , and are just small perturbations, in the following two lemmas. The proof of Lemma 4.1 resembles that of Lemma in [14] and is skipped for conciseness.
Lemma 4.1**.**
Assume and for some . For any , there exist and , such that if and , then
[TABLE]
The following lemma improves the result of Lemma 2.1 (c) by establishing the strict positivity of the sum of and .
Lemma 4.2**.**
Assume . For any nonnegative , we always have
[TABLE]
Proof 4.3**.**
In Lemma 2.1 we proved that , so it suffices to exclude the equality. Now let us apply reductio ad absurdum and assume for some . Similar to the derivation in Lemma 2.1 (a) and (b), one can obtain that
[TABLE]
For any configuration set on the th level, we always have
[TABLE]
Denote the leftmost vertex on the th level by , and it follows that
[TABLE]
Define the transition matrices at distance by , , and , and then we have the following recursive system
[TABLE]
The difference of the above two equations evolves as
[TABLE]
and then considering that and , we have
[TABLE]
Finally, from the reversible property of the channel, we can conclude that
[TABLE]
*i.e., , a contradiction to the assumption that . *
The following lemma ensures that does not drop too fast.
Lemma 4.4**.**
Suppose that there exists an integer , such that when . For any , if , then there exists a constant such that
[TABLE]
Proof 4.5**.**
Different to the definition of which is the posterior probability that takes value given the random configuration on spins in , we consider a configuration set on and define the posterior function as
[TABLE]
Setting , by Lemma 2.5, we have
[TABLE]
Apparently, we have the following inequalities (see [16]), regarding the estimator and the maximum-likelihood estimator:
[TABLE]
where the last inequality follows from the condition that . Therefore,
[TABLE]
If , then it is concluded from in Lemma 2.1 that
[TABLE]
If , then , since . To sum up, we always have
[TABLE]
Under the condition that , it can be concluded from the dynamical system (11), Lemma 4.1, and the following inequalities achieved in Lemma 2.1
[TABLE]
that there exists a such that when one has
[TABLE]
Under the condition that for any , set and then we further obtain
[TABLE]
On the other hand, if , by equation (14), one has
[TABLE]
Finally, by Lemma 4.2, it follows that , and thus for all . Therefore, taking
[TABLE]
completes the proof.
The following lemma provides the crucial concentration estimates of and , when is small.
Lemma 4.6**.**
Assume and for some . For any , there exist and , such that if and , one has
[TABLE]
As a result, we have the estimates
[TABLE]
Proof 4.7**.**
It follows from 2.3 (d) and (e) that
[TABLE]
and
[TABLE]
Then by Lemma 2.1 (a) we have
[TABLE]
By the definitions of , , , and , and by symmetry, it follows that
[TABLE]
Plugging , , and into equation (8), we have
[TABLE]
The first expectation of equation (18) will contribute to the major terms of the expansion:
[TABLE]
where Lemma 3.3 is used in the last equality and the following derivations. Similarly, we can bound both the second and third terms of equation (18) by :
[TABLE]
and
[TABLE]
Considering that and , the dynamical system (11) yields that
[TABLE]
Equation (18) gives
[TABLE]
and then
[TABLE]
Next display the discussion in the plane. First consider the case that for . In a small neighborhood of , since and , the discrete trajectory approaches the origin point in a way that is “tangential” to the -axis, when is small enough (see [2]). Furthermore, the conclusion of Lemma 4.2 excludes the possibility that the trajectory moves along the -axis. Then for some , there exist constants and , such that if and , we have
[TABLE]
where the remainder term comes from the expansion of . Consequently, it follows
[TABLE]
and by the fact that then
[TABLE]
For fixed , by the fact that can be bounded by for the reason that implied in Lemma 2.1 (b) and (c), it is known from the dynamical system (11) that
[TABLE]
Furthermore, one has
[TABLE]
and then there exists , such that if then for any one has . Therefore, for any positive integer , equation (20) yields
[TABLE]
where, by equation (20) and with denoting the constant therein,
[TABLE]
and by equation (21)
[TABLE]
Firstly, from Lemma 2.1 (a) one has , which implies that . Secondly, by the fact that , it is possible to achieve by choosing . Therefore, we can conclude that it is feasible to take sufficiently large and sufficiently small to guarantee that
[TABLE]
Finally, under the condition that , by Lemma 4.4, we know that there exists such that . Thus, we can choose and , such that if and then
[TABLE]
The second part of the lemma can be shown similarly.
5 Proof of the Main Theorem
First, consider for any fixed . To investigate the non-tightness, it would be convenient to assume that , say, . We take in the following context. Consider fixed and just varying, and without loss of generality, assume . Consequently choose and thus .
By the equivalent condition for non-reconstruction in Lemma 2.7, it suffices to show that when is close enough to , does not converge to $0$, because this implies that does not converge to $0$ considering . We apply reductio ad absurdum, by assuming that
[TABLE]
Therefore, there exists , such that whenever , we have . Next, recalling that , we further define . Then by the symmetry of the model, we can obtain the dynamical form for analogously as the dynamical form for in equation (11) :
[TABLE]
where and are counterparts of and simply by replacing by .
Then we display the discussion in the plane. Since and as from equation (23), in a small neighborhood of , the discrete trajectory approaches the origin point in a way that is “tangential” to the -axis. Furthermore, the conclusion of Lemma 4.2 excludes the possibility that the trajectory moves along the -axis. Therefore, it implies that there exists , such that whenever ,
[TABLE]
From the proof of Lemma 4.6, we know that in the plane there exist and , such that if and , then in the small neighborhood of , we have
[TABLE]
By equation (24), applying Lemma 4.1, and taking , one can obtain
[TABLE]
Next by the result of Lemma 4.6 that and for any , now we take Therefore, by equation (11) and the condition that , we have
[TABLE]
Note that the initial point and Lemma 4.4 implies that there exists such that . Define . Because is independent of , considering that sufficiently close to , we can choose such that
[TABLE]
Noting that , equation (5) implies that
[TABLE]
Suppose for some , and it follows from equations (5) and (27) that
[TABLE]
Therefore, by induction we have for all , which contradicts the assumption imposed in equation (23). Thus, the proof is completed.
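The induction at the end of the proof follows a pattern that can be seen in a one-dimensional toy map. The sketch below is only an illustration under stated assumptions and is not the paper's dynamical system (11), whose coefficients are not reproduced here: it iterates $x\mapsto ax+cx^2$ with a hypothetical linear coefficient $0<a<1$ and a hypothetical positive quadratic coefficient $c$. The origin is attracting, but the unstable fixed point $x^*=(1-a)/c$ shrinks to zero as $a$ approaches $1$, so an initial value bounded below by a constant independent of $a$ eventually lies above $x^*$ and can never decay to zero.

```python
def threshold(a, c):
    """Unstable fixed point x* = (1 - a) / c of the toy map (0 < a < 1, c > 0)."""
    return (1.0 - a) / c

def iterate(a, c, x0, steps):
    """Iterate the toy map x -> a*x + c*x**2, capped at 1, and return the orbit."""
    xs = [x0]
    for _ in range(steps):
        xs.append(min(1.0, a * xs[-1] + c * xs[-1] ** 2))
    return xs
```

With `a = 0.9` and `c = 1.0`, `threshold` returns approximately `0.1`: an orbit started at `0.05` decays toward zero, while an orbit started at `0.2` never drops below its starting value.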
References
- [1] J. Banks, C. Moore, J. Neeman, and P. Netrapalli, Information-theoretic thresholds for community detection in sparse networks, in Conference on Learning Theory, 2016, pp. 383–416.
- [2] J. Bernussou and J.-L. Abatut, Point mapping stability, Pergamon, 1977.
- [3] S. Bhamidi, R. Rajagopal, and S. Roch, Network delay inference from additive metrics, Random Structures & Algorithms, 37 (2010), pp. 176–203.
- [4] P. M. Bleher, J. Ruiz, and V. A. Zagrebnov, On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice, Journal of Statistical Physics, 79 (1995), pp. 473–482.
- [5] C. Borgs, J. Chayes, E. Mossel, and S. Roch, The Kesten-Stigum reconstruction bound is tight for roughly symmetric binary channels, in 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), IEEE, 2006, pp. 518–530.
- [6] G. Brito, I. Dumitriu, S. Ganguly, C. Hoffman, and L. V. Tran, Recovery and rigidity in a regular stochastic block model, in Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2016, pp. 1589–1601.
- [7] J. Chayes, L. Chayes, J. P. Sethna, and D. Thouless, A mean field spin glass with short-range interactions, Communications in Mathematical Physics, 106 (1986), pp. 41–89.
- [8] C. Daskalakis, E. Mossel, and S. Roch, Optimal phylogenetic reconstruction, in Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, ACM, 2006, pp. 159–168.
