Constrained Monte Carlo Markov Chains on Graphs
Roy Cerqueti, Emilio De Santis

Abstract
This paper presents a novel theoretical Monte Carlo Markov chain procedure in the framework of graphs. It specifically deals with the construction of a Markov chain whose empirical distribution converges to a given reference one. The Markov chain is constrained over an underlying graph, so that states are viewed as vertices and the transition between two states can have positive probability only in the presence of an edge connecting them. The analysis is carried out on the basis of the relationship between the support of the target distribution and the connectedness of the graph.
Roy Cerqueti
University of Macerata, Department of Economics and Law. Via Crescimbeni 20, I-62100, Macerata, Italy
and
Emilio De Santis
University of Rome La Sapienza, Department of Mathematics. Piazzale Aldo Moro, 5, I-00185, Rome, Italy
Keywords: Markov chain; Graph; Convergence of distribution.
AMS MSC 2010: 60J10, 62E25, 60B10.
1. Introduction
Monte Carlo Markov chain (MCMC) problems represent a challenging research theme, not only for their practical implications but also for the related methodological advances.
The idea of an MCMC problem is to build a reversible, regular Markov chain with a prescribed target stationary distribution (see e.g. [5, 13, 26]). To this end, several algorithms have been proposed in the literature, some of which are worth mentioning.
In the Metropolis-Hastings algorithm (see [19, 23]), a transition kernel is employed to iteratively generate a value at time $n+1$ on the basis of the value observed at time $n$.
When the state space is huge, the Metropolis-Hastings algorithm must be used with great care, to prevent the transition probabilities from becoming so small that they are unusable in practice in computer simulations.
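A minimal Python sketch of the Metropolis-Hastings step on a small finite state space may help fix ideas. It assumes a symmetric random-walk proposal on the cycle $\{0,\dots,m-1\}$ (an illustrative choice, not taken from [19, 23]), so that the acceptance probability reduces to $\min\{1,w(y)/w(x)\}$ for unnormalized target weights $w$; the function name `metropolis_hastings` is hypothetical.

```python
import random
from collections import Counter


def metropolis_hastings(weights, n_steps, x0=0, seed=0):
    """Random-walk Metropolis-Hastings on {0, ..., m-1}, viewed as a cycle.

    weights: unnormalized target weights w(x) >= 0, with w(x0) > 0.
    The proposal picks x - 1 or x + 1 (mod m) with probability 1/2 each; it is
    symmetric, so the acceptance probability is min(1, w(y) / w(x)).
    """
    rng = random.Random(seed)
    m = len(weights)
    x = x0
    path = [x]
    for _ in range(n_steps):
        y = (x + rng.choice((-1, 1))) % m               # propose a neighbour of x
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y                                       # accept; otherwise stay at x
        path.append(x)
    return path


# Usage: long-run frequencies should approach w / sum(w) = (0.1, 0.2, 0.3, 0.4).
w = [1.0, 2.0, 3.0, 4.0]
path = metropolis_hastings(w, 200_000)
freq = Counter(path)
print([round(freq[x] / len(path), 3) for x in range(len(w))])
```

Only ratios of weights enter the acceptance step, which is why the normalizing constant of the target is never needed; zero-weight states are simply never accepted.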
The Gibbs sampler (see [17]) solves the problem of huge cardinality when the state space has a multivariate structure. The strategy is to change state by changing only one of the components of the multivariate state. In so doing, only a few transition probabilities are different from zero; therefore, they remain large enough to be used on a computer. The Gibbs sampler loses its meaningfulness when no multivariate structure of the state space is identified.
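As an illustration of the single-component update, the following sketch runs a random-scan Gibbs sampler on a two-component state space $\{0,\dots,a-1\}\times\{0,\dots,b-1\}$. The joint weights and the function name `gibbs_2d` are hypothetical; each move redraws one coordinate from its full conditional, so in one step the chain can only reach states that differ from the current one in a single component.

```python
import random
from collections import Counter


def gibbs_2d(weights, n_steps, seed=0):
    """Random-scan Gibbs sampler for a joint law proportional to weights[i][j] >= 0."""
    rng = random.Random(seed)
    a, b = len(weights), len(weights[0])
    # Start from a state with positive weight, so every full conditional is proper.
    i, j = next((r, c) for r in range(a) for c in range(b) if weights[r][c] > 0)
    counts = Counter()
    for _ in range(n_steps):
        if rng.random() < 0.5:
            # Redraw the first component from P(i | j), proportional to column j.
            i = rng.choices(range(a), weights=[weights[r][j] for r in range(a)])[0]
        else:
            # Redraw the second component from P(j | i), proportional to row i.
            j = rng.choices(range(b), weights=weights[i])[0]
        counts[(i, j)] += 1
    return counts


# Usage: all weights are positive, so the chain is irreducible and the visit
# frequencies approach weights[i][j] / 10 for the table below.
table = [[1.0, 2.0], [3.0, 4.0]]
n = 100_000
counts = gibbs_2d(table, n)
print({s: round(c / n, 3) for s, c in sorted(counts.items())})
```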
The debate on the validity of the Gibbs sampler has been remarkably enriched by [18]. In the quoted paper, the Author elaborates on [26] and deals with a Bayesian choice of a vector of models, whose individual components are selected among a countable set of candidates. Each model has a number of unknown parameters; such a number is not constant, and depends on the considered component of the vector of models. In this context of non-fixed dimension of the parameter set, [18] adapts the Metropolis-Hastings algorithm by proposing a so-called "reversible jump" version of it (see also [2] for further advancements). In [6], the Authors observe that convergence issues of MCMC procedures always arise when the problem involves the selection of one among a number of different model specifications. To address the convergence matter, [6] proposes a modified Gibbs sampler obtained by introducing a sort of average of the considered models. In general, convergence is a critical aspect, as also acknowledged by Persi Diaconis in his long experience of scientific research and publications in the field. In this respect, we strongly recommend reading Diaconis' personal view on the matter, with some relevant insights into the future development of MCMC in both areas of mathematical advancements and practical applications (see [10, 11]).
Our paper adds to this debate by dealing with a constrained MCMC problem. In particular, we construct some Markov chains whose empirical distributions converge to a target distribution as time goes to infinity and which are constrained to move among the nodes that are adjacent in an assigned graph.
To present the problem in a proper way, some notation is needed. We will refer hereafter to a graph $G=(V,E)$, $V$ being the set collecting the nodes and $E$ the set of the edges. Two nodes $x,y\in V$ are declared adjacent in $G$ if $(x,y)\in E$ or $x=y$.
We now state a definition linking graphs and stochastic processes.
Definition 1.
We say that a stochastic process $X=(X_n)_{n\in\mathbb{N}}$ on $V$ is consistent with the graph $G$ if, for each $n\in\mathbb{N}$, $X_n$ and $X_{n+1}$ are adjacent in $G$ with probability one.
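Definition 1 can be checked mechanically on a realized trajectory. The sketch below assumes an undirected graph whose edges are stored as pairs, and uses the notion of adjacency introduced above (equal nodes, or nodes joined by an edge); the helper names are hypothetical. A single simulated path can only refute consistency, since Definition 1 is an almost-sure statement about the whole process.

```python
def is_adjacent(x, y, edges):
    """x and y are adjacent if x == y or {x, y} is an (undirected) edge."""
    return x == y or (x, y) in edges or (y, x) in edges


def path_is_consistent(path, edges):
    """True if every pair of consecutive states of the trajectory is adjacent."""
    return all(is_adjacent(a, b, edges) for a, b in zip(path, path[1:]))


# Usage on the path graph 0 - 1 - 2.
E = {(0, 1), (1, 2)}
print(path_is_consistent([0, 0, 1, 2, 2, 1], E))  # True: moves along edges or stays
print(path_is_consistent([0, 2], E))              # False: 0 and 2 are not adjacent
```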
Given two graphs $G'=(V',E')$ and $G=(V,E)$, we say that $G'$ is a subgraph of $G$ if $V'\subseteq V$ and $E'\subseteq E$, and we write $G'\subseteq G$.
A particular class of subgraphs will be of interest in the following. Specifically, the subgraph $G'=(V',E')$ is said to be an induced subgraph of $G$ if $x,y\in V'$ and $(x,y)\in E$ imply $(x,y)\in E'$. In this case we write $G_{V'}$ in order to stress the dependence on the set of nodes $V'$.
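Since the analysis below hinges on whether the subgraph induced by a set of nodes is connected, here is a small sketch of both operations for an undirected graph with edges stored as pairs. The function names are hypothetical; connectivity is tested by breadth-first search.

```python
from collections import deque


def induced_edges(nodes, edges):
    """Edge set of the induced subgraph G_{V'}: every edge of G with both ends in V'."""
    keep = set(nodes)
    return {(x, y) for (x, y) in edges if x in keep and y in keep}


def is_connected(nodes, edges):
    """Breadth-first search on the undirected graph (nodes, edges)."""
    nodes = set(nodes)
    if not nodes:
        return True  # convention for the empty graph
    nbrs = {v: set() for v in nodes}
    for x, y in edges:
        if x in nodes and y in nodes:
            nbrs[x].add(y)
            nbrs[y].add(x)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in nbrs[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == nodes


# Usage on the path graph 0 - 1 - 2: dropping the middle node disconnects it.
E = {(0, 1), (1, 2)}
print(is_connected({0, 1, 2}, E))                      # True
print(is_connected({0, 2}, induced_edges({0, 2}, E)))  # False
```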
We notice that Definition 1 implies that if a process is consistent with a graph $G'$, then it is also consistent with any graph $G$ such that $G'\subseteq G$.
From now on, we only consider $|V|<\infty$ and, consequently, a finite graph $G$. Given a finite graph $G$ and a distribution $\pi$ on $V$, we will provide in this paper an answer to the following question:
- Q:
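Is it possible to construct a (not necessarily homogeneous) Markov chain $X=(X_n)_{n\in\mathbb{N}}$ which is consistent with $G$ and whose empirical distribution converges almost surely to $\pi$ as $n$ goes to infinity?
More precisely, we aim at constructing a reversible Markov chain $X$ with the following properties: $X$ is consistent with the graph $G$ and
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\mathbb{1}_{\{X_k=x\}}=\pi(x)\quad\text{almost surely, for each }x\in V.\tag{1}$$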
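The left-hand side of condition (1) is the empirical distribution of the trajectory. A minimal sketch of its computation, together with the largest deviation from a target $\pi$ (which (1) requires to vanish almost surely as $n$ grows); the helper names are hypothetical:

```python
from collections import Counter


def empirical_distribution(path, states):
    """Fraction of the first n = len(path) steps spent in each state x."""
    counts = Counter(path)
    n = len(path)
    return {x: counts[x] / n for x in states}


def max_deviation(path, pi):
    """max_x |empirical(x) - pi(x)| over the finite state space."""
    emp = empirical_distribution(path, pi.keys())
    return max(abs(emp[x] - pi[x]) for x in pi)


# Usage on a short trajectory.
print(empirical_distribution([0, 1, 1, 2], [0, 1, 2]))       # {0: 0.25, 1: 0.5, 2: 0.25}
print(max_deviation([0, 1, 1, 2], {0: 0.5, 1: 0.0, 2: 0.5}))  # 0.5
```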
The motivations to pose question Q are basically three:
- a)
we face the problem of the large cardinality of the state space by controlling the transitions among the states through the edges of a graph;
- b)
we introduce a clear structure of the state space through the graph, so that one can hope to obtain desirable properties such as stochastic monotonicity or fast convergence;
- c)
the introduction of a graph which constrains the positive transitions of the Markov chain describes several real-life evolution phenomena, where it is possible to move in a single step only from a state to an "adjacent" one.
In the following, we provide an answer to question Q in all possible situations, and we show that when $G$ is connected it is possible to construct such a (not necessarily time-homogeneous) Markov chain.
2. Main results
For a target probability measure $\pi$ on $V$ and a graph $G=(V,E)$, let $\mathrm{supp}(\pi)=\{x\in V:\pi(x)>0\}$ denote the support of $\pi$ and $G_{\mathrm{supp}(\pi)}$ the corresponding induced subgraph. All the possible situations, along with the related answers to question Q, can be distinguished into four cases:
(i) If the distribution $\pi$ is concentrated on a unique $\bar x\in V$, i.e. $\pi=\delta_{\bar x}$, then one can construct the constant Markov chain $X_n=\bar x$, for each $n\in\mathbb{N}$. By Definition 1 and the notion of adjacent states (every node is adjacent to itself), $X$ is consistent with $G$, and (1) is trivially satisfied.
(ii) If $G_{\mathrm{supp}(\pi)}$ is not connected but $\mathrm{supp}(\pi)$ is contained in a connected component of $G$, then one can construct a nonhomogeneous Markov chain which is consistent with $G$ and fulfils condition (1) (see Theorem 1 and Theorem 2, part c., below).
(iii) If $G_{\mathrm{supp}(\pi)}$ is not connected and $\mathrm{supp}(\pi)$ is not contained in a single connected component of $G$, then there exists no stochastic process which is consistent with $G$ and fulfils (1) (see Theorem 2, part b., below).
(iv) If $G_{\mathrm{supp}(\pi)}$ is connected, then one can construct a homogeneous Markov chain consistent with $G$ which satisfies (1) (see Theorem 2, part a., below).
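The case analysis above is purely combinatorial, so it can be automated. The sketch below (hypothetical helper names; undirected graph with edges stored as pairs; $\pi$ given as a dictionary) computes the support of $\pi$, the connected components of $G$ and of the subgraph induced by the support, and reports in words which of the four situations occurs.

```python
from collections import deque


def components(nodes, edges):
    """Connected components of the graph induced on `nodes`, via BFS."""
    nbrs = {v: set() for v in nodes}
    for x, y in edges:
        if x in nbrs and y in nbrs:          # keep only edges inside `nodes`
            nbrs[x].add(y)
            nbrs[y].add(x)
    comps, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            for w in nbrs[v] - comp:
                comp.add(w)
                queue.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def classify(pi, nodes, edges):
    """Which of the four situations holds for the target pi on G = (nodes, edges)."""
    supp = {x for x in nodes if pi.get(x, 0) > 0}
    if len(supp) == 1:
        return "point mass"                            # constant chain works
    if len(components(supp, edges)) == 1:
        return "connected support"                     # homogeneous chain works
    if any(supp <= comp for comp in components(nodes, edges)):
        return "disconnected support, one component"   # nonhomogeneous chain needed
    return "support split across components"          # no consistent process exists


# Usage on the path graph 0 - 1 - 2 (plus an isolated node 3 in the last call).
E = {(0, 1), (1, 2)}
print(classify({0: 0.5, 2: 0.5}, [0, 1, 2], E))
print(classify({0: 0.5, 3: 0.5}, [0, 1, 2, 3], E))
```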
We now deal with item (ii).
Notice that, in this case, there exists a connected component of $G$ that contains $\mathrm{supp}(\pi)$. Without loss of generality, and to avoid the introduction of further notation, we assume that $G$ is connected and we identify $G$ with this connected component.
For a given distribution , let us define, in case , the non-empty set
[TABLE]
and let the distribution be
[TABLE]
i.e. is the uniform distribution on . We also define the distribution as
[TABLE]
Notice that
[TABLE]
where is the total variation norm (see e.g. [22]).
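For finite state spaces the total variation distance can be computed directly. The sketch below uses the convention $\max_{A\subseteq V}|\mu(A)-\nu(A)|=\frac12\sum_{x}|\mu(x)-\nu(x)|$ (some authors omit the factor $\frac12$) and, purely as a hypothetical illustration, perturbs $\pi$ by mixing it with the uniform distribution on $V$: such a mixture has full support and lies within $\varepsilon$ of $\pi$ in total variation. We do not claim that this is exactly the perturbation used above.

```python
def total_variation(mu, nu):
    """max_A |mu(A) - nu(A)| = (1/2) * sum_x |mu(x) - nu(x)| for finitely supported laws."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)


def mix_with_uniform(pi, states, eps):
    """Hypothetical perturbation (1 - eps) * pi + eps * uniform(states)."""
    m = len(states)
    return {x: (1 - eps) * pi.get(x, 0.0) + eps / m for x in states}


# Usage: pi puts no mass on the middle node of V = {0, 1, 2}.
pi = {0: 0.5, 1: 0.0, 2: 0.5}
pi_eps = mix_with_uniform(pi, [0, 1, 2], 0.1)
print(min(pi_eps.values()) > 0)      # True: full support
print(total_variation(pi_eps, pi))   # about 0.0333, i.e. 0.1 * TV(uniform, pi)
```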
Let denote the cardinality of . Since is not connected, then it contains at least two points. Since and is connected, then . By construction, for any
[TABLE]
and since , one has
[TABLE]
Let us label the elements of such that
[TABLE]
According to definition (2), for , one also obtains
[TABLE]
We construct the transition matrix related to the distribution and to the graph . The dependence on of the elements of matrix is conveniently omitted. For each ,
[TABLE]
where
[TABLE]
and
[TABLE]
Notice that by definition . In fact, since is connected, there exists at least an edge , with ; thus the denominator of (8) is at least equal to , when . Clearly, is a transition or stochastic matrix.
Definition (6) assures that the couple is reversible. Moreover, is irreducible, since is connected; thus, is the unique invariant distribution of . The transition matrix is also aperiodic since, by (7) and (8), for .
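The following sketch does not reproduce (6)-(8); it uses, as a hypothetical stand-in with the properties just listed, a Metropolis kernel restricted to the edges of $G$: off-diagonal entries $P(x,y)=\min\{1,\pi(y)/\pi(x)\}/D$ for $(x,y)\in E$, where $D$ is one plus the maximum degree, and the remaining mass on the diagonal. For a target with full support it is reversible, never jumps between non-adjacent nodes, and has a positive diagonal (hence is aperiodic); the function name is ours.

```python
def graph_metropolis(pi, nodes, edges):
    """Hypothetical graph-constrained Metropolis matrix (rows indexed like `nodes`).

    Requires pi[x] > 0 for every node. Moves are allowed only along edges of the
    undirected graph; the leftover probability stays on the diagonal.
    """
    idx = {v: k for k, v in enumerate(nodes)}
    nbrs = {v: set() for v in nodes}
    for x, y in edges:
        if x != y:
            nbrs[x].add(y)
            nbrs[y].add(x)
    D = 1 + max(len(s) for s in nbrs.values())   # off-diagonal row mass <= deg/D < 1
    P = [[0.0] * len(nodes) for _ in nodes]
    for x in nodes:
        for y in nbrs[x]:
            P[idx[x]][idx[y]] = min(1.0, pi[y] / pi[x]) / D
        P[idx[x]][idx[x]] = 1.0 - sum(P[idx[x]])  # diagonal >= 1/D > 0
    return P


# Usage: path graph 0 - 1 - 2 with a full-support target (nodes double as indices).
nodes = [0, 1, 2]
pi = {0: 0.2, 1: 0.3, 2: 0.5}
P = graph_metropolis(pi, nodes, {(0, 1), (1, 2)})
# Detailed balance pi(x) P(x, y) = pi(y) P(y, x) implies that pi is invariant.
print(all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12 for x in nodes for y in nodes))
print(P[0][2] == 0.0)   # no edge between 0 and 2, hence no direct transition
```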
We introduce the ergodic coefficient of Dobrushin (see [12] and [4] p. 235), which is defined as
$$\delta(P)=\frac{1}{2}\sup_{x,y\in V}\sum_{z\in V}\left|P(x,z)-P(y,z)\right|,\tag{9}$$
where $P$ is a stochastic matrix on $V$.
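Assuming (9) has the standard form used in [4], $\delta(P)=\frac12\max_{x,y}\sum_z|P(x,z)-P(y,z)|$, the coefficient is easy to compute for small matrices. The sketch also checks numerically the submultiplicativity $\delta(PQ)\le\delta(P)\,\delta(Q)$, which is what turns a bound on $\delta$ into geometric convergence in Dobrushin's theorem; the helper names are hypothetical.

```python
def dobrushin(P):
    """Dobrushin ergodic coefficient: (1/2) * max over row pairs of the L1 distance."""
    n = len(P)
    return max(
        0.5 * sum(abs(P[x][z] - P[y][z]) for z in range(n))
        for x in range(n)
        for y in range(n)
    )


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Usage.
I = [[1.0, 0.0], [0.0, 1.0]]   # never forgets the starting state
R = [[0.3, 0.7], [0.3, 0.7]]   # identical rows: one step forgets the start
P = [[0.5, 0.5], [0.25, 0.75]]
print(dobrushin(I), dobrushin(R), dobrushin(P))              # 1.0 0.0 0.25
print(dobrushin(matmul(P, P)) <= dobrushin(P) ** 2 + 1e-12)  # True
```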
Lemma 1.
Given the transition matrix on constructed above, with , Dobrushin's ergodic coefficient can be bounded from above as follows
[TABLE]
for any , where .
Proof.
For , condition and inequalities (4) and (5) provide
[TABLE]
Thus, by (10) one obtains , for . Then one has that, if ,
[TABLE]
For , since the graph is connected and for each , then (11) gives that
[TABLE]
where is the transition probability from to in steps.
Then, by the definition of Dobrushin's ergodic coefficient in (9), one obtains the claim. ∎
Given an arbitrary distribution $\mu$ over $V$, we construct a nonhomogeneous Markov chain $X=(X_n)_{n\in\mathbb{N}}$ with $\mu$ as initial distribution. The transition matrix of the Markov chain at time $n$ will be denoted by $P_n$.
Let us consider an increasing sequence of times $(t_k)_{k\in\mathbb{N}}$, and let us define
[TABLE]
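A minimal simulation sketch of this kind of nonhomogeneous chain, under explicitly hypothetical choices: on block $k$ the perturbed target is the mixture $(1-\varepsilon_k)\pi+\varepsilon_k u$ with $u$ uniform on $V$, $\varepsilon_k=1/(k+1)$, the block has length $1000\,k$, and the kernel is a graph-restricted Metropolis matrix used as a stand-in for (6). These are not the sequences of Theorem 1; they only illustrate that a chain moving along the edges of the path graph $0 - 1 - 2$ can spend a small fraction of time at the middle node, which lies outside $\mathrm{supp}(\pi)$, while still crossing between the two parts of the support. (Example 1 shows that letting the perturbation vanish too fast can destroy (1).)

```python
import random


def graph_metropolis(pi, nodes, edges):
    """Metropolis matrix restricted to the edges of G (hypothetical stand-in for (6))."""
    idx = {v: k for k, v in enumerate(nodes)}
    nbrs = {v: set() for v in nodes}
    for x, y in edges:
        if x != y:
            nbrs[x].add(y)
            nbrs[y].add(x)
    D = 1 + max(len(s) for s in nbrs.values())
    P = [[0.0] * len(nodes) for _ in nodes]
    for x in nodes:
        for y in nbrs[x]:
            P[idx[x]][idx[y]] = min(1.0, pi[y] / pi[x]) / D
        P[idx[x]][idx[x]] = 1.0 - sum(P[idx[x]])
    return P


def run_blocks(pi, nodes, edges, n_blocks, base_len, x0, seed=0):
    """Nonhomogeneous chain: the kernel is fixed on each block and changes between blocks."""
    rng = random.Random(seed)
    idx = {v: k for k, v in enumerate(nodes)}
    m = len(nodes)
    x, path = x0, [x0]
    for k in range(1, n_blocks + 1):
        eps = 1.0 / (k + 1)                       # hypothetical schedule, eps_k -> 0
        # Hypothetical full-support perturbation: (1 - eps) * pi + eps * uniform.
        pi_eps = {v: (1 - eps) * pi.get(v, 0.0) + eps / m for v in nodes}
        P = graph_metropolis(pi_eps, nodes, edges)
        for _ in range(base_len * k):             # block of length base_len * k
            x = rng.choices(nodes, weights=P[idx[x]])[0]
            path.append(x)
    return path


# Usage: pi charges only the endpoints of the path graph 0 - 1 - 2.
nodes, E = [0, 1, 2], {(0, 1), (1, 2)}
path = run_blocks({0: 0.5, 1: 0.0, 2: 0.5}, nodes, E, n_blocks=20, base_len=1000, x0=0)
print([round(path.count(v) / len(path), 3) for v in nodes])
```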
Theorem 1.
Consider a connected graph and a distribution . Assume that is not connected but is contained in a connected component of . Any Markov chain constructed above with transition matrix given in (12), with sequence of times is consistent with and (1) holds true, i.e.
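$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\mathbb{1}_{\{X_k=x\}}=\pi(x)\quad\text{almost surely, for each }x\in V.$$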
Proof.
The fact that is consistent with derives from the construction of (see (6) and (12)).
To prove the result, we first check that
[TABLE]
By definition of one has
[TABLE]
Then (13) implies (1). In fact, for , one has
[TABLE]
[TABLE]
Thus
[TABLE]
For and let us define the sequence of events as
[TABLE]
To obtain (13) it is enough that, for each and one has
[TABLE]
Now, take the auxiliary sequence of independent random variables with values on such that has distribution if (see (2) for the definition of ).
Notice that for each initial distribution on , Lemma 1 and Dobrushin’s Theorem (see e.g. [4]) give that
[TABLE]
for any .
Let . Given and , by the maximal coupling (see [22]) and inequality (15) one can couple with so that
[TABLE]
when .
Let us define the sequence of events by
[TABLE]
for any and any integer .
By subadditivity, one has
[TABLE]
We also set . Then
[TABLE]
[TABLE]
By (18) and the first Borel-Cantelli lemma, one has that .
Now, for and , let us define the sequence of events as
[TABLE]
A straightforward calculation gives that
[TABLE]
Therefore to end the proof it is enough to show
[TABLE]
Such a result is a consequence of the convergence , as , the large deviation bounds for i.i.d. Bernoulli random variables and the first Borel-Cantelli lemma. This concludes the proof. ∎
Remark 1.
The definition of provided in Theorem 1 represents only one of the possible choices. In this respect, it is interesting to note that the proof of Theorem 1 can be adapted to other sequences . For example, one can take , with . In this case, for any , there exists and an increasing sequence
[TABLE]
such that , and the following property holds
[TABLE]
By reproducing the arguments of the proof of Theorem 1 for the sequence , one obtains that the Markov chain on with an arbitrary initial distribution and transition matrix as in (12) satisfies (1).
Next example shows that the convergence of the distribution to the distribution should not be taken too fast and should be not taken too small in order to have (1).
Example 1.
Let us consider a graph with and .
Let us take the distribution having , and define , for each , and the sequence of distributions where (see the definition in (2)). In particular, .
We take a non-homogeneous Markov chain with transition matrix , at time , given by
[TABLE]
According to the definition of given in (8) and omitting the dependence of on the index , one has
[TABLE]
Thus, (21) gives that at time (see (7)). Therefore, the Borel-Cantelli lemma guarantees that
[TABLE]
and therefore
[TABLE]
In fact, formula (22) allows us to consider only such that condition is satisfied. If for a finite number of , then
[TABLE]
if for infinite values of , then (22) states that
[TABLE]
Notice that Example 1 suggests a natural comparison between our setting and simulated annealing (see [20]). In both cases one hopes for a fast rate of convergence but, if one pushes the rate too high, the procedure may get trapped in local minima (in simulated annealing) or the empirical measure may fail to converge to the target distribution (in our framework). Hence, with an excessively fast convergence rate, the constructed chain may fail to answer question Q, even if it is consistent with the graph $G$.
The next result provides an answer to Q for items (iii) and (iv).
Theorem 2.
The following three statements hold true:
- a.
if $G_{\mathrm{supp}(\pi)}$ is connected, then each homogeneous Markov chain $X$ with state space $\mathrm{supp}(\pi)$ having transition matrix $P_\pi$ defined as in (6) satisfies (1). Furthermore, $X$ is consistent with $G$;
- b.
if $G_{\mathrm{supp}(\pi)}$ is not connected and $\mathrm{supp}(\pi)$ is not contained in a connected component of $G$, then there exists no stochastic process consistent with $G$ which satisfies (1);
- c.
if $G_{\mathrm{supp}(\pi)}$ is not connected and $\mathrm{supp}(\pi)$ is contained in a connected component of $G$, then no homogeneous Markov chain consistent with $G$ satisfies (1).
Proof.
We prove a. Since $G_{\mathrm{supp}(\pi)}$ is connected, the transition matrix $P_\pi$ is well defined. Moreover, $\pi$ is the unique invariant distribution of $P_\pi$ because $P_\pi$ is irreducible. Now, by applying the ergodic theorem, one obtains (1). The consistency of $X$ with $G$ follows from the fact that, for $x\neq y$, $P_\pi(x,y)>0$ implies $(x,y)\in E$.
We prove b. by contradiction. Assume that (1) holds true for a stochastic process $X$ which is consistent with $G$. Then, for each $x\in V$, one should have
[TABLE]
Let us consider which belong to two different connected components of . By (24), it follows that where
[TABLE]
with
[TABLE]
Without loss of generality one can assume that . Then, by the consistence of with the graph , one has that
[TABLE]
Therefore
[TABLE]
and this contradicts (24).
Now, we prove c. Without loss of generality, we can assume that the graph $G$ is connected; thus, the connected component containing $\mathrm{supp}(\pi)$ is the whole space $V$. We can then reduce to the case of irreducible Markov chains. Indeed, if a Markov chain is not irreducible, (1) cannot be true, because the limit in formula (1), provided it exists, depends on the initial state of the Markov chain.
By hypothesis, there exist two connected components of , say and , with and , and a path of such that the transition matrix has , for and , and , for some and .
Assuming that the homogeneous Markov chain satisfies (1), we proceed by contradiction. Since the Markov chain is irreducible, the ergodic theorem guarantees that
[TABLE]
does exist almost surely. Moreover, by (1), one has
[TABLE]
Therefore, (26) gives
[TABLE]
[TABLE]
This is a contradiction since . ∎
Remark 2.
Suppose that we are under hypothesis (ii) and let us consider and a fixed . Part a. of Theorem 2 states that it is possible to select a homogeneous Markov chain having transition matrix equal to (see (2) and (6)), which satisfies
[TABLE]
[TABLE]
Furthermore, this Markov chain is consistent with $G$.
Some consequences of Theorems 1 and 2 follow. Let us consider a function $f:V\to\mathbb{R}$.
Under the hypotheses of Theorem 1 or of Theorem 2 a., one obtains
[TABLE]
where $\mathbb{E}_\pi(f)$ is the expected value of $f$ with respect to the distribution $\pi$, i.e.
$$\mathbb{E}_\pi(f)=\sum_{x\in V}f(x)\,\pi(x).$$
Moreover, when (27) holds true, then
[TABLE]
Thus, accepting the error given in (30), which can be made arbitrarily small, one can always use a homogeneous Markov chain to numerically compute $\mathbb{E}_\pi(f)$.
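Numerically, the ergodic averages above amount to averaging $f$ along one long trajectory of a homogeneous chain. A minimal sketch, assuming a target with connected support (here full support on the path graph $0 - 1 - 2 - 3$), a hypothetical graph-restricted Metropolis kernel as a stand-in for (6), and the test function $f(x)=x^2$, whose exact mean under $\pi=(0.1,0.2,0.3,0.4)$ is $5$:

```python
import random


def graph_metropolis(pi, nodes, edges):
    """Metropolis matrix restricted to the edges of G (hypothetical stand-in for (6)).

    Requires pi[x] > 0 for every node; rows are indexed like `nodes`.
    """
    idx = {v: k for k, v in enumerate(nodes)}
    nbrs = {v: set() for v in nodes}
    for x, y in edges:
        if x != y:
            nbrs[x].add(y)
            nbrs[y].add(x)
    D = 1 + max(len(s) for s in nbrs.values())
    P = [[0.0] * len(nodes) for _ in nodes]
    for x in nodes:
        for y in nbrs[x]:
            P[idx[x]][idx[y]] = min(1.0, pi[y] / pi[x]) / D
        P[idx[x]][idx[x]] = 1.0 - sum(P[idx[x]])
    return P


def ergodic_average(f, P, nodes, x0, n_steps, seed=0):
    """(1/n) * sum_{k=1}^{n} f(X_k) along one trajectory of the homogeneous chain P."""
    rng = random.Random(seed)
    idx = {v: k for k, v in enumerate(nodes)}
    x, total = x0, 0.0
    for _ in range(n_steps):
        x = rng.choices(nodes, weights=P[idx[x]])[0]
        total += f(x)
    return total / n_steps


# Usage: full-support target on the path graph 0 - 1 - 2 - 3 and f(x) = x^2.
nodes, E = [0, 1, 2, 3], {(0, 1), (1, 2), (2, 3)}
pi = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
P = graph_metropolis(pi, nodes, E)
exact = sum(v * v * pi[v] for v in nodes)   # E_pi(f), equal to 5 up to rounding
estimate = ergodic_average(lambda v: v * v, P, nodes, x0=0, n_steps=100_000)
print(exact, estimate)
```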
2.1. A remark on suitable criteria for graph selection
We point out that a proper selection of the graph may lead to a more efficient MCMC procedure. In particular, graphs can contribute to reducing the number of possible transitions among states, as the Gibbs sampler also does (see e.g. [17]). In fact, when the number of states is extremely large, the unconstrained transition probabilities involving all the pairs of states may be too small, hence too difficult to simulate. In this respect, a proper choice of the graph should ensure connections among highly probable states, thus avoiding the creation of metastable states (sometimes called wells, see [3, 21]). Indeed, wells are states in which the Markov chain is expected to spend an extremely long time before being able to visit other high-probability ones. This would dramatically increase the mixing time and slow down the convergence of the MCMC algorithm (see e.g. [1, 15, 16]).
In this context, very useful readings are [8, 14, 24], where (stochastically) monotone MCMC is explored. In detail, a Markov chain is said to be stochastically monotone when the state space is endowed with a partial order and there exists a coupling of the chain with itself that maintains the partial order of the state space at any time. Stochastically monotone Markov chains are particularly simple to simulate (see [14] and [24] for connections with the perfect simulation literature). Now, let us assume that the state space $V$ is endowed with a partial order and consider the target distribution $\pi$ on $V$. Naturally, there are infinitely many Markov chains satisfying (1). Some of them might be stochastically monotone, i.e. simple to simulate. The role of the graph in obtaining stochastically monotone Markov chains might then be crucial.
As a paradigmatic example, we can take the classical ferromagnetic Ising model, which assigns a spin $\sigma_i\in\{-1,+1\}$ to each vertex $i$ of a finite set $\Lambda$, and assume that the set $\{-1,+1\}^{\Lambda}$ of configurations is endowed with the partial order such that $\sigma\le\eta$ if and only if $\sigma_i\le\eta_i$ for each $i\in\Lambda$. In this situation, the Markov chain identified by the Gibbs sampler is stochastically monotone, and this property leads to affordable simulation exercises for the convergence towards the Gibbs measure of the ferromagnetic Ising model (see [24] and, more recently, [9]). There are also other Markov chains converging to the Gibbs measure which do not preserve the ordering of the state space (see e.g. [7, 24]).
It is not difficult to construct other examples for non-ferromagnetic Ising models (where the Gibbs sampler is not stochastically monotone) such that Markov chains consistent with suitably defined graphs are stochastically monotone.
Product graphs and product distributions
We now introduce the standard definition of the strong product of graphs, as in [25], which leads to a simplification of MCMC simulations.
Definition 2.
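Consider two graphs $G_1=(V_1,E_1)$ and $G_2=(V_2,E_2)$. The strong product is the graph $G_1\boxtimes G_2=(V,E)$, where $V=V_1\times V_2$ and $E$ collects the couples $((x_1,x_2),(y_1,y_2))$, with $(x_1,x_2),(y_1,y_2)\in V$, such that one of the following conditions is verified:
- •
$x_1=y_1$ and $(x_2,y_2)\in E_2$;
- •
$(x_1,y_1)\in E_1$ and $x_2=y_2$;
- •
$(x_1,y_1)\in E_1$ and $(x_2,y_2)\in E_2$.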
Since the strong product of graphs is associative (see [25]), Definition 2 can be extended to any finite collection of graphs $G_1,\dots,G_d$, obtaining $G_1\boxtimes\cdots\boxtimes G_d$.
Let us now consider finite sets $V_1,\dots,V_d$ and take a product distribution $\pi=\pi_1\otimes\cdots\otimes\pi_d$, where $\pi_i$ is a distribution on the space $V_i$. We construct $d$ independent Markov chains $X^{(1)},\dots,X^{(d)}$ such that the $i$-th Markov chain has state space $V_i$ and an arbitrary initial distribution $\mu_i$, for each $i=1,\dots,d$.
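Definition 2 can be implemented directly. Observe that the three conditions together say that two distinct pairs are joined exactly when their first coordinates are adjacent in $G_1$ and their second coordinates are adjacent in $G_2$, in the sense of adjacency used throughout (equal, or joined by an edge). The sketch below builds the strong product of two undirected graphs with edges stored as pairs; the function names are hypothetical, and node labels are assumed to be comparable so that each edge is stored once.

```python
from itertools import product


def adjacent(x, y, edges):
    """Equal, or joined by an undirected edge."""
    return x == y or (x, y) in edges or (y, x) in edges


def strong_product(nodes1, edges1, nodes2, edges2):
    """Vertex and edge sets of the strong product of G1 = (nodes1, edges1) and G2."""
    nodes = list(product(nodes1, nodes2))
    edges = {
        (p, q)
        for p in nodes
        for q in nodes
        if p < q and adjacent(p[0], q[0], edges1) and adjacent(p[1], q[1], edges2)
    }
    return nodes, edges


# Usage: the product of two single edges is the complete graph on 4 vertices.
V, E = strong_product([0, 1], {(0, 1)}, [0, 1], {(0, 1)})
print(len(V), len(E))   # 4 6
```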
Moreover, by replacing with and with , we replicate the construction provided before Theorem 1. In so doing, we take to define the distribution
[TABLE]
Now, take a sequence of increasing times , such that
[TABLE]
with a positive constant.
The transition matrices of are as in (12):
[TABLE]
We introduce the Markov chain
[TABLE]
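A small sketch of the product construction: $d=2$ independent coordinate chains, each moving along its own graph, are run side by side, so that every step of the pair is either a stay or a move along an edge of the strong product. For simplicity both coordinates use a fixed (homogeneous) graph-restricted Metropolis kernel with full-support targets, instead of the time-dependent matrices above; the graphs, targets and function names are hypothetical.

```python
import random


def graph_metropolis(pi, nodes, edges):
    """Metropolis matrix restricted to the edges of G (hypothetical stand-in for (6))."""
    idx = {v: k for k, v in enumerate(nodes)}
    nbrs = {v: set() for v in nodes}
    for x, y in edges:
        if x != y:
            nbrs[x].add(y)
            nbrs[y].add(x)
    D = 1 + max(len(s) for s in nbrs.values())
    P = [[0.0] * len(nodes) for _ in nodes]
    for x in nodes:
        for y in nbrs[x]:
            P[idx[x]][idx[y]] = min(1.0, pi[y] / pi[x]) / D
        P[idx[x]][idx[x]] = 1.0 - sum(P[idx[x]])
    return P


def run_product(chains, n_steps, seed=0):
    """Run independent coordinate chains in lockstep.

    chains: list of (nodes, P, x0) triples, one per factor graph. Each coordinate
    draws its next state from its own row of P with fresh random numbers, so the
    coordinates evolve independently.
    """
    rng = random.Random(seed)
    idx = [{v: k for k, v in enumerate(nodes)} for nodes, _, _ in chains]
    state = [x0 for _, _, x0 in chains]
    path = [tuple(state)]
    for _ in range(n_steps):
        for i, (nodes, P, _) in enumerate(chains):
            state[i] = rng.choices(nodes, weights=P[idx[i][state[i]]])[0]
        path.append(tuple(state))
    return path


# Usage: two copies of the single-edge graph 0 - 1 with different full-support targets.
nodes, E = [0, 1], {(0, 1)}
pi1, pi2 = {0: 0.3, 1: 0.7}, {0: 0.6, 1: 0.4}
chains = [(nodes, graph_metropolis(pi1, nodes, E), 0),
          (nodes, graph_metropolis(pi2, nodes, E), 0)]
path = run_product(chains, 100_000)
freq = {(a, b): path.count((a, b)) / len(path) for a in nodes for b in nodes}
print(freq)   # each entry should be close to pi1[a] * pi2[b]
```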
The next result is similar to Theorem 1, but it is based on the independent Markov chains constructed above.
Theorem 3.
Let and . Let us consider a product distribution and consider the Markov chains of (33).
Then
[TABLE]
for each .
Proof.
By (31), it follows that
[TABLE]
In fact, for each ,
[TABLE]
since
[TABLE]
Thus, the times in can be neglected in the procedure of checking (34), i.e.
[TABLE]
and also
[TABLE]
Let us define the set of times . Now we introduce the independent random variables . The random variables , with label , take value on . Moreover, if then has distribution .
We now adapt formula (16) to the Markov chain . If then for each there exists such that belong to . In this case formula (16) becomes
[TABLE]
where we recall that .
Hence, for any one has that there exist such that . Therefore, using the independence of the random variables ’s and the independence of the Markov chains ’s, one has
[TABLE]
For , the distribution of coincides with .
Thus, we have
[TABLE]
Notice that any increases to infinity when goes to infinity. Therefore, the left-hand side of (37) goes to zero as goes to infinity. Inequalities (36) and (37) give an upper bound for the distance in total variation between the law of and the distribution .
Now, by following the arguments in the proof of Theorem 1, we obtain equation (34). ∎
3. Conclusions
The paper adds to the MCMC literature. In particular, it deals with the existence and identification of a Markov chain which is constrained to move among adjacent nodes of a graph and whose empirical distribution converges to a prescribed one. In so doing, we classify the cases in which such a Markov chain exists and, when it exists, whether it can be homogeneous or not.
The presence of prescribed constraints makes the paper quite different from the classical Metropolis-Hastings Markov chain methods. Indeed, one of the most relevant consequences of the graph-based constraint is that, in some cases, no homogeneous Markov chain answers question Q positively, and only nonhomogeneous ones do.
The problem is also extended to the particular case of strong products of graphs, where the given distributions are also of product type. In this context, we give a result which allows researchers to study the convergence of one Markov chain with a large number of states by using independent Markov chains with small state spaces, hence reducing the computational complexity of related simulation models.
Some suggestions on the speed of convergence are also provided. However, the detailed analysis of this important point may be the topic for future research.
References
- [1] P. Baldi, A. Frigessi, and M. Piccioni. Importance sampling for Gibbs random fields. Ann. Appl. Probab., 3(3):914–933, 1993.
- [2] F. Bartolucci, L. Scaccia, and A. Mira. Efficient Bayes factor estimation from the reversible jump output. Biometrika, 93(1):41–52, 2006.
- [3] J. Beltrán and C. Landim. Tunneling and metastability of continuous time Markov chains. J. Stat. Phys., 140(6):1065–1114, 2010.
- [4] P. Brémaud. Markov chains: Gibbs fields, Monte Carlo simulation, and queues, volume 31 of Texts in Applied Mathematics. Springer-Verlag, New York, 1999.
- [5] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, editors. Handbook of Markov chain Monte Carlo. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, FL, 2011.
- [6] B. P. Carlin and S. Chib. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Series B.
- [7] R. Cerqueti and E. De Santis. Stochastic Ising model with flipping sets of spins and fast decreasing temperature. Ann. Inst. Henri Poincaré Probab. Stat., 54(2):757–789, 2018.
- [8] D. J. Daley. Stochastically monotone Markov chains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 10:305–317, 1968.
