Constrained Monte Carlo Markov Chains on Graphs

Roy Cerqueti; Emilio De Santis

arXiv:1907.00779 · math.PR · July 2, 2019

TL;DR

This paper introduces a new constrained Monte Carlo Markov chain method on graphs, ensuring convergence to a target distribution while respecting graph connectivity constraints.

Contribution

It proposes a novel MCMC procedure constrained by graph structure, linking distribution support to graph connectedness for convergence analysis.

Findings

01. Convergence of the Markov chain to the target distribution under graph constraints

02. Analysis of the relationship between distribution support and graph connectedness

03. Framework applicable to graph-structured state spaces

Abstract

This paper presents a novel theoretical Monte Carlo Markov chain procedure in the framework of graphs. It specifically deals with the construction of a Markov chain whose empirical distribution converges to a given reference one. The Markov chain is constrained over an underlying graph, so that states are viewed as vertices and the transition between two states can have positive probability only in presence of an edge connecting them. The analysis is carried out on the basis of the relationship between the support of the target distribution and the connectedness of the graph.

Equations

$$\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\mathbf{1}_{\{X(m)=s\}}=\mu(s),\quad s\in{\mathcal{S}}\quad\text{a.s.}$$

$${\mathcal{A}}_{k}=\Big\{s\in{\mathcal{S}}:\mu(s)<\frac{1}{k}\Big\},\quad k\geq 2$$

$$\eta_{k}(s)=\frac{1}{|{\mathcal{A}}_{k}|}\mathbf{1}_{\{s\in{\mathcal{A}}_{k}\}},\quad s\in{\mathcal{S}},$$

$$\mu_{k}=\frac{1}{k}\eta_{k}+\frac{k-1}{k}\mu.$$

$$||\mu_{k}-\mu||_{TV}=\frac{1}{k}||\eta_{k}-\mu||_{TV}\leq\frac{1}{k},$$

$$k>\bar{k}:=\Big\lceil\frac{1}{\min\{\mu(s)>0:s\in{\mathcal{S}}\}}\Big\rceil$$

$$\mu_{k}(s)\geq\frac{1}{(N-1)k},\quad s\in{\mathcal{S}}.$$

$$\mu(s^{1})\geq\mu(s^{2})\geq\dots\geq\mu(s^{N}).$$

$$\mu_{k}(s^{1})\geq\mu_{k}(s^{2})\geq\dots\geq\mu_{k}(s^{N})>0.$$

$$p_{l,m}=\begin{cases}p,&\text{if }l<m\text{ and }\{s^{l},s^{m}\}\in E;\\ \frac{\mu_{k}(s^{m})}{\mu_{k}(s^{l})}p,&\text{if }l>m\text{ and }\{s^{l},s^{m}\}\in E;\\ p_{l},&\text{if }l=m;\\ 0,&\text{otherwise,}\end{cases}$$

$$p_{l}=1-p\Big[\sum_{m^{\prime}:m^{\prime}>l}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}+\sum_{m^{\prime}:m^{\prime}<l}\frac{\mu_{k}(s^{m^{\prime}})}{\mu_{k}(s^{l})}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}\Big]$$

$$p=\min_{l=1,\dots,N}\frac{1}{2\Big(\sum_{m^{\prime}:m^{\prime}>l}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}+\sum_{m^{\prime}:m^{\prime}<l}\frac{\mu_{k}(s^{m^{\prime}})}{\mu_{k}(s^{l})}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}\Big)}.$$

$$\delta(P)=1-\inf_{i,j=1,\dots,N}\sum_{h=1}^{N}p_{i,h}\land p_{j,h}$$

$$\delta\big((P^{(\mu_{k},G)})^{N-1}\big)\leq 1-\Big(\frac{c_{N}}{k}\Big)^{N-1},$$

$$1\leq\frac{\mu_{k}(s^{m})}{\mu_{k}(s^{l})}\leq k(N-1),\quad\text{for }l>m.$$

$$p_{l,m}\geq\frac{c_{N}}{k}.$$

$$p_{l,m}^{(N-1)}\geq\Big(\frac{c_{N}}{k}\Big)^{N-1},\quad\forall\,l,m=1,\dots,N,$$

$$P(t)=\sum_{k=\bar{k}+1}^{\infty}P^{(\mu_{k},G)}\mathbf{1}_{\{t\in[t_{k},t_{k+1})\}}.$$

$$\lim_{\ell\to\infty}\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}=\mu(s),\quad s\in{\mathcal{S}}\quad\text{a.s.}$$

$$\lim_{\ell\to\infty}\frac{t_{\ell+1}-t_{\ell}}{t_{\ell}}=0\quad\text{and}\quad\lim_{\ell\to\infty}\frac{t_{\ell+1}}{t_{\ell}}=1.$$

$$\frac{1}{t_{\ell+1}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}\leq\frac{1}{t}\sum_{m=0}^{t}\mathbf{1}_{\{X(m)=s\}}\leq\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell+1}-1}\mathbf{1}_{\{X(m)=s\}}\leq\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}+\frac{t_{\ell+1}-t_{\ell}}{t_{\ell}}.$$

$$\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t}\mathbf{1}_{\{X(m)=s\}}=\lim_{\ell\to\infty}\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}.$$

$$B_{\ell}(\varepsilon,s)=\Big\{\Big|\mu(s)-\frac{1}{t_{\ell+1}-t_{\ell}}\sum_{m=t_{\ell}}^{t_{\ell+1}-1}\mathbf{1}_{\{X(m)=s\}}\Big|<\varepsilon\Big\}.$$

$$\mathbb{P}\big(\liminf_{\ell\to\infty}B_{\ell}(\varepsilon,s)\big)=1.$$

$$||\vartheta P(t_{\ell})^{\ell^{2N}}-\mu_{\ell}||_{TV}\leq\delta\big(P(t_{\ell})^{N-1}\big)^{\lfloor\frac{\ell^{2N}}{N-1}\rfloor}\leq\Big(1-\Big(\frac{c_{N}}{\ell}\Big)^{N-1}\Big)^{\lfloor\frac{\ell^{2N}}{N-1}\rfloor}\leq\exp\Big(-c_{N}^{N-1}\Big\lfloor\frac{\ell^{N+1}}{N-1}\Big\rfloor\Big),$$

$$\mathbb{P}\big(X(t_{\ell}+k\ell^{2N}+i)\neq Y(t_{\ell}+k\ell^{2N}+i)\big)\leq\exp\Big(-\hat{c}_{N}\Big\lfloor\frac{\ell^{N+1}}{N-1}\Big\rfloor\Big),$$

$$A_{\ell,i}=\big\{X(t_{\ell}+a\ell^{2N}+i)=Y(t_{\ell}+a\ell^{2N}+i):a\geq 1\text{ and }t_{\ell}+a\ell^{2N}+i\leq t_{\ell+1}-1\big\},$$

$$\mathbb{P}(A_{\ell,i})\geq 1-(\ell+1)^{5N}\exp\Big(-\hat{c}_{N}\Big\lfloor\frac{\ell^{N+1}}{N-1}\Big\rfloor\Big).$$

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models

Full text

Constrained Monte Carlo Markov Chains on Graphs

Roy Cerqueti

University of Macerata, Department of Economics and Law. Via Crescimbeni 20, I-62100, Macerata, Italy

and

Emilio De Santis

University of Rome La Sapienza, Department of Mathematics. Piazzale Aldo Moro, 5, I-00185, Rome, Italy

Keywords: Markov chain; Graph; Convergence of distribution.

AMS MSC 2010: 60J10, 62E25, 60B10.

1. Introduction

Monte Carlo Markov Chain (MCMC) problems represent a challenging research theme not only for their natural practical implications but also for the related methodological advancements.

The idea of an MCMC problem is to build a reversible regular Markov chain with a target stationary distribution (see e.g. [5, 13, 26]). To this end, several algorithms have been proposed in the literature, some of which are worth mentioning.

In the Metropolis-Hastings algorithm (see [19, 23]), a transition kernel is employed to iteratively generate a value $y$ at time $t+1$ on the basis of the value $x$ observed at time $t$.

When the state space is huge, the Metropolis-Hastings algorithm must be used with great care, since the transition probabilities may become so small that they are unusable in computer simulations.

The Gibbs sampler (see [17]) addresses the problem of large cardinality when the state space has a multivariate structure. The strategy is to change state by changing only one component of the multivariate state at a time. In this way, only a few transition probabilities are different from zero, so they do not become too small to be used on a computer. The Gibbs sampler loses its usefulness when no multivariate structure of the state space can be identified.

The debate on the validity of the Gibbs sampler has been remarkably enriched by [18]. In that paper, the author elaborates on [26] and deals with a Bayesian choice of a vector of models, whose individual components are selected from a countable set of candidates. Each model has a number of unknown parameters; this number is not constant and depends on the considered component of the vector of models. For this setting, in which the dimension of the parameter set is not fixed, [18] adapts the Metropolis-Hastings algorithm by proposing a so-called "reversible jump" version of it (see also [2] for further advancements). In [6], the authors observe that convergence issues of MCMC procedures always arise when the problem involves the selection of one among a number of different model specifications. To solve the convergence problem, [6] proposes a modified Gibbs sampler obtained by introducing a sort of average of the considered models. In general, convergence is a critical aspect, as also acknowledged by Persi Diaconis in his long experience of research and publications in the field. In this respect, we strongly recommend reading Diaconis' personal view on the matter, which offers relevant insights into the future development of MCMC in both mathematical advancements and practical applications (see [10, 11]).

Our paper adds to this debate by dealing with a constrained MCMC problem. In particular, we construct some Markov chains whose empirical distributions converge to a target distribution as time goes to infinity and which are constrained to move among the nodes that are adjacent in an assigned graph.

To present the problem properly, some notation is needed. Hereafter we refer to a graph $G=({\mathcal{S}},E)$, where ${\mathcal{S}}$ is the set of nodes and $E$ is the set of edges. The nodes $s,t\in{\mathcal{S}}$ are said to be adjacent in $G$ if $\{s,t\}\in E$ or $s=t$.

We now state a definition linking graphs and stochastic processes.

Definition 1.

We say that a stochastic process $X=(X(t):t\in\mathbb{N})$ on ${\mathcal{S}}$ is consistent with the graph $G=({\mathcal{S}},E)$ if, for each $t\in\mathbb{N}$ , $X(t)$ and $X(t+1)$ are adjacent in $G$ with probability one.

Given two graphs $G=({\mathcal{S}},E)$ and $G^{\prime}=({\mathcal{S}}^{\prime},E^{\prime})$ we say that $G^{\prime}$ is a subgraph of $G$ if ${\mathcal{S}}^{\prime}\subset{\mathcal{S}}$ and $E^{\prime}\subset E$ , and we write $G^{\prime}\subset G$ .

A particular class of subgraphs will be of interest in the following. Specifically, the subgraph $G^{\prime}=({\mathcal{S}}^{\prime},E^{\prime})\subset G=({\mathcal{S}},E)$ is said to be an induced subgraph of $G$ if $s,t\in{\mathcal{S}}^{\prime}$ and $\{s,t\}\in E$ imply $\{s,t\}\in E^{\prime}$ . In this case we write $G^{\prime}=G[{\mathcal{S}}^{\prime}]$ in order to stress the dependence on the set of nodes ${\mathcal{S}}^{\prime}$ .

We notice that Definition 1 implies that if a process $X=(X(t):t\in\mathbb{N})$ is consistent with a graph $G$ then it is also consistent with any graph $G^{\prime}$ such that $G\subset G^{\prime}$ .
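The notions of adjacency, consistency and induced subgraph can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the function names are hypothetical, edges are stored as `frozenset` pairs, and consistency is checked on a single realized trajectory, whereas Definition 1 requires adjacency of consecutive states with probability one.

```python
def is_adjacent(s, t, edges):
    """Adjacency in the paper's sense: the edge {s, t} exists or s == t."""
    return s == t or frozenset((s, t)) in edges


def is_consistent_path(path, edges):
    """True if every pair of consecutive states in `path` is adjacent."""
    return all(is_adjacent(a, b, edges) for a, b in zip(path, path[1:]))


def induced_subgraph(nodes, edges):
    """Edge set of G[nodes]: the edges of G with both endpoints in `nodes`."""
    nodes = set(nodes)
    return {e for e in edges if e <= nodes}


# Illustrative graph: the path 1 - 3 - 4 - 2.
E = {frozenset(e) for e in [(1, 3), (3, 4), (2, 4)]}
ok_path = is_consistent_path([1, 1, 3, 4, 2], E)   # staying put is allowed
bad_path = is_consistent_path([1, 2], E)           # 1 and 2 are not adjacent
sub = induced_subgraph({1, 3, 4}, E)
```

Adding edges to `E` can only turn a non-consistent trajectory into a consistent one, which matches the observation that consistency with $G$ implies consistency with any larger graph.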

From now on we only consider $|{\mathcal{S}}|<\infty$ and consequently a finite graph $G=(\mathcal{S},E)$. Given a finite graph $G=(\mathcal{S},E)$ and a distribution $\mu=(\mu(s):s\in{\mathcal{S}})$, we will provide in this paper an answer to the following question:

Q:

Is it possible to construct a (not necessarily homogeneous) Markov chain $X=(X(t):t\in\mathbb{N})$ which is consistent with $G$ and such that its empirical distribution converges almost surely to $\mu$ as $t$ goes to infinity?

More precisely we aim at constructing a reversible Markov chain $X=(X(t):t\in\mathbb{N})$ with the following properties: $X$ is consistent with the graph $G$ and

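$$\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\mathbf{1}_{\{X(m)=s\}}=\mu(s),\quad s\in{\mathcal{S}}\quad\text{a.s.}\tag{1}$$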

The motivations to pose question Q are basically three:

a)

we face the problem of the large cardinality of the states space by controlling the transitions among the states through the edges of a graph;

b)

we introduce a clear structure on the state space through the graph, so that one can hope to obtain desired properties such as stochastic monotonicity or fast convergence;

c)

the introduction of a graph which constrains the positive transitions of the Markov chain describes several real-life evolution phenomena, where it is possible to move in a single step only from a state to an "adjacent" one.

In the following, we provide an answer to question Q in all possible situations and we show that when $G=({\mathcal{S}},E)$ is connected then it is possible to construct such a (not necessarily time homogeneous) Markov chain.

2. Main results

For a target probability measure $(\mu(s):s\in{\mathcal{S}})$ and a graph $G$ , all the possible situations, along with the related answers to question Q, can be distinguished in four cases:

$(i)$

If the distribution $\mu$ is concentrated on a unique $\bar{s}\in{\mathcal{S}}$ , i.e. $\mu=\delta_{\bar{s}}$ , then one can construct the constant Markov chain $X=(X(t):t\in\mathbb{N})$ such that $X(t)=\bar{s}$ , for each $t$ . By Definition 1 and the concept of adjacent states, one has that $X$ is consistent with $G$ and (1) is trivially satisfied.

$(ii)$

If $G[{supp\,(\mu)}]$ is not connected but ${supp\,(\mu)}$ is contained in a connected component of $G$, then one can construct a nonhomogeneous Markov chain which is consistent with $G$ and fulfils condition (1) (see Theorem 1 and Theorem 2 part c. below).

$(iii)$

If $G[{supp\,(\mu)}]$ is not connected and ${supp\,(\mu)}$ is not contained in a unique connected component of $G$, then there does not exist a stochastic process which is consistent with $G$ and fulfils (1) (see Theorem 2 part b. below).

$(iv)$

If $G[{supp\,(\mu)}]$ is connected, then one can construct a homogeneous Markov chain consistent with $G$ which satisfies (1) (see Theorem 2 part a. below).
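The four cases depend only on the support of $\mu$ and on connectivity, so they can be decided mechanically. The following sketch is one possible way to do this; the helper names `components` and `classify` are hypothetical, edges are assumed to be `frozenset` pairs of distinct nodes, and case $(i)$ is tested first because a single-point support is trivially connected.

```python
def components(nodes, edges):
    """Connected components of the subgraph induced by `nodes` (depth-first search)."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for e in edges:
        a, b = tuple(e)
        if a in nodes and b in nodes:   # keep only edges inside `nodes`
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def classify(mu, nodes, edges):
    """Return 'i', 'ii', 'iii' or 'iv' following the case list of Section 2."""
    support = {s for s in nodes if mu.get(s, 0) > 0}
    if len(support) == 1:
        return "i"
    if len(components(support, edges)) == 1:      # G[supp(mu)] connected
        return "iv"
    if any(support <= c for c in components(nodes, edges)):
        return "ii"                               # support inside one component of G
    return "iii"


nodes = {1, 2, 3, 4}
path = {frozenset(e) for e in [(1, 3), (3, 4), (2, 4)]}
case_path = classify({1: 0.5, 2: 0.5}, nodes, path)
case_split = classify({1: 0.5, 2: 0.5}, nodes, {frozenset((1, 3))})
```

On the path graph the support $\{1,2\}$ is disconnected but lies in the single component of $G$, so the sketch reports case $(ii)$; once the connecting edges are removed it reports case $(iii)$.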

We now deal with item $(ii)$ .

Notice that, in this case, there exists a connected component of ${\mathcal{S}}$ , say $\hat{\mathcal{S}}$ , such that ${supp\,(\mu)}\subset\hat{\mathcal{S}}$ and ${supp\,(\mu)}\neq\hat{\mathcal{S}}$ . Without loss of generality and to avoid the introduction of further notation, we assume that $G$ is connected and we identify $\hat{\mathcal{S}}$ with ${\mathcal{S}}$ .

For a given distribution $\mu=(\mu(s):s\in{\mathcal{S}})$ , let us define, in case $(ii)$ , the non-empty set

$${\mathcal{A}}_{k}=\Big\{s\in{\mathcal{S}}:\mu(s)<\frac{1}{k}\Big\},\quad k\geq 2$$

and let the distribution $\eta_{k}=(\eta_{k}(s):s\in{\mathcal{S}})$ be

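$$\eta_{k}(s)=\frac{1}{|{\mathcal{A}}_{k}|}\mathbf{1}_{\{s\in{\mathcal{A}}_{k}\}},\quad s\in{\mathcal{S}},$$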

i.e. $\eta_{k}$ is the uniform distribution on ${\mathcal{A}}_{k}$ . We also define the distribution $\mu_{k}=(\mu_{k}(s):s\in{\mathcal{S}})$ as

$$\mu_{k}=\frac{1}{k}\eta_{k}+\frac{k-1}{k}\mu.\tag{2}$$

Notice that

$$||\mu_{k}-\mu||_{TV}=\frac{1}{k}||\eta_{k}-\mu||_{TV}\leq\frac{1}{k},\tag{3}$$

where $||\cdot||_{TV}$ is the total variation norm (see e.g. [22]).
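As a sanity check of this construction, the sketch below builds $\mu_{k}$ as the mixture $\frac{1}{k}\eta_{k}+\frac{k-1}{k}\mu$, with $\eta_{k}$ uniform on the states having $\mu(s)<\frac{1}{k}$, and evaluates the distance to $\mu$ exactly. It assumes the total variation norm is half the $L^{1}$ distance (a convention chosen here, consistent with the bound $\frac{1}{k}$); the names `mu_k` and `tv` and the four-state example are illustrative.

```python
from fractions import Fraction


def mu_k(mu, k):
    """Return mu_k = (1/k) * eta_k + ((k-1)/k) * mu, with eta_k uniform on A_k."""
    A_k = [s for s in mu if mu[s] < Fraction(1, k)]
    if not A_k:
        raise ValueError("A_k is empty: no state has mu(s) < 1/k")
    eta = {s: (Fraction(1, len(A_k)) if s in A_k else Fraction(0)) for s in mu}
    return {s: eta[s] / k + Fraction(k - 1, k) * mu[s] for s in mu}


def tv(p, q):
    """Total variation distance, taken as half the L1 distance (assumed convention)."""
    return sum(abs(p[s] - q[s]) for s in p) / 2


# Illustrative target: mass 1/2 on s1 and s2, none on s3 and s4.
mu = {"s1": Fraction(1, 2), "s2": Fraction(1, 2), "s3": Fraction(0), "s4": Fraction(0)}
m = mu_k(mu, 4)          # {"s1": 3/8, "s2": 3/8, "s3": 1/8, "s4": 1/8}
distance = tv(m, mu)     # equals 1/4 = 1/k here
```

In this example $\eta_{k}$ and $\mu$ have disjoint supports, so the bound $\frac{1}{k}$ is attained with equality.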

Let $N$ denote the cardinality of ${\mathcal{S}}$ . Since $G[{supp\,(\mu)}]$ is not connected, then it contains at least two points. Since ${supp\,(\mu)}\subset{\mathcal{S}}$ and ${\mathcal{S}}$ is connected, then $N\geq 3$ . By construction, for any

$$k>\bar{k}:=\Big\lceil\frac{1}{\min\{\mu(s)>0:s\in{\mathcal{S}}\}}\Big\rceil$$

and since $N\geq 3$ , one has

$$\mu_{k}(s)\geq\frac{1}{(N-1)k},\quad s\in{\mathcal{S}}.\tag{4}$$

Let us label the elements of ${\mathcal{S}}=\{s^{1},\ldots,s^{N}\}$ such that

$$\mu(s^{1})\geq\mu(s^{2})\geq\dots\geq\mu(s^{N}).$$

According to definition (2), for $k>\bar{k}$ , one also obtains

$$\mu_{k}(s^{1})\geq\mu_{k}(s^{2})\geq\dots\geq\mu_{k}(s^{N})>0.\tag{5}$$

We construct the transition matrix $P^{(\mu_{k},G)}=(p_{l,m}:l,m=1,\ldots,N)$ related to the distribution $\mu_{k}$ and to the graph $G=({\mathcal{S}},E)$ . The dependence on $k$ of the elements of matrix $P^{(\mu_{k},G)}$ is conveniently omitted. For each $l,m=1,\dots,N$ ,

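$$p_{l,m}=\begin{cases}p,&\text{if }l<m\text{ and }\{s^{l},s^{m}\}\in E;\\ \frac{\mu_{k}(s^{m})}{\mu_{k}(s^{l})}p,&\text{if }l>m\text{ and }\{s^{l},s^{m}\}\in E;\\ p_{l},&\text{if }l=m;\\ 0,&\text{otherwise,}\end{cases}\tag{6}$$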

where

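$$p_{l}=1-p\Big[\sum_{m^{\prime}:m^{\prime}>l}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}+\sum_{m^{\prime}:m^{\prime}<l}\frac{\mu_{k}(s^{m^{\prime}})}{\mu_{k}(s^{l})}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}\Big]\tag{7}$$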

and

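$$p=\min_{l=1,\dots,N}\frac{1}{2\Big(\sum_{m^{\prime}:m^{\prime}>l}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}+\sum_{m^{\prime}:m^{\prime}<l}\frac{\mu_{k}(s^{m^{\prime}})}{\mu_{k}(s^{l})}\mathbf{1}_{\{\{s^{l},s^{m^{\prime}}\}\in E\}}\Big)}.\tag{8}$$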

Notice that by definition $p\leq\frac{1}{2}$. In fact, since $G$ is connected, there exists at least one edge $\{s^{1},s^{m}\}\in E$, with $m>1$; thus the denominator of (8) is at least equal to $2$ when $l=1$. Clearly, $P^{(\mu_{k},G)}$ is a transition (stochastic) matrix.

Definition (6) ensures that the pair $(\mu_{k},P^{(\mu_{k},G)})$ is reversible. Moreover, $P^{(\mu_{k},G)}$ is irreducible, since $G$ is connected; thus, $\mu_{k}$ is the unique invariant distribution of $P^{(\mu_{k},G)}$. The transition matrix $P^{(\mu_{k},G)}$ is also aperiodic since, by (7) and (8), $p_{l}\geq\frac{1}{2}$ for $l=1,\ldots,N$.
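The construction of $P^{(\mu_{k},G)}$ can be sketched in a few lines. The rule used is: an upward move ($l<m$) along an edge has probability $p$, a downward move ($l>m$) has probability $\frac{\mu_{k}(s^{m})}{\mu_{k}(s^{l})}p$, the diagonal takes the remaining mass, and $p$ is the largest value keeping every diagonal entry at least $\frac{1}{2}$. The code assumes 0-based state indices already sorted so that $\mu_{k}$ is non-increasing; `build_P` and the example data are illustrative names and values, not part of the paper.

```python
from fractions import Fraction


def build_P(mu_k, edges):
    """Transition matrix built from mu_k (sorted non-increasing) and an edge set.

    mu_k: list of positive Fractions; edges: set of frozensets {l, m} of indices.
    Returns (P, p) with exact rational entries.
    """
    N = len(mu_k)

    def weight(l):
        # Off-diagonal mass leaving state l, expressed in units of p.
        total = Fraction(0)
        for m in range(N):
            if m != l and frozenset((l, m)) in edges:
                total += 1 if l < m else mu_k[m] / mu_k[l]
        return total

    w = [weight(l) for l in range(N)]
    p = min(1 / (2 * w[l]) for l in range(N) if w[l] > 0)
    P = [[Fraction(0)] * N for _ in range(N)]
    for l in range(N):
        for m in range(N):
            if l != m and frozenset((l, m)) in edges:
                P[l][m] = p if l < m else mu_k[m] / mu_k[l] * p
        P[l][l] = 1 - p * w[l]
    return P, p


# Illustrative data: path s1 - s3 - s4 - s2 (indices 0, 2, 3, 1) and mu_k for k = 4.
edges = {frozenset(e) for e in [(0, 2), (2, 3), (1, 3)]}
mu_k = [Fraction(3, 8), Fraction(3, 8), Fraction(1, 8), Fraction(1, 8)]
P, p = build_P(mu_k, edges)

rows_ok = all(sum(row) == 1 for row in P)
reversible = all(mu_k[l] * P[l][m] == mu_k[m] * P[m][l]
                 for l in range(4) for m in range(4))
lazy = all(P[l][l] >= Fraction(1, 2) for l in range(4))
```

Using exact fractions makes the detailed balance check $\mu_{k}(s^{l})p_{l,m}=\mu_{k}(s^{m})p_{m,l}$ an equality test rather than a floating-point comparison.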

We introduce the ergodic coefficient of Dobrushin (see [12] and [4] p. 235), which is defined as

$$\delta(P)=1-\inf_{i,j=1,\dots,N}\sum_{h=1}^{N}p_{i,h}\land p_{j,h},\tag{9}$$

where $P=(p_{i,j}:i,j=1,\dots,N)$ is a stochastic matrix.

Lemma 1.

Given the transition matrix $P^{(\mu_{k},G)}$ on ${\mathcal{S}}$ constructed above, with $N=|{\mathcal{S}}|\geq 3$, Dobrushin's ergodic coefficient can be bounded from above as follows

$$\delta\big((P^{(\mu_{k},G)})^{N-1}\big)\leq 1-\Big(\frac{c_{N}}{k}\Big)^{N-1},$$

for any $k>\bar{k}$ , where $c_{N}=\frac{1}{2(N-1)^{2}}$ .

Proof.

For $k>\bar{k}$ , condition $N\geq 3$ and inequalities (4) and (5) provide

$$1\leq\frac{\mu_{k}(s^{m})}{\mu_{k}(s^{l})}\leq k(N-1),\quad\text{for }l>m.\tag{10}$$

Thus, by (10) one obtains $p\geq\frac{c_{N}}{k}$ , for $k>\bar{k}$ . Then one has that, if $p_{l,m}\not=0$ ,

$$p_{l,m}\geq\frac{c_{N}}{k}.\tag{11}$$

For $k>\bar{k}$ , since the graph $G$ is connected and $p_{l}\geq\frac{1}{2}$ for each $l=1,\ldots,N$ , then (11) gives that

$$p_{l,m}^{(N-1)}\geq\Big(\frac{c_{N}}{k}\Big)^{N-1},\quad\forall\,l,m=1,\dots,N,$$

where $p_{l,m}^{(N-1)}$ is the transition probability from $s^{l}$ to $s^{m}$ in $(N-1)$ steps.

Then, by the definition of the ergodic coefficient of Dobrushin in (9), the claim follows. ∎
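Lemma 1 can be checked numerically on a small instance. The sketch below computes Dobrushin's coefficient as one minus the smallest row overlap $\sum_{h}p_{i,h}\land p_{j,h}$ and compares $\delta(P^{N-1})$ with $1-(c_{N}/k)^{N-1}$. The matrix is hard-coded: it is what the construction above yields, under the assumption of a four-state path $s^{1}$ - $s^{3}$ - $s^{4}$ - $s^{2}$, a target with $\mu(s^{1})=\mu(s^{2})=\frac{1}{2}$ and $k=4$. Function names are illustrative.

```python
from fractions import Fraction as F


def dobrushin(P):
    """Ergodic coefficient: 1 - min over row pairs (i, j) of sum_h min(p_ih, p_jh)."""
    n = len(P)
    overlap = min(sum(min(P[i][h], P[j][h]) for h in range(n))
                  for i in range(n) for j in range(n))
    return 1 - overlap


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][h] * B[h][j] for h in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(P, e):
    """P raised to the integer power e >= 1 by repeated multiplication."""
    R = P
    for _ in range(e - 1):
        R = matmul(R, P)
    return R


N, k = 4, 4
# Assumed instance (see lead-in): rows are s1, s2, s3, s4.
P = [
    [F(7, 8), F(0),    F(1, 8), F(0)],
    [F(0),    F(7, 8), F(0),    F(1, 8)],
    [F(3, 8), F(0),    F(1, 2), F(1, 8)],
    [F(0),    F(3, 8), F(1, 8), F(1, 2)],
]
c_N = F(1, 2 * (N - 1) ** 2)
bound = 1 - (c_N / k) ** (N - 1)

delta_one_step = dobrushin(P)               # rows s1 and s2 do not overlap
P_pow = matpow(P, N - 1)
delta_pow = dobrushin(P_pow)
```

In this instance the one-step coefficient equals $1$, because the rows of $s^{1}$ and $s^{2}$ have disjoint supports, which illustrates why the lemma is stated for the $(N-1)$-th power rather than for $P$ itself.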

Given an arbitrary distribution over ${\mathcal{S}}$ , namely $\lambda=(\lambda(s):s\in{\mathcal{S}})$ , we construct a non-homogeneous Markov chain $X=(X(t):t\in\mathbb{N})$ with $\lambda$ as initial distribution. The transition matrix of the Markov chain $X$ at time $t\in\mathbb{N}$ will be denoted by $P(t)=(p_{i,j}(t):i,j=1,\dots,N)$ .

Let us consider an increasing sequence of times $(t_{\ell}:\ell\in\mathbb{N})$ , and let us define

$$P(t)=\sum_{k=\bar{k}+1}^{\infty}P^{(\mu_{k},G)}\mathbf{1}_{\{t\in[t_{k},t_{k+1})\}}.\tag{12}$$

Theorem 1.

Consider a connected graph $G=({\mathcal{S}},E)$ and a distribution $\mu=(\mu(s):s\in{\mathcal{S}})$ . Assume that $G[{supp\,(\mu)}]$ is not connected but ${supp\,(\mu)}$ is contained in a connected component of $G$ . Any Markov chain $X=(X(t):t\in\mathbb{N})$ constructed above with transition matrix given in (12), with sequence of times $(t_{\ell}=\ell^{5N}:\ell\in\mathbb{N})$ is consistent with $G$ and (1) holds true, i.e.

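$$\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\mathbf{1}_{\{X(m)=s\}}=\mu(s),\quad s\in{\mathcal{S}}\quad\text{a.s.}$$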

Proof.

The fact that $X$ is consistent with $G$ derives from the construction of $P$ (see (6) and (12)).

To prove the result, we first check that

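$$\lim_{\ell\to\infty}\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}=\mu(s),\quad s\in{\mathcal{S}}\quad\text{a.s.}\tag{13}$$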

By definition of $(t_{\ell}:\ell\in\mathbb{N})$ one has

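$$\lim_{\ell\to\infty}\frac{t_{\ell+1}-t_{\ell}}{t_{\ell}}=0\quad\text{and}\quad\lim_{\ell\to\infty}\frac{t_{\ell+1}}{t_{\ell}}=1.$$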

Then (13) implies (1). In fact, for $t\in[t_{\ell},t_{\ell+1})$ , one has

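$$\frac{1}{t_{\ell+1}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}\leq\frac{1}{t}\sum_{m=0}^{t}\mathbf{1}_{\{X(m)=s\}}\leq\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell+1}-1}\mathbf{1}_{\{X(m)=s\}}\leq\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}+\frac{t_{\ell+1}-t_{\ell}}{t_{\ell}}.$$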

Thus

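$$\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t}\mathbf{1}_{\{X(m)=s\}}=\lim_{\ell\to\infty}\frac{1}{t_{\ell}}\sum_{m=0}^{t_{\ell}-1}\mathbf{1}_{\{X(m)=s\}}.$$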

For $\varepsilon>0$ and $s\in{\mathcal{S}}$ let us define the sequence of events $(B_{\ell}(\varepsilon,s):\ell\in\mathbb{N})$ as

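$$B_{\ell}(\varepsilon,s)=\Big\{\Big|\mu(s)-\frac{1}{t_{\ell+1}-t_{\ell}}\sum_{m=t_{\ell}}^{t_{\ell+1}-1}\mathbf{1}_{\{X(m)=s\}}\Big|<\varepsilon\Big\}.$$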

To obtain (13) it is enough that, for each $\varepsilon>0$ and $s\in{\mathcal{S}}$ one has

$$\mathbb{P}\big(\liminf_{\ell\to\infty}B_{\ell}(\varepsilon,s)\big)=1.$$

Now, take the auxiliary sequence of independent random variables $(Y(t):t\in\mathbb{N})$ with values on ${\mathcal{S}}$ such that $Y(i)$ has distribution $\mu_{k}$ if $i\in[t_{k},t_{k+1})$ (see (2) for the definition of $\mu_{k}$ ).

Notice that for each initial distribution $\vartheta$ on ${\mathcal{S}}$ , Lemma 1 and Dobrushin’s Theorem (see e.g. [4]) give that

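$$||\vartheta P(t_{\ell})^{\ell^{2N}}-\mu_{\ell}||_{TV}\leq\delta\big(P(t_{\ell})^{N-1}\big)^{\lfloor\frac{\ell^{2N}}{N-1}\rfloor}\leq\Big(1-\Big(\frac{c_{N}}{\ell}\Big)^{N-1}\Big)^{\lfloor\frac{\ell^{2N}}{N-1}\rfloor}\leq\exp\Big(-c_{N}^{N-1}\Big\lfloor\frac{\ell^{N+1}}{N-1}\Big\rfloor\Big),\tag{15}$$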

for any $\ell>\bar{k}$ .

Let $\hat{c}_{N}=c_{N}^{N-1}$ . Given $i\geq 0$ and $k\geq 1$ , by the maximal coupling (see [22]) and inequality (15) one can couple $X(t_{\ell}+k\ell^{2N}+i)$ with $Y(t_{\ell}+k\ell^{2N}+i)$ so that

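$$\mathbb{P}\big(X(t_{\ell}+k\ell^{2N}+i)\neq Y(t_{\ell}+k\ell^{2N}+i)\big)\leq\exp\Big(-\hat{c}_{N}\Big\lfloor\frac{\ell^{N+1}}{N-1}\Big\rfloor\Big),$$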

when $t_{\ell}+k\ell^{2N}+i<t_{\ell+1}$ .

Let us define the sequence of events $(A_{\ell,i}:\ell\in\mathbb{N},i\in[0,\ell^{2N}))$ by

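$$A_{\ell,i}=\big\{X(t_{\ell}+a\ell^{2N}+i)=Y(t_{\ell}+a\ell^{2N}+i):a\geq 1\text{ and }t_{\ell}+a\ell^{2N}+i\leq t_{\ell+1}-1\big\},$$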

for any $\ell\in\mathbb{N}$ and any integer $i\in[0,\ell^{2N})$ .

By subadditivity, one has

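$$\mathbb{P}(A_{\ell,i})\geq 1-(\ell+1)^{5N}\exp\Big(-\hat{c}_{N}\Big\lfloor\frac{\ell^{N+1}}{N-1}\Big\rfloor\Big).$$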

We also set $\hat{A}_{\ell}=\bigcap_{i=0}^{\ell^{2N}-1}A_{\ell,i}$ . Then

[TABLE]

By (18) and the first Borel-Cantelli lemma, one has that $\mathbb{P}(\liminf_{\ell\to\infty}\hat{A}_{\ell})=1$ .

Now, for $\varepsilon>0$ and $s\in{\mathcal{S}}$ , let us define the sequence of events $(\hat{B}_{\ell}(\varepsilon,s):\ell\in\mathbb{N})$ as

[TABLE]

A straightforward calculation gives that

[TABLE]

Therefore to end the proof it is enough to show

[TABLE]

Such a result is a consequence of the convergence $\mu_{\ell}\to\mu$ , as $\ell\to\infty$ , the large deviation bounds for i.i.d. Bernoulli random variables and the first Borel-Cantelli lemma. This concludes the proof. ∎

Remark 1.

The definition of $(t_{\ell}:\ell\in\mathbb{N})$ provided in Theorem 1 represents only one of the possible choices. In this respect, it is interesting to note that the proof of Theorem 1 can be adapted to other sequences $(t_{\ell}:\ell\in\mathbb{N})$ . For example, one can take $t_{\ell+1}-t_{\ell}\geq c\ell^{5N-1}$ , with $c>0$ . In this case, for any $\ell\in\mathbb{N}$ , there exists $I_{\ell}\in\mathbb{N}$ and an increasing sequence

[TABLE]

such that $t_{\ell}=t_{\ell}^{(0)}$ , $t_{\ell}^{(I_{\ell})}=t_{\ell+1}$ and the following property holds

[TABLE]

By reproducing the arguments of the proof of Theorem 1 for the sequence $t_{\ell}^{(0)},t_{\ell}^{(1)},\ldots,t_{\ell}^{(I_{\ell})}$ , one obtains that the Markov chain on ${\mathcal{S}}$ with an arbitrary initial distribution and transition matrix as in (12) satisfies (1).

The next example shows that the convergence of the distribution $\mu_{k}$ to the distribution $\mu$ should not be too fast, and that $t_{\ell+1}-t_{\ell}$ should not be too small, in order to have (1).

Example 1.

Let us consider a graph $G=({\mathcal{S}},E)$ with ${\mathcal{S}}=\{s^{1},s^{2},s^{3},s^{4}\}$ and $E=\{\{s^{1},s^{3}\},\{s^{3},s^{4}\},\{s^{2},s^{4}\}\}$ .

Let us take the distribution $\mu=(\mu(s):s\in{\mathcal{S}})$ having $\mu(s^{1})=\mu(s^{2})=\frac{1}{2}$ , and define $t_{\ell}=\ell$ , for each $\ell\in\mathbb{N}$ , and the sequence of distributions $(\hat{\mu}_{\ell}:\ell\in\mathbb{N})$ where $\hat{\mu}_{\ell}=\mu_{2^{\ell}}$ (see the definition in (2)). In particular, $||\hat{\mu}_{\ell}-\mu||_{TV}\leq\frac{1}{2^{\ell}}$ .

We take a non-homogeneous Markov chain $X=(X(t):t\in\mathbb{N})$ with transition matrix $P(\ell)=(p_{m,n}(\ell):m,n=1,2,3,4)$ , at time $\ell$ , given by

[TABLE]

According to the definition of $p$ given in (8) and omitting the dependence of $p$ on the index $\ell$, one has

[TABLE]

Thus, (21) gives that $p_{1,1}(\ell)=1-\frac{1}{2^{\ell+1}}$ at time $\ell$ (see (7)). Therefore, the Borel-Cantelli lemma guarantees that

[TABLE]

and therefore

[TABLE]

In fact, formula (22) allows us to consider only $\omega\in\Omega$ such that condition $|\{\ell\in\mathbb{N}:X(\ell)=s^{1},X(\ell+1)\not=s^{1}\}|<\infty$ is satisfied. If $X(\ell)=s^{1}$ for a finite number of $\ell$, then

[TABLE]

if $X(\ell)=s^{1}$ for infinitely many values of $\ell$, then (22) states that

[TABLE]

Notice that Example 1 suggests a natural comparison between our setting and simulated annealing (see [20]). In both cases one hopes for a fast rate of convergence, but pushing the rate too high leads to local minima (in simulated annealing) or to non-convergence of the empirical measure to the target distribution $\mu$ (in our framework). With an excessively fast convergence rate, the constructed chain may fail to satisfy (1), even though it is consistent with the graph $G$.
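The mechanism behind Example 1 can be quantified with a short computation. Taking the stated value $p_{1,1}(\ell)=1-\frac{1}{2^{\ell+1}}$ as given, the probability that a chain sitting in $s^{1}$ at time $\ell_{0}$ never leaves it is the infinite product of these diagonal entries, which is strictly positive because $\sum_{\ell}2^{-(\ell+1)}<\infty$. The sketch below truncates the product at a finite horizon; the function name and the horizons are illustrative choices.

```python
import math


def stay_probability(start, horizon):
    """Probability of remaining in s^1 at every step l = start, ..., horizon,

    assuming the one-step staying probability is p_11(l) = 1 - 2**-(l + 1).
    """
    return math.prod(1 - 2.0 ** -(l + 1) for l in range(start, horizon + 1))


from_time_1 = stay_probability(1, 60)     # roughly 0.58
from_time_10 = stay_probability(10, 200)  # very close to 1
```

On the event that the chain never leaves $s^{1}$, the empirical frequency of $s^{1}$ tends to $1$ rather than to $\mu(s^{1})=\frac{1}{2}$, which is the failure of (1) described in the example.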

The next result provides an answer to Q for items $(iii)$ and $(iv)$.

Theorem 2.

The following three statements hold true:

a.

if $G[{supp\,(\mu)}]$ is connected, then each homogeneous Markov chain $X=(X(t):t\in\mathbb{N})$ with state space ${supp\,(\mu)}$ having transition matrix equal to $P^{(\mu,G[{supp\,(\mu)}])}$ defined in (6) satisfies (1). Furthermore, $X$ is consistent with $G$ ;

b.

if $G[{supp\,(\mu)}]$ is not connected and ${supp\,(\mu)}$ is not contained in a connected component of $G$, then there does not exist a stochastic process consistent with $G$ which satisfies (1);

c.

if $G[{supp\,(\mu)}]$ is not connected and ${supp\,(\mu)}$ is contained in a connected component of $G$, then no homogeneous Markov chain consistent with $G$ satisfies (1).

Proof.

We prove a. Since $G[{supp\,(\mu)}]$ is connected, the transition matrix $P^{(\mu,G[{supp\,(\mu)}])}$ is well defined. Moreover, $\mu$ is the unique invariant distribution of $P^{(\mu,G[{supp\,(\mu)}])}$ because $P^{(\mu,G[{supp\,(\mu)}])}$ is irreducible. Now, by applying the ergodic theorem, one has (1). The consistency of $X$ with $G$ follows from the fact that, for $l\not=m$, $p_{l,m}>0$ implies $\{s^{l},s^{m}\}\in E$.

We prove b. by contradiction. Assume that (1) holds true for a stochastic process $(X(t):t\in\mathbb{N})$ which is consistent with $G$ . Then for each $s\in{supp\,(\mu)}$ one should have

[TABLE]

Let us consider $s^{\prime},s^{\prime\prime}\in{supp\,(\mu)}$ which belong to two different connected components of $G$ . By (24), it follows that $\mathbb{P}(T<\infty)=1$ where

[TABLE]

with

[TABLE]

Without loss of generality one can assume that $\mathbb{P}(T=T_{s^{\prime}})>0$. Then, by the consistency of $X$ with the graph $G$, one has that

[TABLE]

Therefore

[TABLE]

and this contradicts (24).

Now, we prove c. Without loss of generality we can consider that the graph $G$ is connected, thus the connected component containing ${supp\,(\mu)}$ is the whole space ${\mathcal{S}}$ . Now, we can reduce to the case of irreducible Markov chains. Indeed, if a Markov chain is not irreducible, (1) cannot be true, because the limit in formula (1), admitting that it exists, depends on the initial state of the Markov chain.

By hypothesis, there exist two connected components of $G[{supp\,(\mu)}]$ , say $G[A]$ and $G[B]$ , with $A,B\subset{supp\,(\mu)}$ and $A\cap B=\emptyset$ , and a path $\gamma=(s^{j_{1}},s^{j_{2}},\ldots,s^{j_{n}})$ of $G$ such that the transition matrix $P=(p_{i,j}:i,j=1,\ldots,N)$ has $p_{j_{r},j_{r+1}}>0$ , for $r=1,\ldots,n-1$ and $s^{j_{a}}\in A$ , $s^{j_{a+1}}\notin{supp\,(\mu)}$ and $s^{j_{a+h}}\in B$ , for some $a=1,\dots,n-2$ and $h=2,\dots,n-a$ .

Assuming that the homogeneous Markov chain $X$ satisfies (1), we proceed by contradiction. Since the Markov chain is irreducible, the ergodic theorem guarantees that

[TABLE]

does exist almost surely. Moreover, by (1), one has

[TABLE]

Therefore, (26) gives

[TABLE]

This is a contradiction since $s^{j_{a+1}}\notin{supp\,(\mu)}$ . ∎

Remark 2.

Suppose that we are under hypothesis (ii) and let us consider $\varepsilon>0$ and a fixed $k\geq\lceil\frac{1}{\varepsilon}\rceil$ . Part a. of Theorem 2 states that it is possible to select a homogeneous Markov chain $X=(X(t):t\in\mathbb{N})$ having transition matrix equal to $P^{(\mu_{k},G)}$ (see (2) and (6)), which satisfies

[TABLE]

Then, (3) and (27) give that

[TABLE]

Furthermore, $X$ is consistent with $G$ .
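Remark 2 lends itself to a quick simulation. The sketch below runs a homogeneous chain and compares its empirical distribution with $\mu_{k}$ and with $\mu$. Everything specific here is an assumption made for illustration: the four-state path $s^{1}$ - $s^{3}$ - $s^{4}$ - $s^{2}$, the target $\mu=(\frac{1}{2},\frac{1}{2},0,0)$, the choice $k=4$ (so $\varepsilon=\frac{1}{4}$), the resulting hard-coded matrix, the run length and the seed. Total variation is again taken as half the $L^{1}$ distance.

```python
import random

# Assumed instance (see lead-in); rows are s1, s2, s3, s4.
P = [
    [7 / 8, 0.0,   1 / 8, 0.0],
    [0.0,   7 / 8, 0.0,   1 / 8],
    [3 / 8, 0.0,   1 / 2, 1 / 8],
    [0.0,   3 / 8, 1 / 8, 1 / 2],
]
MU = [1 / 2, 1 / 2, 0.0, 0.0]
MU_K = [3 / 8, 3 / 8, 1 / 8, 1 / 8]


def empirical_distribution(P, steps, start=0, seed=0):
    """Run the homogeneous chain for `steps` steps and return visit frequencies."""
    rng = random.Random(seed)
    states = range(len(P))
    counts = [0] * len(P)
    state = start
    for _ in range(steps):
        counts[state] += 1
        # Zero-weight entries are never drawn, so only graph neighbours are reached.
        state = rng.choices(states, weights=P[state])[0]
    return [c / steps for c in counts]


def tv(p, q):
    """Half the L1 distance between two distributions given as lists."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


emp = empirical_distribution(P, steps=200_000)
gap_to_mu_k = tv(emp, MU_K)   # small: the chain targets mu_k
gap_to_mu = tv(emp, MU)       # about 1/k: the price of homogeneity
```

The run settles near $\mu_{k}$ rather than $\mu$, so the residual distance to $\mu$ stays at about $\frac{1}{k}$ however long the chain runs; reducing it requires a larger $k$, at the cost of smaller transition probabilities out of $s^{1}$ and $s^{2}$.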

Some consequences of Theorems 1 and 2 arise. Let us consider $f:{\mathcal{S}}\to\mathbb{R}$ .

Under the conditions of Theorem 1 or of Theorem 2 a. one obtains

[TABLE]

where $\mathbb{E}_{\mu}$ is the expected value with respect to the distribution $\mu$ , i.e.

$$\mathbb{E}_{\mu}(f)=\sum_{s\in{\mathcal{S}}}f(s)\mu(s).$$

Moreover, when (27) holds true, then

[TABLE]

Thus, accepting the error $\varepsilon\max_{s\in{\mathcal{S}}}|f(s)|$ given in (30), which can be taken arbitrarily small, one can always use a homogeneous Markov chain to numerically compute $\mathbb{E}_{\mu}(f)$.

2.1. A remark on suitable criteria for graph selection

We point out that a proper selection of the graph may lead to a more efficient MCMC procedure. In particular, graphs can contribute to reducing the number of possible transitions among states, as the Gibbs sampler also does (see e.g. [17]). In fact, when the number of states is extremely large, the unconstrained transition probabilities involving all the pairs of states may be too small, hence too difficult to simulate. In this respect, a proper choice of the graph should ensure the connections among highly probable states, thus avoiding the creation of metastable states (sometimes called wells, see [3, 21]). Indeed, wells are states in which the Markov chain is expected to spend an extremely long time before being able to visit other high-probability ones. This would dramatically increase the mixing time and slow down the convergence of the MCMC algorithm (see e.g. [1, 15, 16]).

In this context, useful references are [8, 14, 24], where the (stochastically) monotone MCMC is explored. In detail, a Markov chain is said to be stochastically monotone when the state space is endowed with a partial order and there exists a coupling of the chain with itself that maintains the partial order of the state space at any time. Stochastically monotone Markov chains are particularly simple to simulate (see [14] and [24] for connections with the perfect simulation literature). Now, let us assume that the state space ${\mathcal{S}}$ is endowed with a partial order and consider the target distribution $\mu$ on ${\mathcal{S}}$. Naturally, there are infinitely many Markov chains satisfying (1). Some of them might be stochastically monotone, i.e. simple to simulate. The role of the graph in obtaining stochastically monotone Markov chains might then be crucial.

As a paradigmatic example, we can take the classical ferromagnetic Ising model assigning a spin $\sigma(i)\in\{-1,+1\}$ to each vertex $i\in V$ and assume that the set ${\mathcal{S}}=\{-1,+1\}^{V}$ is endowed with a partial order such that $\sigma^{\prime}\preceq\sigma^{\prime\prime}$ if and only if $\sigma^{\prime}(i)\leq\sigma^{\prime\prime}(i)$ for each $i\in V$ . In this situation, we have that the Markov chain identified by the Gibbs sampler is stochastically monotone, and this property leads to affordable simulation exercises for the convergence towards the Gibbs measure of the ferromagnetic Ising model (see [24] and, more recently, [9]). There are also other Markov chains converging to the Gibbs measure which do not maintain the ordering of the states space (see e.g. [7, 24]).

It is not difficult to construct other examples for non-ferromagnetic Ising models (where the Gibbs sampler is not stochastically monotone) such that Markov chains consistent with suitably defined graphs are stochastically monotone.

Product graphs and product distributions

We now introduce the standard definition of product of graphs, as in [25]. It leads to a simplification of the MCMC simulations.

Definition 2.

Consider two graphs $G_{1}=({\mathcal{S}}_{1},E_{1}),G_{2}=({\mathcal{S}}_{2},E_{2})$ . The strong product $G_{1}\boxtimes G_{2}$ is a graph $G=({\mathcal{S}},E)$ , where ${\mathcal{S}}={\mathcal{S}}_{1}\times{\mathcal{S}}_{2}$ and $E$ collects the pairs $\{(s_{1},s_{2}),(\bar{s}_{1},\bar{s}_{2})\}$ , with $(s_{1},s_{2}),(\bar{s}_{1},\bar{s}_{2})\in{\mathcal{S}}$ , such that one of the following conditions holds:

• $\{s_{1},\bar{s}_{1}\}\in E_{1}$ and $s_{2}=\bar{s}_{2}$ ;

• $s_{1}=\bar{s}_{1}$ and $\{s_{2},\bar{s}_{2}\}\in E_{2}$ ;

• $\{s_{1},\bar{s}_{1}\}\in E_{1}$ and $\{s_{2},\bar{s}_{2}\}\in E_{2}$ .

Since the strong product of graphs is associative (see [25]), Definition 2 can be extended to any collection of $r>2$ graphs, obtaining $G=G_{1}\boxtimes\dots\boxtimes G_{r}$ .
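The three conditions of Definition 2 translate directly into code. The following is a minimal sketch, assuming simple undirected graphs whose edges are stored as sets of two-element frozensets. The function name `strong_product` and the two small path graphs are illustrative choices, not objects from the paper.

```python
from itertools import product

def strong_product(vertices1, edges1, vertices2, edges2):
    """Return (vertices, edges) of the strong product of two simple graphs.

    Edges are given and returned as sets of frozensets {u, v}.
    """
    vertices = list(product(vertices1, vertices2))
    edges = set()
    for (s1, s2) in vertices:
        for (t1, t2) in vertices:
            if (s1, s2) == (t1, t2):
                continue  # no self-loops
            first_adjacent = frozenset((s1, t1)) in edges1
            second_adjacent = frozenset((s2, t2)) in edges2
            # The three cases of Definition 2, in the same order.
            if ((first_adjacent and s2 == t2)
                    or (s1 == t1 and second_adjacent)
                    or (first_adjacent and second_adjacent)):
                edges.add(frozenset(((s1, s2), (t1, t2))))
    return vertices, edges

# Two path graphs: a - b and 0 - 1 - 2.
V1, E1 = ["a", "b"], {frozenset(("a", "b"))}
V2, E2 = [0, 1, 2], {frozenset((0, 1)), frozenset((1, 2))}
V, E = strong_product(V1, E1, V2, E2)
```

In this example the product has 6 vertices and 11 edges: 4 edges move only the second coordinate, 3 move only the first, and 4 move both. Applying the function repeatedly gives the product of more than two graphs, in line with the associativity recalled above.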

Let us consider now $r$ finite sets ${\mathcal{S}}_{1},\dots,{\mathcal{S}}_{r}$ and take a product distribution $\mu=\prod_{h=1}^{r}\mu_{h}$ , where $\mu_{h}$ is a distribution on the space ${\mathcal{S}}_{h}$ . We construct $r$ independent Markov chains $X_{1}=(X_{1}(t):t\in\mathbb{N}),\dots,X_{r}=(X_{r}(t):t\in\mathbb{N})$ such that the $h$ -th Markov chain $X_{h}$ has state space ${\mathcal{S}}_{h}$ and an arbitrary initial distribution $\lambda_{h}=(\lambda_{h}(s_{h}):s_{h}\in{\mathcal{S}}_{h})$ , for each $h=1,\ldots,r$ .

Moreover, by replacing ${\mathcal{S}}$ with ${\mathcal{S}}_{h}$ and $\mu$ with $\mu_{h}$ , we replicate the construction provided before Theorem 1. In so doing, we take $k\in\mathbb{N}$ to define the distribution

[TABLE]
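For a single factor, the construction recalled here can be sketched numerically. The code below follows the definitions given before Theorem 1 (the set $A_{k}$ of states with mass below $1/k$, the uniform distribution $\eta_{k}$ on $A_{k}$, and the mixture $\mu_{k}=\frac{1}{k}\eta_{k}+\frac{k-1}{k}\mu$), applied to one component. The example distribution, the value of `k` and the function names are assumptions made for illustration, and the sketch requires $A_{k}$ to be nonempty.

```python
def mixed_target(mu, k):
    """Return mu_k = (1/k) * eta_k + ((k-1)/k) * mu, with eta_k uniform on A_k."""
    low_mass = [s for s, mass in mu.items() if mass < 1.0 / k]  # the set A_k
    if not low_mass:
        raise ValueError("A_k is empty: eta_k is undefined for this k")
    return {
        s: (1.0 / k) * ((1.0 / len(low_mass)) if s in low_mass else 0.0)
           + ((k - 1.0) / k) * mass
        for s, mass in mu.items()
    }

def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in p)

# Hypothetical component distribution with one state of zero mass.
mu_h = {"a": 0.0, "b": 0.4, "c": 0.6}
k = 4
mu_hk = mixed_target(mu_h, k)
tv = total_variation(mu_hk, mu_h)
```

Here `mu_hk` gives positive mass to the state `"a"`, which has zero mass under `mu_h`, while staying within total variation distance $1/k$ of `mu_h`, consistent with the bound stated before Theorem 1.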

Now, take a sequence of increasing times $(t_{\ell}^{(h)}:\ell\in\mathbb{N})$ , such that

[TABLE]

with $c$ a positive constant.

The transition matrices of $X_{h}$ are $(P_{h}(t):t\in\mathbb{N})$ as in (12):

[TABLE]

We introduce the Markov chain

[TABLE]

The next result is similar to Theorem 1, but it is based on the independent Markov chains constructed above.

Theorem 3.

Let ${\mathcal{S}}=\prod_{h=1}^{r}{\mathcal{S}}_{h}$ and $G({\mathcal{S}},E)=G_{1}({\mathcal{S}}_{1},E_{1})\boxtimes\cdots\boxtimes G_{r}({\mathcal{S}}_{r},E_{r})$ . Let us consider a product distribution $\mu=\prod_{h=1}^{r}\mu_{h}$ and the Markov chain $X$ of (33).

Then

[TABLE]

for each $s=(s_{1},\dots,s_{r})\in{\mathcal{S}}$ .
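The mechanism behind the statement can be checked exactly on a toy case. The sketch below is a deliberate simplification and not the chain of the theorem: it uses homogeneous Metropolis kernels instead of the time-dependent matrices $P_{h}(t)$, and it assumes that every $\mu_{h}$ has full support. It verifies two facts for a product of two independent component chains: every move with positive probability either stays put or follows an edge of the strong product, and the product distribution is stationary. All names and numerical values are illustrative.

```python
from itertools import product

def metropolis_kernel(neighbors, mu):
    """Metropolis kernel on a graph: propose a uniform neighbour t of s and
    accept with probability min(1, mu(t) deg(s) / (mu(s) deg(t))).
    Assumes mu(s) > 0 for every state s."""
    P = {s: {} for s in neighbors}
    for s, nbrs in neighbors.items():
        for t in nbrs:
            accept = min(1.0, mu[t] * len(nbrs) / (mu[s] * len(neighbors[t])))
            P[s][t] = accept / len(nbrs)
        P[s][s] = 1.0 - sum(P[s].values())  # rejected proposals stay at s
    return P

def product_kernel(P1, P2):
    """Kernel of two independent chains run side by side."""
    return {
        (s1, s2): {
            (t1, t2): p1 * p2
            for (t1, p1), (t2, p2) in product(P1[s1].items(), P2[s2].items())
        }
        for s1, s2 in product(P1, P2)
    }

# Component graphs (paths a - b and 0 - 1 - 2) and hypothetical targets.
nbrs1 = {"a": ["b"], "b": ["a"]}
nbrs2 = {0: [1], 1: [0, 2], 2: [1]}
mu1 = {"a": 0.25, "b": 0.75}
mu2 = {0: 0.2, 1: 0.3, 2: 0.5}

P = product_kernel(metropolis_kernel(nbrs1, mu1), metropolis_kernel(nbrs2, mu2))
mu = {(s1, s2): mu1[s1] * mu2[s2] for s1, s2 in product(mu1, mu2)}

def allowed(s, t):
    """True if t == s or {s, t} is an edge of the strong product."""
    return all(a == b or b in nbrs[a] for a, b, nbrs in zip(s, t, (nbrs1, nbrs2)))

moves_ok = all(allowed(s, t) for s in P for t, p in P[s].items() if p > 0)
stationarity_error = max(
    abs(sum(mu[s] * P[s].get(t, 0.0) for s in P) - mu[t]) for t in mu
)
```

Both checks pass here: `moves_ok` is true and `stationarity_error` is zero up to floating-point rounding. The point of the design is that each component kernel only involves its own small state space, while the joint chain lives on the product.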

Proof.

From (31) it follows that

[TABLE]

In fact, for each $h=1,\ldots,r$ ,

[TABLE]

since

[TABLE]

Thus, the times in $\bigcup_{h=1}^{r}\bigcup_{\ell=1}^{\infty}[t^{(h)}_{\ell},t^{(h)}_{\ell}+\ell^{2N})$ can be neglected in the procedure of checking (34), i.e.

[TABLE]

and also

[TABLE]

Let us define the set of times $A=\bigcup_{h=1}^{r}\bigcup_{\ell=1}^{\infty}[t^{(h)}_{\ell},t^{(h)}_{\ell}+\ell^{2N})$ . Now we introduce the independent random variables $(Y_{h}(t):t\in\mathbb{N},h=1,\ldots,r)$ . The random variables $(Y_{h}(t):t\in\mathbb{N})$ , with label $h$ , take values in ${\mathcal{S}}_{h}$ . Moreover, if $t\in[t^{(h)}_{k},t^{(h)}_{k+1})$ then $Y_{h}(t)$ has distribution $\mu_{h,k}$ .

We now adapt formula (16) to the Markov chain $X_{h}$ . If $\bar{t}\notin A$ then for each $h=1,\ldots,r$ there exists $\bar{\ell}_{h}$ such that $\bar{t}$ belongs to $[t^{(h)}_{\bar{\ell}_{h}},t^{(h)}_{\bar{\ell}_{h}+1})$ . In this case formula (16) becomes

[TABLE]

where we recall that $\hat{c}_{N}=\frac{1}{[2(N-1)^{2}]^{N-1}}$ .

Hence, for any $\bar{t}\notin A$ one has that there exist $\bar{\ell}_{1},\ldots,\bar{\ell}_{r}\in\mathbb{N}$ such that $\bar{t}\in\bigcap_{h=1}^{r}[t^{(h)}_{\bar{\ell}_{h}}+{\bar{\ell}_{h}}^{2N},t^{(h)}_{\bar{\ell}_{h}+1})$ . Therefore, using the independence of the random variables $Y$ ’s and the independence of the Markov chains $X$ ’s, one has

[TABLE]

For $\bar{t}\in\bigcap_{h=1}^{r}[t^{(h)}_{\bar{\ell}_{h}}+{\bar{\ell}_{h}}^{2N},t^{(h)}_{\bar{\ell}_{h}+1})$ , the distribution of $(Y_{1}(\bar{t}),\ldots,Y_{r}(\bar{t}))$ coincides with $\prod_{h=1}^{r}\mu_{h,\bar{\ell}_{h}}$ .

Thus, we have

[TABLE]

Notice that any $\bar{\ell}_{h}$ increases to infinity when $\bar{t}$ goes to infinity. Therefore, the left-hand side of (37) goes to zero as $\bar{t}$ goes to infinity. Inequalities (36) and (37) give an upper bound for the distance in total variation between the law of $X(\bar{t})$ and the distribution $\mu$ .
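Bounds of this kind for product laws typically rest on a standard fact: the total variation distance between two product distributions is at most the sum of the total variation distances between their factors. Whether the displayed inequalities use exactly this form cannot be confirmed from the text available here, so the following numerical check is offered only as an illustration of that general fact. The distributions and function names are hypothetical.

```python
from itertools import product

def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in p)

def product_distribution(*marginals):
    """Joint law of independent components with the given marginals."""
    joint = {}
    for states in product(*marginals):
        weight = 1.0
        for marginal, s in zip(marginals, states):
            weight *= marginal[s]
        joint[states] = weight
    return joint

# Hypothetical targets and perturbed versions of them.
mu1, nu1 = {"a": 0.25, "b": 0.75}, {"a": 0.30, "b": 0.70}
mu2, nu2 = {0: 0.2, 1: 0.3, 2: 0.5}, {0: 0.25, 1: 0.3, 2: 0.45}

tv1 = total_variation(mu1, nu1)
tv2 = total_variation(mu2, nu2)
tv_joint = total_variation(product_distribution(mu1, mu2),
                           product_distribution(nu1, nu2))
```

In this example `tv_joint` lies between `max(tv1, tv2)` and `tv1 + tv2`, so componentwise convergence in total variation carries over to the product law.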

Now, by following the arguments in the proof of Theorem 1, we obtain equation (34). ∎

3. Conclusions

The paper adds to the MCMC literature. In particular, it deals with the existence and identification of a Markov chain which is constrained to move among adjacent nodes of a graph and whose empirical distribution converges to a prescribed one. In so doing, we classify the cases in which such a Markov chain exists and, in case of existence, when it can be homogeneous or not.

The presence of assigned constraints makes the paper quite different from the classical Metropolis-Hastings Markov chain methods. Indeed, one of the most relevant consequences of the graph-based constraint is that there may be no homogeneous Markov chain satisfying question Q, but only nonhomogeneous ones.

The problem is also extended to the particular case of strong products of graphs, where the given distributions are also of product type. In this context, we give a result which allows researchers to study the convergence of one Markov chain with a large number of states by using independent Markov chains with small state spaces, hence reducing the computational complexity of related simulation models.

Some suggestions on the speed of convergence are also provided. However, a detailed analysis of this important point is left for future research.

Bibliography


[1] P. Baldi, A. Frigessi, and M. Piccioni. Importance sampling for Gibbs random fields. Ann. Appl. Probab., 3(3):914–933, 1993.
[2] F. Bartolucci, L. Scaccia, and A. Mira. Efficient Bayes factor estimation from the reversible jump output. Biometrika, 93(1):41–52, 2006.
[3] J. Beltrán and C. Landim. Tunneling and metastability of continuous time Markov chains. J. Stat. Phys., 140(6):1065–1114, 2010.
[4] P. Brémaud. Markov chains, volume 31 of Texts in Applied Mathematics. Springer-Verlag, New York, 1999. Gibbs fields, Monte Carlo simulation, and queues.
[5] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, editors. Handbook of Markov chain Monte Carlo. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. CRC Press, Boca Raton, FL, 2011.
[6] B. P. Carlin and S. Chib. Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Series B.
[7] R. Cerqueti and E. De Santis. Stochastic Ising model with flipping sets of spins and fast decreasing temperature. Ann. Inst. Henri Poincaré Probab. Stat., 54(2):757–789, 2018.
[8] D. J. Daley. Stochastically monotone Markov chains. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 10:305–317, 1968.
