Equi-Energy sampling does not converge rapidly on the mean-field Potts model with three colors close to the critical temperature

Mirko Ebbers; Matthias Löwe

arXiv:1904.00394·math.PR·April 22, 2020


TL;DR

This paper demonstrates that Equi-Energy Sampling (EES) does not rapidly converge for the mean-field Potts model with three or more colors near the critical temperature, highlighting limitations of EES in certain regimes.

Contribution

The paper proves that EES is slowly mixing in the mean-field Potts model below the critical temperature, extending the understanding of EES limitations to models with multiple colors.

Findings

01. EES does not rapidly converge near the critical temperature.

02. EES is slowly mixing for the mean-field Potts model with three or more colors.

03. Results apply to any number of colors $q \geq 3$ with appropriate temperature regimes.

Abstract

Equi-Energy Sampling (EES, for short) is a method to speed up the convergence of the Metropolis chain, when the latter is slow. We show that there are still models like the mean-field Potts model, where EES does not converge rapidly in certain temperature regimes. Indeed we will show that EES is slowly mixing on the mean-field Potts model, in a regime below the critical temperature. Though we will concentrate on the Potts model with three colors, our arguments remain valid for any number of colors $q \geq 3$, if we adapt the temperature regime. For the situation of the mean-field Potts model this answers a question posed in Hua and Kou, (2011).

Equations

$$H_{N}(\sigma) := \frac{1}{2N}\sum_{i,j=1}^{N}\delta_{\sigma_{i}=\sigma_{j}},\quad \sigma\in\Omega.$$

$$m_{N}(\sigma) := (m_{N}^{1}(\sigma), m_{N}^{2}(\sigma), m_{N}^{3}(\sigma)) := \left(\frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma_{i}=1},\ \frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma_{i}=2},\ \frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma_{i}=3}\right).$$

$$H_{N}(\sigma) = \frac{N}{2}\sum_{c=1}^{3}(m_{N}^{c}(\sigma))^{2}.$$

$$\pi_{\beta}(\sigma) = \frac{e^{\beta H(\sigma)}}{Z_{\beta}},\quad \sigma\in\Omega.$$

$$K_{gen}(\sigma,\tau)=\left\{\begin{array}{ll}\frac{1}{2}&\mbox{if }\sigma=\tau\\ \frac{1}{4N}&\mbox{if }d_{H}(\sigma,\tau)=1\\ 0&\mbox{otherwise.}\end{array}\right.$$

$$T_{\beta}(x,y)=\left\{\begin{array}{ll}K_{gen}(x,y)&\mbox{if }x\neq y\mbox{ and }H(y)\geq H(x)\\ K_{gen}(x,y)\frac{\pi_{\beta}(y)}{\pi_{\beta}(x)}&\mbox{if }x\neq y\mbox{ and }H(y)<H(x)\\ 1-\sum_{z\neq x}T_{\beta}(x,z)&\mbox{otherwise.}\end{array}\right.$$

$$\inf_{x}H(x) := h_{0} < h_{1} < \dots < h_{M} = \sup_{x}H(x)$$

$$\inf_{x}H(x) = \frac{N}{6}\quad\mbox{and}\quad\sup_{x}H(x) = \frac{N}{2}$$

$$0 = \beta_{0} < \beta_{1} < \dots < \beta_{M} = \beta$$

$$Q_{i}(\sigma,\tau) := Q_{n,i}(\sigma,\tau) := \frac{1}{|B_{n,k}|}\min\left\{1, \frac{\pi_{\beta_{i}}(\tau)\pi_{\beta_{i-1}}(\sigma)}{\pi_{\beta_{i}}(\sigma)\pi_{\beta_{i-1}}(\tau)}\right\}1_{\tau\in B_{n,k}}.$$

$$\mathcal{Q}_{i} := \bigotimes_{j=0}^{i-1}I\otimes Q_{i}\otimes\bigotimes_{j=i+1}^{M}I$$

$$\mathcal{T}_{i} := \bigotimes_{j=0}^{i-1}I\otimes T_{\beta_{i}}\otimes\bigotimes_{j=i+1}^{M}I.$$

$$\mathcal{R} = \frac{1}{(M+1)^{3}}\sum_{j,k,l=0}^{M}\mathcal{Q}_{j}\mathcal{T}_{k}\mathcal{Q}_{l}.$$

$$\pi := \prod_{i=0}^{M}\pi_{\beta_{i}}\times\delta_{M_{0}}.$$

$$\pi_{\beta}(m_{N}(\sigma) = c) = \dots$$

$$f(c) = \sum_{i=1}^{3}\left(\frac{\beta}{2}c_{i}^{2} - c_{i}\log c_{i}\right)$$

$$\Phi_{S^{\prime}} = \frac{\sum_{x\in S^{\prime},\,y\notin S^{\prime}}\pi(x)P(x,y)}{\pi(S^{\prime})}.$$

$$\Phi = \min_{S^{\prime}:\,\pi(S^{\prime})\leq\frac{1}{2}}\Phi_{S^{\prime}}.$$

$$\frac{\Phi^{2}}{2}\leq\Gamma(P)\leq 2\Phi.$$

$$B_{2\varepsilon} := \{\sigma : m_{N}(\sigma)\in B_{2\varepsilon}(\mathfrak{a}_{1})\}$$

$$\frac{1}{4}\leq\pi_{\beta}(m_{N}(\sigma)\in B_{2\varepsilon})\leq\frac{1}{3},$$

$$\pi_{\beta}(m_{N}(\sigma) = c) = \frac{\exp(N(f(c)+\Delta(c)))}{N Z_{\beta}},$$

$$\frac{\pi_{\beta}(m_{N}(\sigma)\in B_{2\varepsilon}\setminus B_{\varepsilon})}{\pi_{\beta}(m_{N}(\sigma)\in B_{2\varepsilon})}\leq e^{-c^{\prime}N}$$

$$\Phi\leq\Phi_{B_{2\varepsilon}}$$

$$Q_{M}(\sigma,\tau) = 0.$$

$$h_{i+1} - h_{i} = \frac{1}{M}(h_{M} - h_{0}) = \frac{\frac{N}{2}-\frac{N}{6}}{M} = \frac{1}{3d}.$$

$$|H_{N}(\sigma) - H_{N}(\tau)| = \frac{N}{2}\left(\|m_{N}(\sigma)\|_{2}^{2} - \|m_{N}(\tau)\|_{2}^{2}\right)\leq\frac{1}{3d},$$

$$\|m_{N}(\sigma)\|_{2}^{2} - \|m_{N}(\tau)\|_{2}^{2}\leq\frac{1}{6dN}.$$

$$B_{\varepsilon}^{M+1} := \{x\in\Omega^{M+1} : m_{N}(x_{M})\in B_{\varepsilon}(\mathfrak{a}_{0})\}.$$


Full text

Equi-Energy sampling does not converge rapidly on the mean-field Potts model with three colors close to the critical temperature

Mirko Ebbers

Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstraße 62, 48149 Münster, Germany


and

Matthias Löwe

Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstraße 62, 48149 Münster, Germany


Abstract.

Equi-Energy Sampling (EES, for short) is a method to speed up the convergence of the Metropolis chain, when the latter is slow. We show that there are still models like the mean-field Potts model, where EES does not converge rapidly in certain temperature regimes. Indeed we will show that EES is slowly mixing on the mean-field Potts model, in a regime below the critical temperature. Though we will concentrate on the Potts model with three colors, our arguments remain valid for any number of colors $q\geq 3$ , if we adapt the temperature regime. For the situation of the mean-field Potts model this answers a question posed in Hua and Kou, (2011).

Key words and phrases:

Equi-Energy Algorithm, Potts model, Swapping Algorithm, Metropolis Algorithm, Markov Chain Monte Carlo methods, Curie-Weiss model

2010 Mathematics Subject Classification:

60J10, 60K35, 82B05

Research of the second author was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy EXC 2044 – 390685587, Mathematics Münster: Dynamics – Geometry – Structure.

1. Introduction

Sampling methods are of utmost importance in applied mathematics, e.g. in Bayesian statistics, computational physics, econometrics, or computational biology. In many cases one wants to sample a random element from a finite set $\Omega$ according to a probability distribution $\pi$ on $(\Omega,\mathcal{P}(\Omega))$. Even this problem may be less trivial than it sounds, because $\Omega$ may be finite, yet very large. E.g. when modeling a ferromagnetic material consisting of $N$ atoms, $\Omega$ is of the form $\{-1,+1\}^{N}$, and for systems of realistic size $N$ is of the order $10^{23}$, thus $|\Omega|$ is of the order $2^{10^{23}}$. Hence a straightforward Monte Carlo simulation would take exponentially long in the system size $N$ and thus would be much too expensive. In other situations, the size of $\Omega$ may not even be known, as e.g. in the so-called knapsack problem (Löwe and Meise, (2001)).

One potential solution of this problem lies in the use of Markov Chain Monte Carlo (MCMC, for short) algorithms. They rely on an aperiodic and irreducible Markov chain on $\Omega$ that has $\pi$ as its invariant (i.e. stationary) distribution. One runs this Markov chain, stops it after some long enough time, and takes the current state as a sample element. The ergodic theorem for Markov chains ensures that this element is approximately distributed according to $\pi$, provided one has waited long enough. This method immediately raises two questions:

(1) Can one find for each $\pi$ a Markov chain that converges in distribution to $\pi$? I.e. can one find for each $\pi$ an irreducible and aperiodic Markov chain that has $\pi$ as its invariant measure? This question is answered in the affirmative by the Metropolis-Hastings chain (see e.g. Häggström, (2002)).

(2) How long do we need to wait to get a sample with a distribution that is reasonably close to $\pi$? If this waiting time is polynomial in the problem instance we speak about fast or rapid convergence, otherwise, in particular, if the mixing time is exponential in the problem instance, we will say the algorithm converges torpidly or slowly.

It is, however, well known that, like the Glauber dynamics, the Metropolis-Hastings algorithm usually converges slowly when the target distribution is multi-modal. Such situations occur e.g. in statistical physics in the presence of a phase transition. Hence slow convergence applies to a number of interesting situations, among them the low temperature phase of the Curie-Weiss model (see e.g. the discussion in Mossel and Sly, (2013)). In the next section we will introduce a close relative of the Curie-Weiss model, the three state mean-field Potts model. This will be our test model for the EES to be introduced in Section 3. In Section 4 we will show that this sampler mixes slowly when applied to the (three state) mean-field Potts model in a certain temperature regime. Here, a key argument is based on a property of the mean-field Potts model that is closely related to its first order phase transition (see Lemma 4.3 and the remark below it): The limit point of the order parameter $m_{N}$ (cf. (2.1)) at high temperatures remains a local maximum of its distribution also in a certain part of the low temperature regime. Hence a Metropolis-Hastings chain started in a neighborhood of this high temperature limit point will typically not escape from this neighborhood in polynomial time (in the part of the low temperature regime described above). But then EES cannot improve the performance of the Metropolis-Hastings algorithm either, because there are simply no observations of the global maxima of the distribution of $m_{N}$ that the algorithm could jump to. The proof is completed by combining this observation with the very powerful technical argument of conductance, also known as Cheeger's inequality (see Theorem 4.4).

2. The mean-field Potts model

Let us now introduce the mean-field Potts model. Consider the space $\Omega=E^{N}$ , where $E=\{1,2,\ldots,q\}$ , $q\in\mathbb{N}$ , $q\geq 3$ , and $N\in{\mathbb{N}}$ (to avoid some complications in the future, we can think of $N$ being a multiple of $q$ ). The elements of $E$ are sometimes referred to as colors.

For convenience, in this note, we will restrict to the case of $q=3$ , i.e. the mean-field Potts model with three states or colors taken from the set $E=\{1,2,3\}$ . This from now on will be our standing assumption for the rest of the note. However, we remark that our argument remains valid for general $q\geq 3$ , if one changes the regime of temperatures appropriately. We will come back to this remark later.

On $\Omega$ we construct an energy function given by

$$H_{N}(\sigma) := \frac{1}{2N}\sum_{i,j=1}^{N}\delta_{\sigma_{i}=\sigma_{j}},\quad \sigma\in\Omega.$$

Here $\delta_{A}$ denotes the indicator function for an event $A$ (which is formally the Dirac measure for the event $A$ , to stress that our notation is consistent with our later use of $\delta$ ). Note that $H_{N}$ can be written as a function of the vector

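$$m_{N}(\sigma) := (m_{N}^{1}(\sigma), m_{N}^{2}(\sigma), m_{N}^{3}(\sigma)) := \left(\frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma_{i}=1},\ \frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma_{i}=2},\ \frac{1}{N}\sum_{i=1}^{N}\delta_{\sigma_{i}=3}\right).$$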

Indeed, one easily checks that

$$H_{N}(\sigma) = \frac{N}{2}\sum_{c=1}^{3}(m_{N}^{c}(\sigma))^{2}.$$

$m_{N}$ is therefore called an order parameter of the model. With $H_{N}$ we associate a Gibbs measure $\pi_{\beta}$ at inverse temperature $\beta>0$ , i.e.

$$\pi_{\beta}(\sigma) = \frac{e^{\beta H(\sigma)}}{Z_{\beta}},\quad \sigma\in\Omega.$$

Here $Z_{\beta}=\sum_{\tau}e^{\beta H(\tau)}$ is the partition function. Note that conventionally in statistical physics the energy function $H$ would carry an additional minus sign, and the Gibbs measure (as well as the partition function) would be defined in terms of the exponential of minus $\beta$ times that energy. Since the two minus signs would cancel and lead to the same definition of the Gibbs measure our Gibbs measure does not carry these conventional minus signs.
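The definitions of this section can be checked by brute force on very small systems. The following is a minimal sketch, not part of the original note; the function names are illustrative assumptions. It computes $H_{N}$ once via the double sum and once via the order parameter $m_{N}$, and builds $\pi_{\beta}$ by enumerating $\{1,2,3\}^{N}$.

```python
import itertools
import math

COLORS = (1, 2, 3)

def energy_pairwise(sigma):
    """H_N via the double sum over ordered pairs (i, j), diagonal included."""
    n = len(sigma)
    same = sum(1 for a in sigma for b in sigma if a == b)
    return same / (2 * n)

def order_parameter(sigma):
    """m_N(sigma): the fraction of sites carrying each color."""
    n = len(sigma)
    return tuple(sigma.count(c) / n for c in COLORS)

def energy_from_order_parameter(sigma):
    """H_N via (N/2) * sum_c (m_N^c)^2."""
    n = len(sigma)
    return (n / 2) * sum(m * m for m in order_parameter(sigma))

def gibbs_measure(n, beta):
    """Brute-force pi_beta on {1,2,3}^n; only feasible for very small n."""
    states = list(itertools.product(COLORS, repeat=n))
    weights = [math.exp(beta * energy_pairwise(s)) for s in states]
    z = sum(weights)  # partition function Z_beta
    return {s: w / z for s, w in zip(states, weights)}
```

With the sign convention used here, configurations with many equal colors have large $H_{N}$ and therefore large Gibbs weight; a balanced configuration attains the minimal energy $N/6$ and a monochromatic one the maximal energy $N/2$.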

The mean-field Potts model was studied in a variety of papers. We refer to Ellis and Wang, (1990) and Kesten and Schonmann, (1989), who showed that there is a critical inverse temperature $\beta_{c}$ . This critical inverse temperature in the 3 states mean-field Potts model equals $\beta_{c}=4\log 2$ (cf. Cuff et al., (2012), which discusses the very interesting phenomenon of a temperature-dependent cut-off effect for the Glauber dynamics of the model).

At $\beta_{c}$ the model undergoes a first order phase transition. More precisely, the order parameter $m_{N}$ of the model converges in distribution to $\mathfrak{a}_{0}:=(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ , when $\beta<\beta_{c}$ . At smaller temperatures one observes the following: For $\beta\geq\beta_{c}$ there is $1>m^{*}(\beta)>\frac{1}{3}$ and vectors $\mathfrak{a}_{1}(\beta),\mathfrak{a}_{2}(\beta),\mathfrak{a}_{3}(\beta)\in{\mathbb{R}}^{3}$ such that the vector $\mathfrak{a}_{i}(\beta)$ has $m^{*}(\beta)$ in its i’th component and all other components are equal such that they sum up to one. For $\beta>\beta_{c}$ the distribution of $m_{N}$ converges to $\frac{1}{3}\sum_{i=1}^{3}\delta_{\mathfrak{a}_{i}(\beta)}$ . Here $\delta$ denotes the Dirac-measure. Finally, for $\beta=\beta_{c}$ there are, moreover, weights $\lambda_{1},\lambda_{2}>0$ that sum up to 1, such that the distribution of $m_{N}$ converges to $\lambda_{1}\delta_{\mathfrak{a}_{0}}+\lambda_{2}\sum_{i=1}^{3}\delta_{\mathfrak{a}_{i}(\beta_{c})}$ .

The phase transition is of first order, since $m^{*}(\beta_{c})>\frac{1}{3}$ , i.e. the jump is discontinuous. Moreover, the vector $\mathfrak{a}_{0}$ remains a local maximum of the distribution of $m_{N}$ for some temperatures below the critical temperature. Such a behavior can also be observed for general values of $q\geq 3$ at other values for $\beta_{c}$ . We will come back to this fact in Section 4, Lemmas 4.2 and 4.3, because it is of utmost importance for the proof of Theorem 4.1 to be given in Section 4.

3. Equi-Energy Sampling

Various modifications of the Metropolis-Hastings algorithm have been proposed to speed up its convergence. Among them the so-called swapping algorithm (see Geyer, (1991)), the exchange Monte Carlo method (see Hukushima and Nemoto, (1996)), parallel tempering (see Orlandini, (1998)) and the simulated tempering algorithm (see Marinari and Parisi, (1992), Geyer and Thompson, (1995), and Madras, (1998)) are very popular in applications. Another variant is Multicanonical Monte Carlo Simulation, introduced by Berg and Neuhaus, (1992), also see Berg, (2000). It is related to umbrella sampling (see Torrie and Valleau, (1977)) and is close in spirit to the swapping algorithm, simulated tempering, as well as EES. A major difference, however, is how an a priori estimate of the probability distribution of interest is produced. Therefore, we have not yet been able to analyze whether Multicanonical Monte Carlo Simulations suffer from the same shortcomings as swapping, parallel tempering or EES (see next paragraph).

In many situations the algorithms described in the previous paragraph indeed seem to be able to improve the convergence of the Metropolis chain. However, there are also negative theoretical results about these algorithms. Madras and Zheng, (2003) show that the swapping chain converges quickly for the Curie–Weiss model. On the other hand, Bhatnagar and Randall, (2004) and Bhatnagar and Randall, (2016) prove that both the swapping algorithm and simulated tempering are slowly mixing for the 3-state Potts model and conjecture that this is caused by the first order phase transition in the Potts model (also see our discussion in the remark following Lemma 4.3). Qualitative properties of the swapping algorithm and parallel tempering were studied in Doll et al., (2018). A first rapid convergence result for the Swapping Algorithm in a disordered situation was proved in Löwe and Vermet, (2009). Ebbers and Löwe, (2009) show that in disordered models the conjecture by Bhatnagar and Randall is not correct. They prove that the Swapping Algorithm mixes slowly on the Random Energy Model, even though this model has only a third order phase transition. In the Blume-Emery-Griffiths model both rapid and torpid mixing may occur, as was shown in Ebbers et al., (2014).

Another idea to improve the performance of the Metropolis chain is the so-called Equi-energy sampling algorithm (see e.g. Kou et al., (2006)). This algorithm was tested on the Ising model in Hua and Kou, (2011) and the question, how fast it converges, was posed. For the Potts model, we will answer this question in the next section. In particular, we will show that EES may be slowly mixing on relevant models from statistical mechanics. Variants of EES were studied, among others, in Baragatti et al., (2013).

The principal observation to motivate EES is that a main obstacle to fast mixing is the presence of a phase transition in the model. This, in turn, may be characterized by a multi-modal distribution of a macroscopic observable. Usually, the (projected) Metropolis chain then enters one of the modes rapidly and stays there for an exponentially long time. The EES tries to avoid this behavior by introducing shortcuts in the state space. These shortcuts are created by the observations of Metropolis chains at higher temperatures where the above mentioned modes are less pronounced or possibly not even present. More precisely, in addition to the Metropolis steps one also allows for jumps to points of the same or a similar energy as the present one, given one has observed these points already at higher temperatures (otherwise, the algorithm would require the exact structure of the energy function, in which case simulations would probably be pointless). The EES has been discussed in Kou et al., (2006), its convergence was shown in the same article, and, using a different technique, in Andrieu et al., (2008). We will now give an exact description of a version of this algorithm.

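Let us first briefly recall the Metropolis-Hastings algorithm, which is the basis of the EES. To define it, let $K_{gen}$ denote the following aperiodic, symmetric and irreducible Markov chain on $\Omega$:

$$K_{gen}(\sigma,\tau)=\left\{\begin{array}{ll}\frac{1}{2}&\mbox{if }\sigma=\tau\\ \frac{1}{4N}&\mbox{if }d_{H}(\sigma,\tau)=1\\ 0&\mbox{otherwise.}\end{array}\right.$$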

Here $d_{H}$ is the Hamming distance and $K_{gen}$ is a Markov chain, because every $\sigma\in\Omega$ has $2N$ neighbors. Define the corresponding Metropolis-Hastings chain for the probability $\pi_{\beta}$ as $T_{\beta}(\cdot,\cdot)$ :

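$$T_{\beta}(x,y)=\left\{\begin{array}{ll}K_{gen}(x,y)&\mbox{if }x\neq y\mbox{ and }H(y)\geq H(x)\\ K_{gen}(x,y)\frac{\pi_{\beta}(y)}{\pi_{\beta}(x)}&\mbox{if }x\neq y\mbox{ and }H(y)<H(x)\\ 1-\sum_{z\neq x}T_{\beta}(x,z)&\mbox{otherwise.}\end{array}\right.$$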
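As an illustration, one row of $T_{\beta}$ can be written out explicitly for small $N$. The sketch below is not taken from the original note and its function names are assumptions. It uses that each of the $2N$ neighbors of a configuration is proposed with probability $\frac{1}{4N}$ and that $\frac{\pi_{\beta}(y)}{\pi_{\beta}(x)}=e^{\beta(H(y)-H(x))}$, so the partition function never has to be computed.

```python
import itertools
import math

COLORS = (1, 2, 3)

def energy(sigma):
    """H_N(sigma) = (1/(2N)) * sum over colors of (number of sites with that color)^2."""
    n = len(sigma)
    return sum(sigma.count(c) ** 2 for c in COLORS) / (2 * n)

def neighbors(sigma):
    """All configurations at Hamming distance 1 from sigma (2N of them)."""
    for i, s in enumerate(sigma):
        for c in COLORS:
            if c != s:
                yield sigma[:i] + (c,) + sigma[i + 1:]

def metropolis_row(sigma, beta):
    """Row T_beta(sigma, .) as a dict {tau: probability}."""
    n = len(sigma)
    row = {}
    for tau in neighbors(sigma):
        # Moves to higher H are always accepted, moves to lower H
        # with probability pi_beta(tau) / pi_beta(sigma).
        accept = min(1.0, math.exp(beta * (energy(tau) - energy(sigma))))
        row[tau] = accept / (4 * n)
    row[sigma] = 1.0 - sum(row.values())  # holding probability, at least 1/2
    return row
```

Since the off-diagonal entries of a row sum to at most $\frac{1}{2}$, the holding probability is at least $\frac{1}{2}$, which makes the chain aperiodic; reversibility with respect to $\pi_{\beta}$ can be verified entry by entry for small $N$.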

Note that $T_{\beta}(\cdot,\cdot)$ is sometimes slow in natural situations, e.g. when sampling from the low temperature distribution of the Curie-Weiss model (see e.g. Madras and Piccioni, (1999); of course, $T_{\beta}$ has to be adapted to the situation of the Curie-Weiss model). To speed up its convergence, we consider the EES. To define it, we first introduce a sequence of energy levels:

$$\inf_{x}H(x) := h_{0} < h_{1} < \dots < h_{M} = \sup_{x}H(x)$$

In our context, it is easily checked that

$$\inf_{x}H(x) = \frac{N}{6}\quad\mbox{and}\quad\sup_{x}H(x) = \frac{N}{2}$$

which will be used later. Moreover, introduce a sequence of inverse temperature levels

$$0 = \beta_{0} < \beta_{1} < \dots < \beta_{M} = \beta$$

where we assume that $\beta$ is the inverse temperature we want to sample from. It will often be convenient to take $\beta_{i}=i\frac{\beta}{M}$. Note that $M$ may and will depend on $N$, which is not made explicit in Kou et al., (2006); otherwise our construction, so far, agrees with the construction in Kou et al., (2006). We will make an explicit choice for $M$ and give reasons for this choice after the description of the algorithm.

For this, we will also need a dummy state $\iota$ and define $\tilde{\Omega}:=\Omega\cup\{\iota\}.$ Let $\mathcal{M}$ be an $(M+1)\times|\Omega|$ matrix over $\tilde{\Omega}$ , which is initially filled with $\iota$ , only.

The EES alternates between two kinds of steps. One is a usual Metropolis step at a (random) temperature level $\beta_{i}$. The other one is an equi-energy jump at the same temperature, if $i\geq 1$. At inverse temperature 0 there are only Metropolis moves. We record the states we see at temperature $\beta_{i}$ (and hence their energies) by entering them into the $i$'th row of the matrix $\mathcal{M}$, if the state has not been seen before. In this case it replaces one of the $\iota$'s (in a pre-described order). To explain the equi-energy step assume that the chain is at temperature $\beta_{i},i\geq 1$ and in state $\sigma$. We determine the energy level $k$, such that $h_{k-1}<H(\sigma)\leq h_{k}$ and choose (with equal probabilities) a state $\tau$ from all states $\tau^{\prime}$ with $h_{k-1}<H(\tau^{\prime})\leq h_{k}$, which we have already seen at temperature level $\beta_{i-1}$. This new state is accepted with probability $\min\{1,\frac{\pi_{\beta_{i}}(\tau)\pi_{\beta_{i-1}}(\sigma)}{\pi_{\beta_{i}}(\sigma)\pi_{\beta_{i-1}}(\tau)}\}$. Otherwise, and in particular if we have not seen any state in the same energy band in the $(i-1)$'st row of $\mathcal{M}$, the chain stays where it is. We denote the corresponding transition matrix (on $\Omega$) by $Q_{i}$. Note that $Q_{i}$ in general depends on time. We will not make this explicit, because we will just analyze the algorithm in the "best case scenario", where the matrix $\mathcal{M}$ does not contain any $\iota$'s anymore. However, even under this assumption, we will be able to show that EES is slowly converging on the three state mean-field Potts model in a certain temperature regime. Formally,

$$Q_{i}(\sigma,\tau) := Q_{n,i}(\sigma,\tau) := \frac{1}{|B_{n,k}|}\min\left\{1, \frac{\pi_{\beta_{i}}(\tau)\pi_{\beta_{i-1}}(\sigma)}{\pi_{\beta_{i}}(\sigma)\pi_{\beta_{i-1}}(\tau)}\right\}1_{\tau\in B_{n,k}}.$$

Here $n$ is the time variable and $B_{n,k}$ is the set of states $\tau^{\prime}$ with $h_{k-1}<H(\tau^{\prime})\leq h_{k}$ , which we have already seen at temperature level $\beta_{i-1}$ by time $n$ .
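The equi-energy jump described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the energy levels are taken equidistant with $M=dN$ (the choice made further below in the text), and exact fractions are used so that band boundaries are not blurred by rounding. Because $\pi_{\beta}\propto e^{\beta H}$, the acceptance ratio reduces to $\exp((\beta_{i}-\beta_{i-1})(H(\tau)-H(\sigma)))$ and the partition functions cancel.

```python
import bisect
import math
import random
from fractions import Fraction

def energy(sigma):
    """Exact H_N(sigma) for colors 1, 2, 3."""
    n = len(sigma)
    return Fraction(sum(sigma.count(c) ** 2 for c in (1, 2, 3)), 2 * n)

def energy_levels(n, d):
    """Equidistant levels h_0 = N/6 < ... < h_M = N/2 with M = d*N (an assumption)."""
    m = d * n
    return [Fraction(n, 6) + Fraction(k, 3 * d) for k in range(m + 1)]

def band_index(h, levels):
    """Index k with levels[k-1] < h <= levels[k] (k = 0 if h equals levels[0])."""
    return bisect.bisect_left(levels, h)

def equi_energy_jump(sigma, i, betas, seen_below, levels, rng):
    """One equi-energy move at level i >= 1.

    seen_below: states already observed at level i - 1.
    """
    k = band_index(energy(sigma), levels)
    band = [t for t in seen_below if band_index(energy(t), levels) == k]
    if not band:
        return sigma  # nothing seen in this energy band: stay put
    tau = rng.choice(band)
    log_ratio = (betas[i] - betas[i - 1]) * float(energy(tau) - energy(sigma))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return tau
    return sigma
```

A call such as `equi_energy_jump(sigma, 2, [0.0, 1.45, 2.9], seen, energy_levels(6, 1), random.Random(0))` returns either `sigma` itself or a stored state from the same energy band.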

One might expect, that we indeed use all states we have seen previously, rather than the ones we explored with the chain at temperature $\beta_{i-1}$ . However, there is hardly any difference between the two chains, because if temperatures are very different the chains will typically also see states of very different energies. Our choice has the advantage that it is easy to see that the global chain to be described below is reversible and moreover, it agrees with the choice in the literature, see Kou et al., (2006).

Based on this, we build a matrix that describes the movement of all particles simultaneously. This operator $\mathcal{R}$ will be a matrix on $\Omega^{M+1}$ , of course. We lift the movement of the i’th particle to $\Omega^{M+1}$ by building

$$\mathcal{Q}_{i} := \bigotimes_{j=0}^{i-1}I\otimes Q_{i}\otimes\bigotimes_{j=i+1}^{M}I$$

where $I$ is the identity matrix. Similarly, we consider the matrix $\mathcal{T}_{i}$ that lifts the Metropolis step $T_{\beta_{i}}$ to $\Omega^{M+1}$ , i.e. we consider

$$\mathcal{T}_{i} := \bigotimes_{j=0}^{i-1}I\otimes T_{\beta_{i}}\otimes\bigotimes_{j=i+1}^{M}I.$$

Combining these operators the EES is defined by

$$\mathcal{R} = \frac{1}{(M+1)^{3}}\sum_{j,k,l=0}^{M}\mathcal{Q}_{j}\mathcal{T}_{k}\mathcal{Q}_{l}.$$

Note that the versions of the EES given in Kou et al., (2006) and Andrieu et al., (2008) differ from each other and also our version is slightly different from those. However, the spirit of the algorithms is the same.
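Read as a sampling procedure, one step of the combined chain picks three levels $j,k,l$ independently and uniformly from $\{0,\ldots,M\}$ and then applies an equi-energy move at level $j$, a Metropolis move at level $k$ and an equi-energy move at level $l$. The following sketch only illustrates this composition; `metropolis_step` and `equi_energy_step` are hypothetical callables supplied by the caller, and the equi-energy move at level 0 is assumed to leave the state unchanged.

```python
import random

def ees_step(x, metropolis_step, equi_energy_step, rng):
    """One step of the lifted chain on Omega^(M+1).

    x: tuple (x_0, ..., x_M) with one configuration per temperature level.
    metropolis_step(state, level) and equi_energy_step(state, level) are
    assumed to return the new state of that single coordinate.
    """
    m = len(x) - 1
    j, k, l = (rng.randrange(m + 1) for _ in range(3))
    y = list(x)
    y[j] = equi_energy_step(y[j], j)  # equi-energy move at level j
    y[k] = metropolis_step(y[k], k)   # Metropolis move at level k
    y[l] = equi_energy_step(y[l], l)  # equi-energy move at level l
    return tuple(y)
```

Each call changes at most three coordinates, and every coordinate other than those at levels $j$, $k$, $l$ is left untouched, mirroring the identity factors in the tensor products.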

In the sequel, we will only consider a number of energy levels $M$ that depends linearly on $N$, such that $M=dN$. We will furthermore assume that the $h_{i}$ are equidistant. This choice of $M$ is somewhat arbitrary: allowing for a polynomial dependence between $M$ and $N$ would not alter the algorithm much. However, choosing $M$ e.g. exponentially large in $N$ would lead to empty, or almost empty, energy bands, which would make the equi-energy step obsolete. Moreover, it would obviously lead to exponential relaxation times (in $N$), because exponentially many temperatures have to be simulated. On the other hand, having $M$ too small, e.g. constant, leads to almost non-interactive components (i.e. an equi-energy jump is almost never accepted) and EES stands no chance of increasing the speed of convergence compared to the standard Metropolis algorithm.

Of course, eventually we will only be interested in the $M+1$ ’st coordinate of this Markov chain. However, studying it entirely, seems easier. First of all, let us note that indeed, the distribution of the $M+1$ ’st coordinate converges to $\pi_{\beta}$ .

Theorem 3.1.

The distribution of the $M+1$ ’st coordinate converges to $\pi_{\beta}$ as time tends to infinity.

Proof.

This is the content of Andrieu et al., (2008) for their version of the EES. For our version the assertion follows from the ergodic theorem for Markov chains. Indeed, denote by $\mathcal{S}$ the Markov chain on $\Omega^{M+1}\times\mathfrak{M}$, where $\mathfrak{M}$ is the space of all $(M+1)\times 3^{N}$ matrices. $\mathcal{S}$ will behave in its first component like $\mathcal{R}$, while in the second component we keep record of the filling of $\mathcal{M}$. Observe that each $T_{i}:=T_{\beta_{i}}$ is reversible with respect to $\pi_{\beta_{i}}$ and $\mathcal{M}$ does not play any role for it. On the other hand, once we reach a situation where $\mathcal{M}$ is entirely filled with states different from $\iota$ (we denote this state of $\mathcal{M}$ by $M_{0}$ in the second coordinate of $\mathcal{S}$), i.e. we have seen all states at all temperatures, all the equi-energy steps $Q_{i}$ are reversible with respect to $\pi_{\beta_{i}}$ as well. This is because $Q_{i}(\sigma,\tau)>0$ if and only if $\sigma$ and $\tau$ lie in the same energy band, and it follows from the construction of the transition probabilities. Thus, once $M_{0}$ is reached, which happens almost surely in finite time, $\mathcal{S}$ is reversible with respect to

$$\pi := \prod_{i=0}^{M}\pi_{\beta_{i}}\times\delta_{M_{0}}.$$

Then the convergence follows from the convergence theorem for Markov chains. This, in particular, yields the assertion of the theorem. ∎

The proof is somewhat misleading, as it seems to indicate, that for exponentially large state spaces there is no hope that EES may converge in polynomial time, since first the state $M_{0}$ has to be reached. However, if we consider the high temperature situation $\beta<\beta_{c}$ in the Potts model the Metropolis-Hastings chain converges to its invariant distribution in polynomial time, even without any equi-energy steps.

On the other hand, we will see in the next section that in part of the low temperature regime $\beta>\beta_{c}$ the situation is even worse. Even, when we start $\mathcal{S}$ in the optimal state $M_{0}$ in its second component, i.e. when we assume the second component is already in $M_{0}$ , the mixing time may be exponential.

4. Torpid mixing of EES on the low temperature mean-field Potts model

We now come to the central result of the note.

Theorem 4.1.

EES is slowly mixing on the 3-state mean-field Potts model, when $\beta_{c}<\beta<3$ , even when the second component of the Markov chain $\mathcal{S}$ introduced in the proof of Theorem 3.1 above is in state $M_{0}$ .

We will prepare the proof of the theorem by explaining the ideas and stating some lemmas. In the proof of the theorem we will exploit one of the main differences between the mean-field Potts model and the Curie-Weiss model (i.e. when $E=\{1,2\}$ ) at low temperatures. This difference lies in the fact, that in the Curie-Weiss model the state where both colors occur equally often is a local minimum of the Gibbs measure at all low temperatures, while it is a local maximum of the Gibbs measure in the mean-field Potts model for some temperatures in the low temperature regime (also see Lemma 4.3). In particular, in the Curie-Weiss model, the Gibbs measure is flat in this state at the critical temperature while it exposes a local maximum in this state at the critical temperature in the Potts model. Thus, in the latter, EES will be very reluctant to move far away from a state $\sigma$ with $m_{N}(\sigma)\approx(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ . This is the core idea, even though the technical steps are somewhat more involved.

Let $c_{1},c_{2},c_{3}$ be numbers in $[0,1]$ that add up to 1 and such that $c_{i}N$ is an integer for each $i=1,2,3$ . Then for $\sigma_{c}$ such that $m_{N}(\sigma_{c})=(c_{1},c_{2},c_{3})=:c$ we have that

[TABLE]

where

$$f(c) = \sum_{i=1}^{3}\left(\frac{\beta}{2}c_{i}^{2} - c_{i}\log c_{i}\right)$$

and $\Delta(c)$ is $o(1)$. Note that we used Stirling's formula to derive the second equality in (4.1) and the fact that we can rewrite $H_{N}(\sigma)$ as $H_{N}(\sigma)=\frac{N}{2}\sum_{i=1}^{3}c_{i}^{2}$, if $m_{N}(\sigma)=c$.
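The role of Stirling's formula can be checked numerically. The sketch below is an illustration with hypothetical function names, not part of the original argument. It computes the logarithm of the unnormalized weight of a macro-state, i.e. of the number of configurations with given color counts times $e^{\beta H_{N}}$, and compares it per site with $f(c)$. The difference plays the role of $\Delta(c)$ together with the polynomial prefactor and shrinks as $N$ grows.

```python
import math

def f(c, beta):
    """f(c) = sum_i (beta/2 * c_i^2 - c_i * log c_i), for c with positive entries."""
    return sum(beta / 2 * x * x - x * math.log(x) for x in c)

def log_macrostate_weight(counts, beta):
    """log( #{sigma : N * m_N(sigma) = counts} * exp(beta * H_N) ), without 1/Z_beta."""
    n = sum(counts)
    # log of the multinomial coefficient N! / (k_1! k_2! k_3!)
    log_multinomial = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    energy = sum(k * k for k in counts) / (2 * n)
    return log_multinomial + beta * energy

def per_site_gap(counts, beta):
    """(1/N) * log weight minus f(c); tends to 0 as N grows with c fixed."""
    n = sum(counts)
    c = tuple(k / n for k in counts)
    return log_macrostate_weight(counts, beta) / n - f(c, beta)
```

For example, `per_site_gap((2000, 1000, 1000), 2.9)` is small and negative, and it moves closer to zero if all three counts are multiplied by a common factor.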

Letting $\mathcal{C}:=\{(c_{1},c_{2},c_{3})\in[0,1]^{3}:\sum_{i=1}^{3}c_{i}=1\}$ be the domain of $f$ (and the set of all probabilities on the space $E$), Gore and Jerrum show:

Lemma 4.2.

(cf. Gore and Jerrum, (1999), Proposition 1) Let $c$ be a local maximum of $f$. Then $c$ satisfies:

(1) $c$ lies in the interior of $\mathcal{C}$.

(2) Either $c_{i}=\frac{1}{3}$ for all $i=1,2,3$, or there are $0<\alpha<\frac{1}{\beta}<\alpha^{\prime}<1$, such that $c_{i}\in\{\alpha,\alpha^{\prime}\}$ for all $i=1,2,3$. In the latter case there is exactly one $c_{i}$ equal to $\alpha^{\prime}$, while all the other $c_{j},j\neq i$ are equal to $\alpha$.

Analyzing the function $f$ around the point $(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ we find that (in accordance with Lemma 4.2) it might be a local but not a global maximum of $f$ , if $\beta>\beta_{c}=4\log 2$ is not too large (a similar observation was already made in Kesten and Schonmann, (1989)):

Lemma 4.3.

If $4\log 2=\beta_{c}\leq\beta<3$ then $(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ is a local mode of $\pi_{\beta}$ , if $N$ is large enough, i.e. $(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ is a local maximum point of $\pi_{\beta}$ .

Proof.

In view of (4.1) it suffices to analyze $f$. For $x>0$ and $a\in[0,1]$ consider $h(x):=f(\frac{1}{3}+x,\frac{1}{3}-ax,\frac{1}{3}-(1-a)x)$. It is an easy matter to check that $h^{\prime}(0)=0$ and $h^{\prime\prime}(0)=-(6-2\beta)(a^{2}-a+1)$. Since $a^{2}-a+1>0$, we have $h^{\prime\prime}(0)<0$ for $\beta<3$, and the assertion follows. ∎
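For completeness, the two derivatives can be worked out as follows; this is a sketch of the computation in the notation of the proof, with the direction vector $v:=(1,-a,-(1-a))$ introduced here only for convenience. Since $f$ is a sum of one-variable terms, its mixed second partial derivatives vanish, and

$$\frac{\partial f}{\partial c_{i}}(c)=\beta c_{i}-\log c_{i}-1,\qquad \frac{\partial^{2} f}{\partial c_{i}^{2}}(c)=\beta-\frac{1}{c_{i}}.$$

At $c=(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ all three first partial derivatives equal $\frac{\beta}{3}+\log 3-1$ and all three second partial derivatives equal $\beta-3$. Hence

$$h^{\prime}(0)=\left(\frac{\beta}{3}+\log 3-1\right)(v_{1}+v_{2}+v_{3})=0,$$

$$h^{\prime\prime}(0)=(\beta-3)\left(1+a^{2}+(1-a)^{2}\right)=-(6-2\beta)(a^{2}-a+1).$$

Because $a^{2}-a+1\geq\frac{3}{4}$ for every real $a$, the sign of $h^{\prime\prime}(0)$ is the sign of $\beta-3$.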

Remark

Lemma 4.3 is a main reason why Theorem 4.1 is true. It is not difficult to check that the same behavior is true for general $q\geq 3$ in an appropriately chosen temperature regime (depending on $q$ ). Therefore, also Theorem 4.1 could be proven for general $q\geq 3$ . Indeed, the property shown in Lemma 4.3 is intrinsically related to the first order phase transition of the mean-field Potts model. Such a phase transition can be characterized by the discontinuous transition of the accumulation point(s) of an order parameter of the model at the critical inverse temperature $\beta_{c}$ . In the Potts model this order parameter is the variable $m_{N}$ . However, in most natural models, these new accumulation point(s) are already local maxima of the distribution of the order parameter for some smaller values of $\beta$ . Similarly, the old accumulation point(s) remain local maxima of the distribution of the order parameter for some larger values of $\beta$ . This is exactly the statement of Lemma 4.3.
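The coexistence of a local maximum at $(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ and a higher maximum at an ordered point can be made visible numerically. The following sketch is not from the original note, and the grid search is merely a crude stand-in for solving the stationarity equations. It evaluates $f$ along the symmetric line $(t,\frac{1-t}{2},\frac{1-t}{2})$, on which the ordered maxima of Lemma 4.2 lie.

```python
import math

BETA_C = 4 * math.log(2)
A0 = (1 / 3, 1 / 3, 1 / 3)

def f(c, beta):
    """f(c) = sum_i (beta/2 * c_i^2 - c_i * log c_i), for c with positive entries."""
    return sum(beta / 2 * x * x - x * math.log(x) for x in c)

def ordered(t):
    """Point with one favored color of weight t and two equal remaining weights."""
    return (t, (1 - t) / 2, (1 - t) / 2)

def best_ordered(beta, grid=10000):
    """Grid search for the maximizer of f on the line ordered(t), 1/3 < t < 1."""
    candidates = (1 / 3 + (2 / 3) * k / grid for k in range(1, grid))
    t = max(candidates, key=lambda s: f(ordered(s), beta))
    return t, f(ordered(t), beta)
```

At $\beta=\beta_{c}=4\log 2$ the values $f(\frac{1}{3},\frac{1}{3},\frac{1}{3})$ and $f(\frac{2}{3},\frac{1}{6},\frac{1}{6})$ coincide. For $\beta_{c}<\beta<3$, e.g. $\beta=2.9$, the uniform point is still a local maximum along this line but the ordered maximum is higher, and for $\beta>3$ the uniform point is no longer a local maximum.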

Another key ingredient of the proof is a conductance argument (also known as Cheeger’s inequality in Diaconis and Stroock, (1991)):

Theorem 4.4.

(Sinclair and Jerrum, (1989)) Let $P$ be a Markov chain on a finite set $S$ . Assume it is reversible with respect to $\pi$ . For all $S^{\prime}\subseteq S$ , define

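$$\Phi_{S^{\prime}}:=\frac{\sum_{x\in S^{\prime},\,y\in S\setminus S^{\prime}}\pi(x)P(x,y)}{\pi(S^{\prime})}.$$

The conductance $\Phi$ is given by

$$\Phi:=\min_{S^{\prime}\subseteq S:\ 0<\pi(S^{\prime})\leq\frac{1}{2}}\Phi_{S^{\prime}}.$$

Then the following holds true for the spectral gap $\Gamma(P)$ of $P$:

$$\frac{\Phi^{2}}{2}\leq\Gamma(P)\leq 2\Phi.$$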

As follows e.g. from Diaconis and Stroock, (1991), a spectral gap that is bounded below by the inverse of a polynomial in the size of the problem instance results in fast mixing of the Markov chain. On the other hand, if the spectral gap is bounded above by the inverse of an exponential in the size of the problem instance, the Markov chain mixes slowly. An immediate consequence of Theorem 4.4 is that the Metropolis algorithm alone is slowly mixing on the low temperature Potts model.
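To make the conductance bound concrete, here is a small numerical sketch on a toy chain. It uses the standard definitions $\Phi_{S^{\prime}}=\sum_{x\in S^{\prime},y\notin S^{\prime}}\pi(x)P(x,y)/\pi(S^{\prime})$, $\Phi=\min_{\pi(S^{\prime})\leq 1/2}\Phi_{S^{\prime}}$ and $\Gamma(P)=1-\lambda_{2}$, and checks $\Phi^{2}/2\leq\Gamma(P)\leq 2\Phi$. The eight-state double-well target and all function names are assumptions made for illustration only; they are not the Potts chain of this paper.

```python
import itertools
import numpy as np

def metropolis_chain(pi):
    """Nearest-neighbour Metropolis chain on {0,...,n-1} with stationary law pi."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                P[x, y] = 0.5 * min(1.0, pi[y] / pi[x])
        P[x, x] = 1.0 - P[x].sum()  # holding probability
    return P

def conductance(P, pi):
    """Brute-force minimum of Phi_{S'} over all S' with 0 < pi(S') <= 1/2."""
    n = len(pi)
    best = np.inf
    for r in range(1, n):
        for subset in itertools.combinations(range(n), r):
            inside = list(subset)
            mass = pi[inside].sum()
            if mass <= 0.5 + 1e-12:
                outside = [y for y in range(n) if y not in subset]
                flow = (pi[inside][:, None] * P[np.ix_(inside, outside)]).sum()
                best = min(best, flow / mass)
    return best

def spectral_gap(P, pi):
    """1 - lambda_2, computed from the symmetrised matrix D^(1/2) P D^(-1/2)."""
    d = np.sqrt(pi)
    A = d[:, None] * P / d[None, :]  # symmetric because P is reversible w.r.t. pi
    eig = np.sort(np.linalg.eigvalsh((A + A.T) / 2))
    return 1.0 - eig[-2]

# Hypothetical double-well target on 8 states: two heavy regions, thin middle.
weights = np.array([6.0, 3.0, 1.0, 0.2, 1.0, 4.0, 9.0, 5.0])
pi = weights / weights.sum()
P = metropolis_chain(pi)
phi = conductance(P, pi)
gap = spectral_gap(P, pi)
print(f"Phi^2/2 = {phi**2 / 2:.5f}  gap = {gap:.5f}  2*Phi = {2 * phi:.5f}")
```

Making the middle state lighter (a deeper valley between the two wells) shrinks both $\Phi$ and the gap, which is the mechanism behind the "bad cut" arguments used below.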

Proposition 4.5.

The Metropolis algorithm mixes slowly on the Potts model, if $\beta>\beta_{c}$ .

Proof.

Take the macro-state $\mathfrak{a}_{1}:=\mathfrak{a}_{1}(\beta)$, i.e. the maximum point $a=(a_{1},a_{2},a_{3})$ of $f$, where $a_{1}>a_{2}=a_{3}$. This point exists according to Lemma 4.2 and because we are in the low temperature region. Since $\mathfrak{a}_{1}$ is a maximum of $f$, there is $\varepsilon>0$ such that $f$ is decreasing on the ball $B_{2\varepsilon}(\mathfrak{a}_{1})$ of radius $2\varepsilon$ centered at $\mathfrak{a}_{1}$ when we walk from the center to the boundary along a straight line. The point $\mathfrak{a}_{1}$ is one of the three equally likely points at which the distribution of $m_{N}$ concentrates for large $N$. Thus for

[TABLE]

we obtain that

[TABLE]

when $N$ is large enough and $\varepsilon>0$ is fixed and small enough. Moreover, due to the exponential structure of $\pi_{\beta}$ , i.e.

[TABLE]

and the behavior of $f$ on $B_{2\varepsilon}(\mathfrak{a}_{1})$ (on $B_{2\varepsilon}(\mathfrak{a}_{1})$, the function $f$ decreases like a multiple of the squared 2-norm of the distance to $\mathfrak{a}_{1}$) we obtain that

[TABLE]

for a suitably chosen constant $c^{\prime}>0$. But this implies that the set $\mathfrak{B}_{2\varepsilon}$ constitutes a "bad cut". Indeed, with the notation of the previous theorem we see that

[TABLE]

Thus $T_{\beta}$ mixes slowly, when $\beta>\beta_{c}$ . ∎
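The bottleneck can be observed numerically for small $N$. The following sketch builds the Metropolis chain projected onto the colour counts and compares its spectral gap with the conductance of one particular cut. Several points are assumptions for illustration: the proposal recolours a uniformly chosen spin with one of the two other colours chosen uniformly, the acceptance probability is $\min(1,e^{\beta\Delta H_{N}})$, and the cut is the set where colour 1 is the strict majority, which is simpler than the ball $\mathfrak{B}_{2\varepsilon}$ used in the proof.

```python
import math
import numpy as np

def count_chain(N, beta):
    """Metropolis chain projected onto colour counts (n1, n2, n3), q = 3."""
    states = [(a, b, N - a - b) for a in range(N + 1) for b in range(N + 1 - a)]
    index = {s: k for k, s in enumerate(states)}
    # ASSUMPTION: pi_beta(sigma) is proportional to exp(beta * H_N(sigma)).
    log_pi = np.array([
        math.lgamma(N + 1) - sum(math.lgamma(k + 1) for k in s)
        + beta * sum(k * k for k in s) / (2 * N)
        for s in states
    ])
    pi = np.exp(log_pi - log_pi.max())
    pi /= pi.sum()
    P = np.zeros((len(states), len(states)))
    for s in states:
        for i in range(3):
            for j in range(3):
                if i == j or s[i] == 0:
                    continue
                t = list(s)
                t[i] -= 1
                t[j] += 1
                delta_h = (s[j] - s[i] + 1) / N  # H_N(new) - H_N(old)
                accept = min(1.0, math.exp(beta * delta_h))
                # pick a spin of colour i, propose colour j with probability 1/2
                P[index[s], index[tuple(t)]] = (s[i] / N) * 0.5 * accept
        P[index[s], index[s]] = 1.0 - P[index[s]].sum()
    return states, pi, P

def spectral_gap(P, pi):
    """1 - lambda_2 via the symmetrised matrix D^(1/2) P D^(-1/2)."""
    d = np.sqrt(pi)
    A = d[:, None] * P / d[None, :]
    eig = np.sort(np.linalg.eigvalsh((A + A.T) / 2))
    return 1.0 - eig[-2]

def cut_conductance(states, pi, P):
    """Phi_S and pi(S) for S = {colour 1 is the strict majority colour}."""
    inside = np.array([s[0] > max(s[1], s[2]) for s in states])
    mass = pi[inside].sum()
    flow = (pi[inside][:, None] * P[np.ix_(inside, ~inside)]).sum()
    return flow / mass, mass

beta = 3.5  # illustrative low-temperature value, beta > 4 log 2
results = {}
for N in (6, 12, 18, 24):
    states, pi, P = count_chain(N, beta)
    gap = spectral_gap(P, pi)
    phi_S, mass = cut_conductance(states, pi, P)
    results[N] = (gap, phi_S, mass)
    print(f"N={N:2d}  gap={gap:.3e}  2*Phi_S={2 * phi_S:.3e}  pi(S)={mass:.3f}")
```

Since $\pi(S)\leq\frac{1}{2}$ by symmetry, the upper bound $\Gamma(P)\leq 2\Phi\leq 2\Phi_{S}$ applies, so the printed gap is always below the printed cut value. These sizes are far too small to exhibit the asymptotic exponential rate, so the output should be read only as an illustration of the mechanism.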

As a consequence, if EES is fast on the low temperature Potts model, this has to be caused by the equi-energy steps. However, an important observation is that an equi-energy step cannot switch between two states that are at very different distances from the center mode $\mathfrak{a}_{0}:=(\frac{1}{3},\frac{1}{3},\frac{1}{3})$. More precisely:

Lemma 4.6.

For each $\varepsilon>0$ and each $\varepsilon>\delta>0$ there is a number of spins $N_{0}$ such that for all $N>N_{0}$ and whenever $\sigma$ and $\tau$ satisfy $||m_{N}(\sigma)-\mathfrak{a}_{0}||_{1}<\delta$ and $||m_{N}(\tau)-\mathfrak{a}_{0}||_{1}>\varepsilon$ (where $||\cdot||_{1}$ denotes the 1-norm on $\mathcal{C}$ ) then

$$Q_{M}(\sigma,\tau)=0.$$

Here $Q_{M}$ is defined in (3.4).

Proof.

The proof mainly shows that under the given conditions the energies of $\sigma$ and $\tau$ are too far apart from each other. Indeed, observe that $Q_{M}(\sigma,\tau)>0$ requires $\sigma$ and $\tau$ to be in the same energy band. Thus there is $i\in\{0,\ldots,M-1\}$ such that $h_{i}<H_{N}(\sigma),H_{N}(\tau)\leq h_{i+1}$. Now each $\sigma_{\mathfrak{a}_{0}}$ with $m_{N}(\sigma_{\mathfrak{a}_{0}})={\mathfrak{a}_{0}}$ minimizes the energy, with $H_{N}(\sigma_{\mathfrak{a}_{0}})=\frac{N}{2}\times 3\times\frac{1}{9}=\frac{N}{6}$. On the other hand, the states where all spins point in the same direction have maximal energy $\frac{N}{2}$, cf. (3.3).

Thus, recalling that $M=dN$ , the width of the energy bands is

[TABLE]

Therefore, $\sigma$ and $\tau$ are only in the same energy band, if

[TABLE]

i.e. if the two norms $||m_{N}(\sigma)||_{2}$ and $||m_{N}(\tau)||_{2}$ satisfy

[TABLE]

Since 1-norm and 2-norm are equivalent on $\mathcal{C}$ this proves the assertion. ∎
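The energy bookkeeping behind this proof can be checked on a small example. The sketch below evaluates $H_{N}$ directly from its definition, confirms the identity $H_{N}(\sigma)=\frac{N}{2}\|m_{N}(\sigma)\|_{2}^{2}$ and the extreme values $\frac{N}{6}$ and $\frac{N}{2}$, and computes a band width under the assumption that the $M=dN$ energy bands split $[\frac{N}{6},\frac{N}{2}]$ into equal pieces (the precise definition of the levels $h_{i}$ is in Section 3 and is not reproduced here). The values $N=30$ and $d=2$ are arbitrary.

```python
import random

def energy(sigma):
    """H_N(sigma) = (1/2N) * #{(i, j) : sigma_i = sigma_j}."""
    n = len(sigma)
    return sum(1 for s in sigma for t in sigma if s == t) / (2 * n)

def proportions(sigma, q=3):
    """m_N(sigma): fraction of spins carrying each colour 1..q."""
    n = len(sigma)
    return [sum(1 for s in sigma if s == c) / n for c in range(1, q + 1)]

def energy_from_proportions(sigma):
    """(N/2) * ||m_N(sigma)||_2^2, which should coincide with H_N(sigma)."""
    n = len(sigma)
    return 0.5 * n * sum(m * m for m in proportions(sigma))

random.seed(0)
N, d = 30, 2                              # illustrative sizes
sigma = [random.randint(1, 3) for _ in range(N)]
balanced = [1, 2, 3] * (N // 3)           # m_N = (1/3, 1/3, 1/3)
aligned = [1] * N                         # all spins share one colour

print(energy(sigma), energy_from_proportions(sigma))
print(energy(balanced), energy(aligned))  # N/6 and N/2

# ASSUMPTION: M = d*N bands of equal width covering [N/6, N/2].
band_width = (energy(aligned) - energy(balanced)) / (d * N)
print(band_width, 1 / (3 * d))
```

Under that equal-width assumption the band width is $\frac{1}{3d}$, so two states in the same band have squared 2-norms of $m_{N}$ differing by at most $\frac{2}{3dN}$, which tends to zero as $N$ grows.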

We will again use a conductance argument to prove Theorem 4.1. In order to prepare it let us lift the balls $\mathfrak{B}_{\varepsilon}$ to $\Omega^{M+1}$ : For $\varepsilon>0$ let

[TABLE]

From now on we will assume that $\beta_{c}<\beta<3$. Recall that then $\mathfrak{a}_{0}$ is still a local (but not a global) maximum of the function $f$. Let us fix $\varepsilon>0$ so small that $f$ is still decreasing on $B_{\varepsilon}(\mathfrak{a}_{0})$ when we move away from the center (in particular, $\mathfrak{a}_{0}$ is the only mode of $\pi_{\beta}$ on $B_{\varepsilon}(\mathfrak{a}_{0})$). Moreover, let us fix $\delta<\varepsilon$ and $N_{0}$ so large that even with two equi-energy steps and a Metropolis step in between, a $\sigma$ with $||m_{N}(\sigma)-\mathfrak{a}_{0}||_{1}>\varepsilon$ cannot be reached from a $\tau$ with $||m_{N}(\tau)-\mathfrak{a}_{0}||_{1}<\delta$.

This can be constructed as in Lemma 4.6. Indeed, we will need the following: For $\delta^{\prime}>0$ given with $\delta<\delta^{\prime}<\varepsilon$ there is $N_{1}$ , such that if $N\geq N_{1}$ an equi-energy jump started in $m_{N}\in B_{\delta}(\mathfrak{a}_{0})$ will not leave $B_{\delta^{\prime}}(\mathfrak{a}_{0})$ . The subsequent Metropolis step can only increase the 1-distance of $m_{N}$ to $\mathfrak{a}_{0}$ by at most $1/N$ , hence $m_{N}$ is still in, say, $B_{\delta^{\prime\prime}}(\mathfrak{a}_{0})$ , for some $\delta^{\prime}<\delta^{\prime\prime}<\varepsilon$ . Finally, there is $N_{2}$ , such that if $N\geq N_{2}$ an equi-energy jump started in $m_{N}\in B_{\delta^{\prime\prime}}(\mathfrak{a}_{0})$ will not leave $B_{\varepsilon}(\mathfrak{a}_{0})$ . We will from now on always take $N\geq N_{0}:=\max\{N_{1},N_{2}\}$ .

All this is necessary because the chains $\mathcal{R}$ and $\mathcal{S}$ possibly comprise two such jumps. Next we prove

Lemma 4.7.

Let $\beta_{c}<\beta<3$ and $\varepsilon>\delta>0$ and $N_{0}$ be chosen as above. Then, there exists $c^{\prime\prime}>0$ such that for $\tilde{\pi}(x):=\prod_{i=0}^{M}\pi_{\beta_{i}}(x_{i})$ , $x\in\Omega^{M+1}$ , we have

[TABLE]

Proof.

According to our above analysis $\mathfrak{a}_{0}$ is a local (but not a global) maximum point of the distribution of $m_{N}$ under $\pi_{\beta}=\pi_{\beta_{M}}=:\pi_{M}$ , if $\beta_{c}<\beta<3$ . Therefore

[TABLE]

for $c^{\prime\prime}>0$ chosen appropriately. The proof of this statement follows the concepts of the proof of Proposition 4.5. This fact is easily transferred to the measure $\tilde{\pi}$ due to its product structure. ∎

With the help of this lemma we will be able to establish that the set $\mathfrak{B}^{M+1}_{\varepsilon}$ constitutes a "bad cut" for the Markov chain $\mathcal{S}$.

Proposition 4.8.

Consider the Markov chain $\mathcal{S}$ on the state space $\Omega_{EES}:=\Omega^{M+1}\times\mathfrak{M}$ where again $\mathfrak{M}$ is the space of all $(M+1)\times 3^{N}$ matrices. Here the first coordinate keeps record of the current state of $M+1$ chains, while the second coordinate tracks the filling of the matrix $\mathcal{M}$ .

If $\beta_{c}<\beta<3$ and the second coordinate of $\mathcal{S}$ is equal to $M_{0}$, its conductance $\Phi(\mathcal{S})$ satisfies

[TABLE]

for some $c^{\prime\prime}>0$ , if $N$ is large enough.

Proof.

Since $\beta_{c}<\beta$ clearly $\pi_{M}(\{\sigma:m_{N}(x_{M})\in B_{\varepsilon}(\mathfrak{a}_{0})\})<\frac{1}{2}$ , when $N$ is large enough. Thus also

[TABLE]

for $N$ large enough. Thus

[TABLE]

Here we used, of course, our previous estimates together with our construction of $\delta$ . Starting in $B_{\delta}(\mathfrak{a}_{0})$ the combination of an equi-energy jump, a Metropolis move, and another equi-energy jump will not leave $B_{\varepsilon}(\mathfrak{a}_{0})$ according to Lemma 4.6 and the construction of $\delta$ and $N_{0}$ . ∎

Now we have prepared everything to prove Theorem 4.1.

Proof of Theorem 4.1.

Just note that, conditioned on the event that the second coordinate of $\mathcal{S}$ is in $M_{0}$ (which it cannot leave anymore), $\mathcal{S}$ is reversible with respect to $\pi$. Hence we can apply Theorem 4.4 together with the conductance estimate of Proposition 4.8 to obtain the desired result. ∎

Remark

Note that a similar proof would not work in the Curie-Weiss model, because there the "center point" $(1/2,1/2)$, i.e. the set of $\sigma$’s in which both spin directions occur equally often, is always a local minimum of the Gibbs measure at low temperatures.

Moreover, note that we could adapt the proof to different values of $q\geq 3$ as mentioned above.

Finally, a similar argument should work for "more disordered" models, such as Potts models on sufficiently dense Erdős-Rényi graphs, as e.g. analyzed in Kabluchko et al., (2019) for $q=2$.

Remark

We have just seen that EES mixes slowly on the 3-state Potts model at $\beta_{c}<\beta<3$, even when we know the energies of the entire set of states. We also argued that at high temperatures these temperature steps are not necessary, because the Metropolis chain itself already converges rapidly. However, one may doubt whether there are any reasonable models in which EES converges rapidly while the Metropolis algorithm does not. The point is that, if we have not filled $\mathcal{M}$ almost entirely, a temperature jump may provide the desired tunneling effect, but to a rather unfavorable point of the target distribution.

Bibliography


1. Andrieu, C., Jasra, A., Doucet, A., and Del Moral, P. (2008). A note on convergence of the equi-energy sampler. Stoch. Anal. Appl., 26(2):298–312.
2. Baragatti, M., Grimaud, A., and Pommeret, D. (2013). Parallel tempering with equi-energy moves. Stat. Comput., 23(3):323–339.
3. Berg, B. A. (2000). Introduction to multicanonical Monte Carlo simulations. In Monte Carlo methods (Toronto, ON, 1998), volume 26 of Fields Inst. Commun., pages 1–24. Amer. Math. Soc., Providence, RI.
4. Berg, B. A. and Neuhaus, T. (1992). Multicanonical ensemble: A new approach to simulate first-order phase transitions. Phys. Rev. Lett., 68:9–12.
5. Bhatnagar, N. and Randall, D. (2004). Torpid mixing of simulated tempering on the Potts model. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 478–487 (electronic), New York. ACM.
6. Bhatnagar, N. and Randall, D. (2016). Simulated tempering and swapping on mean-field models. J. Stat. Phys., 164(3):495–530.
7. Cuff, P., Ding, J., Louidor, O., Lubetzky, E., Peres, Y., and Sly, A. (2012). Glauber dynamics for the mean-field Potts model. J. Stat. Phys., 149(3):432–477.
8. Diaconis, P. and Stroock, D. (1991). Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab., 1(1):36–61.
