Fast algorithms at low temperatures via Markov chains
Zongchen Chen, Andreas Galanis, Leslie Ann Goldberg, Will Perkins, James Stewart, Eric Vigoda

TL;DR
This paper introduces a new Markov chain approach for polymer models that achieves rapid mixing at low temperatures, enabling efficient sampling and approximation algorithms for the Potts and hard-core models on bounded-degree graphs.
Contribution
The authors develop a Markov chain method that bypasses complex zero-free region analysis, providing faster sampling algorithms with optimal running times for certain statistical physics models.
Findings
Achieves $O(n \, \log n)$ sampling time for the Potts model.
Achieves $O(n^2 \, \log n)$ sampling time for the hard-core model.
Proves polynomial mixing time for spin Glauber dynamics in restricted state spaces.
Abstract
We define a discrete-time Markov chain for abstract polymer models and show that under sufficient decay of the polymer weights, this chain mixes rapidly. We apply this Markov chain to polymer models derived from the hard-core and ferromagnetic Potts models on bounded-degree (bipartite) expander graphs. In this setting, Jenssen, Keevash and Perkins (2019) recently gave an FPTAS and an efficient sampling algorithm at sufficiently high fugacity and low temperature respectively. Their method is based on using the cluster expansion to obtain a complex zero-free region for the partition function of a polymer model, and then approximating this partition function using the polynomial interpolation method of Barvinok. Our approach via the polymer model Markov chain circumvents the zero-free analysis and the generalization to complex parameters, and leads to a sampling algorithm with a fast…
Fast algorithms at low temperatures via Markov chains
(These results were announced in preliminary form, without proofs, as a brief abstract in the proceedings of APPROX/RANDOM 2019.)
Zongchen Chen School of Computer Science, Georgia Institute of Technology. Research supported in part by NSF grants CCF-1617306 and CCF-1563838.
Andreas Galanis The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) ERC grant agreement no. 334828. The paper reflects only the authors’ views and not the views of the ERC or the European Commission. The European Union is not liable for any use that may be made of the information contained therein. Authors’ address: Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK.
Leslie Ann Goldberg†
Will Perkins Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago. Supported in part by NSF grants DMS-1847451 and CCF-1934915. Part of this work was done while WP was visiting the Simons Institute for the Theory of Computing.
James Stewart†
Eric Vigoda∗
(April 13, 2021)
Abstract
Efficient algorithms for approximate counting and sampling in spin systems typically apply in the so-called high-temperature regime, where the interaction between neighboring spins is “weak”. Instead, recent work of Jenssen, Keevash and Perkins yields polynomial-time algorithms in the low-temperature regime on bounded-degree (bipartite) expander graphs using polymer models and the cluster expansion.
In order to speed up these algorithms (so the exponent in the run time does not depend on the degree bound) we present a Markov chain for polymer models and show that it is rapidly mixing under exponential decay of polymer weights. This yields, for example, an $O(n \log n)$-time sampling algorithm for the low-temperature ferromagnetic Potts model on bounded-degree expander graphs. Combining our results for the hard-core and Potts models with Markov chain comparison tools, we obtain polynomial mixing time for Glauber dynamics restricted to appropriate portions of the state space.
1 Introduction
The hard-core model from statistical physics is defined on the set of independent sets of a graph $G$, where the independent sets are weighted by a fugacity $\lambda > 0$. The associated Gibbs distribution $\mu_G$ is defined as follows, for an independent set $I$:
$$\mu_G(I) = \frac{\lambda^{|I|}}{Z_G(\lambda)},$$
where $Z_G(\lambda) = \sum_{I \in \mathcal{I}(G)} \lambda^{|I|}$ is the hard-core partition function (also called the independence polynomial), $\mathcal{I}(G)$ is the set of independent sets of $G$, and $\lambda$ is the fugacity.
In applications, there are two important computational tasks associated to a spin model such as the hard-core model. Given an error parameter $\epsilon$, an $\epsilon$-approximate counting algorithm outputs a number $\hat{Z}$ so that $e^{-\epsilon} \hat{Z} \le Z_G(\lambda) \le e^{\epsilon} \hat{Z}$, and an $\epsilon$-approximate sampling algorithm outputs a random sample $I$ with distribution $\hat{\mu}$ so that the total variation distance satisfies $\|\mu_G - \hat{\mu}\|_{TV} \le \epsilon$.
While classical statistical physics is most interested in studying the hard-core model on the integer lattice $\mathbb{Z}^d$, the perspective of computer science is to consider wider families of graphs, such as the set of all graphs, all graphs of maximum degree $\Delta$, or all bipartite graphs of maximum degree $\Delta$.
Almost all proven efficient algorithms for approximate counting and sampling from the hard-core model work for low fugacities (the weak interaction regime, akin to the high temperature regime of the Potts model). In the high temperature regime there are at least three distinct algorithmic approaches to approximate counting and sampling: Markov chains, correlation decay, and polynomial interpolation. One striking advantage of the Markov chain approach is that the algorithms are much faster and simpler than the algorithms from the other approaches. In particular, it is common for a Markov chain sampling algorithm to run in time $O(n \log n)$, e.g., see [13, 14], while typical running times for algorithms based on correlation decay [31, 25] and polynomial interpolation [1] are $n^{O(\log \Delta)}$, where $\Delta$ is the maximum degree of the graph.
In general there are no known efficient algorithms at low temperatures (high fugacities), but recently efficient algorithms have been developed for some special classes of graphs including subsets of $\mathbb{Z}^d$ [18], random regular bipartite graphs, and bipartite expander graphs in general [20, 24]. What these bipartite graphs have in common is that for large enough $\lambda$, typical independent sets drawn from $\mu_G$ align closely with one side or the other of the bipartition (the two ground states). This phenomenon is related to the phase transition phenomenon in infinite graphs, and implies the exponentially slow mixing time of local Markov chains [5, 16, 26]. The algorithms introduced in [18] exploit this phenomenon by expressing the partition function in terms of deviations from the two ground states, and then using a truncation of a convergent series expansion (the Taylor series or the cluster expansion) to approximate the log partition function. In statistical physics this is called a perturbative approach, and while in general it does not work in the largest possible range of parameter space, when it does work it gives a very detailed probabilistic understanding of the model [28, 7, 10].
To apply the so-called perturbative approach at low temperatures, one rewrites the original spin model as a new model in which single spin interactions are replaced by the interaction of connected components representing deviations from a chosen ground state. Such models are known in general as abstract polymer models [22] (see Section 1.1 for the polymer models we consider here), and have long been used in statistical physics to understand phase transitions. In this paper, we show that once a low-temperature spin model has been transformed into a polymer model, Markov chains once again become an effective algorithmic tool. Using this approach we obtain nearly linear and quadratic time sampling algorithms for low temperature models on expander graphs in cases where only $n^{O(\log \Delta)}$-time algorithms were previously known.
1.1 Subset polymer models
Abstract polymer models, as defined by Kotecký and Preiss [22], are an important tool in studying the equilibrium phases of statistical physics models on lattices, see, e.g., [23, 7] among many others. (See also the relevant notion of ‘animal models’ by Dobrushin [10].) The paper [4] has a more detailed history of their use in statistical physics and combinatorics. Recently, polymer models have been used to develop efficient deterministic algorithms for sampling and approximating the partition functions of statistical physics models on lattices [18] and expander graphs [20, 24] at low temperatures, the regime in which Markov chains like the Glauber dynamics are known to mix slowly.
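We will study the following class of abstract polymer models, known as subset polymer models (defined by Gruber and Kunz [17]). We begin by describing the relevant polymers: start with a finite host graph $G$ and a set $[q] = \{0, \dots, q-1\}$ of spins. For each vertex $v \in V(G)$, there is a ground-state spin $g_v \in [q]$. A polymer $\gamma$ consists of a connected set of vertices together with an assignment $\sigma_\gamma$ of spins from $[q] \setminus \{g_v\}$ to each vertex $v \in \gamma$ (we abuse notation and use $\gamma$ to denote both the polymer and the associated set of vertices). The size of a polymer, $|\gamma|$, is the number of vertices in $\gamma$. The set of all polymers is $\mathcal{P}(G)$.
A polymer model on $G$ consists of a set $\mathcal{C}(G) \subseteq \mathcal{P}(G)$ of ‘allowed’ polymers, and a non-negative weight $w_\gamma$ for each polymer $\gamma \in \mathcal{C}(G)$. We denote this model by $(\mathcal{C}(G), w)$. Two polymers $\gamma$ and $\gamma'$ are called ‘compatible’ (written $\gamma \sim \gamma'$) if their distance in the host graph is at least $2$; otherwise they are ‘incompatible’ (written $\gamma \nsim \gamma'$). The state space of allowable configurations is $\Omega = \{\Gamma \subseteq \mathcal{C}(G) : \gamma \sim \gamma' \text{ for all distinct } \gamma, \gamma' \in \Gamma\}$.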
The partition function of the polymer model is
$$Z(G) = \sum_{\Gamma \in \Omega} \prod_{\gamma \in \Gamma} w_\gamma,$$
where the empty set of polymers contributes $1$ to the partition function. The Gibbs measure $\mu_G$ is the probability distribution on $\Omega$ given by
$$\mu_G(\Gamma) = \frac{\prod_{\gamma \in \Gamma} w_\gamma}{Z(G)}.$$
Note that the polymer model is in fact a hard-core model on the ‘incompatibility graph’ of $\mathcal{C}(G)$, where two polymers are joined by an edge if they are incompatible, with non-uniform fugacities given by the weights $w_\gamma$. The geometry inherited from the host graph and the sizes of the polymers add further structure to the model.
Example 1**.**
One instance of a polymer model is the hard-core model itself: polymers are single vertices of the graph $G$, labeled with ‘1’ (for occupied) against a ground state ‘0’ (for unoccupied). Each polymer (vertex) $v$ comes with the weight function $w_v = \lambda$. Then the set of allowable polymer configurations is exactly the set of independent sets of $G$, and so the polymer model partition function is exactly the partition function of the hard-core model on $G$.
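Example 1 can be checked by brute force on a small graph. The sketch below is only an illustration: the function name `polymer_partition_function`, the 4-cycle, and the value of the fugacity are hypothetical choices, and the enumeration is exponential in the number of polymers. It sums the product of weights over all pairwise-compatible sets of polymers, with single vertices as polymers and non-adjacency as compatibility.

```python
from itertools import combinations

def polymer_partition_function(polymers, weight, compatible):
    """Sum, over all pairwise-compatible sets of polymers, of the product of weights."""
    total = 0.0
    for r in range(len(polymers) + 1):
        for subset in combinations(polymers, r):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                term = 1.0  # the empty configuration contributes 1
                for p in subset:
                    term *= weight(p)
                total += term
    return total

# Hypothetical toy instance: the 4-cycle 0-1-2-3-0 with fugacity lambda = 2.
EDGES = {(0, 1), (1, 2), (2, 3), (3, 0)}
LAM = 2.0

def adjacent(u, v):
    return (u, v) in EDGES or (v, u) in EDGES

# Single-vertex polymers are compatible exactly when the vertices are non-adjacent.
Z_C4 = polymer_partition_function(
    [0, 1, 2, 3],
    weight=lambda v: LAM,
    compatible=lambda u, v: not adjacent(u, v),
)
```

On the 4-cycle the independent sets are the empty set, the four single vertices, and the two diagonal pairs, so the result should agree with the independence polynomial $1 + 4\lambda + 2\lambda^2$ evaluated at the chosen fugacity.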
Example 2**.**
A second instance of a polymer model is related to the ferromagnetic $q$-color Potts model on a graph $G$ (see Definition 8 below). Fix a color $g \in [q]$ to be the ground state color, and define polymers to be connected subgraphs of $G$ of size at most $M$, with vertices labeled by the remaining colors $[q] \setminus \{g\}$. A polymer $\gamma$ has weight function $w_\gamma = e^{-\beta B(\gamma)}$ where $B(\gamma)$ is the number of bichromatic edges in $\gamma$ plus the size of the edge boundary of $\gamma$ in $G$. A configuration $\Gamma$ of compatible polymers maps to a unique Potts configuration $\sigma$ in which all connected components of non-$g$-colored vertices have size at most $M$, and the weight of $\sigma$ in the Potts model is exactly the product of the weight functions of the polymers. The polymer model partition function $Z(G)$, with an appropriate choice of $M$, represents the contribution to the Potts model partition function of colorings where color $g$ ‘dominates’, see also Section 1.3 for more details.
As with the hard-core model, there are two main computational problems associated to a polymer model: approximate sampling from $\mu_G$ and approximate counting of $Z(G)$. We will approach them both via Markov chain algorithms. In general we will be interested in families of polymer models defined on classes of graphs. We denote such a family $(\mathcal{C}, w)$, where for each graph $G \in \mathcal{G}$, $(\mathcal{C}(G), w)$ is a polymer model. We will always use $n$ to denote the number of vertices of a graph $G$.
We consider two conditions on the weight functions and give their algorithmic consequences.
Definition 1**.**
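A polymer model $(\mathcal{C}, w)$ satisfies the polymer mixing condition if there exists $\theta \in (0, 1)$ such that
$$\sum_{\gamma' \nsim \gamma} |\gamma'| \, w_{\gamma'} \le \theta \, |\gamma| \qquad (2)$$
for all $\gamma \in \mathcal{C}(G)$ and all $G \in \mathcal{G}$.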
We postpone the formal definition of mixing time to Section 2.2 and state our first main result here.
Theorem 2**.**
Suppose that a polymer model $(\mathcal{C}, w)$ satisfies the polymer mixing condition (2). Then for each $G \in \mathcal{G}$ there is a Markov chain making single polymer updates with stationary distribution $\mu_G$ and mixing time $T_{\mathrm{mix}}(\epsilon) = O(n \log(n/\epsilon))$.
Theorem 2 on its own does not guarantee an efficient algorithm for sampling from because the Markov chain only yields an efficient sampling algorithm if we can implement each step efficiently. We will show that under a stronger condition we can do this.
Definition 3**.**
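A polymer model $(\mathcal{C}, w)$ is said to be computationally feasible if, for each $G \in \mathcal{G}$ and each $\gamma \in \mathcal{P}(G)$, we can determine, in time polynomial in $|\gamma|$, whether $\gamma \in \mathcal{C}(G)$, and compute $w_\gamma$ if it is.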
Definition 4**.**
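A polymer model $(\mathcal{C}, w)$ with $q$ spins on a class $\mathcal{G}$ of graphs of maximum degree $\Delta$ satisfies the polymer sampling condition with constant $\tau \ge 5 + 3 \log((q-1)\Delta)$ if
$$w_\gamma \le e^{-\tau |\gamma|} \qquad (3)$$
for all $\gamma \in \mathcal{C}(G)$ and all $G \in \mathcal{G}$.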
We have the following theorem.
Theorem 5**.**
If a computationally feasible polymer model $(\mathcal{C}, w)$ satisfies the polymer sampling condition (3) then for all $\epsilon > 0$ there is an $\epsilon$-approximate sampling algorithm for $\mu_G$ with running time $O(n \log(n/\epsilon) \log(1/\epsilon))$.
Note that the polymer sampling condition required by Theorem 5 is in general more demanding than the zero-freeness required by cluster-expansion algorithms, but, as Theorem 5 demonstrates, it leads to faster algorithms. Finally, we can use the sampling algorithm and simulated annealing to give a fully polynomial time randomized approximation scheme (FPRAS) for computing the partition function of polymer models.
Theorem 6**.**
If a computationally feasible polymer model satisfies the polymer sampling condition (3) then for all there is a randomized -approximate counting algorithm for with running time and success probability at least .
Fernández, Ferrari, and Garcia [15] introduced a condition very similar to the polymer mixing condition in the setting of polymer models on $\mathbb{Z}^d$. Their objective was to derive probabilistic properties of polymer models directly, without going through the combinatorics and complex analysis inherent in the cluster expansion for the log partition function. They introduced a continuous time stochastic process whose stationary distribution was the infinite volume Gibbs measure of their polymer model, and their version of condition (2) implied an exponentially fast rate of convergence of this process. They remarked that such an approach had the potential to be an efficient computational tool.
Here we take an algorithmic point of view, and use the polymer mixing and sampling conditions to show that a simple discrete time Markov chain mixes rapidly and can be used to design efficient sampling and approximation algorithms. Our approach differs from that of [15] in that while they are interested primarily in the probabilistic properties of spin models on $\mathbb{Z}^d$, we are interested in algorithmic problems involving spin models on general families of graphs. Our setting of discrete time processes on finite graphs is also more suitable to studying algorithmic questions. Our work confirms the central point of [15]: that complex analysis and absolute convergence of the cluster expansion are not necessary to derive many important properties of a polymer model.
1.2 Applications
We apply our results for subset polymer models to two specific examples: the ferromagnetic Potts model and the hard-core model on expander graphs. To state these results we need some definitions.
Definition 7**.**
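Let $\alpha > 0$. A graph $G = (V, E)$ is an $\alpha$-expander graph if for all $S \subseteq V$ with $|S| \le |V|/2$, we have $e(S, S^c) \ge \alpha |S|$, where $S^c = V \setminus S$ and $e(S, S^c)$ is the number of edges exiting the set $S$.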
Definition 8**.**
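The $q$-color ferromagnetic Potts model with parameter $\beta > 0$ is a random assignment $\sigma$ of $q$ colors to the vertices of a graph $G$ defined by
$$\mu_{G,\beta}(\sigma) = \frac{e^{-\beta \, m(G, \sigma)}}{Z_G(\beta)},$$
where $m(G, \sigma)$ is the number of bichromatic edges of $G$ under the coloring $\sigma$ and $Z_G(\beta) = \sum_{\sigma} e^{-\beta \, m(G, \sigma)}$ is the Potts model partition function. The parameter $\beta$ is known as the inverse temperature.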
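To make Definition 8 concrete, here is a minimal brute-force sketch. It assumes the usual ferromagnetic form in which a coloring receives weight $e^{-\beta m}$, with $m$ its number of bichromatic edges; the displayed formula is not legible in this copy, so treat that form as an assumption. The function names and the triangle example are hypothetical.

```python
from itertools import product
from math import exp

def bichromatic_edges(edges, coloring):
    """Number of edges whose endpoints receive different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])

def potts_partition_function(n, edges, q, beta):
    """Brute-force sum of exp(-beta * m) over all q^n colorings (tiny graphs only)."""
    return sum(
        exp(-beta * bichromatic_edges(edges, sigma))
        for sigma in product(range(q), repeat=n)
    )

def potts_probability(n, edges, q, beta, sigma):
    """Gibbs probability of the coloring sigma under the assumed weight form."""
    weight = exp(-beta * bichromatic_edges(edges, sigma))
    return weight / potts_partition_function(n, edges, q, beta)

# Hypothetical toy instance: a triangle with q = 2 colors.
TRIANGLE = [(0, 1), (1, 2), (0, 2)]
Z_TRIANGLE = potts_partition_function(3, TRIANGLE, q=2, beta=1.0)
```

On the triangle with two colors there are two monochromatic colorings and six colorings with exactly two bichromatic edges, so the sum is $2 + 6e^{-2\beta}$. As $\beta$ grows the monochromatic ground states carry almost all of the mass, which is the low-temperature picture the polymer representation exploits.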
Jenssen, Keevash, and Perkins [20] gave an FPTAS and polynomial-time sampling algorithm for the Potts model on expander graphs, with an algorithm based on the cluster expansion and Barvinok’s method of polynomial interpolation. Under essentially the same conditions on the parameters we give a Markov chain based sampling algorithm with near linear running time.
Theorem 9**.**
Suppose , are integers and is a real. Then for and any , there is an -approximate sampling algorithm for the -state ferromagnetic Potts model with parameter on all -vertex -expander graphs of maximum degree with running time . There is also an -approximate counting algorithm with running time and success probability at least 3/4.
Note that, if the desired error satisfies , then we can simply compute the partition function by brute force in poly time. This observation combined with the above result gives an FPRAS, but we can no longer guarantee a running time of for exponentially small values of . A similar point also applies to the algorithm that we give for the hard-core model.
Definition 10**.**
Let . A bipartite graph with bipartition is a bipartite -expander if, for and all where , we have where denotes the set of vertices that are adjacent to some vertex in .
Again we give a fast Markov chain based algorithm for sampling from the hard-core model for essentially the same range of parameters for which an FPTAS is given in [20].
Theorem 11**.**
Suppose is an integer and is a real. Then for any and , there is an -approximate sampling algorithm for the hard-core model with parameter on all -vertex bipartite -expander graphs of maximum degree . There is also an -approximate counting algorithm for the hard-core model with success probability at least . Both algorithms run in time .
The extra factor in the running time of the sampling algorithm for the hard-core model as compared to the Potts model is due to the fact that the hard-core model on a bipartite graph does not in general exhibit exact symmetry between the ground states, and so we must approximate the partition functions of the even and odd dominant independent sets to sample.
We can extend these algorithms to obtain fast sampling algorithms in most situations in which a counting problem can be put in the framework of subset polymer models. For instance, we can use Theorems 5 and 6 to improve the running times of the algorithms given by [21, 24] for sampling and counting proper $q$-colorings in $\Delta$-regular bipartite graphs (for large $\Delta$). The two papers give slightly different polymer models for proper $q$-colorings on $\Delta$-regular bipartite graphs; see [21, Section 5] and [24, Section 5.2]. Section 5.2 of [24] shows that their polymer model is computationally feasible. Section 5.1 of [21] shows that their polymer model satisfies the Kotecký-Preiss condition; in fact, their proof establishes the polymer sampling condition (3). It is easy to see (by comparing the polymer weights) that the polymer model of [24] therefore also satisfies the polymer sampling condition. Thus, we get the following corollary of Theorems 5 and 6.
Corollary 12**.**
There is an absolute constant so that for all even , all and all , there is an -approximate sampling algorithm to sample a uniformly random proper -coloring from a random -regular bipartite graph running in time . Furthermore, there is a randomized -approximation algorithm for the number of proper -colorings with running time and success probability at least . For odd , there are -approximate counting and sampling algorithms that both run in time .
As with independent sets, the extra factor in the running time for odd $q$ comes from the fact that the ground states (colorings in which one side of the bipartition is assigned $\lfloor q/2 \rfloor$ colors and the other side $\lceil q/2 \rceil$ colors) are exactly symmetric only if $q$ is even.
Finally, we remark that the approximate counting algorithms for these applications based on truncating the cluster expansion can run faster than if the parameters (expansion, fugacity, inverse temperature) are high enough (see [21, Theorem 8]), but the sampling algorithms derived from this approach will not match the or sampling algorithms we obtain here.
1.3 Comparison to spin Glauber dynamics
A very natural idea to sample at low temperatures (large $\beta$ for the Potts model, large $\lambda$ for the hard-core model) is to use a single-spin update Markov chain like the Glauber dynamics, but to start in one of the ground states of the model chosen at random. For example, pick one of the $q$ colors with equal probability, then start the Potts model Glauber dynamics in the monochromatic configuration with that color. The intuition is that the Glauber dynamics will mix well within the portion of the state space close to the chosen ground state, and the randomness in the choice of ground state will ensure that an accurate sample from the full measure is obtained. Analyzing this algorithm was suggested in [18] and [20].
While we are not yet able to show that this algorithm succeeds, we make partial progress. We show that Glauber dynamics, restricted to remain in a portion of the state space, mixes rapidly (in polynomial time). It is easiest to state our result for the ferromagnetic Potts model.
For a ground state color and an integer , let be the set of -colorings of the vertices of so that every connected component of colored with the palette of colors is of size at most . The set consists of colorings that come from the valid polymer configurations from Example 2 above. In [20] it is shown that for an appropriate choice of , the set forms an “almost partition” of the set of all colorings, in that the weight of both the overlap of the almost partition and the set of colorings uncovered by the almost partition is at most under the conditions of Theorem 9. In particular, an -approximate sample from the Potts model restricted to for is enough (by symmetry) to obtain a -approximate sample from the Potts distribution (cf. Lemma 28). Using Markov chain comparison, we show in Section 5.3.1 that this can be done using the usual spin Glauber dynamics restricted to remain in .
Theorem 13**.**
Suppose , are integers and is a real. Let be a real number and . Then, for any -vertex -expander graph of maximum degree and any , for the Glauber dynamics restricted to has mixing time polynomial in and .
We remark that the polynomial bound in Theorem 13 depends on exponentially, see the relevant Theorem 25 and Section 5.3.1 for details. Theorem 13 shows that despite exponentially slow mixing of the Glauber dynamics on the full state space [3], it can still be used to obtain a polynomial-time approximate sampling algorithm. We leave for future work two important extensions that would complete the picture: 1) showing that unrestricted Glauber dynamics starting from a well chosen configuration works 2) lowering the running time to from the large polynomial we obtain in the theorem.
In Section 5, we state a general theorem (Theorem 25) comparing the polymer model dynamics to spin model dynamics as well as a specific result for the hard-core model (Theorem 29).
2 Polymer models and Markov chains
Here we compare various conditions on the weight functions of a polymer model, namely the Kotecký–Preiss [22] condition and the polymer sampling condition, and show that the latter implies the former. Then, we define the polymer Markov chain which we use to prove Theorems 2 and 5.
2.1 A comparison of the conditions on the weights
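Here we show that the polymer sampling condition (3) implies the well-known Kotecký–Preiss [22] condition:
$$\sum_{\gamma' \nsim \gamma} e^{|\gamma'|} \, w_{\gamma'} \le |\gamma| \qquad \text{for all } \gamma \in \mathcal{C}(G).$$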
To see the implication, we use a lemma of Borgs, Chayes, Kahn, and Lovász.
Lemma 14** ([6]).**
Let have maximum degree and let . The number of connected induced subgraphs of of size containing is at most .
Now consider a polymer model satisfying (3) with constant . Fix . We have that
[TABLE]
In order to account for all of the polymers that we sum over in the above, we consider the connected induced subgraphs of of size that contain , and the assignments to them of colours. Using Lemma 14, we therefore obtain that
[TABLE]
so the Kotecký–Preiss condition is satisfied.
The Kotecký–Preiss condition, in turn, implies the polymer mixing condition (2) with $\theta = 1/e$, since $e^{x} \ge e \, x$ for $x \ge 1$. For the same reason (since $e^{x}$ gets much bigger than $x$), it is easy to see that the polymer mixing condition is weaker than the Kotecký–Preiss condition.
2.2 The polymer Markov chain
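For each $v \in V(G)$, let $A(v)$ denote the collection of all polymers $\gamma \in \mathcal{C}(G)$ containing $v$ and let $a(v) = \sum_{\gamma \in A(v)} w_\gamma$. By applying (2) to the smallest $\gamma$ containing $v$ we have $a(v) \le \theta < 1$ for all $v$. Define the probability distribution $\nu_v$ on $A(v) \cup \{\emptyset\}$ by $\nu_v(\gamma) = w_\gamma$ for $\gamma \in A(v)$ and $\nu_v(\emptyset) = 1 - a(v)$.
The polymer dynamics on $\Omega$ are defined by the following transition rule from a configuration $\Gamma_t$ to a configuration $\Gamma_{t+1}$:
Polymer Dynamics
1. Choose $v \in V(G)$ uniformly at random. Let $\gamma_v$ be the polymer in $\Gamma_t \cap A(v)$ if $\Gamma_t \cap A(v) \neq \emptyset$ and let $\gamma_v = \emptyset$ otherwise. Note that $\gamma_v$ is well defined since $\Gamma_t$ can have at most one polymer containing $v$.
2. Mutually exclusively do the following:
- With probability $1/2$, let $\Gamma_{t+1} = \Gamma_t \setminus \{\gamma_v\}$.
- With probability $1/2$, sample $\gamma$ from $\nu_v$, set $\Gamma_{t+1} = \Gamma_t \cup \{\gamma\}$ if this is in $\Omega$ and set $\Gamma_{t+1} = \Gamma_t$ otherwise.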
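The transition rule can be sketched in Python for a toy model with a single non-ground spin, so that a polymer is just a connected vertex set. Everything concrete here is an assumption made for illustration: the two moves are each taken with probability 1/2, the proposal draws a polymer containing the chosen vertex with probability equal to its weight, and the path graph, weights, and function names are hypothetical. `transition_probs` returns the exact one-step law, which makes it possible to check detailed balance numerically.

```python
import random
from itertools import combinations

def compatible(g1, g2, adj):
    """Compatible = graph distance at least 2: disjoint, and no edge between them."""
    if g1 & g2:
        return False
    return not any(u in adj[v] for v in g1 for u in g2)

def all_configurations(weights, adj):
    """Every pairwise-compatible set of allowed polymers (exponential; toy use only)."""
    polys = list(weights)
    return [
        frozenset(s)
        for r in range(len(polys) + 1)
        for s in combinations(polys, r)
        if all(compatible(a, b, adj) for a, b in combinations(s, 2))
    ]

def mass(config, weights):
    """Unnormalised Gibbs weight: product of the polymer weights."""
    m = 1.0
    for g in config:
        m *= weights[g]
    return m

def transition_probs(config, vertices, adj, weights):
    """Exact one-step law of the dynamics from `config` (a frozenset of polymers)."""
    probs = {}
    n = len(vertices)

    def add(state, p):
        probs[state] = probs.get(state, 0.0) + p

    for v in vertices:
        # Removal move (assumed probability 1/2): drop the polymer containing v, if any.
        add(frozenset(g for g in config if v not in g), 0.5 / n)
        # Addition move (assumed probability 1/2): propose gamma containing v w.p. w_gamma.
        leftover = 1.0
        for g, w in weights.items():
            if v in g:
                leftover -= w
                ok = all(compatible(g, h, adj) for h in config)
                add(config | {g} if ok else config, 0.5 * w / n)
        add(config, 0.5 * leftover / n)  # the proposal was "no polymer"
    return probs

def polymer_step(config, vertices, adj, weights, rng):
    """Sample the next configuration from the exact one-step law."""
    probs = transition_probs(config, vertices, adj, weights)
    states = list(probs)
    return rng.choices(states, weights=[probs[s] for s in states])[0]

# Hypothetical toy model: path 0-1-2 with five allowed polymers.
ADJ = {0: {1}, 1: {0, 2}, 2: {1}}
VERTICES = [0, 1, 2]
WEIGHTS = {
    frozenset({0}): 0.2, frozenset({1}): 0.2, frozenset({2}): 0.2,
    frozenset({0, 1}): 0.05, frozenset({1, 2}): 0.05,
}
```

A run such as `state = frozenset()` followed by repeated `state = polymer_step(state, VERTICES, ADJ, WEIGHTS, random.Random(0))` walks over the seven valid configurations of this toy model. Building the full proposal table per step is wasteful and is done here only for clarity; avoiding it is the point of the single polymer sampler discussed later in the text.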
Note that the polymer dynamics are aperiodic, since there are self-loops, and irreducible since we can transition from any $\Gamma$ to any $\Gamma'$ (e.g., via the empty set). Since the polymer dynamics are finite, irreducible, and aperiodic, they are also ergodic. Next, we observe that the stationary distribution of the polymer dynamics is $\mu_G$ by checking detailed balance. Note that each transition of the dynamics changes a configuration by at most one polymer $\gamma$; let $\Gamma' = \Gamma \cup \{\gamma\}$. Then
$$\mu_G(\Gamma') P(\Gamma', \Gamma) = \mu_G(\Gamma') \frac{|\gamma|}{2n} = \mu_G(\Gamma) \frac{|\gamma|}{2n} w_\gamma = \mu_G(\Gamma) P(\Gamma, \Gamma'),$$
where $P$ is the transition matrix of the polymer dynamics, and so $\mu_G$ is the stationary distribution.
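We now formally define the mixing time. If $\mathcal{M}$ is an ergodic Markov chain with transition matrix $P$ and stationary distribution $\mu$ then the mixing time of $\mathcal{M}$ from a state $x$ is given by
$$T_x(\epsilon) = \min\{\, t > 0 : \|P^{t}(x, \cdot) - \mu\|_{TV} \le \epsilon \,\},$$
where $\|\mu - \nu\|_{TV}$ denotes the total variation distance between distributions $\mu$ and $\nu$. The mixing time of $\mathcal{M}$ is given by $T_{\mathrm{mix}}(\epsilon) = \max_x T_x(\epsilon)$. We will write $T_{\mathrm{mix}}^{\mathcal{M}}(\epsilon)$ below if we need to emphasize which Markov chain we refer to.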
2.3 Proof of Theorems 2 and 5
Proof.
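We will show that under condition (2) the mixing time of the polymer dynamics is $O(n \log(n/\epsilon))$ by applying the path coupling technique. We define a metric $D(\cdot, \cdot)$ on $\Omega$ by setting $D(\Gamma, \Gamma') = 1$ if $\Gamma' = \Gamma \cup \{\gamma\}$ or $\Gamma = \Gamma' \cup \{\gamma\}$ for a polymer $\gamma$ and extending this as a shortest path metric; i.e., $D(\Gamma, \Gamma') = |\Gamma \triangle \Gamma'|$ for any $\Gamma, \Gamma' \in \Omega$, where $\triangle$ denotes the symmetric difference of two sets.
Now suppose we couple two chains $X_t$ and $Y_t$ by attempting the same updates in both chains at each step. Suppose that $X_t = Y_t \cup \{\gamma\}$ for some polymer $\gamma$. With probability $|\gamma|/(2n)$ we pick $v \in \gamma$ and remove $\gamma$, which yields $X_{t+1} = Y_{t+1}$. On the other hand, we may attempt to add a polymer $\gamma'$ so that $Y_t \cup \{\gamma'\} \in \Omega$ but $X_t \cup \{\gamma'\} \notin \Omega$. That is, $\gamma' \nsim \gamma$ and $\gamma' \sim \gamma''$ for all $\gamma'' \in Y_t$. This occurs with probability at most $\frac{|\gamma'|}{2n} w_{\gamma'}$ and in this case $D(X_{t+1}, Y_{t+1}) = 2$. Putting these together we can bound
$$\mathbb{E}[D(X_{t+1}, Y_{t+1}) \mid X_t, Y_t] \le 1 - \frac{|\gamma|}{2n} + \sum_{\gamma' \nsim \gamma} \frac{|\gamma'|}{2n} w_{\gamma'}.$$
Using (2) we have $\sum_{\gamma' \nsim \gamma} |\gamma'| \, w_{\gamma'} \le \theta |\gamma|$ and so
$$\mathbb{E}[D(X_{t+1}, Y_{t+1}) \mid X_t, Y_t] \le 1 - \frac{(1 - \theta)|\gamma|}{2n} \le 1 - \frac{1 - \theta}{2n}.$$
By the path coupling lemma (see [12, Section 6]), and with $W$ denoting the diameter of $\Omega$ under $D$, we have that the mixing time is at most $\frac{2n}{1 - \theta} \log(W/\epsilon) = O(n \log(n/\epsilon))$, using that $W \le 2n$. This finishes the proof. ∎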
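As a rough numerical illustration of how the path coupling bound scales, the sketch below evaluates a bound of the standard form $\log(W/\epsilon)/\alpha$. It rests on two assumptions that should be checked against the proof: a one-step contraction rate $\alpha = (1-\theta)/(2n)$, and a diameter bound $W \le 2n$ (a configuration has at most $n$ polymers, since its polymers are vertex-disjoint). The function name is hypothetical.

```python
from math import ceil, log

def path_coupling_bound(n, theta, eps):
    """Upper bound on the mixing time of the form log(W / eps) / alpha."""
    alpha = (1.0 - theta) / (2.0 * n)  # assumed one-step contraction rate
    diameter = 2 * n                   # assumed bound on the diameter W
    return ceil(log(diameter / eps) / alpha)

# Doubling n slightly more than doubles the bound, as expected for O(n log(n / eps)).
BOUND_100 = path_coupling_bound(100, theta=0.5, eps=0.01)
BOUND_200 = path_coupling_bound(200, theta=0.5, eps=0.01)
```

The bound counts updates of the polymer chain, not elementary operations; the cost of carrying out each update is handled separately.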
To prove Theorem 5 we will show that a single update of the polymer dynamics can be computed in constant expected time. Assume that the polymer model is computationally feasible and that the polymer sampling condition (3) holds with constant $\tau \ge 5 + 3 \log((q-1)\Delta)$. We will use the following algorithm. Let $r = \tau - 2 - \log((q-1)\Delta)$ and let $A_k(v) = \{\gamma \in A(v) : |\gamma| \le k\}$.
Single polymer sampler
1. Choose $k$ according to the following geometric distribution: for $K$ a non-negative integer,
$$\Pr(k = K) = (1 - e^{-r}) \, e^{-rK}.$$
This gives $\Pr(k \ge K) = e^{-rK}$.
2. Enumerate all polymers in $A_k(v)$ and compute their weight functions.
3. Mutually exclusively output $\gamma \in A_k(v)$ with probability $w_\gamma e^{r|\gamma|}$, and with all remaining probability output $\emptyset$. In particular if $k = 0$, then output $\emptyset$ with probability $1$.
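The idea behind the sampler, drawing a random size cap first so that only small polymers are usually enumerated, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the rate `r` is left as a free parameter, the cap satisfies $\Pr(k \ge K) = e^{-rK}$, a polymer $\gamma$ within the cap is output with probability $w_\gamma e^{r|\gamma|}$, there is a single non-ground spin, and the enumeration is a naive one. All names and the toy path graph are hypothetical.

```python
import random
from math import exp

def connected_sets_containing(v, adj, max_size):
    """All connected vertex sets of size <= max_size containing v (naive growth)."""
    if max_size < 1:
        return []
    found = {frozenset({v})}
    frontier = [frozenset({v})]
    while frontier:
        grown = []
        for s in frontier:
            if len(s) == max_size:
                continue
            for u in s:
                for x in adj[u]:
                    t = s | {x}
                    if x not in s and t not in found:
                        found.add(t)
                        grown.append(t)
        frontier = grown
    return sorted(found, key=sorted)

def sample_polymer(v, adj, weight, r, rng):
    """Return a polymer containing v with probability equal to its weight, else None."""
    # Size cap k with P(k >= K) = exp(-r * K): keep incrementing w.p. exp(-r).
    k = 0
    while rng.random() < exp(-r):
        k += 1
    # Output gamma with probability w_gamma * exp(r * |gamma|); otherwise nothing.
    u, acc = rng.random(), 0.0
    for g in connected_sets_containing(v, adj, k):
        acc += weight(g) * exp(r * len(g))
        if u < acc:
            return g
    return None

def output_probability(g, weight, r, k_max=400):
    """P(sampler outputs g), summing over the cap k (series truncated at k_max)."""
    p_cap = sum((1.0 - exp(-r)) * exp(-r * k) for k in range(len(g), k_max + 1))
    return p_cap * weight(g) * exp(r * len(g))

# Hypothetical toy model: path on 5 vertices, weights exp(-TAU * size), rate R.
PATH = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
TAU, R = 3.0, 1.0

def toy_weight(g):
    return exp(-TAU * len(g))
```

The two factors cancel: the cap reaches $|\gamma|$ with probability $e^{-r|\gamma|}$, and then $\gamma$ is chosen with probability $w_\gamma e^{r|\gamma|}$, so the output probability is $w_\gamma$. The sketch is only valid when the inflated weights $w_\gamma e^{r|\gamma|}$ of the polymers containing $v$ sum to at most 1, which holds for the toy values above.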
In order to show that this algorithm has constant expected running time, we will require the following result on enumerating connected subgraphs of bounded degree graphs.
Lemma 15** ([27] Lemma 3.7).**
Let have maximum degree and let . There is an algorithm running in time that outputs a list of all connected subgraphs of of size at most containing .
We now proceed to prove the following lemma.
Lemma 16**.**
Under the polymer sampling condition (3) the output distribution of the single polymer sampler is $\nu_v$. Further, assuming the polymer model is computationally feasible, the expected running time of the sampler is constant.
Proof.
We first show that the probabilities sum to less than , which shows the last step of the sampling algorithm is well defined. Since ,
[TABLE]
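We next show that the output of the algorithm has distribution $\nu_v$. Given $\gamma \in A(v)$, to output $\gamma$ we must choose $k \ge |\gamma|$. This happens with probability $e^{-r|\gamma|}$ by the distribution of $k$, where $r$ is the rate of that geometric distribution. Conditioned on choosing such a $k$, the probability we output $\gamma$ is $w_\gamma e^{r|\gamma|}$, and multiplying these probabilities together gives $w_\gamma$ as desired. Since this is true for all $\gamma \in A(v)$, the output distribution is exactly $\nu_v$.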
Finally we analyze the expected running time assuming that the model is computationally feasible. To do this, we observe that by Lemma 15, conditioned on the event that the enumeration step of the algorithm takes time , and the time to determine which polymers are allowed and computing their weights is for some , since the polymer model is computationally feasible; here, the factor accounts for the time to determine whether a single polymer of size is ‘allowed’ and to compute its weight. Therefore, the expected running time is
[TABLE]
where . ∎
Finally we prove Theorem 5.
Proof.
By Theorem 2, there is an integer $T = O(n \log(n/\epsilon))$ so that if we start with the empty configuration $\Gamma_0 = \emptyset$ and run the polymer dynamics, then $\Gamma_T$ has distribution within total variation distance $\epsilon/2$ of $\mu_G$. By Lemma 16, there is an integer $C$ (independent of $n$) such that the expected number of steps required to perform one update of the polymer dynamics is at most $C$. To compute an $\epsilon$-sample from $\mu_G$, we repeat the following $\lceil \log_2(2/\epsilon) \rceil$ times, independently, and if no configuration is returned we return the empty configuration. Run the polymer dynamics for $2CT$ steps starting from $\Gamma_0 = \emptyset$, and if at least $T$ updates of the polymer dynamics were executed, return $\Gamma_T$.
We next show that the probability that the algorithm does not time out (and so does not fall back to the empty configuration) is at least $1 - \epsilon/2$, which therefore yields that the output distribution has total variation distance at most $\epsilon$ from $\mu_G$. Let $X$ denote the total number of steps required to execute $T$ updates of the polymer dynamics, and note that $\mathbb{E}[X] \le CT$. By Markov’s inequality, it follows that $\Pr(X \ge 2CT) \le 1/2$. Thus, the probability that $X \ge 2CT$ for each of $\lceil \log_2(2/\epsilon) \rceil$ independent copies of $X$ is at most $\epsilon/2$. ∎
3 Approximate counting algorithm
In this section we show how to use a sampling oracle to approximately compute the partition function of the polymer model. One standard way is by self-reducibility. In [18] an efficient sampling algorithm for polymer models is derived from an efficient approximate counting algorithm by applying self-reducibility on the level of polymers. While we could apply polymer self-reducibility in the other direction to obtain counting algorithms from our sampling algorithm, here we use the simulated annealing method instead (see [2, 19, 30]) to obtain a faster implementation of counting from sampling.
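Suppose that $(\mathcal{C}(G), w)$ is a computationally feasible polymer model. Let $\rho \ge 0$ be a parameter and define a weight function
$$w_\gamma(\rho) = w_\gamma \, e^{-\rho |\gamma|}$$
for all $\gamma \in \mathcal{C}(G)$. Then for each $\rho \ge 0$ this defines a computationally feasible polymer model on $G$, where setting $\rho = 0$ recovers the original model $(\mathcal{C}(G), w)$. If the original model satisfies the polymer sampling condition (3), then so does the model with weights $w_\gamma(\rho)$ for every $\rho \ge 0$, as the weight function is monotone decreasing in $\rho$.
Given the graph $G$, we write the partition function of the polymer model as a function of $\rho$:
$$Z(G; \rho) = \sum_{\Gamma \in \Omega} \prod_{\gamma \in \Gamma} w_\gamma \, e^{-\rho |\gamma|}.$$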
The associated Gibbs distribution is denoted by . Since , we have (only the empty configuration contributes to this limit), and so we will use simulated annealing to interpolate between and our goal , assuming access to a sampling oracle for for all . To apply the simulated annealing method, roughly speaking, we find a sequence of parameters called a cooling schedule where , and then estimate using the telescoping product
[TABLE]
To estimate each term , we define independent random variables
[TABLE]
It is straightforward to see that (see Lemma 17). Using the sampling oracle for , we can sample for all , and by taking the product we get an estimate for .
The key ingredient of simulated annealing is finding a good cooling schedule. There are nonadaptive schedules [2] that depend only on , and adaptive schedules [19, 30] that also depend on the structure of . Usually the latter leads to faster algorithms than the former. In this paper we will use a simple nonadaptive schedule: for where . We will show that this cooling schedule already gives us a fast algorithm for the polymer model. The reason behind it is that the weight function decays exponentially fast, and so (see Lemma 18) the partition function is bounded by a constant when , leading to a short cooling schedule.
Our algorithm is as follows.
Polymer approximate counting algorithm
1. Let for where ;
2. For where :
   (a) For :
       (i) Sample from ;
       (ii) Let ;
   (b) Let ;
3. Let and output .
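The structure of the algorithm above can be illustrated on a toy instance. The following is a minimal sketch, not the authors' implementation. It assumes the tempered weight has the form w_rho(gamma) = w(gamma) * exp(-rho * |gamma|), which is consistent with the weight being monotone decreasing and exponentially decaying in the parameter, and it assumes the per-stage variable is W_i = exp(-(rho_{i+1} - rho_i) * |Gamma_i|) with Gamma_i drawn at parameter rho_i. The path host graph, the function `base_weight`, the schedule, and the enumeration-based sampler (which stands in for the sampling oracle of Theorem 5) are all hypothetical choices made for illustration.

```python
import itertools
import math
import random

N = 4  # toy host graph: a path on vertices 0..3


def components(subset):
    """Split a sorted vertex subset of the path into maximal intervals (polymers)."""
    comps, cur = [], []
    for v in subset:
        if cur and v != cur[-1] + 1:
            comps.append(tuple(cur))
            cur = []
        cur.append(v)
    if cur:
        comps.append(tuple(cur))
    return comps


def base_weight(gamma):
    # Hypothetical toy weight, chosen only so that weights decay with polymer size.
    return math.exp(-2.0 * len(gamma)) / len(gamma)


# Every vertex subset corresponds to exactly one set of pairwise compatible polymers.
CONFIGS = [components(s) for r in range(N + 1)
           for s in itertools.combinations(range(N), r)]


def size(config):
    """Total number of vertices covered by the polymers of a configuration."""
    return sum(len(g) for g in config)


def config_weight(config, rho):
    # Assumed tempered weight: w_rho(gamma) = w(gamma) * exp(-rho * |gamma|).
    return math.prod(base_weight(g) * math.exp(-rho * len(g)) for g in config)


def Z(rho):
    """Exact partition function by enumeration (feasible only for toy sizes)."""
    return sum(config_weight(c, rho) for c in CONFIGS)


def sample(rho, rng):
    """Exact sampler by enumeration; stands in for the sampling oracle."""
    weights = [config_weight(c, rho) for c in CONFIGS]
    return rng.choices(CONFIGS, weights=weights, k=1)[0]


def estimate_Z0(schedule, samples_per_stage, rng):
    """Estimate Z(0) as 1 / prod_i mean(W_i), treating Z(schedule[-1]) as 1."""
    product = 1.0
    for rho, rho_next in zip(schedule, schedule[1:]):
        total = 0.0
        for _ in range(samples_per_stage):
            gamma = sample(rho, rng)
            total += math.exp(-(rho_next - rho) * size(gamma))
        product *= total / samples_per_stage
    return 1.0 / product
```

For example, `estimate_Z0([0.5 * i for i in range(21)], 2000, random.Random(1))` can be compared against the exact value `Z(0.0)`. In this sketch the sample means are taken stage by stage before multiplying, which is a simplification of the product-then-average structure of the algorithm above.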
Before proving Theorem 6, we first present a few useful lemmas. We shall use for as our cooling schedule and we further define though it does not appear in the algorithm. For independently we define to be a random sample from and . Finally, we let .
Lemma 17.
For ,
[TABLE]
Therefore,
[TABLE]
Proof.
In the proof, we use to denote . We deduce from the definition of that
[TABLE]
and that
[TABLE]
Since are mutually independent, we obtain
[TABLE]
and
[TABLE]
Lemma 18.
Suppose that for all . Then we have
[TABLE]
Proof.
It is trivial that since has weight . Meanwhile, we have the crude bound
[TABLE]
We then deduce that
[TABLE]
where (a) follows from Lemma 14 and (b) from . ∎
Lemma 19.
We have
[TABLE]
Proof.
Since the weight function is decreasing in , the partition function is also decreasing, which implies . On the other hand, recalling Lemma 17, we have
[TABLE]
where is sampled from . Notice that for any we have
[TABLE]
Thus, the lemma follows. ∎
We are now ready to prove Theorem 6 (see Theorem 6 for the statement).
Proof.
We first assume that we have access to an exact sampler that samples from for all . Using this sampler in the Polymer approximate counting algorithm, we find that, for each and each , is an exact sample from the distribution and hence is an exact sample of , independently for every and . Thus, is a sample of independently for every , and is the sample mean of ’s. We deduce from Lemmas 17 and 18 that
[TABLE]
and
[TABLE]
where we use and for all . Then
[TABLE]
By Chebyshev’s inequality we have
[TABLE]
where the second to last inequality follows from Lemmas 17 and 19:
[TABLE]
Thus, we deduce that
[TABLE]
so the error probability is at most . Note that the number of samples that we used is .
Now we replace the exact sampling oracle by an approximate one. For every , the polymer model is computationally feasible and satisfies the polymer sampling condition (3). Thus, for any , Theorem 5 gives a randomized algorithm that outputs a -approximate sample from . We then couple and optimally and run the algorithm with both and simultaneously, so that for any samples from and for coincide with probability at least . Let be the event that at least one of the samples from in the algorithm does not couple with that from . Then a union bound yields . Let be the event that the algorithm using fails. From our argument before we see that . Note that if neither of and happens, then the algorithm with will output a desired estimate. Hence, we conclude from the union bound that the algorithm with fails with probability at most
[TABLE]
Finally, we consider the running time of our algorithm. By Theorem 5 the running time of step 2(a)(i) is , and for step 2(a)(ii) the running time is . Thus, the running time of the algorithm is upper bounded by . ∎
4 Applications
Here we apply our results on subset polymer models to several approximate counting and sampling problems at low temperatures.
4.1 Ferromagnetic Potts model
In this section, we prove Theorem 9 for the Potts model. Throughout this section, we will work under the assumptions/conditions of Theorem 9. That is, we fix a real number , integers and and a real number . We let be the class of -expander graphs with maximum degree at most .
Consider the polymer model defined in Example 2 on an -vertex graph with and ground state color . We will use to denote the polymers and to denote the weight of a polymer ; recall that , where counts the number of external edges of plus the number of bichromatic internal edges. Let be the partition function of the polymer model .
Lemma 20.
Under the conditions of Theorem 9, the polymer model satisfies the polymer sampling condition (3) with .
Proof.
Since every is an -expander, for we have and hence . ∎
The following lemma is from [21].
Lemma 21 ([21, Lemma 12]).
For any -vertex -expander graph and , is an -approximation of the Potts partition function .
We are now ready to prove Theorem 9 (see Theorem 9 for the statement).
Proof.
Let be the class of -expander graphs of maximum degree at most . Clearly, the polymer models are computationally feasible. By Lemma 20, the models also satisfy the polymer sampling condition and therefore Theorems 5 and 6 apply. Consider any -vertex graph . Since , Lemma 21 applies to .
For the sampling algorithm, we pick a color uniformly at random and generate an -approximate sample from the Gibbs measure associated to using the algorithm of Theorem 5, in time . By Lemma 21, we conclude that the resulting output is an -approximate sample for the Potts model.
For the counting algorithm, we pick an arbitrary and produce using the algorithm of Theorem 6 a number in time , which is an -approximation to with probability . By Lemma 21, we conclude that is an -approximation for the partition function of the Potts model (with the same probability). ∎
4.2 Hard-core model
In this section, we prove Theorem 11 for the hard-core model.
Suppose is an -vertex bipartite -expander graph of maximum degree . We will consider the hard-core model on at sufficiently large fugacities . There are two relevant ground states corresponding to the two parts of , one is the independent set given by and the other is given by . We will capture deviations from the two ground states using the “even” and “odd” polymer models of Jenssen, Keevash and Perkins [21]. We remark that similar models were considered independently by Liao, Lin, Lu, and Mao [24].
For , we say a set is small if . In particular, Definition 10 requires that small sets expand.
Following [21], we will define a polymer model ; note that the host graph is $G^2$, rather than $G$. (Here $G^2$ is the graph on the vertex set of $G$, where two vertices are connected if their distance in $G$ is at most 2.) The set of allowed polymers consists of all small sets which are connected subgraphs in . The set of spins is and the ground state spin for a vertex is if , and if ; the spin assignment for a polymer gives the spin to each . The weight of a polymer is defined as
[TABLE]
where we recall that denotes the set of vertices in which are adjacent to some vertex in . The key observation behind the definition of the weights is that for a set of compatible polymers from , the contribution to of all independent sets with is exactly
[TABLE]
see [21, Proof of Lemma 19] for more details.
Let denote the partition function of the polymer model (where two polymers are compatible if their distance in the host graph is at least 2). Using that is an -expander, we have the following lemma from [21].
Lemma 22 ([21, Lemma 19]).
For any and any -vertex graph which is a bipartite -expander, the number
[TABLE]
is an -approximation of the hard-core partition function .
In particular, [21, Lemma 17] shows that counts the contribution to of every independent set of , but some independent sets are double counted: those independent sets for which the -connected components of and are all small. We call these independent sets sparse. The proof of [21, Lemma 19] shows that the relative contribution to of sparse independent sets is at most .
We are now ready to prove Theorem 11 (see Theorem 11 for the statement).
Proof.
First note that , so Lemma 22 applies. Let denote the set of host graphs corresponding to bipartite -expanders of maximum degree . Noting that the polymer models are computationally feasible, we verify the polymer sampling condition (3) for them. Fix arbitrary . As in [20, Section 4.2], we have the bound
[TABLE]
so, using that , we have that the models satisfy the polymer sampling condition with . Therefore, we may also apply Theorems 5 and 6.
For the counting algorithm, we apply Theorem 6. Namely, by taking the median of trials, we can obtain and which are -approximations to and , respectively, with probability at least . Let be the event that and are indeed -approximations to and . Conditioned on , the number
[TABLE]
is an -approximation to the number . By Lemma 22 and since , is an -approximation to and hence is an -approximation to . Since occurs with probability at least , we obtain that is the desired approximation for the counting algorithm.
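The median amplification used in the counting step can be sketched as follows. This is a generic illustration of the standard technique, assuming each trial of the underlying estimator lands within the target tolerance with probability bounded above 1/2; the median of independent trials is then outside the tolerance only if at least half of the trials are, which happens with probability exponentially small in the number of trials. The names `median_of_trials` and `make_replay` are hypothetical.

```python
import statistics


def median_of_trials(estimator, trials):
    """Run `estimator` (a zero-argument callable) `trials` times and return the median."""
    return statistics.median(estimator() for _ in range(trials))


def make_replay(values):
    """Hypothetical helper: an 'estimator' that replays a fixed list of outcomes."""
    it = iter(values)
    return lambda: next(it)


# Two of the five outcomes are far off, yet the median is unaffected.
robust = median_of_trials(make_replay([1.0, 50.0, 0.98, 1.02, 0.0]), 5)
```

In the setting above, `estimator` would be one run of the algorithm of Theorem 6 for the even or the odd polymer model.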
For the sampling algorithm, let be the random variable which takes the value 0 with probability and the value 1 otherwise, where are the quantities computed earlier. Then, use Theorem 5 to obtain an -approximate sample from the Gibbs distribution corresponding to the polymer model , say . Obtain then an independent set by including into each with probability and each vertex in (with probability 1). We claim that the output distribution of is -close to the hard-core distribution .
To prove this, consider the random independent set obtained by repeating the same steps above but using instead perfectly accurate computations, i.e., pick with probability and the value 1 otherwise, then, sample (perfectly) from the Gibbs distribution corresponding to the polymer model , and then obtain the independent set by including into each with probability and each vertex in (with probability 1). Then, if is not sparse, is generated with probability (cf. the observation below (4)). On the other hand, if is sparse, then is generated with probability . But by Lemma 22 and the remark following, the total variation distance between the distribution of and the hard-core distribution is bounded by the relative weight of the sparse independent sets, which, by Lemma 22, is at most .
We next observe that, conditioned on the event (i.e., that and are -approximations to and ), there is a coupling between and such that with probability at least . Indeed, the total variation distance between and is at most and hence there is a coupling of with so that with probability at least . Analogously, there is a coupling of with so that with probability at least . Since occurs with probability at least , it follows that the overall total variation distance between and is at most .
Hence, the output distribution of is -close to the hard-core distribution , finishing the proof of Theorem 11. ∎
5 Comparison to spin Glauber dynamics
In this section, we derive results for spin Glauber dynamics, restricted to appropriate sets in the state space, based on our results above (using fairly standard Markov chain comparison techniques). We start with the general framework of subset polymer models and obtain Theorem 25, which is then applied to the ferromagnetic Potts and hard-core models.
5.1 Restricted Glauber dynamics for polymer models
Here, we define the restricted Glauber dynamics for subset polymer models, and show the upcoming Theorem 25 which bounds its mixing time under some appropriate conditions.
Consider a subset polymer model as in Section 1.1. There is a natural map between allowed polymer configurations and spin configurations, given by if and and if . Let be the spin configurations obtainable as images of the map . It will be helpful to consider the inverse map and extend its domain to all , so that is the polymer configuration consisting of polymers that are connected components of vertices which do not receive their ground state spin; note that the range of the extended is not limited to anymore.
Restricted Glauber dynamics is defined as follows, starting from .
1. Choose and uniformly.
2. is formed from by assigning to spin (formally, by letting , forming from by assigning to spin , and finally letting ).
3. If let .
   - With probability , .
   - With probability , .
4. If then .
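The update rule above can be sketched on a toy instance. The acceptance probabilities in step 3 are not legible in this copy, so the sketch below assumes a heat-bath rule, accepting the proposal with probability w(new) / (w(old) + w(new)); this is an assumption and not necessarily the rule used in the paper. The path host graph, the two-spin alphabet, the size bound `MAX_SIZE`, and the polymer weight exp(-2|gamma|) are likewise hypothetical choices. The function `detailed_balance_ok` checks reversibility with respect to the polymer weights by enumeration.

```python
import itertools
import math
import random

# Toy host graph: a path on 5 vertices; spins {0, 1} with ground state 0.
N = 5
ADJ = {v: [u for u in (v - 1, v + 1) if 0 <= u < N] for v in range(N)}
SPINS = (0, 1)
GROUND = 0
MAX_SIZE = 2  # hypothetical restriction: only polymers of size <= 2 are allowed


def polymers(sigma):
    """Connected components of vertices that do not receive the ground state spin."""
    seen, comps = set(), []
    for v in range(N):
        if sigma[v] == GROUND or v in seen:
            continue
        stack, comp = [v], []
        seen.add(v)
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in ADJ[x]:
                if sigma[y] != GROUND and y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(tuple(sorted(comp)))
    return comps


def allowed(sigma):
    """True if every polymer of sigma respects the size bound."""
    return all(len(c) <= MAX_SIZE for c in polymers(sigma))


def weight(sigma):
    # Assumed toy polymer weight w(gamma) = exp(-2 * |gamma|).
    return math.prod(math.exp(-2.0 * len(c)) for c in polymers(sigma))


def step(sigma, rng):
    """One update: propose a single-vertex change, reject if it leaves the allowed set."""
    v = rng.randrange(N)
    s = rng.choice(SPINS)
    proposal = sigma[:v] + (s,) + sigma[v + 1:]
    if not allowed(proposal):
        return sigma
    w_old, w_new = weight(sigma), weight(proposal)
    # Assumed heat-bath acceptance rule.
    if rng.random() < w_new / (w_old + w_new):
        return proposal
    return sigma


def transition_prob(a, b):
    """Exact one-step probability of moving from a to b, for a != b."""
    diff = [v for v in range(N) if a[v] != b[v]]
    if len(diff) != 1 or not allowed(b):
        return 0.0
    return (1.0 / (N * len(SPINS))) * weight(b) / (weight(a) + weight(b))


def detailed_balance_ok(tol=1e-12):
    """Check weight(a) P(a, b) == weight(b) P(b, a) over all allowed pairs."""
    states = [s for s in itertools.product(SPINS, repeat=N) if allowed(s)]
    return all(
        abs(weight(a) * transition_prob(a, b) - weight(b) * transition_prob(b, a)) < tol
        for a in states for b in states if a != b
    )
```

Starting from the all-ground-state configuration `(0,) * N` and iterating `step` keeps the chain inside the allowed set, since any proposal with an oversized polymer is rejected.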
We will use the Markov chain comparison technique to show that the restricted Glauber dynamics is rapidly mixing. To do this, we need a mild condition on the set of allowed polymers . A polymer model is said to be single-update-compatible if, for every size- polymer , there is an ordering of the vertices in such that, for all , the set induces a connected subgraph of and we have that is a valid polymer itself, i.e., .
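For polymer models whose allowed polymers are all connected vertex sets up to a size bound, a depth-first ordering gives the ordering required by single-update-compatibility, since every prefix of a depth-first preorder induces a connected subgraph. The following is a minimal sketch under that assumption; the function names and the adjacency-dictionary representation are hypothetical, and models with additional constraints on polymers would need a further check that each prefix is an allowed polymer.

```python
def dfs_order(adj, polymer):
    """Return the vertices of a connected set in depth-first preorder."""
    polymer = set(polymer)
    order, seen = [], set()

    def visit(v):
        seen.add(v)
        order.append(v)
        for u in adj[v]:
            if u in polymer and u not in seen:
                visit(u)

    visit(min(polymer))
    if len(order) != len(polymer):
        raise ValueError("polymer does not induce a connected subgraph")
    return order


def induces_connected(adj, vertices):
    """True if the nonempty vertex set induces a connected subgraph."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in vertices and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == vertices
```

Each vertex appended to the order is adjacent to a vertex appended earlier, which is why every prefix stays connected.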
We will use the comparison method of Diaconis and Saloff-Coste [8, 9] as applied to mixing times by Randall and Tetali [29]. In order to avoid discussion of eigenvalues here, we use the version from Observation 13 of the survey paper [11]. We first show that the restricted Glauber dynamics is a reversible ergodic Markov chain with stationary distribution , which is easy to see from its definition.
Lemma 23.
Let be a graph and let be a single-update-compatible polymer model. The restricted Glauber dynamics is ergodic and reversible with stationary distribution .
Proof.
The restricted Glauber dynamics is aperiodic since we remain in the same state with positive probability after performing an update. It is irreducible since we can transition from any to any by adding and removing vertices one-by-one. This shows that the restricted Glauber dynamics is ergodic. To show that it is reversible and has stationary distribution , we check detailed balance. Suppose with and where is the transition matrix of the restricted Glauber dynamics. Then,
[TABLE]
The lemma follows. ∎
We next give some standard definitions that will be used in our comparison proof. Let denote the restricted Glauber dynamics and be its transition matrix. Let be the polymer dynamics and denote its transition matrix by . Define to be the set of pairs of configurations that can be achieved by one transition of the restricted Glauber dynamics; i.e., . Similarly, define for the polymer dynamics.
For every , we define a path from to to be a sequence of configurations such that every adjacent pair is a transition of the restricted Glauber dynamics; i.e., every adjacent pair of configurations is in . For this, we assume that the polymer model is single-update-compatible (see Section 1.3). If , then the choice is easy: we let . Suppose instead that for some polymer . Recall that there is a natural one-to-one mapping between the set of all (polymer) configurations and the set of spin configurations . Let and be the corresponding spin configurations. If has size , let be the ordering of vertices of from the definition of single-update-compatible so that, for all , the polymer induced by vertices is in . Let be the sequence of spin configurations such that each is obtained from by changing the spin of from to . The path is then defined to be . If for some , we can define the path in a similar manner. Note that in both cases the length of the path is .
For every , the congestion of the edge is defined to be
[TABLE]
The congestion of the choice of paths is the quantity
[TABLE]
The following comparison lemma gives an upper bound on the mixing time of the restricted Glauber dynamics by the mixing time of the polymer dynamics.
Lemma 24 ([11, Observation 13]).
Let and . Then, for any we have
[TABLE]
We now proceed to establish the mixing time of the restricted Glauber dynamics, which is the main result of this section. We will apply this to both the hard-core model (on bipartite -expander graphs) and the ferromagnetic Potts model (on -expander graphs), for which we will define appropriate single-update-compatible polymer models. Furthermore, in both of these applications, below will be logarithmic in , giving polynomial mixing time for the restricted Glauber dynamics.
Theorem 25.
Suppose that a polymer model satisfies the polymer mixing condition. Consider a graph such that is single-update-compatible. Let . Suppose that, for every pair of configurations whose corresponding spin configurations differ at exactly one vertex, we have
[TABLE]
for some constant . Then for any , the restricted Glauber dynamics has mixing time
[TABLE]
Proof.
By Lemma 24, it suffices to upper bound the congestion for every where
[TABLE]
If , then for our choices of paths to get we must have . It follows that
[TABLE]
since by the update rule of the restricted Glauber dynamics.
Now suppose . Let and be the corresponding spin configurations. Notice that and differ at exactly one vertex, which we denote by . If and then no path would contain by our choice of paths, and thus . Assume next that and . Then, if for some , we must have for some polymer and also . Moreover, the spin configurations are all the same outside . This implies that the number of such paths is upper bounded by the number of polymers containing .
Now fix some such that and assume that for some polymer with . Then,
[TABLE]
As the path is obtained by changing the spins vertex by vertex in the corresponding spin configurations, and differ in at most vertices. The condition of the theorem implies that
[TABLE]
The update rule of the restricted Glauber dynamics gives
[TABLE]
and for the polymer dynamics we have
[TABLE]
Let denote a polymer on with a spin from . Then, the polymer mixing condition implies that for some . Combining this and inequalities (6), (7), (8) and (9), we get
[TABLE]
For the case where and , the proof is almost the same and we can get the same bound. Thus,
[TABLE]
The theorem then follows from Theorem 2 and Lemma 24 once we notice that and that for all . ∎
5.2 Truncated polymer model
The bound on the mixing time of the restricted Glauber dynamics in Theorem 25 is exponential in the size of the largest polymer, which is in general undesirable. For example, in our applications in Section 4, was linear in the number of vertices of the host graph. Here, we show that, under the polymer sampling condition, we can restrict our attention to polymers of size , in the sense that the partition function and the Gibbs distribution of the truncated polymer model are close to those of the original polymer model.
Let be a polymer model on a graph . For , define the truncated polymer model by
[TABLE]
Also we let
[TABLE]
be the set of allowed configurations (note that ). The partition function of the truncated polymer model is given by
[TABLE]
The corresponding Gibbs distribution on is defined by . We remark that if the original polymer model satisfies the polymer sampling condition then so does the truncated polymer model, and thus Theorem 5 also applies to the truncated model.
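The effect of truncation can be checked numerically on a toy instance. The sketch below is illustrative only: the path host graph and the weight exp(-1.5|gamma|) are hypothetical choices. It computes the partition function restricted to configurations whose polymers have size at most m, and the total variation distance between the full and truncated Gibbs distributions. Because the truncated distribution is the full one conditioned on the truncated configuration set, this distance equals one minus the ratio of the truncated to the full partition function.

```python
import itertools
import math

N = 6  # toy host graph: a path on vertices 0..5


def components(subset):
    """Split a sorted vertex subset of the path into maximal intervals (polymers)."""
    comps, cur = [], []
    for v in subset:
        if cur and v != cur[-1] + 1:
            comps.append(tuple(cur))
            cur = []
        cur.append(v)
    if cur:
        comps.append(tuple(cur))
    return comps


def weight(gamma):
    # Hypothetical toy weight that decays exponentially in the polymer size.
    return math.exp(-1.5 * len(gamma))


CONFIGS = [components(s) for r in range(N + 1)
           for s in itertools.combinations(range(N), r)]


def config_weight(config):
    return math.prod(weight(g) for g in config)


def in_truncation(config, m):
    """True if every polymer of the configuration has size at most m."""
    return all(len(g) <= m for g in config)


def truncated_Z(m):
    """Partition function over configurations whose polymers have size <= m."""
    return sum(config_weight(c) for c in CONFIGS if in_truncation(c, m))


def tv_to_truncated(m):
    """Total variation distance between the full and truncated Gibbs distributions."""
    full, trunc = truncated_Z(N), truncated_Z(m)
    total = 0.0
    for c in CONFIGS:
        p = config_weight(c) / full
        q = config_weight(c) / trunc if in_truncation(c, m) else 0.0
        total += abs(p - q)
    return total / 2.0
```

With exponentially decaying weights, `tv_to_truncated(m)` shrinks quickly as `m` grows, which is the qualitative behaviour that the truncation argument relies on.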
The following lemma asserts that the Gibbs distribution and the partition function of the truncated polymer model are close to those of the original model , provided that the polymer sampling condition (3) holds.
Lemma 26.
Let be a family of graphs of maximum degree at most and let be a polymer model that satisfies the polymer sampling condition (3) with constant . Let be an -vertex graph from . Then for any and , we have
[TABLE]
Moreover, the total variation distance between and is at most .
Proof.
Note that follows immediately from . For , let and let
[TABLE]
Let be the collection of all polymers of size greater than . Notice that for each we have the crude bound
[TABLE]
Combining (10) and (11), we obtain that
[TABLE]
Since for all real , we have that
[TABLE]
The last inequality follows from the fact that satisfies the polymer sampling condition with constant . Then we deduce from Lemma 14 that
[TABLE]
and since , we get . It follows that
[TABLE]
Combining (12), (13), (14), and (15) yields , as needed. Finally, we bound the total variation distance between and :
[TABLE]
where the first equality is because if and only if , for which we have . This finishes the proof. ∎
5.3 Applications
In this section, we apply the previous results to show that (spin) Glauber dynamics for the ferromagnetic Potts and hard-core models mix in polynomial time on expander graphs, when restricted to configurations close to the ground states (which, as we have already seen, constitute the main portion of the probability space at low temperatures).
5.3.1 Restricted Glauber for ferromagnetic Potts
In this section, we prove Theorem 13 for the -color ferromagnetic Potts model. Throughout this section, we will work under the assumptions/conditions of Theorem 9. That is, we fix a real number , integers and and a real number . We let be the class of -expander graphs with maximum degree at most .
Let be an -vertex graph in and let be a value in . As in Section 4.1, we will consider the polymer model whose polymers are connected subgraphs of with at most vertices, which are labeled by the remaining colors . In fact, following Section 5.2, we will work with a truncation of this model. Namely, for , let be the polymer model on restricted to polymers of size at most .
Observation 27.
For every , the set , as defined in Section 1.3, is precisely the set of allowable polymer configurations in the truncated polymer model .
(See Theorem 13 for the statement.)
Proof.
We let be the class of -expanders with maximum degree at most . For the given -vertex graph , let be the Gibbs distribution of the polymer model .
By Lemma 20, we have that satisfies the polymer sampling condition with and hence so does the truncated polymer model . The result therefore follows by applying Theorem 25, after observing that (i) the polymer model is single-update-compatible (use DFS ordering), and (ii) for a pair of polymer configurations whose corresponding spin configurations differ at a vertex, we have
[TABLE]
where (since has maximum degree , changing the spin of a vertex can create at most new monochromatic/bichromatic edges). This finishes the proof. ∎
The following lemma shows that the set with accounts for all but of the aggregate weight of colorings in the Potts distribution on .
Lemma 28.
Suppose , are integers and is a real. Let be a real number and . Then, for any -vertex -expander graph of maximum degree and any , for , is an -approximation of the Potts partition function .
Proof.
Let be the partition function of the polymer model . Since and , by Lemma 21 we have that is an -approximation to . If is the class of -expanders with maximum degree at most then by Lemma 20, we have that satisfies the polymer sampling condition with and hence so does the truncated polymer model . It follows by Lemma 26 that, for , is an -approximation to . Therefore, is an -approximation to . ∎
5.3.2 Restricted Glauber dynamics for hard-core mixes in polynomial time
In this section, we state and prove the analogue of Theorem 13 for the hard-core model. In particular, let be an -vertex -expanding bipartite graph of maximum degree , and for and , let denote the independent sets whose deviations from the ground state consist of small connected components; more precisely, consists of connected components of size at most . Using methods similar to those of Section 5.3.1, we will show the following.
Theorem 29.
Fix , and . For any -vertex bipartite -expander with maximum degree at most and any and , with , the Glauber dynamics restricted to has mixing time polynomial in and .
As we shall see in Lemma 32, for large enough, the set for captures all but weight of the hard-core partition function, and hence Theorem 29 can be used to obtain another polynomial-time algorithm for the hard-core model on expanding graphs in that regime.
To prove Theorem 29, it will be simpler to work with somewhat different polymer models than those in Section 4.2. These models were originally used in [20] (the conference version of [21]). For , and following [20], we will define a polymer model . The host graph will be and the model will capture deviations from the ground state : a polymer will be a connected set of vertices in such that for some independent set . Specifically, the set of allowed polymers consists of all connected sets of vertices (in ) such that and for any , all of the neighbors of (in ) are also in . The set of spins is and the ground state spin for a vertex is if , and if ; the spin assignment for a polymer is given by for . The weight of a polymer is defined as
[TABLE]
The main observation behind the definition of the weight is that the weight of an independent set such that is .
Following again Section 5.2, it will be relevant to consider, for , the truncated polymer model whose polymers are of size at most ; observe that the set defined above is precisely the set of allowable polymer configurations in the truncated polymer model. We next verify the polymer sampling condition (3) for these models and conclude the proof of Theorem 29.
Lemma 30.
Fix and . Let be the class of bipartite -expanders with maximum degree at most . For and , the polymer model satisfies the polymer sampling condition (3) with .
Proof.
We have , so it suffices to show that for and it holds that . For we have that all of the neighbors of are also in and hence, by the -expansion of , we have that . This gives and therefore
[TABLE]
Proof of Theorem 29.
Let be the class of bipartite -expanders with maximum degree at most . Consider an -vertex graph and let be the Gibbs distribution of the model .
By Lemma 30, we have that satisfies the polymer sampling condition with and hence so does the truncated polymer model . The result therefore follows by applying Theorem 25, after observing that (i) the polymer model is single-update-compatible (use DFS ordering), and (ii) for a pair of polymer configurations whose corresponding independent sets differ in at most one vertex, we have
[TABLE]
Finally, we justify that, for large enough and , the aggregate weight of independent sets in captures all but fraction of the hard-core partition function . Let denote the partition function of the polymer model and denote the partition function of the polymer model . We will need the following lemma from [20].
Lemma 31 ([20, Lemmas 4.1 & 4.2]).
For $\lambda > \max\big\{(2e)^{\frac{8n}{\alpha n_{0}}}, (2e)^{\frac{8n}{\alpha n_{1}}}, (2e)^{40/\alpha}\big\}$, the number is a -approximation of the hard-core partition function , where for .
Lemma 32.
Fix and . There exists a constant such that for the following holds for all -vertex bipartite -expander graphs of maximum degree at most . For all and , the number
[TABLE]
is an -approximation of the hard-core partition function , where for .
Proof.
Let for and observe that (using that is an -expander for , see [20] for details). Therefore, by taking large enough, we have that for all both Lemmas 30 and 31 apply. Let .
By Lemma 31, we have that is an -approximation to . By Lemma 30, we have that, for , satisfies the polymer sampling condition with and hence so does the truncated polymer model . (Here, as usual, we take to be the class of bipartite -expanders with maximum degree at most .) It follows by Lemma 26 that, for , is an -approximation to . Therefore, is an -approximation to . ∎
[1] A. Barvinok. Combinatorics and Complexity of Partition Functions. Algorithms and Combinatorics. Springer International Publishing, 2017.
[2] I. Bezáková, D. Štefankovič, V. V. Vazirani, and E. Vigoda. Accelerating simulated annealing for the permanent and combinatorial counting problems. SIAM Journal on Computing, 37(5):1429–1454, 2008.
[3] M. Bordewich, C. Greenhill, and V. Patel. Mixing of the Glauber dynamics for the ferromagnetic Potts model. Random Structures & Algorithms, 48(1):21–52, 2016.
[4] C. Borgs. Absence of zeros for the chromatic polynomial on bounded degree graphs. Combinatorics, Probability and Computing, 15(1-2):63–74, 2006.
[5] C. Borgs, J. T. Chayes, A. Frieze, J. H. Kim, P. Tetali, E. Vigoda, and V. H. Vu. Torpid mixing of some Monte Carlo Markov chain algorithms in statistical physics. In Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 218–229, 1999.
[6] C. Borgs, J. T. Chayes, J. Kahn, and L. Lovász. Left and right convergence of graphs with bounded degree. Random Structures & Algorithms, 42(1):1–28, 2013.
[7] C. Borgs and J. Z. Imbrie. A unified approach to phase diagrams in field theory and statistical mechanics. Communications in Mathematical Physics, 123(2):305–328, 1989.
[8] P. Diaconis and L. Saloff-Coste. Comparison techniques for random walk on finite groups. Ann. Probab., 21(4):2131–2156, 1993.
