Fluctuation results for general block spin Ising models
Holger Knöpfel, Matthias Löwe, Kristina Schubert, Arthur Sinulis

Abstract
We study a block spin mean-field Ising model, i.e. a model of spins in which the vertices are divided into a finite number of blocks with each block having a fixed proportion of vertices, and where pair interactions are given according to their blocks. For the vector of block magnetizations we prove Large Deviation Principles and Central Limit Theorems under general assumptions for the block interaction matrix. Using the exchangeable pair approach of Stein's method we establish a rate of convergence in the Central Limit Theorem for the block magnetization vector in the high temperature regime.
Holger Knöpfel, Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstraße 62, 48149 Münster, Germany
Matthias Löwe, Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstraße 62, 48149 Münster, Germany
Kristina Schubert, Fakultät für Mathematik, TU Dortmund, Vogelpothsweg 87, 44227 Dortmund, Germany
Arthur Sinulis, Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany
Key words and phrases:
block spin Ising models, central limit theorem, large deviation principle, phase transition, Stein’s method
1991 Mathematics Subject Classification:
Primary 60F05, 60F10, Secondary 82B20
M.L.’s research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2044-390685587, Mathematics Münster: Dynamics - Geometry - Structure
A.S. acknowledges financial support by the German Research Foundation via the CRC 1283.
1. Introduction
Mean-field block models were introduced as an approximation of a lattice model of a metamagnet, see e.g. formula (4.1) in [24]. They also arise in disordered systems with random pair interactions, studied for example in [32], [31], [9]. Later, they were rediscovered as interesting models for statistical mechanics systems, see [20], [17], [8], [27], [25], as well as models for social interactions between several groups, e.g. in [19], [1], [29]. This latter approach closely follows the social reinterpretation of the (one-group) Curie–Weiss model in [6] and of the Hopfield model in [10] or [26]. A third source of interest in mean-field block spin models comes from statistics. In [3], the authors gave another analysis of the bipartite mean-field Ising block model with equal block sizes, and asked whether one can recover the blocks from several observations of this model and, if so, how many observations are needed. In this respect, block spin models are related to the stochastic block models of random graph theory, which have been at the center of interest in statistics and probability theory over the past few years (see, e.g., [2], [21]). The statistical interest in them arises from their relation to graphical models, where a central question is how to reconstruct the block structure under sparsity assumptions (see e.g. [5], [28], [4]).
Our starting point is [27]. There, the fluctuations of an order parameter for a two-group block model with equal block sizes were analyzed on the level of large deviation principles (LDPs, for short) and central limit theorems (CLTs). Starting from these results, several natural questions arise. First: Can these results also be proven for systems with block sizes that are not necessarily identical? Second: Can we generalize our results to more than two groups? And third: Can we give a speed of convergence in the CLT? The main goal of the current note is to (partially) answer these questions. To this end, we present a new approach to mean-field block spin models via the corresponding block interaction matrix. Moreover, to obtain a speed of convergence in the CLT, we employ Stein’s method, as was done in [14], [7] for the standard mean-field Ising (or Curie–Weiss) model.
The rest of this note is organized as follows. In the remainder of this introduction, we define our model in a way that makes it accessible to the techniques of Sections 2 and 3, and state our main results. Section 2 is devoted to the proof of the LDP results. Afterwards, we analyze the critical points of the rate function and obtain the mean-field equations, showing that in the high temperature case the only maximum is $0$, whereas in the low temperature case there are nonzero maximizers; for a special class of block interaction matrices we obtain an explicit solution. In Section 3 we prove the CLT for the order parameter of the model in two ways. The first uses the classical Hubbard–Stratonovich transformation, which was already used to prove the CLT for the magnetization in the Curie–Weiss model in [16] and is also the core technique for the CLT in [27]. The second proof uses a multivariate version of the exchangeable pair approach in Stein’s method, developed in [30]. Finally, Section 4 contains a discussion of some of the results and further open questions.
1.1. The model
The block spin Ising model will be characterized by two quantities: a number $k \in \mathbb{N}$, the number of blocks, and a symmetric, positive definite matrix $A \in \mathbb{R}^{k \times k}$, the block interaction matrix. The entry $A_{ij}$ determines the strength of interaction between a particle in block $i$ and a particle in block $j$. Here, $\mathbb{R}^{k \times k}$ is the set of all $k$ by $k$ matrices with real entries.
Let be a strictly increasing subsequence of . For a system of size let be a partition of into blocks. Without loss of generality, we assume that the indices in the blocks are ordered, i.e. if and and , it follows . We call the block size of the -th block. Note that, in particular, we have a system of size , where for
[TABLE]
Define for each the matrix of the relative block sizes
[TABLE]
We assume that for each the limit
[TABLE]
exists, so that the matrix of asymptotic relative block sizes
[TABLE]
is invertible. If the partition blocks are asymptotically of the same size, i.e.
[TABLE]
we call this the uniform case. The block spin Ising model with blocks of sizes and block interaction matrix is defined as the Ising model with interaction matrix
[TABLE]
where is the matrix with all entries equal to . We denote this model by . More precisely, is the probability measure on defined by
[TABLE]
Here, of course, is the partition function
[TABLE]
Note that, for technical convenience and contrary to the usual convention, we do not require the diagonal of to be zero. However, since $\sigma_i^2 = 1$ for every spin, both and its “dediagonalized” version give rise to the same Ising model. Here and in the sequel, is a diagonal matrix with values on its diagonal. Lastly, for any and any matrix we define the operator norm
[TABLE]
1.2. Main results
We prove results on the fluctuations of the block magnetization vector on different scales. In what follows, we use the non-normalized and normalized versions of the block magnetization vector defined as
[TABLE]
Note that this allows us to rewrite the Hamiltonian of as
[TABLE]
which we use tacitly.
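This rewriting can be checked by brute force on a tiny system. The Python sketch below assumes the normalization $H(\sigma) = \frac{1}{2N}\sum_{i,j} A_{b(i)b(j)}\sigma_i\sigma_j$ with the diagonal kept, so that $H = \frac{1}{2N}\langle m, A m\rangle$ for the non-normalized block magnetizations $m$. It also assumes that blocks are laid out contiguously. The matrix, block sizes and helper names are hypothetical.

```python
import itertools
import math

def block_labels(sizes):
    """b(i) for every site i; blocks are laid out contiguously (assumption)."""
    return [l for l, n in enumerate(sizes) for _ in range(n)]

def block_magnetization(sigma, labels, k):
    """Non-normalized block magnetizations m_l = sum of the spins in block l."""
    m = [0] * k
    for s, l in zip(sigma, labels):
        m[l] += s
    return m

def hamiltonian_full(sigma, A, labels):
    """(1/(2N)) * sum_{i,j} A[b(i)][b(j)] sigma_i sigma_j, diagonal included."""
    N = len(sigma)
    total = sum(A[labels[i]][labels[j]] * sigma[i] * sigma[j]
                for i in range(N) for j in range(N))
    return total / (2 * N)

def hamiltonian_block(m, A, N):
    """The same energy written through the block magnetizations: <m, A m> / (2N)."""
    k = len(m)
    return sum(m[a] * A[a][b] * m[b] for a in range(k) for b in range(k)) / (2 * N)

A = [[1.0, 0.4], [0.4, 0.8]]   # hypothetical symmetric, positive definite block matrix
sizes = [3, 2]                 # hypothetical block sizes, N = 5
labels = block_labels(sizes)
N = len(labels)

max_gap = 0.0
weights = {}
for sigma in itertools.product((-1, 1), repeat=N):
    m = block_magnetization(sigma, labels, len(sizes))
    h_block = hamiltonian_block(m, A, N)
    max_gap = max(max_gap, abs(hamiltonian_full(sigma, A, labels) - h_block))
    weights[sigma] = math.exp(h_block)   # unnormalized Gibbs weight exp(H(sigma))

Z = sum(weights.values())                # partition function of the toy model
```

Because the diagonal is kept, the energy differs from the dediagonalized one only by the constant $\frac{1}{2N}\sum_i A_{b(i)b(i)}$, which cancels after normalization by the partition function.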
We begin by presenting the large deviation results. The first result is a generalization of [27, Theorem 2.1]. In that paper, an LDP for was proved in the situation of blocks of equal size. Here we analyze the general case.
Theorem 1.1**.**
Let and be a block interaction matrix. The sequence satisfies an LDP under with speed and rate function
[TABLE]
where
[TABLE]
and denotes the convex conjugate of , i.e.
[TABLE]
In the terminology of large deviations, this means that the sequence of push-forwards satisfies an LDP with speed and rate function .
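The convex conjugate can be made concrete. The sketch below assumes that the limiting logarithmic moment generating function is $\Lambda(t) = \sum_l \lambda_l \log\cosh t_l$, where $\lambda_l$ are the asymptotic relative block sizes. Its conjugate then has the closed form $\Lambda^*(x) = \sum_l \lambda_l J(x_l/\lambda_l)$ with $J(y) = \frac{1+y}{2}\log(1+y) + \frac{1-y}{2}\log(1-y)$. The code compares this closed form with a direct numerical Legendre transform. All function names are hypothetical.

```python
import math

def J(y):
    """Rate function of one +-1 spin: ((1+y)/2) log(1+y) + ((1-y)/2) log(1-y), |y| <= 1."""
    def xlogx(u):
        return 0.0 if u == 0 else u * math.log(u)
    return (xlogx(1 + y) + xlogx(1 - y)) / 2

def conjugate_closed_form(x, lam):
    """sum_l lam_l J(x_l / lam_l); +infinity outside the box |x_l| <= lam_l."""
    if any(abs(xl) > ll for xl, ll in zip(x, lam)):
        return math.inf
    return sum(ll * J(xl / ll) for xl, ll in zip(x, lam))

def log_cosh(t):
    """Overflow-free log(cosh t)."""
    a = abs(t)
    return a + math.log1p(math.exp(-2 * a)) - math.log(2)

def conjugate_numeric(x, lam, T=30.0, iters=200):
    """sup_t [<t, x> - sum_l lam_l log cosh t_l]. The supremum splits over the
    coordinates and each one-dimensional objective is concave, so a ternary
    search on [-T, T] finds it (assumes the maximiser lies in that interval)."""
    total = 0.0
    for xl, ll in zip(x, lam):
        def f(t):
            return t * xl - ll * log_cosh(t)
        lo, hi = -T, T
        for _ in range(iters):
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if f(m1) < f(m2):
                lo = m1
            else:
                hi = m2
        total += f((lo + hi) / 2)
    return total

lam = [0.5, 0.5]   # hypothetical asymptotic relative block sizes
x = [0.3, -0.2]
```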
In the special case of asymptotically uniform block sizes the function is related to the matrix in an even more straightforward way, since in this case
[TABLE]
We show that the rate function has a unique minimum at $0$ in the case , which yields the following corollary.
Corollary 1.2**.**
Under the general assumptions, if , the normalized vector of magnetizations converges to $0$ exponentially fast in -probability. More precisely, for each there is a constant such that
[TABLE]
Let us discuss the large deviation results. In the classical Curie–Weiss model, i.e. the case , there is a phase transition: the limiting behavior of changes depending on whether (the high temperature regime) or (the low temperature regime); see [15] for an extensive treatment of this model. A corresponding phase transition can be observed in our model; for the bipartite model this is stated in [18]. In [25] the authors prove the existence of such a phase transition using the method of moments, which, however, cannot yield an exponential speed of convergence as in Corollary 1.2. In accordance with the terminology of the classical Curie–Weiss model, we call these parameter regimes the high temperature and the low temperature regime, respectively. Here, the high temperature regime corresponds to and the low temperature regime to . In the special case of asymptotically uniform block sizes (i.e. ) these conditions reduce to and , respectively.
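The following sketch shows one way to test which regime a given pair (block interaction matrix, relative block sizes) is in. It assumes that the high temperature condition means the largest eigenvalue of $\Lambda^{1/2} A \Lambda^{1/2}$ is below 1. Equivalently, the Hessian $A - \Lambda^{-1}$ at the origin is negative definite. In the uniform case $\Lambda = \frac1k \mathrm{Id}$ this becomes $\lambda_{\max}(A) < k$. For the bipartite matrix with entry β within and α between the blocks, it becomes β + |α| < 2. Power iteration is adequate only because A is assumed positive definite. Names are hypothetical.

```python
import math

def largest_eigenvalue(M, iters=2000):
    """Power iteration with Rayleigh quotient; adequate here because M is symmetric
    positive definite, so its dominant eigenvalue is also its largest one."""
    n = len(M)
    v = [1.0 / (i + 1) for i in range(n)]   # generic start vector (not all-equal)
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Mv))

def is_high_temperature(A, lam):
    """Assumed criterion: lambda_max(Lambda^{1/2} A Lambda^{1/2}) < 1."""
    k = len(A)
    M = [[math.sqrt(lam[i]) * A[i][j] * math.sqrt(lam[j]) for j in range(k)]
         for i in range(k)]
    return largest_eigenvalue(M) < 1.0
```

For example, with two blocks of equal size, the matrix with β = 1 and α = 0.5 is in the high temperature regime, whereas β = 1.5 and α = ±1 is not.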
Next, we consider the scaled block magnetization vector . Again, in the classical (i.e. one-dimensional) case it is known that the magnetization satisfies a central limit theorem with variance whenever . The following theorem is a generalization of this phenomenon.
Theorem 1.3**.**
Let and be a block interaction matrix. In the high temperature regime we have
[TABLE]
Consequently, in the uniform case
[TABLE]
Note that exists, and it can be expanded into a Neumann series. Moreover, if is an orthogonal decomposition, then . Again, a similar statement is derived in [25] using the method of moments.
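The Neumann series referred to here is $(\mathrm{Id} - M)^{-1} = \sum_{n \ge 0} M^n$, which is valid whenever $\|M\| < 1$. The minimal sketch below compares the truncated series with the explicit inverse for a hypothetical 2×2 matrix of norm 0.75. The choice of matrix is illustrative only and is not the specific matrix of the statement above.

```python
def mat_mul(X, Y):
    """Plain matrix product of nested lists."""
    n, p, q = len(X), len(Y), len(Y[0])
    return [[sum(X[i][r] * Y[r][j] for r in range(p)) for j in range(q)] for i in range(n)]

def neumann_inverse(M, terms=200):
    """Truncated Neumann series sum_{n=0}^{terms-1} M^n, approximating (I - M)^{-1}
    when the operator norm of M is < 1."""
    k = len(M)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]
    for _ in range(1, terms):
        power = mat_mul(power, M)
        total = [[total[i][j] + power[i][j] for j in range(k)] for i in range(k)]
    return total

def inverse_2x2(X):
    """Explicit inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[0.5, 0.25], [0.25, 0.5]]          # hypothetical matrix with norm 0.75 < 1
S = neumann_inverse(M)
exact = inverse_2x2([[1 - M[0][0], -M[0][1]], [-M[1][0], 1 - M[1][1]]])
```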
Furthermore, we can treat the critical case. In the Curie–Weiss model, for , the quantity converges weakly to a measure with Lebesgue-density (see e.g. [15, Theorem V.9.5]). As proven in [27] and [18], a similar statement holds true for the vector of magnetizations in the case of blocks. The next theorem gives a further generalization of this fact in the case . Moreover, it shows that statistics associated with the orthogonal decomposition of the block interaction matrix give rise to asymptotically independent random variables with either a Gaussian distribution or a distribution with a Lebesgue-density .
In the multidimensional critical case we restrict to the uniform case with a simple eigenvalue , i.e. we have . Let be the orthogonal decomposition, where is a unitary -matrix and a diagonal -matrix. If we define the normalized vector
[TABLE]
and the matrix
[TABLE]
we have the following result.
Theorem 1.4**.**
Under the above assumptions let and be independent random variables, defined on a common probability space. Then converges in distribution to a probability measure with density
[TABLE]
for a suitable normalization that makes the expression (1.1) a probability density.
Thus, the vector converges to a normal distribution with covariance matrix and the random variable converges to a distribution with Lebesgue-density .
We believe it is possible to extend Theorem 1.4 to the case where the eigenvalue has multiplicity greater than , by appropriately rescaling all the eigenvectors which belong to the eigenvalue .
Note that the parameter is directly related to the variance of a random variable with that distribution; indeed, a short calculation shows that for we have , where is an absolute constant. Moreover, , where is the eigenvector belonging to the eigenvalue .
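The short calculation can be checked numerically. Assume the limiting density is proportional to $\exp(-c x^4)$; for the critical Curie–Weiss model $c = 1/12$. The substitution $y = c^{1/4}x$ then gives $\operatorname{Var} = C/\sqrt{c}$ with $C = \Gamma(3/4)/\Gamma(1/4) \approx 0.338$, and the normalizing constant is $2\Gamma(5/4)c^{-1/4}$. The sketch compares these closed forms with Simpson quadrature. The function name is hypothetical.

```python
import math

def quartic_moments(c, n=20000):
    """Normalizing constant and variance of the density proportional to exp(-c x^4),
    by composite Simpson's rule on [-L, L] (n must be even)."""
    L = (50.0 / c) ** 0.25          # beyond L the integrand is below exp(-50)
    h = 2 * L / n
    z = second = 0.0
    for i in range(n + 1):
        x = -L + i * h
        w = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        f = math.exp(-c * x ** 4)
        z += w * f
        second += w * x * x * f
    z *= h / 3
    second *= h / 3
    return z, second / z            # the mean is zero by symmetry

C = math.gamma(0.75) / math.gamma(0.25)   # absolute constant in Var = C / sqrt(c)
```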
In a final step, we establish convergence rates in the CLT in the high temperature case for a special class of functions. We use the exchangeable pair approach of Stein’s method, which was also used in [14] and [7] for the Curie–Weiss model. The proof of the next result relies on a multivariate version of Stein’s method proven in [30]. To this end, define the function class
[TABLE]
of all three times differentiable functions with all partial derivatives (up to order three) bounded.
Theorem 1.5**.**
Assume that and for each let . For , we have
[TABLE]
2. Proofs of the large deviation results and the mean-field equations
Let us start off by proving the LDP result for the rescaled block magnetization vector . Recall the notion of an LDP (for which we also refer to [13] and [12]): If is a Polish space and is an increasing sequence of non-negative real numbers, we say that a sequence of probability measures on satisfies a large deviation principle with speed and rate function (i.e. a lower semi-continuous function with compact level sets for all ), if for all Borel sets we have
[TABLE]
where and denote the topological interior and closure of a set , respectively.
We say that a sequence of random variables satisfies an LDP with speed and rate function under a sequence of measures if the push-forward sequence satisfies an LDP with speed and rate function .
To prove Theorem 1.1, we will need the following lemma.
Lemma 2.1**.**
Let be a Polish space and assume that a sequence of measures on satisfies an LDP with speed and rate function . Let be a continuous function which is bounded from above and a sequence of functions such that . Then the sequence of measures
[TABLE]
satisfies an LDP with speed and rate function
[TABLE]
Proof.
Note that this is a slight modification of the tilted LDP, which is an immediate consequence of Varadhan’s Lemma ([13, Theorem III.17]). Indeed, according to this tilted LDP, the sequence of measures with -density satisfies an LDP with speed and rate function . Since for any and any the inequalities
[TABLE]
hold, this easily implies an LDP for with speed and the same rate function due to . ∎
Proof of Theorem 1.1.
First, note that under the uniform measure (i.e. ) we have
[TABLE]
so that
[TABLE]
By the Gärtner-Ellis Theorem ([12, Theorem 2.3.6]), satisfies an LDP under with speed and rate function
[TABLE]
where is the convex conjugate of . Next, it is easy to see that we can rewrite the -density of as
[TABLE]
where
[TABLE]
Note that we artificially inserted the truncation in to emphasize the boundedness of . This does not affect the quadratic form, as
[TABLE]
Moreover, is obviously continuous and satisfies
[TABLE]
on the support of , so that the assertion follows from Lemma 2.1. ∎
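The computation under the uniform measure rests on the independence of the spins. It gives $E[\exp\langle t, m\rangle] = \prod_l \cosh(t_l)^{|B_l|}$, so $\frac1N \log E[\exp\langle t, m\rangle] = \sum_l \frac{|B_l|}{N}\log\cosh t_l$, which tends to $\sum_l \lambda_l \log\cosh t_l$. This is our reading of the displayed computation, not a verbatim copy of it. The sketch below checks it by brute-force enumeration for a tiny hypothetical system.

```python
import itertools
import math

def mgf_enumerated(t, sizes):
    """E[exp(<t, m>)] under the uniform measure on {-1,+1}^N, by brute force."""
    labels = [l for l, n in enumerate(sizes) for _ in range(n)]
    N = len(labels)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        total += math.exp(sum(t[l] * s for l, s in zip(labels, sigma)))
    return total / 2 ** N

def mgf_product(t, sizes):
    """Independence of the spins gives prod_l cosh(t_l)^{n_l}."""
    return math.prod(math.cosh(tl) ** n for tl, n in zip(t, sizes))

def scaled_log_mgf(t, sizes):
    """(1/N) log E[exp(<t, m>)] = sum_l (n_l / N) log cosh(t_l)."""
    N = sum(sizes)
    return sum(n / N * math.log(math.cosh(tl)) for tl, n in zip(t, sizes))

t, sizes = [0.3, -0.7], [4, 3]   # hypothetical tilt and block sizes
```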
2.1. The mean-field equations
Theorem 1.1 states that the function
[TABLE]
determines the asymptotic behavior of the magnetization, and thus the critical points of are of central importance. These satisfy the so-called mean-field equations
[TABLE]
For example, in the well-studied case , choosing
[TABLE]
for a positive definite matrix and equations (2.1) reduce to
[TABLE]
While for the two-dimensional fixed point problem the existence of a solution can be shown by monotonicity arguments, the existence of a solution to (2.1) for general is more involved. First, we show that in the high temperature regime the only critical point of is $0$. This immediately yields Corollary 1.2.
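Fixed-point iteration is one simple way to explore the mean-field equations numerically. This sketch assumes that (2.1) are the critical point equations $x = \Lambda \tanh(A x)$, with tanh applied componentwise, of $x \mapsto \frac12\langle x, A x\rangle - \Lambda^*(x)$. It is illustrative only. The iteration converges in these examples, but it is not an existence argument and need not find all solutions. Names are hypothetical.

```python
import math

def mean_field_map(x, A, lam):
    """x -> Lambda tanh(A x), tanh componentwise (assumed form of (2.1))."""
    k = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(k)) for i in range(k)]
    return [lam[i] * math.tanh(Ax[i]) for i in range(k)]

def solve_mean_field(A, lam, x0, iters=2000):
    """Plain fixed-point iteration started at x0."""
    x = list(x0)
    for _ in range(iters):
        x = mean_field_map(x, A, lam)
    return x

def residual(x, A, lam):
    """max_i |x_i - (Lambda tanh(A x))_i|."""
    return max(abs(a - b) for a, b in zip(x, mean_field_map(x, A, lam)))

lam = [0.5, 0.5]                                                  # uniform case, k = 2
high = solve_mean_field([[1.0, 0.5], [0.5, 1.0]], lam, [0.5, 0.5])  # high temperature
low = solve_mean_field([[1.5, 1.0], [1.0, 1.5]], lam, [0.5, 0.5])   # low temperature
```

In the high temperature example the iterates collapse to the origin. In the low temperature example they settle at a nonzero symmetric solution.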
Proof of Corollary 1.2.
In the sense of Corollary 1.2, concentrates exponentially fast around the minima of the function . Under the condition there is only one minimum, namely zero. To see this, note that any local minimum satisfies
[TABLE]
Here, is understood componentwise. Clearly, $0$ is a solution, and due to
[TABLE]
this is a local minimum. Assume there is some solving (2.2), and observe that
[TABLE]
Here the first inequality follows from the general fact that the spectra of the matrices and agree, applied to and . The last inequality follows from for all , with equality only for . This means that for any solution we have equality in (2.1). However, equality can only hold if whenever . Due to our assumption , this proves the claim. ∎
In contrast, in the low temperature regime, there are other solutions to the mean-field equations (2.1). Let us start with the following proposition showing the connection of the -dimensional mean-field equations to the one-dimensional equations of the Curie–Weiss model. It provides an explicit formula for the solution of the -dimensional problem in terms of the solution of the Curie–Weiss equation.
Proposition 2.2**.**
Let , and be a positive semidefinite, symmetric matrix with . If the eigenvector belonging to the largest eigenvalue can be rescaled to satisfy , then there exists a solution to the mean-field equations (2.1) and it is given by , where is the positive one-dimensional solution of the Curie–Weiss model with temperature .
Proof.
Let be the unique positive solution of the Curie–Weiss equation for and define . We have
[TABLE]
where in the second-to-last step we have used explicitly that , and so is a critical point of . Moreover, in this case it is easily seen that
[TABLE]
is negative definite. Indeed, from
[TABLE]
we obtain
[TABLE]
∎
Example 2.3*.*
Even though the assumptions in the previous proposition seem to be tailor-made for its proof (and the conclusion also holds true more generally), there are interesting non-trivial examples of a matrix satisfying the conditions of Proposition 2.2. One of them is the family of matrices () of the form
[TABLE]
for any parameters satisfying
[TABLE]
This corresponds to groups with an interaction parameter within the group and between the groups. For example, the condition (2.5) is satisfied whenever .
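For this family, the mechanism behind Proposition 2.2 can be seen explicitly in the uniform case. Write β for the within-group and α for the between-group parameter (our notation), and assume $\Lambda = \frac1k\mathrm{Id}$ and mean-field equations of the form $x = \Lambda\tanh(Ax)$. The all-ones vector is an eigenvector of the matrix with eigenvalue $\beta + (k-1)\alpha$. Hence $x = \frac{m^*}{k}(1,\dots,1)$ is a solution whenever $m^*$ is the positive root of the Curie–Weiss equation $m = \tanh\big(\frac{\beta+(k-1)\alpha}{k} m\big)$. The sketch checks this numerically with hypothetical parameters.

```python
import math

def curie_weiss_root(beta, tol=1e-15):
    """Positive solution of m = tanh(beta m) for beta > 1 (0 otherwise), by bisection.
    For beta > 1, tanh(beta m) - m changes sign exactly once on (0, 1)."""
    if beta <= 1:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.tanh(beta * mid) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def example_matrix(k, beta, alpha):
    """Hypothetical family: beta on the diagonal (within groups), alpha elsewhere."""
    return [[beta if i == j else alpha for j in range(k)] for i in range(k)]

def explicit_solution(k, beta, alpha):
    """Candidate solution (m*/k)(1, ..., 1) in the uniform case Lambda = Id/k."""
    top = beta + (k - 1) * alpha     # eigenvalue belonging to the all-ones eigenvector
    m_star = curie_weiss_root(top / k)
    return [m_star / k] * k

def mean_field_residual(x, A, lam):
    """max_i |x_i - lam_i tanh((A x)_i)|."""
    k = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(k)) for i in range(k)]
    return max(abs(x[i] - lam[i] * math.tanh(Ax[i])) for i in range(k))

k, beta, alpha = 3, 2.0, 1.0          # eigenvalues of A: 4 (once) and 1 (twice)
A = example_matrix(k, beta, alpha)
x = explicit_solution(k, beta, alpha)
```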
In the general case, the conclusion of Proposition 2.2 holds as well. Here the proof relies on the fact that the continuous function has a global maximum on its (compact) domain , and the next lemma excludes maxima on the boundary. Hence there is always at least one nonzero solution to (2.1), since $0$ is either an inflection point or a minimum.
Lemma 2.4**.**
Let be the large deviation rate function from Theorem 1.1, i.e.
[TABLE]
and denotes the convex conjugate of .
(1) has no global maxima on the boundary of .
(2) If satisfies the mean-field equations, we have
[TABLE]
(3) The set of all global maximisers has a positive distance from the boundary.
Proof.
(1): Assume that is a global maximum of on the boundary. Then there is at least one index such that (if , switch to since ). Rewriting the fact that is a maximum of , we have for any and
[TABLE]
where is the vector obtained from by deleting the -th component. If we divide both sides by and let , the left hand side is finite, as , and the right hand side tends to by l’Hospital’s rule. This proves statement (1).
(2): Clearly, can only satisfy the mean-field equations if . Since it solves the mean-field equations, for any we have
[TABLE]
Inserting this into the function gives
[TABLE]
(3): The function is bounded in , as
[TABLE]
On the other hand, if there exists a sequence of maximisers approaching the boundary, i.e. for at least one we have , this gives . ∎
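The value of the function at a solution of the mean-field equations can be computed from the identity $J(\tanh y) = y\tanh y - \log\cosh y$. Assume the sign convention $G(x) = \frac12\langle x, Ax\rangle - \sum_l \lambda_l J(x_l/\lambda_l)$ and the equations $x = \Lambda\tanh(Ax)$. Substituting $y = Ax$ then gives $G(x) = \sum_l \lambda_l \log\cosh((Ax)_l) - \frac12\langle x, Ax\rangle$. The sketch checks this at a numerically computed solution for hypothetical low temperature parameters with non-uniform block sizes.

```python
import math

def J(y):
    """((1+y)/2) log(1+y) + ((1-y)/2) log(1-y) for |y| <= 1."""
    def xlogx(u):
        return 0.0 if u == 0 else u * math.log(u)
    return (xlogx(1 + y) + xlogx(1 - y)) / 2

def G(x, A, lam):
    """G(x) = <x, A x>/2 - sum_l lam_l J(x_l / lam_l) (assumed sign convention)."""
    k = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(k) for j in range(k))
    return 0.5 * quad - sum(l * J(xi / l) for xi, l in zip(x, lam))

def G_at_solution(x, A, lam):
    """Closed form valid when x = Lambda tanh(A x)."""
    k = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(k)) for i in range(k)]
    quad = sum(xi * yi for xi, yi in zip(x, Ax))
    return sum(l * math.log(math.cosh(y)) for l, y in zip(lam, Ax)) - 0.5 * quad

def solve(A, lam, iters=3000):
    """Fixed-point iteration started at x = lam; for a matrix with nonnegative
    entries the map x -> Lambda tanh(A x) is monotone, so the iterates decrease
    to a fixed point."""
    k = len(lam)
    x = list(lam)
    for _ in range(iters):
        x = [lam[i] * math.tanh(sum(A[i][j] * x[j] for j in range(k))) for i in range(k)]
    return x

A = [[3.0, 0.5], [0.5, 2.0]]   # hypothetical low temperature example
lam = [0.6, 0.4]               # hypothetical non-uniform block sizes
x = solve(A, lam)
```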
In the case of two blocks, i.e. , equal block sizes and the same interaction within a group, the set of maximisers of the rate function is explicitly known. Indeed, in [3, Proposition 4.1] and [27, Theorem 2.1] the authors show that for
[TABLE]
satisfying and (the low temperature case) the distribution of concentrates in the two points , and . In the case the limit points for become , and . Here is the largest solution to
[TABLE]
If , the distribution of concentrates in the origin. For , we can extend this result to arbitrary block sizes.
Proposition 2.5**.**
Let , be a block interaction matrix and for some . In the low temperature case, if the groups are not interacting (i.e. ) there exist either two or four global maxima of ; for , there are always two global maxima of .
Note that we have to restrict to and in order for to be positive definite. Moreover, the characterization of the high temperature phase (where is the Loewner partial ordering) can be reduced to and . Thus we are in the high temperature regime if and only if
[TABLE]
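For $k = 2$ the Loewner condition can be checked with Sylvester's criterion. In our notation, write $A = \begin{pmatrix} a & c\\ c & b\end{pmatrix}$ and $\Lambda = \mathrm{diag}(\lambda, 1-\lambda)$, and assume that the high temperature regime means $\Lambda^{1/2}A\Lambda^{1/2} < \mathrm{Id}$. This is then equivalent to $1 - \lambda a > 0$ together with $(1-\lambda a)(1-(1-\lambda)b) - \lambda(1-\lambda)c^2 > 0$. The sketch compares this criterion with the largest eigenvalue on a grid of hypothetical parameters.

```python
import math

def high_temp_sylvester(a, b, c, lam):
    """Id - Lambda^{1/2} A Lambda^{1/2} > 0 for A = [[a, c], [c, b]] and
    Lambda = diag(lam, 1 - lam), via the leading principal minors."""
    mu = 1 - lam
    minor1 = 1 - lam * a
    det = (1 - lam * a) * (1 - mu * b) - lam * mu * c * c
    return minor1 > 0 and det > 0

def largest_eig_scaled(a, b, c, lam):
    """Largest eigenvalue of the symmetric 2x2 matrix Lambda^{1/2} A Lambda^{1/2}."""
    mu = 1 - lam
    p, q, r = lam * a, mu * b, math.sqrt(lam * mu) * c
    return (p + q) / 2 + math.hypot((p - q) / 2, r)
```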
Proof.
The case is an easy consequence of the statements for the one-dimensional Curie–Weiss model, since and .
We treat the case only – the case follows immediately from the equality (with the appropriate modifications, e.g. the maximum will be in the second quadrant instead of the first).
By (2.6) the maximum of the rate function is non-negative; let us call this maximum . Then, implies , which contradicts the low temperature assumption (recall the Hessian of at $0$ given in equation (2.3)), so that . Moreover, every global maximum (which is also a local maximum, as it is not attained on the boundary) satisfies the mean-field equations, and so the value of at any maximum is given by equation (2.6). As a consequence, all global maxima lie on a contour line , where was defined in the previous lemma.
Firstly, let us show that in the first quadrant there can only be one such point. Due to symmetry, the global maximum will also be present in the third quadrant. For the points on the contour line can be described by a function , and due to the monotonicity of the function is non-increasing. Moreover, the solutions of the mean-field equations can be described by the functions
[TABLE]
via
[TABLE]
The function can behave in two ways, depending on the parameter : For it increases monotonically. For it first decreases and then increases. More precisely, in the latter case, if and only if for some , and is strictly increasing for . Moreover, the curve is only in the first quadrant if . In either case, there is only one intersection point of and in the first quadrant.
Secondly, the maximum cannot be in the second quadrant. Assume that there are solutions to the mean-field equations both in the first and in the second quadrant. If we denote by the zeros of , for the solution in the second quadrant, we easily see that and . Hence
[TABLE]
If there is also a solution in the first quadrant with coordinates , we obtain analogously
[TABLE]
This shows that the maximum must lie in the first quadrant. ∎
Furthermore, we can treat the case for uniform block sizes and special matrices. The proof is motivated by [3, Proposition 4.1].
Lemma 2.6**.**
Let and be a block interaction matrix with positive entries such that we have for any for two constants and .
In the uniform case, there are exactly two maximisers of the rate function and they satisfy for solving the Curie–Weiss equation .
Proof.
Using the equality we can rewrite the rate function as
[TABLE]
where equality only holds in the case for all . Thus, we search for maximisers of on the generalized diagonal . On this set we have
[TABLE]
i.e. it reduces to the Curie–Weiss equations in one dimension. For it has a unique nonzero solution , and solves the -dimensional maximization problem. ∎
Unfortunately, the proof cannot be modified in a straightforward way to deal with non-equal block sizes, not even in the case . The reason is that the inequality used in the proof does not give any information on the actual maximiser in this setting (i.e. is not maximized on any type of (weighted) diagonal). Hence we cannot reduce the problem to the one-dimensional setting.
Example*.*
For example, Lemma 2.6 can be used to prove that given three positive parameters with and , the rate function corresponding to
[TABLE]
only has two maximisers in the uniform case. The conditions on ensure that is positive definite, and it is clear that and .
As a concluding remark, let us note that the previous results imply that there is indeed a phase transition in our block spin model. However, if or the block sizes are not equal, it seems hard to give a similarly explicit formula for the limit points. Nevertheless, the above observations show that there is a phase transition in a very general class of block spin models with an arbitrary number of blocks and a general class of block sizes. In particular, they justify the names “high temperature regime” and “low temperature regime”.
3. Proofs of the limit theorems
In this section we prove (standard and non-standard) Central Limit Theorems for the vector . In the first subsection we treat the high temperature regime and derive a standard CLT using the Hubbard–Stratonovich transform. This is similar in spirit to the third section of [27] and technically related to [22]. The result can also be derived from [17], where similar techniques are used. However, this subsection also prepares the ground for Subsection 3.2, where we treat the critical case and show a non-standard CLT, generalizing results from [18] and [27]. Finally, in Subsection 3.3 we use Stein’s method as an alternative approach to the CLT for . This is not only interesting in its own right, but also has the advantage of providing a speed of convergence, which a proof via the Hubbard–Stratonovich transform does not give.
3.1. Central limit theorem: Hubbard–Stratonovich approach
For the proof we shall use the transformed block magnetization vectors
[TABLE]
where is the orthogonal decomposition. It is easy to see that
[TABLE]
Proof of Theorem 1.3.
As in [27] or [17] (both papers are inspired by [16]), we use the Hubbard–Stratonovich transform (i.e. a convolution with an independent normal distribution). For each ,
[TABLE]
Our first step is to prove that converges weakly to a normal distribution. Let be an independent sequence, which is moreover independent of . We have for any
[TABLE]
where we have defined
[TABLE]
Since , we obtain
[TABLE]
For parameters let and decompose
[TABLE]
Since (which is a consequence of the continuity of the eigenvalues) we have for any
[TABLE]
Next, we will estimate (3.1) from below in order to obtain an upper bound for . If we define , it follows that
[TABLE]
Here, we have used the convergence of to to bound and the fact that as , so that the right hand side is positive definite for small enough, uniformly in . Thus, after taking the limit , will vanish in the limit .
Lastly, we need to show that vanishes as well. To this end, we show that we can choose small enough to ensure that uniformly for and for large enough. Since and , choose large enough so that uniformly. As before, it can be seen that $0$ is the only minimum for chosen that way. Indeed, after some manipulations any critical point satisfies , and since and , this is only possible for . As a consequence, for any there is a constant such that uniformly , i.e.
[TABLE]
Lastly, choose so small that is uniformly positive definite, and observe that we obtain
[TABLE]
From here, it remains to undo the convolution (e.g. by using the characteristic function), giving
[TABLE]
With the help of Slutsky’s theorem and the definition this implies
[TABLE]
as claimed. ∎
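Theorem 1.3 can be probed numerically for $k = 2$ by summing exactly over the block magnetizations. The sketch assumes the Gibbs weight $\exp(\langle m, Am\rangle/(2N))$ and takes the limiting covariance of $m/\sqrt{N}$ to be $(\Lambda^{-1} - A)^{-1}$. This is what a Gaussian approximation gives in this normalization, and in the Curie–Weiss case it reduces to $1/(1-\beta)$. At $N = 600$ the exact and limiting covariances should agree to within about 0.1, the difference being a finite-size effect. The parameters are hypothetical.

```python
import math

def exact_covariance(A, sizes):
    """Exact covariance of m / sqrt(N) for k = 2 blocks: m_l ranges over
    -n_l, -n_l + 2, ..., n_l with multiplicity binom(n_l, (n_l + m_l) / 2),
    weighted by exp(<m, A m> / (2N)) (assumed normalization)."""
    n1, n2 = sizes
    N = n1 + n2
    def log_binom(n, r):
        return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    lb1 = [log_binom(n1, r) for r in range(n1 + 1)]
    lb2 = [log_binom(n2, r) for r in range(n2 + 1)]
    terms = []
    for r1 in range(n1 + 1):
        m1 = 2 * r1 - n1          # r1 = number of +1 spins in block 1
        for r2 in range(n2 + 1):
            m2 = 2 * r2 - n2
            quad = A[0][0] * m1 * m1 + 2 * A[0][1] * m1 * m2 + A[1][1] * m2 * m2
            terms.append((lb1[r1] + lb2[r2] + quad / (2 * N), m1, m2))
    shift = max(t[0] for t in terms)   # log-sum-exp shift against overflow
    Z = s11 = s12 = s22 = 0.0
    for logw, m1, m2 in terms:
        w = math.exp(logw - shift)
        Z += w
        s11 += w * m1 * m1
        s12 += w * m1 * m2
        s22 += w * m2 * m2
    # E[m] = 0 by the symmetry m -> -m, so second moments are covariances.
    return [[s11 / (Z * N), s12 / (Z * N)], [s12 / (Z * N), s22 / (Z * N)]]

def limit_covariance(A, lam):
    """(Lambda^{-1} - A)^{-1} for k = 2 (assumed limiting covariance)."""
    p, q, r = 1 / lam[0] - A[0][0], 1 / lam[1] - A[1][1], -A[0][1]
    det = p * q - r * r
    return [[q / det, -r / det], [-r / det, p / det]]

A = [[1.0, 0.5], [0.5, 1.0]]              # high temperature: lambda_max(A / 2) = 0.75
exact = exact_covariance(A, [300, 300])
limit = limit_covariance(A, [0.5, 0.5])   # [[4/3, 2/3], [2/3, 4/3]]
```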
Example*.*
Consider the case and
[TABLE]
is positive definite if and , i.e. if . We have the diagonalization
[TABLE]
and corresponds to the transformation performed in [27, Theorem 1.2] (up to a factor of ). In this case
[TABLE]
which is exactly the covariance matrix in [27] (again up to a factor of ). Note that similar results have been derived in [25].
Remark*.*
If is symmetric and positive semidefinite, then a variant of the proof shows that if we let with for , converges to an -dimensional normal distribution with covariance matrix . This can be applied to the matrix above with , resulting in a CLT for the magnetization in a Curie–Weiss model, which of course can also be obtained by choosing and .
3.2. Non-central limit theorem
Recall the situation of Theorem 1.4: The block interaction matrix has eigenvalues and we consider the uniform case, i.e. . Moreover, we use the definitions
[TABLE]
so that
[TABLE]
Proof of Theorem 1.4.
Let and be independent random variables, defined on a common probability space. We have for any Borel set
[TABLE]
where we used
[TABLE]
Now the proof follows the same lines as the proof of the CLT in the high temperature phase, with the slight modification that we use the expansion of to fourth order
[TABLE]
We again split into three regions, namely the inner region for an arbitrary , the intermediate region for some arbitrary , where
[TABLE]
and the outer region . Also define the rescaled vector
[TABLE]
Firstly, in the inner region we rewrite
[TABLE]
and since the convergence of the error terms is uniform on any compact subset of , for any fixed this yields
[TABLE]
Secondly, we show that the outer region does not contribute to the limit . It can be seen by elementary tools that has a unique minimum [math] in [math], and so for any we have . Using the monotone convergence theorem, we obtain
[TABLE]
Lastly, we will estimate the contribution of the intermediate region from above by a quantity which vanishes as . To this end, we will bound the function from below. Recall that
[TABLE]
and since for this yields
[TABLE]
Now, as in the case of the central limit theorem, we can estimate from below the error term in such a way that there is a positive constant and a positive definite matrix such that
[TABLE]
from which we obtain an upper bound, i.e.
[TABLE]
and the right hand side vanishes as by dominated convergence. As a result, the limit exists and is equal to
[TABLE]
The convergence results for the non-convoluted vector follow easily by considering the characteristic functions. We have for any
[TABLE]
where and is the characteristic function of a random variable with distribution . Using the independence of and , the results follow by simple calculations. ∎
3.3. Central limit theorem: Stein’s method
Lastly, we will prove Theorem 1.5 using Stein’s method of exchangeable pairs. For brevity’s sake, for the rest of this section we fix and we will drop all sub- and superscripts (e.g. we write instead of , instead of , instead of et cetera). It is more convenient to formulate this approach in terms of random variables. Let be a random vector with distribution and be an independent random variable uniformly distributed on . First, denote by the exchangeable pair which is given by taking a step in the Glauber chain for , i.e. is the vector after replacing by an independent with distribution (the exchangeability follows from the reversibility of the Glauber dynamics). Consequently, is also exchangeable. More precisely, with the standard basis vectors of we have
[TABLE]
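The exchangeability claim can be checked on a toy system. Take $\sigma \sim \mu$ and let $\sigma'$ be one Glauber step from $\sigma$. The pair $(\sigma, \sigma')$ is exchangeable exactly when the chain satisfies detailed balance, $\mu(\sigma)P(\sigma,\sigma') = \mu(\sigma')P(\sigma',\sigma)$. The sketch builds the heat-bath kernel by enumeration, assuming the Gibbs weight $\exp(\langle m, Am\rangle/(2N))$. The matrix, block sizes and names are hypothetical.

```python
import itertools
import math

def energy_fn(A, sizes):
    """Return N and H(sigma) = <m, A m> / (2N) for the toy model (assumed normalization)."""
    labels = [l for l, n in enumerate(sizes) for _ in range(n)]
    N = len(labels)
    def energy(sigma):
        m = [0] * len(sizes)
        for s, l in zip(sigma, labels):
            m[l] += s
        return sum(m[a] * A[a][b] * m[b] for a in range(len(m)) for b in range(len(m))) / (2 * N)
    return N, energy

def glauber_kernel(A, sizes):
    """One heat-bath (Glauber) step: pick a site I uniformly at random and resample
    sigma_I from its conditional law given all other spins."""
    N, energy = energy_fn(A, sizes)
    states = list(itertools.product((-1, 1), repeat=N))
    P = {}
    for sigma in states:
        stay = 0.0
        for i in range(N):
            flipped = sigma[:i] + (-sigma[i],) + sigma[i + 1:]
            # conditional probability of the flipped value: e^{H(flip)} / (e^{H(sigma)} + e^{H(flip)})
            p_flip = 1.0 / (1.0 + math.exp(energy(sigma) - energy(flipped)))
            P[(sigma, flipped)] = p_flip / N
            stay += (1.0 - p_flip) / N
        P[(sigma, sigma)] = stay
    return states, energy, P

A = [[1.0, 0.4], [0.4, 0.8]]   # hypothetical block matrix
states, energy, P = glauber_kernel(A, [2, 2])
gibbs = {s: math.exp(energy(s)) for s in states}   # unnormalized Gibbs weights
# Detailed balance mu(s) P(s, t) = mu(t) P(t, s) is equivalent to exchangeability.
violation = max(abs(gibbs[s] * p - gibbs[t] * P[(t, s)]) for (s, t), p in P.items())
row_sums = {s: sum(p for (u, _), p in P.items() if u == s) for s in states}
```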
We need the following lemma to identify the conditional expectation of . Here, we write for the function that assigns to each position its block, i.e. .
Lemma 3.1**.**
Let and be defined as above. Then for each fixed
[TABLE]
Proof.
For any Ising model the conditional distribution of is given by and so
[TABLE]
where we recall the notation for the matrix without its diagonal, i.e. . In the case that is the block model matrix, this yields
[TABLE]
∎
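A brute-force check of this kind of identity can be set up as follows. Assume the Hamiltonian $\langle m, Am\rangle/(2N)$. A heat-bath update at site $i$ then has conditional mean $\tanh(h_i)$ with local field $h_i = \frac1N\big((Am)_{b(i)} - A_{b(i)b(i)}\sigma_i\big)$. Consequently, the expected change of block magnetization $l$ in one step is $\frac1N\sum_{i\in B_l}(\tanh h_i - \sigma_i)$. The sketch compares this formula with the same quantity computed from raw energy differences. Names are hypothetical, and the rescaling by $\sqrt N$ used for the normalized vector is omitted.

```python
import itertools
import math

A = [[1.0, 0.4], [0.4, 0.8]]   # hypothetical block matrix
sizes = [3, 2]                 # hypothetical block sizes
labels = [l for l, n in enumerate(sizes) for _ in range(n)]
N, k = len(labels), len(sizes)

def block_mag(sigma):
    m = [0] * k
    for s, l in zip(sigma, labels):
        m[l] += s
    return m

def energy(sigma):
    m = block_mag(sigma)
    return sum(m[a] * A[a][b] * m[b] for a in range(k) for b in range(k)) / (2 * N)

def drift_from_energies(sigma):
    """E[m' - m | sigma] for one Glauber step, using only energy differences."""
    drift = [0.0] * k
    for i in range(N):
        up = energy(sigma[:i] + (1,) + sigma[i + 1:])
        down = energy(sigma[:i] + (-1,) + sigma[i + 1:])
        new_mean = (math.exp(up) - math.exp(down)) / (math.exp(up) + math.exp(down))
        drift[labels[i]] += (new_mean - sigma[i]) / N
    return drift

def drift_from_local_fields(sigma):
    """Same drift via tanh of h_i = ((A m)_{b(i)} - A_{b(i)b(i)} sigma_i) / N."""
    m = block_mag(sigma)
    drift = [0.0] * k
    for i, s in enumerate(sigma):
        l = labels[i]
        h = (sum(A[l][b] * m[b] for b in range(k)) - A[l][l] * s) / N
        drift[l] += (math.tanh(h) - s) / N
    return drift

gap = max(abs(a - b)
          for sigma in itertools.product((-1, 1), repeat=N)
          for a, b in zip(drift_from_energies(sigma), drift_from_local_fields(sigma)))
```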
Since the conditional expectation will be of importance, we define
[TABLE]
so that . Note that actually does not depend on ; the latter term is added only for convenience, to rewrite the first term. Thus we have .
Lemma 3.2.
We have
[TABLE]
with
[TABLE]
Proof.
From equation (3.2) and Lemma 3.1 we obtain
[TABLE]
∎
For large enough, the matrix satisfies and is thus invertible, with inverse . Moreover, we also have .
We will need the following approximation theorem for random vectors.
Theorem 3.3 ([30, Theorem 2.1]).
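Assume that $(W, W')$ is an exchangeable pair of $\mathbb{R}^d$-valued random vectors such that
$$\mathbb{E}W = 0, \qquad \mathbb{E}WW^{T} = \Sigma,$$
with $\Sigma \in \mathbb{R}^{d \times d}$ symmetric and positive definite. Suppose further that
$$\mathbb{E}^{W}(W' - W) = -\Lambda W + R$$
is satisfied for an invertible matrix $\Lambda$ and a $\sigma(W)$-measurable random vector $R$. Then, if $Z$ has $d$-dimensional standard normal distribution, we have for every three times differentiable function $h$
$$\left|\mathbb{E}h(W) - \mathbb{E}h(\Sigma^{1/2}Z)\right| \le \frac{|h|_2}{4}A + \frac{|h|_3}{12}B + \left(|h|_1 + \frac{d}{2}\|\Sigma\|^{1/2}|h|_2\right)C,$$
where, with $\lambda^{(i)} := \sum_{m=1}^{d} \left|(\Lambda^{-1})_{m,i}\right|$, we define the three error terms
$$A = \sum_{i,j=1}^{d} \lambda^{(i)} \sqrt{\operatorname{Var} \mathbb{E}^{W}\left[(W_i' - W_i)(W_j' - W_j)\right]},$$
$$B = \sum_{i,j,k=1}^{d} \lambda^{(i)}\, \mathbb{E}\left|(W_i' - W_i)(W_j' - W_j)(W_k' - W_k)\right|,$$
$$C = \sum_{i=1}^{d} \lambda^{(i)} \sqrt{\mathbb{E}R_i^2}.$$
Here, $|h|_k$ denotes the supremum of the partial derivatives of $h$ up to order $k$.
Note that in the proof the choice of $\sigma(W)$ for the conditional expectation is arbitrary; it suffices to take any $\sigma$-algebra $\mathcal{F}$ with respect to which $W$ is measurable. Clearly, the remainder $R$ has to be adjusted accordingly.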
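Once the ingredients of an exchangeable-pair bound of the type in [30, Theorem 2.1] are known, assembling it is mechanical. The weights there are λ^(i) = Σ_m |(Λ⁻¹)_{m,i}|, and the bound reads |h|₂A/4 + |h|₃B/12 + (|h|₁ + (d/2)‖Σ‖^{1/2}|h|₂)C. Below is a minimal numpy sketch that assumes the ingredients (square roots of conditional variances, absolute third moments, L² norms of the remainder) have already been computed or estimated. The function names are hypothetical, and taking ‖Σ‖ to be the spectral norm is our assumption.

```python
import numpy as np

def lambda_weights(Lam):
    """lambda^(i) = sum_m |(Lam^{-1})_{m,i}|: absolute column sums of Lam^{-1}."""
    return np.abs(np.linalg.inv(Lam)).sum(axis=0)

def exchangeable_pair_bound(Lam, Sigma, sqrt_var_cond, abs_third, sqrt_mean_R2, h1, h2, h3):
    """Right-hand side of the bound from precomputed ingredients:
    sqrt_var_cond[i, j] = sqrt(Var E^W[(W'_i - W_i)(W'_j - W_j)])
    abs_third[i, j, k]  = E|(W'_i - W_i)(W'_j - W_j)(W'_k - W_k)|
    sqrt_mean_R2[i]     = sqrt(E R_i^2)
    h1, h2, h3          = |h|_1, |h|_2, |h|_3
    """
    d = Lam.shape[0]
    lam = lambda_weights(Lam)
    A = np.einsum("i,ij->", lam, sqrt_var_cond)
    B = np.einsum("i,ijk->", lam, abs_third)
    C = lam @ sqrt_mean_R2
    sigma_norm = np.linalg.norm(Sigma, 2)  # assumption: spectral norm for ||Sigma||
    return h2 / 4 * A + h3 / 12 * B + (h1 + 0.5 * d * np.sqrt(sigma_norm) * h2) * C

lam = lambda_weights(np.array([[2.0, 1.0], [0.0, 1.0]]))
print(lam)  # [0.5 1.5]

bound = exchangeable_pair_bound(
    Lam=np.diag([2.0, 4.0]), Sigma=np.eye(2),
    sqrt_var_cond=np.ones((2, 2)), abs_third=np.zeros((2, 2, 2)),
    sqrt_mean_R2=np.zeros(2), h1=1.0, h2=4.0, h3=1.0)
print(bound)  # 1.5
```

Since Λ⁻¹ enters only through absolute column sums, a Λ of order N⁻¹ (as for a single Glauber step) produces weights of order N. These have to be compensated by the small size of the increments W′ − W.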
Corollary 3.4.
Let be the block magnetization vector and as above, define and let . For any function
[TABLE]
with the three error terms
[TABLE]
Finally, the following lemma shows that all error terms can be bounded by a term of order .
Lemma 3.5.
In the situation of Corollary 3.4 we have
[TABLE]
Before we prove this lemma (and consequently Theorem 1.5), we state concentration of measure results for block spin Ising models. These are needed to bound . The first step is the existence of a logarithmic Sobolev inequality for the Ising model with a constant that is uniform in .
Proposition 3.6.
Under the general assumptions, if , then for large enough the Ising model satisfies a logarithmic Sobolev inequality with a constant , i.e. for any function we have
[TABLE]
where is the entropy functional and the sign flip operator.
This follows immediately from [23, Proposition 1.1], since , which implies the convergence of the norms, i.e. for large enough we have . Although the condition in [23] is , this was merely for applications’ sake and is sufficient to establish the logarithmic Sobolev inequality.
For any function and any we write
[TABLE]
so that (3.3) becomes
[TABLE]
Moreover, it is known that (3.3) implies a Poincaré inequality
[TABLE]
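The Poincaré inequality can be made concrete on a tiny system. The best constant C in Var_μ(f) ≤ C·E_μ Σ_i (f − f∘T_i)² is the reciprocal of the smallest nonzero eigenvalue ("gap") of the generalized problem L f = λ diag(μ) f, where L represents the quadratic form on the right. The sketch assumes this normalization of the Dirichlet form, which may differ from (3.4) by a constant factor. It also assumes the Gibbs measure μ(σ) ∝ exp(σᵀAσ/(2N)) with A_ij = J_{B(i)B(j)}, and the helper names are ours. For independent uniform spins the gap equals 4, which serves as a sanity check.

```python
import itertools
import numpy as np

def gibbs_measure(block_of, J):
    """Configurations of {-1,+1}^N and mu(sigma) ∝ exp(sigma^T A sigma / (2N)), A_ij = J_{B(i)B(j)}."""
    N = len(block_of)
    A = J[np.ix_(block_of, block_of)].astype(float)
    configs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
    energy = np.einsum("si,ij,sj->s", configs, A, configs) / (2 * N)
    w = np.exp(energy - energy.max())
    return configs, w / w.sum()

def dirichlet_matrix(configs, mu):
    """Matrix L with f @ L @ f = sum_sigma mu(sigma) sum_i (f(sigma) - f(T_i sigma))^2."""
    index = {tuple(c): s for s, c in enumerate(configs)}
    L = np.zeros((len(configs), len(configs)))
    for s, sigma in enumerate(configs):
        for i in range(configs.shape[1]):
            tau = sigma.copy()
            tau[i] = -tau[i]  # T_i flips the i-th spin
            t = index[tuple(tau)]
            L[s, s] += mu[s]
            L[t, t] += mu[s]
            L[s, t] -= mu[s]
            L[t, s] -= mu[s]
    return L

def spectral_gap(configs, mu):
    """Smallest nonzero eigenvalue of L f = gap * diag(mu) f; 1/gap is the best Poincaré constant."""
    L = dirichlet_matrix(configs, mu)
    d = 1.0 / np.sqrt(mu)
    eigenvalues = np.linalg.eigvalsh(d[:, None] * L * d[None, :])
    return eigenvalues[1]  # eigenvalues[0] ~ 0 belongs to the constant functions

# Sanity check: J = 0 gives independent uniform spins, whose gap is 4 for this Dirichlet form.
configs0, mu0 = gibbs_measure(np.array([0, 0, 0, 1, 1]), np.zeros((2, 2)))
print(round(spectral_gap(configs0, mu0), 8))  # 4.0

# Hypothetical block model: check Var(f) <= Q(f) / gap for random test functions f.
configs, mu = gibbs_measure(np.array([0, 0, 0, 1, 1]), np.array([[1.0, 0.3], [0.3, 0.8]]))
L = dirichlet_matrix(configs, mu)
gap = spectral_gap(configs, mu)
rng = np.random.default_rng(0)
ok = True
for _ in range(100):
    f = rng.standard_normal(len(configs))
    var = mu @ (f - mu @ f) ** 2
    ok &= var <= (f @ L @ f) / gap + 1e-12
print(ok)  # True
```

The uniformity in N required in the text cannot be seen from such small examples. The sketch only illustrates what the constant in the inequality measures.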
Proof of Lemma 3.5.
Error term : To treat the term , fix and observe that
[TABLE]
Thus, if we define
[TABLE]
we see that
[TABLE]
and we need to show that . Using the Poincaré inequality (3.4) it suffices to prove that .
Let be arbitrary and define . The first case is that , for which
[TABLE]
The second case follows by similar reasoning.
Error term : The second term is much easier to estimate, as
[TABLE]
Error term : To estimate the variance of the remainder term we first split it into two sums. For any write
[TABLE]
Clearly and we estimate these terms separately. It is obvious that the norm of the second term is of order . To estimate , we use to obtain
[TABLE]
In the last line we have used the fact that and for all
[TABLE]
which evaluated at gives . For the details see [23]. The constant depends on a norm of , which by convergence to can again be chosen independently of . ∎
Proof of Theorem 1.5.
The theorem follows immediately from Corollary 3.4 and Lemma 3.5. ∎
4. Discussion and open questions
Although the questions raised in the introduction have been answered to some degree, there remain open questions that we have not yet been able to answer.
The first question concerns the maxima of the rate function . Note that by [11, Theorem A.1] the global maxima of are related to the global minima of the so-called pressure functional, which can be found, for example, in [17, equation (14)]. The existence of a maximiser follows easily from the compactness of and the continuity of , but the number of maximisers is still unknown. From the real-analyticity of , we can infer that the set of maximisers is a null set, but it could in principle contain infinitely many points. However, Lemmas 2.5 and 2.6 as well as numerics suggest that for positive interactions and , the number of local minima is twice the number of independent systems; see Figures 2 and 3 below.
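The kind of numerics mentioned above can be imitated with a simple grid search. The sketch uses a two-block candidate function that is our assumption, not a formula taken from this section: a Varadhan-type expression Σ_k α_k I(m_k) − ½ Σ_{k,l} α_k α_l J_kl m_k m_l, up to an additive constant, where I is the Curie–Weiss entropy term. It counts strict local minima on a grid, which are equivalently local maxima of the negative. All parameter values are hypothetical.

```python
import numpy as np

def cramer_rate(m):
    """I(m) = (1+m)/2 log(1+m) + (1-m)/2 log(1-m), the rate function of an average of fair +-1 spins."""
    return 0.5 * ((1 + m) * np.log1p(m) + (1 - m) * np.log1p(-m))

def candidate_rate(m1, m2, alpha, J):
    """Assumed form, up to an additive constant:
    sum_k alpha_k I(m_k) - 1/2 sum_{k,l} alpha_k alpha_l J_kl m_k m_l."""
    a1, a2 = alpha
    quad = (a1 * a1 * J[0, 0] * m1**2 + 2 * a1 * a2 * J[0, 1] * m1 * m2
            + a2 * a2 * J[1, 1] * m2**2)
    return a1 * cramer_rate(m1) + a2 * cramer_rate(m2) - 0.5 * quad

def grid_local_minima(alpha, J, n=401, edge=0.999):
    """Interior grid points whose value is strictly below all 8 neighbours."""
    grid = np.linspace(-edge, edge, n)
    M1, M2 = np.meshgrid(grid, grid, indexing="ij")
    F = candidate_rate(M1, M2, alpha, J)
    core = F[1:-1, 1:-1]
    is_min = np.ones(core.shape, dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            is_min &= core < F[1 + di:n - 1 + di, 1 + dj:n - 1 + dj]
    return [(grid[i + 1], grid[j + 1]) for i, j in np.argwhere(is_min)]

alpha = (0.5, 0.5)
coupled = grid_local_minima(alpha, np.array([[1.5, 1.5], [1.5, 1.5]]))
decoupled = grid_local_minima(alpha, np.array([[3.0, 0.0], [0.0, 3.0]]))
hot = grid_local_minima(alpha, np.array([[0.5, 0.5], [0.5, 0.5]]))
print(len(coupled), len(decoupled), len(hot))  # 2 4 1
```

With these hypothetical parameters, one coupled system gives two minima and two decoupled blocks give four, while weak interactions leave only the origin. This is consistent with the "twice the number of independent systems" pattern in these two-block cases.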
However, we believe that negative interactions between blocks might drastically change the picture. Indeed, consider a model with three blocks, positive interaction within the blocks and negative interaction between the blocks. Then, if the interaction within the blocks is strong enough, the spins within each block will tend to align. However, as the interaction between the blocks is negative, the magnetizations of blocks one and two will tend to have different signs, but so will the magnetizations of blocks two and three, and of blocks three and one. Since only two signs are available, these three requirements cannot all be met at once; hence, frustration occurs. In this respect, a model with positive and negative interactions carries features of a spin glass.
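The frustration argument is purely combinatorial. It can be checked by enumerating the eight sign patterns of the three block magnetizations and counting, for each pattern, the block pairs whose signs agree. Each such pair is "unsatisfied" when the inter-block interaction is negative. This only illustrates the counting argument and says nothing quantitative about the Gibbs measure. The helper name is hypothetical.

```python
import itertools

def unsatisfied_pairs(signs):
    """Number of block pairs (k, l), k < l, whose magnetization signs agree.
    With negative inter-block interaction, each such pair is unsatisfied."""
    return sum(1 for a, b in itertools.combinations(signs, 2) if a == b)

counts = {s: unsatisfied_pairs(s) for s in itertools.product([-1, 1], repeat=3)}
print(min(counts.values()))  # 1: no sign pattern satisfies all three pairs
```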
Another question is the relationship between Theorems 1.3 and 1.5. In Theorem 1.5 we consider the distance to a normal distribution with covariance matrix and not to , which is the covariance matrix of the limiting distribution. Testing against functions , we see that is the limit of the matrices . It would be interesting to provide suitable bounds on in some matrix norm, since [30, Proposition 2.8] provides bounds on for two random vectors with and in terms of the -distance of and .
Thirdly, it remains an open problem to quantify the distance to a normal distribution with the “limiting” covariance matrix . In the one-dimensional Curie–Weiss model, this problem has been solved, for example, in [14, Corollary 2.9]. Therein one can see that the limiting covariance is by considering the approximate linear regression condition. A similar condition holds in the multidimensional case. For example, in Lemma 3.2 we have proven
[TABLE]
where and . Thus, in the case (e.g. consider a subsequence along which this holds) is the covariance matrix of the limit distribution. However, we have been unable to find a suitable modification of [30, Theorem 2.1] that enables one to compare the distribution of the random vector with .
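For the one-dimensional Curie–Weiss model, the claim that the limiting variance can be read off from the linear regression condition can be checked numerically. The sketch assumes the standard normalization μ(σ) ∝ exp(βN m²/2), with m = N⁻¹Σσ_i and W = √N m. Under these assumptions the Glauber pair satisfies E[W′ − W | σ] ≈ −((1 − β)/N)·W for small m, which suggests Var W → 1/(1 − β) for β < 1. The sketch computes N·E[m²] exactly by summing over the number of +1 spins. The function name is hypothetical.

```python
import math

def cw_variance(N, beta):
    """Exact N * E[m^2] in the Curie-Weiss model mu(sigma) ∝ exp(beta * N * m^2 / 2)."""
    logw = []
    for k in range(N + 1):  # k = number of +1 spins, so m = (2k - N) / N
        m = (2 * k - N) / N
        logw.append(math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
                    + beta * N * m * m / 2)
    top = max(logw)  # subtract the maximum before exponentiating, for numerical stability
    w = [math.exp(x - top) for x in logw]
    Z = sum(w)
    second_moment = sum(wk * ((2 * k - N) / N) ** 2 for k, wk in enumerate(w)) / Z
    return N * second_moment

print(round(cw_variance(2000, 0.0), 9))  # 1.0 (independent spins)
print(cw_variance(2000, 0.5))            # close to 1 / (1 - 0.5) = 2
```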
References
- [1] E. Agliari, R. Burioni, and P. Contucci. A diffusive strategic dynamics for social systems. J. Stat. Phys., 139(3):478–491, 2010.
- [2] A. A. Amini and E. Levina. On semidefinite relaxations for the block model. Ann. Statist., 46(1):149–179, 2018.
- [3] Q. Berthet, P. Rigollet, and P. Srivastava. Exact recovery in the Ising blockmodel. Ann. Statist., 47(4):1805–1834, 2019.
- [4] G. Bresler. Efficiently learning Ising models on arbitrary graphs [extended abstract]. In STOC’15: Proceedings of the 2015 ACM Symposium on Theory of Computing, pages 771–782. ACM, New York, 2015.
- [5] G. Bresler, E. Mossel, and A. Sly. Reconstruction of Markov random fields from samples: some observations and algorithms. SIAM J. Comput., 42(2):563–578, 2013.
- [6] W. A. Brock and S. N. Durlauf. Discrete choice with social interactions. Rev. Econom. Stud., 68(2):235–260, 2001.
- [7] S. Chatterjee and Q.-M. Shao. Nonnormal approximation by Stein’s method of exchangeable pairs with application to the Curie-Weiss model. Ann. Appl. Probab., 21(2):464–483, 2011.
- [8] F. Collet. Macroscopic limit of a bipartite Curie-Weiss model: a dynamical approach. J. Stat. Phys., 157(6):1301–1319, 2014.
