Interaction Order Estimation in Tensor Curie–Weiss Models
Somabha Mukherjee

TL;DR
This paper studies the estimation of interaction parameters in Curie–Weiss and Ising models, showing when it is possible or impossible to estimate these parameters based on temperature and model structure.
Contribution
The paper introduces a threshold function β*(p) that determines the feasibility of estimating the interaction order p in Curie–Weiss models.
Findings
Joint estimation of β and p is impossible due to contiguity.
Estimation of p is impossible if β is unknown.
Consistent estimation of p is impossible when β < β*(p), and possible for almost all β > β*(p).
Abstract
In this paper, we consider the problem of estimating the interaction parameter p of a p-spin Curie–Weiss model at inverse temperature β, given a single observation from this model. We show, by a contiguity argument, that joint estimation of the parameters β and p is impossible, which implies that the estimation of p is impossible if β is unknown. These impossibility results are also extended to the more general p-spin Erdös–Rényi Ising model. The situation is more delicate when β is known. In this case, we show that there exists an increasing threshold function β*(p), such that for all β, consistent estimation of p is impossible when β*(p)>β, and for almost all β, consistent estimation of p is possible for β*(p)<β.
1. Introduction
The Ising model [1] was originally introduced in Physics as a model for ferromagnetism, and has since found numerous applications in diverse areas, such as image processing [2], neural networks [3], spatial statistics [4], and disease mapping in epidemiology [5]. The classical (2-spin) Ising model is a discrete probability measure on the set of all binary strings of a fixed length N, given by the following:

$$\mathbb{P}_{\beta}(\boldsymbol{\sigma}) = \frac{1}{Z_N(\beta)} \exp\{\beta H_N(\boldsymbol{\sigma})\}, \qquad \boldsymbol{\sigma} \in \{-1,1\}^N, \tag{1}$$

where the Hamiltonian is given by

$$H_N(\boldsymbol{\sigma}) = \sum_{1 \le u, v \le N} J_{uv}\, \sigma_u \sigma_v,$$

$\mathbf{J}_N = ((J_{uv}))_{1 \le u,v \le N}$ is the interaction matrix (often taken as the scaled adjacency matrix of some graph), β > 0 denotes the interaction strength (called inverse temperature in the physics literature), and $Z_N(\beta)$ is the normalizing constant needed to ensure that the probabilities in (1) add up to 1 (referred to as the partition function in Physics). The Hamiltonian of the model (1) only captures pairwise dependence between the binary variables, arising from an underlying network/interaction structure. Unfortunately, in most real-life scenarios, pairwise interactions are not enough to explain all of the complex dependencies arising in network data, and one has to take into account higher-order interactions arising from peer-group effects. Multi-body interactions are also common in many other branches of science; for example, in Chemistry [6], it is known that the atoms on a crystal surface do not just interact in pairs, but in triangles, quadruples, and higher-order tuples. A natural extension of the classical 2-spin Ising model that captures higher-order dependencies is the p-spin Ising model [7,8,9,10], where the quadratic interaction term in the sufficient statistic is replaced by a multilinear polynomial of degree p. The probability mass function of this model is given by the following:

$$\mathbb{P}_{\beta,p}(\boldsymbol{\sigma}) = \frac{1}{Z_N(\beta,p)} \exp\{\beta H_N(\boldsymbol{\sigma})\}, \qquad \boldsymbol{\sigma} \in \{-1,1\}^N, \tag{2}$$

where the Hamiltonian is given by the following:

$$H_N(\boldsymbol{\sigma}) = \sum_{1 \le u_1, \ldots, u_p \le N} J_{u_1 \ldots u_p}\, \sigma_{u_1} \cdots \sigma_{u_p},$$

$\mathbf{J}_N = ((J_{u_1 \ldots u_p}))_{1 \le u_1, \ldots, u_p \le N}$ is the interaction tensor (often taken as the scaled adjacency tensor of a hypergraph), β > 0 denotes the interaction strength (or inverse temperature), p is the interaction order, and $Z_N(\beta,p)$ is the normalizing constant (or partition function).
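As a small sanity check of the definitions above, the sketch below evaluates the Hamiltonian of model (2) by brute force over all p-tuples of indices (repeated indices included) and confirms that, for the Curie–Weiss choice in which every tensor entry equals N^{1−p}, it collapses to N·σ̄^p with σ̄ the average spin. The helper names `hamiltonian` and `curie_weiss_tensor` are hypothetical, and the O(N^p) loop is only meant for tiny N.

```python
import itertools
from typing import Callable, Sequence, Tuple

def hamiltonian(sigma: Sequence[int], J: Callable[[Tuple[int, ...]], float], p: int) -> float:
    """Brute-force H_N(sigma) = sum over all p-tuples u of J(u) * sigma[u_1] * ... * sigma[u_p]."""
    n = len(sigma)
    total = 0.0
    for u in itertools.product(range(n), repeat=p):  # all n**p index tuples, repeats included
        spin_product = 1
        for i in u:
            spin_product *= sigma[i]
        total += J(u) * spin_product
    return total

def curie_weiss_tensor(n: int, p: int) -> Callable[[Tuple[int, ...]], float]:
    """Hypothetical helper: a tensor whose every entry equals n**(1 - p)."""
    entry = float(n) ** (1 - p)
    return lambda u: entry

sigma = [1, -1, 1, 1, -1]            # a configuration in {-1, 1}^5
n, p = len(sigma), 3
sigma_bar = sum(sigma) / n           # average spin, here 0.2
h_brute = hamiltonian(sigma, curie_weiss_tensor(n, p), p)
h_closed = n * sigma_bar ** p        # closed form N * sigma_bar**p
print(h_brute, h_closed)             # both equal 0.04 up to rounding
```

The identity behind the check is N^{1−p}(σ_1 + ⋯ + σ_N)^p = N·σ̄^p, which is why the Curie–Weiss model depends on σ only through σ̄.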
p-spin Ising models have many applications, including their role in explaining the microscopic theory of magnetism in solid and fluid ³He films adsorbed on the surface of graphite (see [11]). They also serve as important resources for analog quantum simulation and quantum computation, contribute to the mapping of fermions to artificial spin systems in quantum algorithms for quantum chemistry, and appear in spin models of cuprate superconductor Hamiltonians (see [12] and the references therein). Furthermore, they play crucial roles in error suppression schemes for quantum annealers, adiabatic topological quantum computation, and other related areas. p-spin Ising and spin glass models have also appeared in a number of classical and recent works, such as [7,8,10,13,14,15,16,17,18], and p-spin versions of the closely related Potts model have appeared in [19,20,21].
The estimation of the parameter β in the model (2), assuming that the interaction order p is known, has been studied extensively (see, for example, [22,23,24] for the case p = 2 and [9,10,16] for the case p ≥ 3). For p = 2, √N-consistency of the so-called maximum pseudolikelihood estimator (MPLE) of β under some general assumptions on the model (2) was first established in [23]. These assumptions can be verified easily for Ising models on many dense graphs at low temperatures (high values of β), and for Ising models on bounded-degree graphs as well as popular spin-glass models such as the Sherrington–Kirkpatrick model at all temperatures. Some of these results were extended in [22], where different rates of consistency were obtained for Ising models on sparse graphs, such as sparse Erdös–Rényi graphs or sparse regular graphs, even at high temperatures. In addition, inestimability of β was established in [22] for Ising models on a sequence of dense graphs converging to a graphon, for all β below a certain threshold given by the inverse spectral norm of the limiting graphon. Precise central limit theorems for the maximum likelihood estimator (MLE) of β were derived for the 2-spin Ising model on complete graphs (also known as the Curie–Weiss model) in [25]. Some of these results have been extended in [9,10,16] to the case p ≥ 3.
Now, suppose that one is given a sample from a p-spin Ising model (2), with no other information on any of the parameters. A natural question is whether it is possible to estimate the interaction order p of the model. This is highly relevant from a practical perspective, since in many real-life applications, such as recommender systems, strong peer-group effects exist, and pairwise-interaction models are known to fit the data much worse than higher-order interaction models. For example, in [26], it was empirically shown that the 3-spin Ising model fitted the Last.fm music dataset (http://millionsongdataset.com/lastfm/, accessed on 9 December 2020) much better than the classical 2-spin model, thereby indicating the presence of complex peer-group effects in this dataset. This is a fan network database for a number of popular artists and bands; the data consist of the user friendship network together with, for each artist, a list of binary opinions indicating whether each user is a fan of that artist. The procedure first estimated the parameter β of a 2-spin Ising model from the data using the maximum pseudolikelihood estimator, and then simulated a number of samples from the 2-spin Ising model with the estimated β as the parameter. For most artists under consideration, the Hamiltonian computed from the data was observed to lie in the extreme tails of the histogram of the sampled Hamiltonians, suggesting a misfit. When a 3-spin Ising model was fit with the triangles in the user network as the hyperedges, the Hamiltonian of the data fell comfortably within the bulk of the histogram, suggesting a good fit. In a different context, the authors of [11] noted that experimental and theoretical data indicate the relevance of three-, four-, five- and six-spin exchange processes over a wide density range in adsorbed ³He films.
In such situations, one might be interested in a systematic way of determining the value of the interaction order p for which the corresponding p-tuple interaction model fits the data best. To the best of our knowledge, this reverse problem of estimating the interaction order p given a single observation from the model (2), even with known β, has not been addressed in the literature before. This question may be quite difficult for arbitrary interaction tensors $\mathbf{J}_N$, which necessitates some convenient structural assumptions on this tensor.
In this paper, we assume that the tensor $\mathbf{J}_N$ has all entries equal to $N^{1-p}$, which corresponds to the p-spin Curie–Weiss model [7,9,27], given by the following:

$$\mathbb{P}_{\beta,p}(\boldsymbol{\sigma}) = \frac{1}{Z_N(\beta,p)} \exp\{N\beta\,\bar{\sigma}^p\}, \qquad \boldsymbol{\sigma} \in \{-1,1\}^N, \tag{3}$$

where $\bar{\sigma} := N^{-1}\sum_{u=1}^N \sigma_u$. Even under this structural assumption, the possibility of estimating p depends on whether the interaction strength β is known. We will show that consistent estimation of p is impossible if β is unknown, which will be a consequence of our argument on the impossibility of the joint consistent estimation of β and p. At the heart of these impossibility results is the fact that the sufficient statistic $\bar{\sigma}$ in the model (3) converges to a mixture of point masses at the maximizers of a certain function, and that these maximizers do not uniquely determine the tuple (β,p). This idea is formalized by a contiguity argument between the Curie–Weiss measures and N-fold products of Rademacher distributions with the largest maximizer of this function as the mean. The related problem of joint inestimability of (β,h) for the classical 2-spin Curie–Weiss model with an additional magnetic field parameter h was addressed in [24] using similar contiguity arguments. We also extend these impossibility results to the more general p-spin Erdös–Rényi Ising models.
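The concentration of σ̄ can be seen numerically. Because model (3) depends on σ only through σ̄, the number k of +1 spins has probability proportional to C(N, k)·exp(Nβσ̄^p) with σ̄ = (2k − N)/N, so the exact law of σ̄ is computable for moderate N. The sketch below (helper names hypothetical) does this in log space. For p = 2 and β = 1, the mode of |σ̄| should sit near 0.957, the positive root of 2βm = artanh(m), which is the stationarity condition for maximizing βx² − I(x) with I(x) = ½[(1+x)log(1+x) + (1−x)log(1−x)]; for β = 0.3 the law is centred at 0.

```python
import math

def magnetization_law(n: int, beta: float, p: int) -> dict:
    """Exact law of sigma_bar under model (3), keyed by the value s = (2k - n)/n.

    P(k plus-spins) is proportional to C(n, k) * exp(n * beta * s**p).
    """
    log_w = {}
    for k in range(n + 1):
        s = (2 * k - n) / n
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_w[s] = log_binom + n * beta * s ** p
    top = max(log_w.values())                  # shift before exponentiating to avoid overflow
    weights = {s: math.exp(v - top) for s, v in log_w.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

def most_likely_abs(law: dict) -> float:
    """|s| at the mode of the law (for even p the law is symmetric in s)."""
    return abs(max(law, key=law.get))

law = magnetization_law(2000, 1.0, 2)
print(most_likely_abs(law))                                            # approximately 0.957
print(sum(q for s, q in law.items() if abs(abs(s) - 0.957) < 0.05))    # close to 1
print(most_likely_abs(magnetization_law(2000, 0.3, 2)))                # 0.0
```

For p = 2 the law is split evenly between neighbourhoods of +m and −m, which is the "mixture of point masses" in the text.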
The situation is more intricate when β is known. In this case, we will show that there exists a strictly increasing threshold function β*(p), such that consistent estimation of p is impossible for β < β*(p). However, for almost all β (to be precise, for all but possibly countably many β), p can be estimated consistently whenever β > β*(p). The question of exactly describing the exceptional set of countably many values of β for which p is inestimable in the region β > β*(p) is still open. Finally, although joint consistent estimation of β and p is impossible using just one sample from the p-spin Curie–Weiss model (3), we hope that it can be achieved using multiple samples.
2. Impossibility of Jointly Estimating (β,p)
We start by showing that joint consistent estimation of β and p using only one sample from the model (3) is, in general, impossible. Towards this, for every , define a set:
where D denotes the set of all integers p ≥ 2, and
Theorem 1. For every , such that , there does not exist any sequence of estimators (measurable functions of ), which is consistent for under the model (3).
The following lemma is crucial for proving Theorem 1.
Lemma 1. For every , denote the distribution of a Rademacher random variable with mean m by μ. Then, the product measure is contiguous to for all , and all .
Proof. To begin with, note that on the event , we have:
Now, by Lemma 3.2 and Lemma 3.4 in [9], we have:
Also, by Taylor expansion, we have:
Hence, we have:
for some constant . Hence, for any event , we have:
thereby providing the following:
Now, suppose that . Then, for every , we have the following:
Taking the limit as throughout the above inequality, we can conclude that , which completes the proof of Lemma 1. □
With Lemma 1 in hand, we are now ready to prove Theorem 1.
Proof of Theorem 1. Suppose that there exists a sequence of consistent estimators of (β,p) on . Fixing two different points and , we can construct disjoint neighborhoods and around them, respectively. By the consistency of on , we have the following:
From Lemma 1, we have the following:
which contradicts the facts that , are disjoint, and is a probability measure. □
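A numerical illustration of why parameter pairs with the same limiting magnetization are hard to tell apart: take m = 0.99 and, for p = 2 and p = 3, set β = artanh(m)/(p·m^{p−1}), the stationarity condition for maximizing βx^p − I(x) with I(x) = ½[(1+x)log(1+x) + (1−x)log(1−x)]; for these particular values, m is the global maximizer of that function on [0, 1] for both p. The sketch computes the exact laws of σ̄ under model (3) for both pairs and the law of the mean of N i.i.d. Rademacher variables with mean m, then reports how much mass each puts within 0.01 of m in absolute value (for even p the Curie–Weiss law is symmetric). It only illustrates the phenomenon behind Lemma 1 and Theorem 1 and does not prove contiguity; all helper names are hypothetical.

```python
import math

def log_binom(n: int, k: int) -> float:
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def law_on_grid(n: int, log_tilt):
    """Law of s = (2k - n)/n when P(k) is proportional to C(n, k) * exp(log_tilt(s))."""
    grid = [(2 * k - n) / n for k in range(n + 1)]
    log_w = [log_binom(n, k) + log_tilt(s) for k, s in enumerate(grid)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return grid, [x / total for x in w]

def curie_weiss_law(n: int, beta: float, p: int):
    """Model (3): tilt exp(n * beta * s**p)."""
    return law_on_grid(n, lambda s: n * beta * s ** p)

def rademacher_product_law(n: int, m: float):
    """Mean of n i.i.d. Rademacher(m): k ~ Binomial(n, (1+m)/2), written via k = n(1+s)/2."""
    return law_on_grid(n, lambda s: n * (1 + s) / 2 * math.log((1 + m) / 2)
                       + n * (1 - s) / 2 * math.log((1 - m) / 2))

def mass_near(grid, probs, m: float, tol: float = 0.01) -> float:
    """Probability that | |s| - m | < tol."""
    return sum(q for s, q in zip(grid, probs) if abs(abs(s) - m) < tol)

m, n = 0.99, 4000
for p in (2, 3):
    beta = math.atanh(m) / (p * m ** (p - 1))    # stationarity: p * beta * m**(p-1) = artanh(m)
    print(p, round(beta, 4), mass_near(*curie_weiss_law(n, beta, p), m))
print("product", mass_near(*rademacher_product_law(n, m), m))
```

All three masses are expected to be close to 1 at this N, even though the two Curie–Weiss models have different (β, p).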
Remark 1. Our argument implies that consistent estimation of p is impossible if β is unknown. Indeed, if there were such a consistent estimator , then we could choose and from some ; construct disjoint open intervals and around and , respectively; and argue from contiguity that for , which is a contradiction.
A natural question is what these sets look like for different values of . To answer this, let us define, for each ,
It follows from [9] that for , the function has a unique positive maximizer if . By convention, we define .
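The threshold and the maximizer can be computed numerically. Because the defining displays are not reproduced here, this sketch assumes that the function in question is H_{β,p}(x) = βx^p − I(x) with I(x) = ½[(1+x)log(1+x) + (1−x)log(1−x)], and that β*(p) is the largest β for which 0 is still a global maximizer of H_{β,p} on [0, 1]. Under these assumptions β*(p) = inf over x ∈ (0, 1] of I(x)/x^p. The helper names `rate_I`, `golden_min`, `beta_star` and `m_star` are hypothetical, and the grid-plus-golden-section search is a simple numerical choice, not the method of [9].

```python
import math

def rate_I(x: float) -> float:
    """I(x) = ((1+x)log(1+x) + (1-x)log(1-x)) / 2, with I(+-1) = log 2."""
    x = abs(x)
    if x >= 1.0:
        return math.log(2.0)
    return 0.5 * ((1 + x) * math.log1p(x) + (1 - x) * math.log1p(-x))

def golden_min(f, a: float, b: float, iters: int = 80) -> float:
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def beta_star(p: int, grid: int = 4000) -> float:
    """beta*(p) = inf over x in (0, 1] of I(x) / x**p (assumed characterization)."""
    lo, hi = 1e-6, 1.0
    g = lambda x: rate_I(x) / x ** p
    step = (hi - lo) / grid
    best = min(range(grid + 1), key=lambda i: g(lo + i * step))   # coarse grid
    x = golden_min(g, max(lo, lo + (best - 1) * step), min(hi, lo + (best + 1) * step))
    return g(x)

def m_star(beta: float, p: int, grid: int = 4000) -> float:
    """Largest non-negative global maximizer of H(x) = beta * x**p - I(x) on [0, 1]."""
    H = lambda x: beta * x ** p - rate_I(x)
    best = max(range(grid + 1), key=lambda i: (H(i / grid), i))   # ties -> larger x
    if best == 0:
        return 0.0
    return golden_min(lambda x: -H(x), (best - 1) / grid, min(1.0, (best + 1) / grid))

for p in (2, 3, 4, 5):
    print(p, round(beta_star(p), 4))
print(m_star(0.3, 2), m_star(1.0, 2))    # 0.0 and the positive root of 2m = artanh(m)
```

For p = 2 the computed threshold should be close to 1/2, and the values increase with p while staying below log 2 = I(1).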
Proposition 1. The sequence as , which can be sorted in ascending order as . Further, if we denote to be the projection of onto the p-coordinate, then
The proof of Proposition 1 is technical and is given in Appendix A. Note that for , the set is uniquely determined by its projection , because every uniquely corresponds to the element , where . Proposition 1 thus provides a complete description of the family of sets for all values of , and shows that, although each of these sets is finite, the family contains arbitrarily large sets. The following proposition describes the set .
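The correspondence between p and β within each set can be explored numerically. Assuming H_{β,p}(x) = βx^p − I(x) with I(x) = ½[(1+x)log(1+x) + (1−x)log(1−x)] is the function whose largest maximizer indexes these sets (our reading, since the defining display is not reproduced here), a positive maximizer m must satisfy pβm^{p−1} = I′(m) = artanh(m). For fixed m ∈ (0, 1), the only candidate is therefore β_p = artanh(m)/(p·m^{p−1}). The sketch keeps those p ≤ p_max for which m is in fact the largest global maximizer of H_{β_p,p}. The function `theta_projection` and the other helper names are hypothetical.

```python
import math

def rate_I(x: float) -> float:
    """I(x) = ((1+x)log(1+x) + (1-x)log(1-x)) / 2, with I(+-1) = log 2."""
    x = abs(x)
    if x >= 1.0:
        return math.log(2.0)
    return 0.5 * ((1 + x) * math.log1p(x) + (1 - x) * math.log1p(-x))

def golden_min(f, a: float, b: float, iters: int = 80) -> float:
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def m_star(beta: float, p: int, grid: int = 4000) -> float:
    """Largest non-negative global maximizer of H(x) = beta * x**p - I(x) on [0, 1]."""
    H = lambda x: beta * x ** p - rate_I(x)
    best = max(range(grid + 1), key=lambda i: (H(i / grid), i))   # ties -> larger x
    if best == 0:
        return 0.0
    return golden_min(lambda x: -H(x), (best - 1) / grid, min(1.0, (best + 1) / grid))

def theta_projection(m: float, p_max: int, tol: float = 1e-6) -> list:
    """All p in 2..p_max whose stationarity-based beta_p makes m the largest global maximizer."""
    members = []
    for p in range(2, p_max + 1):
        beta_p = math.atanh(m) / (p * m ** (p - 1))   # unique candidate beta for this (m, p)
        if abs(m_star(beta_p, p) - m) < tol:
            members.append(p)
    return members

print(theta_projection(0.99, 8))     # [2, 3]
print(theta_projection(0.999, 8))    # [2, 3, 4, 5]
```

The two outputs illustrate that each such set is finite and that it grows as m approaches 1.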
Proposition 2.
- .*
The proof of Proposition 2 follows from the following three facts proved in [9]:
- If , then 0 is the unique global maximizer of .
- If , then has two different non-negative global maximizers.
- If , then has a unique non-negative global maximizer, which happens to be positive.
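Assuming the function in these facts is H_{β,p}(x) = βx^p − I(x) with the binary-entropy rate function I, and reading β*(p) as the largest β at which 0 is still a global maximizer, the three cases can be read off from the elementary identities below (a worked sketch, not the argument of [9]). The factorization shows that H_{β,p} < 0 on (0, 1] when β is below the infimum, while some x > 0 has H_{β,p}(x) > 0 when β exceeds it; the series for I gives the value at p = 2.

```latex
\begin{aligned}
I(x) &= \tfrac12\{(1+x)\log(1+x) + (1-x)\log(1-x)\}, \qquad I'(x) = \operatorname{artanh}(x), \\
H_{\beta,p}(x) &= \beta x^p - I(x), \qquad H_{\beta,p}'(x) = p\beta x^{p-1} - \operatorname{artanh}(x), \\
H_{\beta,p}(x) &= x^p\Big(\beta - \frac{I(x)}{x^p}\Big) \qquad (0 < x \le 1), \\
\beta^*(p) &= \inf_{0 < x \le 1} \frac{I(x)}{x^p}, \\
\frac{I(x)}{x^2} &= \sum_{k \ge 1} \frac{x^{2k-2}}{2k(2k-1)} = \frac12 + \frac{x^2}{12} + \cdots
\;\Longrightarrow\; \beta^*(2) = \frac12 .
\end{aligned}
```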
Remark 2. All of the results in this section also hold almost surely for the somewhat more general p-spin Erdös–Rényi Ising model [27], where the interaction tensor in (2) is given by , with the entries of the tensor being i.i.d. Bernoulli random variables with mean α, for some fixed . This is because the Erdös–Rényi Ising and Curie–Weiss measures are mutually contiguous, a consequence of Lemma 6.6 in [28].
3. Estimation of p When β Is Known
Throughout this section, we assume that the parameter β is known. To begin with, for every , let us define the following two sets:
By Lemma A.1 in [10], the set is empty if , and is of the form for some integer otherwise.
Theorem 2. When β is known, there does not exist any sequence of estimators that is consistent for p on .
Proof. It follows from Proposition 2 and Lemma 1 that the N-fold product measure of the mean-0 Rademacher distribution is contiguous to for all . The proof now follows from the argument given in Remark 1. □
Remark 3. One can consider an extension of the model (3) by adding an external magnetic field parameter h as follows:
It has been shown in [9] that the consistent estimation of h is always possible when β is known. Proposition 2 shows that this is not the case for estimating p, which is impossible if . This inestimability region also coincides with that for estimating β when p is known (see [10]).
Now, we turn our attention to the set , and break it down into the following two parts:
where is defined as the largest non-negative maximizer of .
Theorem 3. When β is known, there does not exist any sequence of estimators that is consistent for .
Proof. Once again, from Lemma 1, we know that the N-fold product of the Rademacher distribution with mean is contiguous to both and , where and is such that . Clearly, . The rest of the proof again follows from the arguments of Remark 1. □
We now show that it is possible to estimate p consistently on the set , if β is known. Towards this, let us define the following estimator of p:
The intuition behind our estimator is that as ; hence, the distance between and is expected to be minimized at for large N. The next theorem shows that, with high probability, this is indeed the case.
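A minimal simulation of this idea follows. Since the exact form of (5) is not reproduced here, it uses a hypothetical minimum-distance estimator p̂ = argmin over 2 ≤ q ≤ q_max of | |σ̄| − m_*(β, q) |, where the absolute value handles the ± symmetry for even q, and it assumes m_*(β, q) is the largest non-negative maximizer of βx^q − I(x) with I(x) = ½[(1+x)log(1+x) + (1−x)log(1−x)]. Samples of σ̄ are drawn exactly from model (3) through the law of the number of +1 spins. With β = 0.68, which numerically lies between the thresholds for q = 3 (about 0.672) and q = 4 (about 0.689), the candidates q = 2 and q = 3 have well-separated maximizers (about 0.79 and 0.95), while m_*(β, q) = 0 for q ≥ 4. All helper names are hypothetical.

```python
import functools
import math
import random

def rate_I(x: float) -> float:
    """I(x) = ((1+x)log(1+x) + (1-x)log(1-x)) / 2, with I(+-1) = log 2."""
    x = abs(x)
    if x >= 1.0:
        return math.log(2.0)
    return 0.5 * ((1 + x) * math.log1p(x) + (1 - x) * math.log1p(-x))

def golden_min(f, a: float, b: float, iters: int = 80) -> float:
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

@functools.lru_cache(maxsize=None)
def m_star(beta: float, p: int, grid: int = 4000) -> float:
    """Largest non-negative global maximizer of H(x) = beta * x**p - I(x) on [0, 1] (cached)."""
    H = lambda x: beta * x ** p - rate_I(x)
    best = max(range(grid + 1), key=lambda i: (H(i / grid), i))   # ties -> larger x
    if best == 0:
        return 0.0
    return golden_min(lambda x: -H(x), (best - 1) / grid, min(1.0, (best + 1) / grid))

def sample_magnetization(n: int, beta: float, p: int, rng: random.Random) -> float:
    """Exact draw of sigma_bar from model (3) via the law of the number k of +1 spins."""
    log_w = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + n * beta * ((2 * k - n) / n) ** p for k in range(n + 1)]
    top = max(log_w)                       # shift before exponentiating to avoid overflow
    k = rng.choices(range(n + 1), weights=[math.exp(v - top) for v in log_w])[0]
    return (2 * k - n) / n

def estimate_p(sigma_bar: float, beta: float, q_max: int = 8) -> int:
    """Hypothetical minimum-distance estimator: argmin over q of | |sigma_bar| - m_*(beta, q) |."""
    return min(range(2, q_max + 1), key=lambda q: abs(abs(sigma_bar) - m_star(beta, q)))

rng = random.Random(0)
beta, n = 0.68, 5000
for p_true in (2, 3):
    draws = [estimate_p(sample_magnetization(n, beta, p_true, rng), beta) for _ in range(5)]
    print(p_true, draws)
```

At this N each list is expected to contain only p_true. The cap q_max plays the role of the range of the minimization, and caching m_star avoids recomputing the targets for every draw.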
Theorem 4. For every and , we have
for some constant not depending on N.
Proof. Suppose that for some . It follows from Lemma A.1 in [10] and the proof of Proposition 1 that as , if and otherwise. Hence, in any case, is not an accumulation point of the sequence . This shows that there exists , such that for all not equal to p. Now, it follows from the proofs of Lemma 3.1 and Lemma 3.3 in [9] that
for some constant not depending on N. Also, it follows from the proofs of Lemma 3.1 and Lemma 3.3 in [9] and the proof of Proposition 1, that
for some constant not depending on N. Define and . It is clear that on the event . Theorem 4 now follows, as . □
Remark 4. One can tune the parameter in (5) to allow for an optimal number of integers over which the minimization is performed, in order to ensure proper convergence of . Also, in case , eventually , so in the minimization (5) one simply stops at the largest q for which .
In the reverse problem of estimating β with known p, consistent estimation is possible for all (see [9]). In contrast, an additional and more delicate inestimability region arises in our problem of estimating p with known β. The next proposition shows that for almost all β, the set is actually empty, i.e., the entire region is estimable.
Proposition 3. There exists a countable set , such that for .
The proof of Proposition 3 is given in Appendix B. Although the proof provides an exact enumeration of one countable set satisfying Proposition 3, namely:
our analysis does not allow us to dig any further. In particular, the following question is open:
Open Problem: Give an exact enumeration of the minimal countable set satisfying Proposition 3.
Note that if for some integers , then is a stationary point of both and . We can actually say that if this stationary point turns out to be a global maximizer of both these functions. However, checking the last condition will likely involve more intricate analysis, and is left as an open question.
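Assuming H_{β,p}(x) = βx^p − I(x) with I(x) = ½[(1+x)log(1+x) + (1−x)log(1−x)], a common positive stationary point m of H_{β,p} and H_{β,q} with p < q must satisfy pβm^{p−1} = qβm^{q−1} = artanh(m). This forces m_{p,q} = (p/q)^{1/(q−p)} and β_{p,q} = artanh(m_{p,q})/(p·m_{p,q}^{p−1}). The sketch below enumerates these candidates for small p < q and checks numerically whether m_{p,q} is the largest global maximizer of both functions, which is the condition discussed above. It is a hypothetical exploration aid for the open problem, not a resolution of it; grid-based checks can misjudge near-ties.

```python
import math

def rate_I(x: float) -> float:
    """I(x) = ((1+x)log(1+x) + (1-x)log(1-x)) / 2, with I(+-1) = log 2."""
    x = abs(x)
    if x >= 1.0:
        return math.log(2.0)
    return 0.5 * ((1 + x) * math.log1p(x) + (1 - x) * math.log1p(-x))

def golden_min(f, a: float, b: float, iters: int = 80) -> float:
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

def m_star(beta: float, p: int, grid: int = 4000) -> float:
    """Largest non-negative global maximizer of H(x) = beta * x**p - I(x) on [0, 1]."""
    H = lambda x: beta * x ** p - rate_I(x)
    best = max(range(grid + 1), key=lambda i: (H(i / grid), i))   # ties -> larger x
    if best == 0:
        return 0.0
    return golden_min(lambda x: -H(x), (best - 1) / grid, min(1.0, (best + 1) / grid))

def candidate(p: int, q: int):
    """(beta_pq, m_pq): the unique beta and m with m a positive stationary point of both H's."""
    m = (p / q) ** (1.0 / (q - p))
    beta = math.atanh(m) / (p * m ** (p - 1))
    return beta, m

def shared_global_max(p: int, q: int, tol: float = 1e-6) -> bool:
    """True if m_pq is numerically the largest global maximizer of both H_{beta,p} and H_{beta,q}."""
    beta, m = candidate(p, q)
    return all(abs(m_star(beta, r) - m) < tol for r in (p, q))

for p in range(2, 6):
    for q in range(p + 1, 12):
        beta, m = candidate(p, q)
        print(p, q, round(beta, 4), round(m, 4), shared_global_max(p, q))
```

For example, (p, q) = (2, 3) gives m = 2/3 and β = artanh(2/3)/(4/3), which lies below the threshold for q = 3, so that candidate fails the check.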
References
- [1] Ising, E. Beitrag zur Theorie des Ferromagnetismus. Z. Für Phys. 1925, 31, 253–258. doi:10.1007/BF02980577
- [2] Geman, S.; Graffigne, C. Markov random field image models and their applications to computer vision. In Proceedings of the International Congress of Mathematicians, Berkeley, CA, USA, 3–11 August 1986; pp. 1496–1517.
- [3] Hopfield, J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558. doi:10.1073/pnas.79.8.2554; PMID 6953413; PMCID PMC346238
- [4] Banerjee, S.; Carlin, B.P.; Gelfand, A.E. Hierarchical Modeling and Analysis for Spatial Data; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014.
- [5] Green, P.J.; Richardson, S. Hidden Markov models and disease mapping. J. Am. Stat. Assoc. 2002, 97, 1055–1070. doi:10.1198/016214502388618870
- [6] Aslanov, L.A. Crystal symmetry and atomic interactions model. Comput. Math. Appl. 1988, 16, 443–451. doi:10.1016/0898-1221(88)90234-9
- [7] Barra, A. Notes on ferromagnetic p-spin and REM. Math. Methods Appl. Sci. 2009, 32, 783–797. doi:10.1002/mma.1065
- [8] Derrida, B. Random-energy model: Limit of a family of disordered models. Phys. Rev. Lett. 1980, 45, 79. doi:10.1103/PhysRevLett.45.79
