Interaction Order Estimation in Tensor Curie–Weiss Models

Somabha Mukherjee

PMC · DOI: 10.3390/e27030245 · February 27, 2025

TL;DR

This paper studies when the interaction order p of a p-spin Curie–Weiss model can be estimated from a single observation. Estimation is impossible when the inverse temperature β is unknown; when β is known, feasibility is governed by a threshold function β*(p).

Contribution

The paper introduces a threshold function β*(p) that determines the feasibility of estimating the interaction order p in Curie–Weiss models.

Findings

01

Joint estimation of β and p is impossible due to contiguity.

02

Estimation of p is impossible if β is unknown.

03

When β is known, consistent estimation of p is impossible for β < β*(p) and possible for almost all β > β*(p).

Abstract

In this paper, we consider the problem of estimating the interaction parameter p of a p-spin Curie–Weiss model at inverse temperature β, given a single observation from this model. We show, by a contiguity argument, that joint estimation of the parameters β and p is impossible, which implies that the estimation of p is impossible if β is unknown. These impossibility results are also extended to the more general p-spin Erdös–Rényi Ising model. The situation is more delicate when β is known. In this case, we show that there exists an increasing threshold function β*(p), such that for all β, consistent estimation of p is impossible when β*(p)>β, and for almost all β, consistent estimation of p is possible for β*(p)<β.

Funding

FoS Tier 1
FRC Tier 1

Keywords

Curie–Weiss model · joint estimation · interaction order · contiguity

Taxonomy

Topics: Theoretical and Computational Physics · Markov Chains and Monte Carlo Methods · Quantum many-body systems

Full text

1. Introduction

The Ising model [1] was originally introduced in Physics as a model for ferromagnetism, and has since then found numerous interesting applications in diverse areas, such as image processing [2], neural networks [3], spatial statistics [4], and disease mapping in epidemiology [5]. The classical (2-spin) Ising model is a discrete probability measure on the set of all binary strings of a fixed length, given by the following:

[eqn]

where the Hamiltonian $[eqn]$ is given by

[eqn]

$[eqn]$ is the interaction matrix (often taken as the scaled adjacency matrix of some graph), $[eqn]$ denotes the interaction strength (called inverse temperature in the physics literature), and $[eqn]$ is the normalizing constant, needed to ensure that the probabilities in (1) add up to 1 (referred to as the partition function in Physics). The Hamiltonian $[eqn]$ of the model (1) only captures pairwise dependence between the binary variables, arising from an underlying network/interaction structure. Unfortunately, in most real-life scenarios, pairwise interactions are not enough to explain all of the complex dependencies arising in network data, and one has to take into account higher-order interactions arising from peer-group effects. Multi-body interactions are also common in many other branches of science; for example in Chemistry [6], it is known that the atoms on a crystal surface do not just interact in pairs, but in triangles, quadruples, and higher-order tuples. A natural extension of the classical 2-spin Ising model with a focus on capturing higher-order dependencies is the p-spin Ising model [7,8,9,10], where the quadratic interaction term in the sufficient statistic is replaced by a multilinear polynomial of degree $[eqn]$ . The probability mass function of this model is given by the following:

[eqn]

where the Hamiltonian $[eqn]$ is given by the following:

[eqn]

$[eqn]$ is the interaction tensor (often taken as the scaled adjacency tensor of a hypergraph), $[eqn]$ denotes the interaction strength (or inverse temperature), p is the interaction order, and $[eqn]$ is the normalizing constant (or partition function).

The p-spin Ising models have several applications. They help explain the microscopic theory of magnetism in solid and fluid ^3^He films adsorbed on the surface of graphite (see [11]). They also serve as important resources for analog quantum simulation and quantum computation, contribute to the mapping of fermions to artificial spin systems in quantum algorithms for quantum chemistry, and appear in spin models of cuprate superconductor Hamiltonians (see [12] and the references therein). Furthermore, they play crucial roles in error suppression schemes for quantum annealers, adiabatic topological quantum computation, and other related areas. The p-spin Ising and spin glass models have also appeared in a number of classical and recent works, such as [7,8,10,13,14,15,16,17,18], and p-spin versions of the closely related Potts model have appeared in [19,20,21].

The estimation of the parameter $[eqn]$ in the model (2), assuming that the interaction order p is known, has been studied extensively in the past (see, for example, [22,23,24] for the case $[eqn]$ and [9,10,16] for the case $[eqn]$ ). For the $[eqn]$ case, $[eqn]$ -consistency of the so-called maximum pseudolikelihood estimator (MPLE) of $[eqn]$ under some general assumptions on the model (2) was first established in [23]. These assumptions can be verified easily for Ising models on many dense graphs at low temperatures (high values of $[eqn]$ ), and for Ising models on bounded-degree graphs as well as popular spin-glass models such as the Sherrington–Kirkpatrick model at all temperatures. Some of these results were extended in [22], where different rates of consistency were obtained for Ising models on sparse graphs, such as the sparse Erdös–Rényi graphs or sparse regular graphs, even at high temperatures. In addition, inestimability of $[eqn]$ was established in [22] for Ising models on a sequence of dense graphs converging to a graphon for all $[eqn]$ below a certain threshold, given by the inverse spectral norm of the limiting graphon. Precise central limit theorems for the maximum likelihood estimator (MLE) of $[eqn]$ were derived for the 2-spin Ising model on complete graphs (also known as the Curie–Weiss model) in [25]. Some of these results have been extended in [9,10,16] for the case $[eqn]$ .

Now, suppose that one is given a sample $[eqn]$ from a p-spin Ising model (2), with no other information on any of the parameters. A natural question is whether it is possible to estimate the interaction order p of the model. This is highly relevant from a practical perspective, as in many real-life applications such as recommender systems, strong peer-group effects exist, and pairwise-interaction models are known to fit the data much worse than higher-order interaction models. For example, in [26], it was empirically shown that the 3-spin Ising model fitted the Last.fm music dataset (http://millionsongdataset.com/lastfm/, accessed on 9 December 2020) much better than the classical 2-spin model, thereby indicating the presence of complex peer-group effects in this dataset. This is a fan network database for a number of popular artists and bands, and the data consist of a list of binary opinions from the fan base for each artist, indicating whether each user is a fan of that artist or not, as well as the user friendship network. The exact procedure involved first estimating the parameter $[eqn]$ from the data under a 2-spin Ising model using the maximum pseudolikelihood estimator, and then simulating a number of samples from the corresponding 2-spin Ising model with the estimated $[eqn]$ as the parameter. It was observed that for most artists under consideration, the true Hamiltonian from the data lay outside the $[eqn]$ and $[eqn]$ quantiles of the histogram of the sampled Hamiltonians, thereby suggesting misfit. When a 3-spin Ising model was fit with the triangles in the user network as the hyperedges, the true Hamiltonian fell comfortably within the bulk of the histogram, thereby suggesting a good fit. In a different context, in [11], the authors mentioned that experimental and theoretical data indicated the relevance of three-, four-, five- and six-spin exchange processes over a wide density range in adsorbed ^3^He films.

One might, in such situations, be interested in figuring out a systematic way of determining the value of the interaction order p for which the corresponding p-tuple interaction model fits the data best. To the best of our knowledge, this reverse problem of estimating the interaction order p given a single observation from the model (2) even with known $[eqn]$ has not been addressed in the literature before. This question may be quite difficult for arbitrary underlying interaction tensors $[eqn]$ , which necessitates some convenient structural assumptions on this tensor.

In this paper, we assume that tensor $[eqn]$ has all entries equal to $[eqn]$ , which corresponds to the p-spin Curie–Weiss model [7,9,27], given by the following:

[eqn]

where $[eqn]$ . Even under this structural assumption, the possibility of estimating p depends on whether the interaction strength $[eqn]$ is known or not. We will show that consistent estimation of p is impossible if $[eqn]$ is unknown, which will be a consequence of our argument on the impossibility of the joint consistent estimation of $[eqn]$ and p. At the heart of these impossibility results is the fact that the sufficient statistic $[eqn]$ in the model (3) converges to a mixture of point masses at the maximizers of a certain function $[eqn]$ , and that these maximizers do not uniquely determine the tuple $[eqn]$ . This idea is formalized by a contiguity argument between the Curie–Weiss measures and N-fold products of Rademacher distributions, with the largest maximizer of $[eqn]$ as the mean. It should be mentioned here that the related problem of joint inestimability of $[eqn]$ for the classical 2-spin Curie–Weiss model with an additional magnetic field parameter h was addressed in [24] using similar contiguity arguments. We also extend these impossibility results to the more general p-spin Erdös–Rényi Ising models.
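The concentration of the sufficient statistic described above can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes that model (3) has probability mass function proportional to exp(β N m^p), where m is the mean spin, which is the usual form of the p-spin Curie–Weiss model (the displayed equation is elided in this copy). The function name `magnetization_pmf` and the parameter values are illustrative choices.

```python
import math

def magnetization_pmf(n, beta, p):
    """Exact law of the mean spin under the ASSUMED Curie-Weiss pmf
    P(x) proportional to exp(beta * n * mean(x)**p), x in {-1, +1}^n."""
    log_w = []
    for k in range(n + 1):                 # k = number of +1 spins
        m = (2 * k - n) / n                # mean spin for this configuration class
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_w.append(log_binom + beta * n * m ** p)
    shift = max(log_w)                     # subtract the max to avoid overflow
    weights = [math.exp(v - shift) for v in log_w]
    total = sum(weights)
    return [((2 * k - n) / n, w / total) for k, w in enumerate(weights)]

# Compare a small and a large inverse temperature for p = 3 (illustrative values).
for beta in (0.3, 1.0):
    pmf = magnetization_pmf(n=400, beta=beta, p=3)
    mode = max(pmf, key=lambda pair: pair[1])[0]
    print(f"beta={beta}: most likely mean spin = {mode}")
```

Under this assumed form, the mean spin sits near 0 for the smaller β and near a positive value for the larger one, which is the behaviour the contiguity argument exploits.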

The situation is more intricate when $[eqn]$ is known. In this case, we will show that there exists a strictly increasing threshold function $[eqn]$ , such that the consistent estimation of p is impossible for $[eqn]$ . However, for almost all $[eqn]$ (to be precise, for all but possibly countably many $[eqn]$ ), p can be estimated consistently whenever $[eqn]$ . The question of exactly describing the exceptional set of countably many values of $[eqn]$ for which the region $[eqn]$ is inestimable for p is still open. Finally, we want to mention that although joint consistent estimation of $[eqn]$ and p is impossible using just one sample from the p-spin Curie–Weiss model (3), we still hope to do so using multiple samples.

2. Impossibility of Jointly Estimating (β,p)

We start by showing that joint consistent estimation of $[eqn]$ and p using only one sample $[eqn]$ from the model (3) is, in general, impossible. Towards this, for every $[eqn]$ , define a set:

[eqn]

where D denotes the set of all integers $[eqn]$ , and

[eqn]

Theorem 1. For every $[eqn]$ , such that $[eqn]$ , there does not exist any sequence of estimators (measurable functions of $[eqn]$ ), which is consistent for $[eqn]$ under the model (3).

The following lemma is crucial for proving Theorem 1.

Lemma 1. For every $[eqn]$ , denote the distribution of a Rademacher random variable with mean m by μ. Then, the product measure $[eqn]$ is contiguous to $[eqn]$ for all $[eqn]$ , and all $[eqn]$ .

Proof. To begin with, note that on the event $[eqn]$ , we have:

[eqn]

Now, by Lemma 3.2 and Lemma 3.4 in [9], we have:

[eqn]

Also, by Taylor expansion, we have:

[eqn]

Hence, we have:

[eqn]

for some constant $[eqn]$ . Hence, for any event $[eqn]$ , we have:

[eqn]

thereby providing the following:

[eqn]

Now, suppose that $[eqn]$ . Then, for every $[eqn]$ , we have the following:

[eqn]

Taking the limit as $[eqn]$ throughout the above inequality, we can conclude that $[eqn]$ , which completes the proof of Lemma 1. □

With Lemma 1 in hand, we are now ready to prove Theorem 1.

Proof of Theorem 1. Suppose that there exists a sequence $[eqn]$ of consistent estimators of $[eqn]$ on $[eqn]$ . Fixing two different points $[eqn]$ and $[eqn]$ , we can construct disjoint neighborhoods $[eqn]$ and $[eqn]$ around them, respectively. By the consistency of $[eqn]$ on $[eqn]$ , we have the following:

[eqn]

From Lemma 1, we have the following:

[eqn]

which contradicts the facts that $[eqn]$ and $[eqn]$ are disjoint, and $[eqn]$ is a probability measure. □

Remark 1. Our argument implies that the consistent estimation of p is impossible if β is unknown. Indeed, if there were such a consistent estimator $[eqn]$ , then we could choose $[eqn]$ and $[eqn]$ from some $[eqn]$ ; construct disjoint open intervals $[eqn]$ and $[eqn]$ around $[eqn]$ and $[eqn]$ , respectively; and argue from contiguity that $[eqn]$ for $[eqn]$ , which is a contradiction.

A natural question is what the sets $[eqn]$ look like for different values of $[eqn]$ . To answer this, let us define for each $[eqn]$ ,

[eqn]

It follows from [9] that for $[eqn]$ , the function $[eqn]$ has a unique positive maximizer $[eqn]$ if $[eqn]$ . By convention, we define $[eqn]$ .

Proposition 1. The sequence $[eqn]$ as $[eqn]$ , which can be sorted in ascending order as $[eqn]$ . Further, if we denote $[eqn]$ to be the projection of $[eqn]$ onto the p-coordinate, then

[eqn]

The proof of Proposition 1 is technical, and is given in Appendix A. Note that for $[eqn]$ , the set $[eqn]$ is uniquely determined by its projection $[eqn]$ , because every $[eqn]$ uniquely corresponds to the element $[eqn]$ , where $[eqn]$ . Proposition 1 thus provides a complete description of the family of sets $[eqn]$ for all values of $[eqn]$ , and states that although each of these sets is finite, the family contains sets of arbitrarily large size. The following proposition describes the set $[eqn]$ .

Proposition 2. $[eqn]$ .

The proof of Proposition 2 follows from the following three facts proved in [9]:

If $[eqn]$ , then 0 is the unique global maximizer of $[eqn]$ .
If $[eqn]$ , then $[eqn]$ has two different non-negative global maximizers.
If $[eqn]$ , then $[eqn]$ has a unique non-negative global maximizer, which happens to be positive.
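The three regimes listed above can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the function being maximized has the form β x^p − I(x) with I(x) = ½[(1+x) log(1+x) + (1−x) log(1−x)], which is the form used for the p-spin Curie–Weiss model in the literature but is elided in this copy. Under that assumption, 0 is a global maximizer on [0, 1] exactly when β ≤ I(x)/x^p for every x in (0, 1], so the threshold can be approximated as the minimum of I(x)/x^p over a grid. The names `entropy_term`, `threshold` and `largest_maximizer` are hypothetical.

```python
import math

def entropy_term(x):
    """I(x) = 0.5*((1+x)log(1+x) + (1-x)log(1-x)), with 0*log(0) taken as 0."""
    return sum(0.5 * (1.0 + s) * math.log1p(s) for s in (x, -x) if 1.0 + s > 0.0)

def threshold(p, grid=20_000):
    """Grid approximation of the threshold: min over x in (0, 1] of I(x)/x**p."""
    return min(entropy_term(i / grid) / (i / grid) ** p for i in range(1, grid + 1))

def largest_maximizer(beta, p, grid=20_000):
    """Largest grid point in [0, 1] maximizing the ASSUMED objective beta*x**p - I(x)."""
    best = max(range(grid + 1),
               key=lambda i: (beta * (i / grid) ** p - entropy_term(i / grid), i))
    return best / grid

for p in (2, 3, 4):
    print(f"p={p}: approximate threshold = {threshold(p):.4f}")

# Below the threshold the maximizer is 0; above it, a positive value appears.
print(largest_maximizer(0.3, 3), largest_maximizer(1.0, 3))
```

The grid resolution controls the accuracy of both quantities; a root finder on the stationarity equation would be a more precise alternative.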

Remark 2. All of the results in this section also hold almost surely for the somewhat more general p-spin Erdös–Rényi Ising model [27], where the interaction tensor $[eqn]$ in (2) is given by $[eqn]$ , with the entries $[eqn]$ of the tensor $[eqn]$ being i.i.d. Bernoulli random variables with a mean α, for some fixed $[eqn]$ . This follows from the fact that the Erdös–Rényi Ising and Curie–Weiss measures are mutually contiguous, which follows from Lemma 6.6 in [28].
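The non-identifiability behind Theorem 1 and Proposition 1 can also be illustrated numerically. The sketch below is hedged: it assumes that the sets studied in this section collect the pairs (β, p) whose largest maximizer equals a given m, and that the objective is β x^p − I(x) with I(x) = ½[(1+x) log(1+x) + (1−x) log(1−x)]; both definitions are elided in this copy. For a target m in (0, 1), stationarity at m forces β = arctanh(m) / (p m^(p−1)), and the code then checks on a grid whether m is in fact a global maximizer. The function name `orders_sharing_maximizer` and its defaults are hypothetical.

```python
import math

def entropy_term(x):
    """I(x) = 0.5*((1+x)log(1+x) + (1-x)log(1-x)), with 0*log(0) taken as 0."""
    return sum(0.5 * (1.0 + s) * math.log1p(s) for s in (x, -x) if 1.0 + s > 0.0)

def orders_sharing_maximizer(m, max_order=20, grid=2_000, tol=1e-9):
    """Return {p: beta} for which m is numerically a global maximizer of the
    ASSUMED objective beta*x**p - I(x) on [0, 1]."""
    pairs = {}
    for p in range(2, max_order + 1):
        beta = math.atanh(m) / (p * m ** (p - 1))      # stationarity at x = m
        value_at_m = beta * m ** p - entropy_term(m)
        grid_max = max(beta * (i / grid) ** p - entropy_term(i / grid)
                       for i in range(grid + 1))
        if value_at_m >= grid_max - tol:               # m is not beaten on the grid
            pairs[p] = beta
    return pairs

for m in (0.9, 0.99, 0.999):
    found = orders_sharing_maximizer(m)
    print(m, {p: round(beta, 4) for p, beta in found.items()})
```

Under these assumptions, each target m is shared by only finitely many orders, and the number of such orders grows as m approaches 1, in line with the description following Proposition 1. Every pair returned for the same m yields the same limiting magnetization, so a single observation cannot separate them.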

3. Estimation of p When β Is Known

Throughout this section, we assume that the parameter $[eqn]$ is known. To begin with, for every $[eqn]$ , let us define the following two sets:

[eqn]

By Lemma A.1 in [10], the set $[eqn]$ is empty if $[eqn]$ , and is of the form $[eqn]$ for some integer $[eqn]$ otherwise.

Theorem 2. When β is known, there does not exist any sequence of estimators that is consistent for $[eqn]$ .

Proof. It follows from Proposition 2 and Lemma 1, that the N-fold product measure of the mean-0 Rademacher distribution is contiguous to $[eqn]$ for all $[eqn]$ . The proof now follows from the argument given in Remark 1. □

Remark 3. One can consider an extension of the model (3) by adding an external magnetic field parameter h as follows:

[eqn]

It has been shown in [9] that the consistent estimation of h is always possible when β is known. Proposition 2 shows that this is not the case for estimating p, which is impossible if $[eqn]$ . This inestimability region also coincides with that for estimating β when p is known (see [10]).

Now, we turn our attention to the set $[eqn]$ , and we break it down to the following two parts:

[eqn]

where $[eqn]$ is defined as the largest non-negative maximizer of $[eqn]$ .

Theorem 3. When β is known, there does not exist any sequence of estimators that is consistent for $[eqn]$ .

Proof. Once again, from Lemma 1, we know that the N-fold product of the Rademacher distribution with a mean of $[eqn]$ is contiguous to both $[eqn]$ and $[eqn]$ , where $[eqn]$ and $[eqn]$ is such that $[eqn]$ . Clearly, $[eqn]$ . The rest of the proof follows from the arguments of Remark 1. □

We now show that it is possible to estimate p consistently on the set $[eqn]$ , if $[eqn]$ is known. Towards this, let us define the following estimator of p:

[eqn]

The intuition behind our estimator is that $[eqn]$ as $[eqn]$ ; hence, the distance between $[eqn]$ and $[eqn]$ is expected to be minimized at $[eqn]$ for large N. The next theorem shows that with a high probability, this is indeed the case.

Theorem 4. For every $[eqn]$ and $[eqn]$ , we have

[eqn]

for some constant $[eqn]$ not depending on N.

Proof. Suppose that $[eqn]$ for some $[eqn]$ . It follows from Lemma A.1 in [10] and the proof of Proposition 1 that as $[eqn]$ , $[eqn]$ if $[eqn]$ and $[eqn]$ otherwise. Hence, in any case, $[eqn]$ is not an accumulation point of the sequence $[eqn]$ . This shows that there exists $[eqn]$ , such that $[eqn]$ for all $[eqn]$ not equal to p. Now, it follows from the proofs of Lemma 3.1 and Lemma 3.3 in [9] that

[eqn]

for some constant $[eqn]$ not depending on N. Also, it follows from the proofs of Lemma 3.1 and Lemma 3.3 in [9] and the proof of Proposition 1, that

[eqn]

for some constant $[eqn]$ not depending on N. Define $[eqn]$ and $[eqn]$ . It is clear that $[eqn]$ on the event $[eqn]$ . Theorem 4 now follows, as $[eqn]$ . □

Remark 4. One can tune the parameter $[eqn]$ in (5) to allow for an optimal number of integers over which the minimization is to be performed, in order to ensure proper convergence of $[eqn]$ . Also, in case $[eqn]$ , eventually $[eqn]$ , so one just stops at the largest q, for which $[eqn]$ in the minimization (5).
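The idea behind the estimator (5) and Remark 4 can be sketched as a nearest-maximizer rule. The code below is an illustration under assumptions rather than the paper's exact estimator, whose definition is elided in this copy: it assumes the objective β x^q − I(x) with I(x) = ½[(1+x) log(1+x) + (1−x) log(1−x)], assumes the model pmf is proportional to exp(β N m^p), compares the absolute value of the observed mean spin with each candidate maximizer, and uses a fixed cap `max_order` in place of the tuning parameter mentioned in Remark 4. All function names are hypothetical.

```python
import math
import random

def entropy_term(x):
    """I(x) = 0.5*((1+x)log(1+x) + (1-x)log(1-x)), with 0*log(0) taken as 0."""
    return sum(0.5 * (1.0 + s) * math.log1p(s) for s in (x, -x) if 1.0 + s > 0.0)

def largest_maximizer(beta, q, grid=20_000):
    """Largest grid point in [0, 1] maximizing the ASSUMED objective beta*x**q - I(x)."""
    best = max(range(grid + 1),
               key=lambda i: (beta * (i / grid) ** q - entropy_term(i / grid), i))
    return best / grid

def estimate_order(mean_spin, beta, max_order):
    """Pick the order q whose limiting magnetization is closest to |mean_spin|."""
    candidates = {q: largest_maximizer(beta, q) for q in range(2, max_order + 1)}
    # Orders whose largest maximizer is 0 lie below their threshold; skip them (cf. Remark 4).
    candidates = {q: m for q, m in candidates.items() if m > 0.0}
    if not candidates:
        raise ValueError("no candidate order has a positive maximizer at this beta")
    return min(candidates, key=lambda q: abs(abs(mean_spin) - candidates[q]))

def sample_mean_spin(n, beta, p, rng):
    """Draw the mean spin from its exact law under the ASSUMED Curie-Weiss pmf."""
    log_w = []
    for k in range(n + 1):                 # k = number of +1 spins
        m = (2 * k - n) / n
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_w.append(log_binom + beta * n * m ** p)
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    k = rng.choices(range(n + 1), weights=weights)[0]
    return (2 * k - n) / n

rng = random.Random(0)
observed = sample_mean_spin(n=20_000, beta=0.75, p=3, rng=rng)
print(observed, estimate_order(observed, beta=0.75, max_order=8))
```

Because the candidate maximizers accumulate near 1 as q grows, the cap on the candidate orders matters in practice: a larger cap needs a larger N before neighbouring candidates can be told apart.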

In the reverse problem of estimating $[eqn]$ with known p, consistent estimation is possible for all $[eqn]$ (see [9]). In contrast, a further challenging inestimability region $[eqn]$ arises in our problem of estimating p with known $[eqn]$ . The next proposition shows that for almost all $[eqn]$ , the set $[eqn]$ is actually empty, i.e., the entire region $[eqn]$ is estimable.

Proposition 3. There exists a countable set $[eqn]$ , such that $[eqn]$ for $[eqn]$ .

The proof of Proposition 3 is given in Appendix B. In the proof, we provide an exact enumeration of one countable set $[eqn]$ satisfying Proposition 3, namely:

[eqn]

However, our analysis did not allow us to dig any further. In particular, the following question is open:

Open Problem: Give an exact enumeration of the minimal countable set $[eqn]$ satisfying Proposition 3.

Note that if $[eqn]$ for some integers $[eqn]$ , then $[eqn]$ is a stationary point of both $[eqn]$ and $[eqn]$ . We can actually say that $[eqn]$ if this stationary point turns out to be a global maximizer of both these functions. However, checking the last condition will likely involve more intricate analysis, and is left as an open question.

Bibliography

1. Ising E. Beitrag zur Theorie des Ferromagnetismus. Z. für Phys. 1925, 31, 253–258. doi:10.1007/BF02980577
2. Geman S., Graffigne C. Markov random field image models and their applications to computer vision. Proceedings of the International Congress of Mathematicians, Berkeley, CA, USA, 3–11 August 1986; pp. 1496–1517.
3. Hopfield J.J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558. doi:10.1073/pnas.79.8.2554. PMID 6953413. PMC 346238.
4. Banerjee S., Carlin B.P., Gelfand A.E. Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC, Boca Raton, FL, USA, 2014.
5. Green P.J., Richardson S. Hidden Markov models and disease mapping. J. Am. Stat. Assoc. 2002, 97, 1055–1070. doi:10.1198/016214502388618870
6. Aslanov L.A. Crystal symmetry and atomic interactions model. Comput. Math. Appl. 1988, 16, 443–451. doi:10.1016/0898-1221(88)90234-9
7. Barra A. Notes on ferromagnetic p-spin and REM. Math. Methods Appl. Sci. 2009, 32, 783–797. doi:10.1002/mma.1065
8. Derrida B. Random-energy model: Limit of a family of disordered models. Phys. Rev. Lett. 1980, 45, 79. doi:10.1103/PhysRevLett.45.79