The Wishart planted ensemble: A tunably-rugged pairwise Ising model with a first-order phase transition
Firas Hamze, Jack Raymond, Christopher A. Pattison, Katja Biswas, and, Helmut G. Katzgraber

TL;DR
The paper introduces the Wishart planted ensemble, a tunably-hard Ising model with a first-order phase transition, useful for benchmarking optimization algorithms and understanding rugged energy landscapes.
Contribution
It presents a new class of Ising models with tunable hardness, analytical insights into their thermodynamics, and characterization of their phase transition and energy landscape.
Findings
Exhibits a first-order phase transition in temperature.
Displays a wide range of algorithmic hardness with a peak in solution time.
Analytical expressions for hardness peak and energy distribution are derived.
Abstract
We propose the Wishart planted ensemble, a class of zero-field Ising models with tunable algorithmic hardness and specifiable (or planted) ground state. The problem class arises from a simple procedure for generating a family of random integer programming problems with specific statistical symmetry properties, but turns out to have intimate connections to a sign-inverted variant of the Hopfield model. The Hamiltonian contains only 2-spin interactions, with the coupler matrix following a type of Wishart distribution. The class exhibits a classical first-order phase transition in temperature. For some parameter settings the model has a locally-stable paramagnetic state, a feature which correlates strongly with difficulty in finding the ground state and suggests an extremely rugged energy landscape. We analytically probe the ensemble thermodynamic properties by deriving the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Wishart planted ensemble:
A tunably rugged pairwise Ising model with a first-order phase transition
Firas Hamze
Microsoft Quantum, Microsoft, Redmond, Washington 98052, USA
D-Wave Systems, Inc., Burnaby, British Columbia, Canada, V5G 4M9
Jack Raymond
D-Wave Systems, Inc., Burnaby, British Columbia, Canada, V5G 4M9
Christopher A. Pattison
Department of Physics, California Institute of Technology, Pasadena, California 91125, USA
Department of Physics and Astronomy, Texas A&M University, College Station, Texas 77843-4242, USA
Katja Biswas
Department of Physics and Astronomy, Texas A&M University, College Station, Texas 77843-4242, USA
Helmut G. Katzgraber
Microsoft Quantum, Microsoft, Redmond, WA 98052, USA
Department of Physics and Astronomy, Texas A&M University, College Station, Texas 77843-4242, USA
Santa Fe Institute, Santa Fe, New Mexico 87501 USA
Abstract
We propose the Wishart planted ensemble, a class of zero-field Ising models with tunable algorithmic hardness and specifiable (or planted) ground state. The problem class arises from a simple procedure for generating a family of random integer programming problems with specific statistical symmetry properties but turns out to have intimate connections to a sign-inverted variant of the Hopfield model. The Hamiltonian contains only -spin interactions, with the coupler matrix following a type of Wishart distribution. The class exhibits a classical first-order phase transition in temperature. For some parameter settings the model has a locally stable paramagnetic state, a feature which correlates strongly with difficulty in finding the ground state and suggests an extremely rugged energy landscape. We analytically probe the ensemble thermodynamic properties by deriving the Thouless-Anderson-Palmer equations and free energy and corroborate the results with a replica and annealed approximation analysis; extensive Monte Carlo simulations confirm our predictions of the first-order transition temperature. The class exhibits a wide variation in algorithmic hardness as a generation parameter is varied, with a pronounced easy-hard-easy profile and peak in solution time towering many orders of magnitude over that of the easy regimes. By deriving the ensemble-averaged energy distribution and taking into account finite-precision representation, we propose an analytical expression for the location of the hardness peak and show that at fixed precision, the number of constraints in the integer program must increase with system size to yield truly hard problems. The Wishart planted ensemble is interesting for its peculiar physical properties and provides a useful and analytically transparent set of problems for benchmarking optimization algorithms.
I Introduction
The interface between physics and computational complexity has yielded fruitful insights over decades of research. Hard optimization problems—which are ubiquitous throughout the natural sciences and domains such as operations research—are of significant importance to humans and are widely believed to admit no efficient algorithms for their solution over all members of their class. It was recognized that such problems show analogous features to those found in statistical mechanical systems, for example the existence of algorithmic phase transitions Hogg et al. (1996); Monasson et al. (1999), under which typical problems show a dramatic increase in the difficulty faced by known exact and heuristic algorithms.
In some cases, insights from the physics of spin glasses and disordered systems have inspired remarkable new algorithms; for example Mézard, Parisi, and Zecchina Mézard et al. (2002) studied algorithmic hardness transitions in random 3-satisfiability (3-SAT) problems using tools from statistical physics and subsequently proposed survey propagation as a promising method for solving such problems. In addition, physics-based approaches have suggested ensembles of very hard problems; for example locked constraint satisfaction problems Zdeborová and Mézard (2008) owe their difficulty to the fragmentation of the solution space into widely separated sets. The NK model Kauffman and Levin (1987); Kauffman and Weinberger (1989) is a well-studied class of tunably rugged cost functions proposed to capture the complexity of a variety of physical and biological systems.
An important special class of hard problem ensembles are those whose solutions are known to the constructor; these are often known as optimization problems with planted solutions. Aside from their theoretical interest, such problems are noteworthy for several reasons. They may, for example, serve as candidates for cryptographic one-way functions, that is functions whose outputs are cheaply computable for any input but for which determining an input yielding a given output is hard. Furthermore, they serve as useful benchmark problems for evaluating heuristic or exact algorithms. In recent years, the need for such benchmarks has increased with the advent of physical devices implementing quantum annealing Johnson et al. (2011) and related (e.g., Ref. Wang et al. (2013)) algorithms. In such situations, it is desirable to not only have access to a set of problems of tunable hardness but to also be able to compare an algorithm’s performance with the correct answer.
A physics-based approach for generating hard 3-SAT problems with planted solutions was proposed by Barthel et al. Barthel et al. (2002); more recently, Krzakala and Zdeborová presented a technique known as quiet planting Krzakala and Zdeborová (2009) for devising graph -coloring problems with known solutions whose properties are indistinguishable from those of a random ensemble. This concept can be generalized to a variety of sparse problems, and has a close connection to reconstruction on trees Ricci-Tersenghi et al. (2019).
While the aforementioned techniques and analyses have yielded numerous elegant insights, they all share the undesirable property of considering problems that are structurally far-removed from contemporary physics-based optimization devices. -satisfiability problems, for example, require energy functions to include terms whose value depends on groups of variables; in all the previously mentioned work over binary variables, . Realistic spin system models and optimization hardware on the other hand are typically restricted to pairwise interactions; emulating higher-order interactions on such systems can require tremendous overhead. In contrast, -coloring problems directly correspond to the antiferromagetic Potts model of statistical physics and are thus expressible in terms of -body interactions, but must be larger than to yield hard problems as -coloring instances can be solved in linear time. On devices natively encoding problems consisting of binary variables, this can be problematic. Various techniques have recently been published Hen et al. (2015); Wang et al. (2017); Hamze et al. (2018); Hen (2019) for constructing planted Ising instances on sparse graph topologies, which in some cases appear to yield quite difficult problems Hamze et al. (2018), but in common with short-ranged disordered models in general, most of their known properties are inferred from numerical simulation and much remains unexplored about which features make them amenable as benchmarks for given algorithms.
In this paper, we propose a simple randomized procedure for generating systems of binary-constrained integer programs with a known solution Pattison et al. (2018). The system coefficients are generated according to a specific type of correlated multivariate Gaussian distribution. When translated to an effective Ising Hamiltonian, we obtain a novel type of disordered system with known ground state which we call the Wishart planted ensemble; the name is inspired by the distribution followed by the resultant random matrix of couplers.
The Hamiltonian includes interactions among all pairs of variables; this aspect makes it a less-than-perfect fit for testing on devices which implement short-range topologies. Unlike hard problem ensembles based on -SAT or graph coloring however, the variable domains are binary valued and the interactions are pairwise. Vitally, the ensemble displays an rich array of thermodynamic and computational properties of considerable relevance to both classical and quantum algorithms. Computationally, the problems can be tuned to range in difficulty from very easy to extremely difficult at quite modest system sizes. We emphasize that nearly all statements about problem “hardness” in this paper refer to the empirically observed typical-case difficulty encountered by heuristics (and in all likelihood exact algorithms) and have no bearing on theoretical computer science questions concerned with worst-case difficulty.
A striking physical property of the Wishart ensemble is the existence of a first-order phase transition, a discontinuous jump in the free energy derivative at some system-dependent critical temperature. A key parameter in the generation procedure is , which specifies the number-of-equations–to–number-of-variables ratio; is analogous to parameters such as the clause-to-variable ratio in the satisfiability problems and exerts critical influence on the physical and algorithmic complexity. By deriving the Thouless-Anderson-Palmer Thouless et al. (1977) equations for the ensemble with special care to account for the correlated couplers, we obtain the mean-field free energy, from which we find that at any finite , the internal energy drops discontinuously from some excited value at some -dependent critical temperature . We verify this temperature and the nature of the transition with extensive parallel tempering Monte Carlo Geyer (1991); Hukushima and Nemoto (1996) simulations, which reveal that the system converges to its asymptotic predicted properties quite rapidly. For large , the magnitude of the discontinuity decreases monotonically and the thermodynamics smoothly change character toward a traditional second-order ferromagnetic transition. When , the paramagnetic state, i.e., the set of all configurations having no correlation with the planted solution, is stable at any nonzero temperature. This feature signals difficulty for classical heuristic algorithms as it behaves as a deceptive dynamical “trap.” More specifically, following free-energy gradients as done by methods like simulated annealing Kirkpatrick et al. (1983) will overwhelmingly lead to solutions far from the true optimum. Only by fortuitous initialization within the ground-state basin of attraction will the problem be solved with high probability. It turns out that modulates the size of the ground-state basin, with larger values increasing the probability of solution by lucky initialization. When , on the other hand, the paramagnetic state becomes unstable for some temperature ; at that point local algorithms can successfully find the solution by “rolling downhill” and hence such problems are typically easy. Remarkably, Barthel et al. Barthel et al. (2002) also argue that the hardness of their hard 3-SAT planted ensemble is predicated on the existence of a first-order ferromagnetic transition. We confirm the results of the Thouless-Anderson-Palmer (TAP) analysis with two alternative approaches: the replica method Edwards and Anderson (1975) and the annealed approximation.
First-order phase transitions are well known to exist in the -state Potts model for and in some Ising-like systems such as the Blume-Capel model Blume (1966); Capel (1966) in which variables nonetheless assume more than two states. They are, however, quite unusual in the pairwise Ising model in zero field (though see Ref. Yedidia and Georges (1990)); in particular neither the Sherrington-Kirkpatrick Sherrington and Kirkpatrick (1975) nor the Edwards-Anderson Edwards and Anderson (1975) spin glasses exhibit such a transition. The Hopfield model Hopfield (1982) displays a first-order transition between the spin glass and retrieval phases when a relatively small number of patterns () are stored Amit et al. (1987), though as discussed in Sec. III, the Hopfield model is less appropriate as a class of problems with controllable hardness than the Wishart planted ensemble. The puzzling presence of such a transition in our system is accounted for by noting that the couplers are correlated to enforce the existence of the planted solution rather than independently disordered. Simulating systems with a first-order transition is widely known to be challenging.
The Wishart planted ensemble is of particular interest because it shares several features with models that have been shown by Nishimori and Takada Nishimori and Takada (2017) to be promising candidates for exhibiting (limited) exponential speedup when simulated using so-called nonstoquastic quantum driver Hamitonians; such systems cannot be simulated classically and hence represent a “strong” type of quantum effect. The advantage of the Wishart planted ensemble over the -spin models considered in Ref. Nishimori and Takada (2017) is once again that the interactions are naturally pairwise rather than requiring terms of order .
Suitable nonstoquastic devices are not available as of the writing of this paper, but the presented model can be used to explore interesting problems on near-term stoquastic devices as well. In particular, the combination of a rough multimodal landscape (replica symmetry breaking) in the space orthogonal to the planted solution alongside tunable control of the energy of the planted solution (at leading order in ) opens up the possibility to explore the hard population transfer problem recently proposed by Smelyanskiy et al. Smelyanskiy et al. (2018). This model is useful for distinguishing physical quantum dynamics from classical dynamics such as quantum Monte Carlo in transverse field Ising models due to the multipath tunneling phenomena (miniband resonance). In principle one can prepare a state in a planted mode, tune the mode energy to equality (resonance) with other modes, and explore the rate of escape.
While this paper is mostly concerned with classical properties, these intriguing connections to both types of quantum devices will certainly be explored in future work.
Computationally, the Wishart planted ensemble emerges from our procedure for generating a certain type of random integer linear program Papadimitriou and Steiglitz (1998) and has numerous connections with well-studied Fu and Anderson (1989); Mertens (2001); Borgs et al. (2001); Karmarkar et al. (1986); Lagarias and Odlyzko (1985) optimization problems such as the number partitioning and subset sum Garey and Johnson (1979) problems. In common to these problems, the allowable precision over the problem parameters has important influence over combinatorial properties. There is also, however a crucial distinction from these problems; when the parameter is fixed, the number of equations in the integer program scales linearly with the number of variables rather than remain constant (at unity, in the case of the previous two problems).
Random problem ensembles without a planted solution typically display a parametrized “easy-hard” difficulty transition (e.g., Refs. Gent and Walsh (1996) andMertens (2001)) in their optimization variants. The Wishart ensemble, on the other hand, shows an “easy-hard-easy” character. One of the easy regimes is due to the presence of a very large number of acceptable solutions coexisting with the planted ground state; given the task of locating any one of them, an optimization method has a relatively high likelihood of success. The other is due to the planting procedure effectively constraining the search space, providing “hints” to the algorithm toward the solution. The hard regime, however, is seen to be extremely difficult: Numerical experiments using a distributed, state-of-the art parallel tempering implementation show a dramatic hardness peak for small () system sizes. When , parallel tempering Monte Carlo fails to locate even an approximate solution under lax and permissive target criteria within the allotted simulation time of around 11 h on contemporary high-speed hardware.
While we derive the Wishart planted ensemble in terms of a somewhat abstract random integer program, the resultant model turns out to have a remarkable structural similarity to the Hopfield model Hopfield (1982); Amit et al. (1987) of biological neurons and more particularly, a sign-inverted variant Nokura (1998) proposed to model neural “unlearning”. As discussed in Sec. III, there turn out to be several important differences between the models; nonetheless, it is exciting that such completely different starting points as integer programming and unlearning in neural networks result in models with close connections.
The rest of the paper is structured as follows. In Sec. II we describe our procedure to generate Wishart planted instances based on random integer programming problems. We also discuss the ensemble’s computational properties and how to represent its members, whose parameters are defined to take continuous values, with finite precision. Section III analyzes the physical properties of the class; the TAP free energy (derived in the Appendix) is analyzed and shown to have global minima along a one-dimensional subspace of the set of spin magnetizations. Furthermore, it has a locally stable paramagnetic state for all temperatures when and for when . These properties give rise to the first-order transition between the paramagnetic and planted states; we determine the transition temperature in terms of . We show that as grows, the system begins to increasingly behave like a ferromagnet, i.e., with a second-order transition. The predicted first-order transition temperature is validated with extensive Monte Carlo simulation. In Sec. IV, we turn our attention to empirical algorithmic properties; under finite-precision representation, the ensemble displays an easy-hard-easy relation with respect to parallel tempering time to solution as is varied. After showing that the ensemble-averaged energies of the Wishart planted ensemble follow a gamma distribution and introducing the notion of an intrinsic search space, we analytically predict the location of the hardness peak for any target energy threshold and confirm the prediction using optimized parallel tempering simulations. The prediction is shown to be precisely accurate even for approximate solution criteria. We show that generating difficult problems under constant precision restriction requires scaling the number of constraints in the integer program approximately linearly. The Appendices VII contain most of our calculations, and confirm the TAP results through replica analysis and an annealed approximation.
II The Wishart planted ensemble
II.1 Generation procedure
Our goal is to construct an ensemble of zero-field Ising Hamiltonians over the -spin complete graph with planted ground state , in other words, having the form
[TABLE]
where and refer to configurations on the -spin Ising model configuration space and such that
[TABLE]
When not explicitly stated will be taken to be the ferromagnetic ground state and its image, as the minimizer to such a problem can be subsequently concealed by gauge randomization.
Consider the real-valued matrix , whose columns are denoted by for . The value of turns out to modulate the ensemble properties such as thermodynamics and hardness; in this work we are primarily concerned with the regime in which scales linearly as a constant factor of , i.e., for .
Given a desired ground state , our procedure seeks to construct a consistent homogeneous Ising-constrained linear system with as a solution, in other words, to obtain such that
[TABLE]
This is because the positive semidefinite quadratic form
[TABLE]
would then attain its minimum value of zero at , and hence if we define
[TABLE]
and zero its diagonal to form
[TABLE]
then the Hamiltonian
[TABLE]
attains its ground state at with energy
[TABLE]
where the property that for has been used. The scaling by in the definition of is to make the energy extensive, i.e., scaling linearly with system size.
We obtain the linear system by individually generating the columns of such that . We propose a simple projective method for efficiently generating correlated Gaussian variates satisfying the summation and other desirable properties. More precisely, the column vectors are set to be distributed as
[TABLE]
where the covariance matrix is given by
[TABLE]
with the -dimensional identity matrix. In other words, for each column vector , all elements have unit variance, and for all variable pairs , the covariances are
[TABLE]
Note that as expected; given any components of , the remaining one follows deterministically. To generate the column vectors, we first determine the square root of , i.e., such that , to be
[TABLE]
We then iterate over the loop described in Algorithm 1:
One can readily verify that for all and that is distributed according to Eq. (6). While the components of are correlated, the Gaussian nonetheless has strong structure that simplifies the subsequent analysis. Following an appropriate gauge transformation to the ferromagnetic state, the elements of imply that the distribution is exchangeable, i.e., invariant to a permutation of the components. Exchangeability is a stronger property than stationarity, which merely requires the covariance of components and to depend only on . The property is used in Sec. III.1 when deriving the TAP free energy and again in Sec. IV.1 when obtaining the ensemble energy distribution.
The random matrix follows a Wishart distribution, a well-studied matrix generalization of the distribution; in light of this we call our problem class the Wishart planted ensemble (WPE). When , the support of the Wishart density lies on a low-dimensional subspace of matrices Uhlig (1994). As we see in Sec. III.2, its spectral distribution is a key feature underlying the phase behavior. We note also that while sampling according to the Gaussian simplifies the analysis it is by no means necessary in practice for our results to hold in the large limit. For example, if used to obtain were vectors of length consisting of independent and uniform variates rather than uncorrelated Gaussians, then central limit arguments show that is nonetheless asymptotically Wishart. We note however, that using finite precision introduces the possibility that states other than may attain the ground-state energy. The optimization problem in this paper is defined to be that of locating any ground state; the algorithmic implications of the presence of several solutions is discussed in the next section and in detail in Sec. IV.
II.2 Computational properties
Before proceeding to an examination of the WPE thermodyamics, we discuss its properties from a computational perspective in light of its interpretation as a constrained homogeneous linear system. Readers familiar with related settings such as linear error correcting codes should bear in mind that arithmetic here is over the real numbers rather than a finite field such as .
Given a matrix , the task of finding the ground state of Eq. (5) is equivalent to finding a solution to the following NP-hard problem called integer programming feasibility
[TABLE]
Suspending for a moment the fact that scales with in the WPE, we can obtain a sense for how it may impact problem difficulty. If consists of independent columns, then . The dimension of the nullspace of implies the search space for potential Ising state solutions to Eq. (9), and so the larger it is, the more difficult the problem may be guessed to be. In particular, when consists of a single row () and hence an dimensional nullspace, the problem may be surmised to be maximally hard. When is specified with relatively low precision, this turns out to not be the case; exponentially many solutions other than overwhelmingly appear as increases and so locating any such satisfying can be quite easy. This is reflected in the Ising Hamiltonian (5); when , can be verified to be fully frustrated com (a), which gives rise to tremendous low-energy degeneracy.
At the other extreme in which assumes large values, the nullspace of becomes one-dimensional and hence the two Ising solutions to Eq. (9) are trivially recovered by inspection from a vector spanning the nullspace. In the Hamiltonian in Eq. (5), making large results in a ferromagnetic system. To see this, we note that
[TABLE]
where the limit follows from the law of large numbers. From the covariance matrix , the couplers thus uniformly approach
[TABLE]
implying that the system reduces to a gauge-transformed and rescaled Curie-Weiss ferromagnet, whose ground state is easy to find due to the lack of frustration. Hence one may reasonably guess that for a given allowable precision, the most difficult problems occur for some intermediate value of . This easy-hard-easy profile is shown to indeed hold and is discussed in Sec. IV, in which we make more explicit the role of and develop the conjecture that the difficulty peak occurs at the value of in which the fewest number of solutions occur relative to the volume of the nullspace.
A prototypical special case of Eq. (9), corresponding to , is the subset sum problem; its decision variant, that of establishing whether a satisfying to Eq. (9) exists, was one of Karp’s Karp (1972) original 21 NP-complete problems. We note that must be specifiable to arbitrary precision for the optimization variant in Eq. (9) to avoid efficient solution via the technique of dynamic programming as the complexity of this algorithm scales exponentially in the number of bits needed to specify the coefficients. This requirement holds more generally: when the value of is fixed to any integer and the matrix elements belong to a finite set, integer programming problems of the form of Eq. (9) can be solved in pseudopolynomial time using the same technique Papadimitriou (1981).
Subset sum problems with feasible solutions have been of particular interest due to their complexity underpinning the security of an early family of cryptographic systems Merkle and Hellman (1978). Remarkably, a method has been devised Lagarias and Odlyzko (1985) for solving with high probability a family of low-density random subset sum problems, in which the maximum element of the single-column is large compared to , based on an idea known as lattice basis reduction Lenstra et al. (1982).
A well-studied further specialization of the subset sum problem is known as the number partitioning problem, corresponding to the task of partitioning a base set of positive integers into two blocks with sums of minimum absolute difference (or “discrepancy”). When the integers are chosen independently from a uniform distribution (the problem does not respect a planted solution), number partitioning displays several interesting properties, in particular an algorithmic easy-hard phase transition Gent and Walsh (1996); Borgs et al. (2001); Mertens (2001, 2006). More specifically, if the positive integers forming the base set are bounded by for a fixed , then when , partitions with a discrepancy of zero exist with vanishing probability, and typical instances become difficult for known heuristic and complete algorithms. Conversely, when lies below this threshold, partitions with zero discrepancy are abundant, and the problems are easily solved.
While the subset sum and number partitioning problems bear obvious connections to the model presented in this work, it is apparent that our assumptions and focus are different. Broadly speaking, in the WPE a state must be found which now simultaneously satisfies the relations in the integer program; the fact that scales as rather than remaining fixed introduces consequences in the algorithmic hardness properties.
II.3 Representing a WPE instance
The WPE construction presented in Sec. II.1 was defined in terms of Gaussian variables; in a practical implementation, however, one must contend with finite precision and generally cannot represent the required continuous-valued parameters. In the WPE, this can be dealt with in one of two ways. The first, which will be examined in this section, is to maintain the correlation structure defined by but replace the Gaussian variates with a discrete zero-mean, unit variance ensemble; the simplest such choice is to have uniformly and independently take the two values (sometimes called a Rademacher distribution). This turns out to allow an exact representation of the problem parameters using a logarithmic (in ) number of bits; further, as mentioned in Sec. II.1, the coupler matrix is nonetheless asymptotically Wishart and the same physical properties derived in this paper result.
The second and more heuristic way to represent a problem is to simply round the parameters to the closest machine-representable number. This introduces numerical errors; for example, the planted solution may no longer have its theoretically intended energy. Consequently, one must introduce a tolerance on what defines a “solution.” We use this approach in our algorithmic hardness experiments because as discussed in Sec. IV.3, it minimizes the potential discrepancy between the finite-size statistical properties and our analytical predictions, which may arise due to using non-Gaussian generator variables on the small systems we were forced to consider. Nonetheless, using the theoretically-derived energy histogram, we are able to account for the observed hardness peak under this approximate representation.
We now discuss the exact discretized WPE over a Rademacher distribution, i.e., where are independently and uniformly in rather than drawn from for and . We first rewrite
[TABLE]
where the integer-valued matrix
[TABLE]
i.e.,
[TABLE]
Because
[TABLE]
then up to the leading constant of , the elements of may assume values in the set of equally spaced integers
[TABLE]
Thus, in the integer programming formulation, the Rademacher-discretized WPE takes approximately bits to encode the parameters.
Obtaining the required precision in the Hamiltonian formulation (i.e., on ) is more messy but analogous. Recalling that
[TABLE]
then using the previous restriction on the values of we can show that
[TABLE]
where
[TABLE]
Not all elements in this set of integers are actually attainable by , but it provides a useful upper bound on the number of possible values and shows that representing in the Rademacher-discretized WPE takes no more than on the order of bits, which when is .
III Thermodynamic properties
Analyzing long-range disordered systems has a rich history in statistical mechanics Nishimori (2001). The Sherrington-Kirkpatrick (SK) model Sherrington and Kirkpatrick (1975), a fully connected Ising model with independently sampled Gaussian bond strengths, is a prototypical example of such systems. The replica method Edwards and Anderson (1975) is a powerful framework for performing such analyses, and has yielded great successes such as the solution to the SK model Parisi (1979) which have since been rigorously proved to be correct (see, for example, Ref. Panchenko (2013) and the references therein). This approach is pursued in Appendix VII.5 were we are able to recover the transition properties developed in this section, and through connections with the anti-Hopfield model identify some additional interesting transitions within the model.
A different and in many senses complementary approach to the analysis of weakly coupled, fully connected disordered systems is due to Thouless, Anderson, and Palmer Thouless et al. (1977). The TAP equations are a set of nonlinear relations satisfied by the local magnetizations for a given instance. They can be arrived at in one of several ways Opper and Winther (2001). For the SK model, they are often interpreted as correcting the “naïve” mean field equations with a so-called Onsager reaction term. The approach of Plefka Plefka (1982) arrives at the TAP equations by second-order expansion of the free energy at constant magnetization, which turns out to have an appealing information geometric interpretation Tanaka (2000). In this section we use another approach called the cavity method Mézard et al. (1987).
Determination of the TAP equations for the WPE is somewhat complicated by the fact that are not independent variates as they are for the SK model. The TAP equations for systems with correlated have been determined in the past: A notable example, which turns out to have a close connection to our ensemble, is the Hopfield model Hopfield (1982) of a biological neural network. The couplings of the Hopfield model are given by
[TABLE]
where the vectors , known as patterns to be stored for later retrieval, consist of independent zero-mean binary random variables. The physics of the Hopfield model was first studied with the replica approach by Amit et al. Amit et al. (1985, 1987); the TAP equations were derived by Mézard et al. Mézard et al. (1987) using the cavity method, but yielded results inconsistent with the replica analysis. TAP equations consistent with Ref. Amit et al. (1985) were obtained by Nakanishi and Takayama Nakanishi and Takayama (1997) using Plefka’s method, and subsequently by Shamir and Sompolinski Shamir and Sompolinsky (2000) via an elegant cavity approach.
The connection of the WPE to the Hopfield model is apparent if we express the elements of defined in Eq. (3) as
[TABLE]
An obvious difference from the Hopfield model is the presence of the leading negation. Consequently, while the Hopfield Hamiltonian tends to favor spin configurations aligned with the patterns , the WPE Hamiltonian penalizes configurations overlapping with the directions . An additional distinction, however, is that while the vectors are independently drawn from the previously defined Gaussian distribution, the components of each vector are now correlated, a property which emerged due to the solution planting procedure. Remarkably, a model defined by negating the sign of the Hopfield model Hamiltonian (only the first of the two differences above) has been proposed as a model of “unlearning” paramagnetic configurations and thereby enhancing learning for biological networks. A replica analysis of this “anti-Hopfield” model was undertaken by Nokura Nokura (1998). The additional “layer” of correlations and the presence of a planted solution in the WPE turns out to lead to very different behavior from that of the anti-Hopfield model; in particular, the anti-Hopfield model has no first-order transition comparable to the transition into the planted solution, and the large- regime is Sherrington-Kirkpatrick-like rather than ferromagnetic.
Another Ising ensemble related to the WPE is the random orthogonal model (ROM) proposed by Parisi and Potters Parisi and Potters (1995), where the matrices are generated by uniformly sampling an orthogonal matrix , forming diagonal matrix whose elements (often additionally assumed to have a trace of zero), and setting . Having been devised with different aims, this model is also quite different from the WPE. First, no planting takes place, so the ground state is uncontrolled. Further, the ground-state energy is only known if an Ising-feasible state happens to be an eigenvector of , which become exceedingly unlikely for even moderately sized systems. Finally, the eigenvalue distributions of the matrices are not the same, nor are the TAP equations presented in Ref. Parisi and Potters (1995). The systems thus have quite differing thermodynamic properties; in particular, a first-order transition analogous to the planting transition is absent in the ROM.
It is natural to wonder about the suitability of the original Hopfield model to the task at hand. Unfortunately, it is not possible to generate sufficiently rugged energy landscapes while maintaining control over the ground states using this model. Hence, while the Hopfield model is an appealing abstraction of an associative memory, it is not as well suited to usage as a tunable planted optimization objective. The Hopfield model does not exhibit, for any setting, a persistent metastable paramagnetic state analogous to that of the WPE. In contrast to the WPE in which the planted ground state is undisturbed while a rough energy landscape is induced in the orthogonal subspace, ruggedness in the Hopfield model arises due to interference among the stored patterns, causing undesired spurious minima Amit et al. (1987). For values of the Hopfield , which specifies the ratio of number of stored patterns to number of variables, of less than around 0.05, a first-order transition between a spin glass and ”retrieval” phase takes place, but finding one of the embedded patterns is quite easy in this case (as it should be for the model’s intended purpose). At somewhat larger values of , the model continues to function as an associative memory, though the patterns are only assured of being local optima rather than ground states. When , only a second-order paramagnetic to spin glass transition takes place, and control over even the local minima is completely lost.
III.1 TAP equations
We derive the TAP equations for the WPE following the two-step cavity approach of Shamir and Sompolinski Shamir and Sompolinsky (2000), making appropriate adaptations to deal with the correlations among the components of each . The complete calculation is shown in Sec. VII.1. Alternatives to the TAP method based on the annealed approximation and replica method are considered in Appendices VII.4 and VII.5, yielding consistent results. In these appendices the connection with the anti-Hopfield model is explored, as is the notion of a planted solution with tuned energy that might permit other uses, such as the population transfer experiment discussed in the introduction.
Let and be the vector of all magnetizations. Define
[TABLE]
and
[TABLE]
The TAP equations for the WPE are
[TABLE]
Solutions to Eq. (13) are stationary points of the following TAP free energy:
[TABLE]
where the local entropy terms are
[TABLE]
To determine the quenched free energy one must consider a weighted sum over minima de Dominicis and Young (1983). In this section we present a more limited analysis that is appropriate when only a single minimum dominates the free energy. The result, which may be an upper bound to the free energy, is consistent with the replica symmetric analysis of Appendix VII.5, and in good agreement with numerical results of Sec. IV. We expect the approximation to be sufficient at higher temperature, and that in other cases the consequences of multiple fixed points may be small following the replica symmetry breaking analyses in related models as discussed in Appendix VII.5 Parisi and Potters (1995); Nokura (1998).
In general, determining the global minima of is a difficult task as lies in an -dimensional space subject to the bound constraints
[TABLE]
In our case, however, the existence of the planted solution considerably simplifies things by assuring that the free-energy minimum necessarily occurs along the ground-state direction:
[TABLE]
To see this, consider the restriction of to the spherical shell given by
[TABLE]
for some or, alternatively,
[TABLE]
for arbitrary unit vector . The claim is that
[TABLE]
minimizes on the shell; from Eq. (14), the term
[TABLE]
is minimized for due to the planting procedure. Further, the term
[TABLE]
is minimized on the shell at the points at which the have the same magnitude, which holds when , and the final term in is independent of on the sphere. The global optimum, which is the minimum over all the shells’ minimizers, thus occurs along
[TABLE]
Recall that
[TABLE]
and that
[TABLE]
Using the fact that for we define the one-dimensional TAP free energy as , i.e.,
[TABLE]
Consider the difference
[TABLE]
present in Eq. (15). Using the higher-order moments of correlated Gaussians, one can show that despite the dependence among the components of , the term scaling is at most . Normalizing by to obtain the free energy per spin, we obtain
[TABLE]
In the thermodynamic limit, the first term vanishes; neglecting the constant term of , the global free-energy minima of Eq. (14) can thus be found as those of the function
[TABLE]
The mimina of must be stationary points of Eq. (17), which are attained for solutions of the equation
[TABLE]
III.2 Thermodynamic properties of the WPE
The WPE displays phase properties quite surprising in the Ising model. Rather than undergo an SK-like spin-glass transition typical of many disordered systems, the presence of a planted solution gives rise to a thermal first-order phase transition. This transition in the WPE occurs at an -dependent temperature such that two stable states contribute equally to the free energy. As expected, at high temperature the disordered paramagnetic state () is the unique global minimizer to . At a critical temperature , this state remains globally minimizing but is no longer unique; another state with is also optimal. Below , the paramagnet ceases to be the minimum, but whether remains a local minimum (metastable state) to or an unstable state (saddle point or maximizer) depends on and the temperature. We find the fascinating result that when , the paramagnetic state is locally stable at all nonzero temperatures. As will be discussed, this property has considerable algorithmic implications.
The transition is first-order because of the discontinuity in the minimizing and in the derivative of the free-energy minimum with respect to ; consequently the internal energy also undergoes a jump at as we will see. Remarkably, for any finite , the transition is always technically first-order, in the sense that there is a discontinuity in the log partition function derivative. The magnitude of this discontinuity decreases with , and the transition gradually segues from first to second order.
III.2.1 Stability of
The stability of can be ascertained from the Hessian of , i.e.,
[TABLE]
The point is a local minimum of if and only if is positive definite. It turns out that the spectral distribution of the random matrix is key in determining the relation between the definiteness of and at given . The limiting eigenvalue distribution of is calculated in Sec. VII.2; somewhat surprisingly, it is closely related to the Marchenko-Pastur Marchenko and Pastur (1967) law for the spectrum of Wishart matrices constructed with uncorrelated Gaussians. The important aspect for now is that is always (asymptotically) the smallest eigenvalue of .
Computing the partial derivatives from the definition of in Eq. (14) we obtain the Hessian matrix elements
[TABLE]
At we thus have
[TABLE]
or compactly, the Hessian at
[TABLE]
with
[TABLE]
is positive definite when is large enough to shift the eigenvalues of to all be positive. From the spectral property mentioned above, stability hence occurs when
[TABLE]
For a given , this condition is equivalent to
[TABLE]
or
[TABLE]
Obviously, is non-negative; when the relation thus holds for all while when , it is satisfied when .
To summarize, when , the point is a locally stable stationary point to the TAP free energy at any temperature; on the other hand, when , the Hessian becomes indefinite at temperature . For temperatures below , note that the eigenvalue of corresponding to eigenvector (the planted solution) becomes negative, which implies that the paramagnetic solution becomes unstable along this direction. Interestingly, the curvature of along is of largest magnitude for at the temperature such that is minimal. In terms of ,
[TABLE]
which is smallest when . As local search heuristics follow free-energy gradients in their quest for the solution, it seems reasonable that this temperature is in some sense optimal to start with when searching for the ground state in the (easy) regime. A further special interpretation of this temperature relates to the anti-Hopfield model, as discussed in Appendix VII.5.
III.2.2 First-order phase transition
The first-order transition for given occurs at a such that for . For any , the stationary points of are determined numerically by iteratively solving the saddle point equation [Eq. (18)]; using binary search in , we can then localize the transition temperatures at any . Figure 1 shows the relation of with , and also shows for the paramagnetic instability point . It is clear that uniformly but as increases the two temperatures converge; this in turn constrains the jump magnitude between the two sides of the transition. While has no closed-form expression in , we can obtain a lower bound as
[TABLE]
which serves as a surprisingly accurate approximation for small but deteriorates as gets larger than .
We illustrate these results by plotting at representative temperatures for a few values of . Consider first the relatively small value of , with landscape illustrated in Fig. 2. The first-order transition at is clearly visible in the middle panel as is now a coexisting global minimum with the other minima close to the end points com (b). Remarkably, when , shown in the bottom panel, remains locally optimal.
In Fig. 3 we see similar behavior with an increased and a persistent metastable state at low temperature for .
From an algorithmic perspective, the low-temperature stability of is quite intriguing as it is widely believed to correlate with genuine combinatorial hardness, sounding the death knell for heuristic approaches attempting to locally optimize trial configurations. This category of algorithms includes workhorses such as simulated annealing and parallel tempering Monte Carlo, which exploit correlations in the energy landscape to search by performing biased random walks. Paramagnetic stability suggests that a problem is hard because it implies that cooling an initially disordered configuration will overwhelmingly lead to states that are also disordered. There is no exploitable information within the landscape guiding the dynamics to the correct region of the state space, and the only hope is to begin the search from the appropriate basin. Disordered states form the vast bulk of the configuration space, however, so the probability of a correct initialization decreases exponentially with system size. While all problems with exhibit this feature, the specific value of turns out to modulate the size of the planted solution basin, i.e., all states from which the ground state can be reached at reasonable cost, which of course exerts a critical effect on the observed performance of heuristics. This aspect can be observed by comparing the intervals between the end points and maxima of in Figs. 2 and 3 and noting them to be wider for . Broadly speaking, small values of lead to smaller basins and hence, one would surmise, more difficult problems. This assertion assumes, however, that the vectors are composed of real numbers and specifiable to arbitrary numerical precision. When the precision is bounded, the interpretation is more involved and is discussed in detail in Sec. IV.
Figure 4 shows how is affected by for a value of , a regime in which we expect to become unstable. The first-order transition is still readily apparent at . For temperatures lower than but higher than , the paramagnet remains a local minimum. Finally, when cooled to , becomes a maximum along the planted ground state direction. From a computational perspective, we anticipate that a well-designed local algorithm will be likely to succeed below this temperature as it follows the free-energy gradients leading it to the planted solution.
To further appreciate the first-order transition, we illustrate the discontinuity at of some key thermodynamic quantities for a few representative values. A key observation is that the nature of the transition changes “gracefully” from first to second order as increases past unity.
First, we recall (see, for example, Ref. Mezard and Montanari (2009)) that the ensemble partition function is related to the mean-field free-energy density as
[TABLE]
Using Laplace’s method, we obtain the “log partition density”
[TABLE]
where an additive term arising from Laplace’s formula vanishes for large . We define the minimum and minimizer of the free-energy density, respectively, as
[TABLE]
and
[TABLE]
The limiting internal energy density is
[TABLE]
so that
[TABLE]
In Fig. 5, we plot for four values of , with the corresponding located at the vertical dashed line. For the first two values of ( and ), the log partition function is clearly a smooth function on either side of the transition but is not differentiable at . At , the two sides still have different derivatives but the discrepancy is diminished, and when it is no longer visibly discernible.
Figure 6 shows the magnetization for the same four values of ; the smaller values show a sharp, discontinuous bifurcation at at which point the system abruptly shifts from being paramagnetic to being closely aligned with the planted solution. The jump is still apparent but attenuated for , while for the bifurcation starts to resemble a second-order ferromagnetic transition.
Finally, the internal energy densities are plotted in Fig. 7. Once again, the energies drop discontinuously at from some excited values to the planted solution ground state energies (per spin) of , which is most evident for the smaller two values of , diminished but still apparent when , and essentially invisible for . Note that the energy gap between the two sides of is larger when than when , in accordance with its higher transition temperature. The gap closes again for large but the energies thereafter progress continuously to their ground-state values.
The gradual reversion to a continuous transition is sensible in light of the construction procedure; when is very large, the matrix converges to , where is the covariance matrix. Since , this implies that in the large limit, the system becomes a (scaled) Curie-Weiss ferromagnet with a transition temperature of .
III.3 Monte Carlo simulation
In this section we perform finite-temperature Monte Carlo simulation of fully connected spin glasses with couplers drawn from a Wishart distribution. To detect the existence of a finite-temperature phase transition we measure the Binder cumulant Binder (1981) given by
[TABLE]
where represents a thermal average, represents a disorder average, and
[TABLE]
is the magnetization per spin. Although the model is disordered, it orders into a ferromagnetic phase because the ferromagnetic ground state is planted by construction. In general, the Binder ratio scales as
[TABLE]
where is polynomial for small values of its argument, is the temperature, and the critical temperature. In general, is independent of and, as such, on can determine by the point where data for different cross. However, because as we shall see the transition is first order, the shape of the Binder ratio as a function of temperature is somewhat different to the commonly known shape in second-order phase transitions (see, for example, Ref. Katzgraber et al. (2006)). When a first-order phase transition is present, the Binder ratio starts at , dips into and then plateaus to . One can determine the critical transition temperature by either extrapolating for , where is the minimum value of . Alternatively, one can study the crossing of data for different system sizes in the range . The former is usually easier to use to detect a transition, because the latter is often close to where data for different do not splay enough to see a clean crossing. However, as shown in Refs. Binder et al. (1992); Vollmayr et al. (1993), , whereas the corrections to scaling in Eq. (23) decrease proportional to . As such, here we determine the position of the critical temperature by studying the crossing of the Binder ratio.
To further corroborate the existence of a first-order transition to a ferromagnetic phase we study the distribution of the magnetization [Eq. (22)]. Close to the transition temperature where latent heat is present the order parameter should signal two competing phases, i.e., peaks at , as well as a competing peak at .
The simulations are done using parallel tempering Monte Carlo Geyer (1991); Hukushima and Nemoto (1996). Thermalization is verified by ensuring that all measured observables are independent of simulation time. We do this by analyzing how the results for all observables vary when the simulation time is successively increased by factors of (logarithmic binning). We require that the last three results for all observables agree within error bars. Simulation parameters are shown in Table 1. Error bars are determined via a jackknife analysis over the disorder.
Figure 8 shows data for the Binder ratio as a function of temperature. Figures 8(a), 8(c), and 8(e) show the Binder ratio for , , and , respectively, over the whole temperature range. A negative dip signaling a first-order transition is clearly visible. Figures 8(b), 8(d), and 8(f) zoom into . As can be seen, the data cross for all three values. From the crossing points we estimate , , and , in perfect agreement with our analytical estimates. Note that for decreasing , i.e., the problems become increasingly harder numerically for smaller values of .
In Fig. 9 data for the magnetization distribution are shown for , , and and . In all three cases peaks at are visible, signaling ferromagnetic order. However, a third peak for grows with increasing , thus signaling a jump in the magnetization close to the transition. Note that for decreasing the first-order transition is more pronounced.
IV Hardness phase transition
Having studied the phase behavior of the WPE and elicited properties that would plausibly correlate with algorithmic difficulty, we now proceed to examine some empirical results in which we probe the typical time to find the ground state using a parallel tempering (PT) Monte Carlo method. This algorithm is widely used to simulate complex physical and biological systems; contemporary implementations routinely examine spin glasses with several thousand variables and are quite competitive as optimization algorithms as well Moreno et al. (2003); Zhu et al. (2020). In essence, it is a careful stochastic local search Hoos and Stützle (2004) method such that configurations are allowed to escape metastable states by traveling to high temperatures and return afresh to probe the low-energy landscape while ensuring asymptotically correct equilibrium sampling at each temperature.
We seek to check and account for the existence of an algorithmic hardness peak, a sharp increase in the time required to find the ground state as is varied. Unfortunately, our numerical studies were severely hampered by the extreme difficulty of the WPE in its hard regime, which disallowed us from empirically localizing the peak for sizes that would otherwise be considered modest by the standards of other Ising ensembles (e.g., the SK model). We systematically reduced the sizes in which PT failed, within our computational resources, to approach the planted ground-state energy for a range of values. Reliable statistics were finally obtained for ; the plot of median time to solve the problem, or more precisely and as justified in Sec. IV.3, to find a state with energy within of that of the planted ground state, is shown in Fig. 10. Other choices for are certainly sensible, and in Sec. IV.3 we consider and analyze the less stringent values of and . Details of the simulation protocol are found in Appendix VII.6.
The hardness peak is evident at , where the problems typically took around minutes to solve. As expected, the instances get steadily easier for the larger values, but the observation that is easy may initially seem surprising. As alluded to previously, this is due to the fully frustrated nature of , giving rise to a tremendous number of low-energy states, coupled with the fact that standard double-precision floating point arithmetic was used for the computations. Consequently, the potential exists for several states that under unbounded precision would have had close but nonetheless distinct energies to be mapped to the same value. Under the low-energy degeneracy associated with , one thus observes a huge and exponentially increasing (in ) number of numerically indistinguishable “ground states”; finding one such acceptable state turns out to be relatively easy com (c).
Yet, the number of solutions alone cannot be expected to predict problem difficulty; in the computationally easy large- regime, the Hamiltonian starts to look increasingly like a ferromagnet, and so is essentially assured of having a unique solution. Thus, in contrast to uncorrelated problems like number partitioning in which an easy-hard transition is observed at the parameter value such that “perfect” solutions to the problem disappear Gent and Walsh (1996); Mertens (2006), the correlations resulting from the planting procedure give rise to another factor influencing difficulty and which leads to the observed easy-hard-easy pattern. We conjecture that hardness at a given is determined by two competing factors: First, the number of solutions satisfying the integer program and, second, the intrinsic search space size of the integer program. It seems plausible that the regime in which the ratio of these two quantities is minimized signals the hardness transition as this would be analogous to having the fewest needles in the biggest haystack. Our aim is to analytically predict the hardness peak for the -variable WPE at given and numerical tolerance .
To determine the number of satisfying solutions under precision , we first derive the ensemble-averaged energy histogram for the WPE, discussed in Sec. IV.1. We then formalize our notion of the intrinsic search space in Sec. IV.2; as anticipated in Sec. II.1, the size of the nullspace of plays a key role. Finally, we make our quantitative conjecture regarding the transition location in Sec. IV.3 and show that it precisely predicts the location of the peak in Fig. 10. In Sec. IV.4, we present some preliminary numerically obtained features of the WPE energy landscape to complement the main analysis performed here.
IV.1 Energy histogram
We consider an ensemble of problems over spins with parameter such that
[TABLE]
with the columns of simulated as and Hamiltonian
[TABLE]
for . Note that the diagonal elements are included in this formulation as it is presently more convenient to take the ground-state energy to be zero. Without loss of generality, we assume the ferromagnetic ground state is planted so that for all
[TABLE]
We seek the distribution of energies marginalized over the problem ensemble
[TABLE]
where is the probability of drawing by uniform sampling a state with energy from the given problem.
The result is surprisingly simple; the energies follow a gamma distribution with density
[TABLE]
and cumulative distribution
[TABLE]
where and are the gamma and incomplete gamma functions Abramowitz and Stegun (1964), respectively. The calculation of the distribution is shown in Appendix VII.3. It is quite similar to Mertens’ Mertens (2001) analysis of the number partitioning problem cost density, but here the energy is a more complicated quadratic function with a correlated coupler matrix rather than a rank-1 absolute value discrepancy over independent coefficients. Figure 11 displays for the WPE when for a few values of . When , the density is overwhelmingly concentrated on low-energy values; combined with bounded precision, this degeneracy is responsible for the problems being easy. By the properties of the gamma distribution, the mean and standard deviation of are and respectively; consequently (nonplanted) low-energy states become exponentially less likely as increases.
If the coupler matrix rather than were used, the density would simply be translated by the ground-state energy of , i.e.,
[TABLE]
We point out that Eq. (25) in fact describes the energy density of non-planted states, which as is shown in the Appendix, is dominated by paramagnetic states () far from the planted solution. In fact, the true density should be regarded as a mixture of a function at zero energy with weight , representing the probability of sampling the planted solution, and of the gamma distribution (25) with weight .
The distribution enables evaluation of the ensemble-averaged number of states whose energies are at most and which are not related by a global spin flip com (d), which turns out to be highly relevant to computational properties. This can be shown to be
[TABLE]
The leading “1” is due to the persistence of the planted solution; its presence in the expectation may initially seem strange and for small values of it is indeed irrelevant, but as increases, the paramagnetically contributed solutions start to vanish and its influence in fact becomes dominant.
IV.2 Intrinsic search space cardinality
We have speculated that because we aim to solve an Ising-constrained linear program or, equivalently, to solve for a state lying in the nullspace of , the dimensionality of the nullspace plays some role in problem complexity. In this section, we clarify the notion and extract an -dependent intrinsic search space. This is defined as a discrete set of reduced-dimensional states over which one would need to exhaustively search to solve the problem without reference to any information about the values of . In some sense, this is analogous to the reduced complexity enjoyed by various optimization problems on graphs of low treewidth. The result is that when has independent rows, one only needs to consider states in brute-force search; the subtraction by one is simply to remind that the problem is invariant to global spin flips. We note that the following procedure is not meant for literal implementation because we are disregarding, for example, numerical issues that may be important in practice, but simply to show that up to a polynomial prefactor the exhaustive search complexity is exponential in , where
[TABLE]
We remind the reader that the integer programs are always feasible due to the planting construction. Let the vectors
[TABLE]
span . Such a basis can be obtained in polynomial time using, for example, the singular value decomposition of . The integer program can be equivalently phrased as
[TABLE]
where is the matrix whose columns are and . For any state , one can in polynomial time either solve for satisfying coefficients or show that no such coefficients exist using standard linear algebra techniques. While a solution will eventually be found when the method is repeated for all possible states, we now show that not all need be checked.
We first obtain a matrix composed of a subset of independent rows from . Such a matrix can be determined using for example the QR decomposition with pivoting Golub and Van Loan (2012). We apply this decomposition to the columns of :
[TABLE]
where is an column-permuting matrix, is and orthogonal, and is and upper-triangular. This ensures that the first columns of are independent, so we take
[TABLE]
where is an submatrix composed of the first columns of and encodes which rows of were selected for . These steps are computed only once for a given problem.
Now let be an Ising state of length . The system
[TABLE]
with respect to partial state always has a solution because is full rank. After solving for , we verify whether this generalizes to a feasible solution over all Ising variables by computing
[TABLE]
The first components of will be by construction; if the rest are also in , then is a feasible solution. If not, then we choose another and repeat the process starting with solving for . This shows that up to the polynomial overhead involved in the various steps, one only needs to consider the number of states in ; in fact due to the spin-flip symmetry, the intrinsic search space contains elements. In the case where and so the nullspace is one dimensional, only a single state need be checked.
IV.3 Predicting the hardness transition
We now synthesize the two preceding antithetical factors into an expression that predicts when problems are likely to become difficult. For illustration, we first consider the case of degenerate ground states, in which the reasoning does not in fact make any assumptions about the specific generative process or family from which problems are drawn. The number of solutions not related by global flip symmetry to a specific problem is denoted by , a random variable whose distribution may correspond to some parametric setting.
In the exhaustive algorithm described in the previous section, a set of no more than partial states need be checked for feasibility. Given a traversal order over the partial states, the expected time to locate a solution to a problem is simply the average number of iterations until a specific mapping to a feasible completion is found. Noting that each partial state corresponds to at most one solution and assuming that an adversary uniformly allocates the solutions among the partial states, by searching among in arbitrary order, the solution time can be identified with the number of draws of a partial state (without replacement) from a population of such states, of which are solutions, until a single solution is found. This quantity follows a specific type of negative hypergeometric distribution, where for convenience we obtain the inverse of its mean for our particular values as
[TABLE]
When averaged over all problems in the ensemble, the expected inverse solution time is thus approximately
[TABLE]
Equation (29) makes explicit the competing effects of the number of solutions (numerator) and search space size (denominator) on problem difficulty.
Returning our attention to the WPE, we presently determine how to compute the expected number of solutions, or equivalently the mean number of ground states in the Ising formulation, defined as
[TABLE]
where the degeneracy results from the discreteness of the generator variables . We recall from Sec. II.3 that in the WPE, all independent and standardized yield the same limiting properties as those resulting from using a Gaussian and presented in this work. Consequently, a good estimate for the ground-state degeneracy can be obtained for sufficiently large systems using the expected count in Eq. (27). For example the number of ground states in the Rademacher-discretized WPE presented in Sec. II.3 can be approximated by
[TABLE]
where represents the smallest attainable excited-state energy for the problem size.
As mentioned, the computational expense associated with simulating the WPE in its hard regime forced us to simulate systems of relatively small size. Such a restriction can, however, potentially introduce finite-size effects; in other words, the properties of the discretized systems may be far from those predicted by their Gaussian asymptotics. To minimize such artifacts, we opted to sample the parameters using Gaussian generators represented with the maximum available precision and consequently, to contend with numerical errors. The task was thus translated to that of approximate solution.
Bounded precision means we cannot represent all the “true” values of the Gaussian variables used to generate nor can we in general exactly represent the values of resulting from the planting procedure using the rounded variables or even the exact energy of a state relative to the rounded . Rounding and discretization cause errors in the representation of some energies; in particular, states that had distinct energies under arbitrary precision may be mapped to numerically indistinguishable values, and the relative ordering among closely spaced true energies may change. Let be the random variable resulting from finite-precision representation of some true continuous-valued energy ; its corresponding CDF now has step discontinuities. If we interpret the mapping of a state’s exact energy to its representable approximation as a pseudo-random walk process leaving invariant the probability of having energy of at most , then provided is large enough to be representable we can approximate the finite-precision energy CDF with the continuous distribution introduced in Sec. IV.1:
[TABLE]
where the latter equality follows from the nonnegativity of . We can then approximate the expected number of states with observed energy of at most using Eq. (27):
[TABLE]
The value of is in principle arbitrary but must be large enough to reflect the approximation error of the planted ground-state energy. We observe that under the aggregated discretization effects of the construction procedure, the true planted ground-state energy of zero is typically distorted to a value on the order of the so-called machine epsilon for double-precision arithmetic. A sensible strategy is thus to take to be some value not much larger than this, which would conservatively ensure that no planted ground states are missed while accepting a certain number of “false positives,” i.e., that some nonplanted states may be considered successful solutions. Excessively small values of , however, resulted in prohibitively long computational times for the problems in the hard regimes. Our methodology to analyze the empirical solution time settled on using three target energy values: . We hence strive to yield accurate predictions of the hardness transition for the important task of approximate (or relaxed) solution to the problem.
Finally, we define the function trading off number of solutions to effective search space size
[TABLE]
and anticipate that problems become most difficult at
[TABLE]
as this coincides with the fewest number of solutions relative to the effectively constrained search space. In Fig. 12, we plot for system sizes and , when in accordance with our most stringent target. When , in perfect accordance with the location of the hardness peak shown in Fig. 10. The latter size was far too large to solve in the hard regime within our constraints, but the plot suggests that to obtain the most difficult problems for this larger size, needs to increase. It is instructive to predict the required scaling of with for maximally hard problems at this level of precision.
Figure 13 shows as a function of , where a clear linear scaling is apparent. By linear regression, we find this relation to be
[TABLE]
suggesting that the hardest problems at occur when . The general message is clear: To have truly difficult problems under precision constraints, the number of equations in the integer program cannot be constant.
The reader may notice that the definition of resembles that of the mean inverse solution time (29) with respect to the exact partial state traversal algorithm. Directly relating , which considers all energies in a specified range, to the solution time of an exact solver is not straightforward however. The issue is that a partial state may now have several completions to full Ising states whose energies match the target but which are not ground states. While we have shown that finding the ground-state completion of a given or verifying that no such completion exists is straightforward, locating a completion guaranteed to match a generic target is nontrivial. We may nonetheless heuristically justify as follows. Suppose that fixing the partial state variables sufficiently constrains the remaining free variables that locating the extension of minimum energy can typically be done rapidly and with high probability with a heuristic algorithm. An “exhaustive-approximate” algorithm for finding a state whose energy is bounded by can thus proceed as follows: Traverse the in some order, where for each , the minimizing extension is heuristically obtained, and stop if a resultant state’s energy is less than . Under the assumption that the target states are uniformly distributed among the partial state “bins,” can once again have an interpretation as an expected inverse solution time. To justify our conjecture, we now show that can serve to localize the hardness peak for generic values of .
Figure 14 shows the PT median times to solution for the same WPE ensemble considered so far, but for three values of defining an acceptable solution: , , and . Unsurprisingly, the typical solution times decrease as the energy criterion becomes more permissive. Additionally, we note that the location of the hardness peak shifts to larger values of as is relaxed. Most importantly, in Fig. 15 we observe the predictive power of at these values of : In all three cases, the minimizer of at the respective values of precisely corresponds with empirically observed PT solution time. We have hence proposed a robust, theoretically motivated framework for generating tunably difficult problems over a wide range of approximate solution targets.
IV.4 Properties of locally optimal states
As we have seen, the energy histogram derived in Sec. IV.1 has been useful in predicting the algorithmic properties of WPE instances. Nonetheless, this distribution does not provide information about topological aspects of the local minima, i.e., states that are energetically stable with respect to a single spin flip.
In this section, we briefly probe the properties of local minima using exhaustive search on small instances; we save analytic examination of these properties, along the lines of Bray and Moore’s Bray and Moore (1980) analysis for the SK model, for later work.
A natural statistic to analyze is the expected number of local minima as is varied. Furthermore, it is instructive to define a residual energy histogram restricted to stable states. These are shown for a system with variables in Figs. 16 and 17, respectively. As expected, the number of minima decreases monotonically in as the ensemble tends toward a ferromagnet. At small , we observe a large number of minima, but also that the residual energy itself is likely to be small. This is consistent with the observation that at small , the problems with restricted precision are easy.
To further illustrate properties of the WPE, Fig. 18 shows examples of disconnectivity graphs for four specific instances at their respective values of . Disconnectivity graphs are two-dimensional representations of high-dimensional energy landscapes. In their simplest form they depict the minima of the system and the lowest energy barrier connecting any two minima, where an energy barrier is defined as the highest energy value encountered along a specific pathway. The barriers represent the minimum increase in energy necessary to transition from one minimum to another. In this work, the minima of the system were obtained via complete enumeration and the barriers were calculated using a search over all possible pathways identified with flipping spins that are misaligned between pairs of minima. This method was outlined by Garstecki, Hoang and Cieplak Garstecki et al. (1999). To deal with the large number of minima in a computationally efficient manner, we further made use of an approximation based on the relative proximity of minima; specifically, for each minimum only the barriers to the minima closest to it in Hamming distance were obtained. This approximation is based on the fact that transitions between two minima can also happen through basins of intermediate minima, and hence if two minima are separated by a large Hamming distance it is likely that the lowest energy barrier between these two minima will be already represented via transitions between intermediate minima and their corresponding barriers.
In Fig. 18 the minima are represented by vertical bars whose lowest points denote their energies. On top, they are connected by lines converging to a common point representing the height of the barrier that needs to be crossed in order to transition between the connected minima. Due to the continuous nature of the energy values, we sort the minima into a hierarchical cluster structure whose end points comprise the intervals , where is the energy of the barrier and denotes the length of the interval. Minima whose connecting barriers fall within the same energy interval are sorted into a common cluster. Within an individual cluster the minima are arranged based on the number of spins in the up or corresponding down states. Minima which have a high number of up-state spins are sorted toward the left and, correspondingly, minima with a high number of down-state spins are sorted toward the right. Note, that this order strictly only applies within the individual cluster; the order of the individual clusters relative to each other is determined by the hierarchical structure. In this work we set . Figure 18 shows a clear progression in the energy landscape for which small values of , e.g., in Fig. 18 (a), are characterized by a very large number of almost degenerate metastable states, and larger values of [Figs. 18(b) –18(d)] tend to break the degeneracy, emphasize the planted ground state, and make the landscape more funnel-like.
Figure 19 shows the distribution of the average barrier height within the first Hamming distance of its individual minima states. It represents the distribution of the average increase in energy necessary for the system to escape its minima via the shortest route, i.e., to transition to adjacent minima. For a given instance, the average was taken over the barriers to the minima which lie within the shortest Hamming distance to a given minimum, say minimum . are the energies of the individual barriers and is the energy of minimum . The overall distribution is then obtained by sampling over all minima of sample systems for each of the values of .
As can be seen in Fig. 19, at small values of the distribution is more spread than at large values, where additionally it is dominated by relatively small values of the average energy barrier. From an energy landscape perspective this indicates that transitions from the minima at large are more likely to be achievable with less energy cost and therefore might be more probable than at small values of com (e).
V Discussion
We have proposed a planted Ising ensemble with several noteworthy physical and algorithmic properties. The model exhibits a first-order temperature transition, a persistent locally stable paramagnetic state when , and when represented with finite precision, an easy-hard-easy algorithmic difficulty profile. Its physical properties are consistent with the observed hardness of finding its ground state; moderately sized problems are extremely difficult in the hard regime. This meshes well with the intuition that the transition and paramagnetic stability give rise to a golf-course-like energy landscape. After deriving the instance-averaged energy distribution—which turns out to follow a gamma law—we compare the expected number of states matching a solution criterion with a quantity we introduced quantifying the intrinsic search space size at given to analytically predict the location of the hardness peak. The prediction is validated using solution times obtained with a highly optimized implementation of parallel tempering Monte Carlo.
The first-order transition between the planted and paramagnetic phases is furthermore established by the TAP analysis in Sec. III.1, with alternative derivations supported by the replica method (Appendix VII.5) and annealed approximations (Appendix VII.4). Careful Monte Carlo simulations demonstrate the correctness of our analytically predicted transition temperatures.
A connection is made with the anti-Hopfield model and this is developed analytically in Appendix VII.5 Nokura (1998). This analysis indicates that we might expect to see the impact of replica symmetry breaking if we focus on the region orthogonal to the planting space. In this paper we have not focused on this for two reasons. First, the emphasis has been on intermediate scale problems, where it is difficult to numerically establish such phenomena, finite size effects may dominate, and practical issues such as finite-precision representation may have greater impact. Second, at large replica symmetry breaking has little impact on the free-energy barrier separating the bulk of the space from the planted solution, which is the primary driver of hardness for heuristic optimization in the planted case.
Nevertheless, we believe the qualitative description of the space orthogonal to the planted solution, that of a rough energy landscape with deep solutions almost orthogonal to the planted solution, is in effect. Roughness is apparent at the small scales we have worked with empirically. With this in mind, a modification of the planting procedure is pursued in Appendices VII.4 and VII.5 whereby the planted solution is partially penalized. Its leading-order energy is then tunable at fixed , at the cost of losing strict guarantees that it is a ground state. This modification, it is believed, might be the basis for interesting tests of dynamics (escape into or out of the planted solution). In particular, an interesting suggestion has been made that for this type of problem in the presence of a transverse field, quantum dynamics may be differentiated from classical counterparts Smelyanskiy et al. (2018). The features of our model with this modified planting procedure represent a practical realization of many of the abstract model features underlying the population transfer hypothesis.
Future work will explore this direction more deeply, and consider as well whether the features of the WPE energy landscape lend themselves to the demonstration of fundamental speedup using emerging quantum annealing devices, in particular those whose classical simulation is known to be intractable. It is our hope that the insights into the ensemble’s physical and algorithmic classical properties presented in this work will solidly underpin these future directions.
In addition, we are interested in pursuing the relation between the WPE and the important class of low-rank estimation problems Lesieur et al. (2017) used in unsupervised feature extraction and dimensionality reduction, in further examining whether any deeper connections can be made between the ensemble and biological unlearning Nokura (1998), and in studying potential connections with models for analyzing large-system code-division multiple-access (CDMA) multiuser detectors Tanaka (2002).
VI Acknowledgements
We acknowledge useful discussions with Bill Macready, Federico Ricci-Tersenghi, Hidetoshi Nishimori, Jon Machta, Cris Moore, Koji Hukushima, Brad Lackey, Lenka Zdeborová, Salvatore Mandrà, Mohammad Amin, Andrew King, Cathy McGeogh, and Aidan Roy. H.G.K. thanks the Department of Poultry Science at Texas A&M University for providing support. H.G.K. and C.P. are supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via MIT Lincoln Laboratory Air Force Contract No. FA8721-05-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purpose notwithstanding any copyright annotation thereon.
VII Appendix
VII.1 Cavity calculation of the TAP equations
For notational convenience, we redefine the Hamiltonian of the WPE to include the constant terms corresponding to the diagonal elements of ; this will not affect the final result. Further, without loss of generality we assume that the ferromagnetic solution and its reversal are to be planted as the ground state. The Hamiltonian of the WPE is
[TABLE]
where
[TABLE]
and . As before, for and . The elements of the covariance matrix are as follows:
[TABLE]
This covariance structure implies that the Gaussian process generating is not only stationary but exchangeable. As discussed in Sec. III, this model bears many resemblances to the Hopfield network, but in addition to the negation of the interactions, we have the additional complication of the “antipatterns” consisting of correlated elements. Nonetheless, by closely following the two-step cavity-based method presented by Shamir and Sompolinksi Shamir and Sompolinsky (2000) for the Hopfield model, taking special care to account for the correlations, we may derive the TAP equations for the WPE.
The cavity approach Mézard et al. (1987) derives the self-consistent relation for each local spin magnetization by first considering removal of the spin from the -spin system and defining a state distribution on the -spin subsystem. Remarkably, the joint distribution of the field and spin in the original system can be expressed in terms of the field distribution resulting from the subsystem, allowing the spin and field statistics for the full system to be related to those of the subsystem. On their own, these exact relations do not give much insight because they are intractable to compute, but when the subsystem field distribution can be justified to be Gaussian, the TAP equations for the magnetizations may be obtained. A substantial amount of the work is in deriving the correct parameters for the field distribution. The details for the WPE follow.
VII.1.1 Cavitating a spin
Consider an WPE matrix coupling spins through vectors , i.e., . We can decompose this Hamiltonian into a sub-Hamiltonian consisting only of interactions among spins , denoted here by , and a term accounting for the interaction between spin zero and the others:
[TABLE]
with
[TABLE]
and
[TABLE]
The final constant in is irrelevant and will be dropped. The exact joint distribution over can be shown to be
[TABLE]
where
[TABLE]
and refers to thermal averaging with respect to . From this, we obtain the (intractable) relations
[TABLE]
and
[TABLE]
The next step is to compute the field statistics to be used following a Gaussian assumption for .
As for the Hopfield model, define the field mean and variance as
[TABLE]
Note that
[TABLE]
where
[TABLE]
Using the definition of , we obtain
[TABLE]
If we define the overlap of spins with the last components of as
[TABLE]
and its covariance under the cavitated spin distribution
[TABLE]
then we obtain
[TABLE]
and hence
[TABLE]
Noting while for , we proceed to determine the magnitude of in order to simplify the field variance, bearing in mind that while is independent of , there are componentwise correlations within each vector not present in the Hopfield model. The conclusion will be that just as for the Hopfield model, is when and and when .
Define
[TABLE]
where the superscript on has been dropped. We seek and the fluctuations , where the expectations are taken over .
The linear expectations are straightforward to compute; when , we have
[TABLE]
while, recalling the covariance structure of , when ,
[TABLE]
which is different from the Hopfield model, in which this quantity is zero.
Direct computation of the quadratic expectation is tedious but straightforward. In the expansion, there will be a total of terms of the form . We first make a relevant partitioning of these terms for generic and then compute the expectations for the cases where they are equal and different.
- •
Case ,
If , then there are terms of the form
If , then there are terms like
- •
Case ,
If , then there are terms as
If , then we have terms like and terms as
- •
Case ,
This is a rotated version of the previous case. When , there are terms of the form
When , there are terms like and terms as
- •
Case ,
When , there are terms like and terms as
When , there are terms as and terms like
Consider now the case of . Adding the terms in the expansion, recalling that is independent of and that when , we find that
[TABLE]
implying that is O\Big{(}\frac{1}{N^{3/2}}\Big{)}. This quantity is of identical order in the Hopfield model despite the correlations in .
When on the other hand, we find that
[TABLE]
where the terms in Eq. (36) refer to the effects of the products. Using the properties of the higher-order moments of a correlated Gaussian distribution (e.g., Ref. Tong (2012)), we have by the exchangeability of our distribution for any indices such that different letters refer to different values,
[TABLE]
Substituting these into Eq. (36), we obtain the result that and hence that is typically
[TABLE]
namely O\Big{(}1/N\Big{)}. These results imply that the field variance in Eq. (35)
[TABLE]
is
[TABLE]
where the order of the second summation follows from the independence of and ; by self-averaging it can hence be approximated by
[TABLE]
Finally, recall that . The decorrelate at the same rate as they do in the Hopfield model; further, follows deterministically from and adds no information about the state distribution on sites and hence on the . This suggests that the field can be approximated by a limiting Gaussian distribution:
[TABLE]
where . This Gaussian approximation dramatically simplifies relations in Eqs. (33) and (34), which reduce to
[TABLE]
Considering deletion of any spin rather than zero, the TAP relation
[TABLE]
follows. Note that the distinction between and disappears at this point; a consequence of the cavity method is that terms are disregarded as one expects. To fully specify the TAP relation, must be determined for the WPE; we turn to this task next.
VII.1.2 Cavitating a
Consider now a system of spins but with . The corresponding Hamiltonian can be related to that of a system with a single removed. Specifically, if
[TABLE]
then the full Hamiltonian is
[TABLE]
where again an irrelevant constant has been dropped. Now following the cavity procedure, we obtain the distribution of relative to the Boltzmann distribution corresponding to in terms of that corresponding to :
[TABLE]
The expectation by self-averaging due to the stationarity of . The variance was shown to be O\Big{(}\frac{1}{N}\Big{)} previously; for large , its typical value is
[TABLE]
where
[TABLE]
Finally, assuming that is Gaussian and using the relation in Eq. (41), we obtain
[TABLE]
which is a Gaussian with mean zero and variance
[TABLE]
Note that this value would result for any removed. To obtain the field variance, we use Eq. (38) to obtain
[TABLE]
This resembles that of the Hopfield model but with a changed sign in the denominator.
VII.2 Limiting spectral distribution of
We determine the limiting eigenvalue distribution of the matrix. To do so, it suffices to determine the spectral distribution of . To see why, first note that for all by the law of large numbers. Recalling that , this implies that is asymptotically related to by addition of a uniform quantity to the diagonal, namely
[TABLE]
which in turn means that the eigenvalues of are simply those of translated by .
For Wishart matrices of the form , where are matrices whose elements are independent zero-mean unit-variance Gaussian variates and , the limiting spectral distribution is known as the Marchenko-Pastur Marchenko and Pastur (1967) law. The fact that the columns of are correlated Gaussian variates seems at first to complicate the determination for . The structure of the specific covariance matrix considerably simplifies matters, however. Recalling that
[TABLE]
it is apparent that first, is an eigenvector with null eigenvalue, and second, that any vector in the subspace orthogonal to is an eigenvector with eigenvalue . Hence, any orthonormal set of vectors orthogonal to can be used to represent , and the variation of along each of these eigenvectors is asymptotically of unit magnitude. The procedure for generating is for large thus equivalent to first generating vector whose first elements are independently and whose last element is zero and next, transforming by some unitary matrix rotating the coordinate vector to . This implies that the spectral distribution of approaches that of , where
[TABLE]
consists of an matrix composed of iid normal variates and a final row of zeros; the eigenvalues of are thus those of with an extra zero added. The limiting spectral distribution of can be straightforwardly obtained from the Marchenko-Pastur law by appropriate change of variables. We then obtain the limiting eigenvalue distribution of the matrix to be
[TABLE]
where
[TABLE]
and
[TABLE]
Note that the spike at zero never disappears, a feature that turns out to crucially influence the phase behavior for large . Finally, the spectral distribution of follows by reflection and translation:
[TABLE]
where
[TABLE]
and
[TABLE]
The distribution consists of a continuous component over the support and a persistent function at , where of course .
VII.3 Energy histogram of the Wishart ensemble
Define the length vector of normalized state overlaps with the
[TABLE]
so that and
[TABLE]
The derivation of the marginal energy
[TABLE]
is simplified by decomposing the integration into sums of expectations over subsets of states with a constant number of positive elements and exploiting the exchangeability of . Let be the number of elements in with value ; for a magnetization , . On the constrained subset, the marginal density of is
[TABLE]
where the sums are over states with positive entries. The joint distribution
[TABLE]
by the independence of the columns, but the components of each are correlated. They are however exchangeable, and since the same appears in each term in the sum this implies that
[TABLE]
only depends on . Consider the specific case of whose first elements are 1. Define the sums (one for each column of )
[TABLE]
Because , the sum
[TABLE]
deterministically. By the independence of the the sums are independent random variables and from the properties of linear transformations of Gaussian variables each is distributed according to a zero-mean Gaussian with variance
[TABLE]
When or , the zero-variance Gaussian is defined to be a function as we expect. We express the expectation (44) at fixed as
[TABLE]
Now let
[TABLE]
with density . We then have
[TABLE]
Because is the sum of squares of zero-mean iid Gaussians with variance , it is gamma-distributed, i.e., with density
[TABLE]
with parameters . For large and relating to , . We then obtain the constrained- energy histogram
[TABLE]
for and zero otherwise and the overall energy distribution
[TABLE]
where the sum runs over the values of mapping to . As described, for example, in Refs. Mertens (2006); Mezard and Montanari (2009), applying Stirling’s large- approximation to the binomial coefficients, replacing the sum with an integral and evaluating it with Laplace’s method, we obtain
[TABLE]
or the marginal gamma energy density for the WPE
[TABLE]
VII.4 Annealed approximation
The purpose of the annealed approximation is to provide a simple bound on the typical case behavior of the free energy at leading order in . The free-energy density is defined
[TABLE]
where denotes an average over instances. Discontinuities in the free energy describe the phase transitions, and derivatives describe the order parameter(s) and other statistically significant quantities at thermal equilibrium.
It is convenient for this section to consider a model defined by the Hamiltonian
[TABLE]
where are independent and normally distributed random variables. For the case this Hamiltonian is identical to the main text Hamiltonian up to the choice of the embedding solution (, ), inclusion of a nonzero diagonal term in the coupling matrix (which adds a constant offset to the energy), and corrections of . These restrictions are for the convenience of analysis, and have no significant impact on the analysis method or conclusions of the section.
The parameter is useful in making the connection to the anti-Hopfield model (obtained for the case ) Nokura (1998), and in identifying a variation on the principle of embedding discussed in the main text. Tuning of this parameter allows one to control the energy level of the planted solution at leading order in , allowing an embedding of a ground state with control over the gap at fixed , or implanting an excited state well separated from other stable and metastable states.
The annealed approximation may be used to obtain a lower bound, , on the free-energy density
[TABLE]
The physical interpretation for this approximation is that the quenched degrees of freedom () are treated on an equal footing with the dynamical degrees of freedom (). This means that models of lower energy can be selected disproportionately, since spin and model variables can become correlated to lower the free energy. By this process it is possible that atypical models can dominate the free energy so that typical case is not reflected. However, in this ensemble we show that the free energy is correct through much of the phase space in agreement with the TAP analysis of Sec. III.1.
Owing to the quadratic form of the Hamiltonian in the order parameter, it is rather straightforward to take the disorder average explicitly, which yields:
[TABLE]
where . Following this, we notice that the eigenvalues of are a function only of the sum of spin variables, thus the trace log can be simplified, defining we can write
[TABLE]
Using Stirlings’s approximation, and using a continuum approximation for we can write
[TABLE]
Finally, applying Laplace’s approximation yields the result
[TABLE]
The quantity can be readily associated with the planted state overlap (magnetization, for the case of ). The first term is the standard mean-field entropy term, the latter term being an energy term.
Maximizing this equation involves solving for , also called the saddle-point equation. At high temperature there is only the solution , whereas at low temperature (for large enough ) there are two additional solutions. For the case of the main text [Eq. (55)] is identical to the dominant minima of the TAP equation free energy [Eq. (14)].
At low temperature it is interesting to observe that the crystallization transition occurs due to a competition between the energy term and the entropy term, one dominating in each regime. For the paramagnetic solution is defined by , whereas for the planted solution . Equating these two terms for the case we can find a simple approximation to the first-order transition, the approximation in Eq. (20) proves a very accurate lower bound for the critical temperature shown in Fig. 1 for small values of .
In this section we have found that the annealed approximation recovers the more general TAP result derived in Sec. III.1. For the special case of planting we find that the embedded state remains thermodynamically dominant at low-enough temperature for large-enough , at a diminished critical temperature (and increased ground-state energy). For the case , and at small the annealed approximation is insufficient to predict all features of the phase diagram, for this we can use the replica method of Sec. VII.5.
VII.5 The replica method
The annealed approximation of Sec. VII.4, and TAP analysis presented (Sec. III.1), are insufficient to describe all the features of our model phase space. To go beyond these we here introduce a replica method. The replica method when solved is able to capture non trivial properties of the exact free energy, at leading order in , in a variety of disordered models closely related to our proposed model Amit et al. (1987); Parisi and Potters (1995); Nokura (1998). As presented here, the replica method is a nonrigorous method for purposes of obtaining insight through related models.
The replica method for the anti-Hopfield model () is demonstrated by Nokura et al. Nokura (1998), and may be derived along similar lines as the Hopfield model Amit et al. (1987). Only minor variations to the form of the Hopfield replica method (free energy) are necessary to analyze the WPE, but these changes have significant consequences.
The free-energy density [Eq. (49)] can be rewritten for purposes of the replica method in terms of a replicated partition function
[TABLE]
where is the partition function, and the trick is to solve for the case of positive integer and analytically continue to real . For integer we can write
[TABLE]
using Eq. (50), being a vector of dimension . From here we can follow closely the method in Section 2 of Ref. Amit et al. (1987), which is considered a standard approach. This being understood, we are sparse in our derivation. The main difference in the method is the introduction of an additional order parameter through the identity
[TABLE]
The integral identity, and scaling with , being appropriate for the large system limit saddle-point, to be later identified. A similar identity is introduced for the overlap .
The replicated partition function following these manipulations be written
[TABLE]
where is the identity matrix, and are matrices of the same dimension with zero on diagonal (overlap order parameters), and , and are vectors (alignment with planted solution order parameters).
For the special case (or ) this free energy is identical to that of the anti-Hopfield model Nokura (1998). The interpretation of this result is as follows: since describes the degree of alignment with the planted solution, we can argue that when there is no extensive alignment with the planted solution, such as at high temperature (above the first-order planting transition) or in general for the space orthogonal to the planted solution, the model phenomena will be identical to the anti-Hopfield model. This is interesting, since in the anti-Hopfield model, and in related classes of problems, there is understood to be a replica symmetry breaking phenomena Parisi and Potters (1995). The space becomes divided into modes separated by extensive barriers at sufficiently low temperature. We thus expect this same phenomena to carry over to our model either as a stable or metastable solution. To determine which we must analyze the free energy within an approximation. In this Appendix we will develop only the replica symmetric theory, this being understood as sufficient for the paramagnetic phase, and the crystal phase discussed in the main text.
In the replica symmetric solution we take and that for (the diagonal term is 0). The free energy at leading order can be determined from the saddle-point free energy
[TABLE]
We can attempt to solve this equation as in the annealed case by setting derivatives with respect to , , and to zero. These equations can be written
[TABLE]
First we should note the solution , which is the same (paramagnetic) free energy as in the annealed and TAP analysis, and describes the high-temperature solution. The local stability of this solution at , which determines the critical temperature, is identical to that determined by alternative analysis methods (Secs. VII.4 and III.1).
Next consider the subspace with and . In this case we find the relationship , and , and recover the planted phase with identical properties to the annealed and TAP analysis. The energy of this solution being controlled by . This solution, where it exists is again locally stable.
Finally, for a solution with nonzero is possible in the subspace with , the critical temperature given by . Nevertheless, for the embedding-aligned phase () is dominant. Thus the replica symmetric solution is in agreement with the understanding presented in the main text. For this solution can become relevant for a limited region , and describes a transition similar to that in the SK model.
However, it is known that the replica symmetric solution can be unstable, and it is necessary to go beyond these approximations to first and higher orders of replica symmetry breaking to describe the phase space correctly, this being most important at small . We have not probed the interesting consequences in this part of the phase diagram, but the results of the anti-Hopfield model are understood to apply and there is expected to be dynamical and static transitions in the space orthogonal to the planted solution Nokura (1998); Parisi (1995). It is the existence of these solutions, whether they are stable or metastable, alongside the planted one, that offers the interesting possibility to undertake a population transfer measurement in the context of this model Smelyanskiy et al. (2018). Being able to plant a deep stable or metastable solution may also have other interesting applications, particularly in sampling and inference applications where one must provide more than a single ground-state certificate.
The aim of this paper is to provide practical intermediate scale benchmarks, the replica method describes only the typical case properties at leading order in . A tension therefore exists between the results found in this section, and practical limitations of the ensemble such as precision, and instance to instance fluctuations at finite size. These facts must always be borne in mind when considering such analysis results.
VII.6 Time to solution measurements
In this work the problem hardness is quantified via time to solution (TTS) of parallel tempering Monte Carlo. We do emphasize that we expect similar results using other heuristics. For a single disorder realization, the time to solution is defined as the run time such that there is a success probability to have found the solution at the end of the run.
Many solvers have parameters that affect the success probability which we denote by a set . For parallel tempering Monte Carlo, the parameters considered are the lowest temperature and number of replicas. Often, it is faster to carry out many short attempts taking run time with a lower success probability which motivates the definition Boixo et al. (2014); Rønnow et al. (2014); Albash and Lidar (2018)
[TABLE]
The number of attempts is taken to be a real number for convenience.
In the case of an ensemble, averages may be poorly defined and so the median is used Boixo et al. (2014). For the TTS to be well defined, the minimum value with respect to run time and parameters must be computed Rønnow et al. (2014). The ensemble TTS is defined as
[TABLE]
where the index refers to the TTS of individual disorder samples as a function of run time and parameters.
Due to the nonsequential nature of parallel tempering Monte Carlo, the success probability as a function of run time can be efficiently measured with only a small number of runs Mandrá and Katzgraber (2018): The algorithm is run times until the solution is found. For a given run time , the success probability is estimated as the percentage of runs where the solution is found in time less than .
Due to finite numerical precision, solutions are considered to be any state with energy where is the planted solution energy and is a numerical constant. was used for the TTS study in Fig. 10. The effects of different values of on the location of the TTS maximum are shown in Fig. 14.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Hogg et al. (1996) T. Hogg, B. A. Huberman, and C. P. Williams, Phase Transitions and the Search Problem (Elsevier, Amsterdam, 1996).
- 2Monasson et al. (1999) R. Monasson, R. Zecchina, S. Kirkpatrick, B. Selman, and L. Troyansky, Determining computational complexity from characteristic ‘phase transitions’ , Nature 400 , 133 (1999).
- 3Mézard et al. (2002) M. Mézard, G. Parisi, and R. Zecchina, Analytic and algorithmic solution of random satisfiability problems , Science 297 , 812 (2002).
- 4Zdeborová and Mézard (2008) L. Zdeborová and M. Mézard, Locked Constraint Satisfaction Problems , Phys. Rev. Lett. 101 , 078702 (2008).
- 5Kauffman and Levin (1987) S. Kauffman and S. Levin, Towards a general theory of adaptive walks on rugged landscapes , J. Theor. Biol. 128 , 11 (1987).
- 6Kauffman and Weinberger (1989) S. Kauffman and E. D. Weinberger, The nk model of rugged fitness landscapes and its application to maturation of the immune response , J. Theor. Biol. 141 , 211 (1989).
- 7Johnson et al. (2011) M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, et al., Quantum annealing with manufactured spins , Nature 473 , 194 (2011).
- 8Wang et al. (2013) Z. Wang, A. Marandi, K. Wen, R. L. Byer, and Y. Yamamoto, Coherent Ising machine based on degenerate optical parametric oscillators , Phys. Rev. A 88 , 063853 (2013).
