A Tight Degree 4 Sum-of-Squares Lower Bound for the Sherrington-Kirkpatrick Hamiltonian
Dmitriy Kunisky, Afonso S. Bandeira

TL;DR
This paper proves a tight degree 4 sum-of-squares lower bound for certifying upper bounds on the Sherrington-Kirkpatrick Hamiltonian, showing it cannot do better than the spectral maximum, and proposes a conjecture for higher degrees.
Contribution
It establishes a tight degree 4 SOS lower bound for the SK Hamiltonian and introduces a conjecture for extending lower bounds to higher degrees using pseudomoment constructions.
Findings
Degree 4 SOS cannot certify bounds below the spectral maximum.
With high probability, the lower bound matches the maximum eigenvalue asymptotically.
Proposes a conjecture for lower bounds at any fixed degree as N grows.
Abstract
We show that, if $W \in \mathbb{R}^{N \times N}_{\mathrm{sym}}$ is drawn from the gaussian orthogonal ensemble, then with high probability the degree 4 sum-of-squares relaxation cannot certify an upper bound on the objective $N^{-1} \cdot x^\top W x$ under the constraints $x_i^2 - 1 = 0$ (i.e. $x \in \{\pm 1\}^N$) that is asymptotically smaller than $\lambda_{\max}(W) \approx 2$. We also conjecture a proof technique for lower bounds against sum-of-squares relaxations of any degree held constant as $N \to \infty$, by proposing an approximate pseudomoment construction.
Dmitriy Kunisky (partially supported by NSF grants DMS-1712730 and DMS-1719545)
Department of Mathematics, Courant Institute of Mathematical Sciences, New York University
Afonso S. Bandeira (partially supported by NSF grants DMS-1712730 and DMS-1719545, and by a grant from the Sloan Foundation)
Department of Mathematics, Courant Institute of Mathematical Sciences, New York University
Center for Data Science, New York University
(First Draft: July 26, 2019; Current Draft: August 31, 2020)
1 Introduction
1.1 Algorithms for the Sherrington-Kirkpatrick Hamiltonian
This paper concerns convex relaxations of the following optimization problem:
$$\mathsf{M}(W) := \max_{x \in \{\pm 1\}^N} \frac{1}{N} x^\top W x. \tag{1}$$
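Since the constraint $x \in \{\pm 1\}^N$ may be written $x_i^2 = 1$ for all $i \in [N]$, this is a simple instance of quadratically constrained quadratic programming. We are moreover interested in a random setting, where $W$ is a random matrix drawn from the gaussian orthogonal ensemble (GOE): $W_{ii} \sim \mathcal{N}(0, 2/N)$ and $W_{ij} = W_{ji} \sim \mathcal{N}(0, 1/N)$ for $i \neq j$, with the entries on and above the diagonal distributed independently. We denote this distribution $\mathrm{GOE}(N)$. Under this model, the spectral radius of $W$ is of constant order (indeed, $\lambda_{\max}(W) \to 2$ almost surely as $N \to \infty$), and the normalization in (1) is such that $\mathsf{M}(W)$ also remains of constant order as $N \to \infty$, as we will describe below.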
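As a concrete illustration of the model (our own sketch, not code from the paper), a $\mathrm{GOE}(N)$ sample can be produced by symmetrizing a matrix of i.i.d. standard gaussians. The helper name `sample_goe` is hypothetical. The sketch checks the entry variances $1/N$ and $2/N$ and that $\lambda_{\max}(W)$ is close to 2 for moderately large $N$.

```python
import numpy as np


def sample_goe(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw W ~ GOE(n): W_ii ~ N(0, 2/n) and W_ij = W_ji ~ N(0, 1/n) for i < j."""
    g = rng.standard_normal((n, n))
    # Off-diagonal: (G_ij + G_ji) / sqrt(2n) has variance 2 / (2n) = 1/n.
    # Diagonal: 2 G_ii / sqrt(2n) has variance 4 / (2n) = 2/n.
    return (g + g.T) / np.sqrt(2 * n)


rng = np.random.default_rng(0)
n = 1000
W = sample_goe(n, rng)
lam_max = np.linalg.eigvalsh(W)[-1]  # eigvalsh returns eigenvalues in ascending order
off_diag_var = np.var(W[np.triu_indices(n, k=1)]) * n  # should be close to 1
diag_var = np.var(np.diag(W)) * n                       # should be close to 2
print(f"lambda_max = {lam_max:.3f}, n*Var(off-diag) = {off_diag_var:.3f}, n*Var(diag) = {diag_var:.3f}")
```

The factor $\sqrt{2n}$ is chosen so that a single normalization gives the required off-diagonal and diagonal variances.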
The problem $\mathsf{M}(W)$ for general $W$ includes the problem of finding maximum cuts in graphs (MaxCut), when $W$ is taken to be a graph Laplacian. Karp’s classical result [Kar72] therefore implies that computing $\mathsf{M}(W)$ is NP-hard in the worst case. The case $W \sim \mathrm{GOE}(N)$ is a simple and mathematically elegant example with which we hope to probe the average-case complexity of the same problem, seeking to understand whether the worst-case complexity abates for specific random models of $W$.
We are assisted in this task by the rich history of the random optimization problem $\mathsf{M}(W)$ in statistical physics: up to a change in sign, its value is the ground-state energy of the Sherrington-Kirkpatrick (SK) model, a prominent mean-field model of spin glasses [SK75]. In particular, the asymptotics of its expected value have been well-understood at a non-rigorous level since the seminal work of Parisi [Par79], who developed a system of deep conjectures on the optimization landscape of $x \mapsto x^\top W x$ over $\{\pm 1\}^N$, which, among other results, allowed him to analytically predict the limit
$$\lim_{N \to \infty} \mathbb{E}\, \mathsf{M}(W) = 2\mathsf{P}_* \approx 1.5264, \tag{2}$$
where $\mathsf{P}_*$ is the Parisi constant.
(Standard results from general gaussian process theory also imply strong concentration around the expectation.) More recently, the computation of this limit has been made mathematically rigorous as well [Pan13a, Pan13b, Tal06].
From the perspective of computer science and optimization, perhaps the more natural random model of $W$ is the case where $W$ is the adjacency matrix or graph Laplacian of a random graph, which gives randomized instances of MaxCut. A pair of elegant recent works [MS16, DMS+17] showed that, in fact, for sparse random graphs this problem is intimately related to the gaussian setting of the SK model: an interpolation argument may be used to control both the true value and the value of a certain simple semidefinite programming relaxation of $\mathsf{M}(W)$ for sparse random graphs in terms of the SK model.
Thus, whether motivated by the mathematical interest of the GOE and SK model or the application to MaxCut, we are led to ask:
Question 1.1**.**
Under $W \sim \mathrm{GOE}(N)$, can $\mathsf{M}(W)$ be approximated accurately and efficiently?
Of course, knowing the limiting expectation (2) and concentration around this value, it is simple to produce a vacuous algorithm that outputs the value $2\mathsf{P}_*$. To capture the difficulty of solving instances of $\mathsf{M}(W)$ for specific random draws of $W$, we must therefore refine our question.
One way to do this is to ask instead:
Question 1.2**.**
Under $W \sim \mathrm{GOE}(N)$, can $x = x(W) \in \{\pm 1\}^N$ be efficiently computed such that $\frac{1}{N} x^\top W x \approx \mathsf{M}(W)$?
Recently, assuming a widely-believed conjecture from the spin glass literature, Montanari answered this question in the affirmative in [Mon19]. Montanari’s result followed a similar one on local search in a simpler “random energy model” [ABM18], and used proof techniques related to those proposed by Subag in [Sub18], who addressed the same question in a continuous setting.
Theorem 1.3** (Theorem 2 of [Mon19]).**
Conditional on the conjecture that the Parisi distribution has continuous support at sufficiently low temperature in the SK model,¹ for any $\epsilon > 0$, there is a polynomial-time algorithm computing $x = x(W) \in \{\pm 1\}^N$ such that
$$\lim_{N \to \infty} \mathbb{P}\left[\frac{1}{N} x^\top W x \geq (1 - \epsilon)\, \mathsf{M}(W)\right] = 1.$$
¹ See Assumption 1 of [Mon19] and the surrounding citations and discussion for further details.
Another way to refine our question is to ask rather for certificates of upper bounds on :
Question 1.4**.**
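Can $B = B(W) \in \mathbb{R}$ be efficiently computed with $\mathsf{M}(W) \leq B(W)$ for all $W$ and $B(W) \approx \mathsf{M}(W)$ when $W \sim \mathrm{GOE}(N)$?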
(Note that we require $\mathsf{M}(W) \leq B(W)$ to hold for every $W$; the algorithm is not allowed to “cheat” the random setting by merely outputting a number slightly larger than $2\mathsf{P}_*$.) One simple but sub-optimal approach is to form the spectral certificate $B(W) = \lambda_{\max}(W)$, which amounts to disregarding the constraint $x \in \{\pm 1\}^N$ by taking
$$\mathsf{M}(W) \leq \max_{\|x\|_2^2 = N} \frac{1}{N} x^\top W x = \lambda_{\max}(W) \approx 2.$$
Recently, Montanari asked² whether any certification algorithm could improve on this performance, a problem which, besides modest progress that we will review in the following sections, has since remained open to the best of our knowledge.
² The authors learned of this problem through private communications soon after [MS16] was published. More recently, it was also included in the problem list “AimPL: Phase transitions in randomized computational problems,” available online at http://aimpl.org/phaserandom.
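For very small $N$, the gap between $\mathsf{M}(W)$ and the spectral certificate can be inspected directly. The sketch below is our own illustration, and the helper names `sk_value_bruteforce` and `spectral_certificate` are hypothetical. It computes $\mathsf{M}(W)$ by exhaustive search over sign vectors and checks that $\lambda_{\max}(W)$ upper bounds it. The exponential running time is exactly why certification algorithms are of interest.

```python
import itertools

import numpy as np


def sk_value_bruteforce(W: np.ndarray) -> float:
    """M(W) = max over x in {±1}^N of x^T W x / N, by exhaustive search.

    Fixing x_1 = +1 loses nothing since the objective is invariant under x -> -x,
    so 2^(N-1) sign vectors are checked; only feasible for small N.
    """
    n = W.shape[0]
    best = -np.inf
    for tail in itertools.product((1.0, -1.0), repeat=n - 1):
        x = np.array((1.0,) + tail)
        best = max(best, float(x @ W @ x) / n)
    return best


def spectral_certificate(W: np.ndarray) -> float:
    """B(W) = lambda_max(W), an upper bound on M(W) for every symmetric W."""
    return float(np.linalg.eigvalsh(W)[-1])


rng = np.random.default_rng(1)
n = 12
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)  # one GOE(n) sample
true_value = sk_value_bruteforce(W)
certificate = spectral_certificate(W)
print(f"M(W) = {true_value:.3f} <= lambda_max(W) = {certificate:.3f}")
```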
Our contribution in this paper is to provide evidence that the spectral certificate is asymptotically optimal by showing that the degree 4 sum-of-squares relaxation, a much more sophisticated convex relaxation, achieves the same performance.
1.2 Conjectural hardness of certification
One step towards making a convincing prediction of whether better-than-spectral certification is possible in the SK model was taken in [BKW20], in which the authors participated. In this work, we first showed that, if efficient certification below 2 were possible for the SK model, then it would be possible to efficiently perform a certain hypothesis testing task in a variant of a spiked matrix model. Then, we provided evidence that this hypothesis testing task should be hard using a method based on the low-degree likelihood ratio. Roughly speaking, this technique takes low-degree polynomials as a proxy for all polynomial-time testing statistics and measures their performance in a convenient smoothed sense, which allows the optimal low-degree polynomial statistic to be identified and analyzed using an orthogonal polynomial decomposition.
This suggests the following conjecture, which would hold conditional on another, quite broad conjecture of [HS17, Hop18] that the low-degree likelihood ratio analysis is correct for a large class of hypothesis testing problems.
Conjecture 1.5**.**
For any $\epsilon > 0$, there does not exist a polynomial-time certification algorithm $B(W)$ for $\mathsf{M}(W)$ such that $B(W) \leq 2 - \epsilon$ with high probability when $W \sim \mathrm{GOE}(N)$.
Unfortunately, though the low-degree likelihood ratio method predicts many known computational thresholds in random problems correctly, at the moment it is only known to imply rather weak lower bounds against specific algorithms—either only lower bounds in expectation or a smoothed sense, or high-probability lower bounds under quite restrictive assumptions (see, e.g., the recent survey [KWB19] by the authors). In search of further evidence of hardness of certification, we therefore consider concrete algorithms and analyze their performance directly.
1.3 Basic notions of sum-of-squares relaxations
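The main algorithmic approach for certifying bounds on a problem like $\mathsf{M}(W)$ is to form convex relaxations that may be solved efficiently by standard convex optimization techniques. First, note that, defining the cut polytope
$$\mathscr{C}^N := \mathrm{conv}\left(\{xx^\top : x \in \{\pm 1\}^N\}\right),$$
we may rewrite $\mathsf{M}(W)$ as a linear optimization problem over this set of matrices,
$$\mathsf{M}(W) = \max_{X \in \mathscr{C}^N} \frac{1}{N} \langle W, X \rangle.$$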
Though $\mathscr{C}^N$ is a convex set, it is complex to describe [DL09], and in particular does not admit a polynomial-time separation oracle unless P = NP (by the same result of [Kar72] mentioned before). We thus pursue the idea of expanding $\mathscr{C}^N$ to a larger convex set that may be described more simply, and over which convex optimization is tractable.
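Specifically, we will study the performance of semidefinite programming (SDP) relaxations of $\mathsf{M}(W)$. Perhaps the simplest of these is based on the inclusion of sets
$$\mathscr{C}^N \subseteq \left\{X \in \mathbb{R}^{N \times N}_{\mathrm{sym}} : X \succeq 0,\ \mathrm{tr}(X) = N\right\}.$$
Replacing $\mathscr{C}^N$ with the right-hand side in the definition of $\mathsf{M}(W)$ just computes $\lambda_{\max}(W)$, expressing the spectral certificate as an SDP relaxation. Thus one consequence of Conjecture 1.5 is that this naive relaxation is asymptotically optimal among the wide variety of SDP relaxations that may be applied to $\mathsf{M}(W)$.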
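To see concretely why this relaxation returns $\lambda_{\max}(W)$, note that a linear functional over $\{X \succeq 0,\ \mathrm{tr}(X) = N\}$ is maximized at a rank-one point $X = Nuu^\top$, where $u$ is a unit top eigenvector. The NumPy sketch below (our illustration, on an arbitrary GOE sample) checks that this point attains $\lambda_{\max}(W)$, and that another feasible point, a random PSD matrix rescaled to trace $N$, does no better.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)  # one GOE(n) sample

evals, evecs = np.linalg.eigh(W)
u = evecs[:, -1]                    # unit-norm eigenvector for lambda_max(W)
X_opt = n * np.outer(u, u)          # PSD with trace n: feasible for the relaxation
value_opt = np.sum(W * X_opt) / n   # (1/N) <W, X>, with <.,.> the Frobenius inner product

# Any other feasible point does no better: e.g. a random PSD matrix rescaled to trace n.
A = rng.standard_normal((n, n))
X_rand = A @ A.T
X_rand *= n / np.trace(X_rand)
value_rand = np.sum(W * X_rand) / n

print(value_opt, evals[-1], value_rand)
```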
One broad and successful framework for SDP relaxation of optimization problems through which one might hope to find an improvement is the sum-of-squares (SOS) hierarchy of relaxations [Las01, Lau09, BPT12, BS14]. This generates a sequence of convex sets we will denote by $\mathscr{E}_d^N$, indexed by a parameter $d$, an even natural number called the degree, which satisfy the strict inclusions [Lau03, FSP16]
$$\mathscr{E}_2^N \supsetneq \mathscr{E}_4^N \supsetneq \cdots \supsetneq \mathscr{E}_{2\lceil N/2 \rceil}^N = \mathscr{C}^N.$$
Moreover, $\mathscr{E}_d^N$ is a projection of an affine slice of the positive semidefinite cone of $N^{O(d)} \times N^{O(d)}$ real symmetric matrices, whereby optimization over $\mathscr{E}_d^N$ may be written as an SDP.
Unfortunately, though in this SDP both the dimension of the decision variable and the number of constraints scale polynomially with $N$, and even if we assume (as will be the case in our application) that the SDP coefficients may be well-approximated with an encoding in polynomially many bits (i.e., are of size bounded by $2^{\mathrm{poly}(N)}$), it still need not be the case that the SDP can be solved to small additive error in polynomial time, as O’Donnell has pointed out [O’D17]. The key point is that one must further ensure that there exists an optimizer of the SDP whose entries are also of size bounded by $2^{\mathrm{poly}(N)}$. Fortunately, the work [RW17] studied this issue for sum-of-squares relaxations of many discrete problems, including our setting of unconstrained optimization over Boolean variables, and showed that this condition is in fact satisfied (see their Corollary 9). Thus optimization over $\mathscr{E}_d^N$ may indeed be performed in time $N^{O(d)}$ with, for instance, the ellipsoid algorithm.
To give a concrete formulation of this SDP, we now describe $\mathscr{E}_d^N$ in terms of the pseudomoment interpretation of SOS optimization (as derived from the general formulation of SOS in, e.g., [Lau03]). Below we adopt the useful notations of $\binom{[N]}{d}$ and $\binom{[N]}{\leq d}$ for the sets of subsets of $[N]$ having size $d$ and size at most $d$ (including the empty set), respectively. We also use the standard notation $S \triangle T$ for the symmetric difference of the sets $S$ and $T$.
Definition 1.6**.**
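$\mathscr{E}_d^N$ is the set of matrices $X \in \mathbb{R}^{N \times N}_{\mathrm{sym}}$ such that there exists $Y \in \mathbb{R}^{\binom{[N]}{\leq d/2} \times \binom{[N]}{\leq d/2}}_{\mathrm{sym}}$ having
$$Y_{\{i\}\{j\}} = X_{ij} \quad \text{for all } i, j \in [N],$$
and satisfying the following properties:
1. $Y \succeq 0$.
2. $Y_{ST}$ only depends on $S \triangle T$.
3. $Y_{ST} = 1$ whenever $S \triangle T = \emptyset$.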
In this case, we say $Y$ is a degree $d$ pseudomoment matrix for the constraint polynomials $x_i^2 - 1$, which extends $X$.³
³ Usually, a degree $d$ pseudomoment matrix must be indexed by all monomials in $x_1, \ldots, x_N$ of degree at most $d/2$; however, in our case, the constraint $x_i^2 = 1$ ensures that the pseudomoments of multilinear monomials fully determine the pseudomoment matrix.
For the sake of brevity, we will simply refer to such $Y$ as a degree $d$ pseudomoment matrix, since we only study optimization over $\{\pm 1\}^N$.
Definition 1.7**.**
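The degree $d$ sum-of-squares relaxation of $\mathsf{M}(W)$ is
$$\mathsf{SOS}_d(W) := \max_{X \in \mathscr{E}_d^N} \frac{1}{N} \langle W, X \rangle.$$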
In addition to Montanari’s general question on certifying bounds on $\mathsf{M}(W)$ mentioned before, Jain, Risteski, and Koehler have independently posed the more specific question of determining the asymptotic value of $\mathsf{SOS}_4(W)$ when $W \sim \mathrm{GOE}(N)$ in [JKR19].
One further simplification of the above setup will also be useful to take into account.
Definition 1.8**.**
We call $Y$ a reduced degree $d$ pseudomoment matrix if it satisfies all of the conditions of Definition 1.6, as well as the following additional condition:
$Y_{ST} = 0$ whenever $|S \triangle T|$ is odd.
As we show below, because of the invariance of both the constraints and the objective function under the map $x \mapsto -x$, $\mathscr{E}_d^N$ may be equivalently defined in terms of reduced pseudomoment matrices.
Proposition 1.9**.**
If there exists a degree $d$ pseudomoment matrix extending $X$, then there exists a reduced degree $d$ pseudomoment matrix extending $X$.
Proof.
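It suffices to show that if $Y$ is a degree $d$ pseudomoment matrix, then the matrix $Y'$ of the same shape with entries $Y'_{ST} = (-1)^{|S| + |T|} Y_{ST}$ is also a degree $d$ pseudomoment matrix, since then $\frac{1}{2}(Y + Y')$ is a reduced degree $d$ pseudomoment matrix (as $(-1)^{|S| + |T|} = (-1)^{|S \triangle T|}$) whose minor indexed by $\binom{[N]}{1}$ equals that of $Y$.
Clearly all of the linear constraints on $Y'$ are satisfied, so it suffices to check positive semidefiniteness. For this, if $v \in \mathbb{R}^{\binom{[N]}{\leq d/2}}$, let us define $v'$ by $v'_S = (-1)^{|S|} v_S$. Then, since $Y \succeq 0$, we have $v^\top Y' v = (v')^\top Y v' \geq 0$, completing the proof. ∎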
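The symmetrization in this proof can be checked numerically on a genuine moment matrix. In the sketch below (our illustration; the helpers `subsets_upto` and `moment_matrix` are hypothetical names), $Y_{ST} = \mathbb{E}[x^{S \triangle T}]$ for a distribution on five random points of $\{\pm 1\}^4$. Then $Y'$ is the moment matrix of $-x$, and the average $\frac{1}{2}(Y + Y')$ is positive semidefinite, vanishes where $|S \triangle T|$ is odd, and keeps the degree 2 block.

```python
import itertools

import numpy as np


def subsets_upto(n, k):
    """Subsets of {0, ..., n-1} of size <= k, ordered by size and then lexicographically."""
    return [frozenset(c) for r in range(k + 1) for c in itertools.combinations(range(n), r)]


def moment_matrix(points, weights, index_sets):
    """Y_{S,T} = E[x^S x^T] = E[prod_{i in S △ T} x_i] for a distribution on {±1}^n."""
    size = len(index_sets)
    Y = np.zeros((size, size))
    for a, S in enumerate(index_sets):
        for b, T in enumerate(index_sets):
            Y[a, b] = sum(w * np.prod([x[i] for i in S ^ T]) for x, w in zip(points, weights))
    return Y


n = 4
sets = subsets_upto(n, 2)                   # degree d = 4, so |S| <= d/2 = 2
rng = np.random.default_rng(3)
points = rng.choice([-1.0, 1.0], size=(5, n))
weights = np.full(5, 1 / 5)
Y = moment_matrix(points, weights, sets)    # a genuine moment matrix, hence a valid pseudomoment matrix

signs = np.array([(-1.0) ** len(S) for S in sets])
Y_flip = signs[:, None] * Y * signs[None, :]  # Y'_{ST} = (-1)^{|S|+|T|} Y_{ST}: moments of -x
Y_red = (Y + Y_flip) / 2                      # reduced pseudomoment matrix
```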
We will only study degree 2 and degree 4 pseudomoment matrices in detail, so we give more concrete versions of the above conditions for those cases.
Proposition 1.10**.**
$X \in \mathscr{E}_2^N$ if and only if $X \succeq 0$ and $X_{ii} = 1$ for all $i \in [N]$.
Proposition 1.11**.**
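Let $Y \in \mathbb{R}^{\binom{[N]}{\leq 2} \times \binom{[N]}{\leq 2}}_{\mathrm{sym}}$, with the row and column indices of $Y$ identified with $\binom{[N]}{\leq 2}$, ordered first by size and then in lexicographical order.⁴ Then, $Y$ is a reduced degree 4 pseudomoment matrix if and only if the following conditions hold:
1. $Y \succeq 0$.
2. $Y_{\emptyset\emptyset} = Y_{\{i\}\{i\}} = Y_{\{i,j\}\{i,j\}} = 1$ for all distinct $i, j \in [N]$.
3. $Y_{\emptyset\{i\}} = Y_{\{i\}\{i,j\}} = Y_{\{i\}\{j,k\}} = 0$ for all distinct $i, j, k \in [N]$.
4. $Y_{\emptyset\{i,j\}} = Y_{\{i\}\{j\}} = Y_{\{i,k\}\{j,k\}}$ for all distinct $i, j, k \in [N]$.
5. $Y_{\{i,j\}\{k,\ell\}}$ is invariant under permutations of the indices $i, j, k, \ell$, for all distinct $i, j, k, \ell \in [N]$.
⁴ For instance, $\binom{[3]}{\leq 2}$ is ordered as $\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}$.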
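The index ordering and the defining properties can be checked mechanically. The sketch below is our illustration, with hypothetical helper names. It enumerates $\binom{[N]}{\leq k}$ in the stated order and tests a candidate matrix directly against Definitions 1.6 and 1.8: positive semidefiniteness, $Y_{SS} = 1$, dependence only on $S \triangle T$, and vanishing when $|S \triangle T|$ is odd. As sanity checks, it uses the moment matrix of the uniform distribution on $\{\pm 1\}^4$, which is the identity and is reduced, and that of a point mass, which is not.

```python
import itertools

import numpy as np


def subsets_upto(n, k):
    """Subsets of [n] = {1, ..., n} of size <= k, ordered by size, then lexicographically."""
    return [frozenset(c) for r in range(k + 1)
            for c in itertools.combinations(range(1, n + 1), r)]


def is_reduced_pseudomoment(Y, sets, tol=1e-9):
    """Check Definitions 1.6 and 1.8 directly for a matrix indexed by `sets`."""
    if np.linalg.eigvalsh(Y)[0] < -tol:                      # Y must be PSD
        return False
    seen = {}
    for a, S in enumerate(sets):
        for b, T in enumerate(sets):
            key = S ^ T                                       # symmetric difference S △ T
            if not key and abs(Y[a, b] - 1) > tol:            # Y_SS = 1
                return False
            if len(key) % 2 == 1 and abs(Y[a, b]) > tol:      # odd |S △ T| must vanish
                return False
            if abs(Y[a, b] - seen.setdefault(key, Y[a, b])) > tol:  # depends only on S △ T
                return False
    return True


print([sorted(S) for S in subsets_upto(3, 2)])  # [[], [1], [2], [3], [1, 2], [1, 3], [2, 3]]

sets4 = subsets_upto(4, 2)
uniform = np.eye(len(sets4))                     # E[x^(S △ T)] = 1{S = T} for uniform x
point_mass = np.ones((len(sets4), len(sets4)))   # moments of the point mass at x = (1, 1, 1, 1)
print(is_reduced_pseudomoment(uniform, sets4), is_reduced_pseudomoment(point_mass, sets4))  # True False
```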
1.4 Prior work on sum-of-squares lower bounds
The Montanari-Sen degree 2 lower bound
The only previous result on SOS relaxations of $\mathsf{M}(W)$ under $W \sim \mathrm{GOE}(N)$ that we are aware of is the following result of Montanari and Sen [MS16], which establishes hardness of certification for degree 2 SOS.
Theorem 1.12** (Theorem 5(a) of [MS16]).**
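Let $W \sim \mathrm{GOE}(N)$. Then, for any $\eta > 0$,
$$\lim_{N \to \infty} \mathbb{P}\left[\mathsf{SOS}_2(W) \geq 2 - \eta\right] = 1.$$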
The mechanics of this result will be crucial for proving ours, so let us also review how to construct feasible points achieving this bound. First, fix a parameter $\delta \in (0, 1)$, and set $M := \delta N$.⁵ Then, given $W$, let $\mathcal{V}$ be the subspace spanned by the eigenvectors of $W$ having the $M$ largest eigenvalues, and let $P$ be the orthogonal projector to $\mathcal{V}$. Let $D$ be the diagonal matrix with entries $D_{ii} = P_{ii}$, and define
$$X^{(\delta)} := D^{-1/2} P D^{-1/2}.$$
(Note that $D_{ii} = 0$ if and only if $e_i \perp \mathcal{V}$, which almost surely does not occur.) Then, $X^{(\delta)} \in \mathscr{E}_2^N$ for any $\delta \in (0, 1)$ almost surely, and for each $\eta > 0$ there exists $\delta > 0$ such that
$$\lim_{N \to \infty} \mathbb{P}\left[\frac{1}{N} \langle W, X^{(\delta)} \rangle \geq 2 - \eta\right] = 1.$$
We call $X^{(\delta)}$ the Montanari-Sen witness. This construction will be the basis of ours; indeed, we will show that only a small correction is necessary to make this witness feasible for the degree 4 SOS relaxation as well.
⁵ Following [MS16], we assume for the sake of simplicity that $\delta N$ is an integer; recovering the same results for $\delta N \notin \mathbb{Z}$ is tedious but straightforward.
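The witness is straightforward to build numerically. The NumPy sketch below is our own illustration: the function name `montanari_sen_witness` is hypothetical, and it rounds $\delta N$ to an integer. It forms $X^{(\delta)} = D^{-1/2} P D^{-1/2}$ from the top $M$ eigenvectors of a GOE sample and checks the degree 2 constraints: unit diagonal, positive semidefiniteness, and rank $M$. It also reports the objective $\frac{1}{N}\langle W, X^{(\delta)} \rangle$, which is always at most $\lambda_{\max}(W)$ and approaches 2 only as $\delta \to 0$.

```python
import numpy as np


def montanari_sen_witness(W: np.ndarray, delta: float) -> np.ndarray:
    """X = D^{-1/2} P D^{-1/2}: P projects onto the span of the top M = delta*N eigenvectors
    of W and D = diag(P). Rounds delta*N to an integer; assumes every P_ii > 0 (true a.s.)."""
    n = W.shape[0]
    m = int(round(delta * n))
    _, evecs = np.linalg.eigh(W)          # columns sorted by ascending eigenvalue
    V = evecs[:, -m:].T                   # M x N matrix whose rows are the top-M eigenvectors
    P = V.T @ V                           # rank-M orthogonal projector
    scale = 1.0 / np.sqrt(np.diag(P))     # diagonal of D^{-1/2}
    return scale[:, None] * P * scale[None, :]


rng = np.random.default_rng(4)
n, delta = 1000, 0.05
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)            # one GOE(n) sample
X = montanari_sen_witness(W, delta)
objective = np.sum(W * X) / n             # (1/N) <W, X>
lam_max = np.linalg.eigvalsh(W)[-1]
print(f"(1/N)<W, X> = {objective:.3f}, lambda_max(W) = {lam_max:.3f}")
```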
Simultaneous work on degree 4 lower bounds
After preparing an earlier version of this manuscript, we learned of the concurrent work [MRX19], the fruit of a parallel research effort in which the same result as ours (Corollary 2.2, stated in the following section) is proved. Their construction uses the pseudocalibration sum-of-squares heuristic introduced in [BHK+19], while our construction is based on more geometric and problem-specific considerations of constraints on the pseudomoment extension matrices described above. We remark that one important advantage of their work is that its analysis is general enough to apply immediately to problems besides the SK model, such as MaxCut on sparse random graphs.
On the other hand, the output of pseudocalibration in this situation appears to be quite complicated: while the degree 4 pseudomoments constructed in [MRX19], like ours, are low-degree polynomials in the entries of , they are not stated explicitly, and are sums of many more “graphical” terms, simple polynomials in the entries of described by graphs labelled by matrix indices. For instance, while the formula we will derive for our degree 4 pseudomoments includes just two types of such polynomials (see our Section 4), the analysis of the pseudomoments of [MRX19] involves at least 20 types (see their Section 3). Thus, though it is possible that the degree 4 pseudomoment matrices constructed in our work are close to those constructed in [MRX19] (in spectral norm, for example), our approach gives a more explicit and condensed description of a suitable pseudomoment construction.
Furthermore, while both this paper and [MRX19] use the idea of an “approximate Cholesky decomposition” of the pseudomoment matrix (also present in earlier works such as [BHK+19]) for the proofs of positive semidefiniteness, we give in Sections 4.2 and 5 an intuitive probabilistic meaning to the Cholesky factors arising in this argument. To the best of our knowledge, this is a novel interpretation, which we believe may shed new light on the structure of pseudomoment matrices and simplify some of the ad hoc technicalities arising in the proofs of their positive semidefiniteness.
2 Results
Our main result establishes hardness of certification for degree 4 SOS, by showing that a minor variant of the Montanari-Sen witness admits a degree 4 extension.
Theorem 2.1**.**
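Let $\delta \in (0, 1)$, and let $X^{(\delta)}$ be as in Theorem 1.12. Define
$$\widetilde{X}^{(\delta, \epsilon)} := (1 - \epsilon) X^{(\delta)} + \epsilon I_N.$$
Then, for any $\epsilon \in (0, 1)$,
$$\lim_{N \to \infty} \mathbb{P}\left[\widetilde{X}^{(\delta, \epsilon)} \in \mathscr{E}_4^N\right] = 1.$$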
The theorem says that an arbitrarily small adjustment in the direction of the identity matrix (the barycenter of the vertices of the cut polytope) suffices to make the degree 2 Montanari-Sen primal witness admit a degree 4 extension with high probability.⁶ In Conjecture 5.2, we will also propose that the same membership holds with high probability for any SOS relaxation with constant degree as $N \to \infty$.
⁶ The specific choice of the identity matrix is probably not essential, but is convenient because, as we will see, the degree 4 extension of the identity matrix has an especially simple spectral structure.
Since the degree 2 part of our construction is a simple modification of the Montanari-Sen witness, it is straightforward to apply Theorem 1.12 and obtain a lower bound for the degree 4 SOS objective.
Corollary 2.2**.**
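Let $W \sim \mathrm{GOE}(N)$. Then, for any $\eta > 0$,
$$\lim_{N \to \infty} \mathbb{P}\left[\mathsf{SOS}_4(W) \geq 2 - \eta\right] = 1.$$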
Proof.
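By Theorem 1.12, take $\delta \in (0, 1)$ such that $\frac{1}{N}\langle W, X^{(\delta)} \rangle \geq 2 - \frac{\eta}{3}$ with high probability. Let $\epsilon > 0$ be small enough that $(1 - \epsilon)(2 - \frac{\eta}{3}) \geq 2 - \frac{2\eta}{3}$. By Theorem 2.1, with high probability,
$$\mathsf{SOS}_4(W) \geq \frac{1}{N}\langle W, \widetilde{X}^{(\delta, \epsilon)} \rangle = (1 - \epsilon)\frac{1}{N}\langle W, X^{(\delta)} \rangle + \frac{\epsilon}{N}\mathrm{tr}(W) \geq 2 - \frac{2\eta}{3} + \frac{\epsilon}{N}\mathrm{tr}(W).$$
The random variable $\mathrm{tr}(W)$ has law $\mathcal{N}(0, 2)$, so with high probability the last term is smaller than $\frac{\eta}{3}$ in absolute value, and the result follows. ∎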
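The two facts used in this proof are easy to check numerically. The sketch below is our illustration, with an arbitrary choice of $M = 20$ and $\epsilon = 0.05$. It verifies the exact decomposition $\frac{1}{N}\langle W, (1-\epsilon)X + \epsilon I_N \rangle = (1-\epsilon)\frac{1}{N}\langle W, X \rangle + \frac{\epsilon}{N}\mathrm{tr}(W)$, which holds for any $X$. It also estimates the law of $\mathrm{tr}(W)$, a sum of $N$ independent $\mathcal{N}(0, 2/N)$ variables and hence $\mathcal{N}(0, 2)$.

```python
import numpy as np

rng = np.random.default_rng(5)


def sample_goe(n, rng):
    """W ~ GOE(n) with off-diagonal variance 1/n and diagonal variance 2/n."""
    g = rng.standard_normal((n, n))
    return (g + g.T) / np.sqrt(2 * n)


n, eps = 200, 0.05
W = sample_goe(n, rng)
_, evecs = np.linalg.eigh(W)
P = evecs[:, -20:] @ evecs[:, -20:].T     # projector onto the top 20 eigenvectors
d = 1.0 / np.sqrt(np.diag(P))
X = d[:, None] * P * d[None, :]           # Montanari-Sen witness with M = 20
X_nudged = (1 - eps) * X + eps * np.eye(n)
lhs = np.sum(W * X_nudged) / n
rhs = (1 - eps) * np.sum(W * X) / n + eps * np.trace(W) / n

# tr(W) is a sum of N independent N(0, 2/N) variables, so it has law N(0, 2).
traces = np.array([np.trace(sample_goe(50, rng)) for _ in range(4000)])
print(lhs, rhs, traces.var())             # lhs equals rhs up to rounding; variance close to 2
```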
Organization
The remainder of the paper gives the proof of Theorem 2.1, and conjectures an extension of the same proof technique to higher degree SOS relaxations. In Section 3 we review some preliminary facts and notations. In Section 4 we give the motivation and precise statement of our construction of a matrix that is with high probability a degree 4 pseudomoment matrix extending . In Section 5, we present a natural conjecture for how this construction can be extended to higher degrees. Finally, in Sections 6, 7, and 8, we prove that our construction from Section 4 indeed furnishes a valid degree 4 pseudomoment matrix with high probability.
3 Preliminaries
3.1 Glossary: variables, parameters, and constants
For reference, we summarize various symbols that will appear throughout. The following are the general scalar parameters involved.
- $N$ is a parameter indicating the size of $W \in \mathbb{R}^{N \times N}_{\mathrm{sym}}$ in the problem statement.
- $\delta \in (0, 1)$ is a fixed parameter not depending on $N$.
- $M := \delta N$, which we assume for the sake of simplicity is an integer. This gives the rank of the Montanari-Sen witness that we extend (before modifications that make it have full rank).
- $\epsilon \in (0, 1)$ is another fixed parameter not depending on $N$ or $\delta$.
- $\gamma > 0$ is a constant appearing in concentration inequalities, giving polynomial rates of decay of probabilities of the form $N^{-\gamma}$. All of the concentration inequalities where $\gamma$ appears hold with any choice of $\gamma > 0$, but the other constants appearing in those results depend on both $\delta$ and $\gamma$. Thus a typical inequality will take the form
[TABLE]
One may think of any fixed concrete choice of $\gamma$ throughout.
The following are vectors and matrices associated to the Montanari-Sen construction applied to a specific instance $W$.
- $V \in \mathbb{R}^{M \times N}$ is the matrix having the top $M$ (unit norm) eigenvectors of $W$ as its rows.
- $v_1, \ldots, v_N \in \mathbb{R}^M$ are the columns of $V$ (not eigenvectors of $W$).
- $P := V^\top V$ is the orthogonal projector to the top $M$-dimensional eigenspace of $W$, and also the Gram matrix of the $v_i$.
- $\widehat{v}_i := v_i / \|v_i\|_2$.
- $D$ is a diagonal matrix with $D_{ii} = \|v_i\|_2^2 = P_{ii}$.
- $\widehat{V} := V D^{-1/2}$ is the matrix having the $\widehat{v}_i$ as its columns.
- $X = X^{(\delta)} := D^{-1/2} P D^{-1/2}$ is the Montanari-Sen witness, and also the Gram matrix of the $\widehat{v}_i$. For the sake of brevity, we will often drop the superscript, as $\delta$ will be a constant carried throughout.
3.2 Other notation
Linear algebra
The identity matrix in dimension $N$ is denoted $I_N$, and the all-ones vector is denoted $\mathbf{1}_N$. The Frobenius or entrywise inner product of matrices $A, B$ having the same shape is denoted $\langle A, B \rangle := \sum_{i,j} A_{ij} B_{ij}$. The Hadamard or entrywise product of matrices having the same shape is denoted $A \circ B$ and has entries $(A \circ B)_{ij} = A_{ij} B_{ij}$. Hadamard powers of a matrix are denoted $A^{\circ k}$. The matrix operator norm is denoted $\|A\|$. The vectorized supremum norm is denoted $\|A\|_\infty := \max_{i,j} |A_{ij}|$. The matrix Frobenius norm is denoted $\|A\|_F$. The group of $N \times N$ orthogonal matrices is denoted $\mathcal{O}(N)$. The Stiefel manifold of $M \times N$ matrices with orthonormal rows is denoted $\mathrm{Stief}(M, N)$.
Vectorization
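We describe several ways of vectorizing symmetric matrices. For $X \in \mathbb{R}^{N \times N}_{\mathrm{sym}}$, we let $\mathrm{diag}(X) \in \mathbb{R}^N$ and $\mathrm{offdiag}(X) \in \mathbb{R}^{\binom{N}{2}}$ be the vectorized diagonal and strict upper triangle of $X$, respectively, with index sets ordered lexicographically. These two vectors determine a symmetric matrix completely. It will also be useful to define an isometry between $\mathbb{R}^{N \times N}_{\mathrm{sym}}$ endowed with the Frobenius inner product and $\mathbb{R}^{\binom{N+1}{2}}$ endowed with the ordinary Euclidean inner product. This is given by
$$\mathrm{isvec}(X) := \begin{pmatrix} \mathrm{diag}(X) \\ \sqrt{2}\, \mathrm{offdiag}(X) \end{pmatrix},$$
which indeed satisfies $\langle \mathrm{isvec}(X), \mathrm{isvec}(X') \rangle = \langle X, X' \rangle$. We denote $\mathbf{1}_{\mathrm{diag}} := \mathrm{isvec}(I_N)$ when the dimension is clear from context, since this is the indicator vector of the diagonal matrix indices.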
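A short sketch of this isometry follows. It is our illustration and assumes the ordering "diagonal first, then the strict upper triangle in lexicographic order". The $\sqrt{2}$ factor accounts for each off-diagonal entry appearing twice in the Frobenius inner product. The code checks the isometry on random symmetric matrices and shows that $\mathrm{isvec}(I_N)$ is the indicator of the diagonal indices.

```python
import numpy as np


def isvec(X: np.ndarray) -> np.ndarray:
    """Map a symmetric N x N matrix to R^{N(N+1)/2}: its diagonal, followed by sqrt(2)
    times its strict upper triangle in lexicographic (row-major) order of (i, j), i < j."""
    n = X.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    return np.concatenate([np.diag(X), np.sqrt(2.0) * X[rows, cols]])


rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))
A = A + A.T
B = rng.standard_normal((5, 5))
B = B + B.T
print(np.dot(isvec(A), isvec(B)), np.sum(A * B))  # equal up to rounding: isvec is an isometry
print(isvec(np.eye(3)))                           # [1. 1. 1. 0. 0. 0.]
```

One consequence worth noting: for a GOE matrix, where diagonal entries have twice the variance of off-diagonal ones, $\mathrm{isvec}$ produces an isotropic gaussian vector.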
Probability
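We denote the relation of two random variables $A$ and $B$ having the same law by $A \overset{(d)}{=} B$. The Haar measure on the orthogonal group or Stiefel manifold is denoted $\mathrm{Haar}(\mathcal{O}(N))$ or $\mathrm{Haar}(\mathrm{Stief}(M, N))$, respectively, and refers to Haar measure with respect to the action of $\mathcal{O}(N)$ by right multiplication in the latter case. We write $\mathrm{Unif}(\mathbb{S}^{N-1})$ for the Haar measure on vectors of the unit sphere under the action of $\mathcal{O}(N)$ by left multiplication, equivalent to $\mathrm{Haar}(\mathrm{Stief}(1, N))$ (up to transposition).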
3.3 Basic properties of the Montanari-Sen witness
It will be useful to establish some preliminary bounds on and distributional properties of the Montanari-Sen construction described in Theorem 1.12.
First, we use an elegant geometric argument mentioned in [MS16] (but which seems to be well-known folklore) to obtain bounds on the entries of $P$.
Proposition 3.1**.**
For all ,
[TABLE]
Proof.
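For the diagonal entries, $P_{ii} \overset{(d)}{=} \|\Pi u\|_2^2$ for $u \sim \mathrm{Unif}(\mathbb{S}^{N-1})$ and $\Pi$ the orthogonal projector onto $\mathrm{span}(e_1, \ldots, e_M)$. Letting $g \sim \mathcal{N}(0, I_N)$, we have $u \overset{(d)}{=} g / \|g\|_2$. Writing $g_{\leq M}$ and $g_{> M}$ for the first $M$ and last $N - M$ coordinates of $g$ respectively, we then have
$$P_{ii} \overset{(d)}{=} \frac{\|g_{\leq M}\|_2^2}{\|g_{\leq M}\|_2^2 + \|g_{> M}\|_2^2} = \frac{A}{A + B},$$
where $A$ and $B$ are distributed as independent $\chi^2$ random variables with $M$ and $N - M$ degrees of freedom respectively. The result then follows from the concentration inequalities of [LM00] for $\chi^2$ random variables.
For the off-diagonal entries, likewise $P_{ij} \overset{(d)}{=} \langle \Pi u_1, \Pi u_2 \rangle$, where now $(u_1, u_2)$ are a Haar-distributed two-dimensional orthonormal frame. If we draw $g, h \sim \mathcal{N}(0, I_N)$ independently, then by performing two steps of Gram-Schmidt orthonormalization,
$$(u_1, u_2) \overset{(d)}{=} \left(\frac{g}{\|g\|_2},\ \frac{h - \langle h, g \rangle g / \|g\|_2^2}{\left\|h - \langle h, g \rangle g / \|g\|_2^2\right\|_2}\right).$$
Thus computing as before we find
$$P_{ij} \overset{(d)}{=} \frac{\langle \Pi g, \Pi h \rangle / \|g\|_2 - \langle g, h \rangle \|\Pi g\|_2^2 / \|g\|_2^3}{\left\|h - \langle h, g \rangle g / \|g\|_2^2\right\|_2}.$$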
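The distributional identity for $P_{ii}$ can be checked by simulation. The sketch below is our illustration with arbitrary sizes $N = 200$ and $M = 40$. It compares the ratio $A/(A+B)$ of independent $\chi^2$ variables, which is a $\mathrm{Beta}(M/2, (N-M)/2)$ variable with mean $\delta$, against the diagonal of the top-$M$ spectral projector of one GOE sample. The empirical mean of $\mathrm{diag}(P)$ equals $\delta$ exactly, since $\mathrm{tr}(P) = M$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, trials = 200, 40, 20000
delta = m / n

# Route 1: P_ii = A / (A + B) with independent chi-squared A (m dof) and B (n - m dof).
a = rng.chisquare(m, size=trials)
b = rng.chisquare(n - m, size=trials)
ratio = a / (a + b)

# Route 2: diagonal of the projector onto the top-m eigenspace of one GOE(n) sample.
g = rng.standard_normal((n, n))
W = (g + g.T) / np.sqrt(2 * n)
_, evecs = np.linalg.eigh(W)
P = evecs[:, -m:] @ evecs[:, -m:].T
diag_P = np.diag(P)

print(ratio.mean(), ratio.std())    # mean ~ delta, std ~ sqrt(2 delta (1 - delta) / n)
print(diag_P.mean(), diag_P.std())  # mean is exactly delta since tr(P) = m
```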
To control these quantities, we first bound
[TABLE]
In this expression, every norm may be controlled by the concentration inequalities of [LM00] as before, and every inner product may be controlled by observing that, when $a$ and $b$ are independent standard gaussian vectors, then $\langle a, b \rangle \overset{(d)}{=} \|a\|_2 \, z$ for $z \sim \mathcal{N}(0, 1)$ independent of $a$, thus reducing the task of controlling the inner product to controlling an $\ell^2$ norm and a single standard gaussian. Applying these bounds then gives the result. ∎
The following related results for follow directly.
Corollary 3.2**.**
For all ,
[TABLE]
Proof.
We have $X_{ij} = P_{ij} / \sqrt{P_{ii} P_{jj}}$, so the result follows from combining Proposition 3.1 applied to several entries. ∎
Corollary 3.3**.**
For all ,
[TABLE]
Proof.
, so , and the result then follows by Proposition 3.1. ∎
3.4 Moments of
The following gives the low-degree moments of Haar-distributed orthogonal matrices.
Proposition 3.4** (Lemma 9 of [CM08]).**
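Let $Q \sim \mathrm{Haar}(\mathcal{O}(N))$. The moment $\mathbb{E}\left[\prod_{a=1}^{k} Q_{i_a j_a}\right]$ is zero if any index occurs an odd number of times among either the $i_a$ or the $j_a$. The non-zero degree 2 and 4 moments are given by, for distinct $i, i'$ and distinct $j, j'$,
$$\begin{aligned}
\mathbb{E}[Q_{ij}^2] &= \frac{1}{N}, \\
\mathbb{E}[Q_{ij}^4] &= \frac{3}{N(N+2)}, \\
\mathbb{E}[Q_{ij}^2 Q_{ij'}^2] = \mathbb{E}[Q_{ij}^2 Q_{i'j}^2] &= \frac{1}{N(N+2)}, \\
\mathbb{E}[Q_{ij}^2 Q_{i'j'}^2] &= \frac{N+1}{(N-1)N(N+2)}, \\
\mathbb{E}[Q_{ij} Q_{ij'} Q_{i'j} Q_{i'j'}] &= -\frac{1}{(N-1)N(N+2)}.
\end{aligned}$$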
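These moment formulas can be sanity-checked by Monte Carlo. The sketch below is our illustration, not part of [CM08]. It samples Haar orthogonal matrices through the standard QR construction: a gaussian matrix is factored, and each column of $Q$ is multiplied by the sign of the matching diagonal entry of $R$, which makes the law exactly Haar. It then compares empirical averages for $N = 3$ with the formulas above. The helper `haar_orthogonal` is a hypothetical name.

```python
import numpy as np


def haar_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample Q ~ Haar(O(n)) via QR of a gaussian matrix, fixing the signs of diag(R)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # multiplies column j of q by sign(R_jj)


n, trials = 3, 30000
rng = np.random.default_rng(8)
Q = np.array([haar_orthogonal(n, rng) for _ in range(trials)])
q11, q12, q21, q22 = Q[:, 0, 0], Q[:, 0, 1], Q[:, 1, 0], Q[:, 1, 1]

empirical = {
    "Q11^2": np.mean(q11**2),
    "Q11^4": np.mean(q11**4),
    "Q11^2 Q12^2": np.mean(q11**2 * q12**2),
    "Q11^2 Q22^2": np.mean(q11**2 * q22**2),
    "Q11 Q12 Q21 Q22": np.mean(q11 * q12 * q21 * q22),
}
predicted = {
    "Q11^2": 1 / n,
    "Q11^4": 3 / (n * (n + 2)),
    "Q11^2 Q12^2": 1 / (n * (n + 2)),
    "Q11^2 Q22^2": (n + 1) / ((n - 1) * n * (n + 2)),
    "Q11 Q12 Q21 Q22": -1 / ((n - 1) * n * (n + 2)),
}
for key in predicted:
    print(f"E[{key}]: empirical {empirical[key]:+.4f}, formula {predicted[key]:+.4f}")
```

The row and column constraints of an orthogonal matrix give a quick consistency check. For example, summing $\mathbb{E}[Q_{11}^2 Q_{1j}^2]$ over $j$ must return $\mathbb{E}[Q_{11}^2] = 1/N$, and indeed $\frac{3}{N(N+2)} + \frac{N-1}{N(N+2)} = \frac{1}{N}$.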
4 Degree 4 pseudomoment construction
In this section we outline the main idea in the proof of Theorem 2.1, detailing the construction of a suitable degree 4 pseudomoment matrix and reducing Theorem 2.1 to verifying the positive semidefiniteness of this matrix.
First, we give two heuristic lines of reasoning supporting the construction we propose. The first derives a formula for the degree 4 pseudomoment extension of a highly structured collection of degree 2 pseudomoments and assumes that this formula may be transferred verbatim to the random case. The second recovers the same formula after applying some simplifying heuristics to a probabilistic argument.
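As a preliminary, let us remark that, per Proposition 1.9, we may restrict our attention to reduced degree 4 pseudomoment matrices. If $Y$ is a reduced degree 4 pseudomoment matrix extending $X$, and we divide $Y$ into blocks indexed by $\binom{[N]}{0}$, $\binom{[N]}{1}$, and $\binom{[N]}{2}$, where $Y^{[a,b]}$ is the block with rows indexed by $\binom{[N]}{a}$ and columns indexed by $\binom{[N]}{b}$, then only the $Y^{[2,2]}$ block is not determined by the properties of being reduced and extending $X$:
$$Y = \begin{pmatrix} 1 & 0 & \xi^\top \\ 0 & X & 0 \\ \xi & 0 & Y^{[2,2]} \end{pmatrix}, \qquad \xi := (X_{ij})_{\{i,j\} \in \binom{[N]}{2}}.$$
Thus in the sequel we will be justified in saying that an extension of $X$ is specified just by a particular choice of $Y^{[2,2]}$.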
4.1 Heuristic 1: evidence from equiangular tight frames
In this section, we review a result from the works [BK18, BK19] by the authors, which derived an explicit description of degree 4 extensions of degree 2 pseudomoment matrices for some very structured special cases. These cases resemble the Montanari-Sen witness in that their degree 2 pseudomoment matrices are constant multiples of orthogonal projectors.
The extra structure that allows this description of the degree 4 pseudomoments to be derived in closed form is described by the following notions from finite frame theory.
Definition 4.1**.**
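A collection of vectors $v_1, \ldots, v_N \in \mathbb{R}^r$ forms a unit norm tight frame (UNTF) if the following conditions hold.
1. (Unit Norm) $\|v_i\|_2 = 1$ for all $i \in [N]$.
2. (Tight Frame) $\sum_{i=1}^N v_i v_i^\top = \frac{N}{r} I_r$.
They moreover form an equiangular tight frame (ETF) if the following additional condition holds.
3. (Equiangular) There is $\alpha \geq 0$ such that $|\langle v_i, v_j \rangle| = \alpha$ whenever $i \neq j$.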
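A small numerical check of these definitions may help fix ideas. The sketch below is our illustration, and `frame_properties` is a hypothetical helper. It tests the three conditions for the columns of an $r \times N$ array. The three unit vectors at $120°$ apart in $\mathbb{R}^2$ (the "Mercedes-Benz" frame) form an ETF with $\alpha = 1/2$. The four unit vectors at angles $0°, 45°, 90°, 135°$ form a UNTF that is not equiangular.

```python
import numpy as np


def frame_properties(vectors: np.ndarray, tol: float = 1e-9):
    """Check Definition 4.1 for the columns v_1, ..., v_N of an r x N array.

    Returns (is_untf, is_etf)."""
    r, n = vectors.shape
    gram = vectors.T @ vectors
    unit_norm = np.allclose(np.diag(gram), 1.0, atol=tol)
    tight = np.allclose(vectors @ vectors.T, (n / r) * np.eye(r), atol=tol)
    off = np.abs(gram[~np.eye(n, dtype=bool)])   # |<v_i, v_j>| for i != j
    equiangular = off.max() - off.min() < tol
    is_untf = bool(unit_norm and tight)
    return is_untf, bool(is_untf and equiangular)


def unit_vectors_at(angles_deg):
    """Columns are the unit vectors (cos t, sin t) in R^2 at the given angles."""
    t = np.deg2rad(np.asarray(angles_deg, dtype=float))
    return np.vstack([np.cos(t), np.sin(t)])


mercedes = unit_vectors_at([90, 210, 330])     # 3 vectors in R^2, pairwise |<v_i, v_j>| = 1/2
four_dirs = unit_vectors_at([0, 45, 90, 135])  # 4 vectors in R^2: tight, but angles differ
print(frame_properties(mercedes), frame_properties(four_dirs))  # (True, True) (True, False)
```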
ETFs are rare and combinatorially structured objects [STDHJ07, CRT08, FM15], which do not seem a priori related to the SK problem and the Montanari-Sen witness. However, it turns out that studying the degree 4 extensions of ETFs gives useful insight into the correct construction of pseudomoments even in the random case.
In general, we showed in [BK18] that degree 4 extensions are related to the following notion from convex geometry.
Definition 4.2**.**
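Let $K \subseteq \mathbb{R}^n$ be a closed convex set. For $x \in K$, the perturbation of $x$ in $K$ is the linear subspace
$$\mathrm{pert}_K(x) := \left\{y \in \mathbb{R}^n : x \pm t y \in K \text{ for some } t > 0\right\}.$$
Equivalently, if $\mathcal{A}$ is the affine hull of the minimal face of $K$ that $x$ belongs to, then $\mathrm{pert}_K(x) = \mathcal{A} - x$.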
The relevant case for degree 4 extensions is $K = \mathscr{E}_2^N$. In this case, the following theorem characterizes the perturbation subspace.
Proposition 4.3** (Theorem 1(a) of [LT94]).**
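Let $X \in \mathscr{E}_2^N$ have $\mathrm{rank}(X) = r$ and $X = V^\top V$ for $V \in \mathbb{R}^{r \times N}$ with unit vector columns $v_1, \ldots, v_N$. Then,
$$\mathrm{pert}_{\mathscr{E}_2^N}(X) = \left\{V^\top A V : A \in \mathbb{R}^{r \times r}_{\mathrm{sym}},\ v_i^\top A v_i = 0 \text{ for all } i \in [N]\right\}.$$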
For ETFs, a degree 4 extension, when any exists, is given explicitly in both spectral and entrywise terms as follows.
Theorem 4.4** (Theorem 2.19 of [BK18]).**
Let be the Gram matrix of an ETF of vectors in . Then, if and only if . In this case, a degree 4 extension is given by
[TABLE]
and the block structure given in (31). The entries of this matrix are given by
[TABLE]
(The spectral description will be of interest later, to draw a connection to the second heuristic presented in the following section.)
The Montanari-Sen witness is close to a constant multiple of a projection matrix, since by Proposition 3.1 all entries of the normalizing diagonal matrix $D$ are close to $\delta$, so $X^{(\delta)} \approx \delta^{-1} P$ is a “near-UNTF Gram matrix.” Also, the off-diagonal entries of $X^{(\delta)}$ are inner products of the random unit vectors $v_i / \|v_i\|_2$, which it is reasonable to think are weakly dependent, whereby $X^{(\delta)}$ should moreover behave like “an ETF in expectation.”
Thus to guess a degree 4 extension for the Montanari-Sen witness $X = X^{(\delta)}$, we may be justified in simply trying to apply the combinatorial ETF construction directly. Since we take $r = M = \delta N$ with $N \to \infty$, we may also simplify the leading coefficients to their asymptotic values, which gives the prediction, for distinct $i, j, k, \ell \in [N]$,
$$Y_{\{i,j\}\{k,\ell\}} = X_{ij} X_{k\ell} + X_{ik} X_{j\ell} + X_{i\ell} X_{jk} - 2 \sum_{m=1}^N X_{im} X_{jm} X_{km} X_{\ell m}.$$
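For experimentation, the predicted entries are cheap to evaluate. The sketch below is our illustration. It assumes the prediction has the form $X_{ij}X_{k\ell} + X_{ik}X_{j\ell} + X_{i\ell}X_{jk} - 2\sum_m X_{im}X_{jm}X_{km}X_{\ell m}$ for distinct $i, j, k, \ell$, and the function name `predicted_degree4` is hypothetical. It checks two necessary properties. First, the value is invariant under permuting $i, j, k, \ell$, as condition 5 of Proposition 1.11 requires. Second, for $X = I_N$ it returns 0, matching $\mathbb{E}[x_i x_j x_k x_\ell]$ for uniform $x \in \{\pm 1\}^N$. These checks do not establish positive semidefiniteness, which is the substance of the later sections.

```python
import itertools

import numpy as np


def predicted_degree4(X: np.ndarray, i: int, j: int, k: int, ell: int) -> float:
    """Heuristic Y_{{i,j},{k,ell}} for distinct i, j, k, ell (assumed form of the prediction)."""
    pairings = X[i, j] * X[k, ell] + X[i, k] * X[j, ell] + X[i, ell] * X[j, k]
    star = np.sum(X[i] * X[j] * X[k] * X[ell])  # sum over m of X_im X_jm X_km X_ell,m
    return float(pairings - 2.0 * star)


rng = np.random.default_rng(9)
n = 8
A = rng.standard_normal((n, n)) / np.sqrt(n)
X = (A + A.T) / 2
np.fill_diagonal(X, 1.0)                       # symmetric with unit diagonal (not necessarily PSD)
values = {predicted_degree4(X, *p) for p in itertools.permutations((0, 1, 2, 3))}
print(max(values) - min(values))               # ~0: the formula is symmetric in i, j, k, ell
print(predicted_degree4(np.eye(n), 0, 1, 2, 3))  # 0.0, matching E[x_0 x_1 x_2 x_3] for uniform x
```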
4.2 Heuristic 2: conditional covariance of gaussian matrices
We next show another, perhaps more principled argument through which we arrive at the same prediction of degree 4 pseudomoments. Let us suppose that $X$ is exactly the Gram matrix of a UNTF, i.e. $X_{ii} = 1$ for all $i \in [N]$ and $X^2 = \frac{N}{M} X$. As mentioned above, this is approximately the case for the Montanari-Sen witness, but our heuristic derivation is much simplified if the correction by the diagonal matrix $D$ to form the actual Montanari-Sen witness may be reduced to a constant scaling.
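For the computations to come it is more convenient to view $Y$ as being embedded in a larger pseudomoment matrix, indexed by pairs $(i, j) \in [N]^2$. We denote this matrix by $\mathcal{Y}$, and, following the pseudomoment framework discussed before, set $\mathcal{Y}_{(i,j),(k,\ell)}$ to be the “pseudoexpectation” of the monomial $x_i x_j x_k x_\ell$. In terms of $X$ and $Y$, these entries are
$$\mathcal{Y}_{(i,j),(k,\ell)} = Y_{\{i\} \triangle \{j\},\, \{k\} \triangle \{\ell\}} = \begin{cases} 1 & i = j,\ k = \ell, \\ X_{k\ell} & i = j,\ k \neq \ell, \\ X_{ij} & i \neq j,\ k = \ell, \\ Y_{\{i,j\}\{k,\ell\}} & i \neq j,\ k \neq \ell. \end{cases}$$
We note that if $Y$ is positive semidefinite then so is $\mathcal{Y}$, since, up to removing repeated rows and columns, the latter is equal to the principal minor of the former indexed by $\{\emptyset\} \cup \binom{[N]}{2}$.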
Now, we view , a positive semidefinite matrix, as the degree 2 moment matrix of the entries of a gaussian random matrix: we suppose there exists some with random jointly gaussian entries such that
[TABLE]
We then design so that automatically satisfies some of the necessary constraints, and hope that the remaining constraints will be approximately satisfied as well.
We begin with a matrix having a canonical gaussian distribution for symmetric matrices, the GOE, suitably rescaled to allow us a normalizing degree of freedom later: and . Taking to be symmetric already ensures some of the symmetry conditions that must satisfy. Next, we take to have the distribution of , conditional on the following two properties:
. 2. 2.
for all .
Property 1 ensures that any symmetric matrix formed from by “freezing” one index pair and letting the other index pair vary, , has row (or column) space contained in that of . Every constructed as above must have this property, because it has a principal minor of the form
[TABLE]
whose positive semidefiniteness gives this condition on . Property 2 ensures that does not depend on the index , which is required by our definition of above, and reflects the application of the polynomial constraint to the monomial .
What is the law of the resulting gaussian matrix ? Conditioning on Property 1 yields the law of . By rotational invariance of the GOE, the inner matrix has the same law as the upper left block of , i.e., a smaller GOE matrix with the same variance scaling of .
Next, we condition on Property 2, or equivalently condition on having . has the law of for a gaussian vector . Since is an isometry, we may equivalently condition on for each . By basic properties of gaussian conditioning, the resulting law is
[TABLE]
where is the Gram matrix of the or equivalently the entrywise square of , and is the orthogonal projector to the span of the . Let be a matrix with the law of applied to the law in (40).
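For reference, the basic gaussian conditioning formulas are: if g ~ N(mu, Sigma) and A has full row rank, then conditionally on A g = b, the vector g is gaussian with mean mu + Sigma A^T (A Sigma A^T)^{-1} (b - A mu) and covariance Sigma - Sigma A^T (A Sigma A^T)^{-1} A Sigma. The sketch below implements these generic formulas; the constraint matrix A and right-hand side b are arbitrary illustrative inputs and do not encode Properties 1 and 2 specifically. For an isotropic gaussian the conditional covariance reduces to the orthogonal projector onto ker(A), which is consistent with the appearance of an orthogonal projector in the law above.

```python
import numpy as np

def condition_gaussian(mu, Sigma, A, b):
    """Mean and covariance of g ~ N(mu, Sigma) conditioned on the event A g = b.

    Standard formulas (A is assumed to have full row rank):
      mean = mu + Sigma A^T (A Sigma A^T)^{-1} (b - A mu)
      cov  = Sigma - Sigma A^T (A Sigma A^T)^{-1} A Sigma
    """
    S_At = Sigma @ A.T
    K = A @ S_At                      # A Sigma A^T
    mean = mu + S_At @ np.linalg.solve(K, b - A @ mu)
    cov = Sigma - S_At @ np.linalg.solve(K, S_At.T)
    return mean, cov

rng = np.random.default_rng(1)
d, k = 6, 2
A = rng.standard_normal((k, d))
b = rng.standard_normal(k)
mean, cov = condition_gaussian(np.zeros(d), np.eye(d), A, b)

# Isotropic case: cov is the orthogonal projector onto ker(A), and the conditional
# mean is the minimum-norm solution of A g = b.
print(np.allclose(cov @ cov, cov), np.allclose(A @ cov, 0.0), np.allclose(A @ mean, b))
```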
Having finished the conditioning calculations, we may now obtain the statistics of . Recall that . Applying to each matrix and using the expression derived above, we find the mean and covariance
[TABLE]
Next, we make two simplifying approximations. For the means, we approximate
[TABLE]
which gives
[TABLE]
For the covariances, since under our assumptions we have , we approximate
[TABLE]
which gives
[TABLE]
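The covariance approximation replaces an orthogonal projector onto the span of nearly orthogonal unit vectors by the sum of the corresponding rank-one projectors (the estimate recalled in Section 5 for the matrix case). The sketch below, with random unit vectors W standing in for the actual vectors (an assumption for illustration), compares the two. In fact the operator-norm error equals exactly the deviation of the Gram matrix G = W^T W from the identity, since W (G^{-1} - I) W^T has the same nonzero eigenvalues as I - G.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 600, 5
W = rng.standard_normal((d, k))
W /= np.linalg.norm(W, axis=0)            # k random unit vectors in R^d, nearly orthogonal when k << d

G = W.T @ W                               # their Gram matrix, close to I_k
P_exact = W @ np.linalg.solve(G, W.T)     # orthogonal projector onto the span of the columns
P_sum = W @ W.T                           # sum of the rank-one projectors w_i w_i^T

err = np.linalg.norm(P_exact - P_sum, 2)
gram_dev = np.linalg.norm(G - np.eye(k), 2)
# P_exact - P_sum = W (G^{-1} - I) W^T, whose operator norm is exactly ||G - I||.
print(err, gram_dev)
```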
Finally, to recover what this prediction implies for the entries of , we compute
[TABLE]
We then choose such that , which requires , and, restricting to and , we recover the same formula as (36):
[TABLE]
Remark 4.5.
It is worth noting the intriguing geometric interpretation of the random matrix we have constructed: we have , deterministically, and fluctuates in the linear subspace (as may be verified from the covariance formula (42) and is intuitive by analogy with the ETF case of Section 4.1). Thus, behaves, roughly speaking, like a random element of (except that there is no enforcement of positive semidefiniteness), which lies on the same face of as and fluctuates gaussianly about along this face.
4.3 Precise construction details
Having intuitively motivated the a priori unusual degree 4 pseudomoment formula given (identically) in (36) and (48) in the previous two sections, we now give the precise details of how this may be adjusted to produce an actually valid degree 4 pseudomoment extension of , the “nudged” Montanari-Sen witness. It is instructive to view our construction as first attempting to build an extension of itself, then introducing the adjustment towards as a necessity to ensure positive semidefiniteness.
Step 1: Heuristic pseudomoments
We first build that is a reasonable prediction of a reduced degree 4 pseudomoment extension of . Viewing as a block matrix as in (31), all blocks but the lower right are prescribed by the properties of being reduced and extending , so we have
[TABLE]
We complete the definition by defining using the heuristics described earlier. Namely, we take
[TABLE]
Reviewing the constraints required of per Proposition 1.11, we see that satisfies the permutation symmetry constraints (Condition 5 in the Proposition) and the normalization and reduction constraints (Conditions 2 and 3, respectively) exactly, and satisfies the other linear constraints which require (Condition 4) approximately. Finally, the discussion of Sections 4.1 and 4.2 suggests that should also be positive semidefinite (Condition 1).
Step 2: Correction to satisfy linear constraints
We next correct to satisfy exactly all linear constraints required for a degree 4 pseudomoment matrix, by adjusting to satisfy Condition 4 of Proposition 1.11. Define an additive correction by
[TABLE]
(Note that the second part of the definition is consistent when regardless of whether we view or as the repeated index.)
Then, we set
[TABLE]
and satisfies Conditions 2 through 5 of Proposition 1.11 exactly. However, as we will see, is a low-rank matrix, while acts non-trivially on components of the null space of . Therefore, even if (which, as we will see, is nearly true), we will still have due to the fluctuations in .
Step 3: Correction to satisfy positive semidefiniteness
Finally, we introduce a second correction to counteract the fluctuations in the spectra of and . We note that the identity matrix is in fact a valid degree 4 pseudomoment matrix, which extends . Indeed, is a natural choice of a point of towards which to “push” in order to regularize our construction, because is the barycenter of the vertices of , i.e., .
Following this intuition, given a choice of the parameter , we set
[TABLE]
Clearly, extends and satisfies all linear constraints on a degree 4 pseudomoment matrix (since both and do so). Thus to show , it suffices to show , in which case will be a degree 4 pseudomoment extension. Theorem 2.1 will then be proved if we show that with high probability.
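The barycenter claim in Step 3 can be checked by brute force for small N. The sketch below assumes that the reduced degree 4 moment matrix is indexed by the subsets of [N] of size at most 2 (with the empty set giving the normalization entry), and averages the moment matrices of all 2^N vertices x in {±1}^N; the result is exactly the identity, because the average of the monomial x^U over the hypercube vanishes for every nonempty U. The function name is hypothetical.

```python
import itertools
import numpy as np

def averaged_degree4_moment_matrix(N):
    """Average, over all x in {+1,-1}^N, of the reduced degree 4 moment matrix of x.

    Rows and columns are indexed by subsets S of [N] with |S| <= 2, and the (S, T)
    entry of the moment matrix of x is x^S x^T, which equals the monomial
    x^(S symmetric-difference T) once x_i^2 = 1 is used.
    """
    index_sets = [()] + [(i,) for i in range(N)] + list(itertools.combinations(range(N), 2))
    total = np.zeros((len(index_sets), len(index_sets)))
    for signs in itertools.product([1.0, -1.0], repeat=N):
        x = np.array(signs)
        m = np.array([np.prod(x[list(S)]) for S in index_sets])  # monomial vector (x^S)_S
        total += np.outer(m, m)                                   # moment matrix of this vertex
    return total / 2 ** N, index_sets

Y, sets = averaged_degree4_moment_matrix(4)
print(Y.shape, np.allclose(Y, np.eye(len(sets))))
```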
5 Conjectural higher-degree extension
Before proceeding to the proofs, we mention that there is a natural extension of the heuristic for pseudomoment construction of Section 4.2 that appears promising, though difficult to analyze, for higher-degree SOS relaxations.
The idea is to view higher-order pseudomoments as the second moments of symmetric gaussian tensors, which, as done in Section 4.2 for matrices, are formed by conditioning a certain canonical symmetric tensor distribution on desirable properties. Suppose we want to predict the degree pseudomoment extension of the Montanari-Sen witness (which, as before, we assume to be an exact unit norm tight frame, i.e., a constant multiple of ), where, extending the case from Section 4.2, the entry indexed by is the pseudoexpectation of . We do this by building a symmetric tensor with jointly gaussian entries, and setting, for ,
[TABLE]
To describe the law of , we first define the following tensorial generalization of the GOE (see, e.g., [RM14] for properties of this distribution analogous to those of the GOE).
Definition 5.1.
Let have i.i.d. entries distributed as . Then, write for the law of defined by
[TABLE]
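A natural way to sample such a symmetric gaussian tensor is to symmetrize an i.i.d. gaussian tensor by averaging over all permutations of its axes. The sketch below does this; the normalization used here is an assumption and may differ from that of Definition 5.1, and it is chosen so that the case d = 2 gives (G + G^T)/2, a rescaled GOE matrix (diagonal variance twice the off-diagonal variance).

```python
import itertools
import numpy as np

def symmetric_gaussian_tensor(n, d, rng):
    """Sample a symmetric d-tensor on R^n by averaging an i.i.d. N(0, 1) tensor
    over all d! permutations of its axes (illustrative normalization)."""
    G = rng.standard_normal((n,) * d)
    perms = list(itertools.permutations(range(d)))
    return sum(np.transpose(G, p) for p in perms) / len(perms)

rng = np.random.default_rng(3)
T = symmetric_gaussian_tensor(4, 3, rng)
# The result is invariant under every permutation of its three axes.
print(T.shape, all(np.allclose(np.transpose(T, p), T) for p in itertools.permutations(range(3))))
```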
Now, we define inductively over as a family of coupled gaussian tensors, and ensure that the pseudomoment matrices thus formed are consistent with one another. Namely, we proceed as follows. Let denote the concatenation of finite strings in the alphabet .
1. Let .
2. For , let have the law , conditioned on the following two properties:
   - (Subspace Property) For , let . Then, for all , .
   - (Consistency Property) For and , .
The constants remain as free parameters to be tuned to ensure normalization, as in the case from Section 4.2.
Based on this reasonable generalization, we offer two conjectures. First, we believe that whatever adjustments are necessary to this construction are already captured in the simple adjustment of the Montanari-Sen witness towards the identity matrix given in Theorem 2.1.
Conjecture 5.2.
For any and ,
[TABLE]
More specifically but less formally, we believe the construction presented above is approximately the correct degree pseudomoment extension.
Conjecture 5.3 (Informal).
For , let be the Montanari-Sen witness. Then, for fixed and constants depending only on , with high probability as , the entries of as defined above give a “nearly” valid degree pseudomoment extension of .
We have tested Conjecture 5.3 numerically on Laurent’s construction [Lau03] of higher-degree pseudomoment matrices extending a deterministic , which indeed forms the Gram matrix of an ETF. Laurent’s construction shows that certain parity inequalities holding over are not certified by SOS until . We find that the results, for suitable tuning of , agree with Laurent’s construction (with no further adjustment needed). Thus one pleasant consequence of verifying Conjecture 5.3 may be a novel proof of Laurent’s theorem, whose original proof involves first predicting the entries of the pseudomoment matrix and then appealing to a technical analysis of hypergeometric functions to verify positive semidefiniteness.
We remark that we have also verified algebraically in our earlier paper [BK18] that, for degree 4, the construction in Theorem 4.4 (before simplifying in the asymptotic regime) exactly recovers Laurent’s construction. This would also not be the first simplified proof of Laurent’s theorem; for instance, the work [KLM16] gives a general treatment of sum-of-squares relaxations of combinatorial optimization problems with highly symmetric formulations, and also produces a Cholesky-type decomposition of the relevant pseudomoment matrix. However, the idea we outline above would both unify such results with those for less symmetric problems and give a natural interpretation of the Cholesky factors in such a decomposition as the random variables that are the entries of the tensors .
Finally, let us remark on what seems to be the major difficulty in analyzing this construction. By analogy with the analysis in Section 4.2, we are eventually led, in conditioning on the Consistency Property, to attempt to approximate the orthogonal projector to the “repeated indices subspace”
[TABLE]
(Here denotes the symmetric product of tensors; see, e.g., [SKM89] for definitions. We mean “orthogonal projection” with respect to the Frobenius or entrywise inner product of general non-symmetric tensors, into which symmetric tensors are embedded by repeating entries.) When , the spanning set consists of the linearly independent and roughly orthogonal tensors , which allows the orthogonal projection to be estimated by the sum of rank one projections as in (45). However, when , there does not appear to be a clear way to choose a convenient approximately-orthogonal basis to carry out the calculation. The collection of tensors is highly overcomplete, since the themselves are an overcomplete set in (and the symmetric tensor product is distributive, so any dependence among the is inherited by the for any symmetric tensor ). Moreover, even the subspaces intersect non-trivially; for instance, . Thus it appears that a deeper understanding of the structure of these subspaces of symmetric tensors is required to form the correct higher-degree analogue of (45).
6 Proof of positive semidefiniteness: first steps
Recall that, in Section 4.3, we built from the Montanari-Sen witness and an additional constant the matrix
[TABLE]
and found that to prove Theorem 2.1 it suffices to show that with high probability. We now give some technical preliminaries for the proof of this.
First, note that, after permuting rows and columns, is the direct sum of , which is positive semidefinite by assumption, with the principal minor of indexed by . Thus to show that it suffices to show that the latter minor is positive semidefinite.
Second, we may reduce the dimensionality of this remaining task by taking the Schur complement criterion for positive semidefiniteness with respect to the upper left entry, indexed by , whose value is 1. The condition of positive semidefiniteness of the Schur complement is then
[TABLE]
We reorganize this expression as
[TABLE]
To show , it then suffices to show that both and . We refer to these as the “main term” and the “correction term,” respectively.
In this decomposition, we split between and the extra term that we introduced when nudging our pseudomoment matrix towards the identity. This term will act as a “barrier” against small fluctuations that might spoil positive semidefiniteness. We will show that, even without this adjustment, and are already nearly positive semidefinite: the magnitudes of their smallest (most negative) eigenvalues tend to zero as for any fixed . Thus any choice of will suffice to ensure that with high probability.
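The Schur complement step at the start of this section rests on the standard criterion that, when the upper-left entry equals 1, a symmetric matrix M = [[1, b^T], [b, C]] is positive semidefinite if and only if C - b b^T is. The sketch below (a generic check on random matrices; the function name is hypothetical) compares this criterion with a direct eigenvalue computation.

```python
import numpy as np

def psd_via_schur(M, tol=1e-9):
    """Test M = [[1, b^T], [b, C]] for positive semidefiniteness via its
    Schur complement with respect to the (1, 1) entry: M >= 0 iff C - b b^T >= 0."""
    if not np.isclose(M[0, 0], 1.0):
        raise ValueError("this sketch assumes the upper-left entry equals 1")
    b, C = M[1:, 0], M[1:, 1:]
    return bool(np.linalg.eigvalsh(C - np.outer(b, b)).min() >= -tol)

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 5))
X[:, 0] /= np.linalg.norm(X[:, 0])        # make the (1, 1) entry of the Gram matrix equal 1
M_good = X.T @ X                          # PSD (and singular: rank 4 in dimension 5)
M_bad = M_good.copy()
M_bad[1:, 1:] -= 2.0 * np.eye(4)          # shift the lower block to break positive semidefiniteness

for M in (M_good, M_bad):
    print(psd_via_schur(M), np.linalg.eigvalsh(M).min() >= -1e-9)
```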
More specifically, we will show the following results.
Lemma 6.1 (Control of main term).
For all ,
[TABLE]
Lemma 6.2 (Control of correction term).
For all ,
[TABLE]
From Lemmata 6.1 and 6.2, it follows that with high probability for any fixed as (or for decreasing sufficiently slowly with , though for the sake of simplicity we will not pursue this minor strengthening of the results). Theorem 2.1 then follows. It remains only to prove the Lemmata; in Section 7 we will prove Lemma 6.1, and in Section 8 we will prove Lemma 6.2.
7 Proof of positive semidefiniteness: main term
In this section we prove Lemma 6.1. We have
[TABLE]
Consider the quadratic form , where we think of for some symmetric matrix with (i.e., ). Writing for the columns of ,
[TABLE]
Writing the above as a quadratic form in , we obtain the following.
Proposition 7.1.
Define
[TABLE]
(Recall that .) Then,
[TABLE]
Proof.
Since the right-hand side of (79) is at most zero, it suffices to consider the case that . By (77), we have
[TABLE]
Note first that
[TABLE]
Then, recalling that and , by the variational description of the minimum eigenvalue we have
[TABLE]
completing the proof. ∎
We will thus focus our attention on . Analyzing the Wishart-type matrix formed by the third term of (78), , will be our main difficulty. Since , we center the vectors involved, and decompose this term as
[TABLE]
The remaining analysis involves a delicate balance between requirements in controlling and . In order to bound the cross-term , we will rely on the strong concentration of the eigenvalues of that is created by the dependencies among the (this is the “near-UNTF Gram matrix” behavior of ). In particular, this concentration is much stronger than if were replaced with any reasonable distribution of i.i.d. unit vectors, and this portion of our argument would fail for i.i.d. vectors (see Remark 7.3).
On the other hand, in order to bound the term , we will need to take advantage of the weak dependence of the , and formalize the intuition that because and is a sum of weakly dependent rank-one orthogonal projectors, should itself behave approximately as an orthogonal projector to a subspace of dimension (though we will discuss one important caveat to this intuition in Remark 7.3). Technically, we will appeal to Lipschitz concentration inequalities for the Haar measure on Stiefel manifolds, which capture the heuristic weak dependence of entries of blocks of random orthogonal matrices under the Haar measure.
7.1 Bounding the cross-term
Lemma 7.2.
For all ,
[TABLE]
Proof.
Applying the matrix arithmetic-geometric mean inequality,
[TABLE]
Rewriting the norm appearing in the first term,
[TABLE]
since . By Proposition 3.1,
[TABLE]
Thus with at least the same probability we have
[TABLE]
and the result follows. ∎
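The matrix arithmetic-geometric mean inequality used in this proof can be stated, in one standard weighted form, as A B^T + B A^T ⪯ t A A^T + t^{-1} B B^T for every t > 0, which follows by expanding (√t A - B/√t)(√t A - B/√t)^T ⪰ 0. Whether the proof uses exactly this weighting is an assumption; the sketch verifies the inequality numerically on random matrices.

```python
import numpy as np

def amgm_gap_min_eig(A, B, t):
    """Smallest eigenvalue of t A A^T + B B^T / t - (A B^T + B A^T).

    This matrix equals (sqrt(t) A - B / sqrt(t)) (sqrt(t) A - B / sqrt(t))^T, so it is PSD,
    which is the matrix AM-GM inequality A B^T + B A^T <= t A A^T + B B^T / t.
    """
    gap = t * A @ A.T + B @ B.T / t - (A @ B.T + B @ A.T)
    return np.linalg.eigvalsh(gap).min()

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))
B = rng.standard_normal((6, 3))
print([amgm_gap_min_eig(A, B, t) >= -1e-9 for t in (0.1, 1.0, 10.0)])
```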
Remark 7.3.
Let us contrast the result of this section with the same analysis for i.i.d. vectors. The marginal law of each is uniform over , so consider taking independent. Then, we compute
[TABLE]
Thus the corresponding cross-term would have largest eigenvalue of order , which in particular would not decay with .
Consequently, our previous intuition that we should obtain an approximate projector of rank from cannot be correct, since the putative basis vectors almost sum to zero. In the following section, we will show that this is in fact the only linear near-dependence of these vectors, and is still an approximate orthogonal projector, only of rank .
7.2 Bounding the projection term : unnormalized case
Our strategy for bounding will proceed in two steps: first, we will bound the same matrix but constructed from the approximately normalized vectors in place of the strictly normalized vectors , and then we will show that this replacement does not significantly affect the spectrum. In this section we carry out the first, and more difficult, of these tasks. We will show the following result.
Lemma 7.4.
Let have as its columns. Let , the orthogonal projector to the subspace orthogonal to . Then,
[TABLE]
The argument will use the technique of union bounding over a net. Our main technical tool will be the following Lipschitz concentration inequality for the Haar measure of the Stiefel manifolds.
Recall that the Stiefel manifolds are defined as:
[TABLE]
The Haar measure may be viewed as the measure obtained by restricting to the upper matrix block.
These measures enjoy the following concentration inequality when , obtained by standard arguments from logarithmic Sobolev or isoperimetric inequalities for the special orthogonal group , of which is a quotient when (see, e.g., the discussion following Theorem 2.4 of [Led01]).
Proposition 7.5.
Suppose , and has Lipschitz constant at most when is endowed with the metric of the Frobenius matrix norm. Then, for an absolute constant ,
[TABLE]
Note that since we have with , we will always satisfy the hypothesis ; we will use this implicitly without further comment for the remainder of the proof.
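For intuition, samples from the Haar measure on the Stiefel manifold can be generated exactly as described above, by drawing a Haar orthogonal matrix and keeping its upper block. The sketch below uses the standard QR-with-sign-correction sampler for the orthogonal group (a well-known technique, not one taken from this paper) and assumes the convention that Stiefel elements have orthonormal rows.

```python
import numpy as np

def haar_orthogonal(N, rng):
    """Haar-distributed N x N orthogonal matrix: QR of a gaussian matrix, with the
    columns of Q multiplied by the signs of diag(R) so that the law is exactly Haar."""
    Q, R = np.linalg.qr(rng.standard_normal((N, N)))
    return Q * np.sign(np.diag(R))

def haar_stiefel(n, N, rng):
    """n x N matrix with orthonormal rows: the upper n x N block of a Haar orthogonal matrix."""
    return haar_orthogonal(N, rng)[:n, :]

rng = np.random.default_rng(6)
n, N = 20, 100
V = haar_stiefel(n, N, rng)
print(np.allclose(V @ V.T, np.eye(n)))           # orthonormal rows
# Column squared norms average exactly n/N (the trace of V V^T is n),
# so sqrt(N/n) * V has approximately unit-norm columns.
print(np.mean(np.sum(V ** 2, axis=0)), n / N)
```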
Proof of Lemma 7.4.
Note that ; thus as suggested already in Remark 7.3, it is impossible for to act on as an approximate isometric embedding, as we might naively expect from its weakly dependent columns. Our argument is more natural to carry out if we remove this caveat; therefore, let us define to have columns . One may check that , and that
[TABLE]
In particular, , so it suffices to show the operator norm bound of (92) for .
For , let us denote for the course of this proof. Then,
[TABLE]
For , define
[TABLE]
Then, recalling that is the orthogonal projector to the row space of ,
[TABLE]
Let us denote balls in Euclidean space by
[TABLE]
Our first goal will be to obtain concentration bounds on when for each fixed pair , by applying the Lipschitz concentration inequality.
Claim 7.6.
Let . Then,
[TABLE]
Proof.
For , letting , we have using (96) and the triangle inequality
[TABLE]
Since is an orthogonal projector for ,
[TABLE]
We bound the other term by
[TABLE]
Combining these observations and a symmetric argument with and in opposite roles gives
[TABLE]
Lastly, we bound
[TABLE]
where we have used that , and the result follows. ∎
Therefore, and this is crucial to our argument, while for the worst-case , namely a standard basis vector, will have Lipschitz constant , for typical , will instead have Lipschitz constant . Moreover, the Lipschitz constant of is comparable to the smaller of the Lipschitz constants of and .
Claim 7.7.
For , .
Proof.
We have
[TABLE]
View as the top block of . Then, expanding the first term with the moment formulae of Proposition 3.4,
[TABLE]
and since and , the result follows. ∎
Combining Claim 7.6, Claim 7.7, and the concentration result Proposition 7.5, we find the following corollary on pointwise concentration of .
Claim 7.8.
There exist constants depending only on such that, for any ,
[TABLE]
This concludes the first part of the argument.
The remaining part of the argument is to apply a union bound of the probabilities controlled in Claim 7.8 over suitable nets of . We divide our task into a bound over sparse vectors and vectors with bounded largest entry, very similar to the technique in [Rud08, RV08] and especially [Ver11]. Introduce a parameter to be chosen later. Define
[TABLE]
For any , we define and by thresholding the entries of , setting and . Then, , , and .
Introduce another parameter to be chosen later. Let and be -nets. By a standard bound (see, e.g., Lemma 9.5 of [LT13]), we may choose , and by the same bound applied to each choice of support coordinates for an element of , we may choose
[TABLE]
To lighten the notation, let us set . The following is an adaptation to our setting of a standard technique for estimating a matrix norm over a net: we first bound
[TABLE]
Rearranging this, we obtain
[TABLE]
Using Claim 7.8 and a union bound, we have that
[TABLE]
Taking , for a large constant , and a small constant, we obtain the result. ∎
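The splitting used in the net argument, into a sparse part and a part with bounded largest entry, can be illustrated as follows. This is a minimal sketch; the threshold name `delta` and the helper `split_by_threshold` are hypothetical. The key counting fact is that if ||x||_2 ≤ 1, then at most 1/delta^2 entries of x can exceed delta in absolute value, since each such entry contributes more than delta^2 to ||x||_2^2.

```python
import numpy as np

def split_by_threshold(x, delta):
    """Split x = y + z with disjoint supports: y keeps the entries with |x_i| > delta
    (the 'sparse' part), z keeps the entries with |x_i| <= delta (the 'spread' part)."""
    large = np.abs(x) > delta
    return np.where(large, x, 0.0), np.where(large, 0.0, x)

rng = np.random.default_rng(7)
x = rng.standard_normal(1000) ** 3        # a heavy-tailed vector, so that some entries are large
x /= np.linalg.norm(x)                    # unit norm
delta = 0.05
y, z = split_by_threshold(x, delta)

# y has at most 1/delta^2 nonzeros; z has all entries bounded by delta.
print(np.count_nonzero(y), 1 / delta ** 2)
print(np.allclose(x, y + z), np.abs(z).max() <= delta)
```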
7.3 Bounding the projection term : normalization
In this section, we show that passing from the approximately normalized vectors discussed in the previous section to the exactly normalized vectors does not affect the construction of very much, as measured in operator norm.
Lemma 7.9.
Let have as columns, and let have as columns. Then, for all ,
[TABLE]
Proof.
Recall that is diagonal with . Then, we may write
[TABLE]
Therefore,
[TABLE]
By Lemma 7.4 from the previous section, the second term is with super-polynomially high probability. The result then follows by Proposition 3.1. ∎
Next, we translate this result to the Gram matrix .
Corollary 7.10.
In the same setting as Lemma 7.9, for all ,
[TABLE]
Proof.
We may bound
[TABLE]
Then, combining Lemma 7.4 and Lemma 7.9 gives the result. ∎
7.4 Final main term bound: proof of Lemma 6.1
We are now ready to complete the proof of Lemma 6.1. First, we combine Lemma 7.4 and Corollary 7.10. Recall that these showed the following bounds, with high probability:
[TABLE]
Combining these, we obtain the following bound on .
Corollary 7.11.
For all ,
[TABLE]
Next, we complete the proof of Lemma 6.1. Recall that
[TABLE]
and we have gathered the following bounds, holding with high probability:
[TABLE]
We can therefore control the minimum eigenvalue of by, with high probability,
[TABLE]
The first term is positive semidefinite for sufficiently large since , and thus we find
[TABLE]
Finally, we must convert this to a bound on the smallest eigenvalue of . By Proposition 7.1, letting , we have
[TABLE]
By Proposition 3.1, with high probability for all . Substituting this above, we thus find the result of Lemma 6.1,
[TABLE]
8 Proof of positive semidefiniteness: correction term
Proof of Lemma 6.2.
Recall that is non-zero only on pairs of index sets and that share an index. The non-zero entries are given by
[TABLE]
Let us compute the quadratic form of with , which we view as having entries for some with (i.e. ). We then have, expanding with a correction for double-counting the diagonal terms,
[TABLE]
We will now use the following inequality that allows us to bound inner products with the Schur square of a matrix by its Frobenius norm:
[TABLE]
(We denote by the vectorized supremum norm.) This will result in one term involving , which we bound by
[TABLE]
Combining these inequalities and noting that , we find (for ) that
[TABLE]
Since , we have by Corollary 3.2 that, with high probability,
[TABLE]
and by Corollary 3.3, with high probability.
To control the terms involving , recall that from Lemmata 7.4 and 7.9 it follows that with high probability
[TABLE]
Thus we have, with high probability,
[TABLE]
Combining these results, Lemma 6.2 follows. ∎
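One elementary inequality of the kind used in this proof, bounding an inner product against the Schur (entrywise) square A ∘ A by a vectorized supremum norm and the Frobenius norm, is |⟨Y, A ∘ A⟩| ≤ ||Y||_∞ ||A||_F^2, with the rank-one case |y^T (A ∘ A) y| ≤ ||y||_∞^2 ||A||_F^2. Whether this matches the exact display in the proof is an assumption; the sketch checks both forms numerically.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((7, 7))
Y = rng.standard_normal((7, 7))

# |<Y, A o A>| = |sum_ij Y_ij A_ij^2| <= max_ij |Y_ij| * sum_ij A_ij^2 = ||Y||_max * ||A||_F^2
lhs = abs(np.sum(Y * A * A))
rhs = np.abs(Y).max() * np.linalg.norm(A, "fro") ** 2
print(lhs <= rhs)

# Rank-one case Y = y y^T: a bound on the quadratic form of the Schur square A o A.
y = rng.standard_normal(7)
q = abs(y @ (A * A) @ y)
bound = np.abs(y).max() ** 2 * np.linalg.norm(A, "fro") ** 2
print(q <= bound)
```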
Remark 8.1.
We outline a simpler argument for Lemma 6.2 which suggests a sharper estimate, though it seems more difficult to formalize due to the dependency structure of . Since is sparse (with non-zero entries only when the row and column index sets and share an element), our strategy will be to apply the Gershgorin circle theorem. Thus we must bound the diagonal and off-diagonal entries of .
The summation defining the diagonal entries contains only positive summands, so we have
[TABLE]
using Proposition 3.1 to control each term of the sum. (In this informal discussion, we drop the qualifier “with high probability” from our statements and indulge in the logarithm-concealing notation .)
The off-diagonal entries of are more difficult to control. They are
[TABLE]
Here, we must capture the cancellations due to random signs: a naive application of the triangle inequality would give , which would be insufficient for the Gershgorin circle theorem argument since there are non-zero entries in each row of . On the other hand, a scaling like that in the central limit theorem with independent signs would give , which would suffice, and would give , stronger by a factor of than the result of Lemma 6.2 (up to logarithmic factors).
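For reference, the Gershgorin circle theorem gives the lower bound λ_min(A) ≥ min_i (A_ii - Σ_{j≠i} |A_ij|) for a real symmetric matrix A. The sketch below computes this bound on a hypothetical sparse test matrix with random-sign off-diagonal entries (not the matrix of Remark 8.1) and compares it with the exact smallest eigenvalue; note that the bound uses absolute values, so it cannot by itself exploit sign cancellations inside the entries.

```python
import numpy as np

def gershgorin_min_eig_bound(A):
    """Gershgorin lower bound on the smallest eigenvalue of a real symmetric matrix A:
    every eigenvalue lies in some interval [A_ii - R_i, A_ii + R_i], R_i = sum_{j != i} |A_ij|."""
    diag = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(diag)
    return np.min(diag - radii)

rng = np.random.default_rng(9)
m = 50
# Sparse symmetric off-diagonal part with random signs, on top of the identity.
mask = np.triu(rng.random((m, m)) < 0.1, k=1)
off = np.where(mask, rng.choice([-0.05, 0.05], size=(m, m)), 0.0)
A = np.eye(m) + off + off.T
print(gershgorin_min_eig_bound(A), np.linalg.eigvalsh(A).min())
```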
Acknowledgements
We thank Jess Banks, Ankur Moitra, Andrea Montanari, Cristopher Moore, Tselil Schramm, and Ramon van Handel for useful discussions. We also thank the authors of [MRX19] for generously providing an early version of their manuscript.
References
- [ABM18] Louigi Addario-Berry and Pascal Maillard. The algorithmic hardness threshold for continuous random energy models. arXiv preprint arXiv:1810.05129, 2018.
- [BHK+19] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K. Kothari, Ankur Moitra, and Aaron Potechin. A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
- [BK18] Afonso S. Bandeira and Dmitriy Kunisky. A Gramian description of the degree 4 generalized elliptope. arXiv preprint arXiv:1812.11583, 2018.
- [BK19] Afonso S. Bandeira and Dmitriy Kunisky. Sum-of-squares optimization and the sparsity structure of equiangular tight frames. In 2019 International Conference on Sampling Theory and Applications (SampTA 2019). IEEE, 2019.
- [BKW20] Afonso S. Bandeira, Dmitriy Kunisky, and Alexander S. Wein. Computational hardness of certifying bounds on constrained PCA problems. In Thomas Vidick, editor, 11th Innovations in Theoretical Computer Science Conference (ITCS 2020), volume 151 of Leibniz International Proceedings in Informatics (LIPIcs), pages 78:1–78:29, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
- [BPT12] Grigoriy Blekherman, Pablo A. Parrilo, and Rekha R. Thomas. Semidefinite optimization and convex algebraic geometry. SIAM, 2012.
- [BS14] Boaz Barak and David Steurer. Sum-of-squares proofs and the quest toward optimal algorithms. arXiv preprint arXiv:1404.5236, 2014.
- [CM08] Sourav Chatterjee and Elizabeth Meckes. Multivariate normal approximation using exchangeable pairs. Alea, 4:257–283, 2008.
