Convergence analysis of a Lasserre hierarchy of upper bounds for polynomial minimization on the sphere
Etienne de Klerk, Monique Laurent

TL;DR
This paper analyzes the convergence rate of Lasserre's hierarchy for polynomial minimization on the sphere, establishing a precise Theta(1/r^2) rate and discussing implications for the generalized moment problem.
Contribution
It provides the first exact convergence rate for Lasserre's hierarchy on the sphere, advancing understanding of its efficiency in polynomial optimization.
Findings
Convergence rate is Theta(1/r^2).
Results apply to the generalized moment problem on the sphere.
Enhances theoretical understanding of hierarchy's efficiency.
Abstract
We study the convergence rate of a hierarchy of upper bounds for polynomial minimization problems, proposed by Lasserre [SIAM J. Optim. 21(3) (2011), pp. 864-885], for the special case when the feasible set is the unit (hyper)sphere. The upper bound at level r of the hierarchy is defined as the minimal expected value of the polynomial over all probability distributions on the sphere, when the probability density function is a sum-of-squares polynomial of degree at most 2r with respect to the surface measure. We show that the exact rate of convergence is Theta(1/r^2), and explore the implications for the related rate of convergence for the generalized problem of moments on the sphere.
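Table 1: Upper bounds $\overline{f}^{(r)}$ for the Motzkin form on the sphere.

| $r$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| $\overline{f}^{(r)}$ | 0.1714 | 0.0952 | 0.0519 | 0.0457 | 0.0287 | 0.0283 | 0.0193 | 0.0177 | 0.0139 | 0.0122 |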
Etienne de Klerk (Tilburg University and Delft University of Technology) and Monique Laurent (Centrum Wiskunde & Informatica (CWI), Amsterdam, and Tilburg University)
Keywords: polynomial optimization on sphere; Lasserre hierarchy; semidefinite programming; generalized eigenvalue problem
AMS subject classification 90C22; 90C26; 90C30
1 Introduction
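We consider the problem of minimizing an $n$-variate polynomial $f$ over a compact set $K \subseteq \mathbb{R}^n$, i.e., the problem of computing the parameter:
$$f_{\min,K} := \min_{x \in K} f(x). \tag{1}$$
In this paper we will focus on the case when $K$ is the unit sphere: $K = \mathbb{S}^{n-1} = \{x \in \mathbb{R}^n : \|x\| = 1\}$, in which case we will omit the subscript $K$ and simply write $f_{\min}$.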
Problem (1) is in general a computationally hard problem, already for simple sets like the hypercube, the standard simplex, and the unit ball or sphere. For instance, the problem of finding the maximum cardinality of a stable set in a graph can be expressed as optimizing a quadratic polynomial over the standard simplex [19], or a degree 3 polynomial over the unit sphere [20]:
[TABLE]
where is the adjacency matrix of , is the set of non-edges of and . Other applications of polynomial optimization over the unit sphere include deciding whether homogeneous polynomials are positive semidefinite. Indeed, a homogeneous polynomial is defined as positive semidefinite precisely if
[TABLE]
and positive definite if the inequality is strict; see e.g. [23]. As a special case, one may decide if a symmetric matrix is copositive, by deciding if the associated form is positive semidefinite; see, e.g. [21].
Another special case is to decide the convexity of a homogeneous polynomial , by considering the parameter
[TABLE]
which is nonnegative if and only if $f$ is convex. This decision problem is known to be NP-hard, already for degree 4 forms [1].
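As shown by Lasserre [16], the parameter (1) can be reformulated via the infinite dimensional program
$$f_{\min,K} = \inf_{h \in \Sigma[x]} \left\{ \int_K f(x)\,h(x)\,d\mu(x) \;:\; \int_K h(x)\,d\mu(x) = 1 \right\}, \tag{2}$$
where $\Sigma[x]$ denotes the set of sums of squares of polynomials, and $\mu$ is a given Borel measure supported on $K$. Given an integer $r$, by bounding the degree of the polynomial $h$ by $2r$, Lasserre [16] defined the parameter:
$$\overline{f}^{(r)}_K := \min_{h \in \Sigma[x]_r} \left\{ \int_K f(x)\,h(x)\,d\mu(x) \;:\; \int_K h(x)\,d\mu(x) = 1 \right\}, \tag{3}$$
where $\Sigma[x]_r$ consists of the polynomials in $\Sigma[x]$ with degree at most $2r$. Here we use the ‘overline’ symbol to indicate that the parameters $\overline{f}^{(r)}_K$ provide upper bounds for $f_{\min,K}$, in contrast to the parameters in (9) below, which provide lower bounds for it.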
Since sums of squares of polynomials can be formulated using semidefinite programming, the parameter (3) can be expressed via a semidefinite program. In fact, since this program has only one affine constraint, it even admits an eigenvalue reformulation [16], which will be mentioned in (12) in Section 2.2 below. Of course, in order to be able to compute the parameter (3) in practice, one needs to know explicitly (or via some computational procedure) the moments of the reference measure on . These moments are known for simple sets like the simplex, the box, the sphere, the ball and some simple transforms of them (they can be found, e.g., in Table 1 in [10]).
As a direct consequence of the formulation (2), the bounds $\overline{f}^{(r)}_K$ converge asymptotically to the global minimum $f_{\min,K}$ when $r \to \infty$. How fast the bounds converge to the global minimum in terms of the degree $r$ has been investigated in the papers [12, 7, 9], which show, respectively, a convergence rate in $O(1/\sqrt{r})$ for general compact $K$ (satisfying a minor geometric condition), a convergence rate in $O(1/r)$ when $K$ is a convex body, and a convergence rate in $O(1/r^2)$ when $K$ is the box $[-1,1]^n$. In these works the reference measure is the Lebesgue measure, except for the box where more general measures are considered (see Theorem 3 below for details).
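In this paper we are interested in analyzing the worst-case convergence of the bounds (3) in the case of the unit sphere $K = \mathbb{S}^{n-1}$, when selecting as reference measure the surface (Haar) measure on $\mathbb{S}^{n-1}$. We let $d\sigma(x)$ denote the surface measure of $\mathbb{S}^{n-1}$, so that $d\sigma(x)/\sigma_{n-1}$ is a probability measure on $\mathbb{S}^{n-1}$, with
$$\sigma_{n-1} := \int_{\mathbb{S}^{n-1}} d\sigma(x) = \frac{2\pi^{n/2}}{\Gamma(n/2)}. \tag{4}$$
(See, e.g., [6, relation (2.2.3)].) To simplify notation we will throughout omit the subscript $K = \mathbb{S}^{n-1}$ in the parameters (1) and (3), which we simply denote as
$$f_{\min} \quad \text{and} \quad \overline{f}^{(r)}.$$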
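Example 1.
Consider the minimization of the Motzkin form
$$f(x) = x_3^6 + x_1^4 x_2^2 + x_1^2 x_2^4 - 3 x_1^2 x_2^2 x_3^2$$
on $\mathbb{S}^2$. This form has 12 minimizers on the sphere, namely $\frac{1}{\sqrt{3}}(\pm 1, \pm 1, \pm 1)$ as well as $(\pm 1, 0, 0)$ and $(0, \pm 1, 0)$, and one has $f_{\min} = 0$.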
In Table 1 we give the bounds $\overline{f}^{(r)}$ for the Motzkin form for $r = 0, 1, \ldots, 9$.
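As a quick sanity check on the first entry of Table 1: at level $r = 0$ the only feasible density is the constant 1, so the bound is simply the mean of the form over the sphere. The sketch below computes that mean exactly from the classical closed form for monomial moments of the uniform probability measure on the sphere. The function names, and the choice of writing the Motzkin form in its standard coordinates with the $x_3^6$ term, are assumptions of this sketch.

```python
from fractions import Fraction


def odd_double_factorial(k):
    """Return (2k-1)!! = 1*3*...*(2k-1); the empty product (k = 0) is 1."""
    out = 1
    for j in range(1, k + 1):
        out *= 2 * j - 1
    return out


def sphere_moment(alpha):
    """Exact mean of the monomial x^alpha under the uniform probability
    measure on the unit sphere in R^n, where n = len(alpha)."""
    n = len(alpha)
    if any(a % 2 for a in alpha):
        return Fraction(0)  # odd exponents vanish by symmetry
    half = [a // 2 for a in alpha]
    num = 1
    for b in half:
        num *= odd_double_factorial(b)
    den = 1
    for j in range(sum(half)):
        den *= n + 2 * j  # n (n+2) ... (n + 2|beta| - 2)
    return Fraction(num, den)


# Motzkin form (standard coordinates, assumed): exponent tuple -> coefficient.
MOTZKIN = {(0, 0, 6): 1, (4, 2, 0): 1, (2, 4, 0): 1, (2, 2, 2): -3}


def mean_on_sphere(poly):
    """Mean of a polynomial given as {exponents: coefficient}."""
    return sum(c * sphere_moment(a) for a, c in poly.items())


motzkin_mean = mean_on_sphere(MOTZKIN)
print(motzkin_mean, float(motzkin_mean))
```

The exact value is 6/35, which rounds to the 0.1714 listed first in the table, consistent with that column being the level $r = 0$.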
In Figure 1 we show a contour plot of the Motzkin form on the sphere (top left), as well as a contour plot of the optimal density function for (top right), (bottom left), and (bottom right). In the figure, the red end of the spectrum denotes higher function values. Some local maximizers of the Motzkin form are visible that correspond to $(0, 0, \pm 1)$ (at the poles) and $\left(\pm\frac{1}{\sqrt{2}}, \pm\frac{1}{\sqrt{2}}, 0\right)$ (on the equator).
When and , the modes of the optimal density are at the global minimizers and (one may see the contours of two of these modes in one hemisphere). On the other hand, when , the mass of the distribution is concentrated at the global minimizers (one may see of these in one hemisphere), and there are no modes at the global minimizers and .
It is also illustrative to do the same plots using spherical coordinates:
[TABLE]
In Figure 2 we plot the Motzkin form in spherical coordinates (top left), as well as the optimal density function that corresponds to (top right), (bottom left), and (bottom right).
For example, when one can see the modes (peaks) of the density that correspond to the global minimizers . (Note that the peaks at and correspond to the same mode of the density, due to periodicity.) Likewise when and one may see modes corresponding to and .
The convergence rate of the bounds $\overline{f}^{(r)}$ was investigated by Doherty and Wehner [4], who showed
$$\overline{f}^{(r)} - f_{\min} = O\left(\frac{1}{r}\right) \tag{6}$$
when $f$ is a homogeneous polynomial. As we will briefly recap in Section 2.1, their result follows in fact as a byproduct of their analysis of another Lasserre hierarchy of bounds for $f_{\min}$, namely the lower bounds (9) below.
Our main contribution in this paper is to show that the convergence rate of the bounds $\overline{f}^{(r)}$ is $O(1/r^2)$ for any polynomial $f$ and, moreover, that this analysis is tight for any (nonzero) linear polynomial $f$. This is summarized in the following theorem.
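Theorem 1.
- (i) For any polynomial $f$ we have
$$\overline{f}^{(r)} - f_{\min} = O\left(\frac{1}{r^2}\right).$$
- (ii) For any (nonzero) linear polynomial $f$ we have
$$\overline{f}^{(r)} - f_{\min} = \Omega\left(\frac{1}{r^2}\right).$$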
Let us say a few words about the proof technique. For the first part (i), our analysis relies on the following two basic steps: first, we observe that it suffices to consider the case when $f$ is linear (which follows using Taylor’s theorem), and then we show how to reduce to the case of minimizing a linear univariate polynomial over the interval $[-1,1]$, where we can rely on the analysis completed in [9]. For the second part (ii), by exploiting a connection recently mentioned in [18] between the bounds (3) and cubature rules, we can rely on known results for cubature rules on the unit sphere to show tightness of the bounds.
Organization of the paper. In Section 2 we recall some previously known results that are most relevant to this paper. First we give in Section 2.1 a brief recap of the approach of Doherty and Wehner [4] for analysing bounds for polynomial optimization over the unit sphere. After that, we recall our earlier results about the quality of the bounds (3) in the case of the interval $[-1,1]$. Section 3 contains our main results about the convergence analysis of the bounds (3) for the unit sphere: after showing in Section 3.1 that the convergence rate is in $O(1/r^2)$, we prove in Section 3.2 that the analysis is tight for nonzero linear polynomials.
2 Preliminaries
2.1 The approach of Doherty & Wehner for the sphere
Here we briefly sketch the approach followed by Doherty and Wehner [4] for showing the convergence rate mentioned above in (6). Their approach applies to the case when is a homogeneous polynomial, which enables using the tensor analysis framework. A first observation made in [4] is that we may restrict to the case when has even degree, because if is homogeneous with odd degree then we have
[TABLE]
So we now assume that is homogeneous with even degree .
The approach in [4] in fact also makes it possible to analyze the following hierarchy of lower bounds on $f_{\min}$:
[TABLE]
which are the usual sums-of-squares bounds for polynomial optimization (as introduced in [14, 22]). Here and throughout, denotes the Euclidean norm for real vectors. One can verify that (9) can be reformulated as
[TABLE]
(see [11]). For any integer we have
[TABLE]
The following error estimate is shown on the range in [4].
Theorem 2.
[4] Assume and is a homogeneous polynomial of degree . There exists a constant (depending only on and ) such that, for any integer , we have
[TABLE]
where is the maximum value of taken over .
The starting point in the approach in [4] is reformulating the problem in terms of tensors. For this we need the following notion of ‘maximally symmetric matrix’. Given a real symmetric matrix indexed by sequences , is called maximally symmetric if it is invariant under action of the permutation group after viewing as a -tensor acting on . This notion is the analogue of the ‘moment matrix’ property, when expressed in the tensor setting. To see this, for a sequence , define by letting denote the number of occurrences of within the multi-set for each , so that . Then, the matrix is maximally symmetric if and only if each entry depends only on the -tuple . Following [4] we let denote the set of maximally symmetric matrices acting on .
It is not difficult to see that any degree homogeneous polynomial can be represented in a unique way as
[TABLE]
where the matrix is maximally symmetric.
Given an integer , define the polynomial , thus homogeneous with degree . The parameter (10) can now be reformulated as
[TABLE]
The approach in [4] can be sketched as follows. Let be an optimal solution to the program (11) (which exists since the feasible region is a compact set). Then the polynomial is a sum of squares since . After scaling, we obtain the polynomial
[TABLE]
which defines a probability density function on , i.e., . In this way provides a feasible solution for the program defining the upper bound . This thus implies the chain of inequalities
[TABLE]
The main contribution in [4] is their analysis for bounding the range between the two extreme values in the above chain and showing Theorem 2, which is done by using, in particular, Fourier analysis on the unit sphere.
Using different techniques we will show below a rate of convergence in $O(1/r^2)$ for the upper bounds $\overline{f}^{(r)}$, thus stronger than the rate in Theorem 2 above and applying to any polynomial (not necessarily homogeneous). On the other hand, while the constant involved in Theorem 2 depends only on the degree of $f$ and the dimension $n$, the constant in our result depends also on other characteristics of $f$ (its first and second order derivatives). A key ingredient in our analysis will be to reduce to the univariate case, namely to the optimization of a linear polynomial over the interval $[-1,1]$. Thus we next recall the relevant known results that we will need in our treatment.
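2.2 Convergence analysis for the interval $[-1,1]$
We start with recalling the following eigenvalue reformulation for the bound (3), which holds for general compact $K$ and plays a key role in the analysis for the case $K = [-1,1]$. For this consider the following inner product
$$\langle p, q \rangle = \int_K p(x)\,q(x)\,d\mu(x)$$
on the space of polynomials on $K$ and let $\{b_\alpha\}$ (with $b_\alpha$ of degree $|\alpha|$) denote a basis of this polynomial space that is orthonormal with respect to the above inner product; that is, $\langle b_\alpha, b_\beta \rangle = \delta_{\alpha,\beta}$. Then the bound (3) can be equivalently rewritten as
$$\overline{f}^{(r)}_K = \lambda_{\min}(A_f), \quad \text{where } A_f = \big(\langle f\, b_\alpha, b_\beta \rangle\big)_{|\alpha|, |\beta| \le r} \tag{12}$$
(see [16, 7]). Using this reformulation we could show in [7] that the bounds (3) have a convergence rate in $O(1/r^2)$ for the case of the interval $K = [-1,1]$ (and as an application also for the $n$-dimensional box $[-1,1]^n$).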
This result holds for a large class of measures on $[-1,1]$, namely those which admit a weight function $w_{a,b}(x) = (1-x)^a (1+x)^b$ (with $a, b > -1$) with respect to the Lebesgue measure. The corresponding orthogonal polynomials are known as the Jacobi polynomials $P_d^{(a,b)}$, where $d$ is their degree. The case $a = b = -1/2$ (resp., $a = b = 0$) corresponds to the Chebyshev polynomials (resp., the Legendre polynomials), and when $a = b = \lambda - 1/2$, the corresponding polynomials are the Gegenbauer polynomials $C_d^{(\lambda)}$, where $d$ is their degree. See, e.g., [6, Chapter 1] for a general reference about orthogonal polynomials.
The key fact is that, in the case of the univariate polynomial $f(x) = x$, the matrix in (12) has a tri-diagonal shape, which follows from the 3-term recurrence relationship satisfied by the orthogonal polynomials. In fact, it coincides with the so-called Jacobi matrix of the orthogonal polynomials in the theory of orthogonal polynomials and its eigenvalues are given by the roots of the degree $r+1$ orthogonal polynomial (see, e.g. [6, Chapter 1]). This fact is key to the following result.
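Theorem 3.
[7] Consider the measure $d\mu(x) = w_{a,b}(x)\,dx$ on the interval $[-1,1]$, where $w_{a,b}(x) = (1-x)^a(1+x)^b$ and $a, b > -1$. For the univariate polynomial $f(x) = x$, the parameter $\overline{f}^{(r)}$ is equal to the smallest root of the Jacobi polynomial $P_{r+1}^{(a,b)}$ (with degree $r+1$). In particular, $\overline{f}^{(r)} = -\cos\left(\frac{\pi}{2r+2}\right)$ when $a = b = -1/2$. For any $a, b > -1$ we have
$$\overline{f}^{(r)} - f_{\min} = \overline{f}^{(r)} + 1 = \Theta\left(\frac{1}{r^2}\right).$$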
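The Chebyshev case of Theorem 3 is easy to illustrate numerically. The sketch below checks that the closed form $-\cos(\pi/(2r+2))$ is indeed a root of the Chebyshev polynomial of degree $r+1$, and that the gap to the minimum $-1$, scaled by $(r+1)^2$, approaches $\pi^2/8$. The helper names are illustrative, and the limit $\pi^2/8$ is just the leading term of the Taylor expansion of the cosine, not a constant quoted from the paper.

```python
import math


def chebyshev_t(k, x):
    """Evaluate the Chebyshev polynomial T_k(x) by its three-term recurrence."""
    t_prev, t_cur = 1.0, x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2.0 * x * t_cur - t_prev
    return t_cur


def chebyshev_bound(r):
    """Closed form of the level-r upper bound for f(x) = x on [-1, 1]
    under the Chebyshev weight, as stated in Theorem 3."""
    return -math.cos(math.pi / (2 * r + 2))


for r in (1, 10, 100):
    bound = chebyshev_bound(r)
    residual = chebyshev_t(r + 1, bound)   # should vanish up to rounding
    scaled_gap = (bound + 1.0) * (r + 1) ** 2
    print(r, bound, residual, scaled_gap)

print("pi^2 / 8 =", math.pi ** 2 / 8)
```

The scaled gap settles quickly near $\pi^2/8 \approx 1.2337$, which is the $\Theta(1/r^2)$ behaviour in its simplest instance.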
3 Convergence analysis for the unit sphere
In this section we analyze the quality of the bounds $\overline{f}^{(r)}$ when minimizing a polynomial $f$ over the unit sphere $\mathbb{S}^{n-1}$. In Section 3.1 we show that the range $\overline{f}^{(r)} - f_{\min}$ is in $O(1/r^2)$ and in Section 3.2 we show that the analysis is tight for linear polynomials.
3.1 The bound $O(1/r^2)$
We first deal with the -variate linear (coordinate) polynomial and after that we will indicate how the general case can be reduced to this special case. The key idea is to get back to the analysis in Section 2.2, for the interval with an appropriate weight function. We begin with introducing some notation we need.
To simplify notation we set (which also matches the notation customary in the theory of orthogonal polynomials where usually is the number of variables). We let denote the unit ball in , where for . Given a scalar , define the -variate weight function
[TABLE]
(well-defined when ) and set
[TABLE]
so that is a probability measure over the unit ball . See, e.g., [6, Section 2.3.2] or [2, Section 11].
We will use the following simple lemma, which indicates how to integrate the -variate weight function along variables.
Lemma 1.
Fix and let . Then we have:
[TABLE]
which is thus equal to .
Proof.
Change variables and set for . Then we have and Putting things together and using relation (14) we obtain the desired result. ∎
We also need the following lemma, which relates integration over the unit sphere and integration over the unit ball and can be found, e.g., in [6, Lemma 3.8.1] and [2, Lemma 11.7.1].
Lemma 2.
Let be a -variate integrable function defined on and . Then we have:
[TABLE]
By combining these two lemmas we obtain the following result.
Lemma 3.
Let be a univariate polynomial and . Then we have:
[TABLE]
where we set
Proof.
Applying Lemma 2 to the function we get
[TABLE]
If then and the right hand side term in (15) is equal to
[TABLE]
as desired, since using and (by (14) and ). Assume now . Then the right hand side in (15) is equal to
[TABLE]
[TABLE]
where we have used Lemma 1 for the first equality. Finally we verify that the constant is equal to 1:
[TABLE]
(using relations (4) and (14)), and thus we arrive at the desired identity. ∎
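Lemma 3 can be checked numerically on even powers of $x_1$. The sketch below assumes the univariate weight is proportional to $(1-t^2)^{(n-3)/2}$ on $[-1,1]$, which is the classical density of one coordinate of a uniformly distributed point on $\mathbb{S}^{n-1}$; this explicit form and the function names are assumptions of the sketch. The sphere side uses the closed form for $\int x_1^{2k}$, and the interval side uses the substitution $t = \cos\varphi$ with a midpoint rule.

```python
import math


def sphere_even_moment(n, k):
    """Mean of x_1^(2k) for x uniform on the unit sphere in R^n:
    (2k-1)!! / (n (n+2) ... (n+2k-2))."""
    val = 1.0
    for j in range(k):
        val *= (2 * j + 1) / (n + 2 * j)
    return val


def interval_even_moment(n, k, steps=20000):
    """Mean of t^(2k) under the normalized weight (1 - t^2)^((n-3)/2) on [-1, 1].
    With t = cos(phi), weight times |dt| becomes sin(phi)^(n-2) dphi."""
    h = math.pi / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        phi = (i + 0.5) * h  # midpoint of the i-th subinterval of [0, pi]
        w = math.sin(phi) ** (n - 2)
        num += math.cos(phi) ** (2 * k) * w
        den += w
    return num / den


for n in (2, 3, 5):
    for k in (1, 2, 3):
        print(n, k, sphere_even_moment(n, k), interval_even_moment(n, k))
```

The two columns agree to quadrature accuracy, which is the content of the lemma for $g(t) = t^{2k}$; odd powers vanish on both sides by symmetry.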
We can now complete the convergence analysis for the minimization of on the unit sphere.
Lemma 4.
For the minimization of the polynomial over with , the order upper bound (3) satisfies
$$\overline{f}^{(r)} - f_{\min} = O\left(\frac{1}{r^2}\right).$$
Proof.
Let be an optimal univariate sum-of-squares polynomial of degree for the order upper bound corresponding to the minimization of over , when using as reference measure on the measure with weight function and (thus ). Applying Lemma 3 to the univariate polynomials and , we obtain
[TABLE]
and
[TABLE]
Since the function has the same global minimum over and over the sphere , we can apply Theorem 3 to conclude that
[TABLE]
∎
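The univariate quantity behind Lemma 4 can be computed directly. The sketch below assumes that the reduction leads to the weight $(1-t^2)^{(n-3)/2}$ on $[-1,1]$ (Gegenbauer parameter $\lambda = (n-2)/2$, so $n \ge 3$ here), builds the corresponding Jacobi matrix from the standard three-term recurrence, and finds its smallest eigenvalue, i.e. the smallest root of the degree $r+1$ orthogonal polynomial, by Sturm-sequence bisection. All function names are illustrative.

```python
def gegenbauer_offdiag_sq(n, r):
    """Squared off-diagonal entries beta_1..beta_r of the Jacobi matrix for the
    weight (1 - t^2)^((n-3)/2) on [-1, 1]. Assumes n >= 3."""
    lam = (n - 2) / 2.0
    return [k * (k + 2 * lam - 1) / (4.0 * (k + lam) * (k + lam - 1))
            for k in range(1, r + 1)]


def count_eigs_below(beta, x):
    """Number of eigenvalues smaller than x of the symmetric tridiagonal matrix
    with zero diagonal and squared off-diagonals beta (Sturm sequence)."""
    count = 0
    q = -x
    for i in range(len(beta) + 1):
        if i > 0:
            q = -x - beta[i - 1] / q
        if q == 0.0:
            q = 1e-300  # avoid division by zero at an exact pivot breakdown
        if q < 0.0:
            count += 1
    return count


def smallest_root(n, r, tol=1e-14):
    """Smallest root of the degree r+1 orthogonal polynomial, by bisection."""
    beta = gegenbauer_offdiag_sq(n, r)
    lo, hi = -1.0, 1.0  # all roots lie strictly inside (-1, 1)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_eigs_below(beta, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


for n in (3, 5):
    for r in (10, 50, 200):
        root = smallest_root(n, r)
        print(n, r, root, (root + 1.0) * (r + 1) ** 2)
```

For fixed $n$ the last column stays bounded as $r$ grows, in line with the $O(1/r^2)$ estimate; for $n = 3$ the polynomials are the Legendre polynomials.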
We now indicate how the analysis for an arbitrary polynomial reduces to the case of the linear coordinate polynomial . To see this, suppose is a global minimizer of over . Then, using Taylor’s theorem, we can upper estimate as follows:
[TABLE]
setting . Note that the upper estimate is a linear polynomial, which has the same minimum value as on , namely . From this it follows that and thus we may restrict to analyzing the bounds for a linear polynomial.
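The reduction to a linear polynomial can be spelled out as follows. This is a sketch under stated assumptions: the symbols $a$ (a global minimizer of $f$ on the sphere) and $C$ (any upper bound on the spectral norm of the Hessian of $f$ over the unit ball) are notation chosen here and may differ from the paper's.

```latex
% Assumptions: a is a global minimizer of f on the unit sphere, and
% \|\nabla^2 f(y)\|_2 \le C for all y in the unit ball.
\begin{align*}
f(x) &\le f(a) + \nabla f(a)^{T}(x-a) + \tfrac{C}{2}\,\|x-a\|^{2}
      && \text{(Taylor's theorem)} \\
     &= f(a) + \nabla f(a)^{T}(x-a) + C\,\bigl(1 - a^{T}x\bigr) =: g(x)
      && \text{(since } \|x-a\|^{2} = 2 - 2a^{T}x \text{ when } \|x\| = \|a\| = 1\text{)} .
\end{align*}
% g is linear in x, g(a) = f(a) = f_min, and g >= f on the sphere,
% so g has minimum value f_min on the sphere.
```

Since $f \le g$ on the sphere and every feasible density is nonnegative, the level-$r$ bound for $f$ is at most the level-$r$ bound for $g$, while both polynomials share the same minimum; the gap for $f$ is therefore at most the gap for the linear polynomial $g$.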
Next, assume is a linear polynomial, of the form with (up to scaling) . We can then apply a change of variables to bring into the form . Namely, let be an orthogonal matrix such that . Then the polynomial has the desired form and it has the same minimum value over as . As the sphere is invariant under any orthogonal transformation it follows that (applying Lemma 4 to ). Summarizing, we have shown the following.
Theorem 4.
For the minimization of any polynomial over with , the order upper bound (3) satisfies
$$\overline{f}^{(r)} - f_{\min} = O\left(\frac{1}{r^2}\right).$$
Note the difference from Theorem 2, where the constant depends only on the degree of $f$ and the number $n$ of variables; here the constant in $O(1/r^2)$ also depends on the polynomial $f$, namely on the norm of its gradient at a global minimizer of $f$ in $\mathbb{S}^{n-1}$ and on its second order derivatives.
3.2 The analysis is tight for linear polynomials
In this section we show, through an example, that the convergence rate cannot be better than $\Omega(1/r^2)$. The example is simply minimizing $x_1$ over the sphere $\mathbb{S}^{n-1}$. The key tool we use is a link between the bounds $\overline{f}^{(r)}$ and properties of some known cubature rules on the unit sphere. This connection, recently mentioned in [18], holds for any compact set $K$. It goes as follows.
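Suppose the points $x^{(1)}, \ldots, x^{(N)} \in K$ and the weights $w_1, \ldots, w_N > 0$ provide a (positive) cubature rule for $K$ for a given measure $\mu$, which is exact up to degree $d + 2r$, that is,
$$\int_K g(x)\,d\mu(x) = \sum_{i=1}^{N} w_i\, g\big(x^{(i)}\big)$$
for all polynomials $g$ with degree at most $d + 2r$. Then, for any polynomial $f$ with degree at most $d$, we have
$$\overline{f}^{(r)}_K \ge \min_{1 \le i \le N} f\big(x^{(i)}\big). \tag{16}$$
The argument is simple: if $h \in \Sigma[x]_r$ is an optimal sum-of-squares density for the parameter $\overline{f}^{(r)}_K$, then we have
$$1 = \int_K h(x)\,d\mu(x) = \sum_{i=1}^{N} w_i\, h\big(x^{(i)}\big),$$
$$\overline{f}^{(r)}_K = \int_K f(x)\,h(x)\,d\mu(x) = \sum_{i=1}^{N} w_i\, f\big(x^{(i)}\big)\, h\big(x^{(i)}\big) \ge \min_{1 \le i \le N} f\big(x^{(i)}\big).$$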
As a warm-up we consider the case $n = 2$, where we can use the cubature rule in Theorem 5 below for the unit circle. We use spherical coordinates $(x_1, x_2) = (\cos\theta, \sin\theta)$ to express a polynomial in $x$ as a polynomial in $\cos\theta$ and $\sin\theta$.
Theorem 5.
[2, Proposition 6.5.1] For each , the cubature formula
[TABLE]
is exact for all , i.e. for all polynomials of degree at most , restricted to the unit circle.
Using this cubature rule on we can lower bound the parameters for the minimization of over . Namely, by setting , we derive directly from the above theorem combined with relation (16) that
[TABLE]
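The warm-up on the circle can be reproduced numerically. The sketch below uses $N$ equally spaced angles with equal weights, a rule that integrates trigonometric polynomials of degree below $N$ exactly for any rotation of the nodes. Taking $N = 2r+2$ nodes and a half-step rotation are choices made for this sketch (so that no node sits at the minimizer $\theta = \pi$); they are not necessarily the parameters used with Theorem 5.

```python
import math


def circle_nodes(num_nodes, offset=0.0):
    """Equally spaced angles on the unit circle, rotated by `offset`."""
    return [offset + 2.0 * math.pi * j / num_nodes for j in range(num_nodes)]


def cubature_mean(g, nodes):
    """Equal-weight cubature approximation of (1 / 2pi) * integral of g."""
    return sum(g(t) for t in nodes) / len(nodes)


def lower_bound_x1(r):
    """Lower bound on the level-r upper bound for f = x_1 = cos(theta),
    obtained as the minimum of f over the cubature nodes."""
    num_nodes = 2 * r + 2          # exact up to trig degree 2r + 1 = deg(f) + 2r
    offset = math.pi / num_nodes   # half-step rotation (assumption of this sketch)
    return min(math.cos(t) for t in circle_nodes(num_nodes, offset))


r = 5
nodes = circle_nodes(2 * r + 2, math.pi / (2 * r + 2))
# Exactness: cos(k theta) and sin(k theta) have mean zero for 1 <= k <= 2r + 1.
worst = max(
    max(abs(cubature_mean(lambda t, k=k: math.cos(k * t), nodes)),
        abs(cubature_mean(lambda t, k=k: math.sin(k * t), nodes)))
    for k in range(1, 2 * r + 2)
)
print("largest cubature error up to degree 2r+1:", worst)
print("lower bound:", lower_bound_x1(r), " -cos(pi/(2r+2)):", -math.cos(math.pi / (2 * r + 2)))
```

With these choices the lower bound equals $-\cos(\pi/(2r+2))$, so the gap to the minimum $-1$ is of order $1/r^2$ and cannot be smaller.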
This reasoning extends to any dimension , by using product-type cubature formulas on the sphere . In particular we will use the cubature rule described in [2, Theorem 6.2.3], see Theorem 7 below.
We will need the generalized spherical coordinates given by
[TABLE]
where ( on ), , and ().
To define the nodes of the cubature rule on we need the Gegenbauer polynomials , where . Recall that these are the orthogonal polynomials with respect to the weight function
[TABLE]
on . We will not need the explicit expressions for the polynomials , we only need the following information about their extremal roots, shown in [7] (for general Jacobi polynomials, using results of [3, 5]). It is well known that each has distinct roots, lying in .
Theorem 6.
Denote the roots of the polynomial by . Then, .
The cubature rule we will use may now be stated.
Theorem 7.
[2, Theorem 6.2.3] Let be a polynomial of degree at most , and let
[TABLE]
be the expression of in the generalized spherical coordinates (17). Then
[TABLE]
where and the parameters are positive scalars as in relation (6.2.3) of [2].
We can now show the tightness of the convergence rate for the minimization of a coordinate polynomial on .
Theorem 8.
Consider the problem of minimizing the coordinate polynomial on the unit sphere with . The convergence rate for the parameters (3) satisfies
$$\overline{f}^{(r)} - f_{\min} = \Omega\left(\frac{1}{r^2}\right).$$
Proof.
We have , so that . Using (16) we obtain that
[TABLE]
where we use the fact that (Theorem 6).
∎
4 Implications for the generalized problem of moments
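In this section, we describe the implications of our results for the generalized problem of moments (GPM), defined as follows for a compact set $K \subset \mathbb{R}^n$.
$$\mathrm{val} := \inf_{\nu \in \mathcal{P}(K)} \left\{ \int_K f_0(x)\,d\nu(x) \;:\; \int_K f_i(x)\,d\nu(x) = b_i \quad \forall i \in [m] \right\}, \tag{19}$$
where
- the functions $f_i$ ($i = 0, 1, \ldots, m$) are continuous on $K$;
- $\mathcal{P}(K)$ denotes the convex cone of probability measures supported on the set $K$;
- the scalars $b_i \in \mathbb{R}$ ($i \in [m]$) are given.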
As before, we are interested in the special case where . This special case is already of independent interest, since it contains the problem of finding cubature schemes for numerical integration on the sphere, see e.g. [10] and the references therein. Our main result in Theorem 4 has the following implication for the GPM on the sphere, as a corollary of the following result in [13] (which applies to any compact , see also [10] for a sketch of the proof in the setting described here).
Theorem 9 (De Klerk-Postek-Kuhn [13]).
Assume that are polynomials, is compact, is a Borel measure supported on , and the GPM (19) has an optimal solution. Given , define the parameter
[TABLE]
setting . If, for any polynomial , we have
[TABLE]
where , then the parameters satisfy: .
As a consequence of our main result in Theorem 4, combined with Theorem 9, we immediately obtain the following corollary.
Corollary 1.
Assume that are polynomials, , and the GPM (19) has an optimal solution. Then, for any integer , there is an such that
[TABLE]
Minimization of a rational function on is a special case of the GPM where we may prove a better rate of convergence. In particular, we now consider the global optimization problem:
[TABLE]
where are polynomials such that , and is compact.
It is well-known that one may reformulate this problem as the GPM with and , , and , i.e.:
[TABLE]
Analogously to (3), we now define the hierarchy of upper bounds on as follows:
[TABLE]
where is a Borel measure supported on .
Theorem 10.
Consider the rational optimization problem (20). If, for any polynomial , it holds that
[TABLE]
where , then one also has . In particular, if , then .
Proof.
Consider the polynomial
[TABLE]
Then for all , and , with global minimizer given by the minimizer of problem (20).
Now, for given , let be such that , and , where is the reference measure for . Setting
[TABLE]
one has and . Thus is feasible for problem (21). Moreover, by construction,
[TABLE]
The final result for the special case and (surface measure) now follows from our main result in Theorem 4. ∎
5 Concluding remarks
In this paper we have improved on the $O(1/r)$ convergence result of Doherty and Wehner [4] for the Lasserre hierarchy of upper bounds (3) for (homogeneous) polynomial optimization on the sphere. Having said that, Doherty and Wehner also showed that the hierarchy of lower bounds (9) of Lasserre satisfies the same rate of convergence, due to Theorem 2. In view of the fact that we could show the improved $O(1/r^2)$ rate for the upper bounds, and the fact that the lower bounds hierarchy empirically converges much faster in practice, one would expect that the lower bounds (9) also converge at a rate no worse than $O(1/r^2)$. However, our analysis does not allow us to analyse the convergence of the lower bound hierarchy, and this remains an interesting open problem.
Another open problem is the exact rate of convergence of the bounds in Theorem 9 for the generalized problem of moments (GPM). In our analysis of the GPM on the sphere in Corollary 1, we could only obtain $O(1/r)$ convergence, which is a square root worse than the special cases for polynomial and rational function minimization. We do not know at the moment if this is a weakness of the analysis or inherent to the GPM.
Note that if we pick another reference measure whose density with respect to the surface measure is strictly positive on the sphere, then the convergence rates with respect to both measures have the same behaviour (up to a multiplicative constant). It would be interesting to understand the convergence rate for more general reference measures.
Acknowledgement
This work has been supported by European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement 813211 (POEMA).
- [1] A. A. Ahmadi, A. Olshevsky, P. A. Parrilo, and J. N. Tsitsiklis. NP-hardness of deciding convexity of quartic polynomials and related problems. Mathematical Programming 137 (2013), no. 1-2, 453–476.
- [2] F. Dai and Y. Xu. Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer, New York (2013).
- [3] D.K. Dimitrov and G.P. Nikolov. Sharp bounds for the extreme zeros of classical orthogonal polynomials. Journal of Approximation Theory 162 (2010), 1793–1804.
- [4] A.C. Doherty and S. Wehner. Convergence of SDP hierarchies for polynomial optimization on the hypersphere. arXiv:1210.5048v2 (2013).
- [5] K. Driver and K. Jordaan. Bounds for extreme zeros of some classical orthogonal polynomials. Journal of Approximation Theory 164 (2012), 1200–1204.
- [6] C.F. Dunkl and Y. Xu. Orthogonal Polynomials of Several Variables. Encyclopedia of Mathematics, Cambridge University Press (2001).
- [7] E. de Klerk and M. Laurent. Comparison of Lasserre’s measure-based bounds for polynomial optimization to bounds obtained by simulated annealing. arXiv:1703.00744, to appear in Mathematics of Operations Research.
- [8] E. de Klerk, R. Hess and M. Laurent. Improved convergence rates for Lasserre-type hierarchies of upper bounds for box-constrained polynomial optimization. SIAM Journal on Optimization 27 (2017), no. 1, 347–367.
