DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization
Amir Ali Ahmadi, Anirudha Majumdar

TL;DR
This paper introduces DSOS and SDSOS optimization as scalable, tractable alternatives to sum of squares methods, enabling larger problem solving with acceptable accuracy tradeoffs in various fields.
Contribution
It proposes DSOS and SDSOS as LP and SOCP-based alternatives to SOS, extending scalability while maintaining key theoretical properties.
Findings
Numerical experiments show scalability beyond traditional SOS methods.
Tradeoffs between accuracy and computational efficiency are demonstrated.
Theoretical results extend SOS properties to DSOS and SDSOS.
Abstract
In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization.…
| n = 10 | n = 15 | n = 20 | n = 25 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | |
| DSOS | -5.31 | -10.96 | -18.012 | -26.45 | -36.85 | -62.30 | -94.26 | -133.02 | -178.23 |
| SDSOS | -5.05 | -10.43 | -17.33 | -25.79 | -36.04 | -61.25 | -93.22 | -131.64 | -176.89 |
| 1-DSOS | -4.96 | -9.22 | -15.72 | -23.58 | NA | NA | NA | NA | NA |
| 1-SDSOS | -4.21 | -8.97 | -15.29 | -23.14 | NA | NA | NA | NA | NA |
| SOS | -1.92 | -3.26 | -3.58 | -3.71 | NA | NA | NA | NA | NA |
| BARON | -175.41 | -1079.89 | -5287.88 | - | -28546.1 | - | - | - | - |
| n = 10 | n = 15 | n = 20 | n = 25 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | |
| DSOS | 0.30 | 0.38 | 0.74 | 15.51 | 7.88 | 10.68 | 25.99 | 58.10 | 342.76 |
| SDSOS | 0.27 | 0.53 | 1.06 | 8.72 | 5.65 | 18.66 | 47.90 | 109.54 | 295.30 |
| 1-DSOS | 0.92 | 6.26 | 37.98 | 369.08 | |||||
| 1-SDSOS | 1.53 | 14.39 | 82.30 | 538.5389 | |||||
| SOS | 0.24 | 5.60 | 82.22 | 1068.66 | |||||
| BARON | 0.35 | 0.62 | 3.69 | - | - | - | - | - | - |
| Polya LP | r-DSOS | r-SDSOS | SOS | |
| 6.000 | 6.000 | 3.2362 | ||
| 4.333 | 4.333 | NA | ||
| 6.000 | 3.8049 | 3.6964 | NA |
| DSOS | SDSOS | SOS | |
|---|---|---|---|
| 35.11 | 33.92 | 21.28 | |
| 14.86 | 12.94 | NA |
| Exact | SDP | SDP | SDP | SDDP | DDP | |
| MOSEK | SDPNAL+ (tol=) | SDPNAL+ (tol=) | ||||
| 21.51 | 21.51 | 21.72 | 21.51 | 21.51 | 132.63 | |
| 17.17 | 17.17 | 18.07 | 17.17 | 17.17 | 132.63 | |
| 13.20 | 13.20 | 12.37 | 13.20 | 13.20 | 132.63 | |
| 9.84 | 9.84 | 9.66 | 9.84 | 9.85 | 132.63 | |
| 7.30 | 7.30 | 7.90 | 7.30 | 7.30 | 132.63 |
| SDP | SDP | SDP | SDDP | DDP | |
| MOSEK | SDPNAL+ (tol=) | SDPNAL+ (tol=) | |||
| Upper bound | 18.76 | 18.88 | 18.88 | 19.42 | 252.24 |
| Running time | 2502.6 s | 2117.3 s | 6221.7 s | 24.36 s | 11.85 s |
| expl. var. | |||||||||||
| PCA, PC1 | 0.116 | 0.116 | 0.116 | 0.116 | -0.395 | -0.395 | -0.395 | -0.395 | -0.401 | -0.401 | 60.0 % |
| PCA, PC2 | -0.478 | -0.478 | -0.478 | -0.478 | -0.145 | -0.145 | -0.145 | -0.145 | 0.010 | 0.010 | 39.6 % |
| Other, PC1 | 0 | 0 | 0 | 0 | -0.5 | -0.5 | -0.5 | -0.5 | 0 | 0 | 40.9 % |
| Other, PC2 | 0.5 | 0.5 | 0.5 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 39.5 % |
| DDP | SDDP | DSPCA | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. | Time (s) | NNZ | Opt. | Expl. | Time (s) | NNZ | Opt. | Expl. | Time (s) | NNZ | Opt. | Expl. |
| 1 | 0.89 | 4 | 35.5 | 0.067 % | 1.40 | 14 | 30.6 | 0.087 % | 1296.9 | 5 | 30.5 | 0.086 % |
| 2 | 1.18 | 4 | 36.6 | 0.071 % | 1.42 | 14 | 34.4 | 0.099 % | 1847.3 | 6 | 33.6 | 0.092 % |
| 3 | 1.46 | 4 | 49.5 | 0.079 % | 1.40 | 24 | 41.3 | 0.097 % | 1633.0 | 16 | 40.2 | 0.096 % |
| 4 | 1.14 | 4 | 44.1 | 0.072 % | 1.80 | 17 | 38.7 | 0.10 % | 1984.7 | 8 | 37.6 | 0.091 % |
| 5 | 1.10 | 4 | 36.7 | 0.060 % | 1.53 | 34 | 33.0 | 0.068 % | 2179.6 | 10 | 31.7 | 0.105 % |
| 2N ( states) | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 |
|---|---|---|---|---|---|---|---|---|---|---|
| DSOS | 0.44 | 2.04 | 3.08 | 9.67 | 25.1 | 74.2 | 200.5 | 492.0 | 823.2 | |
| SDSOS | 0.72 | 6.72 | 7.78 | 25.9 | 92.4 | 189.0 | 424.74 | 846.9 | 1275.6 | |
| SOS (SeDuMi) | 3.97 | 156.9 | 1697.5 | 23676.5 | ||||||
| SOS (MOSEK) | 0.84 | 16.2 | 149.1 | 1526.5 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\newsiamremark
remarkRemark \newsiamremarkhypothesisHypothesis
\newsiamthmclaimClaim \headersDSOS and SDSOS OptimizationA. A. Ahmadi and A. Majumdar
\newsiamthmExampleExample
DSOS and SDSOS Optimization:
More Tractable Alternatives to
Sum of Squares and Semidefinite Optimization††thanks: An extended abstract for this work has appeared in [4]. \funding Amir Ali Ahmadi was partially supported by the DARPA Young Faculty Award, the Sloan Fellowship, the NSF CAREER Award, the AFOSR Young Investigator Program Award, and the Google Research Award.
Amir Ali Ahmadi Department of Operations Research and Financial Engineering, Princeton University. (, http://aaa.princeton.edu/). [email protected]
Anirudha Majumdar Department of Mechanical and Aerospace Engineering, Princeton University. (, https://irom-lab.princeton.edu/majumdar/). [email protected]
Abstract
In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization. Furthermore, we show with numerical experiments from diverse application areas—polynomial optimization, statistics and machine learning, derivative pricing, and control theory—that with reasonable tradeoffs in accuracy, we can handle problems at scales that are currently significantly beyond the reach of traditional sum of squares approaches. Finally, we provide a review of recent techniques that bridge the gap between our DSOS/SDSOS approach and the SOS approach at the expense of additional running time. The Supplementary Material of the paper introduces an accompanying MATLAB package for DSOS and SDSOS optimization.
keywords:
Sum of squares optimization, polynomial optimization, nonnegative polynomials, semidefinite programming, linear programming, second order cone programming.
{AMS}
65K05, 90C25, 90C22, 90C05, 90C90, 12Y05, 93C85, 90C27
1 Introduction
For which values of the real coefficients is the polynomial
[TABLE]
nonnegative, i.e., satisfies for all ? The problem of optimizing over nonnegative polynomials—of which our opening question is a toy example—is fundamental to many problems of applied and computational modern mathematics. In such an optimization problem, one would like to impose constraints on the coefficients of an unknown polynomial so as to make it nonnegative, either globally in (as in the above example), or on a certain basic semialgebraic set, i.e., a subset of defined by a finite number of polynomial inequalities. We will demonstrate shortly (see Section 1.1) why optimization problems of this kind are ubiquitous in applications and universal in the study of questions dealing in one way or another with polynomial equations and inequalities.
Closely related to nonnegative polynomials are polynomials that are sums of squares. We say that a polynomial is a sum of squares (sos) if it can be written as for some (finite number of) polynomials . For example, the polynomial in (1) with is sos since it admits the decomposition
[TABLE]
The relationship between nonnegative and sum of squares polynomials has been a classical subject of study in real algebraic geometry. For example, a result of Hilbert from 1888 [47] states that all nonnegative bivariate polynomials of degree four are sums of squares. It follows, as a special case, that the sets of coefficients for which the polynomial in (1) is nonnegative or a sum of squares in fact coincide. In the general situation, however, while sum of squares polynomials are clearly always nonnegative, it is not true that the converse always holds. This was shown in the same paper of Hilbert [47], where he gave a non-constructive proof of existence of nonnegative polynomials that are not sums of squares. Explicit examples of such polynomials appeared many years later, starting with the work of Motzkin [70] in the 1960s. Hilbert’s interest in this line of research is also showcased in his 17th problem, which asks whether every nonnegative polynomial is a sum of squares of rational functions. We refer the interested reader to an outstanding survey paper of Reznick [88], which covers many historical aspects around Hilbert’s 17th problem, including the affirmative solution by Artin [8], as well as several later developments.
The classical questions around nonnegative and sum of squares polynomials have been revisited quite extensively in the past 10-15 years in different communities among applied and computational mathematicians. The reason for this renewed interest is twofold: (i) the discovery that many problems of modern practical interest can be cast as optimization problems over nonnegative polynomials, and (ii) the observation that while optimizing over nonnegative polynomials is generally NP-hard, optimization over the set of sum of squares polynomials can be done via semidefinite programming (SDP); see Theorem 2.1 in Section 2. The latter development, originally explored in the pioneering works of Shor [96], Nesterov [72], Parrilo [78],[79], and Lasserre [54], has led to the creation of sum of squares optimization—a computational framework, with semidefinite programming as its underlying engine, that can tackle many fundamental problems of real algebra and polynomial optimization.
The dependence of sum of squares approaches on semidefinite programming can be viewed as both a strength and a weakness depending on one’s perspective. From a computational complexity viewpoint, semidefinite programs can be solved with arbitrary accuracy in polynomial time using interior point methods (see [102] for a comprehensive survey). As a result, sum of squares techniques offer polynomial time algorithms that approximate a very broad class of NP-hard problems of interest. From a more practical viewpoint however, SDPs are among the most expensive convex relaxations to solve. The speed and reliability of the current SDP solvers lag behind those for other more restricted classes of convex programs (such as linear or second order cone programs) by a wide margin. With the added complication that the SDPs generated by sos problems are large (see Section 2), scalability has become the single most outstanding challenge for sum of squares optimization in practice.
In this paper, we focus on the latter of the two viewpoints mentioned above and offer alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. While these alternatives are more conservative in general, they are significantly more scalable. Our hope is that the proposed approaches will expand the use and applicability of algebraic techniques in optimization to new areas and share its appeal with a broader audience. We call our new computational frameworks, which rely on linear and second order cone programming, DSOS and SDSOS optimization111We use and recommend the pronunciation “d-sauce” and “s-d-sauce”.. These are short for diagonally-dominant-sum-of-squares and scaled-diagonally-dominant-sum-of-squares; see Section 3 for precise definitions. While these tools are primarily designed for sum of squares optimization, they are also applicable to general applications of semidefinite programming where tradeoffs between scalability and performance may be desirable. In the interest of motivating our contributions for a diverse audience, we delay a presentation of these contributions until Section 3 and start instead with a portfolio of problem areas involving nonnegative polynomials. Any such problem area is one to which sum of squares optimization, as well as its new DSOS and SDSOS counterparts, are directly applicable.
1.1 Why optimize over nonnegative polynomials?
We describe several motivating applications in this section at a high level. These will be revisited later in the paper with concrete computational examples; see Section 4.
Polynomial optimization
A polynomial optimization problem (POP) is an optimization problem of the form
[TABLE]
where , , and are multivariate polynomials. The special case of problem (2) where the polynomials all have degree one is of course linear programming, which can be solved very efficiently. For higher degrees, POP contains as special cases many important problems in operations research; e.g., the optimal power flow problem in power engineering [49], the computation of Nash equilibria in game theory [53], [80], problems of Euclidean embedding and distance geometry [61], and a host of problems in combinatorial optimization. We observe that if we could optimize over the set of polynomials that are nonnegative on a closed basic semialgebraic set, then we could solve the POP problem to global optimality. To see this, note that the optimal value of problem (2) is equal to the optimal value of the following problem:
[TABLE]
Here, we are trying to find the largest constant such that the polynomial is nonnegative on the set ; i.e., the largest lower bound on problem (2).
Combinatorial optimization
Proofs of nonnegativity of polynomials of degree as low as four provide infeasibility certificates for all decision problems in the complexity class NP, including many well-known combinatorial optimization problems. Consider, e.g., the simple-to-describe, NP-complete problem of PARTITION [37]: Given a set of integers , decide if they can be split into two sets with equal sums. It is straightforward to see that a PARTITION instance is infeasible if and only if the degree-four polynomial
[TABLE]
satisfies , . Indeed, is by construction a sum of squares and hence nonnegative. If it were to have a zero, each square in would have to evaluate to zero; but this can happen if and only if there is a binary vector , which makes , giving a yes answer to the PARTITION instance. Suppose now that for a given instance, and for some , we could prove that is nonnegative. Then we would have proven that our PARTITION instance is infeasible, potentially without trying out all ways of splitting the integers into two sets.
Control systems and robotics
Numerous fundamental problems in nonlinear dynamics and control, such as stability, robustness, collision avoidance, and controller design can be turned into problems about finding special functions—the so-called Lyapunov functions (see, e.g., [51])—that satisfy certain sign conditions. For example, given a differential equation , where , and with the origin as an equilibrium point (i.e., satisfying ), consider the “region of attraction (ROA) problem”: Determine the set of initial states in from which the trajectories flow to the origin. Lyapunov’s stability theorem (see, e.g., [51, Chap. 4]) tells us that if we can find a (Lyapunov) function , which together with its gradient satisfies
[TABLE]
then the set is part of the region of attraction. If is a polynomial function (a very important case in applications [78]), and if we parameterize as a polynomial function, then the search for the coefficients of satisfying the conditions in (4) is an optimization problem over the set of nonnegative polynomials. Designing stable equilibrium points with large regions of attraction is a fundamental problem in control engineering and robotics; see, e.g., [50], [65].
Statistical regression with shape constraints
A problem that arises frequently in statistics is that of fitting a function to a set of data points with minimum error, while ensuring that it meets some structural properties, such as nonnegativity, monotonicity, or convexity [41], [42]. Requirements of this type are typically imposed either as regularizers (to avoid overfitting), or more importantly as a result of prior knowledge that the true function to be estimated satisfies the same structural properties. In economics, for example, a regression problem for estimating the utility of consumers from sample measurements would come with a natural concavity requirement on the utility function. When the regression functions are polynomials, many such structural properties lead to polynomial nonnegativity constraints. For example, a polynomial can be constrained to be concave by requiring its Hessian matrix , which is an symmetric matrix with polynomial entries, to be negative semidefinite for all . This condition is easily seen to be equivalent to the polynomial in the variables and being nonnegative.
Probability bounds given moments
The well-known inequalities of Markov and Chebyshev in probability theory bound the probability that a univariate random variable takes values in a subset of the real line given the first, or the first and second moments of its distribution [13]. Consider a vast generalization of this problem where one is given a finite number of moments of a multivariate multivariate random variable and is interested in bounding the probability of an event , where is a general basic semialgebraic subset of the sample space . A sharp upper bound on this probability can be obtained by solving the following optimization problem for the coefficients of an -variate degree- polynomial (see [17], [84]):
[TABLE]
It is easy to see that any feasible solution of problem (5) gives an upper bound on the probability of the event as the polynomial is by construction placed above the indicator function of the set and the expected value of this indicator function is precisely the probability of the event . Note that the constraints in (5) are polynomial nonnegativity conditions.
Other applications
The application areas outlined above are only a few examples in a long list of problems that can be tackled by optimizing over the set of nonnegative polynomials. Other interesting problems on this list include: computation of equilibria in games with continuous strategy spaces [80], distinguishing separable and entangled states in quantum information theory [33], computing bounds on sphere packing problems in geometry [9], software verification [92], volume computation [45], robust and stochastic optimization [15], filter design [99], combinatorics [86], [40], and automated theorem proving [44]. A great reference for many applications in this area, as well as the related theoretical and computational background, is a recent volume edited by Blekherman, Parrilo, and Thomas [18].
Organization of the paper
The remainder of this paper is structured as follows. In Section 2, we briefly review the sum of squares approach and its relation to semidefinite programming for the general reader. We also comment on its challenges regarding scalability and prior work that has tackled this issue. The familiar reader can skip to Section 3, where our contributions begin. In Section 3.1, we introduce the cones of dsos and sdsos polynomials and demonstrate that we can optimize over them using linear and second order cone programming respectively. Section 3.2 introduces hierarchies of cones based on dsos and sdsos polynomials that can better approximate the cone of nonnegative polynomials. Asymptotic guarantees on these cones are also presented. Similar to the sum of squares hierarchy, these guarantees can be turned into a hierarchy of converging lower bounds for polynomial optimization problems that use linear or second order cone programming in each step.
Section 4 presents numerical experiments that use our approach on large-scale examples from polynomial optimization (Section 4.1), combinatorial optimization (Section 4.2), statistics and machine learning (Sections 4.3 and 4.5), financial mathematics (Section 4.4), and control theory and robotics (Section 4.6). Section 5 provides a review of recent techniques for bridging the gap between the DSOS/SDSOS approach and the SOS approach at the expense of additional computational costs. Section 6 concludes the paper. The Supplementary Material introduces iSOS (“inside SOS”), an accompanying MATLAB package for DSOS and SDSOS optimization written using the SPOT toolbox [68].
2 Review of the semidefinite programming-based approach and computational considerations
As mentioned earlier, sum of squares optimization handles a (global) nonnegativity constraint on a polynomial by replacing it with the more restrictive requirement that be a sum of squares. The situation where is only constrained to be nonnegative on a certain basic semialgebraic set222In this formulation, we have avoided equality constraints for simplicity. Obviously, there is no loss of generality in doing this as an equality constraint can be imposed by the pair of inequality constraints .
[TABLE]
can also be handled with the help of appropriate sum of squares multipliers. For example, if we succeed in finding sos polynomials , such that
[TABLE]
then we have found a certificate of nonnegativity of on the set . Indeed, if we evaluate the above expression at any , nonnegativity of the polynomials imply that . A Positivstellensatz from real algebraic geometry due to Putinar [85] states that if the set satisfies the so-called Archimedean property, a property only slightly stronger than compactness (see, e.g., [58] for a precise definition), then every polynomial positive on has a representation of the type (6), for some sos polynomials of high enough degree (see also [74] for degree bounds). Even with no qualifications whatsoever regarding the set , there are other Positivstellensätze (e.g., due to Stengle [97]) that certify nonnegativity of a polynomial on a basic semialgebraic set using sos polynomials. These certificates are only slightly more complicated than (6) and involve sos multipliers associated with products among polynomials that define [79]. A great reference for the interested reader is the survey paper by Laurent [58].333Our review material in this section does not do justice to the “dual” theory of generalized moment problems; in addition to the survey paper by Laurent, the interested reader is referred to the recent textbooks by Lasserre [55], [56] and references therein.
The computational advantage of a certificate of (global or local) nonnegativity via sum of squares polynomials is that it can be automatically found by semidefinite programming. The following well-known theorem establishes the link between sos polynomials and SDP. Let us denote the set of symmetric matrices by . Recall that a matrix is positive semidefinite (psd) if , and that semidefinite programming is the problem of optimizing a linear function over psd matrices subject to affine inequalities on their entries [102]. We denote the positive semidefiniteness of a matrix with the standard notation .
Theorem 2.1** (see, e.g., [78],[79]).**
A multivariate polynomial in variables and of degree is a sum of squares if and only if there exists a symmetric matrix (often called the Gram matrix) such that
[TABLE]
where is the vector of monomials of degree up to :
[TABLE]
Searching for a matrix satisfying the positive semidefiniteness constraint, as well as the linear equality constraints coming from (7), amounts to solving a semidefinite program. The size of the matrix in this theorem is
[TABLE]
which approximately equals . While this number is polynomial in for fixed , it can grow rather quickly even for low-degree polynomials. For example, a degree- polynomial () in variables has 316251 coefficients and its Gram matrix, which would need to be positive semidefinite, contains 879801 decision variables. A semidefinite constraint of this size is quite expensive, and in fact a problem with variables is well beyond the current realm of possibilities in SOS optimization. In the absence of problem structure, sum of squares problems involving degree- or polynomials are currently limited, roughly speaking, to a handful or a dozen variables.
There have been many contributions already to improvements in scalability of sum of squares techniques. One approach has been to develop systematic techniques for taking advantage of problem structure, such as sparsity or symmetry of the underlying polynomials, to reduce the size of the SDPs [38, 101, 28, 90]. These techniques have proven to be very useful as problems arising in practice sometimes do come with a lot of structure. Another approach which holds promise has been to design customized solvers for SOS programs that avoid resorting to an off-the-shelf interior point solver. These techniques can often lead to improvements for special classes of problems. Examples in this direction include the works in [14, 75, 104, 60, 105, 103, 76, 46]. There has also been recent work by Lasserre et al. that increases scalability of sum of squares optimization problems at the cost of accuracy of the solutions obtained. This is done by bounding the size of the largest SDP constraint appearing in the sos formulation, and leads to what the authors call the BSOS (bounded SOS) hierarchy [57].
The approach we take in this paper for enhancing scalability is orthogonal to the ones mentioned above (and can potentially later be combined with them). We propose to eschew sum of squares decompositions to begin with. In our view, almost all application areas of this field use sum of squares decompositions not for their own sake, but because they provide a means to polynomial nonnegativity. Hence, this paper is motivated by a natural question: Can we give other sufficient conditions for polynomial nonnegativity that are perhaps more restrictive than a sum of squares decomposition, but cheaper to work with?444Given this motivation, we note that our numerical experiments in Section 4 compare our approach primarily with the standard and widely-used semidefinite programming-based approach to sum of squares optimization. They are certainly not meant to be exhaustive. We believe that comparisons with the special-purpose techniques highlighted above and with other solvers would be interesting but are better suited for a separate work given our space limitations.
In the next section, we identify some subsets of the cone of nonnegative polynomials that one can optimize over using linear programming (LP) and second order cone programming (SOCP) [62], [7]. Not only are these much more efficiently solvable classes of convex programs, but they are also superior to semidefinite programs in terms of numerical conditioning. In addition, working with these classes of convex programs allows us to take advantage of high-performance LP and SOCP solvers (e.g., CPLEX [25]) that have been matured over several decades because of industry applications. We remark that while there have been other approaches to produce LP hierarchies for polynomial optimization problems (e.g., based on the Krivine-Stengle certificates of positivity [52, 97, 55]), these LPs, though theoretically significant, are typically quite weak in practice and often numerically ill-conditioned [57]. Finally, we remark that for a range of combinatorial optimization problems, there exist interesting algebraic approaches that do not rely on SDPs and only involve linear algebraic operations [30], [29]. While our techniques are similar in the spirit of avoiding SDPs, they are not tied to combinatorial (or polynomial) optimization problems.
3 DSOS and SDSOS Optimization
We start with some basic definitions. A monomial in variables is a function of the form , where each is a nonnegative integer. The degree of such a monomial is by definition . A polynomial is a finite linear combination of monomials . The degree of a polynomial is the maximum degree of its monomials. A form or a homogeneous polynomial is a polynomial whose monomials all have the same degree. For any polynomial of degree , one can define its homogenized version by introducing a new variable and defining . One can reverse this operation (dehomogenize) by setting the homogenizing variable equal to one: . The properties of being nonnegative and a sum of squares (as defined earlier) are preserved under homogenization and dehomogenization [88]. We say that a form is positive definite if for all . We denote by and the set of nonnegative (a.k.a. positive semidefinite) and sum of squares polynomials in variables and degree respectively. (Note that odd-degree polynomials can never be nonnegative.) Both these sets form proper (i.e., convex, closed, pointed, and solid) cones in with the obvious inclusion relationship .
The basic idea behind our approach is to approximate from the inside with new sets that are more tractable for optimization purposes. Towards this goal, one may think of several natural sufficient conditions for a polynomial to be a sum of squares. For example, the following may be natural first candidates:
- •
The set of polynomials that are sums of 4-th powers of polynomials:
,
- •
The set of polynomials that are a sum of a constant number of squares of polynomials; e.g., a sum of three squares:
Even though both of these sets clearly reside inside the set of sos polynomials, they are not any easier to optimize over. In fact, they are much harder: testing whether a polynomial is a sum of 4-th powers is NP-hard already for quartics [48] (in fact, the cone of 4-th powers of linear forms is dual to the cone of nonnegative quartic forms [89]) and optimizing over polynomials that are sums of three squares is intractable (as this task even for quadratics subsumes the NP-hard problem of positive semidefinite matrix completion with a rank constraint [59]). These examples illustrate that in general, an inclusion relationship does not imply anything with respect to optimization complexity. Hence, we need to take some care when choosing which subsets of to work with. On the one hand, these subsets have to be “big enough” to be useful in practice; on the other hand, they should be more tractable to optimize over.
3.1 The cones of dsos and sdsos polynomials
We now describe two interesting cones inside that lend themselves to linear and second order cone representations and are hence more tractable for optimization purposes. These cones will also form the building blocks of some more elaborate cones (with improved performance) that we will present in Subsection 3.2 and Section 5.
Definition 3.1**.**
A polynomial is diagonally-dominant-sum-of-squares (dsos) if it can be written as
[TABLE]
*for some monomials and some nonnegative scalars . We denote the set of polynomials in variables and degree that are dsos by . *
Definition 3.2**.**
A polynomial is scaled-diagonally-dominant-sum-of-squares (sdsos) if it can be written as
[TABLE]
*for some monomials and some scalars with . We denote the set of polynomials in variables and degree that are sdsos by . *
From the definition, the following inclusion relationships should be clear:
[TABLE]
In general, all these containment relationships are strict. Figure 1 shows these sets for a family of bivariate quartics parameterized by two coefficients :555It is well known that all psd bivariate forms are sos [88] and hence the outer set exactly characterizes all values of and for which this polynomial is nonnegative.
[TABLE]
Some remarks about the two definitions above are in order. It is not hard to see that if has degree , the monomials in (8) or (9) never need to have degree higher than . In the special case where is homogeneous of degree (i.e., a form), the monomials can be taken to have degree exactly . We also note that decompositions given in (8) or (9) are not unique; there can even be infinitely many such decompositions but the definition requires just one. Finally, we have found out that interestingly, sdsos polynomials were studied in the 1970’s and 1980’s for purely algebraic reasons by Robinson [91], Reznick [87], and Choi, Lam, and Reznick [24]. These papers use the more descriptive terminology of “sum of binomial squares” (sbs), which is consistent with the decomposition in (9). In [87] for example, Reznick characterizes interesting families of forms that are sos if and only if they are sbs (i.e., sdsos). In [24], Choi, Lam, and Reznick provide a complete description of sbs forms and their extremal elements among the family of even symmetric sextics. In the process, they answer questions that Robinson had raised earlier [91] about sbs polynomials.
Our terminology in Definitions 3.1 and 3.2 comes from a connection (which we will soon establish) to the following classes of symmetric matrices.
Definition 3.3**.**
A symmetric matrix is diagonally dominant (dd) if
[TABLE]
*for all . A symmetric matrix is scaled diagonally dominant (sdd) if there exists a diagonal matrix , with positive diagonal entries, such that is dd.666The requirement that the diagonal matrix have positive diagonal entries can be replaced with it being nonsingular without changing the definition. Checking whether a given matrix is sdd can be done by solving a linear program since the definition is equivalent to existence of an element-wise positive vector such that
We denote the set of dd and sdd matrices with and respectively. *
It follows from Gershgorin’s circle theorem [39] that diagonally dominant matrices are positive semidefinite. This implies that scaled diagonally dominant matrices are also positive semidefinite as the eigenvalues of have the same sign as those of . Hence, if we denote the set of positive semidefinite matrices by , the following inclusion relationships are evident:
[TABLE]
These containments are strict for . In Figure 2, we illustrate the set of and for which the matrix
[TABLE]
is diagonally dominant, scaled diagonally dominant, and positive semidefinite. Here, is the identity matrix and the symmetric matrices and were generated randomly with iid entries sampled from the standard normal distribution. Our interest in these inner approximations to the set of psd matrices stems from the fact that optimization over them can be done by linear and second order cone programming respectively (see Theorem 3.12 below, which is more general). For now, let us relate these matrices back to dsos and sdsos polynomials.
Theorem 3.4**.**
*A polynomial of degree is dsos if and only if it admits a representation as , where is the standard monomial vector of degree and is a dd matrix. *
The following lemma by Barker and Carlson provides an extreme ray characterization of the set of diagonally dominant matrices and will be used in the proof of the above theorem.
Lemma 3.5** (Barker and Carlson [11]).**
Let be the set of all nonzero vectors in with at most nonzero components, each equal to . Let be the set of all rank-one matrices of the form for some Then, a symmetric matrix is diagonally dominant if and only if it can be written as
[TABLE]
*for some . *
Proof 3.6** (Proof of Theorem 3.4).**
Suppose first that with diagonally dominant. Lemma 3.5 implies that
[TABLE]
where and is the set of all nonzero vectors in with at most nonzero components, each equal to . In this sum, the vectors that have only one nonzero entry lead to the first sum in the dsos expansion (8). The vectors with two s or two s lead to the second sum, and those with one and one lead to the third.
For the reverse direction, suppose is dsos, i.e., has the representation in (8). We will show that
[TABLE]
*where each is dd and corresponds to a single term in the expansion (8). As the sum of dd matrices is dd, this would establish the proof. For each term of the form we can take to be a matrix of all zeros except for a single diagonal entry corresponding to the monomial in , with this entry equaling . For each term of the form , we can take to be a matrix of all zeros except for four entries corresponding to the monomials and in . All these four entries will be set to . Similarly, for each term of the form , we can take to be a matrix of all zeros except for four entries corresponding to the monomials and in . In this submatrix, the diagonal elements will be and the off-diagonal elements will be . Clearly, all the matrices we have constructed are dd. *
Theorem 3.7**.**
*A polynomial of degree is sdsos if and only if it admits a representation as , where is the standard monomial vector of degree and is a sdd matrix. *
Proof 3.8**.**
Suppose first that with scaled diagonally dominant. Then, there exists a diagonal matrix , with positive diagonal entries, such that is diagonally dominant and
[TABLE]
Now an argument identical to that in the first part of the proof of Theorem 3.4 (after replacing with ) gives the desired sdsos representation in (9).
For the reverse direction, suppose is sdsos, i.e., has the representation in (9). We show that
[TABLE]
where each is sdd and corresponds to a single term in the expansion (9). As the sum of sdd matrices is sdd,777This claim is not obvious from the definition but is apparent from Lemma 3.10 below which implies that is a convex (in fact, proper) cone. this would establish the proof. Indeed, for each term of the form we can take (once again) to be a matrix of all zeros except for a single diagonal entry corresponding to the monomial in , with this entry equaling .
For each term of the form , we can take to be a matrix of all zeros except for four entries corresponding to the monomials and in . This block will then be . Similarly, for each term of the form , we can take to be a matrix of all zeros except for four entries corresponding to the monomials and in . This block will then be . It remains to show that a rank-1 positive semidefinite matrix is sdd.888This also implies that any positive semidefinite matrix is sdd. Let be such a matrix. Observe that
[TABLE]
*is dd and hence is by definition sdd. (Note that if or are zero, then is already dd.) *
The following characterizations of sdd matrices will be important for us.
Theorem 3.9** (see theorems 8 and 9 by Boman et al. [19]).**
*A symmetric matrix is sdd if and only if it has “factor width” at most 2; i.e., a factorization , where each column of has at most two nonzero entries. *
Lemma 3.10**.**
A symmetric matrix is sdd if and only if it can be expressed as
[TABLE]
*where each is an matrix with zeros everywhere except for four entries , , , , which make the matrix symmetric and positive semidefinite. *
Proof 3.11**.**
*This is almost immediate from Theorem 3.9: If for some matrix , then where denotes the -th column of . Since each column has at most two nonzero entries, say in positions and , each matrix will be exactly of the desired form in the statement of the lemma. Conversely, if can be written as in (11), then we can write each as where the vectors and have at most two nonzero entries. If we then construct a matrix which has the collection of the vectors and as columns, we will have *
Theorem 3.12**.**
*For any fixed , optimization over (resp. ) can be done with a linear program (resp. second order cone program) of size polynomial in . *
Proof 3.13**.**
We will use the characterizations of dsos/sdsos polynomials given in theorems 3.4 and 3.7. In both cases, the equality can be imposed by a finite set of linear equations in the coefficients of and the entries of (these match the coefficients of with those of ). The constraint that be dd can be imposed, e.g., by a set of linear inequalities
[TABLE]
in variables and . This gives a linear program.
The constraint that be sdd can be imposed via Lemma 3.10 by a set of equality constraints that enforce equation (11). The constraint that each matrix be psd is a “rotated quadratic cone” constraint and can be imposed using SOCP [7, 62]:
[TABLE]
*In both cases, the final LP and SOCP are of polynomial size because the size of the Gram matrix is , which is polynomial in for fixed . *
We have written publicly-available code that automates the process of generating LPs (resp. SOCPs) from dsos (resp. sdsos) constraints on a polynomial. More information about this can be found in the Supplementary Material.
The ability to replace SDPs with LPs and SOCPs is what results in significant speedups in our numerical experiments (see Section 4). For the special case where the polynomials involved are quadratic forms, we get a systematic way of inner approximating semidefinite programs. A generic SDP that minimizes subject to the constraints and can be replaced by999We are grateful to Leo Liberti for suggesting the terminology “DDP and SDDP”.
- •
a diagonally dominant program (DDP); i.e., a problem of the form
[TABLE]
which is an LP, or
- •
a scaled diagonally dominant program (SDDP); i.e., a problem of the form
[TABLE]
which is an SOCP.
DDP and SDDP fit nicely within the framework of conic programming [12, Chap. 2] as they are optimization problems over the intersection of an affine subspace with a proper cone ( or ). The description of the associated dual cones can be found e.g. in [82], [2]. In Section 4.4 and Section 4.5, we show some applications of DDP/SDDP to problems in finance and statistics.
3.2 The cone of r-dsos and r-sdsos polynomials and asymptotic guarantees
We now present a hierarchy of cones based on the notions of dsos and sdsos polynomials that can be used to better approximate the cone of nonnegative polynomials.
Definition 3.14**.**
For an integer , we say that a polynomial is r-dsos (resp. r-sdsos) if
[TABLE]
*is dsos (resp. sdsos). We denote the set of polynomials in variables and degree that are r-dsos (resp. r-sdsos) by (resp. ). *
Note that for we recover our dsos/sdsos definitions. Moreover, because the multiplier is nonnegative, we see that for any , the property of being r-dsos or r-sdsos is a sufficient condition for nonnegativity:
[TABLE]
Optimizing over r-dsos (resp. r-sdsos) polynomials is still an LP (resp. SOCP). The proof of the following theorem is identical to that of Theorem 3.12 and hence omitted.
Theorem 3.15**.**
*For any fixed and , optimization over the set (respectively ) can be done with linear programming (resp. second order cone programming) of size polynomial in . *
If we revisit the parametric family of bivariate quartics in (10), the improvements obtained by going one level higher in the hierarchy are illustrated in Figure 3. Interestingly, for , r-dsos and r-sdsos tests for nonnegativity can sometimes outperform the sos test. The following two examples illustrate this.
{Example}
Consider the Motzkin polynomial, which historically is the first known example of a nonnegative polynomial that is not a sum of squares [70]. The following decomposition (found by solving an LP) shows that and thus that is nonnegative:
[TABLE]
This proves that .
{Example}
Consider the polynomial . Once again this polynomial is nonnegative but not a sum of squares [88]. The following decomposition, which was obtained by solving an LP, shows that and thus that is nonnegative:
[TABLE]
This proves that .
It is natural to ask whether every nonnegative polynomial is r-dsos (or r-sdsos) for some ? The following theorems (Theorem 3.16 and 3.20) deal with this question and provide asymptotic guarantees on r-dsos (and hence r-sdsos) hierarchies.
Theorem 3.16**.**
*Let be an even101010An even polynomial (resp. monomial) is a polynomial (resp. monomial) whose individual variables are raised only to even degrees. positive definite form. Then, there exists an integer for which is r-dsos (and hence r-sdsos). *
Proof 3.17**.**
A well-known theorem of Pólya [83] states that if a form is positive on the simplex (i.e., the set ), then there exists an integer such that all coefficients of
[TABLE]
are positive. Under the assumptions of the theorem, this implies that there is an for which the form and hence the form
[TABLE]
*have positive coefficients. But this means that is a nonnegative weighted sum of squared monomials and therefore clearly dsos (see (8)). (In a Gram matrix representation , the argument we just gave shows that we can take to be diagonal.) *
We remark that even forms already constitute a very interesting class of polynomials since they include, e.g., all polynomials coming from copositive programming; see Subsection 4.2. In fact, any NP-complete problem can be reduced in polynomial time to the problem of checking whether a degree-4 even form is nonnegative [71]. An application of this idea to the independent set problem in combinatorial optimization is presented in Subsection 4.2.
The next proposition shows that the evenness assumption cannot be removed from Theorem 3.16.
Proposition 3.18**.**
For any , the quadratic form
[TABLE]
*is positive definite but not r-sdsos for any . *
Proof 3.19**.**
Positive definiteness is immediate from having . We show that for the quadratic form is not r-sdsos for any by first presenting a general separating hyperplane for the cone of sdsos forms in any degree and dimension. For a form , construct a vector which has an entry per monomial in , with entry equal to if the -th monomial in is even and if it is not. We claim that if is sdsos, then the (standard) inner product of its coefficients with is nonnegative. This can be seen by noting that this inner product is equal to the value that a related sdsos form takes at the all ones vector. The form is obtained from by leaving its even monomials untouched and flipping the sign of the coefficients of its non-even monomials. It is straightforward to see (from the expansion in the definition of an sdsos polynomial in (9)) that constructed as such will be sdsos as well. Hence, .
*The latter claim of the proposition now follows from the observation that for any , the inner product of the coefficients of the form with the vector that we described above is negative when (and exactly when ). *
We remark that the form in (14) can be proved to be positive using the improvements presented in Section 5, in particular with a “factor width 3” proof system from Subsection 5.3. Alternatively, one can work with the next theorem, which removes the assumption of evenness from Theorem 3.16 at the cost of doubling the number of variables and the degree.
Theorem 3.20** (see Section 4 of [3]).**
An -variate form of degree is positive definite if and only if there exists a positive integer that makes the following form -dsos:
[TABLE]
In [3, Section 4], this theorem is used to construct LP and SOCP-based converging hierarchies of lower bounds111111As implied by Proposition 3.18, not every convergent SOS hierarchy has an analogous convergent (S)DSOS hierarchy; see [6, Section 2.2] for more detail.for the general polynomial optimization problem (2) when the feasible set is compact.
4 Numerical examples and applications
In this section, we consider several numerical examples that highlight the scalability of the approach presented in this paper. Comparisons to examples considered in the literature are provided whenever possible. A software package written using the Systems Polynomial Optimization Toolbox (SPOT) [68] includes a complete implementation of the presented methods and is available online121212Link to spotless_isos software package: https://github.com/anirudhamajumdar/spotless/tree/spotless_isos. The toolbox features efficient polynomial algebra and allows us to setup the large-scale LPs and SOCPs arising from our examples. Our Supplementary Material provides a brief introduction to the software package and the associated functionality required for setting up the DSOS/SDSOS programs considered in this paper. The LPs and SOCPs resulting from our programs, as well as the SDPs that are considered for comparison, are solved using the solver MOSEK [69] on a machine with 4 processors and a clock speed of 3.4 GHz and 16GB of RAM. While using special-purpose solvers could lead to speedups for any of the three methods (LP/SOCP/SDP) for specific problems, we believe that a widely-employed general-purpose solver such as MOSEK provides a strong benchmark. Further, for our example on options pricing (Section 4.4), we also present results using the SDPNAL+ solver [103]. As brought to our attention by a referee, the constraints in this example are particularly amenable to the semismooth Newton method employed by SDPNAL+. In our experiments, we have also tried the solver SeDuMi [98], which is arguably the most commonly-used free SDP solver. We do not report these run times however (except in Section 4.6.1) as they are often significantly slower than those produced by the SDP solver of MOSEK.
4.1 Lower bounds on polynomial optimization problems
An important application of sum of squares optimization is to obtain lower bounds on polynomial optimization problems (see e.g. [54], [81], [73]). In this subsection, we consider the particular problem of minimizing a homogeneous polynomial of degree on the unit sphere (i.e., the set ). This well-studied problem is strongly NP-hard even when . Its optimal value is easily seen to be equivalent to the optimal value of the following problem:
[TABLE]
By replacing the nonnegativity condition with a SOS/DSOS/SDSOS constraint, the resulting problem becomes an SDP/LP/SOCP. Tables 1 and 2 compare optimal values and running times respectively on problems of increasing size. We restrict ourselves to quartic forms and vary the number of variables between and . Table 1 compares the optimal values for the DSOS, SDSOS, 1-DSOS, 1-SDSOS and SOS relaxations of problem (15) on quartic forms whose coefficients are fully dense and drawn independently from the standard normal distribution (we consider a single random instance for each ). We note that the optimal values presented here are representative of numerical experiments we performed for a broader set of random instances (including coefficients drawn from a uniform distribution instead of the normal distribution). Also, in all our experiments, we could verify that the SOS bound actually coincides with the true global optimal value of the problem. This was done by finding a point on the sphere that produced a matching upper bound.
As the tables illustrate, both DSOS and SDSOS scale up to problems of dimension . This is well beyond the size of problems handled by SOS programming, which is unable to deal with problems with larger than due to memory (RAM) constraints. While the price we pay for this increased scalability is suboptimality, this is tolerable in many applications (as we show in other examples in this section).
Table 1 also compares the approach presented in this paper to the lower bound obtained at the root node of the branch-and-bound procedure in the popular global optimization package BARON [93].131313We thank Aida Khajavirad for running the BARON experiments for us. This comparison was only done up to dimension 30 since beyond this we ran into difficulties passing the large polynomials to BARON’s user interface.
Table 2 presents running times141414The BARON instances were implemented on a 64-bit Intel Xeon X5650 2.66Ghz processor using the CPLEX LP solver. averaged over random instances for each . There is a significant difference in running times between DSOS/SDSOS and SOS beginning at . While we are unable to run SOS programs beyond , DSOS and SDSOS programs for take only a few minutes to execute. Memory constraints prevent us from executing programs for larger than . The results indicate, however, that it may be possible to run larger programs within a reasonable amount of time on a machine equipped with more RAM.
4.2 Copositive programming and combinatorial optimization
A symmetric matrix is copositive if for all (i.e., for all in the nonnegative orthant). The problem of optimizing a linear function over affine sections of the cone of copositive matrices has recently received a lot of attention from the optimization community [35], since it can exactly model several problems of combinatorial and nonconvex optimization [20], [35], [23]. For instance, the size of the largest independent set 151515An independent set of a graph is a subset of its nodes no two of which are connected. The problem of finding upper bounds on has many applications in scheduling and coding theory [63]. of a graph on nodes is equal to the optimal value of the following copositive program [27]:
[TABLE]
where is the adjacency matrix of the graph, is the identity matrix, and is the matrix of all ones.
It is easy to see that a matrix is copositive if and only if the quartic form , with , is nonnegative. Hence, by requiring this form to be r-sos/r-sdsos/r-dsos, one obtains inner approximations to the cone that can be optimized over using semidefinite, second order cone, and linear programming. The sos version of this idea goes back to the PhD thesis of Parrilo [78].
In this section, we compare the quality of these bounds using another hierarchical LP-based method in the literature [20], [27]. The -th level of this LP hierarchy (referred to as the Pólya LP from here onwards) requires that the coefficients of the form be nonnegative. This is motivated by a theorem by Pólya, which states that this constraint will be satisfied for large enough (assuming for all nonzero ). It is not difficult to see that the Pólya LP is always dominated by the LP which requires to be r-dsos. This is because the latter requires the underlying Gram matrix to be diagonally dominant, while the former requires it to be diagonal, with nonnegative diagonal entries.
In [27], de Klerk and Pasechnik show that when the Pólya LP is applied to approximate the copositive program in (4.2), then the precise independent set number of the graph is obtained within steps of the hierarchy. By the previous argument, the same claim immediately holds for the r-dsos (and hence r-sdsos) hierarchies.
Table 3 revisits Example 5.2 of [20], where upper bounds on the independent set number of the complement of the graph of an icosahedron are computed. (This is a graph on 12 nodes with independent set number equal to 3; see [20] for its adjacency matrix.) As the results illustrate, SOS programming provides an exact upper bound (since the size of the largest stable set is necessarily an integer). The second levels of the r-dsos and r-sdsos hierarchies also provide an exact upper bound. In contrast, the LP based on Polya’s theorem in [20] gives the upper bound for the [math]-th and the -st level, and an upper bound of at the second level. In contrast to the Pólya LP, one can show that the [math]-th level of the r-dsos LP always produces a finite upper bound on the independent set number. In fact, this bound will always be smaller than , where is the degree of node [1, Theorem 5.1] .
4.3 Convex regression in statistics
Next, we consider the problem of fitting a function to data subject to a constraint on the function’s convexity. This is an important problem in statistics and has a wide domain of applications including value function approximation in reinforcement learning, options pricing and modeling of circuits for geometric programming based circuit design [42, 43]. Formally, we are given pairs of data points with , and our task is to find a convex function from a family that minimizes an appropriate notion of fitting error (e.g., error):
[TABLE]
The convexity constraint here can be imposed by requiring that the function in variables associated with the Hessian matrix of be nonnegative. Restricting ourselves to polynomial functions of bounded degree, we obtain the following optimization problem:
[TABLE]
where is again the Hessian of . As before, we can replace the nonnegativity constraint with a dsos/sdsos/sos constraint. For our numerical experiment, we generated random vectors in drawn i.i.d. from the standard normal distribution. The function values were computed as follows:
[TABLE]
where was chosen i.i.d. from the standard normal distribution. Tables 5 and 5 present the fitting errors and running times for the DSOS/SDSOS/SOS programs resulting from restricting the class of functions to polynomials of degree and . As the results illustrate, we are able to obtain significantly smaller errors with polynomials of degree using DSOS and SDSOS (compared to SOS with ), while the SOS program for does not run due to memory constraints.
4.4 Options pricing
An important problem in financial economics is that of determining the price of a derivative security such as an option given the price of the underlying asset. Significant progress was made in tackling this problem with the advent of the Black-Scholes formula, which operates under two assumptions: (i) the underlying stock price is governed by geometric Brownian motion, and (ii) there is no arbitrage. A natural question that has been considered in financial mathematics is to see if one can provide bounds on the price of an option in the no-arbitrage setting given much more minimal assumptions on the dynamics of the stock price. One approach to this question—studied, e.g., in [16], [22], —is to assume that we are only given the first moments of the stock price and want to optimally bound the price of the option.
More precisely, we consider that we are given stocks, an associated option with payoff function (which will typically depend on the strike price of the option), a vector of moment functions, , ( is assumed to be 1), and the corresponding vector of moments (here ). The problem of upper bounding the price of the option can then be formulated as follows [16]:
[TABLE]
Here, the expectation is taken over all Martingale measures , defined on . We consider here the case where one is given the mean and covariance matrix of the stock prices. Problem (17) can then be cast as the following problem (see [16, Section 6.2])
[TABLE]
which for many common functions of interest (see e.g. below) gives rise to a copositive program. A well-known and obvious sufficient condition for a matrix to be copositive is for it to have a decomposition , where is psd and is element-wise nonnegative [78]. This allows one to apply SDP to problem (18) and obtain upper bounds on the option price. By replacing the psd condition on the matrix with a dd/sdd condition, we obtain a DDP/SDDP.
We first compare the DDP/SDDP/SDP approaches on an example from [22, Section 4], which considers the problem of upper bounding the price of a European call option with underlying assets. The vector of means of the assets is and the covariance matrix is:
[TABLE]
Further, we have the following payoff function, which depends on the strike price :
[TABLE]
Table 6 compares the upper bounds for different strike prices using DDP/SDDP/SDP. For the SDP experiments, we present results using both MOSEK and SDPNAL+ [103]. For each strike price, the upper bound obtained using SDP (solved with MOSEK) is exact. This can be verified by finding a distribution that achieves the upper bound (i.e., finding a matching lower bound). For example, when , the distribution supported on the set of four points , , and in with probability masses , , , respectively does the job. As the table illustrates, the upper bounds obtained using SDPNAL+ are slightly numerically inaccurate for the default accuracy tolerance of , but are accurate with a tolerance of (this was the largest tolerance that resulted in bounds accurate to two decimal places). The upper bound obtained using SDDP is almost identical to the exact bound for each strike price. The DDP bound is loose, and interestingly does not change with the strike price. Running times for the different methods on this small example are negligible and hence not presented.
In order to demonstrate the scalability of DDP and SDDP, we consider a larger scale randomized example with underlying assets. The mean and covariance are taken to be the sample mean and covariance of instances of vectors belonging to with elements drawn uniformly and independently from the interval . The strike price is and the payoff function is again
[TABLE]
Table 7 compares upper bounds and running times for the different methods. SDDP provides us with a bound that is very close to the one obtained using SDP, with a significant speedup in computation (approximately a factor of 100 compared to MOSEK and a factor of 90 compared to SDPNAL+). The running time for DDP is comparable to SDDP, but the bound is not as tight. SDPNAL+ (tolerance = ) is faster as compared to MOSEK, but yields a slightly larger bound. In contrast to the smaller-scale example above, the bound does not change when the SDPNAL+ tolerance is set to .
4.5 Sparse PCA
Next, we consider the problem of sparse principal component analysis (sparse PCA). In contrast to standard PCA, where the principal components (PCs) in general depend on all the observed variables, the goal of sparse PCA is to identify principal components that only depend on small subsets of the variables [106]. While the statistical fidelity of the resulting representation of the data in terms of sparse PCs will in general be lower than the standard PCs, sparse PCs can significantly enhance interpretability of the results. This feature has proved to be very useful in applications such as finance and analysis of gene expressions; see, e.g., [26] and references therein.
Given a covariance matrix , the problem of finding sparse principal components can be written as the following optimization problem:
[TABLE]
Here, is the cardinality (number of non-zero entries) of and is a given threshold. The problem can be reformulated as the following rank-constrained matrix optimization problem [26]:
[TABLE]
In [26], the authors propose “DSPCA”, an SDP relaxation of this problem that is obtained by dropping the rank constraint and replacing the cardinality constraint with a constraint on the norm:
[TABLE]
The optimal value of problem (24) is an upper bound on the optimal value of (22). Further, when the solution to (24) has rank equal to one, the SDP relaxation is tight and the dominant eigenvector of is the optimal loading of the first sparse PC [26]. When a rank one solution is not obtained, the dominant eigenvector can still be retained as an approximate solution to the problem. Further sparse PCs can be obtained by deflating to obtain:
[TABLE]
and re-solving the SDP with in place of (and iterating this procedure).
The framework presented in this paper can be used to obtain LP and SOCP relaxations of (22) by replacing the constraint by the constraint or respectively, where and are the dual cones of and (see [2, Section 3.3] for a description of these dual cones). Since , we are guaranteed that the resulting optimal solutions will be upper bounds on the SDP solution. Further, as in the SDP case, the relaxation is tight when a rank-one solution is obtained, and once again, if a rank-one solution is not obtained, we can still use the dominant eigenvector as an approximate solution.
We first consider Example from [26]. In this example, there are three hidden variables distributed normally:
[TABLE]
These hidden variables generate observed variables:
[TABLE]
with for , for and for and independent for and . This knowledge of the distributions of the hidden and observed variables allows us to compute the exact covariance matrix for the observed variables. The sparse PCA algorithm described above can then be applied to this covariance matrix. Table 8 presents the first two principal components computed using standard PCA, DDP, SDDP and DSPCA (corresponding to SDP). As the table illustrates, the loadings corresponding to the first two PCs computed using standard PCA are not sparse. All other methods (DDP, SDDP, and SDP) give exactly the same answer and correspond to a rank-one solution, i.e., the optimal sparse solution. As expected, the sparsity comes at the cost of a reduction in the variance explained by the PCs (see [106] for computation of the explained variance).
Next, we consider a larger-scale example. We generate five random covariance matrices of rank . Table 9 presents the running times, number of non-zero entries (NNZ) in the top principal component161616Note that the entries of the PCs were thresholded lightly in order to remove spurious non-zero elements arising from numerical inaccuracies in the solvers., optimal value of the program (24) (Opt.), and the explained variance (Expl.). The optimal values for DDP and SDDP upper bound the SDP solution as expected, and are quite close in value. The number of non-zero entries and explained variances are comparable across the different methods. The running times are between and times faster for DDP in comparison to SDP, and between and times faster for SDDP. Hence the example illustrates that one can obtain a very large speedup with the approach presented here, with only a small sacrifice in quality of the approximate sparse PCs.
4.6 Applications in control theory
As a final application of the methods presented in this paper, we consider two examples from control theory and robotics. The applications of SOS programming in control theory are numerous and include the computation of regions of attraction of polynomial systems [78], feedback control synthesis [50], robustness analysis [100], and computation of the joint spectral radius for uncertain linear systems [77]. A thorough treatment of the application of the (S)DSOS approach to control problems is beyond the scope of this paper but can be found in a different paper [64], which is joint work with Tedrake. Here we briefly highlight two examples from [64] that we consider particularly representative.
Most of the applications in control theory mentioned above rely on SOS programming for checking Lyapunov inequalities that certify stability of a nonlinear system. As discussed in Section 1.1, one can compute (inner approximations of) the region of attraction (ROA) of a dynamical system by finding a Lyapunov function that satisfies the following conditions:
[TABLE]
This guarantees that the set is a subset of the ROA of the system, i.e. initial conditions that begin in this sublevel set converge to the origin. For polynomial dynamical systems, one can specify sufficient conditions for (25) using SOS programming [78]. By replacing the SOS constraints with (S)DSOS constraints, we can compute inner approximations to the ROA more efficiently. While the approximations will be more conservative in general, the ability to tradeoff conservatism with computation time and scalability is an important one in control applications.
4.6.1 Region of attraction for an inverted -link pendulum
As an illustration, we consider the problem of computing ROAs for the underactuated -link pendulum depicted in Figure 5. This system has states composed of the joint angles (with the vertical line) and their derivatives. There are control inputs (the joint closest to the base is not actuated). We take the unstable “upright” position of the system to be the origin of our state space and design a Linear Quadratic Regulator (LQR) controller in order to stabilize this equilibrium. A polynomial approximation of the dynamics of the closed loop system is obtained by a Taylor expansion of order . Using Lyapunov functions of degree 2 results in the time derivative of the Lyapunov function being quartic and hence yields dsos/sdsos/sos constraints on a polynomial of degree 4 in variables.
Figures 5(a) and 5(b) compare projections of the ROAs computed for the system with onto two dimensional subspaces of the state space. As the plots suggest, the ROA computed using SDSOS is only slightly more conservative than the ROA computed using SOS programming. In fact, comparing the volumes of the two ROAs, we find:
[TABLE]
Here, the volumes have been correctly rescaled in the standard manner by the dimension of the ambient space. The ROA computed using DSOS is much smaller (but still potentially useful in practice).
Comparing running times of the different approaches on this example, we find that the LP corresponding to the DSOS program takes 9.67 seconds, and the SOCP corresponding to the SDSOS program takes 25.9 seconds. The running time of the SOS program is 1526.5 seconds using MOSEK and 23676.5 seconds using SeDuMi. Hence, in particular DSOS is approximately times faster than SOS using SeDuMi and times faster than SOS using MOSEK, while SDSOS is times faster in comparison to SeDuMi and times faster than MOSEK for SOS.
Of course the real benefit of our approach is that we can scale to problems where SOS programming ceases to run due to memory/computation constraints. Table 10 illustrates this ability by comparing running times of the programs obtained using our approach with SOS programming for different values of (number of states). As expected, for cases where the SOS programs do run, the DSOS and SDSOS programs are significantly faster. Further, the SOS programs obtained for are too large to run (due to memory constraints). In contrast, our approach allows us to handle almost twice as many states.
4.6.2 Control synthesis for a humanoid robot
In our final example, we highlight a robotics application considered in joint work with Tedrake in [64]. Control of humanoid robots is an important problem in robotics and presents a significant challenge due to the nonlinear dynamics of the system and high dimensionality of the state space. Here we show how the approach described in this paper can be used to design a balancing controller for a model of the ATLAS robot shown in Figure 6. This robot was designed and built by Boston Dynamics Inc. and was used for the 2015 DARPA Robotics Challenge.
Our model of the robot is based on physical parameters of the hardware platform and has states and inputs. The task considered here is to balance the robot on its right toe. The balancing controller is constructed by searching for both a quadratic Lyapunov function and a linear feedback control law in order to maximize the size of the resulting region of attraction. The Lyapunov conditions are imposed using SDSOS programming. We Taylor expand the dynamics about the equilibrium to degree in order to obtain polynomial dynamics. The total computation time is approximately minutes. We note that SOS programming is unable to handle this system due to memory (RAM) constraints.
Figure 7 demonstrates the performance of the resulting controller from SDSOS programming by plotting initial configurations of the robot that are stabilized to the fixed point. As the plot illustrates, the controller is able to stabilize a very wide range of initial conditions. A video of simulations of the closed loop system started from different initial conditions is available online at http://youtu.be/lmAT556Ar5c.
5 Improvements on DSOS and SDSOS optimization
While DSOS and SDSOS techniques result in significant gains in terms of solving times and scalability, they inevitably lead to some loss in solution accuracy when compared to the SOS approach. In this section, we briefly outline three possible strategies to mitigate this loss. The first two (Sections 5.1 and 5.2) generate sequences of linear and second order cone programs, while the third (Section 5.3) works with “small” (fixed-size) semidefinite programs. The reader may recall that the r-DSOS and r-SDSOS hierarchies of Section 3.2 could also be used to improve on DSOS and SDSOS techniques. Like the first two methods that we present here, they did so while staying in the realm of LP and SOCP. They differ from these methods on two crucial points however: (i) they are more expensive to implement, and (ii) they approximate the cone of SOS polynomials “blindly” irrespective of a particular objective function. This is in contrast to the methods that we present in Section 5.1 and 5.2, which approximate the cone in the direction of a specific linear objective function.
For brevity of exposition, we explain how all three strategies can be applied to approximate a generic semidefinite program:
[TABLE]
A treatment tailored to the case of sum of squares programs can be found in the references we provide.
5.1 Iterative change of basis
In [2], Ahmadi and Hall build on the notions of diagonal and scaled diagonal dominance to provide a sequence of improving inner approximations to the cone of psd matrices in the direction of the objective function of an SDP at hand. The idea is simple: define a family of cones171717One can think of as the set of matrices that are dd after a change of coordinates via the matrix .
[TABLE]
parametrized by an matrix . Optimizing over the set is an LP since is fixed, and the defining constraints are linear in the coefficients of the two unknown matrices and . Furthermore, the matrices in are all psd; i.e., .
The proposal in [2] is to solve a sequence of LPs, indexed by , by replacing the condition by :
[TABLE]
The sequence of matrices is defined as follows
[TABLE]
where is an optimal solution to the LP in (28).
Note that the first LP in the sequence optimizes over the set of diagonally dominant matrices. By defining as a Cholesky factor of , improvement of the optimal value is guaranteed in each iteration. Indeed, as , and the identity matrix is diagonally dominant, we see that and hence is feasible for iteration . This implies that the optimal value at iteration is at least as good as the optimal value at the previous iteration; i.e., (in fact, the inequality is strict under mild assumptions; see [2]).
In an analogous fashion, one can construct a sequence of SOCPs that inner approximate with increasing quality. This time, we define a family of cones
[TABLE]
parameterized again by an matrix . For any , optimizing over the set is an SOCP and we have . This leads us to the following iterative SOCP sequence:
[TABLE]
Assuming existence of an optimal solution at each iteration, we can define the sequence iteratively in the same way as was done in (29). Using similar reasoning, we have . In practice, the sequence of upper bounds approaches faster to the SDP optimal value than the sequence of the LP upper bounds .
Figure 8 shows the improvement (in every direction) obtained just by a single iteration of this approach. The outer set in green in both subfigures is the feasible set of a randomly generated semidefinite program. The sets in black are the DD (left) and the SDD (right) inner approximations. What is shown in dark blue in both cases is the boundary of the improved inner approximation after one iteration. Note that the SOCP in particular fills up almost the entire spectrahedron in a single iteration.
We refer the interested reader to [2] for more details, in particular for an explanation of how the same techniques can be used (via duality) to outer approximate feasible sets of semidefinite programs.
5.2 Column generation
In [1], Ahmadi, Dash, and Hall design another iterative method for inner approximating the set of psd matrices using linear and second order cone programming. Their approach combines DSOS/SDSOS techniques with ideas from the theory of column generation in large-scale linear and integer programming. The high-level idea is to approximate the SDP in (27) by a sequence of LPs (parameterized by ):
[TABLE]
where are fixed psd matrices. These matrices are initialized to be the extreme rays of (recall Lemma 3.5), i.e., all rank one matrices , where the vector has at most two nonzero components, each equal to . Once this initial LP is solved, then one adds one (or sometimes several) new psd matrices to problem (31) and resolves. This process then continues. In each step, the new matrices are picked carefully to bring the optimal value of the LP closer to that of the SDP. Usually, the construction of involves solving a “pricing subproblem” (in terminology of the column generation literature), which adds appropriate cutting planes to the dual of (31); see [1] for more details.
The SOCP analogue of this process is similar. The SDP in (27) is inner approximated by a sequence of SOCPs (parameterized by ):
[TABLE]
where are fixed matrices. They are initialized as the set of matrices that have zeros everywhere, except for a 1 in the first column in position and a 1 in the second column in position . This gives exactly (via Lemma 3.10). In subsequent steps, one (or sometimes several) appropriately-chosen matrices are added to problem (33). These matrices are obtained by solving pricing subproblems and help bring the optimal value of the SOCP closer to that of the SDP in each iteration.
Figure 9 shows the improvement obtained by five iterations of this process on a randomly generated SDP, where the objective is to maximize a linear function in the northeast direction. 5
5.3 Factor-width matrices with
A relevant generalization of sdd matrices is through the notion of factor-width, defined by Boman et al. [19]. The factor-width of a symmetric psd matrix is the smallest integer for which can be written as , where each column of contains at most nonzeros. Equivalently, the factor-width of is the smallest for which can be written as a sum of psd matrices that are nonzero only on a single principal submatrix.
If we denote the cone of symmetric matrices of factor-width by , we have that for all , (cf. Lemma 3.10), and . For values of , one gets an increasingly more accurate inner approximation to the set of psd matrices, all outperforming the approximation given by sdd matrices. We did not consider these cones in this paper as it is already known that for , a representation based on second order cone programming is not possible [36]. Nevertheless, working with can be useful as it involves semidefinite constraints that are small () and hence efficiently solvable. Unfortunately though, optimizing over requires such semidefinite constraints which in many applications can be prohibitive. We believe that a promising future direction would be to use the column generation framework on Section 5.2 to work with a subset of the extreme rays of (for small ) and generate additional ones on the fly based on problem structure.
Some initial experiments with have been performed by Ding and Lim [32]. The authors also develop self-concordant barrier functions for these cones and show that any second order cone program can be written as an optimization problem over , i.e., a scaled diagonally dominant program (recall the definition from Section 3.1). In related work, Permenter and Parrilo [82] consider optimization problems over (and their dual cones) for facial reduction in semidefinite programming. We believe the notion of factor-width is likely to receive more attention in upcoming years in the theory of matrix optimization.
6 Conclusions
We have proposed more scalable alternatives to SOS programming by introducing inner approximations to the sum of squares cone in the form of the cones of dsos and sdsos polynomials. These cones can be optimized over using LP and SOCP respectively and afford us considerable gains in terms of scalability by trading off solution quality. Our numerical examples from a diverse range of applications including polynomial optimization, combinatorial optimization, statistics and machine learning, derivative pricing, control theory, and robotics demonstrate that with reasonable tradeoffs in solution quality, we can handle problem sizes that are significantly beyond the current capabilities of SOS programming (at least without exploiting structure or resorting to specialized solvers). In particular, we have shown that our approach is able to produce useful lower bounds on dense polynomial optimization problems with up to polynomial variables (with degree-4 polynomials), or design stabilizing controllers for realistic robotic control systems with as many as 30 state variables. The Supplementary Material of this paper provides a brief but complete tutorial on the toolbox that was used to generate the numerical results in the paper. In addition, since the extended abstract of this work first appeared, our techniques have been applied by other authors to areas such as conic optimization [21], distance geometry [31], portfolio selection in finance [10], and power engineering [95, 94].
On the theoretical front, we have shown that the (S)DSOS approach shares some of the theoretical asymptotic guarantees usually associated with SOS programming (such as certificates of nonnegativity of forms, arbitrarily tight approximation of copositive programs, and a converging hierarchy of lower bounds for polynomial optimization problems). Finally, in the last section of this paper, we reviewed recent approaches that were developed with the idea of bridging the gap between the (S)DSOS approach and the SOS one. One can obtain improved approximations using these methods at the cost of additional computation time. This raises as a natural research direction the possibility of quantifying these tradeoffs, either for one-shot or for adaptive approximation schemes.
To conclude, we would like to emphasize that a key benefit of our approach is that it can be applied in any application where SOS programming is used. Our hope is that the (S)DSOS approach may open up application areas that have previously been beyond reach due to the limitations in scalability of SOS programming. While it is true that improvements to SDP solver technology will always push the boundary of problems that are within the reach of SOS programming, we believe that concurrent improvements in LP and SOCP solver technology will always leave a range of additional problems that can be handled by relaxations such as ours. For example, we foresee exciting possibilities for real-time applications [5], [66] using technology for real-time linear and second order cone programming that is already coming to fruition [34], [67].
Acknowledgments
We would like to thank Pablo Parrilo for acquainting us with scaled diagonally dominant matrices, and Georgina Hall for many corrections and simplifications on the first draft of this work. Our gratitude extends to Russ Tedrake and the members of the Robot Locomotion Group at MIT for several insightful discussions, particularly around control applications. We thank Frank Permenter and Mark Tobenkin for their help with the numerical implementation of DSOS/SDSOS programs in SPOT, and Aida Khajavirad for running the experiments of Section 4.1 with BARON. We thank the Editorial Board of SIAGA and the anonymous referees for their constructive feedback and insightful comments. Finally, this work has benefited from questions and comments from several colleagues, among whom we gladly acknowledge Greg Blekherman, Stephen Boyd, Sanjeeb Dash, Etienne de Klerk, Jesus de Loera, Lijun Ding, João Gouveia, Didier Henrion, Jean Bernard Lasserre, Leo Liberti, Lek-Heng Lim, Bruce Reznick, James Saunderson, and Bernd Sturmfels.
7 Supplementary Material: Software Toolbox
A complete implementation of the code used to generate the numerical results presented in this paper was written using the Systems Polynomial Optimization Toolbox (SPOT) [68] and is freely available online181818Link to the software: https://github.com/anirudhamajumdar/spotless/tree/spotless_isos. The toolbox features efficient polynomial algebra and allows us to setup the large-scale LPs and SOCPs arising from our examples. Here we provide a brief introduction to the software. This is not meant as a comprehensive tutorial on the SPOT toolbox (for this one may refer to SPOT’s documentation191919Link to SPOT’s documentation: https://github.com/spot-toolbox/spotless/blob/master/doc/manual.pdf). Rather, the goal here is to provide an introduction sufficient for setting up and solving the DSOS and SDSOS programs in this paper. The code relevant for this purpose is included in a branch of SPOT that we have named iSOS (for “inside sum of squares”).
7.1 Installing SPOT
Download the software package from Github available here:
https://github.com/anirudhamajumdar/spotless/tree/spotless_isos
Next, start MATLAB and run the spot_install.m script. This script will setup the MATLAB path and compile a few mex functions. The user may wish to save the new MATLAB path for future use.
7.2 Variables and polynomials
Polynomials are defined and manipulated by the
@msspoly class of SPOT. In order to define a new variable, one can use the msspoly.m function:
[TABLE]
which creates the polynomial . The argument to this function is the name of the created variable and is restricted to four characters chosen from the alphabet (lower case and upper case). A MATLAB vector of variables can be created by passing a second argument to the function:
[TABLE]
This will create a vector of variables that can be accessed using standard array indexing. Multivariate polynomials can then be constructed as follows:
[TABLE]
Variables can be manipulated and operated on using a variety of functions. These include standard arithmetic operations (e.g. addition, multiplication, dot product) and operations for manipulating vectors (e.g. concatenating, reshaping, transposing). Other useful functions include:
deg.m: Returns the total degree of a polynomial. If a second argument in the form of a msspoly is provided, the degree with respect to these variables is returned.
diff.m: Differentiates a polynomial (first argument) with respect to a set of msspoly variables (second argument). The result is the matrix of partial derivatives.
subs.m: Substitutes the third argument in place of the second argument wherever it appears in the first argument.
7.3 Programs
Programs involving DSOS/SDSOS/SOS constraints are constructed using the spotsosprog class. In this section, we demonstrate the workings of spotsosprog with the help of an example. In particular, we take the problem of minimizing a form on the unit sphere considered in this paper. The following example is also available in doc/examples/sdsos_example.m.
[TABLE]
This block of code constructs the polynomial that is to be lower bounded. In order to do this, we create a six dimensional vector x of variables using the msspoly command introduced before. Next, we construct a vector of monomials using the monomials function. The first input to this function is the msspoly variable over which the monomials are defined. The second input to the function is the range of degrees the monomials should have. In this case, since we are considering homogeneous quartics, the range is simply (i.e., just ).
[TABLE]
This block of code sets up the program and constraints. First, the program is initialized in the form of the variable prog. This object will contain information about constraints and decision variables. Next, we declare the variable x to be an indeterminate or abstract variable (thus distinguishing it from a decision variable). Decision variables are created using the newFree function in the class spotsosprog. The input to this function is the number of new decision variables to be created. The outputs are the updated program and a variable corresponding to the decision variable. In our case, we need a single variable gamma to be declared. Finally, we specify a DSOS constraint on p - gamma*(x’*x)^2 using the withDSOS command. There are corresponding functions withSDSOS and withSOS that can be used to setup SDSOS or SOS constraints.
[TABLE]
Finally, in this block of code, we dictate an options structure for the program. In particular, the field options.solveroptions contains options that are specific to the solver to be used (MOSEK in our case). The minimize command is used to specify the objective of the program (which must be linear in the decision variables), the function handle corresponding to the solver to be used, and the structure of options. Currently, MOSEK, Gurobi and SeDuMi are the supported solvers. So, for example, in order to use the Gurobi solver for a (S)DSOS program, we would specify the function handle @spot_gurobi.
The output of the minimize command is a solution structure, which contains diagnostic information and allows one to access the optimized decision variables. The last line of our example code demonstrates how to obtain the optimized variable gamma and convert it to a MATLAB double type.
7.4 Additional functionality
Next, we review additional functionality not covered in the example above. While this section is not meant to be an exhaustive list of all the functionality available in SPOT, it should be sufficient to reproduce the numerical results presented in this paper. In the lines of code presented in the following sections, it is assumed that prog is an object of the spotsosprog class, x is a msspoly variable of size n and an indeterminate variable of prog. These variables can be initialized as in the example above with the following lines of code:
[TABLE]
7.4.1 Creating decision variables
It is often useful to construct a polynomial whose coefficients are decision variables. This can be achieved with the newFree and monomials functions introduced above. In particular, one can create a vector of monomials in a msspoly variable x of a given degree d using the command v = monomials(x,0:d). Then, the set of coefficients can be declared using [prog,c] = prog.newFree(length(v)). Finally, one can obtain the desired polynomial by multiplying these two together: p = c’*v.
It is also possible to create other types of decision variables. For example, the following functions can be used to create various types of matrix decision variables:
newSym: New symmetric matrix.
newDD: New symmetric matrix constrained to be diagonally dominant.
newSDD: New symmetric matrix constrained to be scaled diagonally dominant.
newPSD: New symmetric positive semidefinite matrix.
newDDdual: New symmetric matrix constrained to lie in the dual of the cone of diagonally dominant matrices.
newSDDdual: New symmetric matrix constrained to lie in the dual of the cone of scaled diagonally dominant matrices.
The input to these functions is the size of the desired (square) matrix. For example, in order to create a symmetric matrix Q constrained to be diagonally dominant, one can use the following lines of code:
[TABLE]
7.4.2 Specifying constraints
In addition to the withDSOS, withSDSOS and withSOS functions introduced previously, constraints on existing decision variables can be specified using the following functions:
withEqs: Sets up constraints of the form expr = 0, where expr is the input to the function and is a matrix whose elements are to be constrained to be equal to 0.
withPos: Sets up constraints of the form expr 0, where expr is the input to the function and is a matrix whose elements are to be constrained to be nonnegative.
Note that the functions withEqs and withPos allow one to specify constraints in a vectorized manner, thus avoiding MATLAB’s potentially-slow for-loops.
withDD: Constrains a symmetric matrix (the input to the function) to be diagonally dominant.
withSDD: Constrains a symmetric matrix (the input to the function) to be scaled diagonally dominant.
withPSD: Constrains a symmetric matrix (the input to the function) to be positive semidefinite.
In each case above, the inputs must be affine functions of the decision variables of the program. The output of these functions is an updated spotsosprog object that contains the new constraints. We illustrate the use of the function withEqs with the help of a simple example. The other functions can be used in a similar manner.
[TABLE]
In this example, the elements of the decision vector tau1 are constrained to be equal to the elements of the vector tau2.
withPolyEqs: In order to constrain two polynomials and to be equal to each other for all , one may use the function withPolyEqs. This function will constrain the coefficients of the two polynomials to be equal. Here is a simple example:
[TABLE]
This will constrain c(1) + c(2) = 2 and c(3) = c(4).
7.4.3 Checking if a polynomial is dsos
The three utility functions isDSOS, isSDSOS and isSOS allow one to check if a given polynomial is dsos, sdsos, or sos respectively. The only input to the functions is the polynomial to be checked. The outputs are a boolean variable indicating whether the polynomial is in fact dsos/ sdsos/ sos, the Gram matrix, and the monomial basis corresponding to the Gram matrix (these last two are non-empty only if the given polynomial is in fact dsos/sdsos/sos). The following lines of code provide a simple example:
[TABLE]
One can check that the polynomial p - v’Qv is the zero polynomial (up to numerical tolerances).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] A. A. Ahmadi, S. Dash, and G. Hall , Optimization over structured subsets of positive semidefinite matrices via column generation , Discrete Optimization, 24 (2017), pp. 129–151.
- 2[2] A. A. Ahmadi and G. Hall , Sum of squares basis pursuit with linear and second order cone programming , Contemporary Mathematics, (2017).
- 3[3] A. A. Ahmadi and G. Hall , On the construction of converging hierarchies for polynomial optimization based on certificates of global positivity , Mathematics of Operations Research, (2018). To appear. Online version available at https://arxiv.org/abs/1709.09307 .
- 4[4] A. A. Ahmadi and A. Majumdar , DSOS and SDSOS optimization: LP and SOCP-based alternatives to sum of squares optimization , in preceedings of the 48th annual Conference on Information Sciences and Systems (CISS), IEEE, 2014.
- 5[5] A. A. Ahmadi and A. Majumdar , Some applications of polynomial optimization in operations research and real-time decision making , Optimization Letters, 10 (2016), pp. 709–729.
- 6[6] A. A. Ahmadi and A. Majumdar , Response to “Counterexample to global convergence of DSOS and SDSOS hierarchies” , ar Xiv preprint ar Xiv:1710.02901, (2017).
- 7[7] F. Alizadeh and D. Goldfarb , Second-order cone programming , Mathematical programming, 95 (2003), pp. 3–51.
- 8[8] E. Artin , Über die Zerlegung Definiter Funktionen in Quadrate , Hamb. Abh., 5 (1927), pp. 100–115.
