DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of   Squares and Semidefinite Optimization

Amir Ali Ahmadi; Anirudha Majumdar

arXiv:1706.02586·math.OC·August 31, 2018

DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization

Amir Ali Ahmadi, Anirudha Majumdar

PDF

TL;DR

This paper introduces DSOS and SDSOS optimization as scalable, tractable alternatives to sum of squares methods, enabling larger problem solving with acceptable accuracy tradeoffs in various fields.

Contribution

It proposes DSOS and SDSOS as LP and SOCP-based alternatives to SOS, extending scalability while maintaining key theoretical properties.

Findings

01

Numerical experiments show scalability beyond traditional SOS methods.

02

Tradeoffs between accuracy and computational efficiency are demonstrated.

03

Theoretical results extend SOS properties to DSOS and SDSOS.

Abstract

In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization.…

Tables9

Table 1. Table 1: Comparison of lower bounds on the minimum of a quartic form on the sphere for varying number of variables.

	n = 10	n = 15	n = 20	n = 25	n = 30	n = 40	n = 50	n = 60	n = 70
DSOS	-5.31	-10.96	-18.012	-26.45	-36.85	-62.30	-94.26	-133.02	-178.23
SDSOS	-5.05	-10.43	-17.33	-25.79	-36.04	-61.25	-93.22	-131.64	-176.89
1-DSOS	-4.96	-9.22	-15.72	-23.58	NA	NA	NA	NA	NA
1-SDSOS	-4.21	-8.97	-15.29	-23.14	NA	NA	NA	NA	NA
SOS	-1.92	-3.26	-3.58	-3.71	NA	NA	NA	NA	NA
BARON	-175.41	-1079.89	-5287.88	-	-28546.1	-	-	-	-

Table 2. Table 2: Comparison of running times (in seconds) averaged over 10 10 10 instances for lower bounding a quartic form on the sphere for varying number of variables.

	n = 10	n = 15	n = 20	n = 25	n = 30	n = 40	n = 50	n = 60	n = 70
DSOS	0.30	0.38	0.74	15.51	7.88	10.68	25.99	58.10	342.76
SDSOS	0.27	0.53	1.06	8.72	5.65	18.66	47.90	109.54	295.30
1-DSOS	0.92	6.26	37.98	369.08	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$
1-SDSOS	1.53	14.39	82.30	538.5389	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$
SOS	0.24	5.60	82.22	1068.66	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$
BARON	0.35	0.62	3.69	-	-	-	-	-	-

Table 3. Table 3: Comparison of upper bounds on the size of the largest independent set in the complement of the graph of an icosahedron.

	Polya LP	r-DSOS	r-SDSOS	SOS
$r = 0$	$\infty$	6.000	6.000	3.2362
$r = 1$	$\infty$	4.333	4.333	NA
$r = 2$	6.000	3.8049	3.6964	NA

Table 4. Table 4: Comparison of fitting errors for convex regression.

	DSOS	SDSOS	SOS
$d = 2$	35.11	33.92	21.28
$d = 4$	14.86	12.94	NA

Table 5. Table 6: Comparison of upper bounds on options prices for different strike prices obtained using SDP, SDDP and DDP. Here, the number of underlying assets is m = 3 𝑚 3 m=3 . For the SDP experiments, we present results using both MOSEK and SDPNAL+ (with two different tolerance levels).

	Exact	SDP	SDP	SDP	SDDP	DDP
		MOSEK	SDPNAL+ (tol= $10^{- 6}$ )	SDPNAL+ (tol= $10^{- 9}$ )
$K = 30$	21.51	21.51	21.72	21.51	21.51	132.63
$K = 35$	17.17	17.17	18.07	17.17	17.17	132.63
$K = 40$	13.20	13.20	12.37	13.20	13.20	132.63
$K = 45$	9.84	9.84	9.66	9.84	9.85	132.63
$K = 50$	7.30	7.30	7.90	7.30	7.30	132.63

Table 6. Table 7: Comparison of upper bounds and running times for a large ( m = 50 𝑚 50 m=50 ) options pricing example using SDP, SDDP and DDP. For the SDP experiments, we present results using both MOSEK and SDPNAL+ (with two different tolerance levels).

	SDP	SDP	SDP	SDDP	DDP
	MOSEK	SDPNAL+ (tol= $10^{- 6}$ )	SDPNAL+ (tol= $10^{- 9}$ )
Upper bound	18.76	18.88	18.88	19.42	252.24
Running time	2502.6 s	2117.3 s	6221.7 s	24.36 s	11.85 s

Table 7. Table 8: Comparison of loadings of principal components and explained variances for standard PCA and sparse versions. Here, the label “Other” denotes sparse PCA based on SDP, DDP, and SDDP. Each of these gives the optimal sparse solution.

	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	expl. var.
PCA, PC1	0.116	0.116	0.116	0.116	-0.395	-0.395	-0.395	-0.395	-0.401	-0.401	60.0 %
PCA, PC2	-0.478	-0.478	-0.478	-0.478	-0.145	-0.145	-0.145	-0.145	0.010	0.010	39.6 %
Other, PC1	0	0	0	0	-0.5	-0.5	-0.5	-0.5	0	0	40.9 %
Other, PC2	0.5	0.5	0.5	0.5	0	0	0	0	0	0	39.5 %

Table 8. Table 9: Comparison of running times, number of non-zero elements, optimal values, and explained variances for five large-scale sparse PCA examples using covariance matrices of size 100 × 100 100 100 100\times 100 .

	DDP				SDDP				DSPCA
No.	Time (s)	NNZ	Opt.	Expl.	Time (s)	NNZ	Opt.	Expl.	Time (s)	NNZ	Opt.	Expl.
1	0.89	4	35.5	0.067 %	1.40	14	30.6	0.087 %	1296.9	5	30.5	0.086 %
2	1.18	4	36.6	0.071 %	1.42	14	34.4	0.099 %	1847.3	6	33.6	0.092 %
3	1.46	4	49.5	0.079 %	1.40	24	41.3	0.097 %	1633.0	16	40.2	0.096 %
4	1.14	4	44.1	0.072 %	1.80	17	38.7	0.10 %	1984.7	8	37.6	0.091 %
5	1.10	4	36.7	0.060 %	1.53	34	33.0	0.068 %	2179.6	10	31.7	0.105 %

Table 9. Table 10: Table reproduced from [ 64 ] showing runtime comparisons (in seconds) for ROA computations on N-link system.

2N ( $#$ states)	4	6	8	10	12	14	16	18	20	22
DSOS	$< 1$	0.44	2.04	3.08	9.67	25.1	74.2	200.5	492.0	823.2
SDSOS	$< 1$	0.72	6.72	7.78	25.9	92.4	189.0	424.74	846.9	1275.6
SOS (SeDuMi)	$< 1$	3.97	156.9	1697.5	23676.5	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$
SOS (MOSEK)	$< 1$	0.84	16.2	149.1	1526.5	$\infty$	$\infty$	$\infty$	$\infty$	$\infty$

Equations174

p (x_{1}, x_{2}) = c_{1} x_{1}^{4} - 6 x_{1}^{3} x_{2} - 4 x_{1}^{3} + c_{2} x_{1}^{2} x_{2}^{2} + 10 x_{1}^{2} + 12 x_{1} x_{2}^{2} + c_{3} x_{2}^{4}

p (x_{1}, x_{2}) = c_{1} x_{1}^{4} - 6 x_{1}^{3} x_{2} - 4 x_{1}^{3} + c_{2} x_{1}^{2} x_{2}^{2} + 10 x_{1}^{2} + 12 x_{1} x_{2}^{2} + c_{3} x_{2}^{4}

p (x_{1}, x_{2}) = (x_{1} - 2 x_{1}^{2})^{2} + (3 x_{1} + 2 x_{2}^{2})^{2} + (x_{1} x_{2} - 3 x_{1}^{2})^{2} .

p (x_{1}, x_{2}) = (x_{1} - 2 x_{1}^{2})^{2} + (3 x_{1} + 2 x_{2}^{2})^{2} + (x_{1} x_{2} - 3 x_{1}^{2})^{2} .

\begin{array}[]{lll}\mbox{minimize}&p(x)&\\ \mbox{subject to}&x\in K\mathrel{\mathop{:}}=\{x\in\mathbb{R}^{n}\ |\ g_{i}(x)\geq 0,h_{i}(x)=0\},&\end{array}

\begin{array}[]{lll}\mbox{minimize}&p(x)&\\ \mbox{subject to}&x\in K\mathrel{\mathop{:}}=\{x\in\mathbb{R}^{n}\ |\ g_{i}(x)\geq 0,h_{i}(x)=0\},&\end{array}

\begin{array}[]{lll}\mbox{maximize}&\gamma&\\ \mbox{subject to}&p(x)-\gamma\geq 0,\ \forall x\in K.&\end{array}

\begin{array}[]{lll}\mbox{maximize}&\gamma&\\ \mbox{subject to}&p(x)-\gamma\geq 0,\ \forall x\in K.&\end{array}

p (x) = i = 1 \sum n (x_{i}^{2} - 1)^{2} + (i = 1 \sum n x_{i} a_{i})^{2}

p (x) = i = 1 \sum n (x_{i}^{2} - 1)^{2} + (i = 1 \sum n x_{i} a_{i})^{2}

V (x) > 0 \forall x \neq = 0, \mbox an d ⟨ \nabla V (x), f (x)⟩ < 0 \forall x \in {x ∣ V (x) \leq β, x \neq = 0},

V (x) > 0 \forall x \neq = 0, \mbox an d ⟨ \nabla V (x), f (x)⟩ < 0 \forall x \in {x ∣ V (x) \leq β, x \neq = 0},

\begin{array}[]{lll}\mbox{minimize}&E[p(X)]=\sum c_{k_{1}\ldots k_{n}}\mu_{k_{1}\ldots k_{n}}&\\ \mbox{subject to}&p(x)\geq 1,\forall x\in\mathcal{S}&\\ &p(x)\geq 0,\forall x\in\Omega.&\end{array}

\begin{array}[]{lll}\mbox{minimize}&E[p(X)]=\sum c_{k_{1}\ldots k_{n}}\mu_{k_{1}\ldots k_{n}}&\\ \mbox{subject to}&p(x)\geq 1,\forall x\in\mathcal{S}&\\ &p(x)\geq 0,\forall x\in\Omega.&\end{array}

S : = {x \in R^{n} ∣ g_{1} (x) \geq 0, \dots, g_{m} (x) \geq 0}

S : = {x \in R^{n} ∣ g_{1} (x) \geq 0, \dots, g_{m} (x) \geq 0}

p (x) = s_{0} (x) + i = 1 \sum m s_{i} (x) g_{i} (x),

p (x) = s_{0} (x) + i = 1 \sum m s_{i} (x) g_{i} (x),

\begin{array}[]{rll}p(x)&=&z^{T}Qz,\\ Q&\succeq&0,\end{array}

\begin{array}[]{rll}p(x)&=&z^{T}Qz,\\ Q&\succeq&0,\end{array}

z = [1, x_{1}, x_{2}, \dots, x_{n}, x_{1} x_{2}, \dots, x_{n}^{d}] .

z = [1, x_{1}, x_{2}, \dots, x_{n}, x_{1} x_{2}, \dots, x_{n}^{d}] .

(d n + d) \times (d n + d),

(d n + d) \times (d n + d),

p (x) = i \sum α_{i} m_{i}^{2} (x) + i, j \sum β_{ij}^{+} (m_{i} (x) + m_{j} (x))^{2} + i, j \sum β_{ij}^{-} (m_{i} (x) - m_{j} (x))^{2},

p (x) = i \sum α_{i} m_{i}^{2} (x) + i, j \sum β_{ij}^{+} (m_{i} (x) + m_{j} (x))^{2} + i, j \sum β_{ij}^{-} (m_{i} (x) - m_{j} (x))^{2},

p (x) = i \sum α_{i} m_{i}^{2} (x) + i, j \sum (\hat{β}_{ij}^{+} m_{i} (x) + \tilde{β}_{ij}^{+} m_{j} (x))^{2} + i, j \sum (\hat{β}_{ij}^{-} m_{i} (x) - \tilde{β}_{ij}^{-} m_{j} (x))^{2},

p (x) = i \sum α_{i} m_{i}^{2} (x) + i, j \sum (\hat{β}_{ij}^{+} m_{i} (x) + \tilde{β}_{ij}^{+} m_{j} (x))^{2} + i, j \sum (\hat{β}_{ij}^{-} m_{i} (x) - \tilde{β}_{ij}^{-} m_{j} (x))^{2},

D S O S_{n, 2 d} \subseteq S D S O S_{n, 2 d} \subseteq S O S_{n, 2 d} \subseteq P S D_{n, 2 d} .

D S O S_{n, 2 d} \subseteq S D S O S_{n, 2 d} \subseteq S O S_{n, 2 d} \subseteq P S D_{n, 2 d} .

p (x_{1}, x_{2}) = x_{1}^{4} + x_{2}^{4} + a x_{1}^{3} x_{2} + (1 - \frac{1}{2} a - \frac{1}{2} b) x_{1}^{2} x_{2}^{2} + 2 b x_{1} x_{2}^{3} .

p (x_{1}, x_{2}) = x_{1}^{4} + x_{2}^{4} + a x_{1}^{3} x_{2} + (1 - \frac{1}{2} a - \frac{1}{2} b) x_{1}^{2} x_{2}^{2} + 2 b x_{1} x_{2}^{3} .

a_{ii} \geq j \neq = i \sum ∣ a_{ij} ∣

a_{ii} \geq j \neq = i \sum ∣ a_{ij} ∣

a_{ii} d_{i} \geq j \neq = i \sum ∣ a_{ij} ∣ d_{j}, \forall i .

a_{ii} d_{i} \geq j \neq = i \sum ∣ a_{ij} ∣ d_{j}, \forall i .

D D_{n} \subseteq S D D_{n} \subseteq P_{n} .

D D_{n} \subseteq S D D_{n} \subseteq P_{n} .

I + x A + y B

I + x A + y B

M=\sum_{i=1}^{n^{2}}\eta_{i}{\color[rgb]{0,0,0}V_{i}},

M=\sum_{i=1}^{n^{2}}\eta_{i}{\color[rgb]{0,0,0}V_{i}},

p (x) = i = 1 \sum n^{2} (η_{i} z^{T} (x) v_{i} v_{i}^{T} z (x)) = i = 1 \sum n^{2} η_{i} (v_{i}^{T} z (x))^{2},

p (x) = i = 1 \sum n^{2} (η_{i} z^{T} (x) v_{i} v_{i}^{T} z (x)) = i = 1 \sum n^{2} η_{i} (v_{i}^{T} z (x))^{2},

p (x) = z^{T} (x) k \sum Q_{k} z (x),

p (x) = z^{T} (x) k \sum Q_{k} z (x),

p (x) = (D^{- 1} z (x))^{T} D Q D (D^{- 1} z (x)) .

p (x) = (D^{- 1} z (x))^{T} D Q D (D^{- 1} z (x)) .

p (x) = z^{T} (x) k \sum Q_{k} z (x),

p (x) = z^{T} (x) k \sum Q_{k} z (x),

(\frac{1}{a} 0 0 \frac{1}{b}) A (\frac{1}{a} 0 0 \frac{1}{b}) = (1111)

(\frac{1}{a} 0 0 \frac{1}{b}) A (\frac{1}{a} 0 0 \frac{1}{b}) = (1111)

Q = i < j \sum M^{ij},

Q = i < j \sum M^{ij},

\begin{array}[]{ll}&Q_{ii}\geq\sum_{j\neq i}z_{ij},\forall i,\\ &-z_{ij}\leq Q_{ij}\leq z_{ij},\forall i,j,i\neq j\end{array}

\begin{array}[]{ll}&Q_{ii}\geq\sum_{j\neq i}z_{ij},\forall i,\\ &-z_{ij}\leq Q_{ij}\leq z_{ij},\forall i,j,i\neq j\end{array}

(M^{ij})_{ii}+(M^{ij})_{jj}\geq 0,~{}\Bigl{\lvert}\Bigl{\lvert}\begin{pmatrix}2(M^{ij})_{ij}\\ (M^{ij})_{ii}-(M^{ij})_{jj}\end{pmatrix}\Bigl{\lvert}\Bigl{\lvert}\leq(M^{ij})_{ii}+(M^{ij})_{jj}.

(M^{ij})_{ii}+(M^{ij})_{jj}\geq 0,~{}\Bigl{\lvert}\Bigl{\lvert}\begin{pmatrix}2(M^{ij})_{ij}\\ (M^{ij})_{ii}-(M^{ij})_{jj}\end{pmatrix}\Bigl{\lvert}\Bigl{\lvert}\leq(M^{ij})_{ii}+(M^{ij})_{jj}.

X \in S_{n} minimize

X \in S_{n} minimize

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

\newsiamremark

remarkRemark \newsiamremarkhypothesisHypothesis

\newsiamthmclaimClaim \headersDSOS and SDSOS OptimizationA. A. Ahmadi and A. Majumdar

\newsiamthmExampleExample

DSOS and SDSOS Optimization:

More Tractable Alternatives to

Sum of Squares and Semidefinite Optimization††thanks: An extended abstract for this work has appeared in [4]. \funding Amir Ali Ahmadi was partially supported by the DARPA Young Faculty Award, the Sloan Fellowship, the NSF CAREER Award, the AFOSR Young Investigator Program Award, and the Google Research Award.

Amir Ali Ahmadi Department of Operations Research and Financial Engineering, Princeton University. (, http://aaa.princeton.edu/). [email protected]

Anirudha Majumdar Department of Mechanical and Aerospace Engineering, Princeton University. (, https://irom-lab.princeton.edu/majumdar/). [email protected]

Abstract

In recent years, optimization theory has been greatly impacted by the advent of sum of squares (SOS) optimization. The reliance of this technique on large-scale semidefinite programs however, has limited the scale of problems to which it can be applied. In this paper, we introduce DSOS and SDSOS optimization as linear programming and second-order cone programming-based alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. These are optimization problems over certain subsets of sum of squares polynomials (or equivalently subsets of positive semidefinite matrices), which can be of interest in general applications of semidefinite programming where scalability is a limitation. We show that some basic theorems from SOS optimization which rely on results from real algebraic geometry are still valid for DSOS and SDSOS optimization. Furthermore, we show with numerical experiments from diverse application areas—polynomial optimization, statistics and machine learning, derivative pricing, and control theory—that with reasonable tradeoffs in accuracy, we can handle problems at scales that are currently significantly beyond the reach of traditional sum of squares approaches. Finally, we provide a review of recent techniques that bridge the gap between our DSOS/SDSOS approach and the SOS approach at the expense of additional running time. The Supplementary Material of the paper introduces an accompanying MATLAB package for DSOS and SDSOS optimization.

keywords:

Sum of squares optimization, polynomial optimization, nonnegative polynomials, semidefinite programming, linear programming, second order cone programming.

{AMS}

65K05, 90C25, 90C22, 90C05, 90C90, 12Y05, 93C85, 90C27

1 Introduction

For which values of the real coefficients $c_{1},c_{2},c_{3},$ is the polynomial

[TABLE]

nonnegative, i.e., satisfies $p(x_{1},x_{2})\geq 0$ for all $(x_{1},x_{2})\in\mathbb{R}^{2}$ ? The problem of optimizing over nonnegative polynomials—of which our opening question is a toy example—is fundamental to many problems of applied and computational modern mathematics. In such an optimization problem, one would like to impose constraints on the coefficients of an unknown polynomial so as to make it nonnegative, either globally in $\mathbb{R}^{n}$ (as in the above example), or on a certain basic semialgebraic set, i.e., a subset of $\mathbb{R}^{n}$ defined by a finite number of polynomial inequalities. We will demonstrate shortly (see Section 1.1) why optimization problems of this kind are ubiquitous in applications and universal in the study of questions dealing in one way or another with polynomial equations and inequalities.

Closely related to nonnegative polynomials are polynomials that are sums of squares. We say that a polynomial $p$ is a sum of squares (sos) if it can be written as $p=\sum_{i}q_{i}^{2}$ for some (finite number of) polynomials $q_{i}$ . For example, the polynomial in (1) with $c_{1}=13,c_{2}=1,c_{3}=4$ is sos since it admits the decomposition

[TABLE]

The relationship between nonnegative and sum of squares polynomials has been a classical subject of study in real algebraic geometry. For example, a result of Hilbert from 1888 [47] states that all nonnegative bivariate polynomials of degree four are sums of squares. It follows, as a special case, that the sets of coefficients for which the polynomial $p$ in (1) is nonnegative or a sum of squares in fact coincide. In the general situation, however, while sum of squares polynomials are clearly always nonnegative, it is not true that the converse always holds. This was shown in the same paper of Hilbert [47], where he gave a non-constructive proof of existence of nonnegative polynomials that are not sums of squares. Explicit examples of such polynomials appeared many years later, starting with the work of Motzkin [70] in the 1960s. Hilbert’s interest in this line of research is also showcased in his 17th problem, which asks whether every nonnegative polynomial is a sum of squares of rational functions. We refer the interested reader to an outstanding survey paper of Reznick [88], which covers many historical aspects around Hilbert’s 17th problem, including the affirmative solution by Artin [8], as well as several later developments.

The classical questions around nonnegative and sum of squares polynomials have been revisited quite extensively in the past 10-15 years in different communities among applied and computational mathematicians. The reason for this renewed interest is twofold: (i) the discovery that many problems of modern practical interest can be cast as optimization problems over nonnegative polynomials, and (ii) the observation that while optimizing over nonnegative polynomials is generally NP-hard, optimization over the set of sum of squares polynomials can be done via semidefinite programming (SDP); see Theorem 2.1 in Section 2. The latter development, originally explored in the pioneering works of Shor [96], Nesterov [72], Parrilo [78],[79], and Lasserre [54], has led to the creation of sum of squares optimization—a computational framework, with semidefinite programming as its underlying engine, that can tackle many fundamental problems of real algebra and polynomial optimization.

The dependence of sum of squares approaches on semidefinite programming can be viewed as both a strength and a weakness depending on one’s perspective. From a computational complexity viewpoint, semidefinite programs can be solved with arbitrary accuracy in polynomial time using interior point methods (see [102] for a comprehensive survey). As a result, sum of squares techniques offer polynomial time algorithms that approximate a very broad class of NP-hard problems of interest. From a more practical viewpoint however, SDPs are among the most expensive convex relaxations to solve. The speed and reliability of the current SDP solvers lag behind those for other more restricted classes of convex programs (such as linear or second order cone programs) by a wide margin. With the added complication that the SDPs generated by sos problems are large (see Section 2), scalability has become the single most outstanding challenge for sum of squares optimization in practice.

In this paper, we focus on the latter of the two viewpoints mentioned above and offer alternatives to sum of squares optimization that allow one to trade off computation time with solution quality. While these alternatives are more conservative in general, they are significantly more scalable. Our hope is that the proposed approaches will expand the use and applicability of algebraic techniques in optimization to new areas and share its appeal with a broader audience. We call our new computational frameworks, which rely on linear and second order cone programming, DSOS and SDSOS optimization111We use and recommend the pronunciation “d-sauce” and “s-d-sauce”.. These are short for diagonally-dominant-sum-of-squares and scaled-diagonally-dominant-sum-of-squares; see Section 3 for precise definitions. While these tools are primarily designed for sum of squares optimization, they are also applicable to general applications of semidefinite programming where tradeoffs between scalability and performance may be desirable. In the interest of motivating our contributions for a diverse audience, we delay a presentation of these contributions until Section 3 and start instead with a portfolio of problem areas involving nonnegative polynomials. Any such problem area is one to which sum of squares optimization, as well as its new DSOS and SDSOS counterparts, are directly applicable.

1.1 Why optimize over nonnegative polynomials?

We describe several motivating applications in this section at a high level. These will be revisited later in the paper with concrete computational examples; see Section 4.

Polynomial optimization

A polynomial optimization problem (POP) is an optimization problem of the form

[TABLE]

where $p$ , $g_{i}$ , and $h_{i}$ are multivariate polynomials. The special case of problem (2) where the polynomials $p,g_{i},h_{i}$ all have degree one is of course linear programming, which can be solved very efficiently. For higher degrees, POP contains as special cases many important problems in operations research; e.g., the optimal power flow problem in power engineering [49], the computation of Nash equilibria in game theory [53], [80], problems of Euclidean embedding and distance geometry [61], and a host of problems in combinatorial optimization. We observe that if we could optimize over the set of polynomials that are nonnegative on a closed basic semialgebraic set, then we could solve the POP problem to global optimality. To see this, note that the optimal value of problem (2) is equal to the optimal value of the following problem:

[TABLE]

Here, we are trying to find the largest constant $\gamma$ such that the polynomial $p(x)-\gamma$ is nonnegative on the set $K$ ; i.e., the largest lower bound on problem (2).

Combinatorial optimization

Proofs of nonnegativity of polynomials of degree as low as four provide infeasibility certificates for all decision problems in the complexity class NP, including many well-known combinatorial optimization problems. Consider, e.g., the simple-to-describe, NP-complete problem of PARTITION [37]: Given a set of integers $a_{1},\ldots,a_{n}$ , decide if they can be split into two sets with equal sums. It is straightforward to see that a PARTITION instance is infeasible if and only if the degree-four polynomial

[TABLE]

satisfies $p(x)>0$ , $\forall x\in\mathbb{R}^{n}$ . Indeed, $p$ is by construction a sum of squares and hence nonnegative. If it were to have a zero, each square in $p$ would have to evaluate to zero; but this can happen if and only if there is a binary vector $x\in\{-1,1\}^{n}$ , which makes $\sum_{i=1}^{n}x_{i}a_{i}=0$ , giving a yes answer to the PARTITION instance. Suppose now that for a given instance, and for some $\epsilon>0$ , we could prove that $p(x)-\epsilon$ is nonnegative. Then we would have proven that our PARTITION instance is infeasible, potentially without trying out all $2^{n}$ ways of splitting the integers into two sets.

Control systems and robotics

Numerous fundamental problems in nonlinear dynamics and control, such as stability, robustness, collision avoidance, and controller design can be turned into problems about finding special functions—the so-called Lyapunov functions (see, e.g., [51])—that satisfy certain sign conditions. For example, given a differential equation $\dot{x}=f(x)$ , where $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ , and with the origin as an equilibrium point (i.e., satisfying $f(0)=0$ ), consider the “region of attraction (ROA) problem”: Determine the set of initial states in $\mathbb{R}^{n}$ from which the trajectories flow to the origin. Lyapunov’s stability theorem (see, e.g., [51, Chap. 4]) tells us that if we can find a (Lyapunov) function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ , which together with its gradient $\nabla V$ satisfies

[TABLE]

then the set $\{x\in\mathbb{R}^{n}|\ V(x)\leq\beta\}$ is part of the region of attraction. If $f$ is a polynomial function (a very important case in applications [78]), and if we parameterize $V$ as a polynomial function, then the search for the coefficients of $V$ satisfying the conditions in (4) is an optimization problem over the set of nonnegative polynomials. Designing stable equilibrium points with large regions of attraction is a fundamental problem in control engineering and robotics; see, e.g., [50], [65].

Statistical regression with shape constraints

A problem that arises frequently in statistics is that of fitting a function to a set of data points with minimum error, while ensuring that it meets some structural properties, such as nonnegativity, monotonicity, or convexity [41], [42]. Requirements of this type are typically imposed either as regularizers (to avoid overfitting), or more importantly as a result of prior knowledge that the true function to be estimated satisfies the same structural properties. In economics, for example, a regression problem for estimating the utility of consumers from sample measurements would come with a natural concavity requirement on the utility function. When the regression functions are polynomials, many such structural properties lead to polynomial nonnegativity constraints. For example, a polynomial $p(x)\mathrel{\mathop{:}}=p(x_{1},\ldots,x_{n})$ can be constrained to be concave by requiring its Hessian matrix $\nabla^{2}p(x)$ , which is an $n\times n$ symmetric matrix with polynomial entries, to be negative semidefinite for all $x$ . This condition is easily seen to be equivalent to the polynomial $-y^{T}H(x)y$ in the $2n$ variables $x\mathrel{\mathop{:}}=(x_{1},\ldots,x_{n})$ and $y\mathrel{\mathop{:}}=(y_{1},\ldots,y_{n})$ being nonnegative.

Probability bounds given moments

The well-known inequalities of Markov and Chebyshev in probability theory bound the probability that a univariate random variable takes values in a subset of the real line given the first, or the first and second moments of its distribution [13]. Consider a vast generalization of this problem where one is given a finite number of moments $\{\mu_{k_{1},\ldots,k_{n}}\mathrel{\mathop{:}}=E[X_{1}^{k_{1}}\ldots X_{n}^{k_{n}}],k_{1}+\cdots,k_{n}\leq d\}$ of a multivariate multivariate random variable $X\mathrel{\mathop{:}}=(X_{1},\ldots,X_{n})$ and is interested in bounding the probability of an event $X\in\mathcal{S}$ , where $\mathcal{S}$ is a general basic semialgebraic subset of the sample space $\Omega$ . A sharp upper bound on this probability can be obtained by solving the following optimization problem for the coefficients $c_{k_{1}\ldots k_{n}}$ of an $n$ -variate degree- $d$ polynomial $p$ (see [17], [84]):

[TABLE]

It is easy to see that any feasible solution of problem (5) gives an upper bound on the probability of the event $X\in\mathcal{S}$ as the polynomial $p$ is by construction placed above the indicator function of the set $S$ and the expected value of this indicator function is precisely the probability of the event $X\in\mathcal{S}$ . Note that the constraints in (5) are polynomial nonnegativity conditions.

Other applications

The application areas outlined above are only a few examples in a long list of problems that can be tackled by optimizing over the set of nonnegative polynomials. Other interesting problems on this list include: computation of equilibria in games with continuous strategy spaces [80], distinguishing separable and entangled states in quantum information theory [33], computing bounds on sphere packing problems in geometry [9], software verification [92], volume computation [45], robust and stochastic optimization [15], filter design [99], combinatorics [86], [40], and automated theorem proving [44]. A great reference for many applications in this area, as well as the related theoretical and computational background, is a recent volume edited by Blekherman, Parrilo, and Thomas [18].

Organization of the paper

The remainder of this paper is structured as follows. In Section 2, we briefly review the sum of squares approach and its relation to semidefinite programming for the general reader. We also comment on its challenges regarding scalability and prior work that has tackled this issue. The familiar reader can skip to Section 3, where our contributions begin. In Section 3.1, we introduce the cones of dsos and sdsos polynomials and demonstrate that we can optimize over them using linear and second order cone programming respectively. Section 3.2 introduces hierarchies of cones based on dsos and sdsos polynomials that can better approximate the cone of nonnegative polynomials. Asymptotic guarantees on these cones are also presented. Similar to the sum of squares hierarchy, these guarantees can be turned into a hierarchy of converging lower bounds for polynomial optimization problems that use linear or second order cone programming in each step.

Section 4 presents numerical experiments that use our approach on large-scale examples from polynomial optimization (Section 4.1), combinatorial optimization (Section 4.2), statistics and machine learning (Sections 4.3 and 4.5), financial mathematics (Section 4.4), and control theory and robotics (Section 4.6). Section 5 provides a review of recent techniques for bridging the gap between the DSOS/SDSOS approach and the SOS approach at the expense of additional computational costs. Section 6 concludes the paper. The Supplementary Material introduces iSOS (“inside SOS”), an accompanying MATLAB package for DSOS and SDSOS optimization written using the SPOT toolbox [68].

2 Review of the semidefinite programming-based approach and computational considerations

As mentioned earlier, sum of squares optimization handles a (global) nonnegativity constraint on a polynomial $p$ by replacing it with the more restrictive requirement that $p$ be a sum of squares. The situation where $p$ is only constrained to be nonnegative on a certain basic semialgebraic set222In this formulation, we have avoided equality constraints for simplicity. Obviously, there is no loss of generality in doing this as an equality constraint $h(x)=0$ can be imposed by the pair of inequality constraints $h(x)\geq 0,-h(x)\geq 0$ .

[TABLE]

can also be handled with the help of appropriate sum of squares multipliers. For example, if we succeed in finding sos polynomials $s_{0},s_{1},\ldots,s_{m}$ , such that

[TABLE]

then we have found a certificate of nonnegativity of $p$ on the set $\mathcal{S}$ . Indeed, if we evaluate the above expression at any $x\in\mathcal{S}$ , nonnegativity of the polynomials $s_{0},s_{1}\ldots,s_{m}$ imply that $p(x)\geq 0$ . A Positivstellensatz from real algebraic geometry due to Putinar [85] states that if the set $\mathcal{S}$ satisfies the so-called Archimedean property, a property only slightly stronger than compactness (see, e.g., [58] for a precise definition), then every polynomial positive on $\mathcal{S}$ has a representation of the type (6), for some sos polynomials $s_{0},s_{1},\ldots,s_{m}$ of high enough degree (see also [74] for degree bounds). Even with no qualifications whatsoever regarding the set $\mathcal{S}$ , there are other Positivstellensätze (e.g., due to Stengle [97]) that certify nonnegativity of a polynomial on a basic semialgebraic set using sos polynomials. These certificates are only slightly more complicated than (6) and involve sos multipliers associated with products among polynomials $g_{i}$ that define $\mathcal{S}$ [79]. A great reference for the interested reader is the survey paper by Laurent [58].333Our review material in this section does not do justice to the “dual” theory of generalized moment problems; in addition to the survey paper by Laurent, the interested reader is referred to the recent textbooks by Lasserre [55], [56] and references therein.

The computational advantage of a certificate of (global or local) nonnegativity via sum of squares polynomials is that it can be automatically found by semidefinite programming. The following well-known theorem establishes the link between sos polynomials and SDP. Let us denote the set of $n\times n$ symmetric matrices by $\mathcal{S}_{n}$ . Recall that a matrix $A\in\mathcal{S}_{n}$ is positive semidefinite (psd) if $x^{T}Ax\geq 0,\forall x\in\mathbb{R}^{n}$ , and that semidefinite programming is the problem of optimizing a linear function over psd matrices subject to affine inequalities on their entries [102]. We denote the positive semidefiniteness of a matrix $A$ with the standard notation $A\succeq 0$ .

Theorem 2.1 (see, e.g., [78],[79]).

A multivariate polynomial $p\mathrel{\mathop{:}}=p(x)$ in $n$ variables and of degree $2d$ is a sum of squares if and only if there exists a symmetric matrix $Q$ (often called the Gram matrix) such that

[TABLE]

where $z$ is the vector of monomials of degree up to $d$ :

[TABLE]

Searching for a matrix $Q$ satisfying the positive semidefiniteness constraint, as well as the linear equality constraints coming from (7), amounts to solving a semidefinite program. The size of the matrix $Q$ in this theorem is

[TABLE]

which approximately equals $n^{d}\times n^{d}$ . While this number is polynomial in $n$ for fixed $d$ , it can grow rather quickly even for low-degree polynomials. For example, a degree- $4$ polynomial ( $d=2$ ) in $50$ variables has 316251 coefficients and its Gram matrix, which would need to be positive semidefinite, contains 879801 decision variables. A semidefinite constraint of this size is quite expensive, and in fact a problem with $50$ variables is well beyond the current realm of possibilities in SOS optimization. In the absence of problem structure, sum of squares problems involving degree- $4$ or $6$ polynomials are currently limited, roughly speaking, to a handful or a dozen variables.

There have been many contributions already to improvements in scalability of sum of squares techniques. One approach has been to develop systematic techniques for taking advantage of problem structure, such as sparsity or symmetry of the underlying polynomials, to reduce the size of the SDPs [38, 101, 28, 90]. These techniques have proven to be very useful as problems arising in practice sometimes do come with a lot of structure. Another approach which holds promise has been to design customized solvers for SOS programs that avoid resorting to an off-the-shelf interior point solver. These techniques can often lead to improvements for special classes of problems. Examples in this direction include the works in [14, 75, 104, 60, 105, 103, 76, 46]. There has also been recent work by Lasserre et al. that increases scalability of sum of squares optimization problems at the cost of accuracy of the solutions obtained. This is done by bounding the size of the largest SDP constraint appearing in the sos formulation, and leads to what the authors call the BSOS (bounded SOS) hierarchy [57].

The approach we take in this paper for enhancing scalability is orthogonal to the ones mentioned above (and can potentially later be combined with them). We propose to eschew sum of squares decompositions to begin with. In our view, almost all application areas of this field use sum of squares decompositions not for their own sake, but because they provide a means to polynomial nonnegativity. Hence, this paper is motivated by a natural question: Can we give other sufficient conditions for polynomial nonnegativity that are perhaps more restrictive than a sum of squares decomposition, but cheaper to work with?444Given this motivation, we note that our numerical experiments in Section 4 compare our approach primarily with the standard and widely-used semidefinite programming-based approach to sum of squares optimization. They are certainly not meant to be exhaustive. We believe that comparisons with the special-purpose techniques highlighted above and with other solvers would be interesting but are better suited for a separate work given our space limitations.

In the next section, we identify some subsets of the cone of nonnegative polynomials that one can optimize over using linear programming (LP) and second order cone programming (SOCP) [62], [7]. Not only are these much more efficiently solvable classes of convex programs, but they are also superior to semidefinite programs in terms of numerical conditioning. In addition, working with these classes of convex programs allows us to take advantage of high-performance LP and SOCP solvers (e.g., CPLEX [25]) that have been matured over several decades because of industry applications. We remark that while there have been other approaches to produce LP hierarchies for polynomial optimization problems (e.g., based on the Krivine-Stengle certificates of positivity [52, 97, 55]), these LPs, though theoretically significant, are typically quite weak in practice and often numerically ill-conditioned [57]. Finally, we remark that for a range of combinatorial optimization problems, there exist interesting algebraic approaches that do not rely on SDPs and only involve linear algebraic operations [30], [29]. While our techniques are similar in the spirit of avoiding SDPs, they are not tied to combinatorial (or polynomial) optimization problems.

3 DSOS and SDSOS Optimization

We start with some basic definitions. A monomial $m:\mathbb{R}^{n}\rightarrow\mathbb{R}$ in variables $x\mathrel{\mathop{:}}=(x_{1},\ldots,x_{n})$ is a function of the form $m(x)=x_{1}^{\alpha_{1}}\ldots x_{n}^{\alpha_{n}}$ , where each $\alpha_{i}$ is a nonnegative integer. The degree of such a monomial is by definition $\alpha_{1}+\cdots+\alpha_{n}$ . A polynomial $p:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a finite linear combination of monomials $p(x)=\sum_{{\alpha_{1}}\ldots{\alpha_{n}}}c_{{\alpha_{1}}\ldots{\alpha_{n}}}x_{1}^{\alpha_{1}}\ldots x_{n}^{\alpha_{n}}$ . The degree of a polynomial is the maximum degree of its monomials. A form or a homogeneous polynomial is a polynomial whose monomials all have the same degree. For any polynomial $p\mathrel{\mathop{:}}=p(x_{1},\ldots,x_{n})$ of degree $d$ , one can define its homogenized version $p_{h}$ by introducing a new variable $y$ and defining $p_{h}(x_{1},\ldots,x_{n},y)=y^{d}p(\frac{x}{y})$ . One can reverse this operation (dehomogenize) by setting the homogenizing variable $y$ equal to one: $p(x_{1},\ldots,x_{n})=p_{h}(x_{1},\ldots,x_{n},1)$ . The properties of being nonnegative and a sum of squares (as defined earlier) are preserved under homogenization and dehomogenization [88]. We say that a form $f$ is positive definite if $f(x)>0$ for all $x\neq 0$ . We denote by $PSD_{n,2d}$ and $SOS_{n,2d}$ the set of nonnegative (a.k.a. positive semidefinite) and sum of squares polynomials in $n$ variables and degree $2d$ respectively. (Note that odd-degree polynomials can never be nonnegative.) Both these sets form proper (i.e., convex, closed, pointed, and solid) cones in $\mathbb{R}^{n+2d\choose 2d}$ with the obvious inclusion relationship $SOS_{n,2d}\subseteq\ PSD_{n,2d}$ .

The basic idea behind our approach is to approximate $SOS_{n,2d}$ from the inside with new sets that are more tractable for optimization purposes. Towards this goal, one may think of several natural sufficient conditions for a polynomial to be a sum of squares. For example, the following may be natural first candidates:

•

The set of polynomials that are sums of 4-th powers of polynomials:

$\{p|\ p=\sum q_{i}^{4}\}$ ,

•

The set of polynomials that are a sum of a constant number of squares of polynomials; e.g., a sum of three squares: $\{p|\ p=q_{1}^{2}+q_{2}^{2}+q_{3}^{2}\}.$

Even though both of these sets clearly reside inside the set of sos polynomials, they are not any easier to optimize over. In fact, they are much harder: testing whether a polynomial is a sum of 4-th powers is NP-hard already for quartics [48] (in fact, the cone of 4-th powers of linear forms is dual to the cone of nonnegative quartic forms [89]) and optimizing over polynomials that are sums of three squares is intractable (as this task even for quadratics subsumes the NP-hard problem of positive semidefinite matrix completion with a rank constraint [59]). These examples illustrate that in general, an inclusion relationship does not imply anything with respect to optimization complexity. Hence, we need to take some care when choosing which subsets of $SOS_{n,2d}$ to work with. On the one hand, these subsets have to be “big enough” to be useful in practice; on the other hand, they should be more tractable to optimize over.

3.1 The cones of dsos and sdsos polynomials

We now describe two interesting cones inside $SOS_{n,2d}$ that lend themselves to linear and second order cone representations and are hence more tractable for optimization purposes. These cones will also form the building blocks of some more elaborate cones (with improved performance) that we will present in Subsection 3.2 and Section 5.

Definition 3.1.

A polynomial $p\mathrel{\mathop{:}}=p(x)$ is diagonally-dominant-sum-of-squares (dsos) if it can be written as

[TABLE]

*for some monomials $m_{i}(x),m_{j}(x)$ and some nonnegative scalars $\alpha_{i},\beta_{ij}^{+},\beta_{ij}^{-}$ . We denote the set of polynomials in $n$ variables and degree $2d$ that are dsos by $DSOS_{n,2d}$ . *

Definition 3.2.

A polynomial $p\mathrel{\mathop{:}}=p(x)$ is scaled-diagonally-dominant-sum-of-squares (sdsos) if it can be written as

[TABLE]

*for some monomials $m_{i}(x),m_{j}(x)$ and some scalars $\alpha_{i},\hat{\beta}_{ij}^{+},\tilde{\beta}_{ij}^{+},\hat{\beta}_{ij}^{-},\tilde{\beta}_{ij}^{-}$ with $\alpha_{i}\geq~{}0$ . We denote the set of polynomials in $n$ variables and degree $2d$ that are sdsos by $SDSOS_{n,2d}$ . *

From the definition, the following inclusion relationships should be clear:

[TABLE]

In general, all these containment relationships are strict. Figure 1 shows these sets for a family of bivariate quartics parameterized by two coefficients $a,b$ :555It is well known that all psd bivariate forms are sos [88] and hence the outer set exactly characterizes all values of $a$ and $b$ for which this polynomial is nonnegative.

[TABLE]

Some remarks about the two definitions above are in order. It is not hard to see that if $p$ has degree $2d$ , the monomials $m_{i},m_{j}$ in (8) or (9) never need to have degree higher than $d$ . In the special case where $p$ is homogeneous of degree $2d$ (i.e., a form), the monomials $m_{i},m_{j}$ can be taken to have degree exactly $d$ . We also note that decompositions given in (8) or (9) are not unique; there can even be infinitely many such decompositions but the definition requires just one. Finally, we have found out that interestingly, sdsos polynomials were studied in the 1970’s and 1980’s for purely algebraic reasons by Robinson [91], Reznick [87], and Choi, Lam, and Reznick [24]. These papers use the more descriptive terminology of “sum of binomial squares” (sbs), which is consistent with the decomposition in (9). In [87] for example, Reznick characterizes interesting families of forms that are sos if and only if they are sbs (i.e., sdsos). In [24], Choi, Lam, and Reznick provide a complete description of sbs forms and their extremal elements among the family of even symmetric sextics. In the process, they answer questions that Robinson had raised earlier [91] about sbs polynomials.

Our terminology in Definitions 3.1 and 3.2 comes from a connection (which we will soon establish) to the following classes of symmetric matrices.

Definition 3.3.

A symmetric matrix $A=(a_{ij})$ is diagonally dominant (dd) if

[TABLE]

*for all $i$ . A symmetric matrix $A$ is scaled diagonally dominant (sdd) if there exists a diagonal matrix $D$ , with positive diagonal entries, such that $DAD$ is dd.666The requirement that the diagonal matrix $D$ have positive diagonal entries can be replaced with it being nonsingular without changing the definition. Checking whether a given matrix $A=(a_{ij})$ is sdd can be done by solving a linear program since the definition is equivalent to existence of an element-wise positive vector $d$ such that

$a_{ii}d_{i}\geq\sum_{j\neq i}|a_{ij}|d_{j},\forall i.$

We denote the set of $n\times n$ dd and sdd matrices with $DD_{n}$ and $SDD_{n}$ respectively. *

It follows from Gershgorin’s circle theorem [39] that diagonally dominant matrices are positive semidefinite. This implies that scaled diagonally dominant matrices are also positive semidefinite as the eigenvalues of $DAD$ have the same sign as those of $A$ . Hence, if we denote the set of $n\times n$ positive semidefinite matrices by $P_{n}$ , the following inclusion relationships are evident:

[TABLE]

These containments are strict for $n>2$ . In Figure 2, we illustrate the set of $x$ and $y$ for which the matrix

[TABLE]

is diagonally dominant, scaled diagonally dominant, and positive semidefinite. Here, $I$ is the identity matrix and the $5\times 5$ symmetric matrices $A$ and $B$ were generated randomly with iid entries sampled from the standard normal distribution. Our interest in these inner approximations to the set of psd matrices stems from the fact that optimization over them can be done by linear and second order cone programming respectively (see Theorem 3.12 below, which is more general). For now, let us relate these matrices back to dsos and sdsos polynomials.

Theorem 3.4.

*A polynomial $p$ of degree $2d$ is dsos if and only if it admits a representation as $p(x)=z^{T}(x)Qz(x)$ , where $z(x)$ is the standard monomial vector of degree $\leq d$ and $Q$ is a dd matrix. *

The following lemma by Barker and Carlson provides an extreme ray characterization of the set of diagonally dominant matrices and will be used in the proof of the above theorem.

Lemma 3.5 (Barker and Carlson [11]).

Let $\mathcal{V}=\{v_{j}\}_{j=1}^{2n^{2}}$ be the set of all nonzero vectors in $\mathbb{R}^{n}$ with at most $2$ nonzero components, each equal to $\pm 1$ . Let $\{V_{i}\}_{i=1}^{n^{2}}$ be the set of all rank-one matrices of the form $V_{i}=vv^{T}$ for some $v\in\mathcal{V}.$ Then, a symmetric $n\times n$ matrix $M$ is diagonally dominant if and only if it can be written as

[TABLE]

*for some $\eta_{i}\geq 0$ . *

Proof 3.6 (Proof of Theorem 3.4).

Suppose first that $p(x)=z^{T}(x)Qz(x)$ with $Q$ diagonally dominant. Lemma 3.5 implies that

[TABLE]

where $\eta_{i}\geq 0$ and $\{v_{i}\}$ is the set of all nonzero vectors in $\mathbb{R}^{n}$ with at most $2$ nonzero components, each equal to $\pm 1$ . In this sum, the vectors $v_{i}$ that have only one nonzero entry lead to the first sum in the dsos expansion (8). The vectors $v_{i}$ with two $1$ s or two $-1$ s lead to the second sum, and those with one $1$ and one $-1$ lead to the third.

For the reverse direction, suppose $p$ is dsos, i.e., has the representation in (8). We will show that

[TABLE]

*where each $Q_{k}$ is dd and corresponds to a single term in the expansion (8). As the sum of dd matrices is dd, this would establish the proof. For each term of the form $\alpha_{i}m_{i}^{2}(x)$ we can take $Q_{k}$ to be a matrix of all zeros except for a single diagonal entry corresponding to the monomial $m_{i}$ in $z(x)$ , with this entry equaling $\alpha_{i}$ . For each term of the form $\beta_{ij}^{+}(m_{i}(x)+m_{j}(x))^{2}$ , we can take $Q_{k}$ to be a matrix of all zeros except for four entries corresponding to the monomials $m_{i}(x)$ and $m_{j}(x)$ in $z(x)$ . All these four entries will be set to $\beta_{ij}^{+}$ . Similarly, for each term of the form $\beta_{ij}^{-}(m_{i}(x)-m_{j}(x))^{2}$ , we can take $Q_{k}$ to be a matrix of all zeros except for four entries corresponding to the monomials $m_{i}$ and $m_{j}(x)$ in $z(x)$ . In this $2\times 2$ submatrix, the diagonal elements will be $\beta_{ij}^{-}$ and the off-diagonal elements will be $-\beta_{ij}^{-}$ . Clearly, all the $Q_{k}$ matrices we have constructed are dd. *

Theorem 3.7.

*A polynomial $p$ of degree $2d$ is sdsos if and only if it admits a representation as $p(x)=z^{T}(x)Qz(x)$ , where $z(x)$ is the standard monomial vector of degree $\leq d$ and $Q$ is a sdd matrix. *

Proof 3.8.

Suppose first that $p(x)=z^{T}(x)Qz(x)$ with $Q$ scaled diagonally dominant. Then, there exists a diagonal matrix $D$ , with positive diagonal entries, such that $DQD$ is diagonally dominant and

[TABLE]

Now an argument identical to that in the first part of the proof of Theorem 3.4 (after replacing $z(x)$ with $D^{-1}z(x)$ ) gives the desired sdsos representation in (9).

For the reverse direction, suppose $p$ is sdsos, i.e., has the representation in (9). We show that

[TABLE]

where each $Q_{k}$ is sdd and corresponds to a single term in the expansion (9). As the sum of sdd matrices is sdd,777This claim is not obvious from the definition but is apparent from Lemma 3.10 below which implies that $SDD_{n}$ is a convex (in fact, proper) cone. this would establish the proof. Indeed, for each term of the form $\alpha_{i}m_{i}^{2}(x)$ we can take $Q_{k}$ (once again) to be a matrix of all zeros except for a single diagonal entry corresponding to the monomial $m_{i}$ in $z(x)$ , with this entry equaling $\alpha_{i}$ .

For each term of the form $(\hat{\beta}_{ij}^{+}m_{i}(x)+\tilde{\beta}_{ij}^{+}m_{j}(x))^{2}$ , we can take $Q_{k}$ to be a matrix of all zeros except for four entries corresponding to the monomials $m_{i}(x)$ and $m_{j}(x)$ in $z(x)$ . This $2\times 2$ block will then be $\begin{pmatrix}\hat{\beta}_{ij}^{+}\\ \tilde{\beta}_{ij}^{+}\end{pmatrix}\begin{pmatrix}\hat{\beta}_{ij}^{+}&\tilde{\beta}_{ij}^{+}\end{pmatrix}$ . Similarly, for each term of the form $(\hat{\beta}_{ij}^{-}m_{i}(x)-\tilde{\beta}_{ij}^{-}m_{j}(x))^{2}$ , we can take $Q_{k}$ to be a matrix of all zeros except for four entries corresponding to the monomials $m_{i}$ and $m_{j}(x)$ in $z(x)$ . This $2\times 2$ block will then be $\begin{pmatrix}\hat{\beta}_{ij}^{-}\\ -\tilde{\beta}_{ij}^{-}\end{pmatrix}\begin{pmatrix}\hat{\beta}_{ij}^{-}&-\tilde{\beta}_{ij}^{-}\end{pmatrix}$ . It remains to show that a $2\times 2$ rank-1 positive semidefinite matrix is sdd.888This also implies that any $2\times 2$ positive semidefinite matrix is sdd. Let $A=\begin{pmatrix}a\\ b\end{pmatrix}\begin{pmatrix}a&b\end{pmatrix}$ be such a matrix. Observe that

[TABLE]

*is dd and hence $A$ is by definition sdd. (Note that if $a$ or $b$ are zero, then $A$ is already dd.) *

The following characterizations of sdd matrices will be important for us.

Theorem 3.9 (see theorems 8 and 9 by Boman et al. [19]).

*A symmetric matrix $Q$ is sdd if and only if it has “factor width” at most 2; i.e., a factorization $Q=VV^{T}$ , where each column of $V$ has at most two nonzero entries. *

Lemma 3.10.

A symmetric $n\times n$ matrix $Q$ is sdd if and only if it can be expressed as

[TABLE]

*where each $M^{ij}$ is an $n\times n$ matrix with zeros everywhere except for four entries $(M^{ij})_{ii}$ , $(M^{ij})_{ij}$ , $(M^{ij})_{ji}$ , $(M^{ij})_{jj}$ , which make the $2\times 2$ matrix $\begin{bmatrix}(M^{ij})_{ii}&(M^{ij})_{ij}\\ (M^{ij})_{ji}&(M^{ij})_{jj}\end{bmatrix}$ symmetric and positive semidefinite. *

Proof 3.11.

*This is almost immediate from Theorem 3.9: If $Q=VV^{T}$ for some $n\times k$ matrix $V$ , then $Q=\sum_{l=1}^{k}v_{l}v_{l}^{T},$ where $v_{l}$ denotes the $l$ -th column of $V$ . Since each column $v_{l}$ has at most two nonzero entries, say in positions $i$ and $j$ , each matrix $v_{l}v_{l}^{T}$ will be exactly of the desired form $M^{ij}$ in the statement of the lemma. Conversely, if $Q$ can be written as in (11), then we can write each $M^{ij}$ as $M^{ij}=w_{ij,1}w_{ij,1}^{T}+w_{ij,2}w_{ij,2}^{T},$ where the vectors $w_{ij,1}$ and $w_{ij,2}$ have at most two nonzero entries. If we then construct a matrix $V$ which has the collection of the vectors $w_{ij,1}$ and $w_{ij,2}$ as columns, we will have $Q=VV^{T}.$ *

Theorem 3.12.

*For any fixed $d$ , optimization over $DSOS_{n,2d}$ (resp. $SDSOS_{n,2d}$ ) can be done with a linear program (resp. second order cone program) of size polynomial in $n$ . *

Proof 3.13.

We will use the characterizations of dsos/sdsos polynomials given in theorems 3.4 and 3.7. In both cases, the equality $p(x)=z^{T}(x)Qz(x),\forall x$ can be imposed by a finite set of linear equations in the coefficients of $p$ and the entries of $Q$ (these match the coefficients of $p(x)$ with those of $z^{T}(x)Qz(x)$ ). The constraint that $Q$ be dd can be imposed, e.g., by a set of linear inequalities

[TABLE]

in variables $Q_{ij}$ and $z_{ij}$ . This gives a linear program.

The constraint that $Q$ be sdd can be imposed via Lemma 3.10 by a set of equality constraints that enforce equation (11). The constraint that each $2\times 2$ matrix $\begin{bmatrix}(M^{ij})_{ii}&(M^{ij})_{ij}\\ (M^{ij})_{ji}&(M^{ij})_{jj}\end{bmatrix}$ be psd is a “rotated quadratic cone” constraint and can be imposed using SOCP [7, 62]:

[TABLE]

*In both cases, the final LP and SOCP are of polynomial size because the size of the Gram matrix is ${n+d\choose d}$ , which is polynomial in $n$ for fixed $d$ . *

We have written publicly-available code that automates the process of generating LPs (resp. SOCPs) from dsos (resp. sdsos) constraints on a polynomial. More information about this can be found in the Supplementary Material.

The ability to replace SDPs with LPs and SOCPs is what results in significant speedups in our numerical experiments (see Section 4). For the special case where the polynomials involved are quadratic forms, we get a systematic way of inner approximating semidefinite programs. A generic SDP that minimizes $\mbox{Tr}(CX)$ subject to the constraints $\mbox{Tr}(A_{i}X)=b_{i},i=1,\ldots,m,$ and $X\succeq 0,$ can be replaced by999We are grateful to Leo Liberti for suggesting the terminology “DDP and SDDP”.

•

a diagonally dominant program (DDP); i.e., a problem of the form

[TABLE]

which is an LP, or

•

a scaled diagonally dominant program (SDDP); i.e., a problem of the form

[TABLE]

which is an SOCP.

DDP and SDDP fit nicely within the framework of conic programming [12, Chap. 2] as they are optimization problems over the intersection of an affine subspace with a proper cone ( $DD_{n}$ or $SDD_{n}$ ). The description of the associated dual cones can be found e.g. in [82], [2]. In Section 4.4 and Section 4.5, we show some applications of DDP/SDDP to problems in finance and statistics.

3.2 The cone of r-dsos and r-sdsos polynomials and asymptotic guarantees

We now present a hierarchy of cones based on the notions of dsos and sdsos polynomials that can be used to better approximate the cone of nonnegative polynomials.

Definition 3.14.

For an integer $r\geq 0$ , we say that a polynomial $p\mathrel{\mathop{:}}=p(x_{1},\ldots,x_{n})$ is r-dsos (resp. r-sdsos) if

[TABLE]

*is dsos (resp. sdsos). We denote the set of polynomials in $n$ variables and degree $2d$ that are r-dsos (resp. r-sdsos) by $rDSOS_{n,2d}$ (resp. $rDSOS_{n,2d}$ ). *

Note that for $r=0$ we recover our dsos/sdsos definitions. Moreover, because the multiplier $(\sum_{i=1}^{n}x_{i}^{2})^{r}$ is nonnegative, we see that for any $r$ , the property of being r-dsos or r-sdsos is a sufficient condition for nonnegativity:

[TABLE]

Optimizing over r-dsos (resp. r-sdsos) polynomials is still an LP (resp. SOCP). The proof of the following theorem is identical to that of Theorem 3.12 and hence omitted.

Theorem 3.15.

*For any fixed $d$ and $r$ , optimization over the set $rDSOS_{n,d}$ (respectively $rSDSOS_{n,d}$ ) can be done with linear programming (resp. second order cone programming) of size polynomial in $n$ . *

If we revisit the parametric family of bivariate quartics in (10), the improvements obtained by going one level higher in the hierarchy are illustrated in Figure 3. Interestingly, for $r\geq 1$ , r-dsos and r-sdsos tests for nonnegativity can sometimes outperform the sos test. The following two examples illustrate this.

{Example}

Consider the Motzkin polynomial, $M(x)=x_{1}^{4}x_{2}^{2}+x_{1}^{2}x_{2}^{4}-3x_{1}^{2}x_{2}^{2}x_{3}^{2}+x_{3}^{6},$ which historically is the first known example of a nonnegative polynomial that is not a sum of squares [70]. The following decomposition (found by solving an LP) shows that $M\in 2DSOS_{3,6}$ and thus that $M$ is nonnegative:

[TABLE]

This proves that $2DSOS_{3,6}\nsubseteq SOS_{3,6}$ .

{Example}

Consider the polynomial $p(x)=x_{1}^{4}x_{2}^{2}+x_{2}^{4}x_{3}^{2}+x_{3}^{4}x_{1}^{2}-3x_{1}^{2}x_{2}^{2}x_{3}^{2}$ . Once again this polynomial is nonnegative but not a sum of squares [88]. The following decomposition, which was obtained by solving an LP, shows that $p\in 1DSOS_{3,6}$ and thus that $p$ is nonnegative:

[TABLE]

This proves that $1DSOS_{3,6}\nsubseteq SOS_{3,6}$ .

It is natural to ask whether every nonnegative polynomial is r-dsos (or r-sdsos) for some $r$ ? The following theorems (Theorem 3.16 and 3.20) deal with this question and provide asymptotic guarantees on r-dsos (and hence r-sdsos) hierarchies.

Theorem 3.16.

*Let $p$ be an even101010An even polynomial (resp. monomial) is a polynomial (resp. monomial) whose individual variables are raised only to even degrees. positive definite form. Then, there exists an integer $r$ for which $p$ is r-dsos (and hence r-sdsos). *

Proof 3.17.

A well-known theorem of Pólya [83] states that if a form $f(x_{1},\ldots,x_{n})$ is positive on the simplex (i.e., the set $\Delta_{n}\mathrel{\mathop{:}}=\{x\in\mathbb{R}^{n}|x_{i}\geq 0,\sum_{i}x_{i}=1\}$ ), then there exists an integer $r$ such that all coefficients of

[TABLE]

are positive. Under the assumptions of the theorem, this implies that there is an $r$ for which the form $p(\sqrt{x_{1}},\ldots,\sqrt{x_{n}})(x_{1}+\cdots+x_{n})^{r}$ and hence the form

[TABLE]

*have positive coefficients. But this means that $p(x)(\sum_{i}x_{i}^{2})$ is a nonnegative weighted sum of squared monomials and therefore clearly dsos (see (8)). (In a Gram matrix representation $p(x)(\sum_{i}x_{i}^{2})=z^{T}(x)Qz(x)$ , the argument we just gave shows that we can take $Q$ to be diagonal.) *

We remark that even forms already constitute a very interesting class of polynomials since they include, e.g., all polynomials coming from copositive programming; see Subsection 4.2. In fact, any NP-complete problem can be reduced in polynomial time to the problem of checking whether a degree-4 even form is nonnegative [71]. An application of this idea to the independent set problem in combinatorial optimization is presented in Subsection 4.2.

The next proposition shows that the evenness assumption cannot be removed from Theorem 3.16.

Proposition 3.18.

For any $0<a<1$ , the quadratic form

[TABLE]

*is positive definite but not r-sdsos for any $r$ . *

Proof 3.19.

Positive definiteness is immediate from having $a>0$ . We show that for $a<1,$ the quadratic form $p$ is not r-sdsos for any $r$ by first presenting a general separating hyperplane for the cone of sdsos forms in any degree and dimension. For a form $f$ , construct a vector $v_{f}$ which has an entry per monomial in $f$ , with entry $i$ equal to $+1$ if the $i$ -th monomial in $f$ is even and $-1$ if it is not. We claim that if $f$ is sdsos, then the (standard) inner product of its coefficients with $v_{f}$ is nonnegative. This can be seen by noting that this inner product is equal to the value that a related sdsos form $q$ takes at the all ones vector. The form $q$ is obtained from $f$ by leaving its even monomials untouched and flipping the sign of the coefficients of its non-even monomials. It is straightforward to see (from the expansion in the definition of an sdsos polynomial in (9)) that $q$ constructed as such will be sdsos as well. Hence, $q(1,\ldots,1)\geq 0$ .

*The latter claim of the proposition now follows from the observation that for any $r$ , the inner product of the coefficients of the form $f_{r}(x)\mathrel{\mathop{:}}=p(x)\cdot(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}$ with the $\pm 1$ vector $v_{f_{r}}$ that we described above is negative when $a<1$ (and exactly when $a<1$ ). *

We remark that the form in (14) can be proved to be positive using the improvements presented in Section 5, in particular with a “factor width 3” proof system from Subsection 5.3. Alternatively, one can work with the next theorem, which removes the assumption of evenness from Theorem 3.16 at the cost of doubling the number of variables and the degree.

Theorem 3.20 (see Section 4 of [3]).

An $n$ -variate form $p\mathop{\mathrel{:}}=p(x)$ of degree $2d$ is positive definite if and only if there exists a positive integer $r$ that makes the following form $r$ -dsos:

[TABLE]

In [3, Section 4], this theorem is used to construct LP and SOCP-based converging hierarchies of lower bounds111111As implied by Proposition 3.18, not every convergent SOS hierarchy has an analogous convergent (S)DSOS hierarchy; see [6, Section 2.2] for more detail.for the general polynomial optimization problem (2) when the feasible set is compact.

4 Numerical examples and applications

In this section, we consider several numerical examples that highlight the scalability of the approach presented in this paper. Comparisons to examples considered in the literature are provided whenever possible. A software package written using the Systems Polynomial Optimization Toolbox (SPOT) [68] includes a complete implementation of the presented methods and is available online121212Link to spotless_isos software package: https://github.com/anirudhamajumdar/spotless/tree/spotless_isos. The toolbox features efficient polynomial algebra and allows us to setup the large-scale LPs and SOCPs arising from our examples. Our Supplementary Material provides a brief introduction to the software package and the associated functionality required for setting up the DSOS/SDSOS programs considered in this paper. The LPs and SOCPs resulting from our programs, as well as the SDPs that are considered for comparison, are solved using the solver MOSEK [69] on a machine with 4 processors and a clock speed of 3.4 GHz and 16GB of RAM. While using special-purpose solvers could lead to speedups for any of the three methods (LP/SOCP/SDP) for specific problems, we believe that a widely-employed general-purpose solver such as MOSEK provides a strong benchmark. Further, for our example on options pricing (Section 4.4), we also present results using the SDPNAL+ solver [103]. As brought to our attention by a referee, the constraints in this example are particularly amenable to the semismooth Newton method employed by SDPNAL+. In our experiments, we have also tried the solver SeDuMi [98], which is arguably the most commonly-used free SDP solver. We do not report these run times however (except in Section 4.6.1) as they are often significantly slower than those produced by the SDP solver of MOSEK.

4.1 Lower bounds on polynomial optimization problems

An important application of sum of squares optimization is to obtain lower bounds on polynomial optimization problems (see e.g. [54], [81], [73]). In this subsection, we consider the particular problem of minimizing a homogeneous polynomial $p(x)$ of degree $2d$ on the unit sphere (i.e., the set $\{x\in\mathbb{R}^{n}|\ x^{T}x=1\}$ ). This well-studied problem is strongly NP-hard even when $d=2$ . Its optimal value is easily seen to be equivalent to the optimal value of the following problem:

[TABLE]

By replacing the nonnegativity condition with a SOS/DSOS/SDSOS constraint, the resulting problem becomes an SDP/LP/SOCP. Tables 1 and 2 compare optimal values and running times respectively on problems of increasing size. We restrict ourselves to quartic forms and vary the number of variables $n$ between $5$ and $70$ . Table 1 compares the optimal values for the DSOS, SDSOS, 1-DSOS, 1-SDSOS and SOS relaxations of problem (15) on quartic forms whose coefficients are fully dense and drawn independently from the standard normal distribution (we consider a single random instance for each $n$ ). We note that the optimal values presented here are representative of numerical experiments we performed for a broader set of random instances (including coefficients drawn from a uniform distribution instead of the normal distribution). Also, in all our experiments, we could verify that the SOS bound actually coincides with the true global optimal value of the problem. This was done by finding a point on the sphere that produced a matching upper bound.

As the tables illustrate, both DSOS and SDSOS scale up to problems of dimension $n=70$ . This is well beyond the size of problems handled by SOS programming, which is unable to deal with problems with $n$ larger than $25$ due to memory (RAM) constraints. While the price we pay for this increased scalability is suboptimality, this is tolerable in many applications (as we show in other examples in this section).

Table 1 also compares the approach presented in this paper to the lower bound obtained at the root node of the branch-and-bound procedure in the popular global optimization package BARON [93].131313We thank Aida Khajavirad for running the BARON experiments for us. This comparison was only done up to dimension 30 since beyond this we ran into difficulties passing the large polynomials to BARON’s user interface.

Table 2 presents running times141414The BARON instances were implemented on a 64-bit Intel Xeon X5650 2.66Ghz processor using the CPLEX LP solver. averaged over $10$ random instances for each $n$ . There is a significant difference in running times between DSOS/SDSOS and SOS beginning at $n=15$ . While we are unable to run SOS programs beyond $n=25$ , DSOS and SDSOS programs for $n=70$ take only a few minutes to execute. Memory constraints prevent us from executing programs for $n$ larger than $70$ . The results indicate, however, that it may be possible to run larger programs within a reasonable amount of time on a machine equipped with more RAM.

4.2 Copositive programming and combinatorial optimization

A symmetric matrix $Q\in\mathbb{R}^{n\times n}$ is copositive if $x^{T}Qx\geq 0$ for all $x\geq 0$ (i.e., for all $x$ in the nonnegative orthant). The problem of optimizing a linear function over affine sections of the cone $\mathcal{C}_{n}$ of $n\times n$ copositive matrices has recently received a lot of attention from the optimization community [35], since it can exactly model several problems of combinatorial and nonconvex optimization [20], [35], [23]. For instance, the size $\alpha(G)$ of the largest independent set 151515An independent set of a graph is a subset of its nodes no two of which are connected. The problem of finding upper bounds on $\alpha$ has many applications in scheduling and coding theory [63]. of a graph $G$ on $n$ nodes is equal to the optimal value of the following copositive program [27]:

[TABLE]

where $A$ is the adjacency matrix of the graph, $I$ is the identity matrix, and $J$ is the matrix of all ones.

It is easy to see that a matrix $Q$ is copositive if and only if the quartic form $(x.^{2})^{T}Q(x.^{2})$ , with $x.^{2}\mathrel{\mathop{:}}=(x_{1}^{2},x_{2}^{2},\dots,x_{n}^{2})^{T}$ , is nonnegative. Hence, by requiring this form to be r-sos/r-sdsos/r-dsos, one obtains inner approximations to the cone $\mathcal{C}_{n}$ that can be optimized over using semidefinite, second order cone, and linear programming. The sos version of this idea goes back to the PhD thesis of Parrilo [78].

In this section, we compare the quality of these bounds using another hierarchical LP-based method in the literature [20], [27]. The $r$ -th level of this LP hierarchy (referred to as the Pólya LP from here onwards) requires that the coefficients of the form $(x.^{2})^{T}Q(x.^{2})\cdot(x_{1}^{2}+\cdots+x_{n}^{2})^{r}$ be nonnegative. This is motivated by a theorem by Pólya, which states that this constraint will be satisfied for $r$ large enough (assuming $x^{T}Qx>0$ for all nonzero $x\geq 0$ ). It is not difficult to see that the Pólya LP is always dominated by the LP which requires $(x.^{2})^{T}Q(x.^{2})$ to be r-dsos. This is because the latter requires the underlying Gram matrix to be diagonally dominant, while the former requires it to be diagonal, with nonnegative diagonal entries.

In [27], de Klerk and Pasechnik show that when the Pólya LP is applied to approximate the copositive program in (4.2), then the precise independent set number of the graph is obtained within $\alpha(G)^{2}$ steps of the hierarchy. By the previous argument, the same claim immediately holds for the r-dsos (and hence r-sdsos) hierarchies.

Table 3 revisits Example 5.2 of [20], where upper bounds on the independent set number of the complement of the graph of an icosahedron are computed. (This is a graph on 12 nodes with independent set number equal to 3; see [20] for its adjacency matrix.) As the results illustrate, SOS programming provides an exact upper bound (since the size of the largest stable set is necessarily an integer). The second levels of the r-dsos and r-sdsos hierarchies also provide an exact upper bound. In contrast, the LP based on Polya’s theorem in [20] gives the upper bound $\infty$ for the [math]-th and the $1$ -st level, and an upper bound of $6$ at the second level. In contrast to the Pólya LP, one can show that the [math]-th level of the r-dsos LP always produces a finite upper bound on the independent set number. In fact, this bound will always be smaller than $n-\min d_{i}+1$ , where $d_{i}$ is the degree of node $i$ [1, Theorem 5.1] .

4.3 Convex regression in statistics

Next, we consider the problem of fitting a function to data subject to a constraint on the function’s convexity. This is an important problem in statistics and has a wide domain of applications including value function approximation in reinforcement learning, options pricing and modeling of circuits for geometric programming based circuit design [42, 43]. Formally, we are given $N$ pairs of data points $(x_{i},y_{i})$ with $x_{i}\in\mathbb{R}^{n}$ , $y_{i}\in\mathbb{R}$ and our task is to find a convex function $f$ from a family $\mathcal{F}$ that minimizes an appropriate notion of fitting error (e.g., $L_{1}$ error):

[TABLE]

The convexity constraint here can be imposed by requiring that the function $w^{T}H(x)w$ in $2n$ variables $(x,w)$ associated with the Hessian matrix $H(x)$ of $f(x)$ be nonnegative. Restricting ourselves to polynomial functions $f$ of bounded degree, we obtain the following optimization problem:

[TABLE]

where $H(x)$ is again the Hessian of $f(x)$ . As before, we can replace the nonnegativity constraint with a dsos/sdsos/sos constraint. For our numerical experiment, we generated $300$ random vectors $x_{i}$ in $\mathbb{R}^{20}$ drawn i.i.d. from the standard normal distribution. The function values $y_{i}$ were computed as follows:

[TABLE]

where $\eta_{i}$ was chosen i.i.d. from the standard normal distribution. Tables 5 and 5 present the fitting errors and running times for the DSOS/SDSOS/SOS programs resulting from restricting the class of functions $\mathcal{F}$ to polynomials of degree $d=2$ and $d=4$ . As the results illustrate, we are able to obtain significantly smaller errors with polynomials of degree $4$ using DSOS and SDSOS (compared to SOS with $d=2$ ), while the SOS program for $d=4$ does not run due to memory constraints.

4.4 Options pricing

An important problem in financial economics is that of determining the price of a derivative security such as an option given the price of the underlying asset. Significant progress was made in tackling this problem with the advent of the Black-Scholes formula, which operates under two assumptions: (i) the underlying stock price is governed by geometric Brownian motion, and (ii) there is no arbitrage. A natural question that has been considered in financial mathematics is to see if one can provide bounds on the price of an option in the no-arbitrage setting given much more minimal assumptions on the dynamics of the stock price. One approach to this question—studied, e.g., in [16], [22], —is to assume that we are only given the first $k$ moments of the stock price and want to optimally bound the price of the option.

More precisely, we consider that we are given $m$ stocks, an associated option with payoff function $\phi:\mathbb{R}_{+}^{m}\rightarrow\mathbb{R}$ (which will typically depend on the strike price of the option), a vector of $n$ moment functions, $f_{i}:\mathbb{R}_{m}^{+}\rightarrow\mathbb{R}$ , $i=0,1,\dots,n$ ( $f_{0}$ is assumed to be 1), and the corresponding vector of moments $q=(q_{0},\dots,q_{n})$ (here $q_{0}=1$ ). The problem of upper bounding the price of the option can then be formulated as follows [16]:

[TABLE]

Here, the expectation is taken over all Martingale measures $\pi$ , defined on $\mathbb{R}_{+}^{m}$ . We consider here the case where one is given the mean $\mu$ and covariance matrix $\sigma$ of the stock prices. Problem (17) can then be cast as the following problem (see [16, Section 6.2])

[TABLE]

which for many common functions $\phi$ of interest (see e.g. below) gives rise to a copositive program. A well-known and obvious sufficient condition for a matrix $M$ to be copositive is for it to have a decomposition $M=P+N$ , where $P$ is psd and $N$ is element-wise nonnegative [78]. This allows one to apply SDP to problem (18) and obtain upper bounds on the option price. By replacing the psd condition on the matrix $P$ with a dd/sdd condition, we obtain a DDP/SDDP.

We first compare the DDP/SDDP/SDP approaches on an example from [22, Section 4], which considers the problem of upper bounding the price of a European call option with $m=3$ underlying assets. The vector of means of the assets is $\mu=[44.21,44.21,44.21]^{T}$ and the covariance matrix is:

[TABLE]

Further, we have the following payoff function, which depends on the strike price $K$ :

[TABLE]

Table 6 compares the upper bounds for different strike prices using DDP/SDDP/SDP. For the SDP experiments, we present results using both MOSEK and SDPNAL+ [103]. For each strike price, the upper bound obtained using SDP (solved with MOSEK) is exact. This can be verified by finding a distribution that achieves the upper bound (i.e., finding a matching lower bound). For example, when $K=30$ , the distribution supported on the set of four points $(5.971,5.971,5.971)$ , $(54.03,46.02,46.02,46.02)$ , $(46.02,54.03,46.02)$ and $(46.02,46.02,54.03)$ in $\mathbb{R}^{3}$ with probability masses $0.105$ , $0.298$ , $0.298$ , $0.298$ respectively does the job. As the table illustrates, the upper bounds obtained using SDPNAL+ are slightly numerically inaccurate for the default accuracy tolerance of $10^{-6}$ , but are accurate with a tolerance of $10^{-9}$ (this was the largest tolerance that resulted in bounds accurate to two decimal places). The upper bound obtained using SDDP is almost identical to the exact bound for each strike price. The DDP bound is loose, and interestingly does not change with the strike price. Running times for the different methods on this small example are negligible and hence not presented.

In order to demonstrate the scalability of DDP and SDDP, we consider a larger scale randomized example with $m=50$ underlying assets. The mean and covariance are taken to be the sample mean and covariance of $100$ instances of vectors belonging to $\mathbb{R}^{m}$ with elements drawn uniformly and independently from the interval $[0,10]$ . The strike price is $K=5$ and the payoff function is again

[TABLE]

Table 7 compares upper bounds and running times for the different methods. SDDP provides us with a bound that is very close to the one obtained using SDP, with a significant speedup in computation (approximately a factor of 100 compared to MOSEK and a factor of 90 compared to SDPNAL+). The running time for DDP is comparable to SDDP, but the bound is not as tight. SDPNAL+ (tolerance = $10^{-6}$ ) is faster as compared to MOSEK, but yields a slightly larger bound. In contrast to the smaller-scale example above, the bound does not change when the SDPNAL+ tolerance is set to $10^{-9}$ .

4.5 Sparse PCA

Next, we consider the problem of sparse principal component analysis (sparse PCA). In contrast to standard PCA, where the principal components (PCs) in general depend on all the observed variables, the goal of sparse PCA is to identify principal components that only depend on small subsets of the variables [106]. While the statistical fidelity of the resulting representation of the data in terms of sparse PCs will in general be lower than the standard PCs, sparse PCs can significantly enhance interpretability of the results. This feature has proved to be very useful in applications such as finance and analysis of gene expressions; see, e.g., [26] and references therein.

Given a $n\times n$ covariance matrix $A$ , the problem of finding sparse principal components can be written as the following optimization problem:

[TABLE]

Here, $\textrm{Card}(x)$ is the cardinality (number of non-zero entries) of $x$ and $k$ is a given threshold. The problem can be reformulated as the following rank-constrained matrix optimization problem [26]:

[TABLE]

In [26], the authors propose “DSPCA”, an SDP relaxation of this problem that is obtained by dropping the rank constraint and replacing the cardinality constraint with a constraint on the $l_{1}$ norm:

[TABLE]

The optimal value of problem (24) is an upper bound on the optimal value of (22). Further, when the solution $X_{1}$ to (24) has rank equal to one, the SDP relaxation is tight and the dominant eigenvector $x_{1}$ of $X_{1}$ is the optimal loading of the first sparse PC [26]. When a rank one solution is not obtained, the dominant eigenvector can still be retained as an approximate solution to the problem. Further sparse PCs can be obtained by deflating $A$ to obtain:

[TABLE]

and re-solving the SDP with $A_{2}$ in place of $A$ (and iterating this procedure).

The framework presented in this paper can be used to obtain LP and SOCP relaxations of (22) by replacing the constraint $X\succeq 0$ by the constraint $X\in DD_{n}^{*}$ or $X\in SDD_{n}^{*}$ respectively, where $DD_{n}^{*}$ and $SDD_{n}^{*}$ are the dual cones of $DD_{n}$ and $SDD_{n}$ (see [2, Section 3.3] for a description of these dual cones). Since $S_{n}^{+}\subseteq SDD_{n}^{*}\subseteq DD_{n}^{*}$ , we are guaranteed that the resulting optimal solutions will be upper bounds on the SDP solution. Further, as in the SDP case, the relaxation is tight when a rank-one solution is obtained, and once again, if a rank-one solution is not obtained, we can still use the dominant eigenvector as an approximate solution.

We first consider Example $6.1$ from [26]. In this example, there are three hidden variables distributed normally:

[TABLE]

These hidden variables generate $10$ observed variables:

[TABLE]

with $j=1$ for $i=1,\dots,4$ , $j=2$ for $i=5,\dots,8$ and $j=3$ for $i=9,10,$ and $\epsilon_{i}^{j}$ independent for $j=1,2,3$ and $i=1,\dots,10$ . This knowledge of the distributions of the hidden and observed variables allows us to compute the exact $10\times 10$ covariance matrix for the observed variables. The sparse PCA algorithm described above can then be applied to this covariance matrix. Table 8 presents the first two principal components computed using standard PCA, DDP, SDDP and DSPCA (corresponding to SDP). As the table illustrates, the loadings corresponding to the first two PCs computed using standard PCA are not sparse. All other methods (DDP, SDDP, and SDP) give exactly the same answer and correspond to a rank-one solution, i.e., the optimal sparse solution. As expected, the sparsity comes at the cost of a reduction in the variance explained by the PCs (see [106] for computation of the explained variance).

Next, we consider a larger-scale example. We generate five random $100\times 100$ covariance matrices of rank $4$ . Table 9 presents the running times, number of non-zero entries (NNZ) in the top principal component161616Note that the entries of the PCs were thresholded lightly in order to remove spurious non-zero elements arising from numerical inaccuracies in the solvers., optimal value of the program (24) (Opt.), and the explained variance (Expl.). The optimal values for DDP and SDDP upper bound the SDP solution as expected, and are quite close in value. The number of non-zero entries and explained variances are comparable across the different methods. The running times are between $1100$ and $2000$ times faster for DDP in comparison to SDP, and between $900$ and $1400$ times faster for SDDP. Hence the example illustrates that one can obtain a very large speedup with the approach presented here, with only a small sacrifice in quality of the approximate sparse PCs.

4.6 Applications in control theory

As a final application of the methods presented in this paper, we consider two examples from control theory and robotics. The applications of SOS programming in control theory are numerous and include the computation of regions of attraction of polynomial systems [78], feedback control synthesis [50], robustness analysis [100], and computation of the joint spectral radius for uncertain linear systems [77]. A thorough treatment of the application of the (S)DSOS approach to control problems is beyond the scope of this paper but can be found in a different paper [64], which is joint work with Tedrake. Here we briefly highlight two examples from [64] that we consider particularly representative.

Most of the applications in control theory mentioned above rely on SOS programming for checking Lyapunov inequalities that certify stability of a nonlinear system. As discussed in Section 1.1, one can compute (inner approximations of) the region of attraction (ROA) of a dynamical system $\dot{x}=f(x)$ by finding a Lyapunov function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ that satisfies the following conditions:

[TABLE]

This guarantees that the set $\{x|\ V(x)\leq\beta\}$ is a subset of the ROA of the system, i.e. initial conditions that begin in this sublevel set converge to the origin. For polynomial dynamical systems, one can specify sufficient conditions for (25) using SOS programming [78]. By replacing the SOS constraints with (S)DSOS constraints, we can compute inner approximations to the ROA more efficiently. While the approximations will be more conservative in general, the ability to tradeoff conservatism with computation time and scalability is an important one in control applications.

4.6.1 Region of attraction for an inverted $N$ -link pendulum

As an illustration, we consider the problem of computing ROAs for the underactuated $N$ -link pendulum depicted in Figure 5. This system has $2N$ states $x=[\theta_{1},\dots,\theta_{N},\dot{\theta}_{1},\dots,\dot{\theta}_{N}]$ composed of the joint angles (with the vertical line) and their derivatives. There are $N-1$ control inputs (the joint closest to the base is not actuated). We take the unstable “upright” position of the system to be the origin of our state space and design a Linear Quadratic Regulator (LQR) controller in order to stabilize this equilibrium. A polynomial approximation of the dynamics of the closed loop system is obtained by a Taylor expansion of order $3$ . Using Lyapunov functions of degree 2 results in the time derivative of the Lyapunov function being quartic and hence yields dsos/sdsos/sos constraints on a polynomial of degree 4 in $2N$ variables.

Figures 5(a) and 5(b) compare projections of the ROAs computed for the system with $N=6$ onto two $2-$ dimensional subspaces of the state space. As the plots suggest, the ROA computed using SDSOS is only slightly more conservative than the ROA computed using SOS programming. In fact, comparing the volumes of the two ROAs, we find:

[TABLE]

Here, the volumes have been correctly rescaled in the standard manner by the dimension of the ambient space. The ROA computed using DSOS is much smaller (but still potentially useful in practice).

Comparing running times of the different approaches on this example, we find that the LP corresponding to the DSOS program takes 9.67 seconds, and the SOCP corresponding to the SDSOS program takes 25.9 seconds. The running time of the SOS program is 1526.5 seconds using MOSEK and 23676.5 seconds using SeDuMi. Hence, in particular DSOS is approximately $2500$ times faster than SOS using SeDuMi and $150$ times faster than SOS using MOSEK, while SDSOS is $900$ times faster in comparison to SeDuMi and $60$ times faster than MOSEK for SOS.

Of course the real benefit of our approach is that we can scale to problems where SOS programming ceases to run due to memory/computation constraints. Table 10 illustrates this ability by comparing running times of the programs obtained using our approach with SOS programming for different values of $2N$ (number of states). As expected, for cases where the SOS programs do run, the DSOS and SDSOS programs are significantly faster. Further, the SOS programs obtained for $2N>12$ are too large to run (due to memory constraints). In contrast, our approach allows us to handle almost twice as many states.

4.6.2 Control synthesis for a humanoid robot

In our final example, we highlight a robotics application considered in joint work with Tedrake in [64]. Control of humanoid robots is an important problem in robotics and presents a significant challenge due to the nonlinear dynamics of the system and high dimensionality of the state space. Here we show how the approach described in this paper can be used to design a balancing controller for a model of the ATLAS robot shown in Figure 6. This robot was designed and built by Boston Dynamics Inc. and was used for the 2015 DARPA Robotics Challenge.

Our model of the robot is based on physical parameters of the hardware platform and has $30$ states and $14$ inputs. The task considered here is to balance the robot on its right toe. The balancing controller is constructed by searching for both a quadratic Lyapunov function and a linear feedback control law in order to maximize the size of the resulting region of attraction. The Lyapunov conditions are imposed using SDSOS programming. We Taylor expand the dynamics about the equilibrium to degree $3$ in order to obtain polynomial dynamics. The total computation time is approximately $22.5$ minutes. We note that SOS programming is unable to handle this system due to memory (RAM) constraints.

Figure 7 demonstrates the performance of the resulting controller from SDSOS programming by plotting initial configurations of the robot that are stabilized to the fixed point. As the plot illustrates, the controller is able to stabilize a very wide range of initial conditions. A video of simulations of the closed loop system started from different initial conditions is available online at http://youtu.be/lmAT556Ar5c.

5 Improvements on DSOS and SDSOS optimization

While DSOS and SDSOS techniques result in significant gains in terms of solving times and scalability, they inevitably lead to some loss in solution accuracy when compared to the SOS approach. In this section, we briefly outline three possible strategies to mitigate this loss. The first two (Sections 5.1 and 5.2) generate sequences of linear and second order cone programs, while the third (Section 5.3) works with “small” (fixed-size) semidefinite programs. The reader may recall that the r-DSOS and r-SDSOS hierarchies of Section 3.2 could also be used to improve on DSOS and SDSOS techniques. Like the first two methods that we present here, they did so while staying in the realm of LP and SOCP. They differ from these methods on two crucial points however: (i) they are more expensive to implement, and (ii) they approximate the cone of SOS polynomials “blindly” irrespective of a particular objective function. This is in contrast to the methods that we present in Section 5.1 and 5.2, which approximate the cone in the direction of a specific linear objective function.

For brevity of exposition, we explain how all three strategies can be applied to approximate a generic semidefinite program:

[TABLE]

A treatment tailored to the case of sum of squares programs can be found in the references we provide.

5.1 Iterative change of basis

In [2], Ahmadi and Hall build on the notions of diagonal and scaled diagonal dominance to provide a sequence of improving inner approximations to the cone $P_{n}$ of psd matrices in the direction of the objective function of an SDP at hand. The idea is simple: define a family of cones171717One can think of $DD(U)$ as the set of matrices that are dd after a change of coordinates via the matrix $U$ .

[TABLE]

parametrized by an $n\times n$ matrix $U$ . Optimizing over the set $DD(U)$ is an LP since $U$ is fixed, and the defining constraints are linear in the coefficients of the two unknown matrices $M$ and $Q$ . Furthermore, the matrices in $DD(U)$ are all psd; i.e., $\forall U,$ $DD(U)\subseteq P_{n}$ .

The proposal in [2] is to solve a sequence of LPs, indexed by $k$ , by replacing the condition $X\succeq 0$ by $X\in DD(U_{k})$ :

[TABLE]

The sequence of matrices $\{U_{k}\}$ is defined as follows

[TABLE]

where $X_{k}$ is an optimal solution to the LP in (28).

Note that the first LP in the sequence optimizes over the set of diagonally dominant matrices. By defining $U_{k+1}$ as a Cholesky factor of $X_{k}$ , improvement of the optimal value is guaranteed in each iteration. Indeed, as $X_{k}=U_{k+1}^{T}IU_{k+1}$ , and the identity matrix $I$ is diagonally dominant, we see that $X_{k}\in DD(U_{k+1})$ and hence is feasible for iteration $k+1$ . This implies that the optimal value at iteration $k+1$ is at least as good as the optimal value at the previous iteration; i.e., $DSOS_{k+1}\leq DSOS_{k}$ (in fact, the inequality is strict under mild assumptions; see [2]).

In an analogous fashion, one can construct a sequence of SOCPs that inner approximate $P_{n}$ with increasing quality. This time, we define a family of cones

[TABLE]

parameterized again by an $n\times n$ matrix $U$ . For any $U$ , optimizing over the set $SDD(U)$ is an SOCP and we have $SDD(U)\subseteq P_{n}$ . This leads us to the following iterative SOCP sequence:

[TABLE]

Assuming existence of an optimal solution $X_{k}$ at each iteration, we can define the sequence $\{U_{k}\}$ iteratively in the same way as was done in (29). Using similar reasoning, we have $SDSOS_{k+1}\leq SDSOS_{k}$ . In practice, the sequence of upper bounds $\{SDSOS_{k}\}$ approaches faster to the SDP optimal value than the sequence of the LP upper bounds $\{DSOS_{k}\}$ .

Figure 8 shows the improvement (in every direction) obtained just by a single iteration of this approach. The outer set in green in both subfigures is the feasible set of a randomly generated semidefinite program. The sets in black are the DD (left) and the SDD (right) inner approximations. What is shown in dark blue in both cases is the boundary of the improved inner approximation after one iteration. Note that the SOCP in particular fills up almost the entire spectrahedron in a single iteration.

We refer the interested reader to [2] for more details, in particular for an explanation of how the same techniques can be used (via duality) to outer approximate feasible sets of semidefinite programs.

5.2 Column generation

In [1], Ahmadi, Dash, and Hall design another iterative method for inner approximating the set of psd matrices using linear and second order cone programming. Their approach combines DSOS/SDSOS techniques with ideas from the theory of column generation in large-scale linear and integer programming. The high-level idea is to approximate the SDP in (27) by a sequence of LPs (parameterized by $t$ ):

[TABLE]

where $B_{1},\ldots,B_{t}$ are fixed psd matrices. These matrices are initialized to be the extreme rays of $DD_{n}$ (recall Lemma 3.5), i.e., all rank one matrices $v_{i}v_{i}^{T}$ , where the vector $v_{i}$ has at most two nonzero components, each equal to $\pm 1$ . Once this initial LP is solved, then one adds one (or sometimes several) new psd matrices $B_{j}$ to problem (31) and resolves. This process then continues. In each step, the new matrices $B_{j}$ are picked carefully to bring the optimal value of the LP closer to that of the SDP. Usually, the construction of $B_{j}$ involves solving a “pricing subproblem” (in terminology of the column generation literature), which adds appropriate cutting planes to the dual of (31); see [1] for more details.

The SOCP analogue of this process is similar. The SDP in (27) is inner approximated by a sequence of SOCPs (parameterized by $t$ ):

[TABLE]

where $V_{1},\ldots,V_{t}$ are fixed $n\times 2$ matrices. They are initialized as the set of matrices that have zeros everywhere, except for a 1 in the first column in position $j$ and a 1 in the second column in position $k\neq j$ . This gives exactly $SDD_{n}$ (via Lemma 3.10). In subsequent steps, one (or sometimes several) appropriately-chosen matrices $V_{i}$ are added to problem (33). These matrices are obtained by solving pricing subproblems and help bring the optimal value of the SOCP closer to that of the SDP in each iteration.

Figure 9 shows the improvement obtained by five iterations of this process on a randomly generated SDP, where the objective is to maximize a linear function in the northeast direction. 5

5.3 Factor-width $k$ matrices with $k>2$

A relevant generalization of sdd matrices is through the notion of factor-width, defined by Boman et al. [19]. The factor-width of a symmetric psd matrix $Q$ is the smallest integer $k$ for which $Q$ can be written as $Q=VV^{T}$ , where each column of $V$ contains at most $k$ nonzeros. Equivalently, the factor-width of $Q$ is the smallest $k$ for which $Q$ can be written as a sum of psd matrices that are nonzero only on a single $k\times k$ principal submatrix.

If we denote the cone of $n\times n$ symmetric matrices of factor-width $k$ by $FW^{k}_{n}$ , we have that $FW^{k}_{n}\subseteq P_{n}$ for all $k\in\{1,\ldots,n\}$ , $FW^{2}_{n}=SDD_{n}$ (cf. Lemma 3.10), and $FW^{n}_{n}=P_{n}$ . For values of $k\in\{2,\ldots,n-1\}$ , one gets an increasingly more accurate inner approximation to the set of psd matrices, all outperforming the approximation given by sdd matrices. We did not consider these cones in this paper as it is already known that for $k=3$ , a representation based on second order cone programming is not possible [36]. Nevertheless, working with $FW^{3}_{n}$ can be useful as it involves semidefinite constraints that are small ( $3\times 3$ ) and hence efficiently solvable. Unfortunately though, optimizing over $FW^{3}_{n}$ requires $O(n^{3})$ such semidefinite constraints which in many applications can be prohibitive. We believe that a promising future direction would be to use the column generation framework on Section 5.2 to work with a subset of the extreme rays of $FW^{k}_{n}$ (for small $k$ ) and generate additional ones on the fly based on problem structure.

Some initial experiments with $FW^{k}_{n}$ have been performed by Ding and Lim [32]. The authors also develop self-concordant barrier functions for these cones and show that any second order cone program can be written as an optimization problem over $FW^{2}_{n}$ , i.e., a scaled diagonally dominant program (recall the definition from Section 3.1). In related work, Permenter and Parrilo [82] consider optimization problems over $FW^{k}_{n}$ (and their dual cones) for facial reduction in semidefinite programming. We believe the notion of factor-width is likely to receive more attention in upcoming years in the theory of matrix optimization.

6 Conclusions

We have proposed more scalable alternatives to SOS programming by introducing inner approximations to the sum of squares cone in the form of the cones of dsos and sdsos polynomials. These cones can be optimized over using LP and SOCP respectively and afford us considerable gains in terms of scalability by trading off solution quality. Our numerical examples from a diverse range of applications including polynomial optimization, combinatorial optimization, statistics and machine learning, derivative pricing, control theory, and robotics demonstrate that with reasonable tradeoffs in solution quality, we can handle problem sizes that are significantly beyond the current capabilities of SOS programming (at least without exploiting structure or resorting to specialized solvers). In particular, we have shown that our approach is able to produce useful lower bounds on dense polynomial optimization problems with up to $70$ polynomial variables (with degree-4 polynomials), or design stabilizing controllers for realistic robotic control systems with as many as 30 state variables. The Supplementary Material of this paper provides a brief but complete tutorial on the toolbox that was used to generate the numerical results in the paper. In addition, since the extended abstract of this work first appeared, our techniques have been applied by other authors to areas such as conic optimization [21], distance geometry [31], portfolio selection in finance [10], and power engineering [95, 94].

On the theoretical front, we have shown that the (S)DSOS approach shares some of the theoretical asymptotic guarantees usually associated with SOS programming (such as certificates of nonnegativity of forms, arbitrarily tight approximation of copositive programs, and a converging hierarchy of lower bounds for polynomial optimization problems). Finally, in the last section of this paper, we reviewed recent approaches that were developed with the idea of bridging the gap between the (S)DSOS approach and the SOS one. One can obtain improved approximations using these methods at the cost of additional computation time. This raises as a natural research direction the possibility of quantifying these tradeoffs, either for one-shot or for adaptive approximation schemes.

To conclude, we would like to emphasize that a key benefit of our approach is that it can be applied in any application where SOS programming is used. Our hope is that the (S)DSOS approach may open up application areas that have previously been beyond reach due to the limitations in scalability of SOS programming. While it is true that improvements to SDP solver technology will always push the boundary of problems that are within the reach of SOS programming, we believe that concurrent improvements in LP and SOCP solver technology will always leave a range of additional problems that can be handled by relaxations such as ours. For example, we foresee exciting possibilities for real-time applications [5], [66] using technology for real-time linear and second order cone programming that is already coming to fruition [34], [67].

Acknowledgments

We would like to thank Pablo Parrilo for acquainting us with scaled diagonally dominant matrices, and Georgina Hall for many corrections and simplifications on the first draft of this work. Our gratitude extends to Russ Tedrake and the members of the Robot Locomotion Group at MIT for several insightful discussions, particularly around control applications. We thank Frank Permenter and Mark Tobenkin for their help with the numerical implementation of DSOS/SDSOS programs in SPOT, and Aida Khajavirad for running the experiments of Section 4.1 with BARON. We thank the Editorial Board of SIAGA and the anonymous referees for their constructive feedback and insightful comments. Finally, this work has benefited from questions and comments from several colleagues, among whom we gladly acknowledge Greg Blekherman, Stephen Boyd, Sanjeeb Dash, Etienne de Klerk, Jesus de Loera, Lijun Ding, João Gouveia, Didier Henrion, Jean Bernard Lasserre, Leo Liberti, Lek-Heng Lim, Bruce Reznick, James Saunderson, and Bernd Sturmfels.

7 Supplementary Material: Software Toolbox

A complete implementation of the code used to generate the numerical results presented in this paper was written using the Systems Polynomial Optimization Toolbox (SPOT) [68] and is freely available online181818Link to the software: https://github.com/anirudhamajumdar/spotless/tree/spotless_isos. The toolbox features efficient polynomial algebra and allows us to setup the large-scale LPs and SOCPs arising from our examples. Here we provide a brief introduction to the software. This is not meant as a comprehensive tutorial on the SPOT toolbox (for this one may refer to SPOT’s documentation191919Link to SPOT’s documentation: https://github.com/spot-toolbox/spotless/blob/master/doc/manual.pdf). Rather, the goal here is to provide an introduction sufficient for setting up and solving the DSOS and SDSOS programs in this paper. The code relevant for this purpose is included in a branch of SPOT that we have named iSOS (for “inside sum of squares”).

7.1 Installing SPOT

Download the software package from Github available here:

https://github.com/anirudhamajumdar/spotless/tree/spotless_isos

Next, start MATLAB and run the spot_install.m script. This script will setup the MATLAB path and compile a few mex functions. The user may wish to save the new MATLAB path for future use.

7.2 Variables and polynomials

Polynomials are defined and manipulated by the

@msspoly class of SPOT. In order to define a new variable, one can use the msspoly.m function:

[TABLE]

which creates the polynomial $p(x)=x$ . The argument to this function is the name of the created variable and is restricted to four characters chosen from the alphabet (lower case and upper case). A MATLAB vector of variables can be created by passing a second argument to the function:

[TABLE]

This will create a $n\times 1$ vector of variables that can be accessed using standard array indexing. Multivariate polynomials can then be constructed as follows:

[TABLE]

Variables can be manipulated and operated on using a variety of functions. These include standard arithmetic operations (e.g. addition, multiplication, dot product) and operations for manipulating vectors (e.g. concatenating, reshaping, transposing). Other useful functions include:

deg.m: Returns the total degree of a polynomial. If a second argument in the form of a msspoly is provided, the degree with respect to these variables is returned.

diff.m: Differentiates a polynomial (first argument) with respect to a set of msspoly variables (second argument). The result is the matrix of partial derivatives.

subs.m: Substitutes the third argument in place of the second argument wherever it appears in the first argument.

7.3 Programs

Programs involving DSOS/SDSOS/SOS constraints are constructed using the spotsosprog class. In this section, we demonstrate the workings of spotsosprog with the help of an example. In particular, we take the problem of minimizing a form on the unit sphere considered in this paper. The following example is also available in doc/examples/sdsos_example.m.

[TABLE]

This block of code constructs the polynomial that is to be lower bounded. In order to do this, we create a six dimensional vector x of variables using the msspoly command introduced before. Next, we construct a vector of monomials using the monomials function. The first input to this function is the msspoly variable over which the monomials are defined. The second input to the function is the range of degrees the monomials should have. In this case, since we are considering homogeneous quartics, the range is simply $4:4$ (i.e., just $4$ ).

[TABLE]

This block of code sets up the program and constraints. First, the program is initialized in the form of the variable prog. This object will contain information about constraints and decision variables. Next, we declare the variable x to be an indeterminate or abstract variable (thus distinguishing it from a decision variable). Decision variables are created using the newFree function in the class spotsosprog. The input to this function is the number of new decision variables to be created. The outputs are the updated program and a variable corresponding to the decision variable. In our case, we need a single variable gamma to be declared. Finally, we specify a DSOS constraint on p - gamma*(x’*x)^2 using the withDSOS command. There are corresponding functions withSDSOS and withSOS that can be used to setup SDSOS or SOS constraints.

[TABLE]

Finally, in this block of code, we dictate an options structure for the program. In particular, the field options.solveroptions contains options that are specific to the solver to be used (MOSEK in our case). The minimize command is used to specify the objective of the program (which must be linear in the decision variables), the function handle corresponding to the solver to be used, and the structure of options. Currently, MOSEK, Gurobi and SeDuMi are the supported solvers. So, for example, in order to use the Gurobi solver for a (S)DSOS program, we would specify the function handle @spot_gurobi.

The output of the minimize command is a solution structure, which contains diagnostic information and allows one to access the optimized decision variables. The last line of our example code demonstrates how to obtain the optimized variable gamma and convert it to a MATLAB double type.

7.4 Additional functionality

Next, we review additional functionality not covered in the example above. While this section is not meant to be an exhaustive list of all the functionality available in SPOT, it should be sufficient to reproduce the numerical results presented in this paper. In the lines of code presented in the following sections, it is assumed that prog is an object of the spotsosprog class, x is a msspoly variable of size n and an indeterminate variable of prog. These variables can be initialized as in the example above with the following lines of code:

[TABLE]

7.4.1 Creating decision variables

It is often useful to construct a polynomial whose coefficients are decision variables. This can be achieved with the newFree and monomials functions introduced above. In particular, one can create a vector of monomials in a msspoly variable x of a given degree d using the command v = monomials(x,0:d). Then, the set of coefficients can be declared using [prog,c] = prog.newFree(length(v)). Finally, one can obtain the desired polynomial by multiplying these two together: p = c’*v.

It is also possible to create other types of decision variables. For example, the following functions can be used to create various types of matrix decision variables:

newSym: New symmetric matrix.

newDD: New symmetric matrix constrained to be diagonally dominant.

newSDD: New symmetric matrix constrained to be scaled diagonally dominant.

newPSD: New symmetric positive semidefinite matrix.

newDDdual: New symmetric matrix constrained to lie in the dual of the cone of diagonally dominant matrices.

newSDDdual: New symmetric matrix constrained to lie in the dual of the cone of scaled diagonally dominant matrices.

The input to these functions is the size of the desired (square) matrix. For example, in order to create a $10\times 10$ symmetric matrix Q constrained to be diagonally dominant, one can use the following lines of code:

[TABLE]

7.4.2 Specifying constraints

In addition to the withDSOS, withSDSOS and withSOS functions introduced previously, constraints on existing decision variables can be specified using the following functions:

withEqs: Sets up constraints of the form expr = 0, where expr is the input to the function and is a matrix whose elements are to be constrained to be equal to 0.

withPos: Sets up constraints of the form expr $\geq$ 0, where expr is the input to the function and is a matrix whose elements are to be constrained to be nonnegative.

Note that the functions withEqs and withPos allow one to specify constraints in a vectorized manner, thus avoiding MATLAB’s potentially-slow for-loops.

withDD: Constrains a symmetric matrix (the input to the function) to be diagonally dominant.

withSDD: Constrains a symmetric matrix (the input to the function) to be scaled diagonally dominant.

withPSD: Constrains a symmetric matrix (the input to the function) to be positive semidefinite.

In each case above, the inputs must be affine functions of the decision variables of the program. The output of these functions is an updated spotsosprog object that contains the new constraints. We illustrate the use of the function withEqs with the help of a simple example. The other functions can be used in a similar manner.

[TABLE]

In this example, the elements of the $10\times 1$ decision vector tau1 are constrained to be equal to the elements of the vector tau2.

withPolyEqs: In order to constrain two polynomials $p_{1}(x)$ and $p_{2}(x)$ to be equal to each other for all $x$ , one may use the function withPolyEqs. This function will constrain the coefficients of the two polynomials to be equal. Here is a simple example:

[TABLE]

This will constrain c(1) + c(2) = 2 and c(3) = c(4).

7.4.3 Checking if a polynomial is dsos

The three utility functions isDSOS, isSDSOS and isSOS allow one to check if a given polynomial is dsos, sdsos, or sos respectively. The only input to the functions is the polynomial to be checked. The outputs are a boolean variable indicating whether the polynomial is in fact dsos/ sdsos/ sos, the Gram matrix, and the monomial basis corresponding to the Gram matrix (these last two are non-empty only if the given polynomial is in fact dsos/sdsos/sos). The following lines of code provide a simple example:

[TABLE]

One can check that the polynomial p - v’Qv is the zero polynomial (up to numerical tolerances).

Bibliography106

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] A. A. Ahmadi, S. Dash, and G. Hall , Optimization over structured subsets of positive semidefinite matrices via column generation , Discrete Optimization, 24 (2017), pp. 129–151.
2[2] A. A. Ahmadi and G. Hall , Sum of squares basis pursuit with linear and second order cone programming , Contemporary Mathematics, (2017).
3[3] A. A. Ahmadi and G. Hall , On the construction of converging hierarchies for polynomial optimization based on certificates of global positivity , Mathematics of Operations Research, (2018). To appear. Online version available at https://arxiv.org/abs/1709.09307 .
4[4] A. A. Ahmadi and A. Majumdar , DSOS and SDSOS optimization: LP and SOCP-based alternatives to sum of squares optimization , in preceedings of the 48th annual Conference on Information Sciences and Systems (CISS), IEEE, 2014.
5[5] A. A. Ahmadi and A. Majumdar , Some applications of polynomial optimization in operations research and real-time decision making , Optimization Letters, 10 (2016), pp. 709–729.
6[6] A. A. Ahmadi and A. Majumdar , Response to “Counterexample to global convergence of DSOS and SDSOS hierarchies” , ar Xiv preprint ar Xiv:1710.02901, (2017).
7[7] F. Alizadeh and D. Goldfarb , Second-order cone programming , Mathematical programming, 95 (2003), pp. 3–51.
8[8] E. Artin , Über die Zerlegung Definiter Funktionen in Quadrate , Hamb. Abh., 5 (1927), pp. 100–115.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

DSOS and SDSOS Optimization:

Abstract

keywords:

1 Introduction

1.1 Why optimize over nonnegative polynomials?

Polynomial optimization

Combinatorial optimization

Control systems and robotics

Statistical regression with shape constraints

Probability bounds given moments

Other applications

Organization of the paper

2 Review of the semidefinite programming-based approach and computational considerations

Theorem 2.1** (see, e.g., [78],[79]).**

3 DSOS and SDSOS Optimization

3.1 The cones of dsos and sdsos polynomials

Definition 3.1**.**

Definition 3.2**.**

Definition 3.3**.**

Theorem 3.4**.**

Lemma 3.5** (Barker and Carlson [11]).**

Proof 3.6** (Proof of Theorem 3.4).**

Theorem 3.7**.**

Proof 3.8**.**

Theorem 3.9** (see theorems 8 and 9 by Boman et al. [19]).**

Lemma 3.10**.**

Proof 3.11**.**

Theorem 3.12**.**

Proof 3.13**.**

3.2 The cone of r-dsos and r-sdsos polynomials and asymptotic guarantees

Definition 3.14**.**

Theorem 3.15**.**

Theorem 3.16**.**

Proof 3.17**.**

Proposition 3.18**.**

Proof 3.19**.**

Theorem 3.20** (see Section 4 of [3]).**

4 Numerical examples and applications

4.1 Lower bounds on polynomial optimization problems

4.2 Copositive programming and combinatorial optimization

4.3 Convex regression in statistics

4.4 Options pricing

4.5 Sparse PCA

4.6 Applications in control theory

4.6.1 Region of attraction for an inverted NNN-link pendulum

4.6.2 Control synthesis for a humanoid robot

5 Improvements on DSOS and SDSOS optimization

5.1 Iterative change of basis

5.2 Column generation

5.3 Factor-width kkk matrices with k>2k>2k>2

6 Conclusions

Acknowledgments

7 Supplementary Material: Software Toolbox

7.1 Installing SPOT

7.2 Variables and polynomials

7.3 Programs

7.4 Additional functionality

7.4.1 Creating decision variables

7.4.2 Specifying constraints

7.4.3 Checking if a polynomial is dsos

Theorem 2.1 (see, e.g., [78],[79]).

Definition 3.1.

Definition 3.2.

Definition 3.3.

Theorem 3.4.

Lemma 3.5 (Barker and Carlson [11]).

Proof 3.6 (Proof of Theorem 3.4).

Theorem 3.7.

Proof 3.8.

Theorem 3.9 (see theorems 8 and 9 by Boman et al. [19]).

Lemma 3.10.

Proof 3.11.

Theorem 3.12.

Proof 3.13.

Definition 3.14.

Theorem 3.15.

Theorem 3.16.

Proof 3.17.

Proposition 3.18.

Proof 3.19.

Theorem 3.20 (see Section 4 of [3]).

4.6.1 Region of attraction for an inverted $N$ -link pendulum

5.3 Factor-width $k$ matrices with $k>2$