Convexification of box-constrained polynomial optimization problems via monomial patterns

Gennadiy Averkov; Benjamin Peters; Sebastian Sager

arXiv:1901.05675·math.OC·September 29, 2021

TL;DR

This paper introduces a unified framework for convexifying box-constrained polynomial optimization problems using monomial relaxations, balancing computational cost and relaxation tightness, with promising experimental results.

Contribution

It develops a novel convexification strategy that unifies nonlinear programming and positivity certificate approaches within a monomial relaxation framework.

Findings

01

The method effectively balances relaxation quality and computational effort.

02

Computational experiments demonstrate promising results.

03

The framework offers a flexible trade-off between bound tightness and computational cost.

Abstract

Convexification is a core technique in global polynomial optimization. Currently, there are two main approaches competing in theory and practice: the approach of nonlinear programming and the approach based on positivity certificates from real algebra. The former are comparatively cheap from a computational point of view, but typically do not provide tight relaxations with respect to bounds for the original problem. The latter are typically computationally expensive, but do provide tight relaxations. We embed both kinds of approaches into a unified framework of monomial relaxations. We develop a convexification strategy that allows one to trade off the quality of the bounds against computational expense. Computational experiments show very encouraging results.

Table 1: Methods and relaxations to be compared in Figs. 7, 8, 9, 10, and 12.

Label	Description
(B)	Reference solution: To approximate $ω_{ℳ {(K)}_{A}} (𝐟)$ we use the best upper bound for $\max_{𝐱 \in K} f (𝐱)$ and the best lower bound for $\min_{𝐱 \in K} f (𝐱)$ that BARON returns within a CPU time limit of $1000$ seconds each.
(R)	Root node relaxation of the BARON solver.
(CS)	Reference solution obtained from solver CS-TSSOS.
(Y)	Reference solution obtained from YALMIP’s sos method.
(SOS)	Self-implemented sos relaxation (that does not exploit sparsity) of the lowest hierarchy level.
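(M)	Relaxation based on the multilinear patterns $\mathcal{F}_{\mathrm{A}}^{\mathrm{m}}$, which consists of the inclusion-maximal elements of $\{\operatorname{ML}(\alpha,\{0,1\}^{\mathrm{n}}):\alpha\in\mathrm{A}\setminus\{\mathbf{0}\}\}$.
(S)	Relaxation based on a family of shifted chains $\mathcal{F}_{\mathrm{A}}^{\mathrm{s}}$, which consists of the inclusion-maximal elements of $\{\eta+\operatorname{CH}(\mathbf{e}^{\mathrm{i}},\mathrm{d}):\mathrm{d}\in 2\mathbb{N}\setminus\{0\},\ \eta\in\mathbb{N}^{\mathrm{n}},\ \mathrm{i}\in[\mathrm{n}]\}$ that satisfy $\#(\eta+\operatorname{CH}(\gamma,\mathrm{d}))\cap\mathrm{A}\geq 2$ and $\#(\eta+\operatorname{CH}(\gamma,\mathrm{d}))\cap\mathrm{A}>\#(\eta+\operatorname{CH}(\gamma,\mathrm{d}-1))\cap\mathrm{A}$. The latter conditions ensure that each shifted chain contains at least two exponents from $\mathrm{A}$ and that we cannot include more exponents from $\mathrm{A}$ if we choose a bigger $\mathrm{d}$.
(C)	Relaxation based on a family of chains $\mathcal{F}_{\mathrm{A}}^{\mathrm{c}}$, which consists of the inclusion-maximal elements of $\{\operatorname{CH}(\gamma,\mathrm{d}):\mathrm{d}\in 2\mathbb{N}\setminus\{0\},\ \gamma\in\mathbb{N}^{\mathrm{n}}\}$ that satisfy $\#\operatorname{CH}(\gamma,\mathrm{d})\cap\mathrm{A}\geq 2$ and $\#\operatorname{CH}(\gamma,\mathrm{d})\cap\mathrm{A}>\#\operatorname{CH}(\gamma,\mathrm{d}-1)\cap\mathrm{A}$.
(MC)	Relaxation based on a family of multilinear patterns, chains and shifted chains, $\mathcal{F}_{\mathrm{A}}^{\mathrm{mc}}:=\mathcal{F}_{\mathrm{A}}^{\mathrm{m}}\cup\mathcal{F}_{\bar{\mathrm{A}}}^{\mathrm{c}}\cup\mathcal{F}_{\tilde{\mathrm{A}}}^{\mathrm{s}}$, where $\bar{\mathrm{A}}:=\mathrm{A}_{\mathcal{F}_{\mathrm{A}}^{\mathrm{m}}}$ and $\tilde{\mathrm{A}}:=\mathrm{A}_{\mathcal{F}_{\mathrm{A}}^{\mathrm{m}}\cup\mathcal{F}_{\bar{\mathrm{A}}}^{\mathrm{c}}}$.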
(H)	Let $d (A) := \max ({α_{i} : α \in A, i \in [n]})$ and $Γ := {𝟏, 𝐞^{1}, \dots, 𝐞^{n}}$ . A relaxation based on the family F^h_A: = {CH(γ,d(A)) : γ∈Γ}∪F^m_ {CH(γ,d(A)) : γ∈Γ}∪F^m_A, which uses $n + 1$ chains that are linked by $d (A)$ multilinear patterns to strengthen $ℱ_{A}^{m}$ .
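(T)	Let $\mathrm{d}_{1}:=2\cdot\lceil\deg(\mathrm{A})/2\rceil$, $\mathrm{d}_{2}:=2\cdot\lceil\deg(\mathrm{A})/4\rceil$ and $\Gamma:=(2\mathbf{e}^{1},\dots,2\mathbf{e}^{\mathrm{n}})$. A relaxation based on the family $\mathcal{F}_{\mathrm{A}}^{\mathrm{t}}$, which consists of the inclusion-maximal elements of $\{\operatorname{TS}((\mathbf{e}^{\mathrm{i}})_{\mathrm{i}\in\operatorname{supp}(\alpha)},\mathrm{d}_{1}):\alpha\in\mathrm{A}\setminus\operatorname{TS}(\Gamma,\mathrm{d}_{2})\}\cup\{\operatorname{TS}(\Gamma,\mathrm{d}_{2})\}$. Here, $(\mathbf{e}^{\mathrm{i}})_{\mathrm{i}\in\operatorname{supp}(\alpha)}$ is a matrix with columns $\mathbf{e}^{\mathrm{i}}$, $\mathrm{i}\in\operatorname{supp}(\alpha)$. The family $\mathcal{F}_{\mathrm{A}}^{\mathrm{t}}$ uses $\mathrm{k}$-variate truncated submonoids with $\mathrm{k}\leq\mathrm{d}_{1}$ to cover the exponents in $\mathrm{A}$ and connects these chains using one $\mathrm{n}$-variate truncated submonoid.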

Equations

\mathbf{x}^{\alpha} := \mathrm{x}_{1}^{\alpha_{1}} \cdot \dots \cdot \mathrm{x}_{\mathrm{n}}^{\alpha_{\mathrm{n}}}

\langle \mathbf{v}, \mathbf{w} \rangle := \sum_{\alpha \in \mathrm{A} \cap \mathrm{B}} \mathrm{v}_{\alpha} \mathrm{w}_{\alpha}.

\operatorname{diam}(\mathrm{X}) := \max_{\mathbf{u}, \mathbf{z} \in \mathrm{X}} \|\mathbf{u} - \mathbf{z}\|_{1},

\omega_{\mathrm{X}}(\mathbf{c}) := \max\{\langle \mathbf{c}, \mathbf{u} \rangle : \mathbf{u} \in \mathrm{X}\} - \min\{\langle \mathbf{c}, \mathbf{u} \rangle : \mathbf{u} \in \mathrm{X}\}

\operatorname{supp}(\mathbf{v}) := \{\alpha \in \mathrm{A} : \mathrm{v}_{\alpha} \neq 0\} \quad\text{and}\quad \operatorname{supp}(\mathrm{X}) := \bigcup_{\mathbf{v} \in \mathrm{X}} \operatorname{supp}(\mathbf{v}).

\operatorname{m}(\mathbf{x})_{\mathrm{A}} := (\mathbf{x}^{\alpha})_{\alpha \in \mathrm{A}}.

\underline{\mathbf{x}}_{\mathrm{K}}^{\alpha} := \min_{\mathbf{x} \in \mathrm{K}} \mathbf{x}^{\alpha} \quad\text{and}\quad \overline{\mathbf{x}}_{\mathrm{K}}^{\alpha} := \max_{\mathbf{x} \in \mathrm{K}} \mathbf{x}^{\alpha},

\begin{array}{cl}\operatorname{minimize}&f(\mathbf{x})\\ \operatorname{for}&\mathbf{x}\in\mathbb{R}^{\mathrm{n}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{x}\in\mathrm{K}.\end{array}

\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathrm{A}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\in\left\{\operatorname{m}(\mathbf{x})_{\mathrm{A}}:\mathbf{x}\in\mathrm{K}\right\}.\end{array}

\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathrm{A}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\mathrm{A}}.\end{array}

\mathbf{v}_{\mathrm{P}} \in \mathcal{M}(\mathrm{K})_{\mathrm{P}} \text{ for } \mathrm{P} \in \mathcal{F},

\mathrm{A} \subseteq \bigcup_{\mathrm{P} \in \mathcal{F}} \mathrm{P}.

\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathrm{A}_{\mathcal{F}}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}\,\operatorname{for}\,\operatorname{all}\,\mathrm{P}\in\mathcal{F}.\end{array}

\operatorname{m}(\mathrm{K})_{\mathrm{A}} \xrightarrow{\text{convexifying}} \mathcal{M}(\mathrm{K})_{\mathrm{A}} \xrightarrow{\text{embedding}} \mathcal{M}(\mathrm{K})_{\mathrm{A}_{\mathcal{F}}} \xrightarrow{\text{projecting}} \mathcal{M}(\mathrm{K})_{\mathrm{P}}

\operatorname{ML}(\alpha, \mathrm{I})

\mathcal{M}(\mathrm{K})_{\operatorname{ML}(\alpha, \mathrm{I})} = \operatorname{conv}(\operatorname{m}(\mathrm{V})_{\mathrm{I}})

\mathrm{P}^{\alpha} := \operatorname{ML}(\alpha, \{\alpha\} \cup \{\mathbf{e}^{\mathrm{i}} : \mathrm{i} \in \operatorname{supp}(\alpha)\}).

\mathbf{v}_{\mathrm{P}^{\alpha}} \in \mathcal{M}(\mathrm{K})_{\mathrm{P}^{\alpha}} \text{ for all } \alpha \in \mathrm{A}

\mathbf{v}_{\mathrm{P}} \in \mathcal{M}(\mathrm{K})_{\mathrm{P}} \text{ for all } \mathrm{P} \in \mathcal{F}_{\mathrm{rec}}^{\alpha},\ \alpha \in \mathrm{A}

\begin{array}{cl}\operatorname{minimize}&\langle\tilde{\mathbf{f}},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\tilde{\mathrm{A}}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\in\mathcal{M}(\operatorname{Box}(\underline{\mathbf{x}}_{\mathrm{K}}^{\Gamma},\overline{\mathbf{x}}_{\mathrm{K}}^{\Gamma}))_{\tilde{\mathrm{A}}}.\end{array}

\mathrm{A}_{1}

\mathrm{A}_{2}

\mathrm{A}_{3}

\mathrm{A}_{4}

\mathrm{A}_{5}

\mathrm{A}_{6}

\mathrm{A}_{\mathrm{ex}}

F^{\alpha, \beta}(\mathbf{x}) := \prod_{\mathrm{i} \in [\mathrm{n}]} (\mathrm{x}_{\mathrm{i}} - \mathrm{l}_{\mathrm{i}})^{\alpha_{\mathrm{i}} - \beta_{\mathrm{i}}} (\mathrm{u}_{\mathrm{i}} - \mathrm{x}_{\mathrm{i}})^{\beta_{\mathrm{i}}}

L F^{\alpha, \beta}(\mathbf{v}) \geq 0 \text{ for all } \beta \in \operatorname{BF}(\alpha)

\mathcal{M}(\mathrm{K})_{\mathrm{A}}

\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathbb{N}^{\mathrm{n}}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\ \text{is a moment sequence of a probability measure on $\mathrm{K}$.}\end{array}

\mathrm{M}_{\mathrm{k}}(g, \mathbf{v}) := \Big(\sum_{\gamma \in \mathbb{N}^{\mathrm{n}}} g_{\gamma} \mathrm{v}_{\gamma + \alpha + \beta}\Big)_{\alpha, \beta \in \mathbb{N}_{\mathrm{k}}^{\mathrm{n}}} \quad\text{and}\quad \mathrm{M}_{\mathrm{k}}(\mathbf{v}) := \mathrm{M}_{\mathrm{k}}(1, \mathbf{v}).

\Big\{\mathbf{x} \in \mathbb{R}^{\mathrm{n}} : s^{0}(\mathbf{x}) + \sum_{\mathrm{i} \in [\mathrm{m}]} s^{\mathrm{i}}(\mathbf{x}) g^{\mathrm{i}}(\mathbf{x}) \geq 0\Big\}

Taxonomy

Topics: Advanced Optimization Algorithms Research · Polynomial and algebraic computation · Numerical Methods and Algorithms

Full text

Funding: This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 314838170, GRK 2297 MathCoRe.

Gennadiy Averkov, Fakultät 1, Brandenburgische Technische Universität Cottbus-Senftenberg, Germany.

Benjamin Peters, Fakultät für Mathematik, Otto-von-Guericke Universität Magdeburg, Germany.

Sebastian Sager

Keywords: Convexification, McCormick envelopes, moment problem, nonlinear optimization, polynomial optimization, sum-of-squares, sparsity

AMS subject classifications: 68Q25, 68R10, 68U05

1 Introduction

Many important convexification techniques applied to polynomial optimization problems share the following common distinctive features: in the case of a problem in $\mathrm{n}$ variables $\mathbf{x}=(\mathrm{x}_{1},\ldots,\mathrm{x}_{\mathrm{n}})$ , monomials

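$$\mathbf{x}^{\alpha}:=\mathrm{x}_{1}^{\alpha_{1}}\cdot\dots\cdot\mathrm{x}_{\mathrm{n}}^{\alpha_{\mathrm{n}}}$$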

with $\alpha\in\mathbb{N}^{\mathrm{n}}$ are substituted with monomial variables $\mathrm{v}_{\alpha}$ and the relationships among them are captured, exactly or in a relaxed fashion, by systems of convex constraints. In order to describe the relationship between different monomial variables by constraints one needs to introduce additional auxiliary monomial variables.

Different approaches exist on how to pick these auxiliary monomial variables and the respective convex constraints. The “nonlinear optimization community” uses monomial variables and constraints such that the resulting relaxations are cheap to compute. The resulting poor lower bounds are compensated by solving many relaxations within a branch-and-bound framework. The “polynomial optimization community” usually aims to solve only one single relaxation, which often produces a very tight bound. This comes at the price of a large number of monomial variables and hard constraints. Interestingly, up to now there has been little interaction between the two different schools of thought. The authors believe that a major reason is the lack of a mathematical formalism that would allow a uniform description of different convexification techniques.

One contribution of this paper is the introduction of the notion of patterns to fill this gap. Patterns are finite sets $\mathrm{P}\subseteq\mathbb{N}^{\mathrm{n}}$ of exponent vectors that are chosen in such a way that the monomial variables $\mathrm{v}_{\alpha}$ indexed by $\alpha\in\mathrm{P}$ can be linked by constraints that satisfy a given demand on the computability. While various kinds of patterns have been implicitly used by the disjoint research communities, the introduction of the explicit notion of patterns allows for the development of a unifying mathematical language that highlights common ideas. Promoting the elementary notion of patterns makes it possible to see similarities between the different research directions and will help to connect communities that work independently on the same problems.

For example, the pattern $\{(1,0),(0,1),(1,1)\}$ corresponds to the well-known McCormick envelope [31, 7], i.e. the convexification of the variables $\mathrm{x}_{1}$ and $\mathrm{x}_{2}$ and their product $\mathrm{x}_{1}\mathrm{x}_{2}$. Other examples of methods that can be expressed using the notion of patterns are the truncated moment relaxation and its dual, the sum-of-squares relaxation [2, 21, 28], scaled-diagonally-dominant sums of squares [1], sums of non-negative circuit polynomials [13, 38], bound-factor products [10] and their dual, Handelman’s hierarchy [17], multilinear intermediates [4], polyhedral outer approximations [43] as well as expression trees [40, 39]. We propose a flexible template for the relaxation of box-constrained polynomial optimization problems (pop) that allows one to use the ideas of these, until now largely disjoint, schools of thought. It allows one to combine different types of patterns to build convex relaxations of a pop. Our new and more general point of view might also help to understand numerical issues and the facial structures of feasible sets in the aforementioned convexification approaches. This, in turn, can be expected to have a positive impact on the improvement of existing and on the development of novel approaches to polynomial optimization.

In this paper we address the case of box-constrained pops. In nonlinear global optimization, convexification of expressions occurring in constraints and objective functions (with the underlying variables in specified finite ranges) is a widely used technique. Since objective functions and constraints can be convexified by the same principles, one could also use our strategy for more general versions of polynomial optimization, with more general sets of constraints. On the other hand, developing our strategy into a sound method for general polynomial optimization would require more thought and ideas. Therefore, this is out of scope for this paper. Furthermore, box-constrained subproblems are an essential part of branch-and-bound frameworks such as the one employed in BARON [36]. Thus, in the future one can also try to apply our method, developed for box-constrained pops, to problems with more general constraints by using it within branch-and-bound frameworks.

We derive various new convexification techniques from the monomial pattern template. The resulting relaxations can be solved by a variety of different numerical approaches. In the interest of analyzing the tightness and computational expenses related to different convexification strategies, we use the interior point solver MOSEK.

The paper is organized as follows. The basic notation is given in Section 2. In Section 3 the notion of the pattern relaxation is introduced and the separation problem for patterns is formulated as an optimization problem. Section 4 is dedicated to the interpretation and discussion of established convexification techniques as monomial patterns. Multilinear envelopes are generalized as multilinear patterns. In Section 5 new pattern types are introduced, which give rise to new algorithmic approaches to pop. Computational results in Section 6 highlight the benefits of our novel approach. Finally, a conclusion is given in Section 7.

2 Basic Notation

$\mathbb{N}$ is the set of natural numbers including zero. For integers $\mathrm{n}>0$ and $\mathrm{d}\geq 0$ we define $\mathbb{N}^{\mathrm{n}}_{\mathrm{d}}:=\left\{\alpha\in\mathbb{N}^{\mathrm{n}}:\alpha_{1}+\dots+\alpha_{\mathrm{n}}\leq\mathrm{d}\right\}$ , $[\mathrm{n}]:=\{1,\dots,\mathrm{n}\}$ and $[\mathrm{n}]_{0}:=[\mathrm{n}]\cup\{0\}$ . Let $\mathrm{A},\mathrm{B}\subseteq\mathbb{N}^{\mathrm{n}}$ be nonempty, finite sets with cardinalities $\#\mathrm{A}$ and $\#\mathrm{B}$ . We denote vectors of real numbers with entries indexed by the elements of set $A$ as $\mathbf{v}=(\mathrm{v}_{\alpha})_{\alpha\in\mathrm{A}}\in\mathbb{R}^{\mathrm{A}}$ . Note that $\mathbb{R}^{\mathrm{A}}$ is isomorphic to $\mathbb{R}^{\#\mathrm{A}}$ . We define the bilinear product of two such vectors $\mathbf{v}=(\mathrm{v}_{\alpha})_{\alpha\in\mathrm{A}}\in\mathbb{R}^{\mathrm{A}}$ and $\mathbf{w}=(\mathrm{w}_{\alpha})_{\alpha\in\mathrm{B}}\in\mathbb{R}^{\mathrm{B}}$ as

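$$\langle\mathbf{v},\mathbf{w}\rangle:=\sum_{\alpha\in\mathrm{A}\cap\mathrm{B}}\mathrm{v}_{\alpha}\mathrm{w}_{\alpha}.$$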

Furthermore, if $\mathrm{B}\subseteq\mathrm{A}$ , we define the coordinate projection of $\mathbf{v}$ onto components indexed by $\mathrm{B}$ as $\mathbf{v}_{\mathrm{B}}:=(\mathrm{v}_{\alpha})_{\alpha\in\mathrm{B}}.$ The $\ell_{1}$ and $\ell_{\infty}$ norms of $\mathbf{v}$ are $\|\mathbf{v}\|_{1}$ and $\|\mathbf{v}\|_{\infty}$ , respectively. Let $\mathrm{X}\subseteq\mathbb{R}^{\mathrm{A}}$ be a nonempty and compact set. We call

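$$\operatorname{diam}(\mathrm{X}):=\max_{\mathbf{u},\mathbf{z}\in\mathrm{X}}\|\mathbf{u}-\mathbf{z}\|_{1}$$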

the diameter of $\mathrm{X}$ and

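$$\omega_{\mathrm{X}}(\mathbf{c}):=\max\{\langle\mathbf{c},\mathbf{u}\rangle:\mathbf{u}\in\mathrm{X}\}-\min\{\langle\mathbf{c},\mathbf{u}\rangle:\mathbf{u}\in\mathrm{X}\}$$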

the width function of $\mathrm{X}$ in direction $\mathbf{c}$ . We define the support of a vector $\mathbf{v}$ and the support of a set $\mathrm{X}$ as

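$$\operatorname{supp}(\mathbf{v}):=\{\alpha\in\mathrm{A}:\mathrm{v}_{\alpha}\neq 0\}\quad\text{and}\quad\operatorname{supp}(\mathrm{X}):=\bigcup_{\mathbf{v}\in\mathrm{X}}\operatorname{supp}(\mathbf{v}).$$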

$\mathbf{v}$ is said to have full support if $\operatorname{supp}(\mathbf{v})=\mathrm{A}$. The standard basis vectors of $\mathbb{R}^{\mathrm{A}}$ are denoted by $\mathbf{e}^{\alpha}$ for $\alpha\in\mathrm{A}$ and the all-ones vector by $\mathbf{1}$. $\mathbb{R}[\mathbf{x}]$ is the ring of polynomials in a vector of $\mathrm{n}$ indeterminates $\mathbf{x}=(\mathrm{x}_{1},\dots,\mathrm{x}_{\mathrm{n}})$ and $\mathbb{R}[\mathbf{x}]_{\mathrm{A}}$ the set of polynomials $f(\mathbf{x})=\sum_{\alpha\in\mathrm{A}}\mathrm{f}_{\alpha}\mathbf{x}^{\alpha}$. That is, by $\mathrm{A}$ we prescribe which monomials can occur in $f$. The vector $\mathbf{f}=(\mathrm{f}_{\alpha})_{\alpha}$ is called the coefficient vector of $f$. The monomial support of a polynomial $f$ is $\operatorname{supp}(f)=\operatorname{supp}(\mathbf{f})$. A polynomial $p\in\mathbb{R}[\mathbf{x}]$ is called sum-of-squares (sos) if $p=(p^{1})^{2}+\dots+(p^{\mathrm{k}})^{2}$ for finitely many polynomials $p^{1},\ldots,p^{\mathrm{k}}\in\mathbb{R}[\mathbf{x}]$. We use $\Sigma_{n,2d}$ to denote the cone of $\mathrm{n}$-variate sos of degree at most $2\mathrm{d}$. The $\mathrm{A}$-truncated moment vector map is

$$\operatorname{m}(\mathbf{x})_{\mathrm{A}}:=(\mathbf{x}^{\alpha})_{\alpha\in\mathrm{A}}.$$

The minimum and maximum of the monomial $\mathbf{x}^{\alpha},\alpha\in\mathrm{A}$ , over a compact set $\mathrm{K}\subseteq\mathbb{R}^{\mathrm{n}}$ are

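$$\underline{\mathbf{x}}_{\mathrm{K}}^{\alpha}:=\min_{\mathbf{x}\in\mathrm{K}}\mathbf{x}^{\alpha}\quad\text{and}\quad\overline{\mathbf{x}}_{\mathrm{K}}^{\alpha}:=\max_{\mathbf{x}\in\mathrm{K}}\mathbf{x}^{\alpha},$$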

respectively, $\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{A}:=(\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\alpha})_{\alpha\in\mathrm{A}}$ and $\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\mathrm{A}}:=(\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\alpha})_{\alpha\in\mathrm{A}}.$ The degree of the set $\mathrm{A}$ is $\deg(\mathrm{A}):=\max\{\|\alpha\|_{1}:\alpha\in\mathrm{A}\}.$ For vectors we understand notions like $<,\leq,>,\geq$ componentwise. Let $\mathbf{l},\mathbf{u}\in\mathbb{R}^{\mathrm{n}}$ with $\mathbf{l}<\mathbf{u}$ , we define the box as $\operatorname{Box}(\mathbf{l},\mathbf{u}):=[\mathrm{l}_{1},\mathrm{u}_{1}]\times\dots\times[\mathrm{l}_{\mathrm{n}},\mathrm{u}_{\mathrm{n}}]\subseteq\mathbb{R}^{\mathrm{n}}.$ We use $psd$ to abbreviate positive semidefinite.
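To make the notation concrete, the following minimal Python sketch evaluates the moment vector map $\operatorname{m}(\mathbf{x})_{\mathrm{A}}$, the degree $\deg(\mathrm{A})$, and the monomial bounds $\underline{\mathbf{x}}_{\mathrm{K}}^{\alpha}$, $\overline{\mathbf{x}}_{\mathrm{K}}^{\alpha}$ over a box. The function names and the representation of exponents as integer tuples are illustrative assumptions and do not come from the paper. The bounds use the fact that $\mathbf{x}^{\alpha}$ is a product of univariate powers in independent variables, so its extreme values over a box are attained at products of the endpoint values of the factor ranges.

```python
from itertools import product
from math import prod


def monomial(x, alpha):
    """Evaluate x^alpha = x_1^alpha_1 * ... * x_n^alpha_n."""
    return prod(xi ** ai for xi, ai in zip(x, alpha))


def moment_vector(x, A):
    """A-truncated moment vector m(x)_A as a dict indexed by exponent tuples."""
    return {alpha: monomial(x, alpha) for alpha in A}


def degree(A):
    """deg(A): the largest 1-norm of an exponent in A."""
    return max(sum(alpha) for alpha in A)


def power_range(l, u, k):
    """Range of t -> t**k on the interval [l, u] for an integer k >= 0."""
    if k == 0:
        return (1.0, 1.0)
    lo, hi = sorted((l ** k, u ** k))
    if k % 2 == 0 and l < 0 < u:
        lo = 0.0  # an even power attains 0 in the interior of the interval
    return (float(lo), float(hi))


def monomial_range(alpha, l, u):
    """Minimum and maximum of x^alpha over Box(l, u)."""
    factor_ranges = [power_range(li, ui, ai) for li, ui, ai in zip(l, u, alpha)]
    # The product of independent factors is multilinear in the factor values,
    # so its extremes occur at combinations of factor-range endpoints.
    values = [prod(combo) for combo in product(*factor_ranges)]
    return (min(values), max(values))
```

For instance, `monomial_range((1, 1), (-1, -1), (2, 3))` bounds the bilinear monomial $\mathrm{x}_{1}\mathrm{x}_{2}$ over $[-1,2]\times[-1,3]$.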

3 Pattern Relaxation

3.1 Monomial Convexification and Monomial Relaxation

Let $\mathrm{A}\subset\mathbb{N}^{\mathrm{n}}$ be a finite and nonempty set and $\mathbf{l},\mathbf{u}\in\mathbb{R}^{\mathrm{n}}$ be given. We consider the problem of minimizing a polynomial $f\in\mathbb{R}[\mathbf{x}]_{\mathrm{A}}$ over the box $\mathrm{K}:=\operatorname{Box}(\mathbf{l},\mathbf{u})$ , i.e.

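$$\begin{array}{cl}\operatorname{minimize}&f(\mathbf{x})\\ \operatorname{for}&\mathbf{x}\in\mathbb{R}^{\mathrm{n}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{x}\in\mathrm{K}.\end{array}\tag{4}$$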

Via lifting, we reformulate (4) as an optimization problem in $\mathbb{R}^{\mathrm{A}}$ with a linear objective:

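$$\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathrm{A}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\in\left\{\operatorname{m}(\mathbf{x})_{\mathrm{A}}:\mathbf{x}\in\mathrm{K}\right\}.\end{array}$$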

Replacing the feasible set by its convex hull $\mathcal{M}(\mathrm{K})_{\mathrm{A}}:=\operatorname{conv}(\left\{\operatorname{m}(\mathbf{x})_{\mathrm{A}}:\mathbf{x}\in\mathrm{K}\right\})$ yields the monomial convexification of (4):

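$$\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathrm{A}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\mathrm{A}}.\end{array}\tag{8}$$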

We refer to $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ as an ($\mathrm{n}$-variate) moment body. Clearly, the convexification (8) of (4) is tight, that is, the optimal values of (8) and (4) coincide. For general sets $\mathrm{A}$, the constraint $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ is difficult to verify. Thus, it is natural to relax $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ to a system of simpler constraints of the same type

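$$\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}\ \text{ for }\ \mathrm{P}\in\mathcal{F},\tag{9}$$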

where $\mathcal{F}$ is a finite family of finite subsets of $\mathbb{N}^{\mathrm{n}}$ that satisfies

$$\mathrm{A}\subseteq\bigcup_{\mathrm{P}\in\mathcal{F}}\mathrm{P}.$$

Our intention is to cover $\mathrm{A}$ by sets $\mathrm{P}\in\mathcal{F}$ such that the corresponding moment bodies $\mathcal{M}(\mathrm{K})_{\mathrm{P}}$ yield more structure that we can exploit algorithmically than the original moment body $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ . We call $\mathrm{P}\in\mathcal{F}$ a pattern and (9) a pattern relaxation of $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ with respect to the pattern family $\mathcal{F}$ . Throughout the paper we use $\mathrm{A}_{\mathcal{F}}$ to denote $\bigcup_{\mathrm{P}\in\mathcal{F}}\mathrm{P}$ and refer to $\alpha\in\mathrm{A}$ as original exponents and to $\alpha\in\mathrm{A}_{\mathcal{F}}\backslash\mathrm{A}$ as auxiliary exponents. Using a pattern relaxation of $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ we obtain a lower bound on (4) by solving

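$$\begin{array}{cl}\operatorname{minimize}&\langle\mathbf{f},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\mathrm{A}_{\mathcal{F}}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}\,\operatorname{for}\,\operatorname{all}\,\mathrm{P}\in\mathcal{F}.\end{array}\tag{14}$$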

An advantage of this approach is that we can choose patterns $\mathrm{P}\in\mathcal{F}$ such that the computational costs of solving (possibly several instances of) (14) and the obtained lower bounds on the objective function value of (4) are well balanced.
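The bookkeeping behind a pattern family can be sketched in a few lines of Python. The helper names below are hypothetical and the patterns are represented as sets of exponent tuples, which is an assumption made only for illustration. The sketch checks the covering condition $\mathrm{A}\subseteq\bigcup_{\mathrm{P}\in\mathcal{F}}\mathrm{P}$, forms $\mathrm{A}_{\mathcal{F}}$, and lists the auxiliary exponents $\mathrm{A}_{\mathcal{F}}\setminus\mathrm{A}$ that the relaxation introduces as extra variables.

```python
def union_of_patterns(family):
    """A_F: the union of all patterns P in the family F."""
    A_F = set()
    for P in family:
        A_F |= set(P)
    return A_F


def auxiliary_exponents(A, family):
    """Return A_F minus A, after checking that the family covers A."""
    A = set(A)
    A_F = union_of_patterns(family)
    if not A <= A_F:
        missing = sorted(A - A_F)
        raise ValueError(f"family does not cover the exponents {missing}")
    return A_F - A


# Original exponents of f = f_(1,1) x1 x2 + f_(2,0) x1^2.
A = {(1, 1), (2, 0)}
# One McCormick-type pattern and one univariate pattern (illustrative choice).
family = [{(1, 0), (0, 1), (1, 1)}, {(1, 0), (2, 0)}]
aux = auxiliary_exponents(A, family)
```

Here the two auxiliary exponents $(1,0)$ and $(0,1)$ correspond to the variables $\mathrm{x}_{1}$ and $\mathrm{x}_{2}$ themselves, which do not occur in $f$ but are needed to link the original monomials.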

This procedure can also be seen as embedding $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ into $\mathcal{M}(\mathrm{K})_{\mathrm{A}_{\mathcal{F}}}$ for some set $\mathrm{A}_{\mathcal{F}}$ that contains $\mathrm{A}$ and can be represented nicely as a union of patterns $\mathrm{P}\in\mathcal{F}$ . Geometrically, the passage from (4) through (8) to (14) can be represented by the diagram

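$$\operatorname{m}(\mathrm{K})_{\mathrm{A}}\xrightarrow{\text{convexifying}}\mathcal{M}(\mathrm{K})_{\mathrm{A}}\xrightarrow{\text{embedding}}\mathcal{M}(\mathrm{K})_{\mathrm{A}_{\mathcal{F}}}\xrightarrow{\text{projecting}}\mathcal{M}(\mathrm{K})_{\mathrm{P}}$$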

The quality of a pattern relaxation of $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ with respect to the family of patterns $\mathcal{F}$ depends on how the moment variables are connected by the system of conditions (9). We say that monomial variables $\mathrm{v}_{\alpha},\mathrm{v}_{\beta}$ are directly connected by $\mathcal{F}$ if $\alpha,\beta\in\mathrm{P}\setminus\{\mathbf{0}\}$ holds for some $\mathrm{P}\in\mathcal{F}$. Furthermore, $\mathrm{v}_{\alpha},\mathrm{v}_{\beta}$ are indirectly connected by $\mathcal{F}$ if there exist $\mathrm{P}_{\mathrm{j}}\in\mathcal{F},\mathrm{j}\in[\mathrm{k}]$, such that $\alpha\in\mathrm{P}_{1},\beta\in\mathrm{P}_{\mathrm{k}}$ and $\mathrm{P}_{\mathrm{j}}\cap\mathrm{P}_{\mathrm{j}+1}\setminus\{\mathbf{0}\}\not=\emptyset$ for all $\mathrm{j}\in[\mathrm{k}-1]$.
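The two connectivity notions can be checked mechanically. The sketch below is an illustration under assumptions: patterns are sets of exponent tuples, the function names are hypothetical, and indirect connectivity is decided by a breadth-first search over patterns, where two patterns are adjacent if they share a nonzero exponent.

```python
from collections import deque


def directly_connected(alpha, beta, family):
    """True if some pattern P contains both alpha and beta, neither being 0."""
    zero = (0,) * len(alpha)
    if alpha == zero or beta == zero:
        return False
    return any(alpha in P and beta in P for P in family)


def indirectly_connected(alpha, beta, family):
    """True if a chain P_1, ..., P_k exists with alpha in P_1, beta in P_k and
    consecutive patterns sharing at least one nonzero exponent."""
    patterns = [frozenset(P) for P in family]
    zero = (0,) * len(alpha)
    start = [j for j, P in enumerate(patterns) if alpha in P]
    seen = set(start)
    queue = deque(start)
    while queue:
        j = queue.popleft()
        if beta in patterns[j]:
            return True
        for k, Q in enumerate(patterns):
            # Move to a pattern that overlaps the current one outside of 0.
            if k not in seen and (patterns[j] & Q) - {zero}:
                seen.add(k)
                queue.append(k)
    return False


family = [
    {(0, 0), (1, 0), (2, 0)},
    {(0, 0), (0, 1), (0, 2)},
    {(2, 0), (1, 1)},
]
```

In this family the first two patterns overlap only in $\mathbf{0}$, so $\mathrm{v}_{(1,0)}$ and $\mathrm{v}_{(0,2)}$ are not connected, whereas $\mathrm{v}_{(1,0)}$ and $\mathrm{v}_{(1,1)}$ are indirectly connected through the shared exponent $(2,0)$.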

4 Known Convexification Techniques are Monomial Patterns

We formulate established convexification techniques from the literature as monomial patterns. These pattern types can be used – alone or in combination – to generate computationally tractable pattern relaxations (14) of (4).

4.1 Multilinear Pattern

Let $\mathrm{I}\subseteq\{0,1\}^{\mathrm{n}}$ , $\mathrm{I}\not=\emptyset$ and $\alpha\in\mathbb{N}^{\mathrm{n}}$ . We call

[TABLE]

a multilinear pattern (ML), see subplot $\mathrm{A}_{1}$ in Fig. 1 for an illustration. It is well known that the convex envelope of multilinear functions over $\mathrm{K}=\operatorname{Box}(\mathrm{l},\mathrm{u})$ is a polytope. In our context this implies the following.

Proposition 4.1.

Let $\alpha\in\mathbb{N}^{\mathrm{n}}$ be of full support. The moment body $\mathcal{M}(\mathrm{K})_{\operatorname{ML}(\alpha,\mathrm{I})}$ is a polytope satisfying

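$$\mathcal{M}(\mathrm{K})_{\operatorname{ML}(\alpha,\mathrm{I})}=\operatorname{conv}(\operatorname{m}(\mathrm{V})_{\mathrm{I}})$$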

with $\mathrm{V}:=\{\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\alpha_{1}\mathbf{e}^{1}},\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\alpha_{1}\mathbf{e}^{1}}\}\times\dots\times\{\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\alpha_{\mathrm{n}}\mathbf{e}^{\mathrm{n}}},\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\alpha_{\mathrm{n}}\mathbf{e}^{\mathrm{n}}}\}$.

Multilinear patterns can be found in different contexts in the literature. In their basic version they are used to convexify multilinear polynomials. An essential building block for the convexification of product terms is the McCormick envelope [30], that is the convexification of bilinear products $\mathrm{x}_{1}\mathrm{x}_{2}$ by a tight description of the moment body $\mathcal{M}(\mathrm{K})_{\operatorname{ML}((1,1),\{0,1\}^{2})}$ , noting that $\operatorname{ML}((1,1),\{0,1\}^{2})=\{0,1\}^{2}$ . McCormick envelopes have been successfully used to build convex relaxations of multilinear monomials by applying them recursively. For a monomial $\mathbf{x}^{\alpha}$ with $\alpha\in\{0,1\}^{\mathrm{n}}$ and $\#\operatorname{supp}(\alpha)\geq 2$ this recursion can be described as follows. Let $\mathrm{J}=\{\alpha\}$ and $\mathcal{F}^{\alpha}_{\mathrm{rec}}=\emptyset$ . For each element $\beta\in\mathrm{J}$ write $\beta$ as $\beta=\beta^{\prime}+\beta^{\prime\prime}$ with $\beta^{\prime},\beta^{\prime\prime}\in\{0,1\}^{\mathrm{n}}\backslash\{\mathbf{0}\}$ . Remove $\beta$ from $\mathrm{J}$ and add $\beta^{\prime}$ to $\mathrm{J}$ if $\#\operatorname{supp}(\beta^{\prime})\geq 2$ respectively $\beta^{\prime\prime}$ to $\mathrm{J}$ if $\#\operatorname{supp}(\beta^{\prime\prime})\geq 2$ . Add the multilinear pattern $\{\beta,\beta^{\prime},\beta^{\prime\prime},\mathbf{0}\}$ to $\mathcal{F}^{\alpha}_{\mathrm{rec}}$ . This procedure corresponds to a binary tree with root $\alpha$ and the moment body of each pattern in $\mathcal{F}^{\alpha}_{\mathrm{rec}}$ is tightly described by a McCormick envelope.
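The recursion described above can be sketched as follows. The text leaves open how $\beta$ is split into $\beta^{\prime}+\beta^{\prime\prime}$, so the default rule used here (peel off the first variable in the support) is only an assumption for illustration, and the function name `mccormick_recursion` is hypothetical. Any other valid splitting rule can be passed in via the `split` argument.

```python
def mccormick_recursion(alpha, split=None):
    """Build the pattern family F_rec^alpha for a multilinear exponent alpha.

    alpha must lie in {0,1}^n and have at least two nonzero entries.
    `split(beta)` returns (beta1, beta2) with beta = beta1 + beta2 and both
    parts nonzero 0/1 vectors.
    """
    alpha = tuple(alpha)
    n = len(alpha)
    if any(a not in (0, 1) for a in alpha) or sum(alpha) < 2:
        raise ValueError("alpha must be a 0/1 vector with support size >= 2")
    zero = (0,) * n

    def peel_first_variable(beta):
        # Assumed default: split off the first variable occurring in beta.
        i = beta.index(1)
        first = tuple(1 if k == i else 0 for k in range(n))
        rest = tuple(b - f for b, f in zip(beta, first))
        return first, rest

    split = split or peel_first_variable
    J = [alpha]
    family = []
    while J:
        beta = J.pop()
        beta1, beta2 = split(beta)
        # Each step contributes one pattern described by a McCormick envelope.
        family.append(frozenset({beta, beta1, beta2, zero}))
        for part in (beta1, beta2):
            if sum(part) >= 2:
                J.append(part)
    return family


family_rec = mccormick_recursion((1, 1, 1))
```

With this splitting rule, a monomial in $s$ variables produces $s-1$ patterns, one for each internal node of the binary tree.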

In general it is not clear how to favorably decompose a multilinear exponent $\beta\in\mathrm{J}$ . For the smallest nontrivial case $\#\operatorname{supp}(\beta)=3$ this has been investigated in [41].

Another way to convexify $\mathbf{x}^{\alpha}$ with $\alpha\in\{0,1\}^{n}$ and $\#\operatorname{supp}(\alpha)\geq 2$ is to introduce for each factor $\mathrm{x}_{\mathrm{i}}^{\alpha_{\mathrm{i}}}$ with $\alpha_{\mathrm{i}}\not=0$ a moment variable $\mathrm{v}_{\alpha_{\mathrm{i}}}$ [11]. This corresponds to the pattern

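$$\mathrm{P}^{\alpha}:=\operatorname{ML}(\alpha,\{\alpha\}\cup\{\mathbf{e}^{\mathrm{i}}:\mathrm{i}\in\operatorname{supp}(\alpha)\}).$$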

For $\mathrm{A}=\{\alpha\}$ the pattern relaxation corresponding to the pattern family $\{\mathrm{P}^{\alpha}\}$ is tight, while the relaxation corresponding to $\mathcal{F}^{\alpha}_{\mathrm{rec}}$ is usually not tight for $\mathrm{A}=\{\alpha\}$. It is, however, not clear which system

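$$\mathbf{v}_{\mathrm{P}^{\alpha}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}^{\alpha}}\ \text{ for all }\ \alpha\in\mathrm{A}$$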

or

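$$\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}\ \text{ for all }\ \mathrm{P}\in\mathcal{F}^{\alpha}_{\mathrm{rec}},\ \alpha\in\mathrm{A}$$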

yields a tighter convex relaxation of $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ for $\mathrm{A}\subseteq\{0,1\}^{\mathrm{n}}$ with $\#\mathrm{A}\geq 2$. This is due to the different choice of auxiliary variables and how the original moment variables are connected by the different pattern families. In our definition (15), the parameter $\mathrm{I}$ allows one to choose auxiliary variables flexibly and thereby control the connective properties of the multilinear pattern family.

Multilinear patterns have also been applied to general polynomials $f\in\mathbb{R}[\mathbf{x}]_{\mathrm{A}}$ with $\mathrm{A}\subseteq\mathbb{N}^{\mathrm{n}}$ and $\mathrm{A}\backslash\{\mathbf{0}\}\not=\emptyset$ [4, 16]. Using the set $\Gamma:=\left\{\gamma=\alpha_{\mathrm{i}}\mathbf{e}^{\mathrm{i}}:\alpha\in\mathrm{A},\mathrm{i}\in[\mathrm{n}]\right\}\backslash\{\mathbf{0}\}$ , the substitution $\mathrm{y}_{\alpha_{\mathrm{i}}\mathbf{e}^{\mathrm{i}}}=\mathrm{x}^{\alpha_{\mathrm{i}}\mathbf{e}^{\mathrm{i}}}$ and $\tilde{\mathrm{A}}:=\{\beta\in\{0,1\}^{\Gamma}:\exists\alpha\in\mathrm{A}\text{ s.t. }\sum_{\gamma\in\Gamma}\beta_{\gamma}\gamma=\alpha\}$ a multilinear intermediate $\tilde{f}\in\mathbb{R}[\mathbf{y}]_{\tilde{\mathrm{A}}}$ of $f$ is generated. This corresponds to relaxing the usually non-polyhedral $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ with the polytope $\mathcal{M}(\operatorname{Box}(\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\Gamma},\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\Gamma}))_{\tilde{\mathrm{A}}}$ and

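$$\begin{array}{cl}\operatorname{minimize}&\langle\tilde{\mathbf{f}},\mathbf{v}\rangle\\ \operatorname{for}&\mathbf{v}\in\mathbb{R}^{\tilde{\mathrm{A}}}\\ \operatorname{subject}\,\operatorname{to}&\mathbf{v}\in\mathcal{M}(\operatorname{Box}(\underline{\mathbf{x}}_{\mathrm{K}}^{\Gamma},\overline{\mathbf{x}}_{\mathrm{K}}^{\Gamma}))_{\tilde{\mathrm{A}}}.\end{array}\tag{22}$$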

(22) is further relaxed using (18) or (17). The entire process can be expressed using multilinear patterns as well. For example using (17) to further relax (22) yields the family $\left\{\operatorname{ML}(\alpha,\{\mathbf{1}\}\cup\left\{\mathbf{e}^{\mathrm{i}}:\mathrm{i}\in[\mathrm{n}]\right\}):\alpha\in\mathrm{A}\right\}$ .
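As a small illustration of the substitution step, the following sketch computes the set $\Gamma$ of pure-power exponents $\alpha_{\mathrm{i}}\mathbf{e}^{\mathrm{i}}$ and, for a given $\alpha$, the factors $\mathrm{y}_{\gamma}$ whose product replaces $\mathbf{x}^{\alpha}$. The function names are hypothetical. The sketch lists only the direct factorization of $\alpha$ into its own coordinate powers and does not enumerate every $\beta\in\{0,1\}^{\Gamma}$ admitted by the definition of $\tilde{\mathrm{A}}$.

```python
def axis_exponents(A):
    """Gamma: all nonzero vectors alpha_i * e^i with alpha in A, i in [n]."""
    gamma = set()
    for alpha in A:
        gamma.update(axis_factors(alpha))
    return gamma


def axis_factors(alpha):
    """Exponents gamma = alpha_i * e^i (alpha_i != 0) with x^alpha = prod x^gamma."""
    n = len(alpha)
    return [
        tuple(a if k == i else 0 for k in range(n))
        for i, a in enumerate(alpha)
        if a != 0
    ]


A = [(2, 3), (2, 0), (1, 1)]
gamma = axis_exponents(A)
```

For this $\mathrm{A}$, the monomial $\mathrm{x}_{1}^{2}\mathrm{x}_{2}^{3}$ becomes the bilinear term $\mathrm{y}_{(2,0)}\mathrm{y}_{(0,3)}$ in the new variables.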

Example 4.2.

We consider different exponent sets for $\mathrm{n}=2$ in the following,

[TABLE]

The exponent sets and different patterns are visualized in Figures 1, 4, and 5 as follows. The title of a subplot refers to the set of original exponents which are depicted by red squares. The auxiliary exponents are depicted by blue dots. A pattern $\mathrm{P}$ corresponds to an undirected smooth curve and all the colored points and squares that the curve passes through. $\mathrm{A}_{\mathrm{ex}}$ will also be used in the numerical result section.

4.2 Expression Trees

Convexification using expression trees is common in general nonlinear optimization [39, 40]. This approach is based on the observation that each algebraic expression is made up of a certain set of elementary operations, such as powers, linear combinations, or products of expressions. A decomposition of an algebraic expression into these operations can be visualized using an algebraic expression tree, like in Fig. 2. This is a rooted tree with nodes labeled by terms occurring in the expression. Each term is built up from its child terms using elementary operations and the underlying convexification is obtained by introducing a variable for each node and providing convex constraints that link every node and its child nodes. For polynomials, given as a linear combination of monomials, all the nodes apart from the root node correspond to monomial variables. A non-root node and its child nodes therefore build a pattern.

For example, the term $\mathrm{x}_{1}^{2}\mathrm{x}_{2}^{3}$ in Fig. 2 is decomposed into the product of the powers $\mathrm{x}_{1}^{2}$ and $\mathrm{x}_{2}^{3}$ of the variables $\mathrm{x}_{1}$ and $\mathrm{x}_{2}$ . For these three terms, one introduces the monomial variables $\mathrm{v}_{(2,3)}$ , $\mathrm{v}_{(2,0)}$ and $\mathrm{v}_{(0,3)}$ , respectively. The relationship of these variables is captured by the pattern $\mathrm{P}=\{(2,3),(2,0),(0,3)\}$ and the corresponding moment body $\mathcal{M}(\mathrm{K})_{\mathrm{P}}$ is described by the well-known McCormick inequalities. The variable $\mathrm{v}_{(0,3)}$ is further connected to $\mathrm{v}_{(0,1)}$ by exponentiation. The corresponding pattern is $\{(0,1),(0,3)\}$ . All patterns induced by the tree in Fig. 2 are visualized in the first subplot of Fig. 3. Observe that there are other ways to form expression trees. For example, one could also decompose $\mathrm{x}_{1}^{2}\mathrm{x}_{2}^{3}$ into $\mathrm{x}_{1}\mathrm{x}_{2}^{1}$ and $\mathrm{x}_{1}\mathrm{x}_{2}^{2}$ . However, the corresponding pattern $\{(2,3),(1,1),(1,2)\}$ is no longer tightly described by McCormick inequalities.
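
The McCormick inequalities mentioned above can be made concrete with a small Python sketch. It evaluates the four standard McCormick bounds on a product $w=xy$ with $x\in[x_l,x_u]$ and $y\in[y_l,y_u]$. The function name and the sample point are hypothetical; the unit box is chosen only so that both factors range over $[0,1]$.

```python
def mccormick_bounds(x, y, xl, xu, yl, yu):
    """Lower and upper McCormick bounds on w = x*y for x in [xl, xu], y in [yl, yu]."""
    lower = max(xl * y + x * yl - xl * yl, xu * y + x * yu - xu * yu)
    upper = min(xu * y + x * yl - xu * yl, xl * y + x * yu - xl * yu)
    return lower, upper

# On the unit box, v_(2,0) = x1^2 and v_(0,3) = x2^3 both range over [0, 1].
x1, x2 = 0.5, 0.8
v20, v03 = x1**2, x2**3
lo, hi = mccormick_bounds(v20, v03, 0.0, 1.0, 0.0, 1.0)
```

For any point of the box, the true value $\mathrm{v}_{(2,3)}=\mathrm{v}_{(2,0)}\mathrm{v}_{(0,3)}$ lies between `lo` and `hi`.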

Since expression trees normally correspond to patterns of small size, they lead to weak, but efficiently computable relaxations, which are often used in divide-and-conquer approaches like branch-and-bound.

4.3 Bound-Factor Products

Another convexification approach is based on so-called bound-factor products (BF) [10]. Since the polynomials $\mathrm{x}_{\mathrm{i}}-\mathrm{l}_{\mathrm{i}}$ and $\mathrm{u}_{\mathrm{i}}-\mathrm{x}_{\mathrm{i}}$ are nonnegative on $\mathrm{K}$ , the products of these polynomials (with repetitions allowed) are also nonnegative on $\mathrm{K}$ . So, one can consider the products

[TABLE]

of $|\alpha|$ polynomials with $\alpha_{\mathrm{i}}$ linear factors depending on the variable $\mathrm{x}_{\mathrm{i}}$ , where $\alpha,\beta\in\mathbb{N}^{\mathrm{n}}$ and $\alpha\geq\beta$ . For a generic choice of $\mathbf{l}$ and $\mathbf{u}$ , the polynomial $F^{\alpha,\beta}(\mathbf{x})$ includes all monomials with exponents in the pattern $\operatorname{BF}(\alpha):=\{0,\dots,\alpha_{1}\}\times\dots\times\{0,\dots,\alpha_{\mathrm{n}}\}$ . By substituting $\mathrm{v}_{\gamma}=\mathbf{x}^{\gamma}$ for all $\gamma\in\operatorname{BF}(\alpha)$ we obtain a linearization $LF^{\alpha,\beta}(\mathbf{v})$ of $F^{\alpha,\beta}(\mathbf{x})$ . The system of linear inequalities

[TABLE]

is valid for $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\operatorname{BF}(\alpha)}$ . This approach can also be viewed as hierarchical since one can increase the order of the bound-factor products in order to tighten the relaxation. Note that in polynomial optimization this approach is known as the dual of Handelman’s hierarchy [17]. Within this approach one groups monomial variables into patterns of a rather large size and connects them with only linear constraints. For example, to generate a non-trivial relaxation of (4) using bound-factor products for the set $\mathrm{A}_{\mathrm{ex}}$ from Example 4.2 one is forced to use at least one pattern $\operatorname{BF}(\alpha)$ with $\alpha_{1}\geq 5$ and $\alpha_{2}\geq 5$ , which means that at least $36$ monomial variables have to be introduced, compare Fig. 3. Another issue is that the system of linear inequalities (24) is not a tight description of $\mathcal{M}(\mathrm{K})_{\operatorname{BF}(\alpha)}$ . These kinds of relaxations have also been used within branch-and-bound strategies [10].
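
To illustrate the size of bound-factor patterns, the following Python sketch enumerates $\operatorname{BF}(\alpha)$ as defined above. The function name `bf_pattern` is an assumption made for illustration.

```python
from itertools import product

def bf_pattern(alpha):
    """Enumerate BF(alpha) = {0..alpha_1} x ... x {0..alpha_n} as exponent tuples."""
    return list(product(*(range(a + 1) for a in alpha)))

# smallest pattern with alpha_1 >= 5 and alpha_2 >= 5, as needed for A_ex
pattern = bf_pattern((5, 5))
```

The list `pattern` has $(5+1)^{2}=36$ elements, which matches the count of monomial variables stated in the text.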

4.4 Moment Relaxation

The most popular convexification techniques in the polynomial optimization community are the moment relaxation and its dual counterpart, the sos relaxation [2, 21, 28]. This approach introduces a large number of monomial variables and links them all with one large pattern using psd constraints. The approach is hierarchical in the sense that one first needs to choose a bound on the degree of the monomials for which monomial variables are introduced. In practice, these hierarchies have good approximation properties at the expense of large sdps, see [33] for computational studies. Even though the lowest possible hierarchy level of the moment relaxation often produces tight bounds, it does not scale well when the number of variables and/or the degree grows. However, strategies exist to make the approach more tractable, e.g., exploiting correlative sparsity [45, 24, 23, 18, 27], term sparsity and structures of the Newton polytope [34, 48], combinations of the previous [47, 49], symmetry structures [35, 2] as well as spectral methods that exploit the so-called constant trace property of sos hierarchies [26].

To derive a so-called moment relaxation of (4), the following representation of the moment body $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ in terms of probability measures is used:

[TABLE]

So a vector $\mathbf{v}\in\mathbb{R}^{\mathrm{A}}$ belongs to $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ iff there exists a probability measure $\mu$ on $\mathrm{K}$ such that $\mathrm{v}_{\alpha}=\int\mathbf{x}^{\alpha}\mu(d\mathbf{x})$ for all $\alpha\in\mathrm{A}$ . Hence, (8) can be formulated as

[TABLE]

In pursuit of a tractable characterization of the feasible set, we use the following definition and theorem.

Definition 4.3 (Moment Matrix and Localizing Matrix [20, Ch.2.7.1]).

The localizing matrix $\mathbf{M}_{\mathrm{k}}(g,\mathbf{v})$ for a polynomial $g$ with coefficients $(\mathrm{g}_{\alpha})_{\alpha}$ and the moment matrix $\mathbf{M}_{\mathrm{k}}(\mathbf{v})$ are defined as

[TABLE]

Theorem 4.4 ([20, Th. 2.44]).

Let $g^{1},\dots,g^{\mathrm{m}}$ be $\mathrm{n}$ -variate polynomials such that there exist sos polynomials $s^{0},\dots,s^{\mathrm{m}}$ for which

[TABLE]

is compact. Furthermore, let $\mathrm{K}=\{\mathbf{x}\in\mathbb{R}^{\mathrm{n}}:g^{\mathrm{i}}(\mathbf{x})\geq 0,\mathrm{i}\in[\mathrm{m}]\}.$ A sequence $(\mathrm{v}_{\alpha})_{\alpha}$ has a finite Borel representing measure with support in $\mathrm{K}$ iff

[TABLE]

We describe the box $\mathrm{K}$ by the polynomials $g^{\mathrm{i}}(\mathbf{x}):=(\mathrm{x}_{\mathrm{i}}-\mathrm{l}_{\mathrm{i}})(\mathrm{u}_{\mathrm{i}}-\mathrm{x}_{\mathrm{i}})$ for $\mathrm{i}\in[\mathrm{n}]$ , i.e. $\mathrm{K}=\{\mathbf{x}\in\mathbb{R}^{\mathrm{n}}:g^{\mathrm{i}}(\mathbf{x})\geq 0,\mathrm{i}\in[\mathrm{n}]\}$ . Clearly, the assumptions of Theorem 4.4 hold and we can formulate (8) as

[TABLE]

The moment and localizing matrices from the above constraints are submatrices of infinite matrices with rows and columns indexed by $\alpha,\beta\in\mathbb{N}^{\mathrm{n}}$ rather than $\alpha,\beta\in\mathbb{N}_{\mathrm{k}}^{\mathrm{n}}$ . Thus, since $\mathrm{k}$ is arbitrarily large, the constraints can be viewed as infinite-dimensional psd constraints that impose semidefiniteness of the infinite moment matrix and $\mathrm{m}$ infinite localizing matrices. By fixing a particular $\mathrm{k}=\mathrm{d}$ one relaxes the infinite-dimensional psd problem to a finite-dimensional one. This is known as the choice of the level of the hierarchy of the moment relaxations. It is natural to restrict attention to levels that are sufficiently large to ensure that all the variables occurring in the objective function appear in the constraints. Thus, for every $\mathrm{d}\geq\lceil\frac{\deg(\mathrm{A})}{2}\rceil$ , we consider the optimal value $\rho_{\mathrm{d}}$ of the semidefinite problem

[TABLE]

The value $\rho_{\mathrm{d}}$ is a lower bound on the optimal value of (4). This problem has one sdp constraint of size $\binom{\mathrm{n}+\mathrm{d}}{\mathrm{d}}$ that involves the monomial variables $\mathrm{v}_{\alpha},\alpha\in\mathbb{N}^{\mathrm{n}}_{2\mathrm{d}}=\left\{\alpha\in\mathbb{N}^{\mathrm{n}}:\alpha_{1}+\dots+\alpha_{\mathrm{n}}\leq 2\mathrm{d}\right\}$ , and $\mathrm{n}$ sdp constraints of size $\binom{\mathrm{n}+\mathrm{d}-1}{\mathrm{d}-1}$ that involve $\mathrm{v}_{\alpha},\alpha\in\mathbb{N}^{\mathrm{n}}_{2\mathrm{d}-2}$ . Hence, the moment relaxation corresponds to the pattern $\mathbb{N}^{\mathrm{n}}_{2\mathrm{d}}$ . Note that for general problems it is not possible to reduce the size of the mentioned sdp constraints [3]. For a small example like Example 4.2 with $\deg(\mathrm{A}_{\mathrm{ex}})=10$ and $\mathrm{n}=2$ this adds up to 66 moment variables. The third subplot in Fig. 3 shows the pattern corresponding to the lowest hierarchy level which involves an sdp constraint with a $21\times 21$ matrix.
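
The sizes quoted above follow from binomial coefficients and can be checked with a short Python sketch. The function name and the dictionary keys are hypothetical; the formulas are the ones stated in the paragraph, evaluated at the lowest level $\mathrm{d}=\lceil\deg(\mathrm{A})/2\rceil$.

```python
from math import comb

def moment_relaxation_sizes(n, degree):
    """Sizes of the lowest-level moment relaxation for n variables and a given degree."""
    d = -(-degree // 2)  # ceil(degree / 2)
    return {
        "level": d,
        "moment_matrix_size": comb(n + d, d),
        "localizing_matrix_size": comb(n + d - 1, d - 1),
        "num_moment_variables": comb(n + 2 * d, 2 * d),
    }

# Example 4.2: n = 2 and deg(A_ex) = 10
sizes = moment_relaxation_sizes(n=2, degree=10)
```

For $\mathrm{n}=2$ and degree $10$ this gives a $21\times 21$ moment matrix and $66$ moment variables, in agreement with the text.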

4.5 Singletons

The smallest patterns are singletons $\{\alpha\}$ with $\alpha\in\mathbb{N}^{\mathrm{n}}$ . The moment body of a singleton $\{\alpha\}$ is the interval $\mathcal{M}(\mathrm{K})_{\{\alpha\}}=[\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\alpha},\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\alpha}]$ . The pattern relaxation of $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ induced by the family of singletons $\left\{\{\alpha\}:\alpha\in\mathrm{A}\right\}$ is $\operatorname{Box}(\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\mathrm{A}},\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\mathrm{A}})$ . This is the weakest possible relaxation within the pattern approach. The provided bounds on the monomial variables can be exploited by branch-and-bound solvers [36].
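
The interval bounds of a singleton can be computed by elementary interval arithmetic. The sketch below is one possible way to do this in Python and is not taken from the paper; the function names are hypothetical. It uses the fact that the factors $\mathrm{x}_{\mathrm{i}}^{\alpha_{\mathrm{i}}}$ depend on different variables, so the extreme values of the product are attained at products of the endpoint values of the factors.

```python
from itertools import product
from math import prod

def power_range(l, u, a):
    """Range of t**a for t in [l, u] and an integer a >= 0."""
    if a == 0:
        return (1.0, 1.0)
    lo, hi = sorted((l**a, u**a))
    if a % 2 == 0 and l < 0 < u:
        lo = 0.0  # even powers attain their minimum at t = 0
    return (lo, hi)

def monomial_range(l, u, alpha):
    """Interval [min, max] of x**alpha over the box [l_1, u_1] x ... x [l_n, u_n]."""
    ranges = [power_range(li, ui, a) for li, ui, a in zip(l, u, alpha)]
    # factors depend on different variables, so extremes are products of endpoints
    corners = [prod(c) for c in product(*ranges)]
    return (min(corners), max(corners))

unit_box_range = monomial_range((0, 0), (1, 1), (2, 3))
mixed_sign_range = monomial_range((-1, 0), (2, 1), (3, 1))
```

On the unit box every nonconstant monomial has range $[0,1]$; the second call shows a box with a negative lower bound.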

4.6 Alternative Techniques

Besides the mentioned techniques there exist other approaches for polynomial instances. For example, approaches based on geometric programming or relative entropy relaxations for signomial programming have been investigated in [14, 15, 9, 8]. Closely related to geometric and signomial programming are special non-negativity certificates utilizing so-called sums of nonnegative circuit polynomials (sonc) [13, 38, 25]. Note that the cones of nonnegative circuit polynomials are essentially power cones. As it is well known that these cones have second-order cone lifts, so does the sonc cone. For a different proof see [46]. Furthermore, in [12] the authors propose a linear approximation of the sonc cone.

Another approach uses scaled diagonally dominant sums of squares (sdsos) [1], that is a non-negativity certificate based on sos polynomials with sparse monomial support $\{2\alpha,\alpha+\beta,2\beta\}$ with $\alpha,\beta\in\mathbb{N}^{\mathrm{n}}$ .

By dualizing the sonc [19] and sdsos relaxations, one arrives at convexifications in terms of monomial variables. These duals correspond to pattern relaxations that use special pattern types, see an illustration in Fig. 4 for the case $\mathrm{n}=2$ .

5 Truncated Submonoids

In order to generate computationally tractable relaxations of (4) we look for patterns $\mathrm{P}$ such that we can formulate the constraint $\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}$ of (14) (or a sufficiently tight approximation of this constraint) in such a way that it is accessible to optimization methods. In this section we introduce the new pattern type truncated submonoids for which we determine the size of these constraints.

Let $\mathrm{k}\in[\mathrm{n}]$ , $\mathrm{d}\in\mathbb{N}$ and $\Gamma=(\gamma^{1},\dots,\gamma^{\mathrm{k}})\in\mathbb{N}^{\mathrm{n}\times\mathrm{k}}$ be a matrix, whose columns $\gamma^{\mathrm{i}}$ are nonzero vectors with pairwise disjoint supports. Clearly, such vectors $\gamma^{1},\dots,\gamma^{\mathrm{k}}$ are linearly independent. We call

[TABLE]

the $\mathrm{k}$ -variate $\mathbb{N}^{\mathrm{k}}_{\mathrm{d}}$ -truncated submonoid (TS) and $\gamma^{1},\dots,\gamma^{\mathrm{k}}$ its generators.
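
A truncated submonoid can be enumerated explicitly. The following Python sketch assumes that the pattern consists of the exponents $\Gamma\omega$ with $\omega\in\mathbb{N}^{\mathrm{k}}_{\mathrm{d}}$, which is consistent with the indexing $\mathrm{v}_{\Gamma\omega}=\mathrm{w}_{\omega}$ used in Proposition 5.3. The function name and the sample generators are hypothetical.

```python
from itertools import product

def truncated_submonoid(generators, d):
    """Exponents Gamma*omega for all omega in N^k with omega_1 + ... + omega_k <= d."""
    n = len(generators[0])
    for i in range(n):
        # generators must have pairwise disjoint supports
        if sum(1 for g in generators if g[i] != 0) > 1:
            raise ValueError("generators must have pairwise disjoint supports")
    pattern = []
    for omega in product(range(d + 1), repeat=len(generators)):
        if sum(omega) <= d:
            pattern.append(
                tuple(sum(w * g[i] for w, g in zip(omega, generators)) for i in range(n))
            )
    return pattern

ts = truncated_submonoid([(1, 1, 0), (0, 0, 2)], d=2)
chain = truncated_submonoid([(1, 1, 1)], d=3)  # the case k = 1
```

With a single generator the pattern reduces to the multiples of that generator up to order $\mathrm{d}$.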

Proposition 5.1.

The moment body $\mathcal{M}(\mathrm{K})_{\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{\mathrm{d}})}$ can be represented as a $\mathrm{k}$ -variate moment body by

[TABLE]

Proof 5.2.

*The desired representation is obtained by taking the convex hull of the left and the right hand side of the equality $\operatorname{m}(\mathrm{K})_{\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{\mathrm{d}})}=\operatorname{m}(\operatorname{Box}(\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\Gamma},\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\Gamma}))_{\mathbb{N}^{\mathrm{k}}_{\mathrm{d}}}$ . *

The next proposition follows by combining Theorem 4.4 and Proposition 5.1.

Proposition 5.3.

Let $g^{\mathrm{i}}(\tilde{\mathbf{x}}):=(\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\gamma_{\mathrm{i}}}-\tilde{\mathrm{x}}_{\mathrm{i}})(\tilde{\mathrm{x}}_{\mathrm{i}}-\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\gamma_{\mathrm{i}}})$ for each $\mathrm{i}\in[\mathrm{k}].$ Then $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{2\mathrm{d}})}$ if and only if there exists $\mathbf{w}\in\mathbb{R}^{\mathbb{N}^{\mathrm{k}}}$ with $\mathrm{v}_{\Gamma\omega}=\mathrm{w}_{\omega}$ for all $\omega\in\mathbb{N}^{\mathrm{k}}_{2\mathrm{d}}$ and

[TABLE]

Using Proposition 5.3 we can treat the constraint $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{\mathrm{d}})}$ as in the moment relaxation, i.e., truncating the infinite-dimensional matrices at an even $\mathrm{r}\in\mathbb{N}$ with $\mathrm{r}\geq\mathrm{d}$ . Naturally, the complexity of the constraints (41) depends on $\mathrm{k}$ and $\mathrm{d}$ . For practical purposes, it is desirable to choose these parameters not too large. We use $\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{2\mathrm{d}})$ , truncating the matrices in (41) at $\mathrm{r}=2\mathrm{d}$ , for several reasons.

•

Since our overall strategy for (4), based on (14), does not guarantee determination of the exact optimal value of (4), we see no need in exact approximation of the constraints $\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}$ in (14) at high computational costs. Therefore, when $\mathrm{P}$ is a truncated submonoid pattern, we prefer to relax $\mathcal{M}(\mathrm{K})_{\mathrm{P}}$ by means of Proposition 5.3 using a value $\mathrm{r}$ that is not too large.

•

The lowest possible level of the moment relaxation often yields sufficiently tight bounds.

We would like to stress that in practice the size of moment relaxations for the original problem (4) does not scale well if the degree of $\mathrm{A}$ and $\mathrm{n}$ grow. In general it is not possible to reduce this size if $\mathrm{A}$ does not admit any specific sparsity structures; see [3] for a theoretical justification. In contrast, we believe that one can use moment relaxations for the constraint $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{\mathrm{d}})}$ , since we can keep the size of the matrices in (41) under control.

5.1 Chains

For $\gamma\in\mathbb{N}^{\mathrm{n}}\setminus\{\mathbf{0}\}$ and $\mathrm{d}\in\mathbb{N}$ , we call

[TABLE]

a chain. A chain is a special truncated submonoid pattern with $\mathrm{k}=1$ . In the case of chains $\mathrm{P}$ , the constraints $\mathbf{v}_{\mathrm{P}}\in\mathcal{M}(\mathrm{K})_{\mathrm{P}}$ of (14) amount to semidefinite constraints.

Theorem 5.4 ([21, Th. 3.23]).

Let $\mathrm{d}$ be a nonnegative integer, $\mathrm{a},\mathrm{b}\in\mathbb{R}$ with $\mathrm{a}<\mathrm{b}$ and $g(\tilde{\mathbf{x}}):=(\mathrm{a}-\tilde{\mathrm{x}})(\tilde{\mathrm{x}}-\mathrm{b}).$ Then

[TABLE]

Combining Proposition 5.1 and Theorem 5.4 we obtain:

Proposition 5.5.

Let $\operatorname{CH}(\gamma,2\mathrm{d})$ be a chain pattern and $g(\tilde{\mathbf{x}}):=(\operatorname{\overline{\mathbf{x}}}_{K}^{\gamma}-\tilde{\mathrm{x}})(\tilde{\mathrm{x}}-\operatorname{\underline{\mathbf{x}}}_{K}^{\gamma}).$ Then the moment body $\mathcal{M}(\mathrm{K})_{\operatorname{CH}(\gamma,2\mathrm{d})}$ can be represented using semidefinite constraints

[TABLE]
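
The semidefinite constraints for a chain involve a univariate moment matrix and a localizing matrix for $g$. The following Python sketch builds both matrices, assuming the standard form in which the moment matrix of order $\mathrm{d}$ is the Hankel matrix $(\mathrm{w}_{\mathrm{i}+\mathrm{j}})$ and the localizing matrix of order $\mathrm{d}-1$ has entries $\sum_{\mathrm{k}}g_{\mathrm{k}}\mathrm{w}_{\mathrm{i}+\mathrm{j}+\mathrm{k}}$. The function names and the sample values are hypothetical, and testing positive semidefiniteness is left to an sdp solver.

```python
def moment_matrix(w, d):
    """Hankel matrix (w_{i+j}) for i, j = 0..d."""
    return [[w[i + j] for j in range(d + 1)] for i in range(d + 1)]

def localizing_matrix(w, d, a, b):
    """Localizing matrix of order d - 1 for g(x) = (b - x)(x - a) = -x^2 + (a + b) x - a b."""
    g = {0: -a * b, 1: a + b, 2: -1.0}
    return [[sum(c * w[i + j + k] for k, c in g.items()) for j in range(d)] for i in range(d)]

# Moments of the Dirac measure at t in [a, b]; t plays the role of x^gamma.
t, a, b, d = 0.3, 0.0, 1.0, 2
w = [t**j for j in range(2 * d + 1)]
M = moment_matrix(w, d)
L = localizing_matrix(w, d, a, b)
```

For a Dirac measure at $t$, the moment matrix has entries $t^{\mathrm{i}+\mathrm{j}}$ and the localizing matrix has entries $g(t)\,t^{\mathrm{i}+\mathrm{j}}$, so both are psd of rank one whenever $t\in[a,b]$.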

5.2 Shifting a Pattern

To generate new patterns by shifting existing ones by a vector $\eta$ , we can use the following proposition.

Proposition 5.6.

Let $\mathrm{P}\subseteq\mathbb{N}^{\mathrm{n}}$ be a pattern with $\operatorname{supp}(\mathrm{P})\not=[\mathrm{n}]$ and $\eta\in\mathbb{N}^{\mathrm{n}}$ a vector with $\operatorname{supp}(\eta)\subseteq[\mathrm{n}]\backslash\operatorname{supp}(\mathrm{P})$ . Then

[TABLE]

Proof 5.7.

The assertion follows from $\mathcal{M}(\mathrm{K})_{\eta+\mathrm{P}}=\operatorname{conv}(\{\mathbf{x}^{\eta}(\mathbf{x}^{\beta})_{{\beta}\in\mathrm{P}}:\mathbf{x}\in\mathrm{K}\})$ and the observation that $\mathbf{x}^{\eta}$ and $\mathbf{x}^{\beta}$ have no common factor since $\operatorname{supp}(\eta)\cap\operatorname{supp}(\beta)=\emptyset$ . Hence

[TABLE]

5.3 Shifted Chains

We apply the shifting procedure to chains and generate a new pattern type. Let $\mathrm{d}\in\mathbb{N}$ and $\gamma,\eta\in\mathbb{N}^{\mathrm{n}}$ with $\operatorname{supp}(\gamma)\cap\operatorname{supp}(\eta)=\emptyset$ . We call $\eta+\operatorname{CH}(\gamma,\mathrm{d})$ a shifted chain. Using Proposition 5.6 we can represent the moment body $\mathcal{M}(\mathrm{K})_{\eta+\operatorname{CH}(\gamma,\mathrm{d})}$ as the convex hull of $\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\eta}\mathcal{M}(\mathrm{K})_{\operatorname{CH}(\gamma,\mathrm{d})}$ and $\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\eta}\mathcal{M}(\mathrm{K})_{\operatorname{CH}(\gamma,\mathrm{d})}$ and formulate a result for shifted chains in analogy to Proposition 5.5.

Corollary 5.8.

Let $\gamma,\eta\in\mathbb{N}^{\mathrm{n}}\setminus\{\mathbf{0}\}$ have disjoint support and let $\mathrm{d}>0$ be an integer. Let $g(\tilde{\mathrm{x}})=(\operatorname{\overline{\mathbf{x}}}_{\mathrm{K}}^{\gamma}-\tilde{\mathrm{x}})(\tilde{\mathrm{x}}-\operatorname{\underline{\mathbf{x}}}_{\mathrm{K}}^{\gamma})$ . Then $\mathbf{v}\in\mathcal{M}(\mathrm{K})_{\eta+\operatorname{CH}(\gamma,\mathrm{d})}$ if and only if there exists $\lambda\in[0,1]$ such that

[TABLE]
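
The shifting operation behind shifted chains is simple to state in code. The Python sketch below forms $\eta+\mathrm{P}$ and checks the support condition of Proposition 5.6. The function names are hypothetical, and the sample chain is assumed to contain the zero exponent.

```python
def support(alpha):
    """Indices of the nonzero entries of an exponent vector."""
    return {i for i, a in enumerate(alpha) if a != 0}

def shift_pattern(eta, pattern):
    """Return eta + P, requiring supp(eta) to be disjoint from supp(P)."""
    supp_p = set().union(*(support(beta) for beta in pattern))
    if support(eta) & supp_p:
        raise ValueError("eta must be supported outside supp(P)")
    return [tuple(e + b for e, b in zip(eta, beta)) for beta in pattern]

chain = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]  # multiples of gamma = (1, 1, 0) up to order 2
shifted = shift_pattern((0, 0, 3), chain)
```

Because $\eta$ and $\gamma$ share no variable, the factor $\mathbf{x}^{\eta}$ ranges over its interval independently of the chain monomials, which is the observation used in the proof of Proposition 5.6.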

5.4 Generalizing Truncated Submonoids

It is possible to expand the notion of truncated submonoids to generators $\Gamma=(\gamma^{1},\dots,\gamma^{\mathrm{k}})\in\mathbb{Z}^{\mathrm{n}\times\mathrm{k}}$ by using the shifted truncated submonoids $\eta+\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{2\mathrm{d}})$ with $\eta\in 2\mathbb{N}^{\mathrm{n}}$ such that

[TABLE]

For different choices of the parameters $\Gamma$ , $\mathbb{N}^{\mathrm{k}}_{2\mathrm{d}}$ and sets $\mathrm{K}$ one can apply Positivstellensätze that yield tractable characterizations of $\mathcal{M}(\mathrm{K})_{\eta+\operatorname{TS}(\Gamma,\mathbb{N}^{\mathrm{k}}_{2\mathrm{d}})}$ . For example, let $\Gamma^{1}\in\mathbb{Z}^{\mathrm{n}}\backslash 2\mathbb{Z}^{\mathrm{n}},\Gamma^{2}\in\mathbb{Z}^{\mathrm{n}\times 2}\backslash 2\mathbb{Z}^{\mathrm{n}\times 2}$ and $\Gamma^{\mathrm{k}}\in\mathbb{Z}^{\mathrm{n}\times\mathrm{k}}\backslash 2\mathbb{Z}^{\mathrm{n}\times\mathrm{k}}$ be matrices whose columns have pairwise disjoint support and satisfy (42). Then from [28, Hilbert 1888] it follows that $\mathcal{M}(\mathbb{R}^{\mathrm{n}})_{\eta+\operatorname{TS}(\Gamma^{1},\mathbb{N}_{2\mathrm{d}})}$ , $\mathcal{M}(\mathbb{R}^{\mathrm{n}})_{\eta+\operatorname{TS}(\Gamma^{2},\mathbb{N}^{2}_{4})}$ and $\mathcal{M}(\mathbb{R}^{\mathrm{n}})_{\eta+\operatorname{TS}(\Gamma^{\mathrm{k}},\mathbb{N}^{\mathrm{k}}_{2})}$ can be represented by psd constraints of size $\tbinom{1+\mathrm{d}}{\mathrm{d}}$ , $\tbinom{2+2}{2}$ and $\tbinom{\mathrm{k}+2}{1}$ , respectively. Furthermore, if $\Gamma^{1}\in 2\mathbb{Z}^{\mathrm{n}}$ satisfies (42), then $\mathcal{M}(\mathbb{R}^{\mathrm{n}})_{\eta+\operatorname{TS}(\Gamma^{1},\mathbb{N}_{2\mathrm{d}})}$ can be characterized by two psd matrices, one of size $\binom{1+\mathrm{d}}{\mathrm{d}}$ , one of size $\binom{1+\mathrm{d}-1}{\mathrm{d}-1}$ . In particular, combining these representations of $\Gamma^{1}\in 2\mathbb{Z}^{\mathrm{n}}$ and $\Gamma^{1}\in\mathbb{Z}^{\mathrm{n}}\backslash 2\mathbb{Z}^{\mathrm{n}}$ leads to a generalization of the underlying pattern of the sdsos certificate.
Finally, if $\Gamma^{1}\in\mathbb{Z}^{\mathrm{n}}$ satisfies (42), then $\mathcal{M}(\operatorname{Box}(\mathbf{l},\mathbf{u}))_{\eta+\operatorname{TS}(\Gamma^{1},\mathbb{N}_{2\mathrm{d}})}$ can be characterized using at most two psd matrices of size at most $\binom{1+\mathrm{d}}{\mathrm{d}}$ . For that, one has to determine whether the closure of $\{\operatorname{m}(\mathbf{x})_{\Gamma^{1}}\in\mathbb{R}:\mathbf{x}\in\operatorname{Box}(\mathbf{l},\mathbf{u})\backslash\{\mathbf{0}\}\}$ is $\mathbb{R}$ , a semi-infinite interval, the union of two disjoint semi-infinite intervals or a bounded interval and then apply the respective Positivstellensatz: [28, Hilbert 1888], [28, Stieltjes 1885], [28, Hausdorff 1921] or [28, Svecov 1885]. However, since we do not use any of these representations in the computations section, we do not pursue these patterns any further.

6 Computational Results

Finding an unbiased setting to compare the advantages and disadvantages of convex relaxations for pop is not trivial, as models, their purpose, and methods are usually closely linked to one another. We decided to use a prototype implementation to compute solutions of (14) for different monomial pattern families. The solutions are used to approximate the size of the relaxations. They are compared with one another on a new benchmark library of random pop instances, and with results from BARON, YALMIP and CS-TSSOS. We start by describing implementation and comparison details, before numerical results for different classes of instances are discussed.

6.1 Implementation Details

Four different solvers were run for the numerical evaluation on a compute server with 4 Intel(R) Xeon(R) Gold 6138 CPUs with 20 cores of 2 threads and 1 TB RAM each under Ubuntu 18.04.4. Each solver-instance pair was assigned to one such job, i.e. the solvers themselves did not use the parallel structure. In order to distribute the solver-instance pairs to the 80 cores we used [42]. We used MATLAB 9.6.0.1174912 (R2019a) Update 5 [29], MOSEK 9.2.32 [32], JULIA 1.5.2 [6], CS-TSSOS version 1.00 [47], BARON 1.8.9 [43], and YALMIP 20200930 [22]. All reported run times are wall-clock times. The code for solving the pattern relaxation (14) was implemented and run in MATLAB; it consists of roughly 3500 lines of code and uses MOSEK to solve the relaxations. The reported time is the termination time obtained from MOSEK. BARON [43] was called from MATLAB with default settings. BARON currently returns only the CPU time when its MATLAB interface is used. Hence, we timed BARON calls with MATLAB’s tic and toc commands (this method was suggested with the support of BARON). CS-TSSOS is a JULIA package that allows one to exploit correlative sparsity and term sparsity simultaneously. We called the first level of the hierarchy by running the command cs_tssos_first with settings order $=\lceil\tfrac{\deg(f)}{2}\rceil$ and TS="MD". CS-TSSOS does not report the time of the solution process. Thus, we first piped the output from the sdp solver MOSEK, which CS-TSSOS uses, to a text file. After that we read the termination time of MOSEK from the text file. Since only two decimal places are obtained this way, the time we report is only a proxy of the time of CS-TSSOS’s actual MOSEK call. YALMIP is a MATLAB toolbox that allows one to compute the moment as well as the sos relaxation of (4). We ran YALMIP’s solvesos at the lowest possible level of the sos hierarchy and report the termination time obtained from MOSEK.

6.2 Setup of Numerical Comparisons

As an indicator for the tightness of relaxations we approximate the size of feasible sets by their width. For a given finite and nonempty set $\mathrm{A}\subseteq\mathbb{N}^{\mathrm{n}}$ and a vector $\mathbf{f}\in\mathbb{R}^{\mathrm{A}}$ we define the width function $\omega_{\mathcal{M}(\mathrm{K})_{\mathrm{A}}}(\mathbf{f})$ of $\mathcal{M}(\mathrm{K})_{\mathrm{A}}$ in direction $\mathbf{f}$ as

[TABLE]

Replacing $\mathrm{K}$ by a relaxation based on a pattern family $\mathcal{F}$ one obtains an upper bound on the value of $\omega_{\mathcal{M}(\mathrm{K})_{\mathrm{A}}}(\mathbf{f})$ , denoted by $\omega(\mathcal{F},\mathcal{M}(\mathrm{K})_{\mathrm{A}},\mathbf{f})$ . The evaluation requires solving two instances of (14) for every pattern of interest, using the objective functions $\left<-\mathbf{f},\mathbf{v}\right>$ and $\left<\mathbf{f},\mathbf{v}\right>$ , respectively. To normalize the values $\omega_{\mathcal{M}(\mathrm{K})_{\mathrm{A}}}(\mathbf{f})$ and $\omega(\mathcal{F},\mathcal{M}(\mathrm{K})_{\mathrm{A}},\mathbf{f})$ , we divide by the width function obtained for the (trivial) relaxation using the singletons-only pattern $\mathcal{F}^{\mathrm{sgl}}_{\mathrm{A}}=\{\{\alpha\}:\alpha\in\mathrm{A}\}$ , i.e.,

[TABLE]
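
The normalization can be illustrated with a short Python sketch. It assumes that the width in direction $\mathbf{f}$ is the difference between the maximum and the minimum of $\left<\mathbf{f},\mathbf{v}\right>$ over the feasible set, as suggested by the two objective functions mentioned above. For the singleton relaxation the feasible set is a box, so this difference has a closed form. All names and the sample data are hypothetical.

```python
def box_width(f, lower, upper):
    """Width max<f,v> - min<f,v> over the box with sides [lower[a], upper[a]]."""
    return sum(abs(f[a]) * (upper[a] - lower[a]) for a in f)

def normalized_width(width, f, lower, upper):
    """Divide a width by the width of the singleton (box) relaxation."""
    return width / box_width(f, lower, upper)

# Hypothetical coefficient vector on K = [0, 1]^2; every monomial ranges over [0, 1].
f = {(1, 0): 0.5, (0, 2): -1.0, (1, 1): 0.25}
lower = {a: 0.0 for a in f}
upper = {a: 1.0 for a in f}
w_sgl = box_width(f, lower, upper)
ratio = normalized_width(0.875, f, lower, upper)
```

A normalized value close to $1$ means that a relaxation is hardly better than the singleton bounds, while smaller values indicate tighter relaxations.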

Table 1 lists the methods and patterns that were used for the numerical results. Method (B) gives an approximation of the reference solution, albeit at a high computational cost. (R) can be seen as the current state-of-the-art for a relaxation within a divide-and-conquer approach. Our approach allows us to compare the new relaxation strategies (M), (C), (MC), (H), (T) with respect to the width function.

Figs. 7, 8, 9, 10, and 12 show box plots of our numerical findings. The box plots visualize the distributions (20 random vectors $\mathbf{f}^{\mathrm{i}}$ ) of the normalized width functions (44) for various methods from Table 1 computed with BARON, YALMIP and CS-TSSOS. The title of a subplot corresponds to the exponent set $\mathrm{A}$ . Below the method (see Table 1) the rounded mean time in seconds is shown for the respective method. The box borders are the $1/4$ and the $3/4$ quantiles. The lower whisker is the smallest data value that is larger than the lower quartile minus $1.5$ times the interquartile range; the upper whisker is defined analogously.
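
The whisker rule can be written down in a few lines of Python. The sketch below uses the `inclusive` quartile convention of the standard library, which is an assumption; plotting tools differ in how they interpolate quartiles, and the function name and the data are hypothetical.

```python
from statistics import quantiles

def tukey_whiskers(data):
    """Whiskers: extreme data values within 1.5 interquartile ranges of the quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = min(x for x in data if x >= q1 - 1.5 * iqr)
    upper = max(x for x in data if x <= q3 + 1.5 * iqr)
    return lower, upper

whiskers = tukey_whiskers([1, 2, 3, 4, 100])
```

In this example the value $100$ lies outside the upper fence and would be drawn as an outlier.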

6.3 Test Instances

In our test instances, we use 13 finite exponent sets $\mathrm{A}\subseteq\mathbb{N}^{\mathrm{n}}$ classified into four types: specially structured adversary sets, dense sets, sparse sets, and the example $\mathrm{A}_{\mathrm{ex}}$ from above. They are explained in the next subsection. For each exponent set we chose $\mathrm{K}=[0,1]^{\mathrm{n}}$ and 20 (uniformly distributed) random coefficient vectors $\mathbf{f}^{1},\dots,\mathbf{f}^{20}\in[-1,1]^{\mathrm{A}}$ . The instances were filtered a priori to avoid trivial problems. If BARON did terminate on both the minimization and the maximization tasks in (43) within the CPU time limit of 1000 seconds, the instance was replaced. Therefore the corresponding mean times for (B) are always at least $1000$ seconds.

In generating test instances, we searched for instances that are interesting and realistic enough, on the one hand, but computationally challenging for the existing methods, on the other hand. That is, we wanted to study whether existing convexification strategies can be improved on some interesting families of optimization problems.

While we test our approach for (4) on the unit box $\mathrm{K}=[0,1]^{\mathrm{n}}$ and with the objective functions having coefficients in $[-1,1]$ , our approach is applicable without any changes for general objective functions and on arbitrary axis-aligned boxes. We also expect that the results of our numerical evaluations would be the same in this slightly more general setting.

6.4 Numerical Results

In this subsection we describe the different exponent sets and present numerical results for the different methods from Table 1.

6.4.1 Adversary Exponent Sets

If a pattern family yields poor connectivity properties for an exponent set, we consider this set to be an adversary exponent set for this family. In subplot $\mathrm{A}_{2}$ from Fig. 1, for example, we see that the sparse family $\mathcal{F}^{\mathrm{m}}_{\operatorname{CH}(\mathbf{1}^{5},\mathrm{d})}$ of multilinear patterns connects none of the original exponents. Hence, chain shaped exponent sets are natural adversaries for relaxations that only use multilinear patterns. As a result, the first two subplots in Fig. 7 show that the bounds using $\mathcal{F}^{\mathrm{m}}_{\operatorname{CH}(\gamma,\mathrm{d})}$ (M) coincide with the bounds obtained by the weakest pattern family $\mathcal{F}^{\mathrm{sgl}}_{\operatorname{CH}(\gamma,\mathrm{d})}$ . On the other hand, it is not surprising that the bounds obtained by using one chain (C) match the reference solution (B). The sparsity exploiting solver CS-TSSOS as well as YALMIP’s sos method fail to terminate for any of the 20 instances with exponent set $\operatorname{CH}(\mathbf{1}^{4},10)$ . We suspect that the reason for this is that $\operatorname{CH}(\mathbf{1}^{4},10)$ does not yield any term or chordal sparsity structures that can be exploited. Thus, CS-TSSOS and YALMIP solve a regular sos relaxation for $\mathrm{n}=4$ and $\deg(\operatorname{CH}(\mathbf{1}^{4},10))=40$ , involving an sdp with a $\tbinom{24}{4}\times\tbinom{24}{4}=10626\times 10626$ psd matrix.

Another adversary exponent set for multilinear patterns is $\mathrm{C}(\mathrm{n},\mathrm{d}):=\operatorname{CH}(\mathbf{1}^{\mathrm{n}},\mathrm{d})\cup\operatorname{CH}(\mathbf{e}^{1},\mathrm{d})\cup\dots\cup\operatorname{CH}(\mathbf{e}^{\mathrm{n}},\mathrm{d})$ . It can be covered sparsely by $\mathrm{d}$ multilinear patterns using the family $\mathcal{F}^{\mathrm{m}}_{\mathrm{C}(\mathrm{n},\mathrm{d})}$ . Each pattern of $\mathcal{F}^{\mathrm{m}}_{\mathrm{C}(\mathrm{n},\mathrm{d})}$ connects $\mathrm{n}+1$ original exponents, but establishes no connection between monomials from different patterns. That is because two patterns $\mathrm{P},\mathrm{P}^{\prime}\in\mathcal{F}^{\mathrm{m}}_{\mathrm{C}(\mathrm{n},\mathrm{d})}$ with $\mathrm{P}\not=\mathrm{P}^{\prime}$ satisfy $\mathrm{P}\cap\mathrm{P}^{\prime}=\{\mathbf{0}\}$ . The poor connective properties of $\mathcal{F}^{\mathrm{m}}_{\mathrm{C}(\mathrm{n},\mathrm{d})}$ explain its poor performance, see (M) in Fig. 7. By additionally using $\mathrm{n}+1$ chains to connect the $\mathrm{d}$ multilinear patterns, the family $\mathcal{F}^{\mathrm{h}}_{\mathrm{C}(\mathrm{n},\mathrm{d})}$ exploits the structure of $\mathrm{C}(\mathrm{n},\mathrm{d})$ . Consequently, the bounds of (H) and (B) are indistinguishable in Fig. 7. Again, CS-TSSOS and YALMIP fail to terminate for any of the instances with exponent set $\mathrm{C}(4,10)$ , most likely for the same reason as above.

6.4.2 Dense Exponent Sets

We consider dense exponent sets $\mathrm{A}=\mathbb{N}^{\mathrm{n}}_{\mathrm{d}}$ for $\mathrm{n}\in\{2,4\}$ and $\mathrm{d}=10$ . The pattern families shown in Fig. 8 perform reasonably well, probably due to their connectivity properties. Furthermore, we see that for $\mathrm{n}=4$ the multilinear patterns (M) perform drastically better than (R). This might be because the multilinear patterns $\operatorname{ML}(\alpha,\{0,1\}^{\mathrm{n}})$ used in $\mathcal{F}^{\mathrm{m}}_{\mathbb{N}_{10}^{4}}$ are bigger than the ones BARON uses, leading to more connections between monomial variables.

6.4.3 Sparse Exponent Sets

We use randomly generated sparse exponent sets $\mathrm{A}=\mathrm{S}(\mathrm{n},\mathrm{d})$ to test pattern families that do not assume any structure of $\mathrm{A}$ . $\mathrm{S}(\mathrm{n},\mathrm{d})$ is generated by randomly picking $\left\lceil\sqrt{\tbinom{\mathrm{n}+\mathrm{d}}{\mathrm{d}}}\right\rceil$ exponents via randperm from $\mathbb{N}^{\mathrm{n}}_{\mathrm{d}}$ .
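
The generation of $\mathrm{S}(\mathrm{n},\mathrm{d})$ can be mimicked in Python as follows. This is a sketch and not the authors' MATLAB code: `random.sample` stands in for `randperm`, and the function names and the seed are hypothetical. Enumerating all of $\mathbb{N}^{\mathrm{n}}_{\mathrm{d}}$ is practical only for moderate $\mathrm{n}$ and $\mathrm{d}$.

```python
import random
from math import comb, isqrt

def exponents_up_to_degree(n, d):
    """Yield all alpha in N^n with alpha_1 + ... + alpha_n <= d."""
    if n == 0:
        yield ()
        return
    for first in range(d + 1):
        for rest in exponents_up_to_degree(n - 1, d - first):
            yield (first,) + rest

def sparse_exponent_set(n, d, seed=0):
    """Pick ceil(sqrt(binom(n + d, d))) distinct exponents from N^n_d at random."""
    all_exponents = list(exponents_up_to_degree(n, d))
    size = isqrt(comb(n + d, d) - 1) + 1  # integer form of ceil(sqrt(binom(n + d, d)))
    return random.Random(seed).sample(all_exponents, size)

S = sparse_exponent_set(n=2, d=10)
```

For $\mathrm{n}=2$ and $\mathrm{d}=10$ there are $66$ candidate exponents, of which $\lceil\sqrt{66}\rceil=9$ are selected.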

Fig. 9 column (M) shows that $\mathcal{F}^{\mathrm{m}}_{\mathrm{S}(\mathrm{n},\mathrm{d})}$ does not perform particularly well. Column (H) shows that additionally enforcing indirect connections between moment variables via $\mathrm{n}+1$ chains and $\mathrm{d}$ multilinear patterns in $\mathcal{F}^{\mathrm{h}}_{\mathrm{S}(\mathrm{n},\mathrm{d})}$ results in tighter bounds.

Fig. 10 shows the distribution of the width for sparse instances with a high number of variables $\mathrm{n}=20,25,30,40$ and low degree $\mathrm{d}=4$. Computing lower bounds for the instances using relaxations that do not exploit sparsity of $\mathrm{S}(\mathrm{n},\mathrm{d})$ involves severe computational cost. We ran into memory problems with YALMIP for the instance with $\mathrm{S}(\mathrm{n},4)$ when $\mathrm{n}\geq 35$. Thus, we instead used (SOS), our own implementation of an sos relaxation. (The method (SOS) yields similar bounds to (Y) for $\mathrm{n}=20,25$ with average times of $147.4$ s and $935.8$ s.) Interestingly, the bounds computed by CS-TSSOS are worse than the ones computed with the pattern family $\mathcal{F}^{\mathrm{sgl}}_{\mathrm{S}(\mathrm{n},4)}$. It might be that using different settings for CS-TSSOS yields better bounds. However, this would also result in higher computation times. The pattern strategy (T) yields nontrivial bounds for all tested $\mathrm{n}$. Note that for $\mathrm{n}=20,25,30,40$ these bounds seem to be reasonably tight when compared to (Y) or (SOS), while requiring only a fraction of the computation time.

We point out that we were able to compute nontrivial bounds for instances with exponent sets $\mathrm{S}(80,4)$. For these exponent sets the computation of one of the two optima involved in the definition of the width usually takes between 6 and 7 minutes. The good performance of (T) in terms of computation time can be traced back to the relatively small size of the biggest $\mathrm{r}\times\mathrm{r}$ psd matrices involved in the relaxation of (14). That is, for $\mathcal{F}^{\mathrm{t}}_{\mathrm{S}(\mathrm{n},2\mathrm{d})}=\left\{\operatorname{TS}(\{\mathbf{e}^{\mathrm{i}}\}_{\mathrm{i}\in\operatorname{supp}(\alpha)},2\mathrm{d}):\alpha\in\mathrm{S}(\mathrm{n},2\mathrm{d})\setminus\operatorname{TS}(\Gamma,2\mathrm{d})\right\}\cup\{\operatorname{TS}(\Gamma,\mathrm{d})\}$

[TABLE]

For $2\mathrm{d}=4$ and $\mathrm{n}\geq 4$ this boils down to $\mathrm{r}\leq\max\Big\{\tbinom{2+4}{4},\tbinom{1+\mathrm{n}}{\mathrm{n}}\Big\}$.
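To get a feeling for the magnitude of this bound, the following sketch evaluates it for a few values of $\mathrm{n}$. The comparison with $\tbinom{\mathrm{n}+2}{2}$, the size of the moment matrix in a standard dense degree-4 moment relaxation, is added here for illustration only and is not a figure reported in the experiments; the function names are hypothetical.

```python
from math import comb

def pattern_psd_bound(n):
    """Bound on r from the text for 2d = 4: max{C(2+4, 4), C(1+n, n)}."""
    return max(comb(2 + 4, 4), comb(1 + n, n))

def dense_moment_matrix_size(n, d=2):
    """Size C(n+d, d) of the moment matrix in a dense relaxation of degree 2d."""
    return comb(n + d, d)

for n in (20, 40, 80):
    print(n, pattern_psd_bound(n), dense_moment_matrix_size(n))
```

The bound grows linearly in $\mathrm{n}$, since $\tbinom{1+\mathrm{n}}{\mathrm{n}}=\mathrm{n}+1$, whereas the dense moment matrix grows quadratically.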

6.4.4 Custom Strategies

A customized pattern family $\mathcal{F}$ for a given exponent set $\mathrm{A}$ allows one to trade off computational cost against tightness of the relaxation. Figure 11 shows three example pattern families customized for $\mathrm{A}_{\mathrm{ex}}$ from Example 4.2.

While the bounds obtained from $\mathcal{F}^{2}$ (see $\mathrm{F}^{2}$ in Fig. 12) are far from optimal, they improve on those of $\mathcal{F}_{\mathrm{A}_{\mathrm{ex}}}^{\mathrm{m}}$ in that they are closer to the bounds obtained by (B).

7 Conclusion

We have presented a customizable framework for the relaxation of polynomial optimization problems over a box that is based on monomial patterns. This framework allows inclusion and combination of existing approaches that were developed by different communities. In fact, various kinds of linearizations of multilinear terms, relaxations based on bound-factor products, and dual versions of the relaxations of polynomial optimization problems based on sos, sdsos and sonc polynomials all come with their particular type of pattern. The advantage of our approach is that by using patterns we can exploit the combinatorial structure of the set $\mathrm{A}$ of monomial exponents. This is done by covering the monomial support $\mathrm{A}$ with a pattern family that reflects the structure of $\mathrm{A}$. Using patterns, we are able to avoid hard problem formulations by neglecting dependencies between certain monomials and instead focusing on well-behaved and easy-to-describe dependencies between certain other monomials. This results in high-quality and tractable relaxations of (4).

Our computational experiments provided numerical evidence for the benefits of using different generic as well as customized pattern relaxations.

These computed bounds could be further improved by techniques within divide-and-conquer frameworks such as BARON [43], SCIP [44], COUENNE [5] or LINDOGlobal [37], in a similar manner as already done with McCormick envelopes in the global optimization community.

In particular, the more involved $\mathrm{k}$-variate truncated submonoids and their possible generalizations provide a way to use sos or moment methods to solve problems with polynomials of higher degree and with more variables. With an appropriate choice of generators for a truncated submonoid pattern, this pattern type could be used as an interface to combine sos methods with divide-and-conquer frameworks. Furthermore, combining truncated submonoids with sparsity-exploiting approaches such as chordal or term sparsity offers a way to further improve the run times.

The numerical results also suggest that the connectivity properties of a pattern family have a major impact on the quality of the computed bounds. This could be further investigated and exploited with hypergraph based approaches in the future.

Further open research questions are how to efficiently generalize the approach to polynomial inequalities and how to identify properties of instances in specific application areas that might benefit particularly from the new approach.

Bibliography

[1] A. A. Ahmadi and A. Majumdar, DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization, SIAM Journal on Applied Algebra and Geometry, 3 (2019), pp. 193–230.
[2] M. F. Anjos and J. B. Lasserre, eds., Handbook on semidefinite, conic and polynomial optimization, vol. 166 of International Series in Operations Research & Management Science, Springer, New York, 2012, https://doi.org/10.1007/978-1-4614-0769-0.
[3] G. Averkov, Optimal size of linear matrix inequalities in semidefinite approaches to polynomial optimization, SIAM J. Appl. Algebra Geom., 3 (2019), pp. 128–151, https://doi.org/10.1137/18M1201342.
[4] X. Bao, A. Khajavirad, N. V. Sahinidis, and M. Tawarmalani, Global optimization of nonconvex problems with multilinear intermediates, Mathematical Programming Computation, 7 (2015), pp. 1–37.
[5] P. Belotti, Couenne, an exact solver for nonconvex MINLPs, 2015, https://projects.coin-or.org/Couenne/.
[6] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: A fresh approach to numerical computing, SIAM Review, 59 (2017), pp. 65–98, https://doi.org/10.1137/141000671.
[7] N. Boland, S. S. Dey, T. Kalinowski, M. Molinaro, and F. Rigterink, Bounding the gap between the McCormick relaxation and the convex hull for bilinear functions, Mathematical Programming, 162 (2017), pp. 523–535.
[8] V. Chandrasekaran and P. Shah, Relative entropy relaxations for signomial optimization, SIAM J. Optim., 26 (2016), pp. 1147–1173, https://doi.org/10.1137/140988978.
