Polynomial Norms

Amir Ali Ahmadi; Etienne de Klerk; Georgina Hall

arXiv:1704.07462·math.OC·July 18, 2018

TL;DR

This paper explores polynomial norms, characterizing their properties, computational complexity, and approximation capabilities, and introduces new methods for optimization and applications in statistics and dynamical systems.

Contribution

It provides a complete characterization of polynomial norms, analyzes their computational hardness, and develops semidefinite programming techniques for their optimization.

Findings

01

Polynomial norms are characterized by strict convexity of the underlying polynomial.

02

Testing whether a form defines a polynomial norm is strongly NP-hard for degree 4.

03

Any norm can be approximated arbitrarily well by a polynomial norm.

Abstract

In this paper, we study polynomial norms, i.e. norms that are the $d^{th}$ root of a degree- $d$ homogeneous polynomial $f$ . We first show that a necessary and sufficient condition for $f^{1/ d}$ to be a norm is for $f$ to be strictly convex, or equivalently, convex and positive definite. Though not all norms come from $d^{th}$ roots of polynomials, we prove that any norm can be approximated arbitrarily well by a polynomial norm. We then investigate the computational problem of testing whether a form gives a polynomial norm. We show that this problem is strongly NP-hard already when the degree of the form is 4, but can always be answered by testing feasibility of a semidefinite program (of possibly large size). We further study the problem of optimizing over the set of polynomial norms using semidefinite programming. To do this, we introduce the notion of r-sos-convexity and…

Equations

$f(\lambda x+(1-\lambda)y)<\lambda f(x)+(1-\lambda)f(y),~\forall x\neq y,~\forall\lambda\in(0,1).$

$f(\gamma\bar{x}+(1-\gamma)\bar{y})=\gamma f(\bar{x})+(1-\gamma)f(\bar{y}).$

$h(\alpha):=g(\alpha)-(g(1)-g(0))\alpha-g(0).$

$x^{*}=\arg\min_{||x||=1}f(x).$

$f(x)\geq\min_{||x||=R}f(x)\geq R^{d}f(x^{*})=M,$

$f(y)>f(x)+\nabla f(x)^{T}(y-x),~\forall y\neq x.$

$S_{g}=\{x\mid f^{1/d}(x)\leq 1\}=\{x\mid f(x)\leq 1\}=S_{f},$

$g\left(\frac{g(x)}{g(x)+g(y)}\cdot\frac{x}{g(x)}+\frac{g(y)}{g(x)+g(y)}\cdot\frac{y}{g(y)}\right)\leq 1.$

$\frac{1}{g(x)+g(y)}g(x+y)\leq 1$

$(1-\epsilon)||x||\leq f_{d}^{1/d}(x)\leq||x||,~\forall x\in\mathbb{R}^{n}.$

$\frac{d}{n+d}\left(\frac{n}{n+d}\right)^{n/d}||x||\leq f_{d}^{1/d}(x)\leq||x||,~\forall x\in\mathbb{R}^{n}.$

$\lim_{d\rightarrow\infty}\frac{f_{d}^{1/d}(x)}{||x||}=1~\forall x\in\mathbb{R}^{n}.$

$B=\{x\in\mathbb{R}^{n}\mid\|x\|\leq 1\}.$

$B^{\circ}=\{y\in\mathbb{R}^{n}\mid\langle x,y\rangle\leq 1~\forall x\in B\}.$

$\|x\|=\max_{y\in B^{\circ}}\langle x,y\rangle=\max_{y\in B^{\circ}}|\langle x,y\rangle|~\forall x\in\mathbb{R}^{n}.$

$f_{d}(x)=\frac{1}{\operatorname{vol}B^{\circ}}\int_{B^{\circ}}\langle x,y\rangle^{d}\,dy.$

$f_{d}^{1/d}(x)\leq\|x\|~\forall x\in\mathbb{R}^{n}.$

$H_{+}=\{y\in\mathbb{R}^{n}\mid\langle x_{0},y\rangle\geq 0\}.$

$\operatorname{vol}(H_{+}\cap B^{\circ})=\frac{1}{2}\operatorname{vol}(B^{\circ}).$

$A_{+}(\alpha)=\{(1-\alpha)y+\alpha y_{0}\mid y\in H_{+}\cap B^{\circ}\}.$

$\operatorname{vol}A_{+}(\alpha)=\frac{1}{2}(1-\alpha)^{n}\operatorname{vol}(B^{\circ}).$

$\langle x_{0},y\rangle\geq\alpha~\forall y\in A_{+}(\alpha),$

$\operatorname{vol}(A_{+}(\alpha)\cap A_{-}(\alpha))=0.$

$\langle x_{0},y\rangle\leq-\alpha~\forall y\in A_{-}(\alpha).$

$f^{1/d}(x_{0})$

$\lim_{d\rightarrow\infty}\frac{d}{n+d}\left(\frac{n}{n+d}\right)^{n/d}=\lim_{t\downarrow 0}t^{t}(1+t)^{-(1+t)}=1,$

$m_{\mu}(\alpha)=\int_{\Omega}x^{\alpha}\,d\mu(x)~\forall\alpha\in\mathbb{N}_{0}^{n},$

$m_{\mu}(\alpha)=m_{\mu^{\prime}}(\alpha)~\forall\alpha\in S.$

$d\mu(y)=\frac{1}{\operatorname{vol}B^{\circ}}\,dy.$

$\langle x,y\rangle^{d}=\sum_{|\alpha|=d}\binom{d}{\alpha}x^{\alpha}y^{\alpha},$

Taxonomy

Topics: Advanced Optimization Algorithms Research · Complexity and Algorithms in Graphs · Polynomial and algebraic computation

Full text

Polynomial Norms

Amir Ali Ahmadi. ORFE, Princeton University, Sherrerd Hall, Princeton, NJ 08540, USA. Email: [email protected]

Etienne de Klerk. Department of Econometrics and Operations Research, TISEM, Tilburg University, 5000LE Tilburg, The Netherlands. Email: [email protected]

Georgina Hall, corresponding author. ORFE, Princeton University, Sherrerd Hall, Princeton, NJ 08540, USA. Email: [email protected]

Amir Ali Ahmadi and Georgina Hall are partially supported by the DARPA Young Faculty Award, the Young Investigator Award of the AFOSR, the CAREER Award of the NSF, the Google Faculty Award, and the Sloan Fellowship.

Abstract

In this paper, we study polynomial norms, i.e. norms that are the $d^{\text{th}}$ root of a degree- $d$ homogeneous polynomial $f$ . We first show that a necessary and sufficient condition for $f^{1/d}$ to be a norm is for $f$ to be strictly convex, or equivalently, convex and positive definite. Though not all norms come from $d^{\text{th}}$ roots of polynomials, we prove that any norm can be approximated arbitrarily well by a polynomial norm. We then investigate the computational problem of testing whether a form gives a polynomial norm. We show that this problem is strongly NP-hard already when the degree of the form is 4, but can always be answered by solving a hierarchy of semidefinite programs. We further study the problem of optimizing over the set of polynomial norms using semidefinite programming. To do this, we introduce the notion of r-sos-convexity and extend a result of Reznick on sum of squares representation of positive definite forms to positive definite biforms. We conclude with some applications of polynomial norms to statistics and dynamical systems.

Keywords: polynomial norms, sum of squares polynomials, convex polynomials, semidefinite programming

AMS classification: 90C22, 14P10, 52A27

1 Introduction

A function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a norm if it satisfies the following three properties:

(i) positive definiteness: $f(x)>0,~\forall x\neq 0,$ and $f(0)=0$.

(ii) $1$-homogeneity: $f(\lambda x)=|\lambda|f(x),~\forall x\in\mathbb{R}^{n},~\forall\lambda\in\mathbb{R}$.

(iii) triangle inequality: $f(x+y)\leq f(x)+f(y),~\forall x,y\in\mathbb{R}^{n}.$

Some well-known examples of norms include the $1$-norm, $f(x)=\sum_{i=1}^{n}|x_{i}|$, the $2$-norm, $f(x)=\sqrt{\sum_{i=1}^{n}x_{i}^{2}}$, and the $\infty$-norm, $f(x)=\max_{i}|x_{i}|.$ Our focus throughout this paper is on norms that can be derived from multivariate polynomials. More specifically, we are interested in establishing conditions under which the $d^{th}$ root of a homogeneous polynomial of degree $d$ is a norm, where $d$ is an even number. We refer to the norm obtained when these conditions are met as a polynomial norm. It is straightforward to see why we restrict ourselves to $d^{th}$ roots of degree-$d$ homogeneous polynomials. Indeed, nonhomogeneous polynomials cannot hope to satisfy the homogeneity condition of a norm and homogeneous polynomials of degree $d>1$ are not 1-homogeneous unless we take their $d^{th}$ root. The question of when the square root of a homogeneous quadratic polynomial is a norm (i.e., when $d=2$) has a well-known answer (see, e.g., [16, Appendix A]): a function $f(x)=\sqrt{x^{T}Qx}$ is a norm if and only if the symmetric $n\times n$ matrix $Q$ is positive definite. In the particular case where $Q$ is the identity matrix, one recovers the $2$-norm. Positive definiteness of $Q$ can be checked in polynomial time using, for example, Sylvester’s criterion (positivity of the $n$ leading principal minors of $Q$). This means that testing whether the square root of a quadratic form is a norm can be done in polynomial time. A similar characterization in terms of conditions on the coefficients is not known for polynomial norms generated by forms of degree greater than 2. In particular, it is not known whether one can efficiently test membership or optimize over the set of polynomial norms.
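As an illustration of the $d=2$ case, the following minimal sketch applies Sylvester's criterion with exact rational arithmetic. The helper names `determinant` and `is_positive_definite` and the two test matrices are our own illustrative choices, not part of the paper.

```python
from fractions import Fraction

def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def is_positive_definite(q):
    """Sylvester's criterion: all n leading principal minors are positive."""
    n = len(q)
    return all(determinant([row[:k] for row in q[:k]]) > 0 for k in range(1, n + 1))

# Hypothetical symmetric matrices chosen for illustration.
Q_good = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]   # leading minors 2, 3, 4
Q_bad = [[1, 2], [2, 1]]                         # leading minors 1, -3

print(is_positive_definite(Q_good))  # sqrt(x^T Q_good x) is a norm
print(is_positive_definite(Q_bad))   # sqrt(x^T Q_bad x) is not a norm
```

The check assumes the input matrix is symmetric; it does not verify symmetry itself.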

Outline and contributions.

In this paper, we study polynomial norms from a computational perspective. In Section 2, we give two different necessary and sufficient conditions under which the $d^{th}$ root of a degree-$d$ form $f$ will be a polynomial norm: namely, that $f$ be strictly convex, or (equivalently) that $f$ be convex and positive definite (Theorem 2.1). Section 3 investigates the relationship between general norms and polynomial norms: while many norms are polynomial norms (including all $p$-norms with $p$ even), some norms are not (consider, e.g., the $1$-norm). We show, however, that any norm can be approximated to arbitrary precision by a polynomial norm (Theorem 3.1). While it is well known that polynomials can approximate continuous functions on compact sets arbitrarily well, the approximation result here needs to preserve the convexity and homogeneity properties of the original norm, and hence does not follow, e.g., from the Stone-Weierstrass theorem. In Section 4, we move on to complexity results and show that simply testing whether the $4^{th}$ root of a quartic form is a norm is strongly NP-hard (Theorem 4.1). We then provide a semidefinite programming-based hierarchy for certifying that the $d^{th}$ root of a degree-$d$ form is a norm (Theorem 4.4) and for optimizing over a subset of the set of polynomial norms (Theorem 4.21). The latter is done by introducing the concept of $r$-sum of squares-convexity (see Definition 4.7). We show that any form with a positive definite Hessian is $r$-sos-convex for some value of $r$, and present a lower bound on that value (Theorem 4.8). We also show that the level $r$ of the semidefinite programming hierarchy cannot be bounded as a function of the number of variables and the degree only (Theorem 4.19). Finally, we cover some applications of polynomial norms in statistics and dynamical systems in Section 5.
In Section 5.1, we compute approximations of two different types of norms, polytopic gauge norms and $p$-norms with $p$ noneven, using polynomial norms. The techniques described in this section can be applied to norm regression. In Section 5.2, we use polynomial norms to prove stability of a switched linear system, a task which is equivalent to computing an upper bound on the joint spectral radius of a family of matrices.

2 Two equivalent characterizations of polynomial norms

We start this section with a theorem that provides conditions under which the $d^{th}$ root of a degree-$d$ form is a norm. We will use this theorem in Section 4 to establish semidefinite programming-based approximations of polynomial norms. We remark that this result is generally assumed to be known by the optimization community. Indeed, some prior work on polynomial norms has been done by Dmitriev and Reznick in [18, 19, 38, 41]. For completeness of presentation, however, and as we could not find the exact statement of this result in the form we present, we include it here with alternative proofs. Throughout this paper, we suppose that the number of variables $n$ is greater than or equal to $2$ and that $d$ is a positive even integer.

Theorem 2.1.

Let $f$ be a form of degree $d$ in $n$ variables. The following statements are equivalent:

(i) The function $f^{\frac{1}{d}}$ is a norm on $\mathbb{R}^{n}$.

(ii) The function $f$ is convex and positive definite.

(iii) The function $f$ is strictly convex, i.e.,

$\displaystyle f(\lambda x+(1-\lambda)y)<\lambda f(x)+(1-\lambda)f(y),~\forall x\neq y,~\forall\lambda\in(0,1).$

Proof.

$(i)\Rightarrow(ii)$ If $f^{1/d}$ is a norm, then $f^{1/d}$ is positive definite, and so is $f$ . Furthermore, any norm is convex and the $d^{th}$ power of a nonnegative convex function remains convex.

$(ii)\Rightarrow(iii)$ Suppose that $f$ is convex, positive definite, but not strictly convex, i.e., there exist $\bar{x},\bar{y}\in\mathbb{R}^{n}$ with $\bar{x}\neq\bar{y}$, and $\gamma\in(0,1)$ such that

$\displaystyle f(\gamma\bar{x}+(1-\gamma)\bar{y})=\gamma f(\bar{x})+(1-\gamma)f(\bar{y}).$

Let $g(\alpha)\mathrel{\mathop{:}}=f(\bar{x}+\alpha(\bar{y}-\bar{x})).$ Note that $g$ is a restriction of $f$ to a line and, consequently, $g$ is a convex, positive definite, univariate polynomial in $\alpha$. We now define

$\displaystyle h(\alpha)\mathrel{\mathop{:}}=g(\alpha)-(g(1)-g(0))\alpha-g(0).\qquad(1)$

Similarly to $g$, $h$ is a convex univariate polynomial as it is the sum of two convex univariate polynomials. We also know that $h(\alpha)\leq 0,\forall\alpha\in(0,1)$. Indeed, by convexity of $g$, we have that $g(\alpha x+(1-\alpha)y)\leq\alpha g(x)+(1-\alpha)g(y),\forall x,y\in\mathbb{R}$ and $\alpha\in(0,1)$. This inequality holds in particular for $x=1$ and $y=0$, which proves the claim. Observe now that $h(0)=h(1)=0$ and that $h(1-\gamma)=0$, since $g(1-\gamma)=f(\gamma\bar{x}+(1-\gamma)\bar{y})=\gamma g(0)+(1-\gamma)g(1)$ by assumption. By convexity of $h$ and its nonpositivity over $(0,1)$, a zero at the interior point $1-\gamma$ forces $h(\alpha)=0$ on $(0,1)$, which further implies that $h=0$. Hence, from (1), $g$ is an affine function. As $g$ is positive definite, it cannot be that $g$ has a nonzero slope, so $g$ has to be a constant. But this contradicts that $\lim_{\alpha\rightarrow\infty}g(\alpha)=\infty.$ To see why this limit must be infinite, we show that $\lim_{||x||\rightarrow\infty}f(x)=\infty.$ As $\lim_{\alpha\rightarrow\infty}||\bar{x}+\alpha(\bar{y}-\bar{x})||=\infty$ and $g(\alpha)=f(\bar{x}+\alpha(\bar{y}-\bar{x}))$, this implies that $\lim_{\alpha\rightarrow\infty}g(\alpha)=\infty.$ To show that $\lim_{||x||\rightarrow\infty}f(x)=\infty$, let

$\displaystyle x^{*}=\arg\min_{||x||=1}f(x).$

By positive definiteness of $f$, $f(x^{*})>0.$ Let $M$ be any positive scalar and define $R\mathrel{\mathop{:}}=(M/f(x^{*}))^{1/d}$. Then for any $x$ such that $||x||=R$, we have

$\displaystyle f(x)\geq\min_{||x||=R}f(x)\geq R^{d}f(x^{*})=M,$

where the second inequality holds by homogeneity of $f.$ Thus $\lim_{||x||\rightarrow\infty}f(x)=\infty$ .

$(iii)\Rightarrow(i)$ Homogeneity of $f^{1/d}$ is immediate. Positivity follows from the first-order characterization of strict convexity:

$\displaystyle f(y)>f(x)+\nabla f(x)^{T}(y-x),~\forall y\neq x.$

Indeed, for $x=0$, the inequality becomes $f(y)>0,~\forall y\neq 0$, as $f(0)=0$ and $\nabla f(0)=0$. Hence, $f$ is positive definite, and so is $f^{1/d}$. It remains to prove the triangle inequality. Let $g\mathrel{\mathop{:}}=f^{1/d}$. Denote by $S_{f}$ and $S_{g}$ the 1-sublevel sets of $f$ and $g$ respectively. It is clear that

$\displaystyle S_{g}=\{x\mid f^{1/d}(x)\leq 1\}=\{x\mid f(x)\leq 1\}=S_{f},$

and as $f$ is strictly convex (and hence quasi-convex), $S_{f}$ is convex and so is $S_{g}$. Let $x,y\in\mathbb{R}^{n}.$ We have that $\frac{x}{g(x)}\in S_{g}$ and $\frac{y}{g(y)}\in S_{g}$. From convexity of $S_{g}$,

$\displaystyle g\left(\frac{g(x)}{g(x)+g(y)}\cdot\frac{x}{g(x)}+\frac{g(y)}{g(x)+g(y)}\cdot\frac{y}{g(y)}\right)\leq 1.$

Homogeneity of $g$ then gives us

$\displaystyle \frac{1}{g(x)+g(y)}g(x+y)\leq 1$

which shows that triangle inequality holds. ∎
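A quick numerical sanity check of Theorem 2.1 can be sketched as follows. The quartic below is our own example, not one from the paper: it equals $\frac{1}{2}((x_1+x_2)^4+(x_1-x_2)^4)$, so it is convex and positive definite, and its fourth root should therefore satisfy the triangle inequality. Random sampling can only fail to find a violation; it proves nothing.

```python
import random

def f(x):
    """Example quartic form: ((x1+x2)^4 + (x1-x2)^4)/2 = x1^4 + 6 x1^2 x2^2 + x2^4."""
    x1, x2 = x
    return x1**4 + 6 * x1**2 * x2**2 + x2**4

def poly_norm(x, d=4):
    """Candidate polynomial norm: the d-th root of the degree-d form f."""
    return f(x) ** (1.0 / d)

def max_triangle_violation(trials=10000, seed=0):
    """Largest sampled value of g(x+y) - g(x) - g(y); it should not be positive."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        s = (x[0] + y[0], x[1] + y[1])
        worst = max(worst, poly_norm(s) - poly_norm(x) - poly_norm(y))
    return worst

# A small tolerance absorbs floating-point rounding.
print(max_triangle_violation() <= 1e-12)
```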

3 Approximating norms by polynomial norms

It is easy to see that not all norms are polynomial norms. For example, the 1-norm $||x||_{1}=\sum_{i=1}^{n}|x_{i}|$ is not a polynomial norm. Indeed, all polynomial norms are differentiable at all but one point (the origin) whereas the 1-norm is nondifferentiable whenever one of the components of $x$ is equal to zero. In this section, we show that, though not every norm is a polynomial norm, any norm can be approximated to arbitrary precision by a polynomial norm (Theorem 3.1). A related result is given by Barvinok in [13]. In that paper, he shows that any norm can be approximated by the $d$ -th root of a nonnegative degree- $d$ form, and quantifies the quality of the approximation as a function of $n$ and $d$ . The form he obtains however is not shown to be convex. In fact, in a later work [14, Section 2.4], Barvinok points out that it would be an interesting question to know whether any norm can be approximated by the $d^{th}$ root of a convex form with the same quality of approximation as for $d$ -th roots of nonnegative forms. The result below is a step in that direction, although the quality of approximation is weaker than that by Barvinok [13]. We note that the form in Barvinok’s construction is a sum of squares of other forms. Such forms are not necessarily convex. By contrast, the form that we construct is a sum of powers of linear forms and hence always convex.

Theorem 3.1.

Let $||\cdot||$ be any norm on $\mathbb{R}^{n}$. Then, for any even integer $d\geq 2$:

(i) There exists an $n$-variate convex positive definite form $f_{d}$ of degree $d$ such that

$\displaystyle \frac{d}{n+d}\left(\frac{n}{n+d}\right)^{n/d}||x||\leq f_{d}^{1/d}(x)\leq||x||,~\forall x\in\mathbb{R}^{n}.\qquad(2)$

In particular, for any sequence $\{f_{d}\}$ $(d=2,4,6,\ldots)$ of such polynomials one has

$\displaystyle \lim_{d\rightarrow\infty}\frac{f_{d}^{1/d}(x)}{||x||}=1~\forall x\in\mathbb{R}^{n}.$

(ii) One may assume without loss of generality that $f_{d}$ in (i) is a nonnegative sum of $d^{\text{th}}$ powers of linear forms.

(Footnote to the theorem: We would like to thank an anonymous referee for suggesting the proof of part (i) of this theorem. We were previously showing that for any norm $||\cdot||$ on $\mathbb{R}^{n}$ and for any $\epsilon>0$, there exist an even integer $d$ and an $n$-variate positive definite form $f_{d}$ of degree $d$, which is a sum of powers of linear forms, and such that $\displaystyle(1-\epsilon)||x||\leq f_{d}^{1/d}(x)\leq||x||,~\forall x\in\mathbb{R}^{n}.$)

Proof of (i).

Fix any norm $\|\cdot\|$ on $\mathbb{R}^{n}$ . We denote the Euclidean inner product on $\mathbb{R}^{n}$ by $\langle\cdot,\cdot\rangle$ , and the unit ball with respect to $\|\cdot\|$ by

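$\displaystyle B=\{x\in\mathbb{R}^{n}\mid\|x\|\leq 1\}.$

We denote the polar of $B$ with respect to $\langle\cdot,\cdot\rangle$ by

$\displaystyle B^{\circ}=\{y\in\mathbb{R}^{n}\mid\langle x,y\rangle\leq 1~\forall x\in B\}.$

Recall that $B^{\circ}$ is symmetric around the origin, because $B$ is. One may express the given norm in terms of the polar as follows (see e.g. relation (3.1) in [13]):

$\displaystyle \|x\|=\max_{y\in B^{\circ}}\langle x,y\rangle=\max_{y\in B^{\circ}}|\langle x,y\rangle|~\forall x\in\mathbb{R}^{n}.\qquad(3)$

For a given even integer $d$, we define the polynomial

$\displaystyle f_{d}(x)=\frac{1}{\operatorname{vol}B^{\circ}}\int_{B^{\circ}}\langle x,y\rangle^{d}\,dy.\qquad(4)$

Note that $f_{d}$ is indeed a convex form of degree $d$. In fact, we will show later on that $f_{d}$ may be written as a nonnegative sum of $d^{th}$ powers of linear forms.

By (3), one has

$\displaystyle f_{d}^{1/d}(x)\leq\|x\|~\forall x\in\mathbb{R}^{n}.$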

Now fix $x_{0}\in\mathbb{R}^{n}$ such that $\|x_{0}\|=1$ . By (3), there exists a $y_{0}\in B^{\circ}$ so that $\langle x_{0},y_{0}\rangle=1$ . Define the half-space

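$\displaystyle H_{+}=\{y\in\mathbb{R}^{n}\mid\langle x_{0},y\rangle\geq 0\}.$

Then, by symmetry,

$\displaystyle \operatorname{vol}(H_{+}\cap B^{\circ})=\frac{1}{2}\operatorname{vol}(B^{\circ}).$

For any $\alpha\in(0,1)$ we now define

$\displaystyle A_{+}(\alpha)=\{(1-\alpha)y+\alpha y_{0}\mid y\in H_{+}\cap B^{\circ}\}.$

Then $A_{+}(\alpha)\subset B^{\circ}$, and

$\displaystyle \operatorname{vol}A_{+}(\alpha)=\frac{1}{2}(1-\alpha)^{n}\operatorname{vol}(B^{\circ}).$

Moreover

$\displaystyle \langle x_{0},y\rangle\geq\alpha~\forall y\in A_{+}(\alpha),$

and

$\displaystyle \operatorname{vol}(A_{+}(\alpha)\cap A_{-}(\alpha))=0.$

Letting $A_{-}(\alpha)=-A_{+}(\alpha)$, by symmetry one has $A_{-}(\alpha)\subset B^{\circ}$, and

$\displaystyle \langle x_{0},y\rangle\leq-\alpha~\forall y\in A_{-}(\alpha).$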

Thus

[TABLE]

The last expression is maximized by $\alpha=\frac{d}{n+d}$ , yielding the leftmost inequality in (2) in the statement of the theorem. Finally, note that,

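$\displaystyle \lim_{d\rightarrow\infty}\frac{d}{n+d}\left(\frac{n}{n+d}\right)^{n/d}=\lim_{t\downarrow 0}t^{t}(1+t)^{-(1+t)}=1,$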

as required. ∎
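To get a feel for the constant in the lower bound of Theorem 3.1(i), the sketch below tabulates it for a few degrees and checks the maximizer numerically. It assumes that the expression being maximized in the proof is $\alpha(1-\alpha)^{n/d}$, which is what the volume estimates above suggest (integrating $\langle x_{0},y\rangle^{d}\geq\alpha^{d}$ over $A_{+}(\alpha)\cup A_{-}(\alpha)$, a set of relative volume $(1-\alpha)^{n}$); the function names are ours.

```python
def lower_constant(n, d):
    """Constant d/(n+d) * (n/(n+d))**(n/d) from the lower bound in Theorem 3.1(i)."""
    return d / (n + d) * (n / (n + d)) ** (n / d)

def best_alpha_on_grid(n, d, steps=10000):
    """Grid-search maximizer of alpha * (1 - alpha)**(n/d) over (0, 1)."""
    best = max(range(1, steps), key=lambda k: (k / steps) * (1 - k / steps) ** (n / d))
    return best / steps

n = 3  # illustrative dimension
for d in (2, 4, 16, 64, 1024):
    # Columns: degree, bound constant, numerical maximizer, closed form d/(n+d).
    print(d, round(lower_constant(n, d), 4), best_alpha_on_grid(n, d), d / (n + d))
```

The constant approaches 1 only slowly in $d$ for fixed $n$, which is consistent with the remark that the approximation quality here is weaker than Barvinok's.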

For the proof of the second part of the theorem, we need a result concerning finite moment sequences of signed measures, given as the next lemma.

Lemma 3.2 ( [43], see Lemma 3.1 in [44] for a simple proof).

Let $\Omega\subset\mathbb{R}^{n}$ be Lebesgue-measurable, and let $\mu$ be the normalized Lebesgue measure on $\Omega$ , i.e. $\mu(\Omega)=1$ . Denote the moments of $\mu$ by

$\displaystyle m_{\mu}(\alpha)=\int_{\Omega}x^{\alpha}\,d\mu(x)~\forall\alpha\in\mathbb{N}_{0}^{n},\qquad(8)$

where $x^{\alpha}:=\prod_{i}x_{i}^{\alpha_{i}}$ if $x=(x_{1},\ldots,x_{n})$ and $\alpha=(\alpha_{1},\ldots,\alpha_{n})$. Let $S\subset\mathbb{N}_{0}^{n}$ be finite. Then there exists an atomic probability measure, say $\mu^{\prime}$, supported on at most $|S|$ points in $\Omega$, such that

$\displaystyle m_{\mu}(\alpha)=m_{\mu^{\prime}}(\alpha)~\forall\alpha\in S.$

We may now prove part (ii) of Theorem 3.1.

Proof of (ii) of Theorem 3.1.

Let $\mu$ be the normalized Lebesgue measure on $B^{\circ}$ , i.e.

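$\displaystyle d\mu(y)=\frac{1}{\operatorname{vol}B^{\circ}}\,dy.$

We now use the multinomial theorem

$\displaystyle \langle x,y\rangle^{d}=\sum_{|\alpha|=d}\binom{d}{\alpha}x^{\alpha}y^{\alpha},$

with the multinomial notation

$\displaystyle \binom{d}{\alpha}=\frac{d!}{\alpha_{1}!\cdots\alpha_{n}!},$

to rewrite $f_{d}$ in (4) as

$\displaystyle f_{d}(x)=\sum_{|\alpha|=d}\binom{d}{\alpha}m_{\mu}(\alpha)x^{\alpha},\qquad(9)$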

where $m_{\mu}(\alpha)$ is the moment of order $\alpha$ of $\mu$ , as defined in (8).

By Lemma 3.2, there exist $\bar{y}^{(1)},\ldots,\bar{y}^{(p)}\in B^{\circ}$ with $p={d+n-1\choose d}$ so that

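$\displaystyle m_{\mu}(\alpha)=\sum_{j=1}^{p}\lambda_{j}\left(\bar{y}^{(j)}\right)^{\alpha}~\forall|\alpha|=d,$

for some $\lambda_{j}\geq 0$ $(j=1,\ldots,p)$ with $\sum_{j=1}^{p}\lambda_{j}=1$. Substituting in (9), one has

$\displaystyle f_{d}(x)=\sum_{j=1}^{p}\lambda_{j}\sum_{|\alpha|=d}\binom{d}{\alpha}\left(\bar{y}^{(j)}\right)^{\alpha}x^{\alpha}=\sum_{j=1}^{p}\lambda_{j}\langle x,\bar{y}^{(j)}\rangle^{d},$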

as required. ∎
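The moment form of $f_{d}$ can be made concrete on a small case. The sketch below is our own worked example, not one from the paper: it takes the $1$-norm on $\mathbb{R}^{2}$, whose polar body is the square $B^{\circ}=[-1,1]^{2}$, so that the moments of the normalized Lebesgue measure factor into one-dimensional moments equal to $1/(a+1)$ for even $a$ and $0$ for odd $a$. It then checks the two-sided bound of Theorem 3.1(i) at one point.

```python
from math import comb

def moment(a):
    """Moment of order a of the uniform probability measure on [-1, 1]."""
    return 1.0 / (a + 1) if a % 2 == 0 else 0.0

def f_d(x, d):
    """f_d(x) = sum_k C(d,k) m(k) m(d-k) x1^k x2^(d-k), assuming B-polar = [-1,1]^2."""
    x1, x2 = x
    return sum(comb(d, k) * moment(k) * moment(d - k) * x1**k * x2**(d - k)
               for k in range(d + 1))

def one_norm(x):
    return abs(x[0]) + abs(x[1])

def lower_constant(n, d):
    """Constant from the lower bound in Theorem 3.1(i)."""
    return d / (n + d) * (n / (n + d)) ** (n / d)

x = (0.3, -0.7)  # arbitrary test point with 1-norm equal to 1
for d in (2, 8, 32, 128):
    approx = f_d(x, d) ** (1.0 / d)
    # The theorem predicts: constant * ||x||_1 <= f_d(x)^(1/d) <= ||x||_1.
    assert lower_constant(2, d) * one_norm(x) <= approx <= one_norm(x)
    print(d, approx)
```

The printed values increase toward $\|x\|_{1}=1$ as $d$ grows, illustrating how a norm that is not itself polynomial is approached by polynomial norms.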

4 Semidefinite programming-based approximations of polynomial norms

4.1 Complexity

It is natural to ask whether testing if the $d^{th}$ root of a given degree- $d$ form is a norm can be done in polynomial time. In the next theorem, we show that, unless $P=NP$ , this is not the case even when $d=4$ .

Theorem 4.1.

Deciding whether the $4^{th}$ root of a quartic form is a norm is strongly NP-hard.

Proof.

The proof of this result is adapted from a proof in [6]. Recall that the CLIQUE problem can be described thus: given a graph $G=(V,E)$ and a positive integer $k$ , decide whether $G$ contains a clique of size at least $k$ . The CLIQUE problem is known to be NP-hard [20]. We will give a reduction from CLIQUE to the problem of testing convexity and positive definiteness of a quartic form. The result then follows from Theorem 2.1. Let $\omega(G)$ be the clique number of the graph at hand, i.e., the number of vertices in a maximum clique of $G$ . Consider the following quartic form

[TABLE]

In [6], using in part a result in [31], it is shown that

[TABLE]

is convex and $b(x;y)$ is positive semidefinite. Here, $\gamma$ is a positive constant defined as the largest coefficient in absolute value of any monomial present in some entry of the matrix $\left[\frac{\partial^{2}b(x;y)}{\partial x_{i}\partial y_{j}}\right]_{i,j}$. As $\sum_{i}x_{i}^{4}+\sum_{i}y_{i}^{4}$ is positive definite and as we are adding this term to a positive semidefinite expression, the resulting polynomial is positive definite. Hence, the equivalence holds if and only if the quartic on the right-hand side of the equivalence in (10) is convex and positive definite. ∎

Note that this also shows that strict convexity is hard to test for quartic forms (this is a consequence of Theorem 2.1). A related result is Proposition 3.5. in [6], which shows that testing strict convexity of a polynomial of even degree $d\geq 4$ is hard. However, this result is not shown there for forms, hence the relevance of the previous theorem.

Theorem 4.1 rules out the possibility of a pseudo-polynomial time characterization of polynomial norms (unless P=NP) and motivates the study of tractable sufficient conditions. The sufficient conditions we consider next are based on semidefinite programming. Semidefinite programs can be solved to arbitrary accuracy in polynomial time [45] and technology for solving this class of problems is rapidly improving [9, 34, 30, 46, 21, 42, 5].

4.2 Sum of squares polynomials and semidefinite programming review

We start this section by reviewing the notion of sum of squares polynomials and related concepts such as sum of squares-convexity. We say that a polynomial $f$ is a sum of squares (sos) if $f(x)=\sum_{i}q_{i}^{2}(x)$ , for some polynomials $q_{i}$ . Being a sum of squares is a sufficient condition for being nonnegative. The converse however is not true, as is exemplified by the Motzkin polynomial

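$\displaystyle M(x_{1},x_{2},x_{3})=x_{1}^{4}x_{2}^{2}+x_{1}^{2}x_{2}^{4}-3x_{1}^{2}x_{2}^{2}x_{3}^{2}+x_{3}^{6},\qquad(11)$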

which is nonnegative but not a sum of squares [33]. The sum of squares condition is a popular surrogate for nonnegativity due to its tractability. Indeed, while testing nonnegativity of a polynomial of degree greater than or equal to 4 is a hard problem, testing whether a polynomial is a sum of squares can be done using semidefinite programming. This comes from the fact that a polynomial $f$ of degree $d$ is a sum of squares if and only if there exists a positive semidefinite matrix $Q$ such that $f(x)=z(x)^{T}Qz(x)$, where $z(x)$ is the standard vector of monomials of degree up to $d$ (see, e.g., [35]). As a consequence, any optimization problem over the coefficients of a set of polynomials which includes a combination of affine constraints and sos constraints on these polynomials, together with a linear objective, can be recast as a semidefinite program. These types of optimization problems are known as sos programs and have found widespread applications in recent years [36, 29, 25, 24].
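The Gram matrix characterization can be illustrated on a small quartic. The polynomial, the matrix $Q$, and the two squares below are a standard textbook-style example chosen by us, not taken from this paper. Because the polynomial is a form, the monomial vector only needs the degree-2 monomials $z=(x^{2},y^{2},xy)$. The sketch verifies the identities exactly on an integer grid, which suffices for bivariate polynomials of degree 4; it does not solve a semidefinite program.

```python
def f(x, y):
    """Example quartic form to be certified as a sum of squares."""
    return 2 * x**4 + 2 * x**3 * y - x**2 * y**2 + 5 * y**4

# Candidate Gram matrix for the monomial vector z = (x^2, y^2, x*y).
# It is positive semidefinite: Q = L^T L with L = (1/sqrt(2)) * [[2, -3, 1], [0, 1, 3]].
Q = [[2, -3, 1],
     [-3, 5, 0],
     [1, 0, 5]]

def gram_form(x, y):
    """Evaluate z(x, y)^T Q z(x, y)."""
    z = [x * x, y * y, x * y]
    return sum(z[i] * Q[i][j] * z[j] for i in range(3) for j in range(3))

def twice_sos(x, y):
    """q1^2 + q2^2, which should equal 2*f (doubling keeps the arithmetic in integers)."""
    q1 = 2 * x * x - 3 * y * y + x * y
    q2 = y * y + 3 * x * y
    return q1 * q1 + q2 * q2

points = [(x, y) for x in range(-5, 6) for y in range(-5, 6)]
print(all(gram_form(x, y) == f(x, y) for x, y in points))
print(all(twice_sos(x, y) == 2 * f(x, y) for x, y in points))
```

In an actual sos program the entries of $Q$ are unknowns constrained by coefficient matching and $Q\succeq 0$; here $Q$ is simply given.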

Though not all nonnegative polynomials can be written as sums of squares, the following theorem by Artin [10] circumvents this problem using sos multipliers.

Theorem 4.2 (Artin [10]).

For any nonnegative polynomial $f$ , there exists an sos polynomial $q$ such that $q\cdot f$ is sos.

This theorem in particular implies that if we are given a polynomial $f$, then we can always check its nonnegativity using an sos program that searches for $q$ (of a fixed degree). However, this result does not allow us to optimize over the set of nonnegative polynomials or positive semidefinite polynomial matrices using an sos program (as far as we know). This is because, in that setting, products of decision variables arise from multiplying polynomials $f$ and $q$, whose coefficients are decision variables.

By adding further assumptions on $f$ , Reznick showed in [39] that one could further pick $q$ to be a power of $\sum_{i}x_{i}^{2}$ (we refer the reader to [12, Chapter 1, Section 3] for another nice presentation of Reznick’s result). In what follows, $S^{n-1}$ denotes the unit sphere in $\mathbb{R}^{n}.$

Theorem 4.3 (Reznick [39]).

Let $f$ be a positive definite form of degree $d$ in $n$ variables and define

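$\displaystyle \epsilon(f)\mathrel{\mathop{:}}=\frac{\min\{f(u)\mid u\in S^{n-1}\}}{\max\{f(u)\mid u\in S^{n-1}\}}.$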

If $r\geq\frac{nd(d-1)}{4\log(2)\epsilon(f)}-\frac{n+d}{2}$ , then $(\sum_{i=1}^{n}x_{i}^{2})^{r}\cdot f$ is a sum of squares.
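To see what size of multiplier the bound in Theorem 4.3 asks for, the following sketch evaluates it on a simple example of our own, $f(x)=x_{1}^{4}+x_{2}^{4}$. It assumes that $\epsilon(f)$ is the ratio of the minimum to the maximum of $f$ on the unit sphere, as in Reznick's paper, and it estimates that ratio by sampling the circle. Sampling can only overestimate $\epsilon(f)$, so in general the returned $r$ may be too small; here the grid happens to contain the exact minimizer and maximizer.

```python
import math

def reznick_r(f, n, d, samples):
    """Smallest nonnegative integer r meeting the bound, with eps(f) estimated from samples."""
    values = [f(u) for u in samples]
    eps = min(values) / max(values)
    bound = n * d * (d - 1) / (4 * math.log(2) * eps) - (n + d) / 2
    return max(0, math.ceil(bound)), eps

def quartic(u):
    """Illustrative positive definite form x1^4 + x2^4 (not from the paper)."""
    return u[0]**4 + u[1]**4

# 3600 equally spaced points on the unit circle, including the angles 0 and pi/4.
circle = [(math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600))
          for k in range(3600)]

r, eps = reznick_r(quartic, n=2, d=4, samples=circle)
print(r, round(eps, 6))
```

For this form the minimum on the circle is $1/2$ and the maximum is $1$, so the bound evaluates to $12/\log 2-3\approx 14.3$ and the sketch returns $r=15$. The bound is only sufficient; a smaller power may already work.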

Motivated by this theorem, the notion of $r$ -sos polynomials can be defined: a polynomial $f$ is said to be $r$ -sos if $(\sum_{i}x_{i}^{2})^{r}\cdot f$ is sos. Note that it is clear that any $r$ -sos polynomial is nonnegative and that the set of $r$ -sos polynomials is included in the set of $(r+1)$ -sos polynomials. The Motzkin polynomial in (11) for example is $1$ -sos although not sos.

To end our review, we briefly touch upon the concept of sum of squares-convexity (sos-convexity), which we will build upon in the rest of the section. Let $H_{f}$ denote the Hessian matrix of a polynomial $f$ . We say that $f$ is sos-convex if $y^{T}H_{f}(x)y$ is a sum of squares (as a polynomial in $x$ and $y$ ). As before, optimizing over the set of sos-convex polynomials can be cast as a semidefinite program. Sum of squares-convexity is obviously a sufficient condition for convexity via the second-order characterization of convexity. However, there are convex polynomials which are not sos-convex (see, e.g., [7]). For a more detailed overview of sos-convexity including equivalent characterizations and settings in which sos-convexity and convexity are equivalent, refer to [8].

4.2.1 Notation

Throughout, we will use the notation $H_{n,d}$ (resp. $P_{n,d}$) to denote the set of forms (resp. positive semidefinite, aka nonnegative, forms) in $n$ variables and of degree $d$. We will furthermore use the falling factorial notation $(t)_{0}=1$ and $(t)_{k}=t(t-1)\ldots(t-(k-1))$ for a positive integer $k$.

4.3 Certifying validity of a polynomial norm

In this subsection, we assume that we are given a form $f$ of degree $d$ and we would like to prove that $f^{1/d}$ is a norm using semidefinite programming.

Theorem 4.4.

Let $f$ be a degree- $d$ form. Then $f^{1/d}$ is a polynomial norm if and only if there exist $c>0$ , $r\in\mathbb{N}$ , and an sos form $q(x)$ such that $q(x)\cdot y^{T}H_{f}(x)y$ is sos and $\left(f(x)-c(\sum_{i}x_{i}^{2})^{d/2}\right)(\sum_{i}x_{i}^{2})^{r}$ is sos. Furthermore, this condition can be checked using semidefinite programming.

To show this result, we require a counterpart to Theorem 4.2 for matrices, which we present below. We say that a polynomial matrix $H(x)$ , i.e., a matrix whose entries are polynomials in $x=(x_{1},\ldots,x_{n})$ , is positive semidefinite if it has nonnegative eigenvalues for all substitutions $x\in\mathbb{R}^{n}$ .

Proposition 4.5 ([37, 22, 27]).

If $H(x)$ is a positive semidefinite polynomial matrix, then there exists a sum of squares polynomial $q(x)$ such that $q(x)\cdot y^{T}H(x)y$ is a sum of squares.

Proof.

This is an immediate consequence of a theorem by Procesi and Schacher [37] and independently Gondard and Ribenboim [22], reproven by Hillar and Nie [27]. This theorem states that if $H(x)$ is a symmetric polynomial matrix that is positive semidefinite for all $x\in\mathbb{R}^{n},$ then

$\displaystyle H(x)=\sum_{i}A_{i}(x)^{2},$

where the matrices $A_{i}(x)$ are symmetric and have rational functions as entries. Let $p(x)$ be the polynomial obtained by multiplying all denominators of the rational functions involved in any of the matrices $A_{i}$ . Note that $p(x)^{2}\cdot y^{T}H(x)y$ is a sum of squares as

[TABLE]

However, $p(x)\cdot A_{i}(x)$ is now a matrix with polynomial entries, which gives the result. ∎

We remark that this result does not immediately follow from the theorem given by Artin as the multiplier $q$ does not depend on $x$ and $y$ , but solely on $x$ . We now prove Theorem 4.4.

Proof of Theorem 4.4.

It is immediate to see that if there exist such a $c$ , $r$ , and $q$ , then $f$ is convex and positive definite. From Theorem 2.1, this means that $f^{1/d}$ is a polynomial norm.

Conversely, if $f^{1/d}$ is a polynomial norm, then, by Theorem 2.1, $f$ is convex and positive definite. As $f$ is convex, the polynomial $y^{T}H_{f}(x)y$ is nonnegative. Using Proposition 4.5, we conclude that there exists an sos polynomial $q(x)$ such that $q(x)\cdot y^{T}H_{f}(x)y$ is sos. We now show that, as $f$ is positive definite, there exist $c>0$ and $r\in\mathbb{N}$ such that $\left(f(x)-c(\sum_{i}x_{i}^{2})^{d/2}\right)(\sum_{i}x_{i}^{2})^{r}$ is sos. Let $f_{min}$ denote the minimum of $f$ on the sphere. As $f$ is positive definite, $f_{min}>0.$ We take $c\mathrel{\mathop{:}}=\frac{f_{min}}{2}$ and consider $g(x)\mathrel{\mathop{:}}=f(x)-c(\sum_{i}x_{i}^{2})^{d/2}$. We have that $g$ is a positive definite form: indeed, if $x$ is a nonzero vector in $\mathbb{R}^{n}$, then

[TABLE]

by homogeneity of $f$ and definition of $c$ . Using Theorem 4.3, $\exists r\in\mathbb{N}$ such that $g(x)(\sum_{i}x_{i}^{2})^{r}$ is sos.

For fixed $r$ , a given form $f$ , and a fixed degree $d$ , one can search for $c>0$ and an sos form $q$ of degree $d$ such that $q(x)\cdot y^{T}H_{f}(x)y$ is sos and $\left(f(x)-c(\sum_{i}x_{i}^{2})^{d/2}\right)(\sum_{i}x_{i}^{2})^{r}$ is sos using semidefinite programming. This is done by solving the following semidefinite feasibility problem:

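$$\begin{aligned}&q(x)\text{ sos},\\&c\geq 0,\\&q(x)\cdot y^{T}H_{f}(x)y\text{ sos},\\&\left(f(x)-c\Big(\sum_{i}x_{i}^{2}\Big)^{d/2}\right)\Big(\sum_{i}x_{i}^{2}\Big)^{r}\text{ sos},\end{aligned}\tag{12}$$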

where the unknowns are the coefficients of $q$ and the real number $c$ . ∎

Remark 4.6.

We remark that we are not imposing $c>0$ in the semidefinite program above. This is because, in practice, especially if the semidefinite program is solved with interior point methods, the solution returned by the solver will be in the interior of the feasible set, and hence $c$ will automatically be positive. One can slightly modify (12) however to take the constraint $c>0$ into consideration explicitly. Indeed, consider the following semidefinite feasibility problem where both the degree of $q$ and the integer $r$ are fixed:

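$$\begin{aligned}&q(x)\text{ sos},\\&q(x)\cdot y^{T}H_{f}(x)y\text{ sos},\\&\gamma\geq 0,\\&\left(\gamma f(x)-\Big(\sum_{i}x_{i}^{2}\Big)^{d/2}\right)\Big(\sum_{i}x_{i}^{2}\Big)^{r}\text{ sos}.\end{aligned}\tag{13}$$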

It is easy to check that (13) is feasible with $\gamma\geq 0$ if and only if the last constraint of (12) is feasible with $c>0$ . To see this, take $c=1/\gamma$ and note that $\gamma$ can never be zero.

To the best of our knowledge, we cannot use the approach described in Theorem 4.4 to optimize over the set of polynomial norms with a semidefinite program. This is because of the product of decision variables in the coefficients of $f$ and $q$ . The next subsection will address this issue.

4.4 Optimizing over the set of polynomial norms

In this subsection, we consider the problem of optimizing over the set of polynomial norms. To do this, we introduce the concept of $r$ -sos-convexity. Recall that the notation $H_{f}$ references the Hessian matrix of a form $f$ .

4.4.1 Positive definite biforms and r-sos-convexity

Definition 4.7.

For an integer $r$ , we say that a polynomial $f$ is $r$ -sos-convex if $y^{T}H_{f}(x)y\cdot(\sum_{i}x_{i}^{2})^{r}$ is sos.

Observe that, for fixed $r$ , the property of $r$ -sos-convexity can be checked using semidefinite programming (though the size of this SDP gets larger as $r$ increases). Any polynomial that is $r$ -sos-convex is convex. Note that the set of $r$ -sos-convex polynomials is a subset of the set of $(r+1)$ -sos-convex polynomials and that the case $r=0$ corresponds to the set of sos-convex polynomials. We remark that $f_{d}$ in Theorem 3.1 is in fact sos-convex, since it is a sum of squares of linear forms. Thus Theorem 3.1 implies that any norm on $\mathbb{R}^{n}$ may be approximated arbitrarily well by a polynomial norm that corresponds to an sos-convex form.

It is natural to ask whether any convex polynomial $f$ is $r$ -sos-convex for some $r$ . Our next theorem shows that this is the case provided that the biform $y^{T}H_{f}(x)y$ , where $H_{f}(x)$ is the Hessian of $f$ , is positive over the bi-sphere.

Theorem 4.8.

Let $f$ be a form of degree $d$ such that $y^{T}H_{f}(x)y>0$ for $(x,y)\in S^{n-1}\times S^{n-1}$ . Let

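$$\eta(f)\mathrel{\mathop{:}}=\frac{\min\{y^{T}H_{f}(x)y~|~(x,y)\in S^{n-1}\times S^{n-1}\}}{\max\{y^{T}H_{f}(x)y~|~(x,y)\in S^{n-1}\times S^{n-1}\}}.$$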

If $r\geq\frac{n(d-2)(d-3)}{4\log(2)\eta(f)}-\frac{n+d-2}{2}-d$ , then $f$ is $r$ -sos-convex.

Remark 4.9.

Note that $\eta(f)$ can also be interpreted as

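$$\eta(f)=\frac{\min_{x\in S^{n-1}}\lambda_{\min}(H_{f}(x))}{\max_{x\in S^{n-1}}\lambda_{\max}(H_{f}(x))}=\frac{1}{\max_{x\in S^{n-1}}\|H_{f}^{-1}(x)\|_{2}\cdot\max_{x\in S^{n-1}}\|H_{f}(x)\|_{2}}.$$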

Remark 4.10.

Theorem 4.8 is a generalization of Theorem 4.3 by Reznick. Note though that this is not an immediate generalization. First, $y^{T}H_{f}(x)y$ is not a positive definite form (consider, e.g., $y=0$ and any nonzero $x$ ). Second, note that the multiplier is $(\sum_{i}x_{i}^{2})^{r}$ and does not involve the $y$ variables. (As we will see in the proof, this is essentially because $y^{T}H_{f}(x)y$ is quadratic in $y$ .) It is not immediate that the multiplier should have this specific form. From Theorem 4.3, it may perhaps seem more natural that the multiplier be $(\sum_{i}x_{i}^{2}+\sum_{i}y_{i}^{2})^{r}$ . It turns out in fact that such a multiplier would not give us the correct property (contrary to $(\sum_{i}x_{i}^{2})^{r}$ ) as there exist forms $f$ whose Hessian is positive definite for all $x$ but for which the form

$$y^{T}H_{f}(x)y\cdot\Big(\sum_{i}x_{i}^{2}+\sum_{i}y_{i}^{2}\Big)^{r}$$

is not sos for any $r$ . For a specific example, consider the form $f$ in $3$ variables and of degree $8$ given in [7, Theorem 3.2]. It is shown in [7] that (i) $f$ is convex; (ii) the $(1,1)$ entry of the Hessian of $f$ , $H_{f}^{1,1}$ , is not sos; and (iii) $(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})y^{T}H_{f}(x)y$ is sos.

Suppose for the sake of contradiction that

$$q_{r}(x,y)\mathrel{\mathop{:}}=y^{T}H_{f}(x)y\cdot\Big(\sum_{i}x_{i}^{2}+\sum_{i}y_{i}^{2}\Big)^{r}$$

is sos for some $r$ . This implies that the polynomial

$$q_{r}(x,\alpha,0,0)=\alpha^{2}H_{f}^{1,1}(x)\cdot(x_{1}^{2}+x_{2}^{2}+x_{3}^{2}+\alpha^{2})^{r}$$

should be sos for any $\alpha$ as it is obtained from $q_{r}$ by setting $y=(\alpha,0,0)^{T}$ . Expanding this out, we get

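$$q_{r}(x,\alpha,0,0)=\sum_{k=0}^{r}\binom{r}{k}\alpha^{2(r-k)+2}\,(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{k}\,H_{f}^{1,1}(x).$$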

From arguments (ii) and (iii) above, we know that $H_{f}^{1,1}(x)$ is not sos but

$$(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})\cdot H_{f}^{1,1}(x)$$

is sos. Fixing $\alpha$ large enough, we can ensure that $q_{r}(x,\alpha,0,0)$ is not sos. This contradicts our previous assumption.

Remark 4.11.

Theorem 4.8 can easily be adapted to biforms of the type $\sum_{j}f_{j}(x)g_{j}(y)$ where $f_{j}$ ’s are forms of degree $d$ in $x$ and $g_{j}$ ’s are forms of degree $\tilde{d}$ in $y$ . In this case, there exist integers $s,r$ such that

$$\Big(\sum_{i}x_{i}^{2}\Big)^{r}\Big(\sum_{i}y_{i}^{2}\Big)^{s}\sum_{j}f_{j}(x)g_{j}(y)$$

is sos. For the purposes of this paper however and the connection to polynomial norms, we will show the result in the particular case where the biform of interest is $y^{T}H_{f}(x)y.$

We associate to any form $f\in H_{n,d}$ , the $d$ -th order differential operator $f(D)$ , defined by replacing each occurrence of $x_{j}$ with $\frac{\partial}{\partial x_{j}}$ . For example, if $f(x_{1},\ldots,x_{n})\mathrel{\mathop{:}}=\sum_{i}c_{i}x_{1}^{a_{1}^{i}}\ldots x_{n}^{a_{n}^{i}}$ where $c_{i}\in\mathbb{R}$ and $a_{j}^{i}\in\mathbb{N}$ , then its differential operator will be

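$$f(D)=\sum_{i}c_{i}\frac{\partial^{a_{1}^{i}}}{\partial x_{1}^{a_{1}^{i}}}\cdots\frac{\partial^{a_{n}^{i}}}{\partial x_{n}^{a_{n}^{i}}}.$$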

Our proof follows the structure of the proof of Theorem 4.3 given in [39] and reuses some of the results of that paper, which we quote here for clarity of exposition.

Proposition 4.12 ([39], see Proposition 2.6).

For any nonnegative integer $r$ , there exist nonnegative rationals $\lambda_{k}$ and integers $\alpha_{kl}$ such that

$$(x_{1}^{2}+\ldots+x_{n}^{2})^{r}=\sum_{k}\lambda_{k}(\alpha_{k1}x_{1}+\ldots+\alpha_{kn}x_{n})^{2r}.$$

For simplicity of notation, we will let $\alpha_{k}\mathrel{\mathop{:}}=(\alpha_{k1},\ldots,\alpha_{kn})^{T}$ and $x\mathrel{\mathop{:}}=(x_{1},\ldots,x_{n})^{T}$ . Hence, we will write $\sum_{k}\lambda_{k}(\alpha_{k}^{T}x)^{2r}$ to mean $\sum_{k}\lambda_{k}(\alpha_{k1}x_{1}+\ldots+\alpha_{kn}x_{n})^{2r}$ .
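To make Proposition 4.12 concrete, the sketch below checks one explicit representation for $n=2$ and $r=2$ in exact rational arithmetic. The weights and integer vectors are our own illustrative choice (the proposition only asserts that some such representation exists): $(x_{1}^{2}+x_{2}^{2})^{2}=\frac{1}{6}(x_{1}+x_{2})^{4}+\frac{1}{6}(x_{1}-x_{2})^{4}+\frac{2}{3}x_{1}^{4}+\frac{2}{3}x_{2}^{4}$.

```python
from fractions import Fraction
import random

# Hypothetical choice of nonnegative rationals lambda_k and integer vectors alpha_k
# for n = 2, r = 2 (one valid representation among many).
lambdas = [Fraction(1, 6), Fraction(1, 6), Fraction(2, 3), Fraction(2, 3)]
alphas = [(1, 1), (1, -1), (1, 0), (0, 1)]

def power_of_norm(x1, x2, r=2):
    # Left-hand side: (x1^2 + x2^2)^r
    return (x1**2 + x2**2) ** r

def sum_of_powers(x1, x2, r=2):
    # Right-hand side: sum_k lambda_k (alpha_k^T x)^(2r)
    return sum(lam * (a1 * x1 + a2 * x2) ** (2 * r)
               for lam, (a1, a2) in zip(lambdas, alphas))

random.seed(0)
points = [(Fraction(random.randint(-9, 9)), Fraction(random.randint(-9, 9)))
          for _ in range(50)]
# Exact comparison on sample points; both sides are quartic forms.
identity_holds = all(power_of_norm(p, q) == sum_of_powers(p, q) for p, q in points)
```

Expanding the right-hand side by hand gives $x_{1}^{4}+2x_{1}^{2}x_{2}^{2}+x_{2}^{4}$, so the agreement on sample points reflects an exact identity.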

Proposition 4.13 ([39], see Proposition 2.8).

If $g\in H_{n,e}$ and $h=\sum_{k}\lambda_{k}(\alpha_{k}^{T}x)^{d+e}\in H_{n,d+e}$ , then

$$g(D)h=\frac{(d+e)!}{d!}\sum_{k}\lambda_{k}\,g(\alpha_{k})(\alpha_{k}^{T}x)^{d}.$$

Proposition 4.14 ([39], see Theorems 3.7 and 3.9).

For $f\in H_{n,d}$ and $s\geq d$ , we define $\Phi_{s}(f)\in H_{n,d}$ by

$$f(D)(x_{1}^{2}+\ldots+x_{n}^{2})^{s}=\Phi_{s}(f)\cdot(x_{1}^{2}+\ldots+x_{n}^{2})^{s-d}.\tag{14}$$

The inverse $\Phi_{s}^{-1}(f)$ of $\Phi_{s}(f)$ exists and this is a map verifying $\Phi_{s}(\Phi_{s}^{-1}(f))=f.$

Proposition 4.15 ([39], see Theorem 3.12 ).

Suppose $f$ is a positive definite form in $n$ variables and of degree $d$ and let

$$\epsilon(f)\mathrel{\mathop{:}}=\frac{\min\{f(u)~|~u\in S^{n-1}\}}{\max\{f(u)~|~u\in S^{n-1}\}}.$$

If $s\geq\frac{nd(d-1)}{4\log(2)\epsilon(f)}-\frac{n-d}{2}$ , then $\Phi^{-1}_{s}(f)\in P_{n,d}.$

We will focus throughout the proof on biforms of the following structure

$$F(x;y)=\sum_{1\leq i,j\leq n}y_{i}y_{j}p_{ij}(x),\tag{15}$$

where $p_{ij}(x)\in H_{n,d}$ , for all $i,j$ , and some even integer $d$ . Note that the polynomial $y^{T}H_{f}(x)y$ (where $f$ is some form) has this structure. We next present three lemmas which we will then build on to give the proof of Theorem 4.8.

Lemma 4.16.

For a biform $F(x;y)$ of the structure in (15), define the operator $F(D;y)$ as

$$F(D;y)=\sum_{i,j}y_{i}y_{j}p_{ij}(D).$$

If $F(x;y)$ is positive semidefinite (i.e., $F(x;y)\geq 0,~{}\forall x,y$ ), then, for any $s\geq 0$ , the biform

$$F(D;y)(x_{1}^{2}+\ldots+x_{n}^{2})^{s}\tag{16}$$

is a sum of squares.

Proof.

Using Proposition 4.12, we have

$$(x_{1}^{2}+\ldots+x_{n}^{2})^{s}=\sum_{l}\lambda_{l}(\alpha_{l1}x_{1}+\ldots+\alpha_{ln}x_{n})^{2s},$$

where $\lambda_{l}\geq 0$ and $\alpha_{l}\in\mathbb{Z}^{n}.$ Hence, applying Proposition 4.13, we get

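$$\begin{aligned}F(D;y)(x_{1}^{2}+\ldots+x_{n}^{2})^{s}&=\sum_{i,j}y_{i}y_{j}\,p_{ij}(D)\Big(\sum_{l}\lambda_{l}(\alpha_{l}^{T}x)^{2s}\Big)\\&=\frac{(2s)!}{(2s-d)!}\sum_{l}\lambda_{l}(\alpha_{l}^{T}x)^{2s-d}\sum_{i,j}y_{i}y_{j}\,p_{ij}(\alpha_{l}).\end{aligned}$$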

Notice that $\sum_{i,j}y_{i}y_{j}p_{ij}(\alpha_{l})$ is a quadratic form in $y$ which is positive semidefinite by assumption, which implies that it is a sum of squares (as a polynomial in $y$ ). Furthermore, as $\lambda_{l}\geq 0~{}\forall l$ and $(\alpha_{l}^{T}x)^{2s-d}$ is an even power of a linear form, we have that $\lambda_{l}(\alpha_{l}^{T}x)^{2s-d}$ is a sum of squares (as a polynomial in $x$ ). Combining both results, we get that (16) is a sum of squares. ∎

We now extend the concept introduced by Reznick in Proposition 4.14 to biforms.

Lemma 4.17.

For a biform $F(x;y)$ of the structure as in (15), we define the biform $\Psi_{s,x}(F(x;y))$ as

$$\Psi_{s,x}(F(x;y))\mathrel{\mathop{:}}=\sum_{i,j}y_{i}y_{j}\Phi_{s}(p_{ij}(x)),$$

where $\Phi_{s}$ is as in (14). Define

$$\Psi_{s,x}^{-1}(F(x;y))\mathrel{\mathop{:}}=\sum_{i,j}y_{i}y_{j}\Phi_{s}^{-1}(p_{ij}(x)),$$

where $\Phi_{s}^{-1}$ is the inverse of $\Phi_{s}$ . Then, we have

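$$\Psi_{s,x}(F(x;y))\cdot(x_{1}^{2}+\ldots+x_{n}^{2})^{s-d}=F(D;y)(x_{1}^{2}+\ldots+x_{n}^{2})^{s}\tag{17}$$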

and

$$\Psi_{s,x}(\Psi_{s,x}^{-1}(F(x;y)))=F(x;y).\tag{18}$$

Proof.

We start by showing that (17) holds:

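$$\begin{aligned}\Psi_{s,x}(F(x;y))\cdot(x_{1}^{2}+\ldots+x_{n}^{2})^{s-d}&=\sum_{i,j}y_{i}y_{j}\,\Phi_{s}(p_{ij}(x))\,(x_{1}^{2}+\ldots+x_{n}^{2})^{s-d}\\&=\sum_{i,j}y_{i}y_{j}\,p_{ij}(D)(x_{1}^{2}+\ldots+x_{n}^{2})^{s}\\&=F(D;y)(x_{1}^{2}+\ldots+x_{n}^{2})^{s}.\end{aligned}$$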

We now show that (18) holds:

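$$\begin{aligned}\Psi_{s,x}(\Psi_{s,x}^{-1}(F(x;y)))&=\Psi_{s,x}\Big(\sum_{i,j}y_{i}y_{j}\,\Phi_{s}^{-1}(p_{ij}(x))\Big)\\&=\sum_{i,j}y_{i}y_{j}\,\Phi_{s}(\Phi_{s}^{-1}(p_{ij}(x)))\\&=\sum_{i,j}y_{i}y_{j}\,p_{ij}(x)=F(x;y).\end{aligned}$$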

∎

Lemma 4.18.

For a biform $F(x;y)$ of the structure in (15), which is positive on the bisphere, let

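$$\eta(F)\mathrel{\mathop{:}}=\frac{\min\{F(x;y)~|~(x,y)\in S^{n-1}\times S^{n-1}\}}{\max\{F(x;y)~|~(x,y)\in S^{n-1}\times S^{n-1}\}}.$$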

If $s\geq\frac{nd(d-1)}{4\log(2)\eta(F)}-\frac{n-d}{2}$ , then $\Psi_{s,x}^{-1}(F)$ is positive semidefinite.

Proof.

Fix $y\in S^{n-1}$ and consider $F_{y}(x)=F(x;y)$ , which is a positive definite form in $x$ of degree $d$ . From Proposition 4.15, if

$$s\geq\frac{nd(d-1)}{4\log(2)\epsilon(F_{y})}-\frac{n-d}{2},$$

then $\Phi^{-1}_{s}(F_{y})$ is positive semidefinite. As $\eta(F)\leq\epsilon(F_{y})$ for any $y\in S^{n-1}$ , we have that if

$$s\geq\frac{nd(d-1)}{4\log(2)\eta(F)}-\frac{n-d}{2},$$

then $\Phi_{s}^{-1}(F_{y})$ is positive semidefinite, regardless of the choice of $y.$ Hence, $\Psi_{s,x}^{-1}(F)$ is positive semidefinite (as a function of $x$ and $y$ ).

∎

Proof of Theorem 4.8.

Let $F(x;y)=y^{T}H_{f}(x)y$ , let $r\geq\frac{n(d-2)(d-3)}{4\log(2)\eta(f)}-\frac{n+d-2}{2}-d$ , and let

[TABLE]

We know by Lemma 4.18 that $G(x;y)$ is positive semidefinite. Hence, using Lemma 4.16, we get that

[TABLE]

is sos. Lemma 4.17 then gives us:

[TABLE]

As a consequence, $F(x;y)(x_{1}^{2}+\ldots+x_{n}^{2})^{r}$ is sos.

∎

The last theorem of this section shows that one cannot bound the integer $r$ in Theorem 4.8 as a function of $n$ and $d$ only.

Theorem 4.19.

For any integer $r\geq 0$ , there exists a form $f$ in 3 variables and of degree 8 such that $H_{f}(x)\succ 0,\forall x\neq 0$ , but $f$ is not $r$ -sos-convex.

Proof.

Consider the trivariate octic:

[TABLE]

It is shown in [7] that $f$ has positive definite Hessian, and that the $(1,1)$ entry of $H_{f}(x)$ , which we will denote by $H_{f}^{(1,1)}(x)$ , is 1-sos but not sos. We will show that for any $r\in\mathbb{N}$ , one can find $s\in\mathbb{N}\backslash\{0\}$ such that

$$g_{s}(x_{1},x_{2},x_{3})\mathrel{\mathop{:}}=f(x_{1},sx_{2},sx_{3})$$

satisfies the conditions of the theorem.

We start by showing that for any $s$ , $g_{s}$ has positive definite Hessian. To see this, note that for any $(x_{1},x_{2},x_{3})\neq 0,(y_{1},y_{2},y_{3})\neq 0$ , we have:

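$$y^{T}H_{g_{s}}(x)y=\tilde{y}^{T}H_{f}(x_{1},sx_{2},sx_{3})\tilde{y},\quad\text{where }\tilde{y}\mathrel{\mathop{:}}=(y_{1},sy_{2},sy_{3})^{T}.$$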

As $y^{T}H_{f}(x)y>0$ for any $x\neq 0,y\neq 0$ , this is in particular true when $x=(x_{1},sx_{2},sx_{3})$ and when $y=(y_{1},sy_{2},sy_{3})$ , which gives us that the Hessian of $g_{s}$ is positive definite for any $s\in\mathbb{N}\backslash\{0\}.$

We now show that for a given $r\in\mathbb{N}$ , there exists $s\in\mathbb{N}$ such that $(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}y^{T}H_{g_{s}}(x)y$ is not sos. We use the following result from [40, Theorem 1]: for any positive semidefinite form $p$ which is not sos, and any $r\in\mathbb{N}$ , there exists $s\in\mathbb{N}\backslash\{0\}$ such that $(\sum_{i=1}^{n}x_{i}^{2})^{r}\cdot p(x_{1},sx_{2},\ldots,sx_{n})$ is not sos. As $H_{f}^{(1,1)}(x)$ is 1-sos but not sos, we can apply the previous result. Hence, there exists a positive integer $s$ such that

$$(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}\cdot H_{f}^{(1,1)}(x_{1},sx_{2},sx_{3})$$

is not sos. This implies that $(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}\cdot y^{T}H_{g_{s}}(x)y$ is not sos. Indeed, if $(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}\cdot y^{T}H_{g_{s}}(x)y$ were sos, then it would remain sos after setting $y=(1,0,0)^{T}.$ But, we have

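$$(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}\cdot(1,0,0)H_{g_{s}}(x)(1,0,0)^{T}=(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}\cdot H_{f}^{(1,1)}(x_{1},sx_{2},sx_{3}),$$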

which is not sos. Hence, $(x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{r}\cdot y^{T}H_{g_{s}}(x)y$ is not sos, and $g_{s}$ is not $r$ -sos-convex. ∎

Remark 4.20.

Any form $f$ with $H_{f}(x)\succ 0,\forall x\neq 0$ is strictly convex but the converse is not true.

To see this, note that any form $f$ of degree $d$ with a positive definite Hessian is convex (as $H_{f}(x)\succeq 0,\forall x$ ) and positive definite (as, from a recursive application of Euler’s theorem on homogeneous functions, $f(x)=\frac{1}{d(d-1)}x^{T}H_{f}(x)x$ ). From the proof of Theorem 2.1, this implies that $f$ is strictly convex.

To see that the converse statement is not true, consider the strictly convex form $f(x_{1},x_{2})\mathrel{\mathop{:}}=x_{1}^{4}+x_{2}^{4}$ . We have

$$H_{f}(x)=12\begin{bmatrix}x_{1}^{2}&0\\0&x_{2}^{2}\end{bmatrix},$$

which is not positive definite, e.g., when $x=(1,0)^{T}$ .
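The two observations in this remark can be checked directly on the form $x_{1}^{4}+x_{2}^{4}$. The sketch below hard-codes its Hessian, verifies Euler's identity $f(x)=\frac{1}{d(d-1)}x^{T}H_{f}(x)x$ at one sample point, and exhibits a nonzero $y$ with $y^{T}H_{f}(x)y=0$ at $x=(1,0)^{T}$. The sample point and helper names are our own.

```python
def f(x1, x2):
    return x1**4 + x2**4

def hessian(x1, x2):
    # Hessian of f(x1, x2) = x1^4 + x2^4, computed by hand.
    return [[12 * x1**2, 0], [0, 12 * x2**2]]

def quad_form(H, y):
    # y^T H y for a 2x2 matrix H.
    return sum(y[i] * H[i][j] * y[j] for i in range(2) for j in range(2))

d = 4
x = (3, -2)  # arbitrary sample point
# Euler's identity applied twice: x^T H_f(x) x = d (d - 1) f(x).
euler_holds = quad_form(hessian(*x), x) == d * (d - 1) * f(*x)

# At x = (1, 0) the Hessian is singular: the direction y = (0, 1) is in its kernel.
degenerate_value = quad_form(hessian(1, 0), (0, 1))
```

Here `degenerate_value` is zero, so the Hessian is only positive semidefinite at $(1,0)^{T}$ even though $f$ itself is strictly convex.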

4.4.2 Optimizing over a subset of polynomial norms with $r$ -sos-convexity

In the following theorem, we give a semidefinite programming-based hierarchy for optimizing over the set of forms $f$ with $H_{f}(x)\succ 0$ , $\forall x\neq 0.$ Compared to Theorem 4.4, this theorem allows us to impose as a constraint that the $d^{th}$ root of a form be a norm, rather than simply testing whether it is. This comes at a cost however: in view of Remark 4.20 and Theorem 2.1, we are no longer considering all polynomial norms, but a subset of them whose $d^{th}$ power has a positive definite Hessian.

Theorem 4.21.

Let $f$ be a degree- $d$ form. Then $H_{f}(x)\succ 0,\forall x\neq 0$ if and only if $\exists c>0,r\in\mathbb{N}$ such that $f(x)-c(\sum_{i}x_{i}^{2})^{d/2}$ is $r$ -sos-convex. Furthermore, this condition can be imposed using semidefinite programming.

Proof.

If there exist $c>0,r\in\mathbb{N}$ such that $g(x)=f(x)-c(\sum_{i}x_{i}^{2})^{d/2}$ is $r$ -sos-convex, then $y^{T}H_{g}(x)y\geq 0$ , $\forall x,y.$ As the Hessian of $(\sum_{i}x_{i}^{2})^{d/2}$ is positive definite for any nonzero $x$ and as $c>0$ , we get $H_{f}(x)\succ 0$ , $\forall x\neq 0.$

Conversely, if $H_{f}(x)\succ 0$ , $\forall x\neq 0$ , then $y^{T}H_{f}(x)y>0$ on the bisphere (and conversely). Let

$$f_{\min}\mathrel{\mathop{:}}=\min_{(x,y)\in S^{n-1}\times S^{n-1}}y^{T}H_{f}(x)y.$$

We know that $f_{\min}$ is attained and is positive. Take $c\mathrel{\mathop{:}}=\frac{f_{\min}}{2d(d-1)}$ and consider

$$g(x)\mathrel{\mathop{:}}=f(x)-c\Big(\sum_{i}x_{i}^{2}\Big)^{d/2}.$$

Then

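$$y^{T}H_{g}(x)y=y^{T}H_{f}(x)y-c\cdot\left(d(d-2)\Big(\sum_{i}x_{i}^{2}\Big)^{d/2-2}\Big(\sum_{i}x_{i}y_{i}\Big)^{2}+d\Big(\sum_{i}x_{i}^{2}\Big)^{d/2-1}\Big(\sum_{i}y_{i}^{2}\Big)\right).$$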

Note that, by Cauchy-Schwarz, we have $(\sum_{i}x_{i}y_{i})^{2}\leq||x||^{2}||y||^{2}$ . If $||x||=||y||=1$ , we get

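$$y^{T}H_{g}(x)y\geq y^{T}H_{f}(x)y-c\,(d(d-2)+d)=y^{T}H_{f}(x)y-c\,d(d-1)\geq f_{\min}-\frac{f_{\min}}{2}>0.$$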

Hence, $H_{g}(x)\succ 0,\forall x\neq 0$ and there exists $r$ such that $g$ is $r$ -sos-convex from Theorem 4.8.
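The key estimate in this step is that the Hessian quadratic form of $(\sum_{i}x_{i}^{2})^{d/2}$ is at most $d(d-1)$ on the bisphere. The sketch below checks this numerically on random unit vectors. It assumes the closed form $d\,||x||^{d-2}I+d(d-2)\,||x||^{d-4}xx^{T}$ for that Hessian (a standard computation), and the values $d=6$ , $n=3$ are arbitrary.

```python
import math
import random

def hess_quad_form(x, y, d):
    # y^T H(x) y for h(x) = (sum_i x_i^2)^(d/2), assuming
    # H(x) = d |x|^(d-2) I + d (d-2) |x|^(d-4) x x^T.
    nx2 = sum(xi * xi for xi in x)
    ny2 = sum(yi * yi for yi in y)
    xy = sum(xi * yi for xi, yi in zip(x, y))
    return d * (d - 2) * nx2 ** (d / 2 - 2) * xy**2 + d * nx2 ** (d / 2 - 1) * ny2

def random_unit(n, rng):
    # Normalized Gaussian vector, i.e., a uniform sample on the sphere.
    v = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]

rng = random.Random(0)
d, n = 6, 3
samples = [(random_unit(n, rng), random_unit(n, rng)) for _ in range(2000)]
worst = max(hess_quad_form(x, y, d) for x, y in samples)

# Cauchy-Schwarz is tight when y = x, where the value equals d (d - 1).
x0 = random_unit(n, rng)
tight_value = hess_quad_form(x0, x0, d)
```

With $c=\frac{f_{\min}}{2d(d-1)}$ , subtracting $c$ times this quantity removes at most $\frac{f_{\min}}{2}$ from $y^{T}H_{f}(x)y$ , which is why $H_{g}$ stays positive definite.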

For fixed $r$ , the condition that there be $c>0$ such that $f(x)-c(\sum_{i}x_{i}^{2})^{d/2}$ is $r$ -sos-convex can be imposed using semidefinite programming. This is done by searching for coefficients of a polynomial $f$ and a real number $c$ such that

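$$\begin{aligned}&c\geq 0,\\&y^{T}H_{f-c(\sum_{i}x_{i}^{2})^{d/2}}(x)\,y\cdot\Big(\sum_{i}x_{i}^{2}\Big)^{r}\text{ is sos}.\end{aligned}\tag{19}$$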

Note that both of these conditions can be imposed using semidefinite programming. ∎

Remark 4.22.

Note that we are not imposing $c>0$ in the above semidefinite program. As mentioned in Section 4.3, this is because in practice the solution returned by interior point solvers will be in the interior of the feasible set.

In the special case where $f$ is completely free (i.e., when there are no additional affine conditions on the coefficients of $f$ , which is the case of our two applications in Section 5), one can take $c\geq 1$ in (19) instead of $c\geq 0$ . Indeed, if there exists $c>0$ , an integer $r$ , and a polynomial $f$ such that $f-c(\sum_{i}x_{i}^{2})^{d/2}$ is $r$ -sos-convex, then $\frac{1}{c}f$ will be a solution to (19) with $c\geq 1$ replacing $c\geq 0$ .

5 Applications

5.1 Norm approximation and regression

In this section, we study the problem of approximating a (non-polynomial) norm by a polynomial norm. We consider two different types of norms: $p$ -norms with $p$ noneven (and greater than 1) and gauge norms with a polytopic unit ball. For $p$ -norms, we use as an example $||(x_{1},x_{2})^{T}||=(|x_{1}|^{7.5}+|x_{2}|^{7.5})^{1/7.5}$ . For our polytopic gauge norm, we randomly generate an origin-symmetric polytope and produce a norm whose 1-sublevel corresponds to that polytope. This allows us to determine the value of the norm at any other point by homogeneity (see [16, Exercise 3.34] for more information on gauge norms, i.e., norms defined by convex, full-dimensional, origin-symmetric sets). To obtain our approximations, we proceed in the same way in both cases. We first sample $N=200$ points $x_{1},\ldots,x_{N}$ uniformly at random on the sphere $S^{n-1}$ . We then solve the following optimization problem with $d$ fixed:

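$$\min_{f\in H_{n,d}}\ \sum_{i=1}^{N}\left(||x_{i}||^{d}-f(x_{i})\right)^{2}\quad\text{s.t. }f\text{ sos-convex}.\tag{20}$$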

Problem (20) can be written as a semidefinite program as the objective is a convex quadratic in the coefficients of $f$ and the constraint has a semidefinite representation as discussed in Section 4.2. The solution $f$ returned is guaranteed to be convex. Moreover, any sos-convex form is sos (see [23, Lemma 8]), which implies that $f$ is nonnegative. One can numerically check to see if the optimal polynomial is in fact positive definite (for example, by checking the eigenvalues of the Gram matrix of a sum of squares decomposition of $f$ ). If that is the case, then, by Theorem 2.1, $f^{1/d}$ is a norm. Furthermore, note that we have

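$$\left(\frac{1}{N}\sum_{i=1}^{N}\left(||x_{i}||^{d}-f(x_{i})\right)^{2}\right)^{1/d}\geq\frac{1}{N}\sum_{i=1}^{N}\left(\left(||x_{i}||^{d}-f(x_{i})\right)^{2}\right)^{1/d}\geq\frac{1}{N}\sum_{i=1}^{N}\left(||x_{i}||-f^{1/d}(x_{i})\right)^{2},$$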

where the first inequality is a consequence of concavity of $z\mapsto z^{1/d}$ and the second is a consequence of the inequality $|x-y|^{1/d}\geq||x|^{1/d}-|y|^{1/d}|$ . This implies that if the optimal value of (20) is equal to $\epsilon$ , then the sum of the squared differences between $||x_{i}||$ and $f^{1/d}(x_{i})$ over the sample is less than or equal to $N\cdot(\frac{\epsilon}{N})^{1/d}$ .
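The bound $N\cdot(\frac{\epsilon}{N})^{1/d}$ can be sanity-checked numerically. In the sketch below, the lists `a` and `b` are synthetic nonnegative stand-ins for the values $||x_{i}||^{d}$ and $f(x_{i})$ (they do not come from an actual solution of (20)), and we compare the sum of squared differences of their $d^{th}$ roots with the bound.

```python
import random

def squared_root_gap(a, b, d):
    # Sum over the sample of (a_i^(1/d) - b_i^(1/d))^2.
    return sum((ai ** (1 / d) - bi ** (1 / d)) ** 2 for ai, bi in zip(a, b))

def bound_from_objective(a, b, d):
    # N * (eps / N)^(1/d), where eps is the least-squares objective value.
    N = len(a)
    eps = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return N * (eps / N) ** (1 / d)

rng = random.Random(0)
N, d = 200, 4
a = [rng.uniform(0.5, 2.0) for _ in range(N)]  # stand-ins for ||x_i||^d
b = [rng.uniform(0.5, 2.0) for _ in range(N)]  # stand-ins for f(x_i)

gap = squared_root_gap(a, b, d)
bound = bound_from_objective(a, b, d)
```

The inequality `gap <= bound` holds for any nonnegative data, which is what allows a small objective value in (20) to be translated into a guarantee on the norms themselves.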

It is worth noting that in our example, we are actually searching over the entire space of polynomial norms of a given degree. Indeed, as $f$ is bivariate, it is convex if and only if it is sos-convex [8]. In Figure 1, we have drawn the 1-level sets of the initial norm (either the $p$ -norm or the polytopic gauge norm) and the optimal polynomial norm obtained via (20) with varying degrees $d$ . Note that when $d$ increases, the approximation improves.

A similar method could be used for norm regression. In this case, we would have access to data points $x_{1},\ldots,x_{N}$ corresponding to noisy measurements of an underlying unknown norm function. We would then solve the same optimization problem as the one given in (20) to obtain a polynomial norm that most closely approximates the noisy data.
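The data-generation step used in this subsection can be sketched as follows. The polytope representation (an origin-symmetric set $\{z~|~|a_{j}^{T}z|\leq 1\ \forall j\}$ with hypothetical normals $a_{j}$ ), the helper names, and the choice $d=4$ are assumptions made for illustration; the paper does not specify how its random polytope is stored. The resulting `targets` play the role of $||x_{i}||^{d}$ in the objective of (20).

```python
import math
import random

def gauge_norm(x, normals):
    # Gauge of P = {z : |a^T z| <= 1 for all a in normals}: the smallest t >= 0
    # with x in t * P, which equals max_j |a_j^T x|.
    return max(abs(sum(ai * xi for ai, xi in zip(a, x))) for a in normals)

def p_norm(x, p=7.5):
    # The p-norm used as an example in the text.
    return sum(abs(xi) ** p for xi in x) ** (1 / p)

def sample_circle(N, rng):
    # N points drawn uniformly at random on the unit circle S^1.
    angles = [rng.uniform(0, 2 * math.pi) for _ in range(N)]
    return [(math.cos(t), math.sin(t)) for t in angles]

rng = random.Random(1)
normals = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]  # hypothetical polytope facets
points = sample_circle(200, rng)

d = 4
targets = [gauge_norm(x, normals) ** d for x in points]
```

For norm regression, the same code applies with `targets` replaced by noisy measurements raised to the power $d$ .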

5.2 Joint spectral radius and stability of linear switched systems

As a second application, we revisit a result from one of the authors and Jungers from [1, 3] on finding upper bounds on the joint spectral radius of a finite set of matrices. We first review a few notions relating to dynamical systems and linear algebra. The spectral radius $\rho$ of a matrix $A$ is defined as

$$\rho(A)=\lim_{k\rightarrow\infty}||A^{k}||^{1/k}.$$

The spectral radius happens to coincide with the eigenvalue of $A$ of largest magnitude. Consider now the discrete-time linear system $x_{k+1}=Ax_{k}$ , where $x_{k}$ is the $n\times 1$ state vector of the system at time $k$ . This system is said to be asymptotically stable if for any initial starting state $x_{0}\in\mathbb{R}^{n}$ , $x_{k}\rightarrow 0,$ when $k\rightarrow\infty.$ A well-known result connecting the spectral radius of a matrix to the stability of a linear system states that the system $x_{k+1}=Ax_{k}$ is asymptotically stable if and only if $\rho(A)<1$ .

In 1960, Rota and Strang introduced a generalization of the spectral radius to a set of matrices. The joint spectral radius (JSR) of a set of matrices $\mathcal{A}\mathrel{\mathop{:}}=\{A_{1},\ldots,A_{m}\}$ is defined as

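$$\rho(\mathcal{A})\mathrel{\mathop{:}}=\lim_{k\rightarrow\infty}\max_{\sigma\in\{1,\ldots,m\}^{k}}||A_{\sigma_{1}}\cdots A_{\sigma_{k}}||^{1/k}.$$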

Analogously to the case where we have just one matrix, the value of the joint spectral radius can be used to determine stability of a certain type of system, called a switched linear system. A switched linear system models an uncertain and time-varying linear system, i.e., a system described by the dynamics

$$x_{k+1}=A_{k}x_{k},$$

where the matrix $A_{k}$ varies at each iteration within the set $\mathcal{A}$ . As done previously, we say that a switched linear system is asymptotically stable if $x_{k}\rightarrow 0$ when $k\rightarrow\infty$ , for any starting state $x_{0}\in\mathbb{R}^{n}$ and any sequence of products of matrices in $\mathcal{A}$ . One can establish that the switched linear system $x_{k+1}=A_{k}x_{k}$ is asymptotically stable if and only if $\rho(\mathcal{A})<1$ [28].

Switched linear systems are typically used to model situations where the dynamics of a system are thought to be linear, but the matrix $A_{k}\in\mathcal{A}$ associated to the linear dynamics $x_{k+1}=A_{k}x_{k}$ is unknown and time-varying. Consider, e.g., the task of stabilizing a drone in a windy environment. By linearizing its dynamics around a desired equilibrium point, the behavior of the drone can be modeled locally by a linear dynamical system. However, as this linear dynamical system is unknown due to parameter uncertainty and modeling error, and time-varying due to the effect of the wind, the drone’s behavior is better modeled by a switched linear system.

Consequently, a natural question is whether one can efficiently test if $\rho(\mathcal{A})<1$ and hence determine if the corresponding switched linear system is asymptotically stable. Unlike the setting of linear systems, where one can decide whether the spectral radius of a matrix is less than one in polynomial time, it is not known whether the problem of testing if $\rho(\mathcal{A})<1$ is even decidable. The related question of testing whether $\rho(\mathcal{A})\leq 1$ is known to be undecidable, already when $\mathcal{A}$ contains only 2 matrices [15]. With this result in mind, it comes as no surprise that, e.g., stability of a switched linear system is not implied by all individual matrices in $\mathcal{A}$ having spectral radius less than one. This is easy to see on an example: consider the set of matrices $\mathcal{A}$ given by

[TABLE]

Observe that the spectral radii of $A_{1}$ and $A_{2}$ are zero, which is less than one. However

[TABLE]

and so $\rho(\mathcal{A})$ is lower bounded by $2>1$ , and the switched linear system is not stable.
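The phenomenon can be reproduced with a few lines of code. The pair of matrices below is a hypothetical choice with the stated properties (both nilpotent, product with spectral radius 4) and is not necessarily the pair used in the text. The sketch relies on the standard fact that $\rho(\mathcal{A})\geq\rho(A_{\sigma_{1}}\cdots A_{\sigma_{k}})^{1/k}$ for any product of length $k$ .

```python
import cmath

def matmul2(A, B):
    # Product of two 2x2 matrices given as nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_radius_2x2(A):
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + root) / 2), abs((tr - root) / 2))

# Hypothetical pair: each matrix is nilpotent, so its spectral radius is zero.
A1 = [[0, 2], [0, 0]]
A2 = [[0, 0], [2, 0]]

rho_1 = spectral_radius_2x2(A1)
rho_2 = spectral_radius_2x2(A2)

product = matmul2(A1, A2)
# A product of length k = 2 gives the lower bound rho(product)^(1/2).
jsr_lower_bound = spectral_radius_2x2(product) ** (1 / 2)
```

Alternating between the two matrices therefore produces trajectories that grow, even though each matrix on its own drives every state to zero in two steps.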

An active area of research has consequently been to obtain sufficient conditions for the JSR to be strictly less than one, which, for example, can be checked using convex optimization. The theorem that we revisit below is a result of this type. We start first by recalling a classical theorem regarding stability of a linear system.

Theorem 5.1 (see, e.g., Theorem 8.4 in [26]).

Let $A\in\mathbb{R}^{n\times n}$ . Then, $\rho(A)<1$ if and only if there exists a contracting quadratic norm; i.e., a function $V:\mathbb{R}^{n}\rightarrow\mathbb{R}$ of the form $V(x)=\sqrt{x^{T}Qx}$ with $Q\succ 0$ , such that $V(Ax)<V(x),\forall x\neq 0.$
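For the "only if" direction, one standard construction (the text does not spell one out, so this is an assumption of the sketch) takes $Q=\sum_{k\geq 0}(A^{k})^{T}A^{k}$ , which converges when $\rho(A)<1$ and satisfies $x^{T}Qx-(Ax)^{T}Q(Ax)=x^{T}x$ . The code below truncates the series for a hypothetical matrix $A$ and checks the contraction on sampled directions.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def lyapunov_matrix(A, terms=200):
    # Truncation of Q = sum_{k>=0} (A^k)^T A^k.
    n = len(A)
    Q = [[0.0] * n for _ in range(n)]
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    for _ in range(terms):
        PtP = matmul(transpose(P), P)
        Q = [[Q[i][j] + PtP[i][j] for j in range(n)] for i in range(n)]
        P = matmul(A, P)  # next power of A
    return Q

def V(Q, x):
    # Quadratic norm sqrt(x^T Q x).
    return math.sqrt(sum(xi * qx for xi, qx in zip(x, matvec(Q, x))))

A = [[0.5, 1.0], [0.0, 0.5]]  # hypothetical matrix with rho(A) = 0.5
Q = lyapunov_matrix(A)
directions = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
              for k in range(360)]
contracts = all(V(Q, matvec(A, x)) < V(Q, x) for x in directions)
euclid_contracts = all(math.hypot(*matvec(A, x)) < 1.0 for x in directions)
```

For this $A$ the Euclidean norm is not contracting (the vector $(0,1)^{T}$ is mapped to a longer one), while the quadratic norm built from $Q$ is, which illustrates why the theorem asks for some $Q\succ 0$ rather than $Q=I$ .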

The next theorem (from [1, 3]) can be viewed as an extension of Theorem 5.1 to the joint spectral radius of a finite set of matrices. It is known that the existence of a contracting quadratic norm is no longer necessary for stability in this case. This theorem shows however that the existence of a contracting polynomial norm is.

Theorem 5.2 (adapted from [1, 3], Theorem 3.2 ).

Let $\mathcal{A}\mathrel{\mathop{:}}=\{A_{1},\ldots,A_{m}\}$ be a family of $n\times n$ matrices. Then, $\rho(A_{1},\ldots,A_{m})<1$ if and only if there exists a contracting polynomial norm; i.e., a function $V(x)=f^{1/d}(x)$ , where $f$ is an n-variate sos-convex and positive definite form of degree $d$ , such that $V(A_{i}x)<V(x),~{}\forall x\neq 0$ and $\forall i=1,\ldots,m.$

We remark that in [2], the authors show that the degree of $f$ cannot be bounded as a function of $m$ and $n$ . This is expected from the undecidability result mentioned before.

Example 5.3.

We consider a modification of Example 5.4. in [4] as an illustration of the previous theorem. We would like to show that the joint spectral radius of the two matrices

[TABLE]

is strictly less than one.

To do this, we search for a nonzero form $f$ of degree $d$ such that

[TABLE]

If problem (23) is feasible for some $d$ , then $\rho(A_{1},A_{2})<1$ . A quick computation using the software package YALMIP [32] and the SDP solver MOSEK [9] reveals that, when $d=2$ or $d=4$ , problem (23) is infeasible. When $d=6$ however, the problem is feasible and we obtain a polynomial norm $V=f^{1/d}$ whose 1-sublevel set is the outer set plotted in Figure 2. We also plot in Figure 2 the images of this 1-sublevel set under $A_{1}$ and $A_{2}$ . Note that both sets are included in the 1-sublevel set of $V$ as expected. From Theorem 5.2, the existence of a polynomial norm implies that $\rho(A_{1},A_{2})<1$ and hence, the pair $\{A_{1},A_{2}\}$ is asymptotically stable.

Remark 5.4.

As mentioned previously, problem (23) is infeasible for $d=4$ . Instead of pushing the degree of $f$ up to 6, one could wonder whether the problem would have been feasible if we had asked that $f$ of degree $d=4$ be $r$ -sos-convex for some fixed $r\geq 1$ . As mentioned before, in the particular case where $n=2$ (which is the case at hand here), the notions of convexity and sos-convexity coincide; see [8]. As a consequence, one can only hope to make problem (23) feasible by increasing the degree of $f$ .

6 Future directions

In this paper, we provided semidefinite programming-based hierarchies for certifying that the $d^{th}$ root of a given degree- $d$ form is a polynomial norm (Section 4.3), and for optimizing over the set of forms with positive definite Hessians (Section 4.4). A clear gap emerged between forms which are strictly convex and those which have a positive definite Hessian, the latter being a sufficient (but not necessary) condition for the former. This leads us to consider the following two open problems.

Open Problem 1.

Does there exist a family of cones $K^{r}_{n,2d}$ that have the following two properties: (i) for each $r$ , optimization of a linear function over $K_{n,2d}^{r}$ can be carried out with semidefinite programming, and (ii) every strictly convex form $f$ in $n$ variables and degree $2d$ belongs to $K_{n,2d}^{r}$ for some $r$ ? We have shown a weaker result, namely the existence of a family of cones that verify (i) and a modified version of (ii), where strictly convex forms are replaced by forms with a positive definite Hessian.

Open Problem 2.

Helton and Nie have shown in [23] that one can optimize a linear function over sublevel sets of forms that have positive definite Hessians with semidefinite programming. Is the same statement true for sublevel sets of all polynomial norms?

On the application side, it might be interesting to investigate how one can use polynomial norms to design regularizers in machine learning applications. Indeed, a very popular use of norms in optimization is as regularizers, with the goal of imposing additional structure (e.g., sparsity or low rank) on optimal solutions. One could imagine using polynomial norms to design regularizers that are based on the data at hand in place of more generic regularizers such as the 1-norm. Regularizer design is a problem that has already been considered (see, e.g., [11, 17]) but not using polynomial norms. This can be worth exploring as we have shown that polynomial norms can approximate any norm with arbitrary accuracy, while remaining differentiable everywhere (except at the origin), which can be beneficial for optimization purposes.

Acknowledgement

The authors would like to thank an anonymous referee for suggesting the proof of the first part of Theorem 3.1, which improves our previous statement by quantifying the quality of the approximation as a function of $n$ and $d$ , and two other anonymous referees for constructive comments that considerably helped improve the draft.

Bibliography

[1] A. A. Ahmadi and R. M. Jungers, SOS-convex Lyapunov functions with applications to nonlinear switched systems, Proceedings of the IEEE Conference on Decision and Control, 2013.
[2] A. A. Ahmadi and R. M. Jungers, Lower bounds on complexity of Lyapunov functions for switched linear systems, Nonlinear Analysis: Hybrid Systems 21 (2016), 118–129.
[3] A. A. Ahmadi and R. M. Jungers, SOS-convex Lyapunov functions and stability of nonlinear difference inclusions, (2018), In preparation.
[4] A. A. Ahmadi, R. M. Jungers, P. A. Parrilo, and M. Roozbehani, Analysis of the joint spectral radius via Lyapunov functions on path-complete graphs, Proceedings of the 14th international conference on Hybrid systems: computation and control, ACM, 2011, pp. 13–22.
[5] A. A. Ahmadi and A. Majumdar, DSOS and SDSOS optimization: more tractable alternatives to sum of squares and semidefinite optimization, Available at arXiv:1706.02586 (2017).
[6] A. A. Ahmadi, A. Olshevsky, P. A. Parrilo, and J. N. Tsitsiklis, NP-hardness of deciding convexity of quartic polynomials and related problems, Mathematical Programming 137 (2013), no. 1-2, 453–476.
[7] A. A. Ahmadi and P. A. Parrilo, A convex polynomial that is not sos-convex, Mathematical Programming 135 (2012), no. 1-2, 275–292.
[8] A. A. Ahmadi and P. A. Parrilo, A complete characterization of the gap between convexity and sos-convexity, SIAM Journal on Optimization 23 (2013), no. 2, 811–833, Also available at arXiv:1111.4587.
