A Linear-algebraic Proof of Hilbert's Ternary Quartic Theorem

Anatolii Grinshpan; Hugo J. Woerdeman

arXiv:1905.04751 · math.AG · May 14, 2019 · Am. Math. Mon.


TL;DR

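This paper presents a linear-algebraic proof of the qualitative part of Hilbert's ternary quartic theorem: every nonnegative degree 4 homogeneous polynomial in three variables is a sum of squares of quadratic forms. The key step is that a structured cone of positive semidefinite matrices is generated by rank 1 matrices. The sharper bound of three squares is not recovered by this method.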

Contribution

It introduces a linear-algebraic approach to Hilbert's theorem by showing that a structured cone of 6×6 positive semidefinite matrices, cut out by six linear constraints, is generated by rank 1 elements, which yields a new proof technique.

Findings

01

A structured cone of positive semidefinite matrices is generated by rank 1 elements.

02

Provides a linear-algebraic proof of the qualitative (sum-of-squares) part of Hilbert's ternary quartic theorem.

03

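Shows that every nonnegative degree 4 homogeneous polynomial in three variables is a sum of squares of quadratic forms; a bound of four squares follows from a lemma of Barvinok, while the three-square bound is not reproved.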

Abstract

Hilbert's ternary quartic theorem states that every nonnegative degree 4 homogeneous polynomial in three variables can be written as a sum of three squares of homogeneous quadratic polynomials. We give a linear-algebraic approach to Hilbert's theorem by showing that a structured cone of positive semidefinite matrices is generated by rank 1 elements.



Full text


1 Introduction.

A homogeneous polynomial

$$p(x,y,z)=\sum_{i+j+k=4}p_{ijk}\,x^{i}y^{j}z^{k}$$

in three variables of degree 4 is called a ternary quartic. Hilbert’s classical theorem [7], dating back to 1888, states that every ternary quartic that takes only nonnegative values, i.e., such that $p(x,y,z)\geq 0$ for all $x,y,z\in{\mathbb{R}},$ can be written as a sum of three squares of homogeneous quadratic polynomials. This theorem was a precursor of Hilbert’s 17th problem and of subsequent developments, and it still attracts considerable attention. Detailed expositions can be found in [13] and [14].

One distinguishes two parts of Hilbert’s theorem: the existence of a representation as a sum of squares (the qualitative part) and the assertion that at most three squares suffice (the quantitative part). Hilbert’s original proof, cast in modern form, is rooted in advanced topology and algebraic geometry. Many attempts have been made to find more elementary proofs. In 1977, Choi and Lam [6] gave an elementary proof of the qualitative part, based on properties of extremal positive semidefinite forms. In 2004, Pfister [9] gave a different, constructive elementary proof. New approaches to Hilbert’s theorem were developed in [10] and [12], but no simple elementary explanation of the quantitative part has been found. Very recently, Hilbert’s theorem has been considered from a new general perspective, in the framework of nonnegative quadratic forms on projective real varieties [3, 4].

In this note we would like to offer a new elementary proof of the qualitative part of Hilbert’s theorem. Our approach uses linear algebra and convex geometry.

2 The ${\rm PSD}_{6}$ cone and Hilbert’s theorem.

We begin with a few preliminary facts. As general sources, we refer the reader to [8] for background on matrix theory and to [1, 2, 5, 15] for background on convex geometry.

Let $S_{n}$ be the vector space of $n\times n$ real symmetric matrices $A=A^{\top}$ (the superscript $\top$ denotes the transpose). The dimension of $S_{n}$ is $n(n+1)/2$ . The scalar product of two symmetric matrices $A=(a_{ij}),B=(b_{ij})$ in $S_{n}$ is defined by

$$\langle A,B\rangle=\sum_{i,j=1}^{n}a_{ij}b_{ij}={\rm tr}(AB),$$

where tr, the trace, is the sum of diagonal elements of a matrix. Equipped with the scalar product, $S_{n}$ becomes a Euclidean space. Every hyperplane in $S_{n}$ is of the form

$$\{X\in S_{n}:\ {\rm tr}(XC)=h\},$$

where $C\in S_{n}$ is a nonzero matrix and $h\in{\mathbb{R}}$ . Two subsets of $S_{n}$ are said to be strictly separated by the hyperplane ${\rm tr}(XC)=h$ if one is contained in the open half-space $\{X:{\rm tr}(XC)<h\}$ and the other in the open half-space $\{X:{\rm tr}(XC)>h\}$ . Two disjoint closed convex sets in a Euclidean space can be strictly separated by a hyperplane if their vector difference is closed [2, Proposition 1.5.3].

A matrix $A\in S_{n}$ is said to be positive semidefinite if $\langle Ax,\ x\rangle\geq 0$ for all vectors $x\in\mathbb{R}^{n}$ . Equivalently, $A$ is positive semidefinite if there is a matrix $B$ such that $A=BB^{\top}$ . In particular, if $A$ is of rank $k$ , then $B$ can be chosen of size $n\times k$ .

The set ${\rm PSD}_{n}$ of all $n\times n$ positive semidefinite matrices is closed under addition and nonnegative scaling. Such a set is called a cone or, more precisely, a convex cone. The cone ${\rm PSD}_{n}$ is pointed (i.e., contains no lines) and closed.

The ray generated by a nonzero element of a cone consists of all nonnegative multiples of that element. A ray of a cone is called extreme if it cannot be expressed as a nonnegative linear combination of other rays. Minkowski’s theorem for cones asserts that every element of a closed pointed (convex) cone is a nonnegative linear combination of generators of its extreme rays [1, Sections II.3 and II.8]. In particular, Minkowski’s theorem applies to ${\rm PSD}_{n}$.

For every $X\in S_{n}$, the condition ${\rm tr}(XY)\geq 0$ for all $Y\in{\rm PSD}_{n}$ is equivalent to $X\in{\rm PSD}_{n}$. This is known as the self-duality of the ${\rm PSD}_{n}$ cone [5, Section 2.6.1].

The connection to Hilbert’s theorem can now be explained. A polynomial $p(x,y,z)$ is a sum of squares of homogeneous quadratic polynomials if and only if it can be represented in the form

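$$p(x,y,z)=\begin{bmatrix}x^{2}&xy&xz&y^{2}&yz&z^{2}\end{bmatrix}A\begin{bmatrix}x^{2}\\ xy\\ xz\\ y^{2}\\ yz\\ z^{2}\end{bmatrix},\tag{2.1}$$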

where $A\in{\rm PSD}_{6}$ . Indeed, if $A=\sum_{i=1}^{k}a_{i}a_{i}^{\top}$ , where $a_{i}\in{\mathbb{R}}^{6}$ , then (2.1) turns into a desired sum-of-squares representation:

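$$p(x,y,z)=\sum_{i=1}^{k}(v^{\top}a_{i})^{2},\qquad v^{\top}=\begin{bmatrix}x^{2}&xy&xz&y^{2}&yz&z^{2}\end{bmatrix}.$$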

In fact, representation (2.1) is easy to obtain if we merely require $A$ to be symmetric and thus drop the positive semidefiniteness condition. One such choice is given by

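$$A_{0}=\frac{1}{2}\begin{bmatrix}2p_{400}&p_{310}&p_{301}&0&p_{211}&0\\ p_{310}&2p_{220}&0&p_{130}&p_{121}&0\\ p_{301}&0&2p_{202}&0&p_{112}&p_{103}\\ 0&p_{130}&0&2p_{040}&p_{031}&0\\ p_{211}&p_{121}&p_{112}&p_{031}&2p_{022}&p_{013}\\ 0&0&p_{103}&0&p_{013}&2p_{004}\end{bmatrix}.$$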
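As a numerical sanity check of this choice (a sketch of ours, not from the paper), the pure-Python snippet below assembles $A_0$ entry by entry from the coefficients $p_{ijk}$ and confirms that $v^{\top}A_0v=p(x,y,z)$ at random points for a quartic with random coefficients. The helper names `monomial_vector`, `a0_matrix`, `quad_form`, and `eval_p` are hypothetical; indices are 0-based.

```python
import random

def monomial_vector(x, y, z):
    """The vector of quadratic monomials (x^2, xy, xz, y^2, yz, z^2)."""
    return [x * x, x * y, x * z, y * y, y * z, z * z]

def a0_matrix(p):
    """Build the symmetric 6x6 matrix A_0 from coefficients p[(i, j, k)] of x^i y^j z^k."""
    def coef(i, j, k):
        return p.get((i, j, k), 0.0)

    rows = [
        [2 * coef(4, 0, 0), coef(3, 1, 0), coef(3, 0, 1), 0.0, coef(2, 1, 1), 0.0],
        [coef(3, 1, 0), 2 * coef(2, 2, 0), 0.0, coef(1, 3, 0), coef(1, 2, 1), 0.0],
        [coef(3, 0, 1), 0.0, 2 * coef(2, 0, 2), 0.0, coef(1, 1, 2), coef(1, 0, 3)],
        [0.0, coef(1, 3, 0), 0.0, 2 * coef(0, 4, 0), coef(0, 3, 1), 0.0],
        [coef(2, 1, 1), coef(1, 2, 1), coef(1, 1, 2), coef(0, 3, 1), 2 * coef(0, 2, 2), coef(0, 1, 3)],
        [0.0, 0.0, coef(1, 0, 3), 0.0, coef(0, 1, 3), 2 * coef(0, 0, 4)],
    ]
    return [[0.5 * entry for entry in row] for row in rows]

def quad_form(A, w):
    """Return w^T A w."""
    n = len(w)
    return sum(w[r] * A[r][s] * w[s] for r in range(n) for s in range(n))

def eval_p(p, x, y, z):
    """Evaluate the quartic: sum of p_{ijk} x^i y^j z^k."""
    return sum(c * x**i * y**j * z**k for (i, j, k), c in p.items())

rng = random.Random(0)
# Random coefficients for all 15 exponent triples (i, j, k) with i + j + k = 4.
p = {(i, j, 4 - i - j): rng.uniform(-1, 1) for i in range(5) for j in range(5 - i)}
A0 = a0_matrix(p)
for _ in range(5):
    x, y, z = (rng.uniform(-2, 2) for _ in range(3))
    assert abs(quad_form(A0, monomial_vector(x, y, z)) - eval_p(p, x, y, z)) < 1e-9
```

The check passes for any coefficients, since $v^{\top}A_0v=p$ is an algebraic identity; nonnegativity of $p$ plays no role at this stage.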

Moreover, if ${\mathcal{W}}$ is the subspace of $S_{6}$ consisting of the matrices

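$$\begin{bmatrix}0&0&0&w_{1}&w_{2}&w_{3}\\ 0&-2w_{1}&-w_{2}&0&w_{4}&w_{5}\\ 0&-w_{2}&-2w_{3}&-w_{4}&-w_{5}&0\\ w_{1}&0&-w_{4}&0&0&w_{6}\\ w_{2}&w_{4}&-w_{5}&0&-2w_{6}&0\\ w_{3}&w_{5}&0&w_{6}&0&0\end{bmatrix},\qquad w_{1},\dots,w_{6}\in\mathbb{R},$$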

then (2.1) holds if and only if $A\in A_{0}+{\mathcal{W}}$ . Thus it suffices to show that

$$(A_{0}+\mathcal{W})\cap{\rm PSD}_{6}\neq\emptyset.\tag{2.2}$$

It turns out that condition (2.2) holds if and only if there is no hyperplane strictly separating the convex sets $A_{0}+{\mathcal{W}}$ and ${\rm PSD}_{6}$ . The two possible scenarios are illustrated in Figure 1.

A hyperplane $\{X\in S_{6}:\ {\rm tr}(XC)=h\}$ can be disjoint from $A_{0}+{\mathcal{W}}$ only if $C$ belongs to ${\mathcal{W}}^{\perp}$ , the orthogonal complement of $\mathcal{W}$ in $S_{6}$ . Note that the subspace ${\mathcal{W}}^{\perp}$ consists of all real symmetric matrices $(a_{ij})_{i,j=1}^{6}$ such that

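$$a_{14}=a_{22},\quad a_{16}=a_{33},\quad a_{15}=a_{23},\quad a_{25}=a_{34},\quad a_{26}=a_{35},\quad a_{46}=a_{55}.\tag{2.3}$$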
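The description of $\mathcal{W}^{\perp}$ by (2.3) can be checked mechanically. In the sketch below (hypothetical helper names, 0-based indices), each generator $W_j$ of $\mathcal{W}$ is the matrix obtained by setting $w_j=1$ and the other parameters to $0$. For symmetric $X$ one has ${\rm tr}(XW_j)=2(x_{rs}-x_{r's'})$ for the pair of entries that (2.3) equates, so $X\perp\mathcal{W}$ exactly when (2.3) holds. The snippet also confirms that every $vv^{\top}$ satisfies (2.3), which is the identity $v^{\top}W_jv=0$.

```python
import random

# Generators of W (upper-triangular entries, 0-based) from setting one w_j = 1,
# each paired with the two entries of X that the corresponding condition of (2.3) equates.
GENERATORS = [
    ({(0, 3): 1.0, (1, 1): -2.0}, ((0, 3), (1, 1))),  # w1: a14 = a22
    ({(0, 4): 1.0, (1, 2): -1.0}, ((0, 4), (1, 2))),  # w2: a15 = a23
    ({(0, 5): 1.0, (2, 2): -2.0}, ((0, 5), (2, 2))),  # w3: a16 = a33
    ({(1, 4): 1.0, (2, 3): -1.0}, ((1, 4), (2, 3))),  # w4: a25 = a34
    ({(1, 5): 1.0, (2, 4): -1.0}, ((1, 5), (2, 4))),  # w5: a26 = a35
    ({(3, 5): 1.0, (4, 4): -2.0}, ((3, 5), (4, 4))),  # w6: a46 = a55
]

def symmetric(entries, n=6):
    """Build a symmetric n x n matrix from its upper-triangular entries."""
    M = [[0.0] * n for _ in range(n)]
    for (r, s), val in entries.items():
        M[r][s] = M[s][r] = val
    return M

def trace_product(X, Y):
    """Return tr(XY)."""
    n = len(X)
    return sum(X[r][s] * Y[s][r] for r in range(n) for s in range(n))

def monomial_vector(x, y, z):
    return [x * x, x * y, x * z, y * y, y * z, z * z]

def outer(w):
    return [[wr * ws for ws in w] for wr in w]

rng = random.Random(1)
X = symmetric({(r, s): rng.uniform(-1, 1) for r in range(6) for s in range(r, 6)})
V = outer(monomial_vector(0.3, -1.2, 2.0))
for entries, ((r1, s1), (r2, s2)) in GENERATORS:
    W = symmetric(entries)
    # tr(XW) vanishes exactly when the two paired entries of X coincide.
    assert abs(trace_product(X, W) - 2 * (X[r1][s1] - X[r2][s2])) < 1e-12
    # The rank 1 matrix v v^T satisfies the corresponding condition of (2.3).
    assert abs(V[r1][s1] - V[r2][s2]) < 1e-12
```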

Since ${\mathcal{W}}^{\perp}$ contains a positive definite matrix, the intersection of ${\rm PSD}_{6}$ and ${\mathcal{W}}$ contains only the zero matrix. By [2, Proposition 1.4.14] the vector difference ${\rm PSD}_{6}-{\mathcal{W}}$ is closed; consequently, if (2.2) fails, so that ${\rm PSD}_{6}$ and $A_{0}+{\mathcal{W}}$ are disjoint, there is a hyperplane strictly separating them [2, Proposition 1.5.3]. Now if ${\rm PSD}_{6}$ is contained in the closed half-space

$$\{X\in S_{6}:\ {\rm tr}(XC)\geq h\},$$

then $h\leq 0$ and $C\in{\rm PSD}_{6}$ , by the self-duality of ${\rm PSD}_{6}$ . Therefore (2.2) holds if and only if

$$C\in{\rm PSD}_{6}\cap\mathcal{W}^{\perp}\ \text{ implies }\ {\rm tr}(A_{0}C)\geq 0.\tag{2.4}$$

The key to proving implication (2.4) is our main result, which we now state.

Theorem 1

Let $\ \mathcal{C}$ be the cone of positive semidefinite matrices in $S_{6}$ satisfying (2.3). Then every extreme ray of $\ \mathcal{C}$ is generated by a rank 1 matrix $vv^{\top}$ , where

$$v^{\top}=v(x,y,z)^{\top}:=\begin{bmatrix}x^{2}&xy&xz&y^{2}&yz&z^{2}\end{bmatrix},\tag{2.5}$$

for some $x,\ y,\ z\in\mathbb{R}$ . Thus every element of $\ {\mathcal{C}}$ is a nonnegative linear combination of matrices $vv^{\top}$ .

The second assertion of Theorem 1 follows from the first by Minkowski’s theorem, since $\mathcal{C}$ is a closed pointed cone. Note that if $p(x,y,z)$ takes only nonnegative values, we obtain that

$${\rm tr}(A_{0}vv^{\top})={\rm tr}(v^{\top}A_{0}v)=p(x,y,z)\geq 0.$$

By Theorem 1, each element $C$ of the cone ${\mathcal{C}}={\rm PSD}_{6}\cap{\mathcal{W}}^{\perp}$ is of the form $C=\sum_{i}\rho_{i}v_{i}v_{i}^{\top}$ with $\rho_{i}\geq 0$ and $v_{i}=v(x_{i},y_{i},z_{i})$ , and thus

$${\rm tr}(A_{0}C)=\sum_{i}\rho_{i}\,p(x_{i},y_{i},z_{i})\geq 0.$$

Consequently, Theorem 1 proves that a ternary quartic that takes only nonnegative values is a sum of squares.

3 Proof of the Theorem.

We now prove Theorem 1. The argument hinges on the following lemma.

Lemma 2

Let $u,v,w,y\in{\mathbb{R}}^{2}$ be such that $\langle u,v\rangle=\langle w,y\rangle$ . Then there exists a rotation $R$ with the property that

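$$R\begin{bmatrix}u&v&w&y\end{bmatrix}=\begin{bmatrix}u_{1}&v_{1}&w_{1}&y_{1}\\ u_{2}&v_{2}&w_{2}&y_{2}\end{bmatrix}$$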

satisfies $u_{1}v_{1}=w_{1}y_{1}$ and $u_{2}v_{2}=w_{2}y_{2}$ .

We alert the reader that the subscripts in Lemma 2 are used to indicate the components of the rotated, not original, vectors. The same convention applies further below in the proof of Theorem 1.

Proof. Though the statement is about vectors $u,v,w,y\in{\mathbb{R}}^{2}$, it is convenient to treat them as complex numbers $\alpha,\beta,\gamma,\delta$,

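$$\begin{bmatrix}\alpha&\beta&\gamma&\delta\end{bmatrix}=\begin{bmatrix}1&i\end{bmatrix}\begin{bmatrix}u&v&w&y\end{bmatrix}.$$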

The task amounts to choosing an angle of rotation $\theta$ so that the rotated complex numbers satisfy

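$$\begin{aligned}{\rm Re}(e^{i\theta}\alpha)\,{\rm Re}(e^{i\theta}\beta)-{\rm Re}(e^{i\theta}\gamma)\,{\rm Re}(e^{i\theta}\delta)&=u_{1}v_{1}-w_{1}y_{1}=0,\\ {\rm Im}(e^{i\theta}\alpha)\,{\rm Im}(e^{i\theta}\beta)-{\rm Im}(e^{i\theta}\gamma)\,{\rm Im}(e^{i\theta}\delta)&=u_{2}v_{2}-w_{2}y_{2}=0.\end{aligned}\tag{3.1}$$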

But since the scalar products $\langle u,v\rangle,\ \langle w,y\rangle$ are rotation invariant, i.e.,

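$$\begin{aligned}{\rm Re}(e^{i\theta}\alpha)\,{\rm Re}(e^{i\theta}\beta)+{\rm Im}(e^{i\theta}\alpha)\,{\rm Im}(e^{i\theta}\beta)&={\rm Re}\big(e^{i\theta}\alpha\,\overline{e^{i\theta}\beta}\big)={\rm Re}(\alpha\overline{\beta}),\\ {\rm Re}(e^{i\theta}\gamma)\,{\rm Re}(e^{i\theta}\delta)+{\rm Im}(e^{i\theta}\gamma)\,{\rm Im}(e^{i\theta}\delta)&={\rm Re}\big(e^{i\theta}\gamma\,\overline{e^{i\theta}\delta}\big)={\rm Re}(\gamma\overline{\delta}),\end{aligned}$$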

the assumption $\langle u,v\rangle=\langle w,y\rangle$ means that the sum of the two numbers

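$${\rm Re}(e^{i\theta}\alpha)\,{\rm Re}(e^{i\theta}\beta)-{\rm Re}(e^{i\theta}\gamma)\,{\rm Re}(e^{i\theta}\delta)\quad\text{and}\quad{\rm Im}(e^{i\theta}\alpha)\,{\rm Im}(e^{i\theta}\beta)-{\rm Im}(e^{i\theta}\gamma)\,{\rm Im}(e^{i\theta}\delta)$$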

is zero. Thus, to obtain (3.1) it suffices to choose $\theta$ so that the difference of these numbers is also zero. Since ${\rm Re}(s)\,{\rm Re}(t)-{\rm Im}(s)\,{\rm Im}(t)={\rm Re}(st)$ for complex $s,t$, this reduces to

$${\rm Re}\big(e^{2i\theta}(\alpha\beta-\gamma\delta)\big)=0.$$

The latter is clearly possible: it suffices to choose $\theta$ so that $e^{2i\theta}(\alpha\beta-\gamma\delta)$ is purely imaginary. This establishes Lemma 2. $\Box$
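The proof of Lemma 2 is constructive: writing $\kappa=\alpha\beta-\gamma\delta=|\kappa|e^{i\varphi}$, any $\theta$ with $2\theta+\varphi\equiv\pi/2\pmod{\pi}$ works. A minimal Python sketch of this recipe follows; the function names `rotate` and `lemma2_angle` and the example vectors are our own, and rotation by $\theta$ is implemented as multiplication of $u_1+iu_2$ by $e^{i\theta}$.

```python
import cmath
import math

def rotate(vec, theta):
    """Rotate a vector in R^2 counterclockwise by theta (multiply u1 + i*u2 by e^{i theta})."""
    w = complex(vec[0], vec[1]) * cmath.exp(1j * theta)
    return (w.real, w.imag)

def lemma2_angle(u, v, w, y):
    """Return theta with Re(e^{2 i theta} (alpha*beta - gamma*delta)) = 0, as in the proof."""
    alpha, beta, gamma, delta = (complex(t[0], t[1]) for t in (u, v, w, y))
    kappa = alpha * beta - gamma * delta
    if kappa == 0:
        return 0.0  # every rotation works
    # e^{2 i theta} * kappa is purely imaginary when 2*theta + arg(kappa) = pi/2.
    return (math.pi / 2 - cmath.phase(kappa)) / 2

# Example with <u, v> = <w, y> = 1.
u, v, w, y = (1.0, 2.0), (3.0, -1.0), (2.0, 1.0), (0.0, 1.0)
theta = lemma2_angle(u, v, w, y)
ru, rv, rw, ry = (rotate(t, theta) for t in (u, v, w, y))
print(ru[0] * rv[0] - rw[0] * ry[0], ru[1] * rv[1] - rw[1] * ry[1])  # both zero up to rounding
```

Because $\theta$ is determined only modulo $\pi/2$, the four admissible rotations differ by quarter turns, which swap or negate the two coordinates and preserve both product conditions.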

Proof of Theorem 1. Let us start by observing that a rank 1 element in the cone ${\mathcal{C}}$ is necessarily of the form $vv^{\top}$ , with $v$ as in (2.5). Indeed, if $A=(a_{ij})_{i,j=1}^{6}\in{\mathcal{C}}$ is of rank 1, then its diagonal entries are nonnegative, and we can introduce

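$$x=\sqrt[4]{a_{11}},\qquad y={\rm sign}(a_{12})\sqrt[4]{a_{44}},\qquad z={\rm sign}(a_{13})\sqrt[4]{a_{66}}.$$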

As $A$ is of rank 1 (and symmetric), we obtain that $a_{14}^{2}=a_{11}a_{44}=x^{4}y^{4}$ . Since $a_{14}=a_{22}\geq 0$ , the equalities $a_{22}=a_{14}=x^{2}y^{2}$ follow. Similarly, $a_{33}=a_{16}=x^{2}z^{2}$ and $a_{55}=a_{46}=y^{2}z^{2}$ . Next, $a_{12}^{2}=a_{11}a_{22}=x^{6}y^{2}$ , and since $x\geq 0$ and ${\rm sign}(a_{12})={\rm sign}(y)$ , we have $a_{12}=x^{3}y$ . Continuing in this way, we obtain expressions for all entries of $A$ in terms of $x,y,$ and $z$ , and $A=vv^{\top}$ follows.
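The recovery of $x,y,z$ from a rank 1 element can be sketched numerically. The snippet below (hypothetical helper names; 0-based indices, so `A[0][1]` is $a_{12}$) takes $A=vv^{\top}$, reads off $x,y,z$ via the fourth roots and signs above, and rebuilds $vv^{\top}$. It assumes the convention ${\rm sign}(0)=0$ and uses an example with $x\neq 0$; the recovered triple may differ from the original by an overall sign, which leaves $v$ unchanged.

```python
def monomial_vector(x, y, z):
    return [x * x, x * y, x * z, y * y, y * z, z * z]

def outer(w):
    return [[wr * ws for ws in w] for wr in w]

def sign(t):
    """Sign function with the (assumed) convention sign(0) = 0."""
    return (t > 0) - (t < 0)

def recover_xyz(A):
    """Read off (x, y, z) from a rank 1 matrix A = v v^T as in the proof."""
    x = A[0][0] ** 0.25                   # x = a11^(1/4) >= 0
    y = sign(A[0][1]) * A[3][3] ** 0.25   # y = sign(a12) * a44^(1/4)
    z = sign(A[0][2]) * A[5][5] ** 0.25   # z = sign(a13) * a66^(1/4)
    return x, y, z

A = outer(monomial_vector(-1.5, 2.0, 0.5))
print(recover_xyz(A))  # approximately (1.5, -2.0, -0.5), i.e. -(x, y, z)
```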

Prior to analyzing the rank 2 elements of ${\mathcal{C}}$ , let us make some useful observations. As before, let $S_{k}$ denote the vector space of $k\times k$ real symmetric matrices equipped with the scalar product $\langle A,B\rangle={\rm tr}(AB)$ . Writing a rank $k$ element $A$ of $\mathcal{C}$ as

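$$A=\begin{bmatrix}a^{\top}\\ b^{\top}\\ c^{\top}\\ d^{\top}\\ e^{\top}\\ f^{\top}\end{bmatrix}\begin{bmatrix}a&b&c&d&e&f\end{bmatrix},\qquad a,b,c,d,e,f\in\mathbb{R}^{k},\tag{3.2}$$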

we have that the linear constraints (2.3) are equivalent to the conditions

$$E_{i}\in({\rm span}\{I_{k}\})^{\perp},\qquad i=1,2,3,4,5,6,\tag{3.3}$$

where $I_{k}$ is the $k\times k$ identity matrix and

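$$\begin{aligned}E_{1}&=ad^{\top}+da^{\top}-2bb^{\top}, & E_{2}&=af^{\top}+fa^{\top}-2cc^{\top},\\ E_{3}&=bc^{\top}+cb^{\top}-ea^{\top}-ae^{\top}, & E_{4}&=be^{\top}+eb^{\top}-cd^{\top}-dc^{\top},\\ E_{5}&=ce^{\top}+ec^{\top}-bf^{\top}-fb^{\top}, & E_{6}&=df^{\top}+fd^{\top}-2ee^{\top}\end{aligned}$$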

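are matrices in $S_{k}$. When $k\geq 4$, there exists a nonzero $F\in S_{k}$ orthogonal to the seven matrices $I_{k},E_{1},\ldots,E_{6}$, as the dimension of $S_{k}$ is $k(k+1)/2\geq 10>7$. Consequently, for sufficiently small $\varepsilon>0$, $A$ is the average of the two distinct matrices

$$\begin{bmatrix}a^{\top}\\ b^{\top}\\ c^{\top}\\ d^{\top}\\ e^{\top}\\ f^{\top}\end{bmatrix}(I_{k}\pm\varepsilon F)\begin{bmatrix}a&b&c&d&e&f\end{bmatrix}\in\mathcal{C}$$

and does not generate an extreme ray.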
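Spelled out, with $M=\begin{bmatrix}a&b&c&d&e&f\end{bmatrix}\in\mathbb{R}^{k\times 6}$ (notation introduced here only for this computation) and $0<\varepsilon\leq 1/\|F\|$ in the spectral norm, the averaging step reads

$$A=M^{\top}M=\tfrac{1}{2}\Big(M^{\top}(I_{k}+\varepsilon F)M+M^{\top}(I_{k}-\varepsilon F)M\Big),\qquad {\rm tr}\big((I_{k}\pm\varepsilon F)E_{i}\big)={\rm tr}(E_{i})\pm\varepsilon\,{\rm tr}(FE_{i})=0,\quad i=1,\ldots,6.$$

By the same computation that turns (2.3) into (3.3), both summands are positive semidefinite and satisfy (2.3), so they lie in $\mathcal{C}$. Since $M$ has rank $k$, $M^{\top}GM$ is a multiple of $A$ only when $G$ is a multiple of $I_{k}$, and $I_{k}\pm\varepsilon F$ is not, because $F\neq 0$ and ${\rm tr}(F)=\langle F,I_{k}\rangle=0$.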

Now consider the rank 2 case. Let $A$ be as in (3.2) with $k=2$ . Condition (3.3) means that

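$$\langle a,d\rangle=\langle b,b\rangle,\quad \langle a,f\rangle=\langle c,c\rangle,\quad \langle b,c\rangle=\langle a,e\rangle,\quad \langle b,e\rangle=\langle c,d\rangle,\quad \langle c,e\rangle=\langle b,f\rangle,\quad \langle d,f\rangle=\langle e,e\rangle.$$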

Let us assume that no two of the vectors $a,b,c,d,e,f$ are multiples of each other. As $\langle b,c\rangle=\langle a,e\rangle$, by Lemma 2 there exists a rotation $R$ so that

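$$R\begin{bmatrix}a&e&b&c\end{bmatrix}=\begin{bmatrix}a_{1}&e_{1}&b_{1}&c_{1}\\ a_{2}&e_{2}&b_{2}&c_{2}\end{bmatrix}$$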

satisfies $a_{1}e_{1}=b_{1}c_{1}$ and $a_{2}e_{2}=b_{2}c_{2}$ . Now write

$$R\begin{bmatrix}d&f\end{bmatrix}=\begin{bmatrix}d_{1}&f_{1}\\ d_{2}&f_{2}\end{bmatrix}.$$

Assuming that $a_{1},\ a_{2}\neq 0,$ let $\tilde{d}=\begin{bmatrix}\frac{b_{1}^{2}}{a_{1}}&\frac{b_{2}^{2}}{a_{2}}\end{bmatrix}^{\top}\!\!\!,$ so that $\langle b,e\rangle=\langle c,\tilde{d}\rangle$ and $\langle a,\tilde{d}\rangle=\langle b,b\rangle.$ This yields

$$\langle a,d-\tilde{d}\rangle=0=\langle c,d-\tilde{d}\rangle,$$

and as $a$ and $c$ are linearly independent, we get that $d=\tilde{d}$ , yielding $d_{1}=\frac{b_{1}^{2}}{a_{1}}$ and $d_{2}=\frac{b_{2}^{2}}{a_{2}}$ . Similarly, we find that $f_{1}=\frac{c_{1}^{2}}{a_{1}}$ and $f_{2}=\frac{c_{2}^{2}}{a_{2}}$ . So letting

[TABLE]

one easily checks that $A=vv^{\top}+\hat{v}\hat{v}^{\top}$ , where $v$ and $\hat{v}$ are as in (2.5).
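The rank 2 argument can also be traced numerically. In the sketch below (our own example data and helper names), the columns $a,\ldots,f\in\mathbb{R}^2$ come from the rows $v(P)^{\top}$ and $v(\hat P)^{\top}$ and are then scrambled by an arbitrary rotation, which leaves $A$ unchanged. Applying the rotation from the proof of Lemma 2 to the pair $\langle a,e\rangle=\langle b,c\rangle$ restores $d_i=b_i^2/a_i$ and $f_i=c_i^2/a_i$, so each row again has the form (2.5) up to sign. The example deliberately avoids the degenerate cases ($a_1a_2=0$, parallel columns).

```python
import cmath
import math

def monomial_vector(x, y, z):
    return [x * x, x * y, x * z, y * y, y * z, z * z]

def rotate(vec, theta):
    w = complex(vec[0], vec[1]) * cmath.exp(1j * theta)
    return (w.real, w.imag)

def lemma2_angle(u, v, w, y):
    """Angle with Re(e^{2 i theta}(alpha*beta - gamma*delta)) = 0 (proof of Lemma 2)."""
    alpha, beta, gamma, delta = (complex(t[0], t[1]) for t in (u, v, w, y))
    kappa = alpha * beta - gamma * delta
    return 0.0 if kappa == 0 else (math.pi / 2 - cmath.phase(kappa)) / 2

def gram(cols):
    """A = M^T M, where the columns of the 2 x 6 matrix M are a, b, c, d, e, f."""
    return [[ci[0] * cj[0] + ci[1] * cj[1] for cj in cols] for ci in cols]

P, P_hat = (1.0, 2.0, -1.0), (2.0, -1.0, 3.0)
rows = (monomial_vector(*P), monomial_vector(*P_hat))
# Columns a..f, scrambled by an arbitrary rotation (angle 0.7) that does not change A.
cols = [rotate((rows[0][j], rows[1][j]), 0.7) for j in range(6)]
A = gram(cols)

a, b, c, d, e, f = cols
theta = lemma2_angle(a, e, b, c)  # uses <a, e> = <b, c>
a, b, c, d, e, f = (rotate(t, theta) for t in cols)
for i in range(2):
    print(d[i] - b[i] ** 2 / a[i], f[i] - c[i] ** 2 / a[i])  # both vanish up to rounding
```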

Remark 3

In the above reasoning it would have sufficed to have the following equalities from the start:

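$$\langle b,c\rangle=\langle a,e\rangle,\quad \langle b,e\rangle=\langle c,d\rangle,\quad \langle a,d\rangle=\langle b,b\rangle,\quad \langle a,f\rangle=\langle c,c\rangle,\quad \alpha\langle b,f\rangle+\beta\langle d,f\rangle=\alpha\langle c,e\rangle+\beta\langle e,e\rangle,$$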

for some scalars $\alpha,\beta$ such that $a$ and $\alpha b+\beta d$ are linearly independent. Indeed, after the rotation of Lemma 2, the equality $\langle b,c\rangle=\langle a,e\rangle$ gives $a_{1}e_{1}=b_{1}c_{1}$ and $a_{2}e_{2}=b_{2}c_{2}$. The equalities $\langle b,e\rangle=\langle c,d\rangle$ and $\langle a,d\rangle=\langle b,b\rangle$, which $\tilde{d}$ also satisfies, then give $d_{1}=\frac{b_{1}^{2}}{a_{1}}$ and $d_{2}=\frac{b_{2}^{2}}{a_{2}}$. To conclude that $f_{1}=\frac{c_{1}^{2}}{a_{1}}$ and $f_{2}=\frac{c_{2}^{2}}{a_{2}}$, we use that both $f$ and $\tilde{f}:=\begin{bmatrix}\frac{c_{1}^{2}}{a_{1}}&\frac{c_{2}^{2}}{a_{2}}\end{bmatrix}^{\top}$ satisfy the conditions $\langle f,a\rangle=\langle c,c\rangle$ and $\alpha\langle f,b\rangle+\beta\langle f,d\rangle=\alpha\langle e,c\rangle+\beta\langle e,e\rangle$. Thus $\langle a,f-\tilde{f}\rangle=0=\langle\alpha b+\beta d,f-\tilde{f}\rangle$, yielding $f=\tilde{f}$ as $a$ and $\alpha b+\beta d$ are linearly independent.

We have made some generic assumptions in the rank $2$ case, but these can be lifted. When there is some pairwise linear dependence among $a,b,c,d,e,f$ or when $a_{1}a_{2}=0$ , appropriate modifications of the same argument still apply.

Finally, the rank $3$ case remains. Since $\dim{S_{3}}=6$ , the matrices $E_{1},\ldots,E_{6}$ lie in the 5-dimensional subspace $\{I_{3}\}^{\perp}$ and so are linearly dependent. Let us assume that $E_{6}$ is a linear combination of $E_{1},\dots,E_{5}$ . Choose a nonzero $F\in\{E_{1},E_{2},E_{3},E_{4},I_{3}\}^{\perp}$ and $\varepsilon\neq 0$ so that $I_{3}-\varepsilon F$ is positive semidefinite with rank 2. Then

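$$B=\begin{bmatrix}a^{\top}\\ b^{\top}\\ c^{\top}\\ d^{\top}\\ e^{\top}\\ f^{\top}\end{bmatrix}(I_{3}-\varepsilon F)\begin{bmatrix}a&b&c&d&e&f\end{bmatrix}$$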

satisfies four of the linear conditions (2.3) and a linear combination of the remaining two (as $\alpha E_{5}+\beta E_{6}\in{\rm span}\{E_{1},E_{2},E_{3},E_{4}\}$ for suitable scalars $\alpha,\beta$, not both zero). Due to Remark 3, these equalities suffice to show that $B$ does not generate an extreme ray, and therefore $A$ does not either. This finishes the last outstanding case, thus establishing Theorem 1. $\Box$

4 The number of squares.

As noted in the introduction, Hilbert’s result is actually stronger than what we have shown: the sum-of-squares representation can always be chosen to have at most three squares. The known proofs of this fact, including Hilbert’s original proof [7], are much less elementary [10, 12], and a linear-algebraic argument, if it exists, is yet to be found.

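From the proof of Choi and Lam [6] one can extract the additional information that every nonnegative ternary quartic is a sum of five squares. Pfister’s proof [9] shows that at most four squares suffice. We note that the four-squares conclusion can be reached in a different way, using the geometry of the ${\rm PSD}_{6}$ cone: take $\mathcal{A}=A_{0}+\mathcal{W}$ and $n=6$ in the following lemma of Barvinok. Here $A_{0}+\mathcal{W}$ is 6-dimensional, its intersection with ${\rm PSD}_{6}$ is nonempty by the sum-of-squares result above, and this intersection is bounded because ${\rm PSD}_{6}\cap\mathcal{W}=\{0\}$.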

Lemma 4

[1, Chapter II, Lemma 13.6] Let $\mathcal{A}$ be an $n$-dimensional affine subspace of $S_{n}$. If the intersection of $\mathcal{A}$ with ${\rm PSD}_{n}$ is nonempty and bounded, then $\mathcal{A}$ contains a positive semidefinite matrix of rank at most $n-2$.

We conclude with a few words on how one may numerically find a sum-of-squares representation using semidefinite programming (SDP). General references on SDP and convex optimization are [2, 5]. If $V_{1},\ldots,V_{15}$ is a basis for $\mathcal{W}^{\perp}$, then finding a sum-of-squares representation amounts to finding $A\in{\rm PSD}_{6}$ with ${\rm tr}(AV_{i})={\rm tr}(A_{0}V_{i})=:b_{i}$, $i=1,\ldots,15$, which is a semidefinite feasibility problem. Choosing a positive definite $C$, one can solve the SDP

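$$\min\ {\rm tr}(CA)\quad\text{subject to}\quad A\in{\rm PSD}_{6},\quad {\rm tr}(AV_{i})=b_{i},\ i=1,\ldots,15.$$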

In [11, Section 6] it is observed that for random $C$ there is a positive probability of finding an optimal $A$ of rank 3. Thus, repeatedly solving the above SDP with random $C$ eventually yields a representation as a sum of three squares. Here we accept a solution as having rank at most 3 when its fourth singular value is sufficiently small. In our experience, this happens after just a few tries.
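To make the 15 constraints concrete: one convenient basis of $\mathcal{W}^{\perp}$ (our choice; the text only requires some basis) indexes $V_m$ by the 15 degree-4 monomials $m$, with ones exactly at the entries $(r,s)$ where $v_rv_s=m$. Each $V_m$ satisfies (2.3), the supports are disjoint, and ${\rm tr}(AV_m)$ is the coefficient of $m$ in $v^{\top}Av$, so $b_m$ is simply the coefficient of $m$ in $p$. The Python sketch below (hypothetical helper names; no SDP solver is included) builds these index groups and checks the coefficient interpretation on a random symmetric matrix.

```python
import itertools
import random

# Exponents (i, j, k) of the entries x^2, xy, xz, y^2, yz, z^2 of v.
EXPONENTS = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]

def constraint_groups():
    """Map each degree-4 monomial m to the index pairs (r, s) with v_r * v_s = m."""
    groups = {}
    for r, s in itertools.product(range(6), repeat=2):
        m = tuple(er + es for er, es in zip(EXPONENTS[r], EXPONENTS[s]))
        groups.setdefault(m, []).append((r, s))
    return groups

def coefficients(A):
    """Coefficient of each monomial in v^T A v, i.e. tr(A V_m) for the 0/1 matrix V_m."""
    return {m: sum(A[r][s] for r, s in pairs) for m, pairs in constraint_groups().items()}

rng = random.Random(2)
U = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
A = [[U[r][s] + U[s][r] for s in range(6)] for r in range(6)]  # a random symmetric matrix
coef = coefficients(A)
x, y, z = 0.7, -1.3, 0.4
w = [x * x, x * y, x * z, y * y, y * z, z * z]
lhs = sum(w[r] * A[r][s] * w[s] for r in range(6) for s in range(6))
rhs = sum(cm * x**i * y**j * z**k for (i, j, k), cm in coef.items())
print(len(coef), abs(lhs - rhs) < 1e-9)  # 15 True
```

In an SDP modeling tool, each group then becomes one linear equality on the entries of the matrix variable.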

*Acknowledgment.* The authors wish to thank the anonymous referees for their thoughtful comments which led to improvement in the presentation. They also thank Benjamin Grossmann for reading the article and providing helpful feedback. HJW is supported by Simons Foundation grant 355645.

Bibliography

[1] Barvinok, A. (2002). A Course in Convexity. Providence, RI: American Mathematical Society.
[2] Bertsekas, D. P. (2009). Convex Optimization Theory. Nashua, NH: Athena Scientific.
[3] Blekherman, G., Plaumann, D., Sinn, R., Vinzant, C. (2019). Low-rank sum-of-squares representations on varieties of minimal degree. International Mathematics Research Notices 2019(1): 33–54.
[4] Blekherman, G., Smith, G. S., Velasco, M. (2015). Sums of squares and varieties of minimal degree. J. Am. Math. Soc. 29: 893–913.
[5] Boyd, S., Vandenberghe, L. (2004). Convex Optimization. New York, NY: Cambridge University Press.
[6] Choi, M.-D., Lam, T.-Y. (1977). Extremal positive semidefinite forms. Math. Ann. 231: 1–18.
[7] Hilbert, D. (1888). Über die Darstellung definiter Formen als Summe von Formenquadraten. (German). Math. Ann. 32(3): 342–350.
[8] Horn, R. A., Johnson, C. R. (1995). Matrix Analysis. Cambridge, UK: Cambridge University Press.