The many faces of degeneracy in conic optimization

Dmitriy Drusvyatskiy; Henry Wolkowicz

arXiv:1706.03705·math.OC·June 13, 2017·Found. Trends Optim.

TL;DR

This paper explores the causes and implications of degeneracy in conic optimization, emphasizing the role of facial reduction techniques in addressing issues arising from the loss of strict feasibility.

Contribution

It provides a comprehensive analysis of degeneracy causes in conic optimization and highlights the effectiveness of facial reduction methods for overcoming these challenges.

Findings

01

Loss of strict feasibility affects optimality conditions and numerical methods.

02

Facial reduction offers a mathematically elegant way to handle degeneracy.

03

Rich problem structures can be exploited to improve optimization outcomes.

Tables

Table 5.1: Empirical results for SNL

# sensors	# anchors	radio range	RMSD	Time
20000	9	0.025	5e-16	25 s
40000	9	0.02	8e-16	1 m 23 s
60000	9	0.015	5e-16	3 m 13 s
100000	9	0.01	6e-16	9 m 8 s

Table 5.2: noiseless: $r=4$; $m\times n$ size; density $p$.

$m$	$n$	mean($p$)	Time (s)	Rank	Residual (% $Z$)
700	2000	0.36	12.80	4.0	1.5217e-12
1000	5000	0.36	49.66	4.0	1.0910e-12
1400	9000	0.36	131.53	4.0	6.0304e-13
1900	14000	0.36	291.22	4.0	3.4847e-11
2500	20000	0.36	798.70	4.0	7.2256e-08

Equations

$\langle Ax,y\rangle=\langle x,A^{*}y\rangle\quad\text{for all }x\in{\bf E}\text{ and }y\in{\bf F}.$

$Ax=(\langle a_{1},x\rangle,\ldots,\langle a_{m},x\rangle),$

$A^{*}y=\sum_{i}y_{i}a_{i}.$

$A(X)=(\langle A_{1},X\rangle,\ldots,\langle A_{m},X\rangle).$

$A^{*}y=\sum_{i}y_{i}A_{i}.$

$\langle X,\sum_{i}y_{i}A_{i}\rangle=\sum_{i}y_{i}\langle A_{i},X\rangle=\langle A(X),y\rangle,$

$x,y\in C,\ \alpha\in[0,1]\quad\implies\quad\alpha x+(1-\alpha)y\in C.$

${\bf R}^{n}_{+}:=\{x\in{\bf R}^{n}:x_{i}\geq 0\text{ for all }i=1,\ldots,n\}$

${\bf R}^{n}_{++}:=\{x\in{\bf R}^{n}:x_{i}>0\text{ for all }i=1,\ldots,n\}$

${\bf S}^{n}_{+}:=\{X\in{\bf S}^{n}:v^{T}Xv\geq 0\text{ for all }v\in{\bf R}^{n}\}.$

$v^{T}Xv=\operatorname{tr}(v^{T}Xv)=\operatorname{tr}(Xvv^{T})=\langle X,vv^{T}\rangle.$

${\bf S}^{n}_{++}:=\{X\in{\bf S}^{n}:v^{T}Xv>0\text{ for all }0\neq v\in{\bf R}^{n}\}.$

$x\succeq_{\mathcal{K}}y$ and $x\succ_{\mathcal{K}}y$

$\mathcal{K}^{*}:=\{y\in{\bf E}:\langle x,y\rangle\geq 0\text{ for all }x\in\mathcal{K}\}.$

$\langle X,Y\rangle=\operatorname{tr}XY=\sum_{i}\lambda_{i}\operatorname{tr}(v_{i}v_{i}^{T}Y)=\sum_{i}\lambda_{i}(v_{i}^{T}Yv_{i})\geq 0.$

$(\mathcal{K}_{1}+\mathcal{K}_{2})^{*}=\mathcal{K}_{1}^{*}\cap\mathcal{K}_{2}^{*}\quad\text{and}\quad(\mathcal{K}_{1}\cap\mathcal{K}_{2})^{*}=\mbox{\rm cl}\,(\mathcal{K}_{1}^{*}+\mathcal{K}_{2}^{*}).$

$x,y\in\mathcal{K},\ x+y\in F\quad\implies\quad x,y\in F.$

$(v_{1}^{\perp}\cap\mathcal{K})\cap(v_{2}^{\perp}\cap\mathcal{K})=(v_{1}+v_{2})^{\perp}\cap\mathcal{K}.$

$F_{I}=\{x\in{\bf R}^{n}_{+}:x_{i}=0\text{ for all }i\in I\}$

$F_{\mathcal{R}}:=\{X\in{\bf S}^{n}_{+}:\operatorname{range}X\subseteq\mathcal{R}\}$

$F_{\mathcal{R}}=V{\bf S}^{r}_{+}V^{T}.$

$F_{\mathcal{R}^{\perp}}=U{\bf S}^{n-r}_{+}U^{T},$

$\begin{array}{cc}\textbf{(P)}\,\begin{array}{cc}\inf&\langle c,x\rangle\\ \textrm{s.t.}&Ax=b\\ &x\succeq_{\mathcal{K}}0\end{array}&\,\qquad\qquad\textbf{(D)}\,\begin{array}{rc}\sup&\langle b,y\rangle\\ \textrm{s.t.}&A^{*}y\preceq_{\mathcal{K}^{*}}c\end{array}\end{array}$

$F_{p}:=\{x\succeq_{\mathcal{K}}0:Ax=b\}\quad\text{and}\quad F_{d}:=\{y:A^{*}y\preceq_{\mathcal{K}^{*}}c\}.$

$\begin{array}{rc}\sup&\langle b,y\rangle\\ \textrm{s.t.}&A^{*}y+s=c\\ &y\in{\bf F},\ s\in\mathcal{K}^{*}.\end{array}$

$L(x,y):=\langle c,x\rangle+\langle y,b-Ax\rangle,$

$\max_{y}L(x,y)=\begin{cases}\langle c,x\rangle&Ax=b\\ +\infty&\text{otherwise}.\end{cases}$

$\min_{x\succeq_{\mathcal{K}}0}\left(\max_{y}L(x,y)\right).$

$\max_{y}\left(\min_{x\succeq_{\mathcal{K}}0}L(x,y)\right)=\max\,\{\langle b,y\rangle:A^{*}y\preceq_{\mathcal{K}^{*}}c\}.$

Full text

The many faces of degeneracy in conic optimization

Dmitriy Drusvyatskiy

Department of Mathematics

University of Washington

Henry Wolkowicz

Faculty of Mathematics

University of Waterloo

Abstract

Slater’s condition – existence of a “strictly feasible solution” – is a common assumption in conic optimization. Without strict feasibility, first-order optimality conditions may be meaningless, the dual problem may yield little information about the primal, and small changes in the data may render the problem infeasible. Hence, failure of strict feasibility can negatively impact off-the-shelf numerical methods, such as primal-dual interior point methods, in particular. New optimization modelling techniques and convex relaxations for hard nonconvex problems have shown that the loss of strict feasibility is a more pronounced phenomenon than has previously been realized. In this text, we describe various reasons for the loss of strict feasibility, whether due to poor modelling choices or (more interestingly) rich underlying structure, and discuss ways to cope with it and, in many pronounced cases, how to use it as an advantage. In large part, we emphasize the facial reduction preprocessing technique due to its mathematical elegance, geometric transparency, and computational potential.

1 What this paper is about
1.1 Related work
1.2 Outline of the paper
1.3 Reflections on Jonathan Borwein and FR
I Theory
2 Convex geometry
2.1 Notation
2.2 Facial geometry
2.3 Conic optimization problems
2.4 Commentary
3 Virtues of strict feasibility
3.1 Theorem of the alternative
3.2 Stability of the solution
3.3 Distance to infeasibility
3.4 Commentary
4 Facial reduction
4.1 Preprocessing in linear programming
4.2 Facial reduction in conic optimization
4.3 Facial reduction in semi-definite programming
4.4 What facial reduction actually does
4.5 Singularity degree and the Hölder error bound in SDP
4.6 Towards computation
4.7 Commentary
II Applications and illustrations
5 Matrix completions
5.1 Positive semi-definite matrix completion
5.2 Euclidean distance matrix completion, EDMC
5.2.1 EDM and SNL with exact data
5.2.2 Extensions to noisy EDM and SNL problems
5.3 Low-rank matrix completions
5.4 Commentary
6 Hard combinatorial problems
6.1 Quadratic assignment problem, QAP
6.2 Second lift of Max-Cut
6.3 General semi-definite lifts of combinatorial problems
6.4 Elimination method for sparse SOS polynomials
6.5 Commentary
Monomial elimination from SOS problems
Index

List of Figures

2.1 The set $Q$ .
4.1 Nonexposed face of the image set
5.1 Instance of EDMC
6.1 Difference in cpu seconds (without FR $-$ with FR)
6.2 Difference in accuracy values (without FR $-$ with FR)

List of Tables

5.1 Empirical results for SNL
5.2 noiseless: $r=4$ ; $m\times n$ size; density $p$ .

Chapter 1 What this paper is about

Conic optimization has proven to be an elegant and powerful modeling tool with surprisingly many applications. The classical linear programming problem revolutionized operations research and is still the most widely used optimization model. This is due to the elegant theory and the ability to solve in practice both small and large scale problems efficiently and accurately by the well-known simplex method of Dantzig [35] and by more recent interior-point methods, e.g., [148, 98]. The size (number of variables) of linear programs that could be solved before the interior-point revolution was on the order of tens of thousands, whereas it immediately increased to millions for many applications. A large part of modern success is due to preprocessing, which aims to identify (primal and dual slack) variables that are identically zero on the feasible set. The article [96] is a good reference.

The story does not end with linear programming. Dantzig himself recounts in [36]: “the world is nonlinear”. Nonlinear models can significantly improve on linear programs if they can be solved efficiently. Conic optimization has shown its worth in its elegant theory, efficient algorithms, and many applications, e.g., [146, 9, 20]. Preprocessing to rectify possible loss of “strict feasibility” in the primal or the dual problems is appealing for general conic optimization as well. In contrast to linear programming, however, the area of preprocessing for conic optimization is in its infancy; see e.g., [29, 138, 30, 107, 109] and Section 1.1 below. Moreover, unlike in linear programming, numerical error makes preprocessing difficult in full generality. That said, there are surprisingly many specific applications of conic optimization where the rich underlying structure makes preprocessing possible, leading to greatly simplified models and strengthened algorithms. Indeed, exploiting structure is essential for making preprocessing viable. In this article, we present the background and the elementary theory of such regularization techniques in the framework of facial reduction (FR). We focus on notable case studies where such techniques have proven to be useful.

1.1 Related work

To put this text in perspective, it is instructive to consider nonlinear programming. Nontrivial statements in constrained nonlinear optimization always rely on some regularity of the constraints. To illustrate, consider a minimization problem over a set of the form $\{x:f(x)=0\}$ for some smooth $f$ . How general are such constraints? A celebrated result of Whitney [143] shows that any closed set in a Euclidean space can be written as the zero-set of some $C^{\infty}$ -smooth function $f$ . Thus, in this generality, there is little difference between minimizing over arbitrary closed sets and sets of the form $\{x:f(x)=0\}$ , for smooth $f$ . Since little can be said about optimizing over arbitrary closed sets, one must make an assumption on the equality constraint. The simplest one, eliminating Whitney’s construction, is that the gradient of $f$ is nonzero on the feasible region – the earliest form of a constraint qualification. There have been numerous papers developing weakened versions of regularity (and optimality conditions) in nonlinear programming; some good examples are [62, 25, 22].

The Slater constraint qualification, which we discuss in this text, is in a similar spirit, but in the context of (convex) conic optimization. Some good early references on the geometry of the Slater condition, and weakened variants, are [57, 93, 94, 144, 19]. The concept of facial reduction for general convex programs was introduced in [23, 24], while an early application to a semi-definite type best-approximation problem was given in [145]. Recently, there has been a significant renewed interest in facial reduction, in large part due to the success in applications for graph related problems, such as Euclidean distance matrix completion and molecular conformation [76, 75, 46, 6] and in polynomial optimization [110, 111, 74, 141, 140]. In particular, a more modern explanation of the facial reduction procedure can be found in [88, 104, 107, 136, 142].

We note in passing that numerous papers show that strict feasibility holds “generically” with respect to unstructured perturbations. In contrast, optimization problems appearing in applications are often highly structured and such genericity results are of little practical use.

1.2 Outline of the paper

The paper is divided into two parts. In Part I, we present the necessary theoretical grounding in conic optimization, including basic optimality and duality theory, connections of Slater’s condition to the distance to infeasibility and sensitivity theory, the facial reduction procedure, and the singularity degree. In Part II, we concentrate on illustrative examples and applications, including matrix completion problems (semi-definite, low-rank, and Euclidean distance), relaxations of hard combinatorial problems (quadratic assignment and max-cut), and sum of squares relaxations of polynomial optimization problems.

1.3 Reflections on Jonathan Borwein and FR

These are some reflections on Jonathan Borwein and his role in the development of the facial reduction technique, by Henry Wolkowicz. Jonathan Borwein passed away unexpectedly on Aug. 2, 2016. Jon was an extraordinary mathematician who made significant contributions in an amazing number of very diverse areas. Many details and personal memories by myself and many others, including family, friends, and colleagues, are presented at the memorial website jonborwein.org. This was a terrible loss to his family and all his friends and colleagues, including myself. The facial reduction process we use in this monograph originates in the work of Jon and the second author (myself). This work took place from July of 1978 to July of 1979 when I went to Halifax to work with Jon at Dalhousie University in a lectureship position. The optimality conditions for the general abstract convex program using the facially reduced problem are presented in the two papers [23, 22]. The facial reduction process is then derived in [24].

Part I Theory

Chapter 2 Convex geometry

This section collects preliminaries of linear algebra and convex geometry that will be routinely used in the rest of the manuscript. The main focus is on convex duality, facial structure of convex cones, and the primal-dual conic optimization pair. The two running examples of linear and semi-definite programming illustrate the concepts. We have tried to include proofs of important theorems, when they are both elementary and enlightening. We have omitted arguments that are longer or that are less transparent, so as not to distract the reader from the narrative.

2.1 Notation

Throughout, we will fix a Euclidean space ${\mathbb{\bf E}}$ with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|x\|=\sqrt{\langle x,x\rangle}$ . When referencing another Euclidean space (with its own inner product), we will use the letter ${\bf\mathbb{\bf F}}$ . An open ball of radius $r>0$ around a point $x\in{\mathbb{\bf E}}$ will be denoted by $B_{r}(x)$ . The two most important examples of Euclidean spaces for us will be the space of $n$ -vectors ${\bf R}^{n}$ with the dot product $\langle x,y\rangle=\sum_{i}x_{i}y_{i}$ and the space of $n\times n$ symmetric matrices ${\bf S}^{n}$ with the trace inner product $\langle X,Y\rangle=\operatorname{{trace}}(XY)$ . Throughout, we let $e_{i}$ be the $i$ ’th coordinate vector of ${\bf R}^{n}$ . Note that the trace inner product can be equivalently written as $\operatorname{{trace}}(XY)=\sum_{i,j}X_{ij}Y_{ij}$ . Thus the trace inner product is itself the dot product between the two matrices stretched into vectors. A key property of the trace is invariance under permutations of the arguments: $\operatorname{{tr}}(AB)=\operatorname{{tr}}(BA)$ for any two matrices $A\in{\bf R}^{m\times n}$ and $B\in{\bf R}^{n\times m}$ .

For any linear mapping $A\colon{\mathbb{\bf E}}\to{{\bf\mathbb{\bf F}}}$ , between Euclidean spaces ${\mathbb{\bf E}}$ and ${{\bf\mathbb{\bf F}}}$ , the adjoint mapping $A^{*}\colon{{\bf\mathbb{\bf F}}}\to{\bf E}$ is the unique mapping satisfying

$$\langle Ax,y\rangle=\langle x,A^{*}y\rangle\quad\text{for all }x\in{\bf E}\text{ and }y\in{\bf F}.$$

Notice that the angle brackets on the left refer to the inner product in ${\bf\mathbb{\bf F}}$ , while those on the right refer to the inner product in ${\mathbb{\bf E}}$ .

Let us look at two important examples of adjoint maps.

Example 2.1.1 (Adjoints of mappings between ${\bf R}^{n}$ and ${\bf R}^{m}$ ).

Consider a matrix $A\in{\bf R}^{m\times n}$ as a linear map from ${\bf R}^{n}\to{\bf R}^{m}$ . Then the adjoint $A^{*}$ is simply the transpose $A^{T}$ . To make the parallel with the next example, it is useful to make this description more explicit. Suppose that the linear operator $A\colon{\bf R}^{n}\to{\bf R}^{m}$ is defined by

$$Ax=(\langle a_{1},x\rangle,\ldots,\langle a_{m},x\rangle),\tag{2.1}$$

where $a_{1},\ldots,a_{m}$ are some vectors in ${\bf R}^{n}$ . When thinking of $A$ as a matrix, the vectors $a_{i}$ would be its rows, and the description (2.1) corresponds to a “row-space” view of matrix-vector multiplication $Ax$ . The adjoint $A^{*}\colon{\bf R}^{m}\to{\bf R}^{n}$ is simply the map

$$A^{*}y=\sum_{i}y_{i}a_{i}.\tag{2.2}$$

Again, when thinking of $A$ as a matrix with $a_{i}$ as its rows, the description (2.2) corresponds to the “column-space” view of matrix-vector multiplication $A^{T}y$ .
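As a quick sanity check of this example, the following sketch (plain Python, with a small matrix chosen only for illustration) computes $Ax$ through the row view and $A^{T}y$ through the column view, and confirms the defining identity $\langle Ax,y\rangle=\langle x,A^{T}y\rangle$. The helper names `apply_A` and `apply_A_adjoint` are ours and do not come from the text.

```python
# Rows a_1, ..., a_m of an illustrative 2x3 matrix A (an assumption for this sketch).
rows = [[1, 2, 0],
        [0, -1, 3]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def apply_A(x):
    # Row view: Ax = (<a_1, x>, ..., <a_m, x>).
    return [dot(a, x) for a in rows]

def apply_A_adjoint(y):
    # Column view: A^T y = sum_i y_i a_i.
    n = len(rows[0])
    return [sum(y[i] * rows[i][j] for i in range(len(rows))) for j in range(n)]

x = [1, 1, 2]
y = [3, -1]
lhs = dot(apply_A(x), y)          # <Ax, y>, inner product in R^m
rhs = dot(x, apply_A_adjoint(y))  # <x, A^T y>, inner product in R^n
print(lhs, rhs)                   # 4 4
```

The two inner products live in different spaces (${\bf R}^{m}$ on the left, ${\bf R}^{n}$ on the right), which is exactly the point of the adjoint identity.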

Example 2.1.2 (Adjoints of mappings between ${\bf S}^{n}$ and ${\bf R}^{m}$ ).

Consider a set of symmetric matrices $A_{1},\ldots,A_{m}$ in ${\bf S}^{n}$ , and define the linear map $A\colon{\bf S}^{n}\to{\bf R}^{m}$ by

$$A(X)=(\langle A_{1},X\rangle,\ldots,\langle A_{m},X\rangle).$$

We note that any linear map $A\colon{\bf S}^{n}\to{\bf R}^{m}$ can be written in this way for some matrices $A_{i}\in{\bf S}^{n}$ . Notice the parallel to (2.1). The adjoint $A^{*}\colon{\bf R}^{m}\to{\bf S}^{n}$ is given by

$$A^{*}y=\sum_{i}y_{i}A_{i}.$$

Notice the parallel to (2.2). To verify that this indeed is the adjoint, simply observe the equation

$$\langle X,\sum_{i}y_{i}A_{i}\rangle=\sum_{i}y_{i}\langle A_{i},X\rangle=\langle A(X),y\rangle,$$

for any $X\in{\bf S}^{n}$ and $y\in{\bf R}^{m}$ .
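The same identity can be checked numerically for the matrix case. The sketch below is only an illustration: the two symmetric matrices $A_{1},A_{2}$ and the test points are arbitrary choices of ours, and the trace inner product is computed entrywise as $\sum_{i,j}X_{ij}Y_{ij}$.

```python
# Illustrative symmetric 2x2 matrices A_1, A_2 (assumptions for this sketch).
A_mats = [[[1, 0], [0, -1]],
          [[0, 2], [2, 1]]]

def trace_inner(X, Y):
    # Trace inner product <X, Y> = sum_{i,j} X_ij Y_ij.
    n = len(X)
    return sum(X[i][j] * Y[i][j] for i in range(n) for j in range(n))

def apply_A(X):
    # A(X) = (<A_1, X>, ..., <A_m, X>).
    return [trace_inner(Ai, X) for Ai in A_mats]

def apply_A_adjoint(y):
    # A^* y = sum_i y_i A_i, again a symmetric matrix.
    n = len(A_mats[0])
    return [[sum(yi * Ai[r][c] for yi, Ai in zip(y, A_mats)) for c in range(n)]
            for r in range(n)]

X = [[2, 1], [1, 3]]
y = [1, -2]
lhs = sum(a * b for a, b in zip(apply_A(X), y))  # <A(X), y> in R^m
rhs = trace_inner(X, apply_A_adjoint(y))         # <X, A^* y> in S^n
print(lhs, rhs)                                  # -15 -15
```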

The interior, boundary, and closure of any set $C\subset{\mathbb{\bf E}}$ will be denoted by $\mbox{\rm int}\,C$ , $\mbox{\rm bd}\,C$ , and $\mbox{\rm cl}\,C$ , respectively.

A set $C$ is convex if it contains the line segment joining any two points in $C$ :

$$x,y\in C,\ \alpha\in[0,1]\quad\implies\quad\alpha x+(1-\alpha)y\in C.$$

The minimal affine space containing a convex set $C$ is called the affine hull of $C$ , and is denoted by ${\rm aff}\,C$ . We define the relative interior of $C$ , written $\operatorname{{ri}}C$ , to be the interior of $C$ relative to ${\rm aff}\,C$ . It is straightforward to show that for a nonempty convex set $C$ , the relative interior $\operatorname{{ri}}C$ is never empty.

A subset $\mathcal{K}$ of ${\mathbb{\bf E}}$ is a convex cone if $\mathcal{K}$ is convex and is positively homogeneous, meaning $\lambda\mathcal{K}\subseteq\mathcal{K}$ for all $\lambda\geq 0$ . Equivalently, $\mathcal{K}$ is a convex cone if, and only if, for any two points $x$ and $y$ in $\mathcal{K}$ and any nonnegative constants $\alpha,\beta\geq 0$ , the sum $\alpha x+\beta y$ lies in $\mathcal{K}$ . We say that a convex cone $\mathcal{K}$ is proper if $\mathcal{K}$ is closed, has nonempty interior, and contains no lines. The symbol $\mathcal{K}^{\perp}$ refers to the orthogonal complement of ${\rm aff}\,\mathcal{K}$ . Let us look at the two most important examples of proper cones for this article.

Example 2.1.3 (The nonnegative orthant ${\bf R}^{n}_{+}$ ).

The nonnegative orthant

$${\bf R}^{n}_{+}:=\{x\in{\bf R}^{n}:x_{i}\geq 0\text{ for all }i=1,\ldots,n\}$$

is a proper convex cone in ${\bf R}^{n}$ . The interior of ${\bf R}^{n}_{+}$ is the set

$${\bf R}^{n}_{++}:=\{x\in{\bf R}^{n}:x_{i}>0\text{ for all }i=1,\ldots,n\}.$$

Example 2.1.4 (The positive semi-definite cone ${\bf S}^{n}_{+}$ ).

Consider the set of positive semi-definite matrices

$${\bf S}^{n}_{+}:=\{X\in{\bf S}^{n}:v^{T}Xv\geq 0\text{ for all }v\in{\bf R}^{n}\}.$$

It is immediate from the definition that ${\bf S}^{n}_{+}$ is a convex cone containing no lines. Let us quickly verify that ${\bf S}^{n}_{+}$ is proper. To see this, observe

$$v^{T}Xv=\operatorname{tr}(v^{T}Xv)=\operatorname{tr}(Xvv^{T})=\langle X,vv^{T}\rangle.$$

Thus ${\bf S}^{n}_{+}$ is closed because it is the intersection of the halfspaces $\{X\in{\bf S}^{n}:\langle X,vv^{T}\rangle\geq 0\}$ for all $v\in{\bf R}^{n}$ , and arbitrary intersections of closed sets are closed. The interior of ${\bf S}^{n}_{+}$ is the set of positive definite matrices

$${\bf S}^{n}_{++}:=\{X\in{\bf S}^{n}:v^{T}Xv>0\text{ for all }0\neq v\in{\bf R}^{n}\}.$$

Let us quickly verify this description. Showing that ${\bf S}^{n}_{++}$ is open is straightforward; we leave the details to the reader. Conversely, consider a matrix $X\in{\bf S}^{n}_{+}\setminus{\bf S}^{n}_{++}$ and let $v$ be a nonzero vector satisfying $v^{T}Xv=0$ . Then the matrix $X-tvv^{T}$ lies outside of ${\bf S}^{n}_{+}$ for every $t>0$ , and therefore $X$ must lie on the boundary of ${\bf S}^{n}_{+}$ . To summarize, we have shown that ${\bf S}^{n}_{+}$ is a proper convex cone.
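The boundary argument above can be made concrete in the $2\times 2$ case. The sketch below relies on the elementary fact that a symmetric $2\times 2$ matrix is positive semi-definite exactly when its trace and determinant are both nonnegative; the matrix $X$, the vector $v$, and the helper names are illustrative choices of ours.

```python
from fractions import Fraction

def is_psd_2x2(X):
    # A symmetric 2x2 matrix is PSD iff its trace and determinant are nonnegative.
    trace = X[0][0] + X[1][1]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return trace >= 0 and det >= 0

def quad_form(X, v):
    # v^T X v
    return sum(v[i] * X[i][j] * v[j] for i in range(2) for j in range(2))

X = [[1, 0], [0, 0]]   # PSD but singular, so not positive definite
v = [0, 1]             # nonzero vector with v^T X v = 0

def perturbed(t):
    # The matrix X - t v v^T from the argument in the text.
    return [[X[i][j] - t * v[i] * v[j] for j in range(2)] for i in range(2)]

print(is_psd_2x2(X), quad_form(X, v))   # True 0
for t in (Fraction(1, 10), Fraction(1, 1000)):
    # Arbitrarily small t > 0 already pushes X outside the cone.
    print(t, is_psd_2x2(perturbed(t)))  # False for every t > 0
```

Exact rational arithmetic is used so that the conclusion does not depend on floating-point tolerances.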

Given a convex cone $\mathcal{K}$ in ${\mathbb{\bf E}}$ , we introduce two binary relations $\succeq_{\mathcal{K}}$ and $\succ_{\mathcal{K}}$ on ${\mathbb{\bf E}}$ :

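$$x\succeq_{\mathcal{K}}y\iff x-y\in\mathcal{K}\qquad\text{and}\qquad x\succ_{\mathcal{K}}y\iff x-y\in\mbox{\rm int}\,\mathcal{K}.$$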

Assuming that $\mathcal{K}$ is proper makes the relation $\succeq_{\mathcal{K}}$ into a partial order, meaning that for any three points $x,y,z\in{\mathbb{\bf E}}$ , the three conditions hold:

1. (reflexivity) $\quad x\succeq_{\mathcal{K}}x$
2. (antisymmetry) $\quad x\succeq_{\mathcal{K}}y~{}\textrm{ and }~{}y\succeq_{\mathcal{K}}x\quad\implies\quad x=y$
3. (transitivity) $\quad x\succeq_{\mathcal{K}}y~{}\textrm{ and }~{}y\succeq_{\mathcal{K}}z\quad\implies\quad x\succeq_{\mathcal{K}}z.$

As is standard in the literature, we denote the partial order $\succeq_{{\bf R}^{n}_{+}}$ on ${\bf R}^{n}$ by $\geq$ and the partial order $\succeq_{{\bf S}^{n}_{+}}$ on ${\bf S}^{n}$ by $\succeq$ . In particular, the relation $x\geq y$ means $x_{i}\geq y_{i}$ for each coordinate $i$ , while the relation $X\succeq Y$ means that the matrix $X-Y$ is positive semi-definite.
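To see that these orders are genuinely partial rather than total, one can test a few pairs by hand. The following sketch is only illustrative: it is restricted to $2\times 2$ matrices (where positive semi-definiteness reduces to nonnegative trace and determinant), and the helper names `geq_orthant` and `geq_psd` are hypothetical.

```python
def is_psd_2x2(M):
    # A symmetric 2x2 matrix is PSD iff its trace and determinant are nonnegative.
    return M[0][0] + M[1][1] >= 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] >= 0

def geq_orthant(x, y):
    # x >= y in the order induced by R^n_+: x - y has nonnegative coordinates.
    return all(xi - yi >= 0 for xi, yi in zip(x, y))

def geq_psd(X, Y):
    # X >= Y in the order induced by S^2_+: X - Y is positive semi-definite.
    D = [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]
    return is_psd_2x2(D)

# Vectors: (2, 3) >= (1, 3), but (2, 0) and (1, 1) are incomparable.
print(geq_orthant([2, 3], [1, 3]))                                # True
print(geq_orthant([2, 0], [1, 1]), geq_orthant([1, 1], [2, 0]))   # False False

# Matrices: X >= Y, but X and Z are incomparable.
X = [[2, 0], [0, 1]]
Y = [[1, 0], [0, 1]]
Z = [[1, 0], [0, 2]]
print(geq_psd(X, Y))                 # True
print(geq_psd(X, Z), geq_psd(Z, X))  # False False
```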

Central to conic geometry is duality. The dual cone of $\mathcal{K}$ is the set

$$\mathcal{K}^{*}:=\{y\in{\bf E}:\langle x,y\rangle\geq 0\text{ for all }x\in\mathcal{K}\}.$$

The following lemma will be used extensively.

Lemma 2.1.5 (Self-duality).

Both ${\bf R}^{n}_{+}$ and ${\bf S}^{n}_{+}$ are self-dual, meaning $({\bf R}^{n}_{+})^{*}={\bf R}^{n}_{+}$ and $({\bf S}^{n}_{+})^{*}={\bf S}^{n}_{+}$ .

Proof 2.1.6.

The equality $({\bf R}^{n}_{+})^{*}={\bf R}^{n}_{+}$ is elementary and we leave the proof to the reader. To see that ${\bf S}^{n}_{+}$ is self-dual, recall that a matrix $X\in{\bf S}^{n}$ is positive semi-definite if, and only if, all of its eigenvalues are nonnegative. Fix two matrices $X,Y\succeq 0$ and let $X=\sum_{i}\lambda_{i}v_{i}v_{i}^{T}$ be the eigenvalue decomposition of $X$ . Then we deduce

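$$\langle X,Y\rangle=\operatorname{tr}XY=\sum_{i}\lambda_{i}\operatorname{tr}(v_{i}v_{i}^{T}Y)=\sum_{i}\lambda_{i}(v_{i}^{T}Yv_{i})\geq 0.$$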

Therefore the inclusion ${\bf S}^{n}_{+}\subseteq({\bf S}^{n}_{+})^{*}$ holds. Conversely, for any $X\in({\bf S}^{n}_{+})^{*}$ and any $v\in{\bf R}^{n}$ the inequality, $0\leq\langle X,vv^{T}\rangle=v^{T}Xv$ , holds. The reverse inclusion ${\bf S}^{n}_{+}\supseteq({\bf S}^{n}_{+})^{*}$ follows, and the proof is complete.
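Both directions of this proof can be illustrated on small integer matrices. In the sketch below, positive semi-definite matrices are generated as Gram matrices $BB^{T}$ (which are always positive semi-definite), and a rank-one matrix $vv^{T}$ serves as a certificate that a given symmetric matrix is not in the dual cone. All matrices and names are illustrative assumptions.

```python
def gram(B):
    # B B^T is always positive semi-definite.
    n, k = len(B), len(B[0])
    return [[sum(B[i][l] * B[j][l] for l in range(k)) for j in range(n)]
            for i in range(n)]

def trace_inner(X, Y):
    # Trace inner product <X, Y> = sum_{i,j} X_ij Y_ij.
    n = len(X)
    return sum(X[i][j] * Y[i][j] for i in range(n) for j in range(n))

# Inclusion S^n_+ ⊆ (S^n_+)^*: two PSD matrices have nonnegative inner product.
X = gram([[1, 2], [0, 1]])    # [[5, 2], [2, 1]]
Y = gram([[1, -1], [3, 0]])   # [[2, 3], [3, 9]]
print(trace_inner(X, Y))      # 31

# Reverse inclusion: Z is symmetric but not PSD, and the PSD matrix v v^T
# certifies that Z is not in the dual cone because <Z, v v^T> = v^T Z v < 0.
Z = [[1, 2], [2, 1]]
v = [1, -1]
vvT = [[v[i] * v[j] for j in range(2)] for i in range(2)]
print(trace_inner(Z, vvT))    # -2
```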

Finally, we end this section with the following two useful results of convex geometry.

Lemma 2.1.7 (Dual cone of a sum).

For any two closed convex cones $\operatorname{{\mathcal{K}}}_{1}$ and $\operatorname{{\mathcal{K}}}_{2}$ , equalities hold:

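$$(\mathcal{K}_{1}+\mathcal{K}_{2})^{*}=\mathcal{K}_{1}^{*}\cap\mathcal{K}_{2}^{*}\quad\text{and}\quad(\mathcal{K}_{1}\cap\mathcal{K}_{2})^{*}=\mbox{\rm cl}\,(\mathcal{K}_{1}^{*}+\mathcal{K}_{2}^{*}).$$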

Lemma 2.1.8 (Double dual).

A set $\mathcal{K}\subset{\mathbb{\bf E}}$ is a closed convex cone if, and only if, equality $\mathcal{K}=(\mathcal{K}^{*})^{*}$ holds.

In particular, if $\mathcal{K}$ is a proper convex cone, then so is its dual $\mathcal{K}^{*}$ , as the reader can verify.

2.2 Facial geometry

Central to this paper is the decomposition of a cone into faces.

Definition 2.2.1 (Faces).

Let $\mathcal{K}$ be a convex cone. A convex cone $F\subseteq\mathcal{K}$ is called a face of $\mathcal{K}$ , denoted $F\unlhd\mathcal{K}$ , if the implication holds:

$$x,y\in\mathcal{K},\ x+y\in F\quad\implies\quad x,y\in F.$$

Let $\mathcal{K}$ be a closed convex cone. Vacuously, the empty set and $\mathcal{K}$ itself are faces. A face $F\unlhd\mathcal{K}$ is proper if it is neither empty nor all of $\mathcal{K}$ . One can readily verify from the definition that the intersection of an arbitrary collection of faces of $\mathcal{K}$ is itself a face of $\mathcal{K}$ . A fundamental result of convex geometry shows that relative interiors of all faces of $\mathcal{K}$ form a partition of $\mathcal{K}$ : every point of $\mathcal{K}$ lies in the relative interior of some face and relative interiors of any two distinct faces are disjoint. In particular, any proper face of $\mathcal{K}$ is disjoint from $\operatorname{{ri}}\mathcal{K}$ .

Definition 2.2.2 (Minimal face).

The minimal face of a convex cone $\mathcal{K}$ containing a set $S\subseteq\mathcal{K}$ is the intersection of all faces of $\mathcal{K}$ containing $S$ , and is denoted by $\operatorname{{face}}(S,\mathcal{K})$ .

A convenient alternate characterization of minimal faces is as follows. If $S\subseteq\operatorname{{\mathcal{K}}}$ is a convex set, then $\operatorname{{face}}(S,\mathcal{K})$ is the smallest face of $\mathcal{K}$ intersecting the relative interior of $S$ . In particular, equality $\operatorname{{face}}(S,\mathcal{K})=\operatorname{{face}}(x,\mathcal{K})$ holds for any point $x\in\operatorname{{ri}}S$ .

There is a special class of faces that admit “dual” descriptions. Namely, for any vector $v\in\mathcal{K}^{*}$ one can readily verify that the set $F=v^{\perp}\cap\mathcal{K}$ is a face of $\mathcal{K}$ .

Definition 2.2.3 (Exposed faces).

Any set of the form $F=v^{\perp}\cap\mathcal{K}$ , for some vector $v\in\mathcal{K}^{*}$ , is called an exposed face of $\mathcal{K}$ . The vector $v$ is then called an exposing vector of $F$ .

The classical hyperplane separation theorem shows that any point $x$ in the relative boundary of $\mathcal{K}$ lies in some proper exposed face. Not all faces are exposed, however, as the following example shows.

Example 2.2.4 (Nonexposed faces).

Consider the set $Q=\{(x,y)\in{\bf R}^{2}:y\geq\max(0,x)^{2}\}$ , and let $\mathcal{K}$ be the closed convex cone generated by $Q\times\{1\}$ . Then the ray $\{(0,0)\}\times{\bf R}_{+}$ is a face of $\mathcal{K}$ but it is not exposed.

The following is a very useful property of exposed faces we will use.

Proposition 2.2.5 (Exposing the intersection of exposed faces).

For any closed convex cone $\mathcal{K}$ and vectors $v_{1},v_{2}\in\operatorname{{\mathcal{K}}}^{*}$ , equality holds:

$$(v_{1}^{\perp}\cap\mathcal{K})\cap(v_{2}^{\perp}\cap\mathcal{K})=(v_{1}+v_{2})^{\perp}\cap\mathcal{K}.$$

Proof 2.2.6.

The inclusion $\subseteq$ is trivial. To see the converse, note for any $x\in(v_{1}+v_{2})^{\perp}\cap\mathcal{K}$ we have $\langle v_{1},x\rangle\geq 0$ , $\langle v_{2},x\rangle\geq 0$ , while $\langle v_{1}+v_{2},x\rangle=0$ . We deduce $x\in v^{\perp}_{1}\cap v^{\perp}_{2}$ as claimed.

In other words, if the faces $F_{1}\unlhd\mathcal{K}$ and $F_{2}\unlhd\mathcal{K}$ are exposed by $v_{1}$ and $v_{2}$ , respectively, then the intersection $F_{1}\cap F_{2}$ is a face exposed by the sum $v_{1}+v_{2}$ .

A convex cone is called facially exposed if all of its faces are exposed. The distinction between faces and exposed faces may appear mild at first sight; however, we will see that it is exactly this distinction that can cause difficulties for preprocessing techniques for general conic problems.

Definition 2.2.7 (Conjugate face).

With any face $F$ of a convex cone $\mathcal{K}$ , we associate a face of the dual cone $\mathcal{K}^{*}$ , called the conjugate face, $F^{\triangle}:=\mathcal{K}^{*}\cap F^{\perp}$ .

Equivalently, $F^{\triangle}$ is the face of $\mathcal{K}^{*}$ exposed by any point $x\in\operatorname{{ri}}F$ , that is $F^{\triangle}=\mathcal{K}^{*}\cap x^{\perp}$ . Thus, in particular, conjugate faces are always exposed. Not surprisingly, one can readily verify that equality $(F^{\triangle})^{\triangle}=F$ holds if, and only if, the face $F\unlhd\mathcal{K}$ is exposed.

We illustrate the concepts with our two running examples, ${\bf R}^{n}_{+}$ and ${\bf S}^{n}_{+}$ , keeping in mind the parallels between the two.

Example 2.2.8 (Faces of ${\bf R}^{n}_{+}$ ).

For any index set $I\subseteq\{1,\ldots,n\}$ , the set

$$F_{I}=\{x\in{\bf R}^{n}_{+}:x_{i}=0\text{ for all }i\in I\}$$

is a face of ${\bf R}^{n}_{+}$ , and all faces of ${\bf R}^{n}_{+}$ are of this form. In particular, observe that all faces of ${\bf R}^{n}_{+}$ are linearly isomorphic to ${\bf R}^{k}_{+}$ for some positive integer $k$ . In this sense, ${\bf R}^{n}_{+}$ is “self-replicating”. The relative interior of $F_{I}$ consists of all points in $F_{I}$ with $x_{i}>0$ for indices $i\notin I$ . The face $F_{I}$ is exposed by the vector $v\in{\bf R}^{n}_{+}$ with $v_{i}=1$ for all $i\in I$ and $v_{i}=0$ for all $i\notin I$ . In particular, ${\bf R}^{n}_{+}$ is a facially exposed convex cone. The face conjugate to $F_{I}$ is $F_{I^{c}}$ .
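Because faces of ${\bf R}^{n}_{+}$ are determined by index sets, the statements of this example and of Proposition 2.2.5 reduce to set operations on supports. The sketch below illustrates this under our own conventions: indices are 0-based, and the helper names `exposed_index_set` and `in_face` are hypothetical.

```python
def exposed_index_set(v):
    # For v in R^n_+, the exposed face v^perp ∩ R^n_+ equals F_I,
    # where I is the support of v (0-based indices in this sketch).
    return frozenset(i for i, vi in enumerate(v) if vi > 0)

def in_face(x, I):
    # Membership in F_I = {x >= 0 : x_i = 0 for all i in I}.
    return all(xi >= 0 for xi in x) and all(x[i] == 0 for i in I)

v1 = [1, 0, 0, 0]
v2 = [0, 2, 0, 0]
I1, I2 = exposed_index_set(v1), exposed_index_set(v2)
v_sum = [a + b for a, b in zip(v1, v2)]

# Proposition 2.2.5: the intersection of the two exposed faces is exposed
# by v1 + v2, i.e. it is the face indexed by the union of the supports.
print(exposed_index_set(v_sum) == I1 | I2)   # True

x = [0, 0, 5, 1]
print(in_face(x, I1), in_face(x, I2), in_face(x, I1 | I2))   # True True True

# The face conjugate to F_I is F_{I^c}: its points are orthogonal to F_I.
n = 4
I = I1 | I2
I_c = frozenset(range(n)) - I
y = [3, 1, 0, 0]
print(in_face(y, I_c), sum(a * b for a, b in zip(x, y)))     # True 0
```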

Example 2.2.9 (Faces of ${\bf S}^{n}_{+}$ ).

There are a number of different ways to think about (and represent) faces of the PSD cone ${\bf S}^{n}_{+}$ . In particular, one can show that faces of ${\bf S}^{n}_{+}$ are in correspondence with linear subspaces of ${\bf R}^{n}$ . More precisely, for any $r$ -dimensional linear subspace $\mathcal{R}$ of ${\bf R}^{n}$ , the set

$$F_{\mathcal{R}}:=\{X\in{\bf S}^{n}_{+}:\operatorname{range}X\subseteq\mathcal{R}\}\tag{2.3}$$

is a face of ${\bf S}^{n}_{+}$ . Conversely, any face of ${\bf S}^{n}_{+}$ can be written in the form (2.3), where ${\mathcal{R}}$ is the range space of any matrix $X$ lying in the relative interior of the face. The relative interior of $F_{\mathcal{R}}$ consists of all matrices $X\in{\bf S}^{n}_{+}$ whose range space coincides with ${\mathcal{R}}$ . Moreover, for any matrix $V\in{\bf R}^{n\times r}$ satisfying $\operatorname{{range}}V={\mathcal{R}}$ , we have the equivalent description

$$F_{\mathcal{R}}=V{\bf S}^{r}_{+}V^{T}.$$

In particular, $F_{\mathcal{R}}$ is linearly isomorphic to the $r$ -dimensional positive semi-definite cone ${\bf S}^{r}_{+}$ . The face conjugate to $F_{\mathcal{R}}$ is $F_{\mathcal{R}^{\perp}}$ and can be equivalently written as

$$F_{\mathcal{R}^{\perp}}=U{\bf S}^{n-r}_{+}U^{T},$$

for any matrix $U\in{\bf R}^{n\times(n-r)}$ satisfying $\operatorname{{range}}U={\mathcal{R}}^{\perp}$ . Notice that then the matrix $UU^{T}$ lies in the relative interior of $F^{\triangle}$ and therefore $UU^{T}$ exposes the face $F_{\mathcal{R}}$ . In particular, ${\bf S}^{n}_{+}$ is facially exposed and also self-replicating.
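A minimal instance of this example takes $n=2$ and $\mathcal{R}$ spanned by $(1,1)$, so that $V=(1,1)^{T}$ and $U=(1,-1)^{T}$. The sketch below, with these illustrative choices and hypothetical helper names, checks that $UU^{T}$ is orthogonal to every element $V s V^{T}$ of the face (with $s\geq 0$) while having positive inner product with a positive semi-definite matrix outside the face.

```python
def outer(u, w):
    # Rank-one matrix u w^T.
    return [[ui * wj for wj in w] for ui in u]

def trace_inner(X, Y):
    # Trace inner product <X, Y> = sum_{i,j} X_ij Y_ij.
    n = len(X)
    return sum(X[i][j] * Y[i][j] for i in range(n) for j in range(n))

def matvec(X, v):
    return [sum(X[i][j] * v[j] for j in range(len(v))) for i in range(len(X))]

V = [1, 1]    # spans the subspace R (illustrative choice)
U = [1, -1]   # spans the orthogonal complement of R
exposing = outer(U, U)   # U U^T, a candidate exposing vector of F_R

def face_element(s):
    # Element V s V^T of the face F_R = V S^1_+ V^T, for a scalar s >= 0.
    return [[s * e for e in row] for row in outer(V, V)]

X = face_element(3)
print(trace_inner(X, exposing))          # 0: X is orthogonal to U U^T
print(matvec(X, U))                      # [0, 0]: range X is orthogonal to U

identity = [[1, 0], [0, 1]]
print(trace_inner(identity, exposing))   # 2: the identity is PSD but not in F_R
```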

2.3 Conic optimization problems

Modern conic optimization draws fundamentally from “duality”: every conic optimization problem gives rise to a related conic optimization problem, called its dual. Consider the primal-dual pair:

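$$\begin{array}{cc}\textbf{(P)}\,\begin{array}{cc}\inf&\langle c,x\rangle\\ \textrm{s.t.}&Ax=b\\ &x\succeq_{\mathcal{K}}0\end{array}&\,\qquad\qquad\textbf{(D)}\,\begin{array}{rc}\sup&\langle b,y\rangle\\ \textrm{s.t.}&A^{*}y\preceq_{\mathcal{K}^{*}}c\end{array}\end{array}$$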

Here, $\mathcal{K}$ is a closed convex cone in ${\mathbb{\bf E}}$ and the mapping $A\colon{\mathbb{\bf E}}\to{\bf\mathbb{\bf F}}$ is linear. Eliminating the trivial case that the system $Ax=b$ has no solution, we will always assume that $b$ lies in $\operatorname{range}A$ , and that $\mathcal{K}$ has nonempty interior. Two examples of conic optimization are of main importance for us: linear programming (LP) corresponds to $\mathcal{K}={\bf R}^{n}_{+}$ , ${\bf\mathbb{\bf F}}={\bf R}^{m}$ and semi-definite programming (SDP) corresponds to $\mathcal{K}={\bf S}^{n}_{+}$ , ${\bf\mathbb{\bf F}}={\bf R}^{m}$ . The adjoint $A^{*}$ in both cases was computed in Examples 2.1.1 and 2.1.2. We will also use the following notation for the primal and dual feasible regions:

[TABLE]

It is important to note that the dual can be put in a primal form by introducing slack variables $s\in{\mathbb{\bf E}}$ leading to the equivalent formulation

[TABLE]

To a reader unfamiliar with conic optimization, it may be unclear how the dual arises naturally from the primal. Let us see how it can be done. The dual problem (D) can be discovered through “Lagrangian duality” in convex optimization. Define the Lagrangian function

[TABLE]

and observe the equality

[TABLE]

Thus the primal problem (P) is equivalent to

[TABLE]

Formally exchanging min and max yields exactly the dual problem (D):

[TABLE]

The primal-dual pair always satisfies the weak duality inequality: for any primal feasible $x$ and any dual feasible $y$ , we have

$$0\leq\langle c-A^{*}y,x\rangle=\langle c,x\rangle-\langle b,y\rangle.\tag{2.6}$$

Thus for any feasible point $y$ of the dual, its objective value $\langle b,y\rangle$ lower-bounds the optimal value of the primal. The weak duality inequality (2.6) leads to the following sufficient conditions for optimality.

Proposition 2.3.1 (Complementary slackness).

Suppose that $(x,y)$ are a primal-dual feasible pair for (P), (D) and suppose that complementary slackness holds:

$$\langle c-A^{*}y,x\rangle=0.$$

Then $x$ is a minimizer of (P) and $y$ is a maximizer of (D).
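For completeness, the short argument behind Proposition 2.3.1 can be written out as follows. This is a sketch that assumes the standard forms of the pair: (P) minimizes $\langle c,x\rangle$ over $Ax=b$, $x\in\mathcal{K}$, and (D) maximizes $\langle b,y\rangle$ over $c-A^{*}y\in\mathcal{K}^{*}$. The symbols $x'$ and $y'$ denote arbitrary primal and dual feasible points and are introduced here only for illustration.

```latex
\begin{aligned}
\langle c,x'\rangle-\langle b,y'\rangle
 &=\langle c,x'\rangle-\langle Ax',y'\rangle
  =\langle c-A^{*}y',\,x'\rangle\;\geq\;0
 &&\text{(weak duality, since } x'\in\mathcal{K},\ c-A^{*}y'\in\mathcal{K}^{*}\text{)},\\
\langle c-A^{*}y,\,x\rangle=0
 &\;\Longrightarrow\;
 \langle c,x\rangle=\langle b,y\rangle
 &&\text{(complementary slackness)},\\
\langle b,y'\rangle\;\leq\;\langle c,x\rangle
 &=\langle b,y\rangle\;\leq\;\langle c,x'\rangle
 &&\text{(weak duality applied to } (x,y') \text{ and } (x',y)\text{)}.
\end{aligned}
```

The last line says precisely that $x$ minimizes (P) and $y$ maximizes (D).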

The sufficient conditions for optimality of Proposition 2.3.1 are often summarized as the primal-dual system:

[TABLE]

Derivations of algorithms generally require necessary optimality conditions: failure of the necessary conditions at a current approximation of the optimum leads to improvement steps. When are the sufficient conditions for optimality expressed in Proposition 2.3.1 also necessary? In other words, when can we be sure that optimality of a primal solution $x$ can be certified by the existence of some dual feasible point $y$ such that the pair satisfies the complementary slackness condition? Conditions guaranteeing existence of Lagrange multipliers are called constraint qualifications. The most important condition of this type is called strict feasibility, or Slater’s condition, and is the main topic of this article.

Definition 2.3.2 (Strict feasibility/Slater condition).

We say that (P) is strictly feasible if there exists a point $x\succ_{\mathcal{K}}0$ satisfying $Ax=b$ . The dual (D) is strictly feasible if there exists a point $y$ satisfying $A^{*}y\prec_{\mathcal{K}^{*}}c$ .

The following result is the cornerstone of conic optimization.

Theorem 2.3.3 (Strong duality).

If the primal objective value is finite and the problem (P) is strictly feasible, then the primal and dual optimal values are equal, and the dual (D) admits an optimal solution. In addition, for any $x$ that is optimal for (P), there exists a vector $y$ such that $(x,y)$ satisfies complementary slackness.

Similarly, if the dual objective value is finite and the dual (D) satisfies strict feasibility, then the primal and dual optimal values are equal and the primal (P) admits an optimal solution. In addition, for any $y$ that is optimal for (D), there exists a point $x$ such that $(x,y)$ satisfies complementary slackness.

Without a constraint qualification such as strict feasibility, the previous theorem is decisively false. The following examples show that without strict feasibility, the primal and dual optimal values may not even be equal, and even if they are equal, the optimal values may be unattained.

Example 2.3.4 (Infinite gap).

Consider the following primal SDP in (2.4):

[TABLE]

The corresponding dual SDP is the infeasible problem

[TABLE]

Both the primal and the dual fail strict feasibility in this example.

Example 2.3.5 (Positive duality gap).

Consider the following primal SDP in (2.4):

[TABLE]

The constraint $X\succeq 0$ with $X_{33}=0$ implies equality $X_{13}=0$ , and hence $X_{22}=1$ . Therefore, $v_{p}=1$ and the matrix $e_{2}e_{2}^{T}$ is optimal.

The corresponding dual SDP is

[TABLE]

This time the SDP constraint implies $y_{2}=0$ . We deduce that $(y_{1},y_{2})=(0,0)$ is optimal for the dual and hence $v_{d}=0<v_{p}=1$ . There is a finite duality gap between the primal and dual problems. The culprit again is that both the primal and the dual fail strict feasibility.
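The implication $X_{33}=0\Rightarrow X_{13}=0$ used for the primal in this example is an instance of the standard fact that a zero diagonal entry of a positive semi-definite matrix forces the corresponding row and column to vanish. A brief sketch via a $2\times 2$ principal minor, using the constraint $X_{22}+2X_{13}=1$ from the primal:

```latex
X\succeq 0
\;\Longrightarrow\;
\det\begin{bmatrix}X_{11}&X_{13}\\ X_{13}&X_{33}\end{bmatrix}
 =X_{11}X_{33}-X_{13}^{2}\;\geq\;0,
\qquad
X_{33}=0\;\Longrightarrow\;X_{13}^{2}\leq 0\;\Longrightarrow\;X_{13}=0
\;\Longrightarrow\;X_{22}=1-2X_{13}=1 .
```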

Example 2.3.6 (Zero duality gap, but no attainment).

Consider the dual SDP

[TABLE]

The only feasible point is $y=0$ . Thus the optimal value is $v_{d}=0$ and is attained. The primal SDP is

[TABLE]

Notice $X_{11}>0$ for all feasible $X$ . On the other hand, the sequence $X^{k}=\begin{bmatrix}1/k&1/2\cr 1/2&k\end{bmatrix}$ is feasible and satisfies $X^{k}_{11}\to 0$ . Thus there is no duality gap, meaning $0=v_{p}=v_{d}$ , but the primal optimal value is not attained. The culprit is that the dual SDP is not strictly feasible.
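The behaviour of the sequence $X^{k}$ can be checked by hand or with a few lines of exact arithmetic. The sketch below is only illustrative: the helper names are hypothetical, and it assumes, as the form of $X^{k}$ suggests, that feasibility pins the off-diagonal entry at $1/2$ while the objective is $X_{11}$.

```python
from fractions import Fraction

def is_positive_definite_2x2(a, b, d):
    """Sylvester's criterion for the symmetric matrix [[a, b], [b, d]]."""
    return a > 0 and a * d - b * b > 0

def x_k(k):
    """Entries (X11, X12, X22) of X^k = [[1/k, 1/2], [1/2, k]]."""
    return Fraction(1, k), Fraction(1, 2), Fraction(k)

ks = (1, 10, 100, 1000)

# Every X^k is positive definite: det X^k = (1/k) * k - 1/4 = 3/4 for all k.
checks = [is_positive_definite_2x2(*x_k(k)) for k in ks]
dets = [x_k(k)[0] * x_k(k)[2] - x_k(k)[1] ** 2 for k in ks]

# The objective values X11 = 1/k decrease toward 0 ...
x11_values = [x_k(k)[0] for k in ks]

# ... but X11 = 0 itself is infeasible: with off-diagonal entry 1/2 the
# determinant is -1/4 whatever the value of X22 (here X22 = 1).
limit_det = Fraction(0) * Fraction(1) - Fraction(1, 2) ** 2
```

The determinant stays at $3/4$ along the whole sequence, so feasibility is never lost, yet the infimum $0$ can only be approached by letting $X_{22}=k$ grow without bound.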

Example 2.3.7 (Convergence to dual optimal value).

Numerical solutions of problems inevitably suffer from some perturbations of the data; that is, a perturbed problem is in fact solved. Moreover, it is often tempting to explicitly perturb a constraint in the problem, so that strict feasibility holds. This example shows that this latter strategy results in the dual of the problem being solved, as opposed to the problem under consideration.

We consider the primal-dual SDP pair in Example 2.3.5. In particular, suppose first that we want to solve the dual problem. We canonically perturb the right-hand side of the dual in (2.8)

[TABLE]

for some matrix $P\succ 0$ and real $\epsilon>0$ . Strict feasibility now holds and we hope that the optimal values of the perturbed problems converge to that of the original one $v_{d}(0)$ . We can rewrite feasibility for the perturbed problem as

[TABLE]

A triple $(y_{1},y_{2},y_{3})$ is strictly feasible if, and only if, the leading principal minors $M_{11},M_{12},M_{123}$ of the left-hand side matrix in (2.9) are all positive. We have $M_{11}=\epsilon P_{11}>0$ . The second leading principal minor as a function of $y_{2}$ is

[TABLE]

In particular, rearranging we have $M_{12}(y_{2})>0$ whenever

[TABLE]

The last minor $M_{123}$ is positive for sufficiently negative $y_{1}$ by the Schur complement. Consequently the perturbed problems satisfy

[TABLE]

and therefore

[TABLE]

That is, the primal optimal value is obtained in the limit rather than the dual optimal value that is sought.

Let us look at an analogous perturbation to the primal problem. Let $A$ be the linear operator $A(X)=(X_{33},X_{22}+2X_{13})$ and set $b=(0,1)$ . Consider the perturbed problems

[TABLE]

for some fixed real $\epsilon>0$ and a matrix $P\succ 0$ . Each such problem is strictly feasible, since the positive definite matrix $\widehat{X}+\epsilon P$ is feasible for any matrix $\widehat{X}$ that is feasible for the original primal problem.

In long form, the perturbed primal problems are

[TABLE]

Consider the matrix

[TABLE]

This matrix satisfies the linear system $AX=b$ by construction and is positive semi-definite for all sufficiently large $X_{11}$ . We deduce $v_{p}(\epsilon)=0=v_{d}$ for all $\epsilon>0$ . Again, as $\epsilon$ tends to zero we obtain the dual optimal value rather than the sought-after primal optimal value.

2.4 Commentary

We follow here well-established notation in convex optimization, as illustrated for example in the monographs of Barvinok [17], Ben-Tal-Nemirovski [20], Borwein-Lewis [21], and Rockafellar [123]. The handbook of SDP [146] and online lecture notes [85] are other excellent sources in the context of semi-definite programming. These include discussion on the facial structure. The relevant results stated in the text can all be found for instance in Rockafellar [123]. Example 2.3.5 is a modification of the example in [114]. In addition, note that the three Examples 2.3.4, 2.3.5, and 2.3.6 have matrices with the special perdiagonal structure. The universality of such special structure in “ill-posed” SDPs has recently been investigated at great length by Pataki [105].

Chapter 3 Virtues of strict feasibility

We have already seen in Theorem 2.3.3 that strict feasibility is essential to guarantee dual attainment and therefore for making the primal-dual optimality conditions (2.7) meaningful. In this section, we continue discussing the impact of strict feasibility on numerical stability. We begin with the theorems of the alternative, akin to the Farkas’ Lemma in linear programming, which quantify the extent to which strict feasibility holds. We then show how such systems appear naturally in stability measures of the underlying problem.

3.1 Theorem of the alternative

The definition we have given of strict feasibility (Slater) is qualitative in nature; that is, it involves no measurements. A convenient way to measure the extent to which strict feasibility holds (i.e., its strength) arises from dual characterizations of the property. We will see that strict feasibility corresponds to inconsistency of a certain auxiliary system. Measures of how close the auxiliary system is to being consistent yield estimates of “stability” of the problem.

The aforementioned dual characterizations stem from the basic hyperplane separation theorem for convex sets.

Theorem 3.1.1 (Hyperplane separation theorem).

Let $Q_{1}$ and $Q_{2}$ be two disjoint nonempty convex sets. Then there exists a nonzero vector $v$ and a real number $c$ satisfying

[TABLE]

When one of the sets is a cone, the separation theorem takes the following “homogeneous” form.

Theorem 3.1.2 (Homogeneous separation).

Consider a nonempty closed convex set $Q$ and a closed convex cone $\mathcal{K}$ with nonempty interior. Then exactly one of the following alternatives holds.

1. The set $Q$ intersects the interior of $\mathcal{K}$ .

2. There exists a vector $0\neq v\in\mathcal{K}^{*}$ satisfying $\langle v,x\rangle\leq 0$ for all $x\in Q$ .

Moreover, for any vector $v$ satisfying the alternative 2, the region $Q\cap\mathcal{K}$ is contained in the proper face $v^{\perp}\cap\mathcal{K}$ .

Proof 3.1.3.

Suppose that $Q$ does not intersect the interior of $\mathcal{K}$ . Then the convex cone generated by $Q$ , denoted by $\mbox{\rm cone}\,Q$ , does not intersect $\mbox{\rm int}\,\mathcal{K}$ either. The hyperplane separation theorem (Theorem 3.1.1) shows that there is a nonzero vector $v$ and a real number $c$ satisfying

[TABLE]

Setting $x=y=0$ , we deduce $c=0$ . Hence $v$ lies in $\mathcal{K}^{*}$ and 2 holds.

Conversely, suppose that 2 holds. Then the inequalities $0\leq\langle v,x\rangle\leq 0$ hold for all $x\in Q\cap\operatorname{{\mathcal{K}}}$ . Thus we deduce that the intersection $Q\cap\operatorname{{\mathcal{K}}}$ lies in the proper face $v^{\perp}\cap\operatorname{{\mathcal{K}}}$ . Hence the alternative 1 can not hold.

Let us now specialize the previous theorem to the primal problem (P), by letting $Q$ be the affine space $\{x:Ax=b\}$ . The resulting theorem is the main result of this subsection and will be used extensively in what follows.

Theorem 3.1.4 (Theorem of the alternative for the primal).

Suppose that $\operatorname{{\mathcal{K}}}$ is a closed convex cone with nonempty interior. Then exactly one of the following alternatives holds.

1. The primal (P) is strictly feasible.

2. The auxiliary system is consistent:

$$0\neq A^{*}y\succeq_{\mathcal{K}^{*}}0\quad\text{and}\quad\langle b,y\rangle\leq 0.\tag{3.1}$$

Suppose that the primal (P) is feasible. Then the auxiliary system (3.1) is equivalent to the system

$$0\neq A^{*}y\succeq_{\mathcal{K}^{*}}0\quad\text{and}\quad\langle b,y\rangle=0.\tag{3.2}$$

Moreover, any vector $y$ satisfying either of the equivalent systems, (3.1) and (3.2), yields a proper face $(A^{*}y)^{\perp}\cap\operatorname{{\mathcal{K}}}$ containing the primal feasible region ${\mathcal{F}}_{p}$ .

Proof 3.1.5.

Set $Q:=\{x:Ax=b\}$ . Clearly strict feasibility of (P) is equivalent to alternative 1 of Theorem 3.1.2. Thus it suffices to show that the auxiliary system (3.1) is equivalent to the alternative 2 of Theorem 3.1.2. To this end, note that for any vector $y$ satisfying (3.1), the vector $v:=A^{*}y$ satisfies the alternative 2 of Theorem 3.1.2. Conversely, consider a vector $0\neq v\in\operatorname{{\mathcal{K}}}^{*}$ satisfying $\langle v,x\rangle\leq 0$ for all $x\in Q$ . Fix a point $\hat{x}\in Q$ and observe the equality $Q=\hat{x}+\operatorname{{null}}(A)$ . An easy argument then shows that $v$ is orthogonal to $\operatorname{{null}}(A)$ , and therefore can be written as $v=A^{*}y$ for some vector $y$ . We deduce $\langle b,y\rangle=\langle A\hat{x},y\rangle=\langle\hat{x},v\rangle\leq 0$ , and therefore $y$ satisfies (3.1).

Next, assume that (P) is feasible. Suppose $y$ satisfies (3.1). Then for any feasible point $x$ of (P), we deduce $0\leq\langle x,A^{*}y\rangle=\langle b,y\rangle\leq 0$ . Thus $y$ satisfies the system (3.2). It follows that the two systems (3.1) and (3.2) are equivalent and that the proper face $(A^{*}y)^{\perp}\cap\operatorname{{\mathcal{K}}}$ contains the primal feasible region ${\mathcal{F}}_{p}$ , as claimed.
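The “easy argument” invoked in the first part of the proof, showing that $v$ is orthogonal to $\operatorname{{null}}(A)$ , can be spelled out as follows. Here $d$ denotes an arbitrary element of the null space and $t$ an arbitrary real number; both are introduced only for this sketch.

```latex
\begin{aligned}
&d\in\operatorname{null}(A),\ t\in\mathbf{R}
 \;\Longrightarrow\;\hat{x}+td\in Q
 \;\Longrightarrow\;\langle v,\hat{x}\rangle+t\,\langle v,d\rangle\leq 0 .\\
&\text{Letting } t\to+\infty \text{ and } t\to-\infty \text{ forces } \langle v,d\rangle=0,
 \text{ and hence } v\in(\operatorname{null}A)^{\perp}=\operatorname{range}A^{*} .
\end{aligned}
```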

Suppose that $\mathcal{F}_{p}$ is nonempty. Then if strict feasibility fails, there always exists a “witness” (or “short certificate”) $y$ satisfying the auxiliary system (3.2). Indeed, given such a vector $y$ , one immediately deduces, as in the proof, that $\mathcal{F}_{p}$ is contained in the proper face $(A^{*}y)^{\perp}\cap\operatorname{{\mathcal{K}}}$ of $\operatorname{{\mathcal{K}}}$ . Such certificates will in a later section be used constructively to regularize the conic problem through the FR procedure.

The analogue of Theorem 3.1.4 for the dual (D) quickly follows.

Theorem 3.1.6 (Theorem of the alternative for the dual).

Suppose that $\operatorname{{\mathcal{K}}}^{*}$ has nonempty interior. Then exactly one of the following alternatives holds.

1. The dual (D) is strictly feasible.

2. The auxiliary system is consistent:

[TABLE]

Suppose that the dual (D) is feasible. Then the auxiliary system (3.3) is equivalent to the system

[TABLE]

Moreover, any vector $x$ satisfying either of the equivalent systems, (3.3) and (3.4), yields a proper face $x^{\perp}\cap\operatorname{{\mathcal{K}}}^{*}$ containing the feasible slacks $\{c-A^{*}y:y\in{\mathcal{F}}_{d}\}$ .

Proof 3.1.7.

Apply Theorem 3.1.4 to the equivalent formulation (2.5) of the dual (D).

3.2 Stability of the solution

In this section, we explain the impact of strict feasibility on stability of the conic optimization problem through quantities naturally arising from the auxiliary system (3.1). For simplicity we focus on the primal problem (P), though an entirely analogous development is possible for the dual, for example by introducing slack variables.

We begin with a basic question: at what rate does the optimal value of the primal problem (P) change relative to small perturbations of the right-hand-side of the linear equality constraints? To formalize this question, define the value function

[TABLE]

The value function $v\colon{\bf\mathbb{\bf F}}\to[-\infty,+\infty]$ thus defined is convex, meaning that its epigraph

[TABLE]

is a convex set. Seeking to understand stability of the primal (P) under perturbation of the right-hand-side $b$ , it is natural to examine the variational behavior of the value function. There is an immediate obstruction, however. If $A$ is not surjective, then there are arbitrarily small perturbations of $b$ making $v$ take on infinite values. As a result, in conjunction with strict feasibility, we will often make the mild assumption that $A$ is surjective. These two properties taken together are referred to as the Mangasarian-Fromovitz Constraint Qualification (MFCQ).

Definition 3.2.1 (Mangasarian-Fromovitz CQ).

We say that the Mangasarian-Fromovitz Constraint Qualification (MFCQ) holds for (P) if $A$ is surjective and (P) is strictly feasible.

The following result describes directional derivatives of the value function.

Theorem 3.2.2 (Directional derivative of the value function).

Suppose the primal problem (P) is feasible and its optimal value is finite. Let $\rm{Sol}\textbf{(D)}$ be the set of optimal solutions of the dual (D). Then $\rm{Sol}\textbf{(D)}$ is nonempty and bounded if, and only if, MFCQ holds. Moreover, under MFCQ, the directional derivative $v^{\prime}(0;w)$ of the value function at $\Delta=0$ in direction $w$ admits the representation

[TABLE]

In particular, in the notation of the above theorem, the local Lipschitz constant of $v$ at the origin,

[TABLE]

coincides with the norm of the maximal-norm dual optimal solution, and is finite if, and only if, MFCQ holds. Is there then an upper-bound on the latter that we can easily write down? Clearly, such a quantity must measure the strength of MFCQ, and is therefore intimately related to the auxiliary system (3.1). To this end, let us define the condition number

[TABLE]

This number is a quantitative measure of MFCQ, and will appear in later sections as well. Some thought shows that it is in essence measuring how close the auxiliary system (3.1) is to being consistent.

Lemma 3.2.3 (Condition number and MFCQ).

The condition number $\rm cond\textbf{(P)}$ is nonzero if, and only if, MFCQ holds.

Proof 3.2.4.

Suppose $\rm cond\textbf{(P)}$ is nonzero. Then clearly $A$ is surjective, since otherwise we could find a unit vector $y$ with $A^{*}y=0$ and $\langle b,y\rangle\leq 0$ . Moreover, the auxiliary system (3.1) is clearly inconsistent, and therefore (P) is strictly feasible.

Conversely, suppose MFCQ holds. Assume for the sake of contradiction $\rm cond\textbf{(P)}\,=0$ . Then there exists a unit vector $y$ satisfying $0\neq A^{*}y\in\operatorname{{\mathcal{K}}}^{*}$ and $\langle b,y\rangle\leq 0$ . Thus (3.1) is consistent, a contradiction.

Theorem 3.2.5 (Boundedness of the dual solution set).

Suppose the problem (P) is feasible with a finite optimal value val. If the condition number $\rm cond\textbf{(P)}$ is nonzero, then the inequality

[TABLE]

holds for all dual optimal solutions $y$ .

Proof 3.2.6.

Consider an optimal solution $y$ of the dual (D). The inclusion $c-A^{*}y\in\operatorname{{\mathcal{K}}}^{*}$ implies $\mbox{\rm dist}_{\operatorname{{\mathcal{K}}}^{*}}(-A^{*}y/\|y\|)\leq\|c\|/\|y\|$ . Moreover, we have $\langle b,\frac{-y}{\|y\|}\rangle=\frac{-\mbox{\rm\bf val}}{\|y\|}$ . We deduce $\rm cond\textbf{(P)}\leq\frac{\max\{\|c\|,-\mbox{\rm\bf val}\}}{\|y\|}$ and the result follows.

Thus the Lipschitz constant of the value function depends on the extent to which MFCQ holds through the condition number. What about stability of the solution set itself? The following theorem, whose proof we omit, answers this question.

Theorem 3.2.7 (Stability of the solution set).

Suppose (P) satisfies MFCQ. Let $\mathcal{F}_{p}(g)$ be the solution set of the perturbed system

[TABLE]

Fix a putative solution $\bar{x}\in\mathcal{F}_{p}$ . Then there exist constants $c>0$ and $\epsilon>0$ so that the inequality

[TABLE]

holds for any $x\in\operatorname{{\mathcal{K}}}\cap B_{\epsilon}(\bar{x})$ and $g\in B_{\epsilon}(0)$ . The infimal value of $c$ over all choices of $\epsilon>0$ so that the above inequalities hold is exactly

[TABLE]

In particular, under MFCQ, we can be sure that for any point $\bar{x}\in\mathcal{F}_{p}$ , there exist $c$ and $\epsilon>0$ satisfying

[TABLE]

In other words, the distance $\mbox{\rm dist}(\bar{x};\mathcal{F}_{p}(g))$ , which measures how far $\mathcal{F}_{p}(g)$ has moved relative to $\bar{x}$ , is bounded by a multiple of the perturbation parameter $\|g\|$ . The proportionality constant $c$ is fully governed by the strength of MFCQ, as measured by the quantity (3.5).

3.3 Distance to infeasibility

In numerical analysis, the notion of stability is closely related to the “distance to infeasibility” – the smallest perturbation needed to make the problem infeasible. A simple example is the problem of solving an equation $Lx=b$ for an invertible matrix $L\colon{\bf R}^{n}\to{\bf R}^{n}$ . Then the Eckart-Young theorem shows equality

[TABLE]

Here $\|G\|$ denotes the operator norm of $G$ . The left-hand-side measures the smallest perturbation $G$ needed to make the system $(L+G)x=b$ singular, while the right-hand-side measures the Lipschitz dependence of the solution to the linear system $Lx=b$ relative to perturbations in $b$ , and yet the two quantities are equal. An entirely analogous situation holds in conic optimization, with MFCQ playing the role of invertibility.
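The identity for an invertible matrix is easy to verify numerically. The sketch below assumes the displayed equality reads $\min\{\|G\|:L+G\text{ is singular}\}=1/\|L^{-1}\|$ in the operator (spectral) norm, uses NumPy, and picks a small test matrix `L` purely for illustration.

```python
import numpy as np

# Hypothetical, well-conditioned test matrix (any invertible L would do).
L = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# Right-hand side: 1 / ||L^{-1}|| in the spectral norm.
inv_norm_reciprocal = 1.0 / np.linalg.norm(np.linalg.inv(L), 2)

# Left-hand side: the smallest singular value of L, which is attained by
# the rank-one perturbation G = -sigma_min * u_min v_min^T.
U, s, Vt = np.linalg.svd(L)          # singular values in descending order
sigma_min = s[-1]
G = -sigma_min * np.outer(U[:, -1], Vt[-1, :])

# L + G has a zero singular value, i.e. it is singular.
perturbed_sigma_min = np.linalg.svd(L + G, compute_uv=False)[-1]
```

The perturbation `G` has spectral norm `sigma_min`, so the distance to singularity is at most `sigma_min`; the Eckart-Young theorem supplies the matching lower bound.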

Definition 3.3.1 (Distance to infeasibility).

The distance to infeasibility of (P) is the infimum of the quantity $\max\{\|G\|,\|g\|\}$ over linear mappings $G$ and vectors $g$ such that the system

[TABLE]

This quantity does not change if instead of the loss of feasibility, we consider the loss of strict feasibility. The following fundamental result equates the condition number (measuring the strength of MFCQ) and the distance to infeasibility.

Theorem 3.3.2 (Strict feasibility and distance to infeasibility).

The following exact equation is always true:

[TABLE]

3.4 Commentary

The classical theorem of the alternative is Farkas’ Lemma, which appears in proofs of duality in linear programming, as well as in more general nonlinear programming, after linearizations. This and more general theorems of the alternative are given in, e.g., the 1969 book by Mangasarian [92] and the 1969 survey paper by Ben-Israel [18]. The specific theorems of the alternative that we use are similar to the one used in the FR development in [23, 22, 24].

The Mangasarian-Fromovitz CQ was introduced in [91]. This condition and its equivalence to stability with respect to perturbations in the data and compactness of the multiplier set has been the center of extensive research, e.g., [54]. The analogous condition for general nonlinear convex constraint systems is the Robinson regularity condition, e.g., [121, 122]. The study of distance to infeasibility and its relation to condition numbers was initiated by Renegar, e.g., [120, 118, 119, 108]. The relation with Slater’s condition is clear. Theorem 3.3.2, as stated, appears in [44], though in essence it is present in Renegar’s work [120, 118].

Chapter 4 Facial reduction

Theorems 3.1.4 and 3.1.6 have already set the stage for the “Facial Reduction” procedure, used for regularizing degenerate conic optimization problems by restricting the problem to smaller and smaller dimensional faces of the cone $\operatorname{{\mathcal{K}}}$ . In this section, we formalize this viewpoint, emphasizing semi-definite programming. Before we proceed with a detailed description, it is instructive to look at the simplest example of Linear Programming. In this case, a single iteration of the facial reduction procedure corresponds to finding redundant variables (in the primal) and implicit equality constraints (in the dual).

4.1 Preprocessing in linear programming

Improvements in the solution methods for large-scale linear programming problems have been dramatic since the late 1980’s. A technique that has become essential in commercial software is a preprocessing step for the linear program before sending it to the solver. The preprocessing has many essential features. For example, it removes redundant variables (in the primal) and implicit equality constraints (in the dual) thus potentially dramatically reducing the size of the problem while simultaneously improving the stability of the model. These steps in linear programming are examples of the Facial Reduction procedure, which we will formalize shortly.

Example 4.1.1 (primal facial reduction).

Consider the problem

[TABLE]

If we sum the two constraints we see

[TABLE]

Thus the coordinates $x_{1}$ , $x_{4}$ , and $x_{5}$ are identically zero on the entire feasible set. In other words, the feasible region is contained in the proper face $\{x\geq 0:x_{1}=x_{4}=x_{5}=0\}$ of the cone ${\bf R}^{n}_{+}$ . The zero coordinates can easily be eliminated and the corresponding columns discarded, yielding the equivalent simplified problem in the smaller face:

[TABLE]

The second equality can now also be discarded as it is equivalent to the first.

How can such implicit zero coordinates be discovered systematically? Not surprisingly, the auxiliary system (3.2) provides the answer:

[TABLE]

Suppose $y$ is feasible for this auxiliary system. Then for any $x$ feasible for the problem, we deduce $0\leq\sum_{i}x_{i}(A^{T}y)_{i}=x^{T}(A^{T}y)=b^{T}y=0$ . Thus all the coordinates $x_{i}$ , for which the strict inequality $(A^{T}y)_{i}>0$ holds, must be zero.
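The bookkeeping in this argument can be sketched in a few lines of plain Python. The function name `forced_zero_coordinates` and the small data set are hypothetical (they are not the LP of Example 4.1.1), and the check assumes the auxiliary system has the form $0\neq A^{T}y\geq 0$ , $b^{T}y=0$ .

```python
def forced_zero_coordinates(A, b, y, tol=1e-12):
    """Verify that y satisfies 0 != A^T y >= 0 and b^T y = 0, then return the
    (0-based) indices i with (A^T y)_i > 0. Every x >= 0 with Ax = b must
    have x_i = 0 at these indices."""
    m, n = len(A), len(A[0])
    w = [sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]  # w = A^T y
    by = sum(b[i] * y[i] for i in range(m))                        # b^T y
    if min(w) < -tol or max(w) <= tol or abs(by) > tol:
        raise ValueError("y does not satisfy the auxiliary system")
    return [j for j in range(n) if w[j] > tol]

# Hypothetical data: x1 + x2 = 0, x3 = 1, x >= 0.
A = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [0.0, 1.0]
y = [1.0, 0.0]          # A^T y = (1, 1, 0) and b^T y = 0

zero_idx = forced_zero_coordinates(A, b, y)
```

Here the certificate identifies the first two coordinates as identically zero on the feasible set, so the corresponding columns of `A` can be dropped. Finding a suitable `y` in the first place requires solving a linear program, which this sketch does not attempt.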

Example 4.1.2 (dual facial reduction).

A similar procedure applies to the dual. Consider the problem

[TABLE]

Twice the third row plus the fourth row sums to zero. We conclude that the last two constraints are implicit equality constraints. Thus after substituting $\begin{pmatrix}y_{1}\cr y_{2}\end{pmatrix}=\begin{pmatrix}1\cr 0\end{pmatrix}+t\begin{pmatrix}1\cr 1\end{pmatrix}$ , we obtain a simple univariate problem. Again, this discovery of implicit equality constraints can be done systematically by considering the auxiliary system (3.4):

[TABLE]

Suppose we find such a vector $x$ . Then for any feasible vector $y$ we deduce $0\leq\sum_{i}x_{i}(c-A^{T}y)_{i}=x^{T}(c-A^{T}y)=0$ . Thus for each positive component $x_{i}>0$ , the corresponding inequality $(c-A^{T}y)_{i}\geq 0$ is fulfilled with equality along the entire feasible region.

4.2 Facial reduction in conic optimization

Keeping in mind the example of Linear Programming, we now formally describe the Facial Reduction procedure. To do this, consider the primal problem (P) failing Slater’s condition. Our goal is to find an equivalent problem that does satisfy Slater’s condition. To this end, suppose that we had a description of $\operatorname{{face}}(\mathcal{F}_{p},\operatorname{{\mathcal{K}}})$ – the minimal face of $\operatorname{{\mathcal{K}}}$ containing the feasible region $\mathcal{F}_{p}$ . Then we could replace $\operatorname{{\mathcal{K}}}$ with $\operatorname{{\mathcal{K}}}^{\prime}:=\operatorname{{face}}(\mathcal{F}_{p},\operatorname{{\mathcal{K}}})$ , ${\mathbb{\bf E}}$ with ${\mathbb{\bf E}}^{\prime}:=\mbox{\rm span}\,\operatorname{{\mathcal{K}}}^{\prime}$ , and $A$ with its restriction to ${\mathbb{\bf E}}^{\prime}$ . The resulting smaller dimensional primal problem would automatically satisfy Slater’s condition, since $\mathcal{F}_{p}$ intersects the relative interior of $\operatorname{{\mathcal{K}}}^{\prime}$ .

The Facial Reduction procedure is a conceptual method that at termination discovers $\operatorname{{face}}(\mathcal{F}_{p},\operatorname{{\mathcal{K}}})$ . Suppose that $\mathcal{K}$ has nonempty interior. In the first iteration, the scheme determines any vector $y$ satisfying the auxiliary system (3.2). If no such vector exists, Slater’s condition holds and the method terminates. Otherwise, Theorem 3.1.4 guarantees that $\mathcal{F}_{p}$ lies in the proper face $\mathcal{K}^{\prime}:=(A^{*}y)^{\perp}\cap\mathcal{K}$ . Treating $\mathcal{K}^{\prime}$ as a subset of the ambient Euclidean space ${\mathbb{\bf E}}^{\prime}:=\mbox{\rm span}\,\mathcal{K}^{\prime}$ yields the smaller dimensional reformulation of (P):

[TABLE]

We can now repeat the process on this problem. Since the dimension of the problem decreases with each facial reduction iteration, the procedure will terminate after at most $\dim{\mathbb{\bf E}}$ steps.

Definition 4.2.1 (Singularity degree).

The singularity degree of (P), denoted ${\rm sing}\textbf{(P)}$ , is the minimal number of iterations that are necessary for the Facial Reduction to terminate, over all possible choices of certificates generated by the auxiliary systems in each iteration.

The singularity degree of linear programs is at most one, as we will see shortly. More generally, such as for semi-definite programming problems, the singularity degree can be much higher.

The Facial Reduction procedure applies to the dual problem (D) by using the equivalent primal form (2.5) and using Theorem 3.1.6. We leave the details to the reader.

4.3 Facial reduction in semi-definite programming

Before discussing further properties of the Facial Reduction algorithm in conic optimization, let us illustrate the procedure in semi-definite programming. To this end, consider the primal problem (P) with $\mathcal{K}={\bf S}^{n}_{+}$ . Suppose that we have available a vector $y$ feasible for the auxiliary system (3.2). Form now an eigenvalue decomposition

[TABLE]

where $\begin{bmatrix}U&V\end{bmatrix}\in{\bf R}^{n\times n}$ is an orthogonal matrix and $\Lambda\in{\bf S}^{r}_{++}$ is a diagonal matrix. Then as we have seen in Example 2.2.9, the matrix $A^{*}y$ exposes the face $V{\bf S}^{n-r}_{+}V^{T}$ of ${\bf S}^{n}_{+}$ . Consequently, defining the linear map $\widetilde{A}(Z)=A(VZV^{T})$ , the primal problem (P) is equivalent to the smaller dimensional SDP

[TABLE]

Thus one step of facial reduction is complete. Similarly let us look at the dual problem (D). Suppose that $X$ is feasible for the auxiliary system (3.4). Let us form an eigenvalue decomposition

[TABLE]

where $\begin{bmatrix}U&V\end{bmatrix}\in{\bf R}^{n\times n}$ is an orthogonal matrix and $\Lambda\in{\bf S}^{r}_{++}$ is a diagonal matrix. The face exposed by $X$ , namely $V{\bf S}^{n-r}_{+}V^{T}$ , contains all the feasible slacks $\{C-A^{*}y:y\in\mathcal{F}_{d}\}$ by Theorem 3.1.6. Thus defining the linear map $L(y):=V^{T}(A^{*}y)V$ , the dual (D) is equivalent to the problem

[TABLE]

Thus one step of Facial Reduction is complete and the process can continue.
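A single primal reduction step of this kind can be sketched with NumPy. The function name `facial_reduction_step`, the tolerance used to decide which eigenvalues count as zero, and the example data are all assumptions made for illustration; the sketch takes the exposing matrix $W=A^{*}y$ as given and does not solve the auxiliary system. It relies on the identity $\langle A_{i},VZV^{T}\rangle=\langle V^{T}A_{i}V,Z\rangle$ to express $\widetilde{A}$ through reduced constraint matrices.

```python
import numpy as np

def facial_reduction_step(W, constraint_matrices, tol=1e-9):
    """One primal facial reduction step for an SDP.

    W is assumed to be a nonzero positive semi-definite exposing matrix
    (W = A^* y for a certificate y). Returns V, an orthonormal basis of
    null(W), together with the reduced constraint matrices V^T A_i V."""
    eigvals, eigvecs = np.linalg.eigh(W)       # eigenvalues in ascending order
    if eigvals[0] < -tol:
        raise ValueError("W must be positive semi-definite")
    V = eigvecs[:, eigvals <= tol]             # columns spanning null(W)
    reduced = [V.T @ Ai @ V for Ai in constraint_matrices]
    return V, reduced

# Hypothetical data in S^3: the exposing matrix e2 e2^T and one constraint
# matrix A1 = e1 e1^T.
e = np.eye(3)
W = np.outer(e[1], e[1])
A1 = np.outer(e[0], e[0])

V, reduced = facial_reduction_step(W, [A1])
```

In this instance `V` spans the coordinate subspace orthogonal to $e_{2}$ , so the reduced problem lives in ${\bf S}^{2}_{+}$ . In floating-point arithmetic the choice of `tol` determines the numerical rank of `W`, which is a delicate point in practice.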

To drive the point home, the following simple example shows that for SDP the singularity degree can indeed be strictly larger than one.

Example 4.3.1 (Singularity degree larger than one).

Consider the primal SDP feasible region

[TABLE]

Notice the equality $X_{22}=0$ forces the second row and column of $X$ to be zero, i.e. they are redundant. Let us see how this will be discovered by Facial Reduction.

The linear map $A\colon{\bf S^{3}}\to{\bf R}^{3}$ has the form

[TABLE]

for the matrices

[TABLE]

Notice that $\mathcal{F}_{p}$ is nonempty since it contains the rank $1$ matrix $e_{1}e_{1}^{T}$ . The auxiliary system (3.2) then reads

[TABLE]

Looking at the second principal minor, we see $v_{2}=0$ . Thus all feasible $v$ are positive multiples of the vector $e_{3}=\begin{pmatrix}0&0&1\end{pmatrix}^{T}$ . One step of Facial Reduction using the exposing vector $A^{*}e_{3}=e_{2}e_{2}^{T}$ yields the equivalent reduced region

[TABLE]

This reduced problem clearly fails Slater’s condition, and Facial Reduction can continue. Thus the singularity degree of this problem is exactly two.

The pathological Example 4.3.1 can be generalized to higher dimensional space with $n=m$ , by nesting, leading to problems with singularity degree $n-1$ ; the construction is explained in Tunçel [136, page 43].

4.4 What facial reduction actually does

There is a direct and enlightening connection between Facial Reduction and the geometry of the image set $A\operatorname{{\mathcal{K}}}$ . To elucidate this relationship, we first note the following equivalent characterization of Slater’s condition.

Proposition 4.4.1 (Range space characterization of Slater).

The primal problem (P) is strictly feasible if, and only if, the vector $b$ lies in the relative interior of $A(\mathcal{K})$ .

The following is the central result of this section.

Theorem 4.4.2 (Fundamental description of the minimal face).

Assume that the primal (P) is feasible. Then a vector $y$ exposes a proper face of $A(\mathcal{K})$ containing $b$ if, and only if, $y$ satisfies the auxiliary system (3.2). Defining for notational convenience $N:=\operatorname{{face}}(b,A(\mathcal{K}))$ , the following are true.

(I) We always have:

[TABLE]

(II) For any vector $y\in{\bf\mathbb{\bf F}}$ the following equivalence holds:

[TABLE]

In particular, the inequality ${\rm sing}\textbf{(P)}\,\leq 1$ holds if, and only if, $\operatorname{{face}}(b,A(\mathcal{K}))$ is an exposed face of $A(\mathcal{K})$ .

Some commentary is in order. First, as noted in Proposition 4.4.1, the primal (P) is strictly feasible if, and only if, the right-hand-side $b$ lies in $\operatorname{{ri}}A(\operatorname{{\mathcal{K}}})$ . Thus when strict feasibility fails, the set $N=\operatorname{{face}}(b,A(\mathcal{K}))$ is a proper face of the image $A(\mathcal{K})$ . The theorem above yields the exact description $\operatorname{{face}}(\mathcal{F}_{p},\mathcal{K})=\mathcal{K}\cap A^{-1}N$ of the object we are after. On the other hand, determining a facial description of $A(\mathcal{K})$ is a difficult proposition. Indeed, even when $\mathcal{K}$ is a simple cone, the image $A(\mathcal{K})$ can be a highly nontrivial object. For instance, the image $A({\bf S}^{n}_{+})$ may fail to be facially exposed or even closed; examples are forthcoming.

Seeking to obtain a description of $N$ , one can instead try to find “certificates” $y$ exposing a proper face of $A(\mathcal{K})$ containing $b$ . Such vectors $y$ are precisely those satisfying the auxiliary system (3.2). In particular, part II of Theorem 4.4.2 yields a direct obstruction to having low singularity degree: ${\rm sing}\textbf{(P)}\,\leq 1$ if, and only if, $\operatorname{{face}}(b,A(\mathcal{K}))$ is an exposed face of $A(\mathcal{K})$ . Thus the lack of facial exposedness of the image $A(\mathcal{K})$ can become an obstruction. For the cone $\mathcal{K}={\bf R}^{n}_{+}$ , the image $A(\mathcal{K})$ is polyhedral and is therefore facially exposed. On the other hand, linear images of the cone $\mathcal{K}={\bf S}^{n}_{+}$ can easily fail to be facially exposed. This in essence is the reason why preprocessing for general conic optimization is much more difficult than its linear programming counterpart (having singularity degree at most one).

The following two examples illustrate the possibly complex geometry of image sets $A({\bf S}^{n}_{+})$ .

Example 4.4.3 (Linear image not closed).

Define the linear map $A\colon{\bf S}^{2}\to{\bf R}^{2}$ by

[TABLE]

Then the image $A({\bf S}_{+}^{2})$ is not closed, since

[TABLE]

More broadly, it is easy to see the equality

[TABLE]

Example 4.4.4 (Linear image that is not facially exposed).

Consider the feasible region in Example 4.3.1. There we showed that the singularity degree is equal to two. Consequently, by Theorem 4.4.2 we know that the minimal face of $A(\mathcal{K})$ containing $b=\begin{pmatrix}1&0&0\end{pmatrix}^{T}$ must be nonexposed.

Let us verify this directly. To this end, we can without loss of generality treat $A$ as mapping into ${\bf S}^{2}$ via

[TABLE]

and identify $b$ with $e_{1}e_{1}^{T}$ . Then the image $A({\bf S}^{3}_{+})$ is simply the sum,

[TABLE]

See Figure 4.1.

Consider the set

[TABLE]

We claim that $G$ is a face of $A({\bf S}^{3}_{+})$ and is therefore the minimal face containing $e_{1}e_{1}^{T}$ . Indeed, suppose we may write

[TABLE]

for some matrices $X,X^{\prime}\in{\bf S}^{3}_{+}$ . Comparing the $2,2$ -entries, we deduce $X_{22}=X_{22}^{\prime}=0$ and consequently $X_{12}=X_{12}^{\prime}=0$ . Comparing the 1,2-entries yields $X_{33}=X_{33}^{\prime}=0$ . Thus both summands lie in $G$ ; therefore $G$ is a face of the image $A({\bf S}^{3}_{+})$ . Next, using Lemma 2.1.7, observe

[TABLE]

Consequently, any matrix exposing $G$ must lie in the set

[TABLE]

On the other hand, the set

[TABLE]

is clearly strictly larger than $G$ . Hence $G$ is not an exposed face.

4.5 Singularity degree and the Hölder error bound in SDP

For semi-definite programming, the singularity degree plays an especially important role, controlling the Hölderian stability of the feasible region. Consider two sets $Q_{1}$ and $Q_{2}$ in ${\mathbb{\bf E}}$ . A convenient way to understand the regularity of the intersection $Q_{1}\cap Q_{2}$ is to determine the extent to which the computable residuals, $\mbox{\rm dist}(x,Q_{1})$ and $\mbox{\rm dist}(x,Q_{2})$ , bound the error $\mbox{\rm dist}(x,Q_{1}\cap Q_{2})$ . Relationships of this sort are commonly called error bounds of the intersection and play an important role for convergence and stability of algorithms. Of particular importance are Hölderian error bounds – those asserting the inequalities

[TABLE]

on compact sets, for some powers $q\geq 0$ . For semi-definite programming, the singularity degree precisely dictates the Hölder exponent $q$ .

Theorem 4.5.1 (Hölderian error bounds from the singularity degree).

Consider a feasible primal SDP problem (P) and define the affine space

[TABLE]

Set $d:={\rm sing}\textbf{(P)}$ . Then for any compact set $U$ , there is a real $c>0$ so that

[TABLE]

What is remarkable about this result is that neither the dimension of the matrices $n$ , the number of inequalities $m$ , nor the rank of the matrices in the region $\mathcal{F}_{p}$ determines the error bound. Instead, it is only the single quantity, the singularity degree, that drives this regularity concept.

Example 4.5.2 (Worst-case example).

Consider the SDP feasible region

[TABLE]

For any feasible $X$ , the constraint $X_{22}=0$ forces $0=X_{12}=X_{33}$ . By an inductive argument, then we deduce $X_{1k}=0$ and $X_{k,k}=0$ for all $k=2,\ldots,n$ . Thus the feasible region $\mathcal{F}_{p}$ coincides with the ray $\mbox{\rm cone}\,(e_{1}e_{1}^{T})$ .

Given $\epsilon>0$ , define the matrix

[TABLE]

Observe that $X(\epsilon)$ violates the linear constraints only in the requirement $X_{22}=0$ . Consequently, the distance of $X(\epsilon)$ to the linear space $\mathcal{V}:=\{X\in{\bf S}^{n}:A(X)=b\}$ is on the order of $\epsilon$ . On the other hand, the distance of $X(\epsilon)$ to the solution set is at least on the order of $\epsilon^{2^{-(n-1)}}$ . Since a residual of order $\epsilon$ produces an error of order at least $\epsilon^{2^{-(n-1)}}$ , this example shows that the Hölder exponent in this case can be no better than $2^{-(n-1)}$ . Combined with Theorem 4.5.1 and the fact that the feasible region contains rank one matrices, we deduce ${\rm sing}\textbf{(P)}=n-1$ and the Hölder exponent guaranteed by the theorem is sharp.

4.6 Towards computation

The Facial Reduction procedure is conceptual. To implement it, since the error compounds along the iterations, one must be able to either solve the auxiliary systems (3.2) (resp. (3.4)) to machine precision in each iteration or certify that the systems are inconsistent. On the other hand, in general, there is no reason to believe that solving a single auxiliary system is any easier than solving the original problem (P).

One computational approach for facial reduction in SDP, explored by Permenter-Parrilo [110], is to relax the auxiliary problems to ones that are solvable. Instead of considering (3.2), one can choose a convex cone $\widehat{\mathcal{K}}\subseteq{\bf S}^{n}_{+}$ so that consistency of the system can be checked:

[TABLE]

If a vector $y$ satisfying the system is found, then one can perform one step of facial reduction. If not, the scheme quits, possibly without having successfully deduced that Slater’s condition holds. Simple examples of $\widehat{\mathcal{K}}$ are the sets $\operatorname{{Diag}}({\bf R}^{n}_{+})$ and the cone dual to $\{X\in{\bf S}^{n}:\textrm{every }2\times 2\textrm{ minor of }X\textrm{ is PSD}\}$ , where PSD denotes positive semi-definite.

The above feasibility problem is then an instance of linear programming in the first case and of second-order cone programming in the second. More details are provided in [110]. Readers may be skeptical of this strategy since this technique will work only for special types of degeneracy. For example, the first relaxation $\operatorname{{Diag}}({\bf R}^{n}_{+})$ can only detect that some diagonal elements of $X$ are identically zero on the feasible region $\mathcal{F}_{p}$ . On the other hand, it does appear that degeneracy typically arising in applications is highly structured, and promising numerical results have been reported in [110].

There exist other influential techniques for regularizing conic optimization problems that are different from the facial reduction procedure. Two notable examples are Ramana’s extended dual [114] and the homogeneous self-dual embedding, e.g., [39, 109]. The latter, in particular, is used by MOSEK [8] and SeDuMi [128]. A dual approach, called the conic expansion approach, is discussed at length in [142]; see also [88, 104, 90, 127].

We do not discuss these techniques here. Instead, we focus on the most promising class of conic optimization problems – those having singularity degree at most one. In the rest of the manuscript, we provide a series of influential examples, where the structure of the problem enables one to obtain feasible points of the auxiliary systems without having to invoke any solvers. Numerical illustrations show that the resulting reduced subproblems are often much smaller and more stable than the original.

4.7 Commentary

Preprocessing is essential in making LP algorithms efficient. A main ingredient is identifying primal and/or dual slack variables that are identically zero on the feasible set. This is equivalent to facial reduction that reduces the problem to faces of the nonnegative orthant, e.g., [67]. The facial reduction procedure for general conic optimization started in [23, 22, 24]. The procedure provides a primal-dual pair of conic optimization problems that are proper in the sense that the dual of the dual yields the primal. Example 4.3.1 and extensions can be found in Tunçel [136, page 43]. The notion of singularity degree and its connection to error bounds (Theorem 4.5.1) was discovered by Sturm [129, Sect. 4]; Example 4.5.2 appears in this paper as well. Example 4.4.4 is motivated by Example 1 in [106]. Theorem 4.4.2 appeared in [47]. As mentioned previously, there are many approaches to “regularization” of conic optimization problems, aside from facial reduction, including the self-dual embedding and the approximation approaches in [110, 109, 112]. An alternate view of obtaining a dual without a constraint qualification was given in [114, 115], though a relationship to facial reduction was later explained in [116].

Part II Applications and illustrations

In this chapter, we discuss a number of diverse and important computational problems. This includes various matrix completion problems and discrete optimization problems such as the quadratic assignment problem, graph partitioning, and the strengthened relaxation of the maximum cut problem. In the final section, we also discuss sum of squares relaxations for polynomial optimization problems. In each case, we use the structure of the problem to determine a face of the positive semi-definite cone containing the entire feasible region. One exception is the matrix completion problem, where we instead determine a face containing the optimal face, as opposed to the entire feasible region. Numerical experiments illustrate the efficacy of the approach.

Chapter 5 Matrix completions

We begin with various matrix completion problems. Broadly speaking, the goal is to complete a partially specified matrix, while taking into account a priori known structural properties such as a given rank or sparsity. There is a great variety of references on matrix completion problems; see for example [95, 84, 81, 68].

5.1 Positive semi-definite matrix completion

We begin with a classical problem of completing a PSD matrix from partially observed entries. To model this problem, consider an undirected graph $G=(V,E)$ with a vertex set $V=\{1,\ldots,n\}$ and graph edge set $E\subseteq\{ij:1\leq i\leq j\leq n\}$ . The symbols $ij$ and $ji$ always refer to the same edge. Notice we allow self-loops $ii\in E$ . For simplicity, we in fact assume that $E$ contains $ii$ for each node $i\in V$ . Elements $\omega$ of ${\bf R}^{E}$ are called partial matrices, as they specify entries of a partially observed symmetric $n\times n$ matrix. Given a partial matrix $\omega\in{\bf R}^{E}$ , the *PSD completion problem* asks to determine, if possible, a matrix $X$ in the set

[TABLE]

That is, we seek to complete the partial matrix $\omega$ to an $n\times n$ positive semi-definite matrix. When do such PSD completions exist, that is, when is $\mathcal{F}_{p}$ nonempty? Clearly, a necessary condition is that $\omega$ is a partial PSD matrix, meaning that its restriction to any specified principal submatrix is PSD. This condition, however, is not always sufficient.

A graph $G=(V,E)$ is called *PSD completable* if every partial PSD matrix $\omega\in{\bf R}^{E}$ is completable to a PSD matrix. It turns out that PSD completable graphs are precisely those that are chordal. Recall that a graph $G$ is called chordal if any cycle of four or more nodes has a chord – an edge joining two nodes that are not adjacent in the cycle.

Theorem 5.1.1 (PSD completable and chordal graphs).

The graph $G$ is PSD completable if, and only if, $G$ is chordal. (Footnote 1: If all the self-loops are not included in $E$ , one needs to add that the two subgraphs with self-loops and without are disconnected from each other; see e.g., [47].)

Chordal graphs, or equivalently those that are PSD completable, play a fundamental role for the PSD completion problem. For example, on such graphs, the completion problem admits an efficient combinatorial algorithm [61, 83, 124].
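Since chordality decides whether the clique-based arguments below apply, it can be useful to test it directly. The following is a minimal sketch, not the combinatorial algorithm of [61, 83, 124]: it relies on the standard fact that a graph is chordal exactly when one can delete its nodes one at a time, always removing a node whose remaining neighbors are pairwise adjacent. The function name `is_chordal` and the 0-based node labels are our own choices for illustration.

```python
def is_chordal(n, edges):
    """Return True if the simple graph on nodes 0..n-1 is chordal.

    Greedy test: repeatedly delete a simplicial node (one whose remaining
    neighbors are pairwise adjacent). The graph is chordal exactly when
    every node can be deleted this way. Self-loops are ignored.
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    remaining = set(range(n))
    while remaining:
        for v in remaining:
            nbrs = list(adj[v] & remaining)
            if all(b in adj[a] for k, a in enumerate(nbrs) for b in nbrs[k + 1:]):
                remaining.remove(v)   # v is simplicial: delete it and restart
                break
        else:
            return False              # no simplicial node left: not chordal
    return True


path = [(0, 1), (1, 2), (2, 3)]       # path on four nodes
cycle = path + [(0, 3)]               # chordless 4-cycle
print(is_chordal(4, path), is_chordal(4, cycle), is_chordal(4, cycle + [(0, 2)]))
# → True False True
```

The quadratic-time-per-step search is chosen for brevity; linear-time tests exist but are not needed for the small graphs used in the examples of this section.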

Next, we turn to Slater’s condition. Consider the completion problem

[TABLE]

The question marks ? denote the unknown entries. The underlying graph on four nodes is a path and is therefore chordal. The known entries make up a partial PSD matrix since the three specified principal minors are PSD. Thus by Theorem 5.1.1, the completion problem is solvable. Does Slater’s condition hold? The answer is no. The first leading principal minor is singular, and therefore any PSD completion must be singular.

By the same logic, any singular specified principal minor of a partial matrix $\omega\in{\bf R}^{E}$ certifies that strict feasibility fails. Much more is true, however. We now show how any singular specified principal minor of a partial matrix $\omega\in{\bf R}^{E}$ yields a face of ${\bf S}^{n}_{+}$ containing the entire feasible region, allowing one to reduce the dimension of the problem.

To see how this can be done, let us introduce some notation. Define the coordinate projection map $\mathcal{P}_{E}\colon{\bf S}^{n}\to{\bf R}^{E}$ by setting

[TABLE]

In this notation, we can write

[TABLE]

We will now see how the geometry of the image set ${\mathcal{P}}_{E}({\bf S}^{n}_{+})$ , along with Theorem 4.4.2, helps us discover a face of ${\bf S}^{n}_{+}$ containing the feasible region. We note in passing that the image ${\mathcal{P}}_{E}({\bf S}^{n}_{+})$ is always closed. (Footnote 2: If some elements $ii$ do not lie in $E$ , contrary to our simplifying assumption, then the image ${\mathcal{P}}_{E}({\bf S}^{n}_{+})$ can fail to be closed. A precise characterization is given in [47].)

Proposition 5.1.2 (Closure of the image).

The image ${\mathcal{P}}_{E}({\bf S}^{n}_{+})$ is closed.

The reader can check that the adjoint $\mathcal{P}_{E}^{*}\colon{\bf R}^{E}\to{\bf S}^{n}$ simply pads partial matrices in ${\bf R}^{E}$ with zeros:

[TABLE]

For any subset of vertices $\alpha\subseteq V$ , we let

$E[\alpha]:=\{ij\in E:i,j\in\alpha\}$ be the edge set induced by $G$ on $\alpha$ and we set $\omega[\alpha]$ to be the restriction of $\omega$ to $E[\alpha]$ . Define the relaxed region

[TABLE]

Clearly $\alpha\subseteq V$ means we have fewer constraints and

[TABLE]

A subset $\alpha\subseteq V$ is called a clique if for any two nodes $i,j\in\alpha$ the edge $ij$ lies in $E$ . Specified principal minors of $\omega$ correspond precisely to cliques in the graph $G$ . For a clique $\alpha$ of size $k:=|\alpha|$ , we can moreover clearly identify ${\bf R}^{E[\alpha]}$ with the matrix space ${\bf S}^{k}$ . Suppose now that $\omega[\alpha]$ has rank $r<k$ , i.e., the principal submatrix of $\omega$ indexed by $\alpha$ is singular. Then the right-hand-side $\omega[\alpha]$ of (5.3) lies in the boundary of the image set ${\mathcal{P}}_{E[\alpha]}({\bf S}^{n}_{+})\subseteq{\bf S}^{k}_{+}$ . Let $V_{\alpha}$ be an exposing vector of $\operatorname{{face}}(\omega[\alpha],{\bf S}^{k}_{+})$ . Then by Theorem 4.4.2, we can be sure that $W_{\alpha}:={\mathcal{P}}_{E[\alpha]}^{*}(V_{\alpha})$ exposes the minimal face of ${\bf S}^{n}_{+}$ containing the entire region $\mathcal{F}_{p}(\alpha)$ . Given a collection of cliques $\alpha_{1},\alpha_{2},\ldots,\alpha_{l}$ , we can perform the same procedure and deduce that the entire feasible region $\mathcal{F}_{p}$ lies in the face

[TABLE]

which by Proposition 2.2.5 admits the equivalent description

[TABLE]
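The clique procedure just described can be sketched numerically. The code below is only an illustration under our own assumptions: the helper names `exposing_vector` and `clique_face` are hypothetical, the exposing vector of each block is taken to be the orthogonal projector onto its nullspace, and the partial matrix on the four-node path is made-up data rather than the instance (5.1).

```python
import numpy as np


def exposing_vector(block, tol=1e-10):
    """Projector onto the nullspace of a PSD block; it exposes face(block, S^k_+)."""
    eigvals, eigvecs = np.linalg.eigh(block)
    null = eigvecs[:, eigvals < tol]
    return null @ null.T


def clique_face(n, cliques, blocks, tol=1e-10):
    """Sum the zero-padded exposing vectors; return the sum W and a basis of null(W)."""
    W = np.zeros((n, n))
    for alpha, block in zip(cliques, blocks):
        W[np.ix_(alpha, alpha)] += exposing_vector(block, tol)
    eigvals, eigvecs = np.linalg.eigh(W)
    return W, eigvecs[:, eigvals < tol]


# Hypothetical data: path graph on 4 nodes (0-based) with all diagonal entries known.
cliques = [[0, 1], [1, 2], [2, 3]]
blocks = [np.array([[1.0, 1.0], [1.0, 1.0]]),     # singular
          np.array([[1.0, 0.5], [0.5, 1.0]]),     # positive definite, trivial face
          np.array([[1.0, -1.0], [-1.0, 1.0]])]   # singular
W, V = clique_face(4, cliques, blocks)

# One particular PSD completion of the data; it must satisfy W X = 0.
X = np.array([[1.0, 1.0, 0.5, -0.5],
              [1.0, 1.0, 0.5, -0.5],
              [0.5, 0.5, 1.0, -1.0],
              [-0.5, -0.5, -1.0, 1.0]])
print(V.shape, np.allclose(W @ X, 0))   # → (4, 2) True
```

Here every completion can be written as $X=VRV^{T}$ with $R\in{\bf S}^{2}_{+}$ , so the matrix variable shrinks from $4\times 4$ to $2\times 2$ .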

The following example will clarify the strategy.

Example 5.1.3 (Reducing the PSD completion problem).

Let ${\mathcal{F}}_{p}$ consist of all matrices $X\in{\bf S}^{4}_{+}$ solving the PSD completion problem (5.1). There are three nontrivial cliques in the graph, all of size $2$ . The minimal face of ${\bf S}^{2}_{+}$ containing the matrix

[TABLE]

is exposed by

[TABLE]

Moreover, the matrix $\begin{bmatrix}1&-1\\ -1&\,\,\,2\\ \end{bmatrix}$ is positive definite and hence the minimal face of ${\bf S}^{2}_{+}$ containing this matrix is exposed by the all-zero matrix.

The intersection of exposed faces is exposed by the sum of their exposing vectors. We deduce that ${\mathcal{F}}_{p}$ is contained in the face of ${\bf S}^{4}_{+}$ exposed by the sum

[TABLE]

After finding the nullspace of this matrix, we deduce

[TABLE]

The following lemma is another nice consequence of the procedure described in the above example.

Lemma 5.1.4 (Completion of banded all ones matrices).

The matrix of all ones is the unique positive semi-definite matrix satisfying $X_{ij}=1$ for all indices with $|i-j|\leq 1$ .

Proof 5.1.5.

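Consider the edge set $E=\{ij:|i-j|\leq 1\}$ and let $\omega\in{\bf R}^{E}$ be a partial matrix of all ones. Observe $\omega$ has $n-1$ specified $2\times 2$ principal submatrices, one for each pair $\{i,i+1\}$ , each having rank 1. By the same logic as in Example 5.1.3, each of them contributes the exposing vector $\begin{bmatrix}1&-1\\ -1&1\\ \end{bmatrix}$ , padded with zeros, and the sum of these padded matrices has nullspace spanned by the vector of all ones $e$ . Hence ${\mathcal{F}}_{p}$ is contained in the ray $\mbox{\rm cone}\,(ee^{T})$ , and the constraint $X_{11}=1$ leaves only $X=ee^{T}$ . It follows that the feasible region ${\mathcal{F}}_{p}$ is zero-dimensional, as claimed.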
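The argument can be checked numerically for a fixed size. In the sketch below (the size $n=6$ is an arbitrary choice of ours), each rank-one block $\begin{bmatrix}1&1\\ 1&1\end{bmatrix}$ is exposed by $\begin{bmatrix}1&-1\\ -1&1\end{bmatrix}$ , and the sum of the padded exposing vectors is the Laplacian of a path, whose nullspace is the line spanned by $e$ .

```python
import numpy as np

n = 6
W = np.zeros((n, n))
V_block = np.array([[1.0, -1.0], [-1.0, 1.0]])   # exposes face([[1,1],[1,1]], S^2_+)
for i in range(n - 1):
    W[i:i + 2, i:i + 2] += V_block               # pad the exposing vector of clique {i, i+1}

eigvals, eigvecs = np.linalg.eigh(W)
null_basis = eigvecs[:, eigvals < 1e-10]         # basis of null(W)

# The face is {t * v v^T : t >= 0}; the constraint X_11 = 1 fixes t.
v = null_basis[:, 0]
X = np.outer(v, v) / v[0] ** 2
print(null_basis.shape[1], np.allclose(X, np.ones((n, n))))   # → 1 True
```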

The strategy outlined above suggests an algorithm for finding the minimal face based on exploiting cliques in the graph. This strategy is well-founded at least for chordal graphs.

Theorem 5.1.6 (Finding the minimal face on chordal graphs).

*Suppose that $G$ is chordal and consider a partial PSD matrix $\omega\in{\bf R}^{E}$ . Then the following equality holds,*

[TABLE]

*where $\Theta$ denotes the set of all maximal cliques in $G$ . *

On the other hand, it is important to realize that when the graph is not chordal, the minimal face $\operatorname{{face}}({\mathcal{F}}_{p},{\bf S}^{n}_{+})$ is not always guaranteed to be found from cliques alone. The following example shows a PSD completion problem that fails Slater’s condition but where all the faces arising from cliques are trivial.

Example 5.1.7 (Slater condition & nonchordal graphs, [47]).

Let $G=(V,E)$ be the graph with $V=\{1,2,3,4\}$ and $E=\{12,23,34,14\}\cup\{11,22,33,44\}$ . Define the corresponding PSD completion problems $C(\epsilon)$ , parametrized by $\epsilon\geq 0$ :

[TABLE]

Let $\omega(\epsilon)\in{\bf R}^{E}$ denote the corresponding partial matrices. From Lemma 5.1.4, the PSD completion problem $C(0)$ is infeasible, that is, $\omega(0)$ lies outside of ${\mathcal{P}}_{E}({\bf S}^{4}_{+})$ . On the other hand, for all sufficiently large $\epsilon$ , the partial matrices $\omega(\epsilon)$ lie in ${\rm int\,}{\mathcal{P}}_{E}({\bf S}^{4}_{+})$ by diagonal dominance. Since ${\mathcal{P}}_{E}({\bf S}^{4}_{+})$ is closed by Proposition 5.1.2, we deduce that there exists $\hat{\epsilon}>0$ , so that $\omega(\hat{\epsilon})$ lies on the boundary of ${\mathcal{P}}_{E}({\bf S}^{4}_{+})$ , that is, Slater’s condition fails for the completion problem $C(\hat{\epsilon})$ . In fact, it can be shown that the smallest such $\epsilon$ is $\hat{\epsilon}=\sqrt{2}-1$ with completion values of [math] in the $?$ positions in $C(\hat{\epsilon})$ . On the other hand, all specified principal submatrices of $\omega(\epsilon)$ for $\epsilon>0$ are clearly positive definite, and therefore all the corresponding faces are trivial. Thus we have found a partial matrix $\omega(\hat{\epsilon})$ all of whose completions are singular, but whose minimal face cannot be found from an intersection using the cliques of the graph.

Given the importance of singularity degree, the following question arises naturally. Which graphs $G=(V,E)$ have the property that the cone ${\mathcal{P}}_{E}({\bf S}^{n}_{+})$ is facially exposed? Equivalently, on which graphs $G=(V,E)$ does every feasible PSD completion problem have singularity degree at most one? Let us make the following definition. The singularity degree of a graph $G=(V,E)$ is the maximal singularity degree among all completion problems with PSD completable partial matrices $\omega\in{\bf R}^{E}$ .

Chordal graphs have singularity degree one [47], and surprisingly these are the only graphs with this property [132].

Corollary 5.1.8 (Singularity degree of chordal completions).

The graph $G$ has singularity degree one if, and only if, $G$ is chordal.

5.2 Euclidean distance matrix completion, EDMC

In this section, we discuss a problem that is closely related to the PSD completion problem of the previous section, namely the Euclidean distance matrix completion, EDMC, problem. As we will see, the EDMC problem inherently fails Slater’s condition, and facial reduction once again becomes applicable by analyzing certain cliques in a graph.

Setting the stage, fix an undirected graph $G=(V,E)$ on a vertex set $V=\{1,2,\ldots,n\}$ with an edge set $E\subseteq\{ij:1\leq i<j\leq n\}$ . Given a partial matrix $d\in{\bf R}^{E}$ , the Euclidean distance matrix completion problem asks to determine if possible an integer $k$ and a collection of points $p_{1},\ldots,p_{n}\in{\bf R}^{k}$ satisfying

[TABLE]

See Figure 5.1 for an illustration.

We now see how this problem can be modeled as an SDP. To this end, let us introduce the following notation. A matrix $D\in{\bf S}^{n}$ is called a Euclidean distance matrix, EDM, if there exists an integer $k$ and points $p_{1},\ldots,p_{n}\in{\bf R}^{k}$ satisfying

[TABLE]

Such points $p_{1},\ldots,p_{n}$ are said to realize $D$ in ${\bf R}^{k}$ . The smallest integer $k$ such that there exist points in ${\bf R}^{k}$ realizing $D$ is called the embedding dimension of $D$ , and is denoted by $\operatorname{{embdim}}D$ .

We let ${{\mathcal{E}}^{n}}$ denote the set of all $n\times n$ EDM matrices. In this language the EDM completion problem reads: given a partial matrix $d\in{\bf R}^{E}$ determine a matrix in the set

[TABLE]

Thus the EDM completion problem is a conic feasibility problem. Since we are interested in facial reduction, the facial structure of ${{\mathcal{E}}^{n}}$ is central.

Notice that ${{\mathcal{E}}^{n}}$ has empty interior since it is contained in the space of hollow matrices

[TABLE]

A fundamental fact often used in the literature is that ${{\mathcal{E}}^{n}}$ is linearly isomorphic to ${\bf S}^{n-1}_{+}$ . More precisely, consider the mapping

[TABLE]

defined by

[TABLE]

Clearly $\operatorname{{\mathcal{K}}}$ maps into the space of hollow matrices ${\bf S}^{n}_{H}$ . One can quickly verify that the adjoint is given by

[TABLE]

Moreover, the range of the adjoint $\operatorname{{\mathcal{K}}}^{*}$ is the space of centered matrices

[TABLE]

The following result is fundamental.

Theorem 5.2.1 (Parametrization of the EDM cone).

The map $\operatorname{{\mathcal{K}}}\colon{\bf S}^{n}_{c}\to{\bf S}^{n}_{H}$ is a linear isomorphism carrying ${\bf S}^{n}_{c}\cap{\bf S}^{n}_{+}$ onto ${{\mathcal{E}}^{n}}$ . The inverse $\operatorname{{\mathcal{K}}}^{\dagger}\colon{\bf S}^{n}_{H}\to{\bf S}^{n}_{c}$ is the map $\operatorname{{\mathcal{K}}}^{\dagger}(D)=-\frac{1}{2}JDJ$ , where $J:=I-\frac{1}{n}ee^{T}$ is the orthogonal projection onto $e^{\perp}$ . (Footnote 3: In fact, we can consider the map as $\operatorname{{\mathcal{K}}}\colon{\bf S}^{n}\to{\bf S}^{n}$ . Then we still have $\operatorname{{\mathcal{K}}}({\bf S}^{n}_{+})={{\mathcal{E}}^{n}}$ and the Moore-Penrose pseudoinverse $\operatorname{{\mathcal{K}}}^{\dagger}(D)=-\frac{1}{2}J\operatorname{{offDiag}}\left(D\right)J$ , where $\operatorname{{offDiag}}$ zeros out the diagonal.)
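The maps in Theorem 5.2.1 are easy to experiment with. The sketch below assumes the standard entrywise form $\operatorname{{\mathcal{K}}}(X)_{ij}=X_{ii}+X_{jj}-2X_{ij}$ and the adjoint $\operatorname{{\mathcal{K}}}^{*}(D)=2(\operatorname{{Diag}}(De)-D)$ ; both are our reading of the definitions, and the function names are hypothetical. It checks that $\operatorname{{\mathcal{K}}}$ maps a centered Gram matrix to the matrix of squared distances and that $\operatorname{{\mathcal{K}}}^{\dagger}$ recovers the Gram matrix.

```python
import numpy as np


def K(X):
    """Assumed form of the map: K(X)_ij = X_ii + X_jj - 2 X_ij."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2 * X


def K_adjoint(D):
    """Assumed adjoint: K^*(D) = 2 (Diag(D e) - D)."""
    return 2 * (np.diag(D.sum(axis=1)) - D)


def K_dagger(D):
    """Inverse on hollow matrices from Theorem 5.2.1: -1/2 J D J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J


rng = np.random.default_rng(0)
P = rng.standard_normal((5, 2))
P -= P.mean(axis=0)                    # center the points, so X e = 0
X = P @ P.T                            # Gram matrix in S^n_c intersected with S^n_+
D = K(X)

sqdist = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(D, sqdist), np.allclose(K_dagger(D), X))   # → True True
```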

In particular, the cone ${{\mathcal{E}}^{n}}$ is linearly isomorphic to the cone ${\bf S}^{n}_{c}\cap{\bf S}^{n}_{+}$ . On the other hand, observe for any matrix $X\in{\bf S}^{n}_{+}$ the equivalence

[TABLE]

Thus ${{\mathcal{E}}^{n}}$ is linearly isomorphic to the face

[TABLE]

as claimed. More explicitly, forming an $n\times n$ orthogonal matrix $\begin{bmatrix}\frac{1}{\sqrt{n}}e&U\cr\end{bmatrix}$ yields the equality ${\bf S}^{n}_{c}\cap{\bf S}^{n}_{+}=U{\bf S}^{n-1}_{+}U^{T}$ .

Thus the EDM completion problem amounts to finding a matrix $X$ in the set

[TABLE]

To see how to recover the realizing points $p_{i}$ of $d\in{\bf R}^{E}$ , consider a matrix $X\in\mathcal{F}_{p}$ and form a factorization $X=PP^{T}$ for some matrix $P\in{\bf R}^{n\times k}$ with $k=\operatorname{{rank}}X$ . Let $p_{1},\ldots,p_{n}\in{\bf R}^{k}$ be the rows of $P$ . Then $X$ lying in ${\bf S}^{n}_{c}$ implies $\sum^{n}_{i=1}p_{i}=0$ , that is, the points $p_{i}$ are centered around the origin, while the constraint $\mathcal{P}_{E}\circ\operatorname{{\mathcal{K}}}(X)=d$ implies

[TABLE]

for all $ij\in E$ . Hence the points $p_{1},\ldots,p_{n}$ solve the EDM completion problem.
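The recovery of realizing points from a Gram matrix can be sketched as follows. For simplicity the example starts from a fully specified EDM $D$ , forms $X=\operatorname{{\mathcal{K}}}^{\dagger}(D)=-\frac{1}{2}JDJ$ , and factors $X=PP^{T}$ through an eigenvalue decomposition; the function name `realize` and the rank tolerance are our own choices.

```python
import numpy as np


def realize(D, tol=1e-9):
    """Return centered points (as rows) whose squared distances reproduce the EDM D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    X = -0.5 * J @ D @ J                        # Gram matrix K^dagger(D)
    eigvals, eigvecs = np.linalg.eigh(X)
    keep = eigvals > tol                        # k = rank X columns survive
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


rng = np.random.default_rng(1)
Q = rng.standard_normal((5, 2))                 # hypothetical ground-truth points
D = ((Q[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)

P = realize(D)
D_rec = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
print(P.shape, np.allclose(D_rec, D), np.allclose(P.sum(axis=0), 0))
# → (5, 2) True True
```

The recovered points agree with the original ones only up to a rigid motion, which is all that the distance data can determine.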

Let us turn now to understanding (strict) feasibility of $\mathcal{F}_{p}$ . A vector $d\in{\bf R}^{E}$ is called a *partial EDM* if the restriction of $d$ to every specified principal submatrix is an EDM. The graph $G$ is *EDM completable* if every partial EDM $d\in{\bf R}^{E}$ is completable to an EDM. The following result, proved in [15], is a direct analogue of Theorem 5.1.1 for the PSD completion problem.

Theorem 5.2.2 (EDM completability & chordal graphs).

The graph $G$ is EDM completable if, and only if, $G$ is chordal.

We also mention in passing the following observation from [47].

Theorem 5.2.3 (Closedness of the projected EDM cone).

The projected image ${\mathcal{P}}_{E}({{\mathcal{E}}^{n}})$ is always closed.

Given a clique $\alpha$ in $G$ , we let ${\mathcal{E}}^{\alpha}$ denote the set of $|\alpha|\times|\alpha|$ Euclidean distance matrices indexed by $\alpha$ . In what follows, given a partial matrix $d\in{\bf R}^{E}$ , the restriction $d[{\alpha}]$ can then be thought of either as a vector in ${\bf R}^{E[\alpha]}$ or as a hollow matrix in ${\bf S}^{\alpha}$ . We also use the symbol $\operatorname{{\mathcal{K}}}_{\alpha}\colon{\bf S}^{\alpha}\to{\bf S}^{\alpha}$ to indicate the mapping $\operatorname{{\mathcal{K}}}$ acting on ${\bf S}^{\alpha}$ . The following recipe provides a simple way to discover faces of the PSD cone containing the feasible region from specified cliques in the graph.

Theorem 5.2.4 (Clique facial reduction for EDM completions).

Let $\alpha$ be any $k$ -clique in the graph $G$ . Let $d\in{\bf R}^{E}$ be a partial Euclidean distance matrix and define the relaxation

[TABLE]

Then for any matrix $V_{\alpha}$ exposing $\operatorname{{face}}\big{(}\operatorname{{\mathcal{K}}}^{{\dagger}}_{\alpha}(d[\alpha]),{\bf S}^{\alpha}_{+}\cap{\bf S}^{\alpha}_{c}\big{)}$ , the matrix

[TABLE]

In other words, the recipe is as follows. Given a clique $\alpha$ in $G$ , consider the matrix $\mathcal{K}^{{\dagger}}_{\alpha}(d[\alpha])\in{\bf S}^{\alpha}_{+}\cap{\bf S}^{\alpha}_{c}$ . Let $V_{\alpha}\in{\bf S}^{\alpha}_{+}\cap{\bf S}^{\alpha}_{c}$ be an exposing vector of $\operatorname{{face}}\big{(}\operatorname{{\mathcal{K}}}^{{\dagger}}_{\alpha}(d[\alpha]),{\bf S}^{\alpha}_{+}\cap{\bf S}^{\alpha}_{c}\big{)}$ . Then ${\mathcal{P}}^{*}_{E[\alpha]}V_{\alpha}$ is an extension of $V_{\alpha}$ to ${\bf S}^{n}$ obtained by padding $V_{\alpha}$ with zeroes. The above theorem guarantees that the entire feasible region of (5.5) is contained in the face of ${\bf S}_{c}^{n}\cap{\bf S}^{n}_{+}$ exposed by ${\mathcal{P}}^{*}_{E[\alpha]}V_{\alpha}$ .
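The recipe can be sketched in a few lines. In the code below, the function name `edm_clique_exposing_vector` is hypothetical, and the exposing vector is chosen, as one convenient option, to be the projector onto the part of the nullspace of $\operatorname{{\mathcal{K}}}^{\dagger}_{\alpha}(d[\alpha])$ orthogonal to $e$ . The check at the end uses synthetic noiseless data generated from six points in the plane.

```python
import numpy as np


def edm_clique_exposing_vector(D_alpha, alpha, n, tol=1e-9):
    """Pad an exposing vector for the clique alpha to an n x n matrix.

    D_alpha holds the squared distances among the nodes of the clique.
    """
    k = len(alpha)
    J = np.eye(k) - np.ones((k, k)) / k
    B = -0.5 * J @ D_alpha @ J                   # K_alpha^dagger(d[alpha])
    eigvals, eigvecs = np.linalg.eigh(B)
    null = eigvecs[:, eigvals < tol]             # nullspace of B; it contains e
    V_alpha = J @ (null @ null.T) @ J            # remove the e direction: V_alpha is centered
    W = np.zeros((n, n))
    W[np.ix_(alpha, alpha)] = V_alpha            # pad with zeros
    return W


rng = np.random.default_rng(2)
n, r = 6, 2
P = rng.standard_normal((n, r))
P -= P.mean(axis=0)
X = P @ P.T                                      # a feasible centered Gram matrix

alpha = [0, 1, 2, 3]                             # a 4-clique with all distances known
Pa = P[alpha]
D_alpha = ((Pa[:, None, :] - Pa[None, :, :]) ** 2).sum(axis=-1)

W = edm_clique_exposing_vector(D_alpha, alpha, n)
print(np.allclose(W @ X, 0), np.linalg.matrix_rank(W, tol=1e-8))   # → True 1
```

For points in general position in ${\bf R}^{r}$ , a clique of size $k>r+1$ yields an exposing vector of rank $k-r-1$ , which is the rank reported above.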

In direct analogy with Theorem 5.1.6 for PSD completions, the minimal face is sure to be discovered in this way for chordal graphs.

Theorem 5.2.5 (Clique facial reduction for EDM is sufficient).

Suppose that $G$ is chordal, and consider a partial Euclidean distance matrix $d\in{\bf R}^{E}$ and the region

[TABLE]

Let $\Theta$ denote the set of all maximal cliques in $G$ , and for each $\alpha\in\Theta$ define

[TABLE]

Then the following equality holds:

[TABLE]

Corollary 5.2.6 (Singularity degree of chordal completions).

If the graph $G=(V,E)$ is chordal, then the EDM completion problem has singularity degree at most one, when feasible.

Finally, in analogy with Corollary 5.1.8, the following is true. Define the singularity degree of a graph $G=(V,E)$ to be the maximal singularity degree among all EDM completion problems with EDM completable partial matrices $d\in{\bf R}^{E}$ .

Corollary 5.2.7 (Singularity degree of chordal completions).

The graph $G$ has singularity degree one if, and only if, $G$ is chordal.

5.2.1 EDM and SNL with exact data

The material above explains in part the surprising success of the algorithm in [76] for the sensor network localization problem, SNL.

The SNL problem differs from the EDM completion problem only in that some of the points or sensors $p_{i}$ that define the problem are in fact anchors and their positions are known. The algorithm proceeds by iteratively finding faces of the PSD cone from cliques and intersecting them two at a time, thereby decreasing the dimension of the problem in each step. In practice, this procedure often terminates with a unique solution of the problem. We should mention that the anchors are a red herring. Indeed, they should only be treated differently than the other sensors after all the sensors have been localized. In the post-processing step, a so-called Procrustes problem is solved to bring the putative anchors as close as possible to their original (known) positions and thus rotating the sensor positions appropriately. Another important point in applications is that the distances for sensors that are close enough to each other are often known. This suggests that there are often many local cliques in the graph. This means that the resulting SDP relaxation is highly degenerate but this degeneracy can be exploited as we have seen above.
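The post-processing step admits a short sketch. The code below solves the orthogonal Procrustes problem through a singular value decomposition; the function name `procrustes_align` and the synthetic data are ours, and the actual implementation in [76] may differ in its details.

```python
import numpy as np


def procrustes_align(P, anchor_idx, anchors):
    """Rigidly move all points P so that P[anchor_idx] best matches the known anchors."""
    A_hat = P[anchor_idx]
    mu_hat, mu = A_hat.mean(axis=0), anchors.mean(axis=0)
    U, _, Vt = np.linalg.svd((A_hat - mu_hat).T @ (anchors - mu))
    Q = U @ Vt                                   # orthogonal matrix (rotation or reflection)
    return (P - mu_hat) @ Q + mu


rng = np.random.default_rng(3)
truth = rng.uniform(size=(8, 2))                 # hypothetical true positions in [0,1]^2
R, _ = np.linalg.qr(rng.standard_normal((2, 2))) # unknown orthogonal transformation
P = truth @ R + np.array([0.3, -0.7])            # localized points, up to a rigid motion

aligned = procrustes_align(P, [0, 1, 2], truth[:3])
print(np.allclose(aligned, truth))               # → True
```

With exact data and anchors that are not collinear, the alignment reproduces all sensor positions, not only those of the anchors.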

Some numerical results from the year 2010 in [76] appear in Table 5.1.

These results are on random, noiseless problems using a 2.16 GHz Intel Core 2 Duo, 2 GB of RAM. The embedding dimension is $r=2$ and the sensors are in a square region $[0,1]\times[0,1]$ with $m=9$ anchors. We use the Root Mean Square Deviation to measure the quality of the solution:

[TABLE]

The huge expected number of constraints and variables in the four problems in Table 5.1 are

[TABLE]

respectively. (Footnote 4: The 2016 tarfile with MATLAB codes is available.)

5.2.2 Extensions to noisy EDM and SNL problems

When there is noise in the distance measurements – the much more realistic setting – the approach requires an intriguing modification. Let us see what goes wrong in the standard approach. Given a clique $\alpha$ , let us form $\operatorname{{\mathcal{K}}}^{{\dagger}}_{\alpha}(d[\alpha])$ as in Theorem 5.2.4. The difficulty is that this matrix is no longer PSD. On the other hand, it is simple enough to find the nearest matrix $W$ of ${\bf S}^{\alpha}_{+}\cap{\bf S}^{\alpha}_{c}$ to $\operatorname{{\mathcal{K}}}^{{\dagger}}_{\alpha}(d[\alpha])$ . Let then $V_{\alpha}$ be a vector exposing $\operatorname{{face}}(W,{\bf S}^{\alpha}_{+}\cap{\bf S}^{\alpha}_{c})$ . Letting $\Theta$ be the collection of cliques under consideration, we thus obtain faces $F_{\alpha}$ exposed by ${\mathcal{P}}^{*}_{E[\alpha]}V_{\alpha}$ for $\alpha\in\Theta$ . In the noiseless regime, the entire feasible region is contained in the intersection $\bigcap_{\alpha\in\Theta}F_{\alpha}$ . In the noisy regime, this intersection likely consists only of the origin for the simple reason that randomly perturbed faces typically intersect only at the origin. Here is an elementary fix that makes the algorithm robust to noise. Form the sum

[TABLE]

Again in the noiseless regime, Proposition 2.2.5 implies that $V$ exposes precisely the intersection $\bigcap_{\alpha\in\Theta}F_{\alpha}$ . When noise is present, the matrix $V$ will likely have only one zero eigenvalue corresponding to the vector of all ones $e$ and the rest of the eigenvalues will be strictly positive. Suppose we know that the realization of the graph should lie in $r$ -dimensional space. Then we can find a rank $n-(r+1)$ best PSD approximation of $V$ and use it to expose a face of the PSD cone. Under appropriate conditions, this procedure is indeed provably robust to noise and extremely effective in practice. A detailed description of such a scheme is presented in [46].
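The truncation step can be sketched as follows. This is only a schematic illustration of the rank $n-(r+1)$ approximation, not the full scheme of [46]: the function name `robust_exposing_vector` is hypothetical, and the test matrix is a random centered PSD matrix standing in for the noisy sum of exposing vectors.

```python
import numpy as np


def robust_exposing_vector(V_sum, r):
    """Best rank n-(r+1) PSD approximation of V_sum and a basis of its nullspace."""
    eigvals, eigvecs = np.linalg.eigh(V_sum)     # eigenvalues in ascending order
    top = slice(r + 1, None)                     # keep the n-(r+1) largest eigenpairs
    lam = np.clip(eigvals[top], 0.0, None)
    V_r = (eigvecs[:, top] * lam) @ eigvecs[:, top].T
    face_basis = eigvecs[:, :r + 1]              # nullspace of V_r, dimension r+1
    return V_r, face_basis


rng = np.random.default_rng(4)
n, r = 6, 2
J = np.eye(n) - np.ones((n, n)) / n
G = rng.standard_normal((n, n))
V_sum = J @ G @ G.T @ J                          # stand-in: PSD, V_sum e = 0, rank n-1

V_r, face_basis = robust_exposing_vector(V_sum, r)
print(np.linalg.matrix_rank(V_r, tol=1e-8), face_basis.shape)   # → 3 (6, 3)
```

The nullspace of the truncated matrix has dimension $r+1$ and contains $e$ ; removing the direction $e$ leaves an $r$ -dimensional subspace, matching the target embedding dimension.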

5.3 Low-rank matrix completions

In this section, we consider another example inspired by facial reduction. We will be considering matrices $Z\in{\bf R}^{m\times n}$ ; for convenience, we will index the rows of $Z$ by $i\in\{1,\ldots,m\}$ and the columns using $j\in\{m+1,\ldots,m+n\}$ . Consider two vertex sets $V_{1}=\{1,\ldots,m\}$ and $V_{2}:=\{m+1,\ldots,m+n\}$ and a bipartite graph $G=(V_{1}\cup V_{2},E)$ .

Given a partial matrix $z\in{\bf R}^{E}$, the low-rank matrix completion problem, LRMC, aims to find a rank $r$ matrix $Z$ from the partially observed elements $z$. A common approach (with statistical guarantees) is to instead solve the convex problem:

$$\min_{Z\in{\bf R}^{m\times n}}\ \|Z\|_{*}\quad\text{s.t.}\quad{\mathcal{P}}_{E}(Z)=z,\tag{5.6}$$

where $\|Z\|_{*}$ is the nuclear norm – the sum of the singular values of $Z$ . Throughout the section, we will make the following assumption: the solution of the convex problem (5.6) coincides with the rank $r$ matrix $Z$ that we seek. There are standard statistical assumptions that one makes in order to guarantee this to be the case [50, 117, 27].

It is known that this problem (5.6) can be solved efficiently using SDP. At first glance it appears that this does not fit into our framework for problems where strict feasibility fails; indeed strict feasibility holds under the appropriate reformulation below. We will see, however, that one can exploit the special structure at the optimum and discover a face of the PSD cone containing an optimal solution, thereby decreasing the dimension of the problem. Even though this is not exactly facial reduction, the ideas behind facial reduction play the main role.

Let us first show that the problem (5.6) can be written equivalently as the SDP:

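$$\min\ \tfrac{1}{2}\operatorname{tr}(Y)\quad\text{s.t.}\quad Y=\begin{bmatrix}A&Z\\ Z^{T}&B\end{bmatrix}\succeq 0,\quad{\mathcal{P}}_{E}(Z)=z.\tag{5.7}$$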

To see this, we recall a classical fact that the operator norm $\|\cdot\|_{2}$ of the matrix is dual to the nuclear norm $\|\cdot\|_{*}$ , that is

$$\|Z\|_{*}=\sup\left\{\operatorname{tr}(Z^{T}X):\ \|X\|_{2}\leq 1\right\}.$$

Note the equivalence

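$$\|X\|_{2}\leq 1\quad\Longleftrightarrow\quad I-XX^{T}\succeq 0\quad\Longleftrightarrow\quad\begin{bmatrix}I&X\\ X^{T}&I\end{bmatrix}\succeq 0.$$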

Thus we may represent the nuclear norm through an SDP:

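$$\|Z\|_{*}=\max_{X}\left\{\operatorname{tr}(Z^{T}X):\ \begin{bmatrix}I&X\\ X^{T}&I\end{bmatrix}\succeq 0\right\}.$$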

The dual of this SDP is

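$$\min_{A,B}\ \tfrac{1}{2}\left(\operatorname{tr}(A)+\operatorname{tr}(B)\right)\quad\text{s.t.}\quad\begin{bmatrix}A&Z\\ Z^{T}&B\end{bmatrix}\succeq 0.$$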

Thus the problems (5.6) and (5.7) are indeed equivalent. Let us moreover make the following important observation. Suppose that $Z$ is optimal for (5.6). Let $Z=U\Sigma V^{T}$ be a compact SVD of $Z$ and set $A:=U\Sigma U^{T}$ and $B:=V\Sigma V^{T}$ . Then the triple $(Z,A,B)$ is feasible for (5.7) since

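$$Y:=\begin{bmatrix}A&Z\\ Z^{T}&B\end{bmatrix}=\begin{bmatrix}U\\ V\end{bmatrix}\Sigma\begin{bmatrix}U\\ V\end{bmatrix}^{T}\succeq 0.$$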

Moreover $\operatorname{{rank}}(Y)=\operatorname{{rank}}(Z)$ and $\frac{1}{2}\operatorname{{tr}}(Y)=\frac{1}{2}(\operatorname{{tr}}(A)+\operatorname{{tr}}(B))=\operatorname{{tr}}(\Sigma)=\|Z\|_{*}$ . Thus $Y$ is optimal for (5.7).
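The three claims about the triple $(Z,A,B)$ (positive semidefiniteness of $Y$, equality of ranks, and $\frac{1}{2}\operatorname{tr}(Y)=\|Z\|_{*}$) are easy to check numerically. The snippet below is a small sanity check on a randomly generated rank $r$ matrix; the sizes and the random seed are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 4, 2
Z = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # generic rank-r matrix

# Compact SVD Z = U diag(s) V^T, keeping only the r nonzero singular values.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
U, s, V = U[:, :r], s[:r], Vt[:r, :].T

A = (U * s) @ U.T            # A = U Sigma U^T
B = (V * s) @ V.T            # B = V Sigma V^T
Y = np.block([[A, Z], [Z.T, B]])

nuclear_norm = np.linalg.norm(Z, "nuc")      # sum of singular values of Z
half_trace = 0.5 * np.trace(Y)
min_eig = np.linalg.eigvalsh(Y).min()        # nonnegative up to rounding if Y is PSD
rank_Y = np.linalg.matrix_rank(Y, tol=1e-8)
```

Here `half_trace` agrees with `nuclear_norm`, `min_eig` is zero up to rounding, and `rank_Y` equals $r$.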

Let us see now how we can exploit the structure and target rank $r$ of the problem to find an exposing vector of a face containing an optimal solution of the SDP. Fix two numbers $p,q>r$ and let $\alpha$ be any $p\times q$ complete bipartite subgraph of $G$ . Let also $z[\alpha]$ be the restriction of $z$ to $\alpha$ . Thus $z[\alpha]$ corresponds to a fully specified submatrix.

For almost any rank $r$ underlying matrix $Z$ (in the sense of Lebesgue measure on the factors $P\in{\bf R}^{m\times r}$, $Q\in{\bf R}^{n\times r}$ satisfying $Z=PQ^{T}$), it will be the case that $\operatorname{{rank}}(z[\alpha])=r$.

Without loss of generality, after row and column permutations if needed, we can assume that $\alpha$ encodes the bottom left corner of $Z$ :

$$Z=\begin{bmatrix}Z_{1}&Z_{2}\\ z[\alpha]&Z_{3}\end{bmatrix},$$

that is $\alpha=\{m-p+1,\ldots,m\}\times\{m+1,\ldots,m+q\}$ . Form now the factorization $z[\alpha]=\bar{P}\bar{D}\bar{Q}^{T}$ obtained using the compact SVD. Both $\bar{P},\bar{Q}$ have rank $r$ .

Let $Z=U\Sigma V^{T}$ be a compact SVD of $Z$ and define

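$$Y:=\begin{bmatrix}U\\ V\end{bmatrix}\Sigma\begin{bmatrix}U\\ V\end{bmatrix}^{T}=\begin{bmatrix}U\Sigma U^{T}&U\Sigma V^{T}\\ V\Sigma U^{T}&V\Sigma V^{T}\end{bmatrix}.$$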

As we saw previously, $Y$ is optimal for the SDP (5.7). Subdividing $U$ and $V$ into two blocks each, we deduce

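$$Y=\begin{bmatrix}U_{1}\\ U_{2}\\ V_{1}\\ V_{2}\end{bmatrix}\Sigma\begin{bmatrix}U_{1}\\ U_{2}\\ V_{1}\\ V_{2}\end{bmatrix}^{T},\qquad U=\begin{bmatrix}U_{1}\\ U_{2}\end{bmatrix},\quad V=\begin{bmatrix}V_{1}\\ V_{2}\end{bmatrix},$$

where $U_{2}$ consists of the last $p$ rows of $U$ and $V_{1}$ consists of the first $q$ rows of $V$.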

Therefore, we conclude that $z[\alpha]=U_{2}\Sigma V_{1}^{T}=\bar{P}\bar{D}\bar{Q}^{T}$ . Taking into account that $z[\alpha]$ has rank $r$ and the matrices $U_{2},V_{1},\bar{P},\bar{Q}$ have exactly $r$ columns we deduce

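$$\operatorname{range}(z[\alpha])=\operatorname{range}(U_{2})=\operatorname{range}(\bar{P}),\qquad\operatorname{range}(z[\alpha]^{T})=\operatorname{range}(V_{1})=\operatorname{range}(\bar{Q}).$$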

We can now use the exposing vector form of FR formed from $\bar{P}$ and/or $\bar{Q}$ . Using the calculated $\bar{P},\bar{Q}$ , let $\bar{E}\in{\bf R}^{p\times(p-r)}$ and $\bar{F}\in{\bf R}^{q\times(q-r)}$ satisfy $\operatorname{range}\bar{E}=(\operatorname{range}\bar{P})^{\perp}$ and $\operatorname{range}\bar{F}=(\operatorname{range}\bar{Q})^{\perp}$ . Define then the PSD matrix

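$$\overline{W}:=\begin{bmatrix}0&0&0&0\\ 0&\bar{E}\bar{E}^{T}&0&0\\ 0&0&\bar{F}\bar{F}^{T}&0\\ 0&0&0&0\end{bmatrix},$$

partitioned conformally with the row blocks $U_{1},U_{2},V_{1},V_{2}$ of $Y$.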

By construction $\overline{W}Y=0$ . Hence $\overline{W}$ exposes a face of the PSD cone containing the optimal $Y$ .
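The construction of the exposing vector from one fully observed block can be traced on a small synthetic instance. The sketch below is illustrative only: the sizes, the seed, and the variable names (`E_bar`, `F_bar`, `W`) are choices made here, and the block is placed in the bottom left corner of $Z$ as assumed in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 7, 6, 2
p, q = 4, 3                                   # block sizes, both larger than r
Z = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Optimal Y = [U; V] Sigma [U; V]^T built from a compact SVD of Z.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
U, s, V = U[:, :r], s[:r], Vt[:r, :].T
UV = np.vstack([U, V])
Y = (UV * s) @ UV.T

# Fully observed p x q block in the bottom left corner of Z.
z_alpha = Z[m - p:, :q]
P_full, d, Qt_full = np.linalg.svd(z_alpha)   # full SVD of the block
E_bar = P_full[:, r:]                         # basis of (range P_bar)^perp, p x (p - r)
F_bar = Qt_full[r:, :].T                      # basis of (range Q_bar)^perp, q x (q - r)

# Exposing vector: nonzero only on the rows/columns indexed by the block.
W = np.zeros((m + n, m + n))
W[m - p:m, m - p:m] = E_bar @ E_bar.T
W[m:m + q, m:m + q] = F_bar @ F_bar.T

residual = np.linalg.norm(W @ Y)              # zero up to rounding
rank_W = np.linalg.matrix_rank(W, tol=1e-8)   # (p - r) + (q - r)
```

Each such block contributes an exposing vector of rank $(p-r)+(q-r)$; summing the vectors obtained from several blocks shrinks the face further.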

Performing this procedure for many specified submatrices $z[\alpha]$ can yield a dramatic decrease in the dimension of the final SDP that needs to be solved. When noise is present, the strategy can be made robust in exactly the same way as for the EDM problem in Section 5.2.2.

We include one of the tables of numerics from [66] in Table 5.2. Results are for the average of five instances. We have recovered the correct rank $4$ each time without calling an SDP solver at all. Note that the largest matrices recovered have $(2,500)\times(20,000)=50,000,000$ elements.

5.4 Commentary

The work using chordal graphs for PSD completions was done in [61] and extended the special case of banded structure in [48]. Surveys for matrix completion are given in e.g., [70, 68, 4, 63, 32, 33, 71, 69]. A survey specifically related to chordality is given in [139]. More details on early algorithms for PSD completion are in e.g., [72, 4].

The origin of distance geometry problems can be traced back to the work of Grassmann in 1896 [60]. More recent work appeared in e.g., [59, 58, 34, 64, 45]. Many of these papers emphasized the relationships with molecular conformation. Chordality and relations with positive definiteness are studied in [73] and more recently in [80, 82]. The book [16] has a chapter on matrix completions with the connections to EDM completions, see also [134] for the relations with faces. An excellent online reference for EDM is the book by Dattorro [37]. In addition, there are many excellent survey articles, e.g., [77, 43, 87, 2, 97]. The survey [86] contains many open problems in EDMC and references for application areas.

Early work using SDP interior point algorithms for EDM completion problems is given in [3]. Exploiting the clique structure for SNL type problems is done in e.g., [42, 75, 76]. The improved robust algorithm based on averaging approximate exposing vectors was developed in [46], while a parallel viewpoint based on rigidity theory was developed in [126]. In fact, a parallel view on facial reduction is based on rigidity theory, e.g., [55, 31, 1, 56]. The facial structure for the EDM cone is studied in e.g., [5, 133]. Applications of the technique to molecular conformation are in [7].

The LRMC problem has parallels in the compressed sensing framework that is currently of great interest. The renewed interest followed the work in [51, 50, 27, 117] that used the nuclear norm as a convex relaxation of the rank function. Exploiting the structure of the optimal face using FR was introduced recently in [66]. An alternative approach, which applies much more broadly, is described in [112].

Chapter 6 Hard combinatorial problems

6.1 Quadratic assignment problem, QAP

The quadratic assignment problem, QAP, is arguably the hardest of the so-called NP-hard combinatorial optimization problems. The problem can best be described in terms of facility location. We have $n$ given facilities that need to be located among $n$ specified locations. As input data, we have information on the distances $D_{ij}$ between pairs of locations $i,j$ and the flow values (weights) $F_{st}$ between pairs of facilities $s,t$. The (quadratic) cost of a possible location is the flow between each pair of facilities multiplied by the distance between their assigned locations. Surprisingly, problems of size $n\geq 30$ are still considered hard to solve. As well, we can have a (linear) cost $C_{kl}$ of locating facility $k$ in location $l$. The unknown variable that decides which facility goes into which location is an $n\times n$ permutation matrix $X=(X_{kl})\in\Pi$ with

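$$X_{kl}=\begin{cases}1&\text{if facility }k\text{ is assigned to location }l,\\ 0&\text{otherwise.}\end{cases}$$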

This problem has the elegant trace formulation

[TABLE]

Notice that the objective is a quadratic function, and typically the quadratic form, $\operatorname{{tr}}(FXDX^{T})$, is indefinite. (One can perturb the objective function by exploiting the structure of the permutation matrices and obtain positive definiteness of the quadratic form. However, this can result in deterioration of the bounds from any relaxations.)

Notice also that the feasible region consists of permutation matrices, a discrete set. There is a standard strategy for forming a semi-definite programming relaxation for such a problem. Consider the vectorization $x:=\operatorname{{vec}}(X)\in{\bf R}^{n^{2}}$ and define the lifting to the rank one block matrix

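$$Y:=\begin{pmatrix}1\\ x\end{pmatrix}\begin{pmatrix}1\\ x\end{pmatrix}^{T}=\begin{bmatrix}1&x^{T}\\ x&xx^{T}\end{bmatrix},$$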

where the matrix consists of $n^{2}$ blocks $X_{:,i}X_{:,j}^{T}$, each of size $n\times n$, beginning in row and column $2$. The idea is then to reformulate the objective and a relaxation of the feasible region linearly in terms of $Y$, and then simply insist that $Y$ is PSD, though not necessarily rank one. In particular, the objective function can easily be rewritten as a linear function of $Y$, namely $\operatorname{{tr}}(LY)$, where

[TABLE]

and $D\otimes F$ denotes the Kronecker product.
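The way the quadratic term becomes linear in the lifted variable rests on the identity $\operatorname{tr}(FXDX^{T})=x^{T}(D\otimes F)x$ for $x=\operatorname{vec}(X)$, valid when $D$ is symmetric and the columns of $X$ are stacked. The following check uses random symmetric toy data chosen here for illustration; it only verifies the quadratic part and makes no claim about the exact form of the linear term in $L$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
F = rng.standard_normal((n, n)); F = F + F.T      # flows (symmetric toy data)
D = rng.standard_normal((n, n)); D = D + D.T      # distances (symmetric toy data)
X = np.eye(n)[rng.permutation(n)]                 # a random permutation matrix

x = X.flatten(order="F")                          # vec(X): stack the columns of X
quad_trace = np.trace(F @ X @ D @ X.T)
quad_kron = x @ np.kron(D, F) @ x

# Lifted matrix Y = (1; x)(1; x)^T; its trailing block is x x^T.
y = np.concatenate([[1.0], x])
Y = np.outer(y, y)
quad_lifted = np.trace(np.kron(D, F) @ Y[1:, 1:])  # linear in Y
```

All three quantities coincide, so the quadratic cost is a linear function of the trailing $n^{2}\times n^{2}$ block of $Y$.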

Next we turn to the constraints. We seek to replace the set of permutation matrices by more favorable constraints that permutation matrices satisfy. For example, observe that the permutation matrices are doubly stochastic and hence the row sums and column sums are one, yielding the following linear assignment constraints

$$Xe=e,\qquad X^{T}e=e.$$

There are of course many more possible constraints one can utilize; the greater their number, even if redundant, the tighter the SDP relaxation in general. Some prominent ones, including the ones above, are

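$$\begin{aligned}&Xe=e,\quad X^{T}e=e,&&\text{(6.2a)}\\ &X_{ij}^{2}-X_{ij}=0\quad\text{for all }i,j,&&\text{(6.2b)}\\ &XX^{T}=X^{T}X=I,&&\text{(6.2c)}\\ &X_{:,i}\circ X_{:,j}=0,\quad X_{i,:}\circ X_{j,:}=0\quad\text{for all }i\neq j,&&\text{(6.2d)}\end{aligned}$$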

where $\circ$ denotes the Hadamard (elementwise) product. Note that including both equivalent orthogonality constraints $XX^{T}=X^{T}X=I$ is not redundant in the relaxations.

Let us see how to reformulate the constraints linearly in $Y$ . We first consider the $n$ linear row sum constraints $(Xe-e)_{i}=0$ in (6.2a). To this end, observe

[TABLE]

We obtain

[TABLE]

Defining now the matrix

[TABLE]

we obtain the equivalent linear homogeneous constraint $\operatorname{{tr}}(YE_{r}E_{r}^{T})=0$ . Similarly the $n$ linear column sum constraints $(X^{T}e-e)_{i}=0$ amount to the equality $\operatorname{{tr}}(YE_{c}E_{c}^{T})=0$ , where

[TABLE]

Thus the feasible region of the SDP relaxation lies in the face of ${{\bf S}^{n^{2}+1}_{+}}$ exposed by $D_{0}:=E_{r}E_{r}^{T}+E_{c}E_{c}^{T}$. Henceforth, let $\widehat{V}$ have full column rank and satisfy $\operatorname{{range}}(\widehat{V})=\operatorname{{null}}(D_{0})$.
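The exposing matrix built from the row and column sum constraints can be examined numerically for small $n$. The sketch below is an illustration under explicit assumptions: it uses the column-stacking convention for $\operatorname{vec}$, and the matrices `T_row` and `T_col` are hypothetical stand-ins whose transposes play the role of $E_{r}$ and $E_{c}$; the paper's own matrices may differ in ordering.

```python
import itertools
import numpy as np

n = 3
e = np.ones((n, 1))
I = np.eye(n)

# With x = vec(X) (columns stacked): X e = (e^T kron I) x and X^T e = (I kron e^T) x.
T_row = np.hstack([-e, np.kron(e.T, I)])   # T_row @ (1; x) = X e - e
T_col = np.hstack([-e, np.kron(I, e.T)])   # T_col @ (1; x) = X^T e - e
D0 = T_row.T @ T_row + T_col.T @ T_col     # PSD exposing matrix

def lift(X):
    """Rank one lifting Y = (1; vec X)(1; vec X)^T."""
    y = np.concatenate([[1.0], X.flatten(order="F")])
    return np.outer(y, y)

# tr(Y D0) vanishes on every lifted permutation matrix.
max_violation = max(
    abs(np.trace(lift(np.eye(n)[list(perm)]) @ D0))
    for perm in itertools.permutations(range(n))
)

# Basis of null(D0): eigenvectors with numerically zero eigenvalue.
w, U = np.linalg.eigh(D0)
V_hat = U[:, w < 1e-10]
```

For $n=3$ the null space has dimension $(n-1)^{2}+1=5$, so after facial reduction the matrix variable $R$ in $Y=\widehat{V}R\widehat{V}^{T}$ has order $5$ rather than $n^{2}+1=10$.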

The other three constraints in (6.2) can be rephrased linearly in $Y$ as well: (6.2b) results in the so-called arrow constraint (the first row (column) and the diagonal of $Y$ are equal); the constraints (6.2c) yield the block-diagonal constraint (diagonal blocks sum to the identity matrix) and the off-diagonal constraint (the traces of the off-diagonal blocks are zero); and the Hadamard orthogonality constraints (6.2d) are called the gangster constraints and guarantee that the diagonal blocks are diagonal matrices and the diagonals of the off-diagonal blocks are zero. We omit further details (see [149]) but denote the resulting constraints, together with the additional constraint $Y_{00}=1$, in the form ${\mathcal{A}}(Y)=b$. We note that the transformation ${\mathcal{A}}$ without the Hadamard orthogonality constraints is onto, while ${\mathcal{A}}$ with them is not. Later in this section we numerically test the settings with and without the gangster constraints, and with and without facial reduction.

Now the standard relaxation of the problem is obtained by letting $Y$ be a positive semi-definite matrix with no constraint on its rank:

$$\min\ \operatorname{tr}(LY)\quad\text{s.t.}\quad{\mathcal{A}}(Y)=b,\quad Y\succeq 0.\tag{6.3}$$

All in all, the number of linear constraints is

[TABLE]

i.e.,

$$m_{A}=n^{3}+\frac{n^{2}}{2}+\frac{n}{2}+2.$$

As discussed above, the matrix $D_{0}$ certifies that this relaxation fails strict feasibility. Indeed the entire feasible region lies in the face of ${\bf S}^{n^{2}+1}_{+}$ exposed by $D_{0}$ . Surprisingly, after restricting to the face exposed by $D_{0}$ , the constraints $\mathcal{A}(Y)=b$ simplify dramatically. The resulting equivalent formulation becomes

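$$\min\ \operatorname{tr}\left((\widehat{V}^{T}L\widehat{V})R\right)\quad\text{s.t.}\quad{\mathcal{G}}_{\bar{\mathcal{J}}}(\widehat{V}R\widehat{V}^{T})=e_{0}e_{0}^{T},\quad R\succeq 0,\tag{6.4}$$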

where $e_{0}$ is the first unit vector, as we start indexing at $0$, and

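$$\left({\mathcal{G}}_{\bar{\mathcal{J}}}(Y)\right)_{ij}:=\begin{cases}Y_{ij}&\text{if }(i,j)\in\bar{\mathcal{J}},\\ 0&\text{otherwise,}\end{cases}$$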

and $\bar{\mathcal{J}}$ is an appropriately defined index set; see [149]. Roughly speaking, this index set guarantees that the diagonal blocks of $Y=\widehat{V}R\widehat{V}^{T}$ are diagonal matrices and the diagonal elements of the off-diagonal blocks of $Y$ are all zero. In particular, one can show that the resulting linear constraint is surjective.

In fact, the gangster operator and gangster constraints guarantee that most of the Hadamard product constraints in (6.2d) hold. And the constraints corresponding to the linear constraints in (6.2a), the arrow constraint in (6.2b), the block-diagonal and off-diagonal constraints in (6.2c) and some of the gangster constraints in (6.2d) have all become redundant, thereby illuminating the strength of the facial reduction together with the gangster constraints (6.2d).

Moreover, we can rewrite the linear constraints in (6.4) as

[TABLE]

We see that these low rank constraints (they are rank two, and low rank constraints can be exploited in several of the current software packages for SDP) are linearly independent and the number of constraints has been reduced from $m_{A}=n^{3}+\frac{n^{2}}{2}+\frac{n}{2}+2$ to

$$n^{3}-2n^{2}+1,$$

i.e., the number of constraints is still $O(n^{3})$ but has decreased by ${1+\frac{5n^{2}+n}{2}}$ .

Finally, we should mention that the singularity degree of the SDP relaxation of the QAP is $d=1$ . The problem (6.4) has a strictly feasible point $\hat{R}$ . Moreover, one can show that the dual of (6.4) also has a strictly feasible point, see [149, 100].

Let us illustrate empirically the improvement in accuracy and cputime for the facially reduced SDP relaxation of the QAP. We use the model in (6.3) and compare it to the simplified facially reduced model in (6.4). See Figures 6.1 and 6.2. The improvement in accuracy and cputime is evident.

6.2 Second lift of Max-Cut

Recall that for a given weighted undirected graph $G=(V,E,W)$ , the maximum cut problem is to determine a vertex set $S$ such that the total weight of the edges between $S$ and its complement $S^{c}$ is as large as possible. Thus enumerating the vertices $V=\{1,\ldots,n\}$ , we are interested in the problem

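$$\max\ \frac{1}{2}\sum_{i<j}w_{ij}(1-x_{i}x_{j})\quad\text{s.t.}\quad x_{i}\in\{\pm 1\},\quad i=1,\ldots,n.$$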

Here, we have $x_{i}=1$ for $i\in S$ and $x_{i}=-1$ for $i\notin S$ . Notice the constraints $x_{i}\in\{\pm 1\}$ can equivalently be written with the quadratic constraint

$$\operatorname{diag}(xx^{T})=e.$$

Relaxing $xx^{T}$ to a positive semi-definite matrix $X$ , we arrive at the celebrated SDP relaxation of Max-Cut:

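$$\max\ \frac{1}{4}\operatorname{tr}(LX)\quad\text{s.t.}\quad\operatorname{diag}(X)=e,\quad X\succeq 0.\tag{6.6}$$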

Here $L$ denotes the weighted Laplacian matrix of the graph, which will not play a role in our discussion. This SDP is clearly strictly feasible.

Another idea now to improve the accuracy of the relaxation is to “extend the lifting”. Namely, with the goal of tightening the approximation to the original Max-Cut problem, we can certainly add the following quadratic constraints to the SDP relaxation:

[TABLE]

Let us see how to form a relaxation with these nonlinear constraints.

For $X\in{\bf S}^{n}$ , let $\operatorname{{s2vec}}(X)$ denote the vector formed from the upper triangular part of $X$ taken columnwise with the strict upper triangular part multiplied by $\sqrt{2}$ . By abuse of notation, we let $x=\operatorname{{s2vec}}(X)$ and define the matrix $Y=\begin{pmatrix}1\\ x\end{pmatrix}\begin{pmatrix}1\\ x\end{pmatrix}^{T}$ . We can now form a new SDP relaxation by insisting that $Y$ is PSD (though not rank one) and rewriting the constraints linearly in $Y$ . The nonlinear constraints (6.7) can indeed be written linearly in $Y$ ; we omit the details. On the other hand, note that the $i$ -th constraint in the original SDP relaxation (6.6) is equivalent to

[TABLE]

Exactly as in Section 6.1, we can define the matrix

[TABLE]

which certifies that strict feasibility fails and that the entire feasible region lies in the face of ${\bf S}^{\frac{n(n+1)}{2}+1}_{+}$ exposed by $EE^{T}$. It turns out that this second lift of Max-Cut, in practice, provides much tighter bounds than the original SDP relaxation (6.6), and the elementary facial reduction step using $EE^{T}$ serves to stabilize the problem.
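The map $\operatorname{s2vec}$ used in the second lifting is simple to implement, and the $\sqrt{2}$ scaling is what makes it an isometry between the trace inner product on ${\bf S}^{n}$ and the dot product on ${\bf R}^{n(n+1)/2}$. The snippet below is a small sketch; the Python function name `s2vec` mirrors the notation of the text, and the test data are arbitrary.

```python
import numpy as np

def s2vec(X):
    """Upper triangle of symmetric X, taken columnwise, strict part scaled by sqrt(2)."""
    n = X.shape[0]
    entries = []
    for j in range(n):                 # columns from left to right
        for i in range(j + 1):         # rows down to and including the diagonal
            scale = 1.0 if i == j else np.sqrt(2.0)
            entries.append(scale * X[i, j])
    return np.array(entries)

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T
B = rng.standard_normal((n, n)); B = B + B.T

inner_vec = s2vec(A) @ s2vec(B)        # dot product of the vectorizations
inner_trace = np.trace(A @ B)          # trace inner product of the matrices

# Second lifting of a cut vector x: first X = x x^T, then Y = (1; s2vec X)(1; s2vec X)^T.
x_cut = np.array([1.0, -1.0, 1.0, 1.0])
X = np.outer(x_cut, x_cut)
y = np.concatenate([[1.0], s2vec(X)])
Y = np.outer(y, y)
```

The two inner products agree, and $Y$ has order $\frac{n(n+1)}{2}+1$, matching the cone ${\bf S}^{\frac{n(n+1)}{2}+1}_{+}$ in the text.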

6.3 General semi-definite lifts of combinatorial problems

Let us next look at a general recipe often used for obtaining SDP relaxations of NP-hard problems; elements of this technique were already used in the previous sections. Consider a nonconvex feasible region of the form

[TABLE]

where $\mathcal{A}\colon{\bf S}^{n}\to{\bf R}^{m}$ is a linear transformation. An SDP relaxation of this region is the set

[TABLE]

Indeed, $\mathcal{F}$ is the image of a linear projection of the intersection of $\widehat{\mathcal{F}}$ with rank one matrices. For this reason $\widehat{\mathcal{F}}$ is often called an SDP lift of $\mathcal{F}$.

In applications, such as the ones in the previous sections, the affine hull of $\mathcal{F}$ may not be full dimensional. For example, the affine hull of the set of permutation matrices (used for QAP) has empty interior. To this end, suppose that the affine hull of $\mathcal{F}$ is given by $\{x:Lx=l\}$ , where $L$ is a linear transformation and $l$ a vector. Define the matrix $\widehat{L}=\begin{bmatrix}-l&L\\ \end{bmatrix}$ . Then clearly there is no harm in including the redundant constraint

[TABLE]

in the very definition of $\mathcal{F}$ . Notice then $\widehat{\mathcal{F}}$ is clearly contained in the face of ${\bf S}^{n}_{+}$ exposed by $\widehat{L}^{T}\widehat{L}$ . Indeed, this is the minimal face of ${\bf S}^{n}_{+}$ containing $\widehat{\mathcal{F}}$ . To see this, suppose that the affine span of $\mathcal{F}$ has dimension $d$ , and consider any affinely independent vectors $x_{1},\ldots,x_{d+1}\in\mathcal{F}$ . Then the vectors $\begin{pmatrix}1\\ x_{1}\end{pmatrix},\begin{pmatrix}1\\ x_{2}\end{pmatrix},\ldots,\begin{pmatrix}1\\ x_{d+1}\end{pmatrix}$ are linearly independent, and therefore the barycenter

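$$\frac{1}{d+1}\sum_{i=1}^{d+1}\begin{pmatrix}1\\ x_{i}\end{pmatrix}\begin{pmatrix}1\\ x_{i}\end{pmatrix}^{T}$$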

is a rank $d+1$ matrix lying in $\widehat{\mathcal{F}}$. On the other hand, since the null space of $\widehat{L}$ has dimension $d+1$, every matrix in the face of ${\bf S}^{n}_{+}$ exposed by $\widehat{L}^{T}\widehat{L}$ has rank at most $d+1$. The claimed minimality follows. It is curious to note that if the constraint (6.8) were not explicitly included in the definition of $\mathcal{F}$, then the SDP lift $\widehat{\mathcal{F}}$ could nevertheless be strictly feasible, and hence unnecessarily large.

Example 6.3.1 (Strictly feasible SDP lifts).

Consider the region:

[TABLE]

There are only four feasible points, namely $\left\{\pm\begin{pmatrix}1\\ 1\\ -1\end{pmatrix},\pm\begin{pmatrix}1\\ -1\\ -1\end{pmatrix}\right\}$ , and they affinely span the two dimensional subspace perpendicular to the vector $\begin{pmatrix}1&0&1\end{pmatrix}^{T}$ . If this constraint is not included explicitly, then the SDP lift is given by

[TABLE]

In particular, the identity matrix is feasible.
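The minimal face argument of this section can be checked on the four feasible points of Example 6.3.1. In the sketch below, the affine hull is taken to be $\{x:x_{1}+x_{3}=0\}$, so that $L=\begin{pmatrix}1&0&1\end{pmatrix}$ and $l=0$; this reading of the example, and the variable names, are assumptions made here for illustration.

```python
import numpy as np

# The four feasible points listed in Example 6.3.1.
points = np.array([[1, 1, -1], [-1, -1, 1], [1, -1, -1], [-1, 1, 1]], dtype=float)

# Barycenter of the lifted rank one matrices (1; x)(1; x)^T.
lifted = np.hstack([np.ones((4, 1)), points])
barycenter = lifted.T @ lifted / len(lifted)

# L_hat = [-l, L] with L = (1, 0, 1) and l = 0, and the exposing matrix L_hat^T L_hat.
L_hat = np.array([[0.0, 1.0, 0.0, 1.0]])
exposing = L_hat.T @ L_hat

rank_bary = np.linalg.matrix_rank(barycenter)      # d + 1 with d = 2
residual = np.linalg.norm(exposing @ barycenter)   # barycenter lies in the exposed face
identity_gap = np.trace(exposing @ np.eye(4))      # positive: the identity is outside the face
```

The barycenter has rank $3=d+1$ and is annihilated by the exposing matrix, whereas the identity matrix is not; this is the sense in which the lift without the affine hull constraint is unnecessarily large.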

6.4 Elimination method for sparse SOS polynomials

Checking whether a polynomial is always nonnegative is a ubiquitous task in computational mathematics. This problem is NP-hard, as it encompasses, for example, a great variety of hard combinatorial problems. Instead, a common approach utilizes sum of squares formulations. Indeed, checking whether a polynomial is a sum of squares of polynomials can be modeled as an SDP. A certain hierarchy of sum of squares problems [78, 103] can then be used to determine the nonnegativity of the original polynomial. The size of the SDP arising from a sum of squares problem depends on the number of monomials that must be used in the formulation. In this section, we show how facial reduction iterations on the cone of sums of squares polynomials can be used to eliminate monomials, yielding a smaller and better conditioned equivalent SDP formulation. A rigorous explanation of the material in this section requires some heavier notation; therefore we only outline the techniques.

Let ${\bf R}[x]_{n,2d}$ denote the vector space of polynomials in $n$ variables with real coefficients of degree at most $2d$. We will write a polynomial $f\in{\bf R}[x]_{n,2d}$ using multi-index notation

$$f(x)=\sum_{\alpha\in N}c_{\alpha}x^{\alpha},$$

where $N$ is some subset of $\mathbb{N}^{n}$ , we set $x^{\alpha}=x_{1}^{\alpha_{1}}\cdots x_{n}^{\alpha_{n}}$ , and $c_{\alpha}$ are some real coefficients. We will think of ${\bf R}[x]_{n,2d}$ as a Euclidean space with the inner product being the usual dot product between coefficient vectors. Let $\Sigma_{n,2d}\subseteq{\bf R}[x]_{n,2d}$ be the set of polynomials $f\in{\bf R}[x]_{n,2d}$ that are sums of squares, meaning that $f$ can be written as $\sum_{i}f_{i}^{2}$ for some polynomials $f_{i}$ . Clearly $\Sigma_{n,2d}\subseteq{\bf R}[x]_{n,2d}$ is a closed convex cone, often called the SOS cone.

A fundamental fact is that membership in the SOS cone $\Sigma_{n,2d}$ can be checked by solving an SDP.

Theorem 6.4.1.

Fix a set of monomials $M\subset\mathbb{N}^{n}$ . Then a polynomial $f\in{\bf R}[x]_{n,2d}$ is a sum of squares of polynomials over the monomial set $M$ if and only if there exists a matrix $Q\succeq 0$ so that $f(x)=[x]_{M}^{T}Q[x]_{M}$ , where $[x]_{M}$ is a vector of monomials in $M$ .

Proof 6.4.2.

If $f$ is a sum of squares $f=\sum_{i}f_{i}^{2}$ , then we can form a matrix $P$ whose rows are the coefficient vectors of $\{f_{i}\}_{i}$ . Then $Q=P^{T}P$ is the PSD matrix we seek. Conversely, given a PSD matrix $Q$ satisfying $f(x)=[x]_{M}^{T}Q[x]_{M}$ , we can form a factorization $Q=P^{T}P$ , and read off the coefficients of each polynomial $f_{i}$ from the rows of $P$ .

Notice that the relation $f=[x]_{M}^{T}Q[x]_{M}$ can be easily rewritten as a linear relation on $Q$ by matching coefficients of the left- and right-hand sides. The size of $Q$ is completely dictated by the number of monomials.
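The correspondence in Theorem 6.4.1 between a PSD Gram matrix $Q$ and a sum of squares decomposition can be made concrete on a univariate toy example. The polynomial $f(x)=x^{4}+2x^{2}+1$, the monomial vector $(1,x,x^{2})$, and the helper names below are choices made here for illustration; the factorization $Q=P^{T}P$ is computed from an eigenvalue decomposition, which is one of several possible choices.

```python
import numpy as np

# Gram matrix of f(x) = x^4 + 2x^2 + 1 with respect to [x]_M = (1, x, x^2).
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

def gram_factor(Q, tol=1e-10):
    """Return P with Q = P^T P; the rows of P are coefficient vectors of the f_i."""
    w, U = np.linalg.eigh(Q)
    if w.min() < -tol:
        raise ValueError("Q is not positive semidefinite")
    keep = w > tol
    return np.sqrt(w[keep])[:, None] * U[:, keep].T

def monomials(x):
    return np.array([1.0, x, x**2])

P = gram_factor(Q)
xs = np.linspace(-2.0, 2.0, 9)
f_direct = np.array([x**4 + 2 * x**2 + 1 for x in xs])
f_gram = np.array([monomials(x) @ Q @ monomials(x) for x in xs])
f_squares = np.array([np.sum((P @ monomials(x)) ** 2) for x in xs])
```

Here $P$ has a single row, proportional to $(1,0,1)$, which recovers $f=(x^{2}+1)^{2}$ up to sign.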

More generally, instead of certifying whether a polynomial is SOS, we might be interested in minimizing a linear functional over an affine slice of the SOS cone. More precisely, consider a problem of the form:

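$$\min_{u\in{\bf R}^{m}}\ \sum_{i=1}^{m}w_{i}u_{i}\quad\text{s.t.}\quad f=g_{0}+\sum_{i=1}^{m}u_{i}g_{i},\quad f\in\Sigma_{n,2d},\tag{6.9}$$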

where $u\in{\bf R}^{m}$ is the decision variable, $g_{i}\in{\bf R}[x]_{n,2d}$ are specified polynomials and $w\in{\bf R}^{m}$ is a fixed vector. Clearly this problem can be converted to an SDP. The size of the decision matrix $X$ is determined by the number of monomials. Parsing algorithms attempt to choose a (small) set of monomials $M$ so that every feasible $f$ for (6.9) can be written as a sum of squares over the monomial set $M$ , thereby decreasing the size of the SDP. Not surprisingly, some parsing strategies can be interpreted as facial reduction iterations on (6.9).

We next outline such a strategy closely following [111]. To this end, we must first explain which faces of the SOS cone $\Sigma_{n,2d}$ correspond to eliminating monomials. Indeed, there are faces of $\Sigma_{n,2d}$ that do not have such a description.

To answer this question, we will need extra notation. Henceforth, fix a set of monomials $M\subseteq\mathbb{N}^{n}$ and set $d:=\max\{\sum_{i=1}^{n}z_{i}:z\in M\}$ to be the maximal degree of monomials in $M$ . Let $\Sigma(M)$ be the set of all polynomials that can be written as sums of squares over the monomial set $M$ . Finally, set $M^{+}$ to be the set of points in $M$ that are not midpoints of any points in $M$ , namely

$$M^{+}:=M\setminus\left\{\frac{\alpha+\beta}{2}:\ \alpha,\beta\in M,\ \alpha\neq\beta\right\}.$$

Let us now look at two types of faces that arise from elimination of monomials.

Theorem 6.4.3 (Type I face).

If the equality

$$M=\mbox{\rm conv}(M)\cap\mathbb{N}^{n}$$

holds, then $\Sigma(M)$ is a face of $\Sigma_{n,2d}$ .

In other words, if the convex hull $\mbox{\rm conv}(M)$ contains no grid points other than those already in $M$ , then $\Sigma(M)$ is a face of $\Sigma_{n,2d}$ .

Theorem 6.4.4 (Type II face).

If $\Sigma(M)$ is a face of $\Sigma_{n,2d}$ , then $\Sigma(M\setminus\beta)$ is a face of $\Sigma_{n,2d}$ for any $\beta\in M^{+}$ .

Thus given a face $\Sigma(M)$ , we can recursively make the face smaller by deleting any $\beta\in M^{+}$ .
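The set $M^{+}$ that drives the Type II reduction is straightforward to compute. The sketch below assumes that "midpoint" refers to the midpoint of two distinct points of $M$ (otherwise every point would be its own midpoint); the function name `m_plus` and the example monomial set are hypothetical.

```python
from itertools import combinations

def m_plus(M):
    """Points of M that are not midpoints of two distinct points of M."""
    M = {tuple(a) for a in M}
    midpoints = set()
    for a, b in combinations(M, 2):
        s = tuple(ai + bi for ai, bi in zip(a, b))
        if all(si % 2 == 0 for si in s):              # the midpoint is an integer point
            midpoints.add(tuple(si // 2 for si in s))
    return M - midpoints

# Exponents of the monomials 1, x, x^2, xy, y^2 in two variables.
M = {(0, 0), (1, 0), (2, 0), (1, 1), (0, 2)}
M_plus = m_plus(M)
```

In this example $(1,0)$ is the midpoint of $(0,0)$ and $(2,0)$, and $(1,1)$ is the midpoint of $(2,0)$ and $(0,2)$, so only the exponents $(0,0)$, $(2,0)$ and $(0,2)$ are candidates for deletion.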

Let us now turn to facial reduction. In the first step of facial reduction for (6.9), we must find an exposing vector $v\in\Sigma_{n,2d}^{*}$ that is orthogonal to all the affine constraints. Doing so in full generality is a difficult proposition. Instead, let us try to replace $\Sigma_{n,2d}^{*}$ by a polyhedral inner approximation. Then the search for $v$ is a linear program.

Theorem 6.4.5.

The polyhedral set

[TABLE]

satisfies $\Lambda\subseteq\Sigma(M)^{*}.$

Thus if we can find $v\in\Lambda$ that is orthogonal to the affine constraints (6.9), then we can use $v$ to expose a face of $\Sigma_{n,2d}$ containing the feasible region. Remarkably, this face can indeed be represented by eliminating monomials from $M$ .

Theorem 6.4.6.

Consider a vector

[TABLE]

for some $p\in\Sigma(M)^{\perp}$ and nonnegative numbers $\lambda_{2\alpha}\geq 0$ . Define the monomial set $\mathcal{I}:=\{\alpha\in M^{+}:\lambda_{2\alpha}>0\}$ . Then the face $\Sigma(M)\cap v^{\perp}$ coincides with $\Sigma(M\setminus\mathcal{I})$ .

Thus we can inductively use this procedure to eliminate monomials. At the end, one would hope that we would be left with a small dimensional SDP to solve. Promising numerical results and further explanations of methods of this type can be found in [110, 111, 74, 141, 140].

6.5 Commentary

Quadratic assignment problem, QAP

Many survey articles and books have appeared on the QAP, e.g., [101, 102, 28, 89]. More recent work on the implementation of SDP relaxations includes [41, 100, 150]. That the quadratic assignment problem is NP-hard is shown in [125]. The elegant trace formulation we used was introduced in [49].

The classic Nugent test set for QAP is given in [99]; it is maintained within QAPLIB [26], currently online. These problems have proven to be extremely hard to solve to optimality, see e.g., [28]. The difficulty of these problems is illustrated by the fact that many of them were not solved for thirty-odd years, see e.g., [13].

The semi-definite relaxation described here was introduced in [149]. It was derived by using the Lagrangian relaxation after modelling the permutation matrix constraint by various quadratic constraints. The semi-definite relaxation is then the dual of the Lagrangian relaxation, i.e., the dual of the dual. Application of FR then results in the surprisingly simplified gangster operator formulation.

This gangster formulation along with symmetry for certain QAP models is exploited in [41, 40] to significantly increase the size of QAP problems that can be solved. Other relaxations of QAP based on e.g., eigenvalue bounds are studied in e.g., [52, 53, 14].

Graph partitioning, GP

The graph partitioning, GP, problem is very similar to the QAP in that it involves a trace quadratic objective with a $0,1$ matrix variable $X$, i.e., the matrix whose columns are the incidence vectors of the sets for the partition. A similar successful SDP relaxation can be found in [147]. More recently, successful bounding results have been found in [113, 130, 38].

Second lift of Max-Cut

The second lifting from Section 6.2 is derived in [11, 12, 10] but in a different way, i.e., using the nullspace of the barycenter approach. The bounds found were extremely successful and, in fact, found the optimal solution of the MC in all but very special cases. The algorithm used for the SDP relaxation was the spectral bundle approach [65] and only problems of limited size could be solved. More recently an ADMM approach was much more successful in solving larger problems in [131].

Lifts of combinatorial problems

The SDP lifting of combinatorial regions described in Section 6.3 is standard; see [137] for many examples and references. The material on the minimal face of the SDP lift follows [135], though our explanation here is stated in dual terms, i.e., using exposing vectors.

Monomial elimination from SOS problems

The topic of eliminating monomials from sum of squares problems has a rich history. The section in the current text follows entirely the exposition in [111]. The technique of solving linear programs in order to approximate exposing vectors was extensively studied in [110]. Important earlier references on monomial elimination include [74, 141, 140]. For an exposition of how to use SOS hierarchies to solve polynomial optimization problems see the monograph [79].

Acknowledgements.

We would like to thank Jiyoung (Haesol) Im for her helpful comments and help with proofreading the manuscript. Research of the first author was partially supported by the AFOSR YIP award FA9550-15-1-0237. Research of the second author was supported by The Natural Sciences and Engineering Research Council of Canada.

Index

$\widehat{V}$ §6.1
$\circ$ §6.1
$\|Z\|_{*}$ , nuclear norm §5.3
adjoint §2.1
adjoint mapping §2.1
anchors §5.2.1
arrow constraint §6.1
$\mbox{\rm bd}\,C$ §2.1
block-diagonal §6.1
centered matrices, ${\bf S}^{n}_{c}$ §5.2
chordal §5.1
$\mbox{\rm cl}\,C$ §2.1
clique §5.1
complementary slackness Proposition 2.3.1
conic expansion approach §4.6
conjugate face, $F^{\triangle}:=\mathcal{K}^{*}\cap F^{\perp}$ Definition 2.2.7
constraint qualifications §2.3
convex cone §2.1
coordinate projection, $\mathcal{P}_{E}$ §5.1
$D\otimes F$ , Kronecker product §6.1
dual cone §2.1
$E[\alpha]:=\{ij\in E:i,j\in\alpha\}$ §5.1
EDM completable §5.2
EDM, Euclidean distance matrix §5.2
EDMC, Euclidean distance matrix completion §5.2
$\operatorname{{embdim}}D$ , embedding dimension of $D$ §5.2
embedding dimension of $D$ §5.2
Euclidean distance matrix completion, EDMC §5.2
Euclidean distance matrix, EDM §5.2
exposed face Definition 2.2.3
exposing vector Definition 2.2.3, §5.3
$F\unlhd\mathcal{K}$ , face of a cone Definition 2.2.1
$F^{\triangle}:=\mathcal{K}^{*}\cap F^{\perp}$ , conjugate face Definition 2.2.7
face of a cone, $F\unlhd\mathcal{K}$ Definition 2.2.1
$\operatorname{{face}}(S,\mathcal{K})$ , minimal face §2.2
facial reduction (FR) Chapter 1
facially exposed §2.2, Example 2.2.9
$\mathcal{F}_{p}$ §5.1
$\mathcal{F}_{p}(\alpha)$ 5.3
FR, facial reduction Chapter 1
gangster constraints §6.1
gangster operator §6.1, §6.5
graph §5.1
graph partitioning, GP §6.5
Hadamard (elementwise) product §6.1
Hölder error bound §4.5
hollow matrices, ${\bf S}^{n}_{H}$ §5.2
$\mbox{\rm int}\,C$ §2.1
$K^{*}$ , dual cone §2.1
$K^{*}$ , polar cone §2.1
$\mathcal{K}^{**}:=(\mathcal{K}^{*})^{*}$ , second polar cone §2.1
Kronecker product, $D\otimes F$ §6.1
Lagrangian function §2.3
lifting §6.1
linear assignment constraints §6.1
linear program, LP Chapter 1
low-rank matrix completion problem, LRMC §5.3
LP, linear program Chapter 1
LRMC, low-rank matrix completion problem §5.3
Mangasarian-Fromovitz Constraint Qualification (MFCQ) Definition 3.2.1
minimal face, $\operatorname{{face}}(S,\mathcal{K})$ §2.2
Moore-Penrose pseudoinverse footnote 3
nuclear norm §5.3
off-diagonal §6.1
optimal face Part II
partial order §2.1
$\mathcal{P}_{E}$ , coordinate projection §5.1
perdiagonal §2.4
permutation matrix $X=(X_{kl})\in\Pi$ §6.1
perturbed problems Example 2.3.7
positive semi-definite, PSD §4.6
preprocessing Chapter 1
preprocessing step §4.1
primal-dual pair §2.3
proper convex cone §2.1
proper face §2.2
PSD completion problem §5.1
PSD, positive semi-definite §4.6
QAP, quadratic assignment problem §6.1
quadratic assignment problem, QAP §6.1
${\bf R}^{E[\alpha]}$ §5.1
${\bf R}^{n}_{+}$ Example 2.1.3
Robinson regularity condition §3.4
$\operatorname{{s2vec}}(X)$ §6.2
self-replicating Example 2.2.8, Example 2.2.9
sensor network localization problem, SNL §5.2.1
sensors §5.2.1
${\rm sing}\textbf{(P)}$ , singularity degree of (P) Definition 4.2.1
singularity degree of (P), ${\rm sing}\textbf{(P)}$ Definition 4.2.1
singularity degree of a graph §5.1, §5.2
${\bf S}^{n}_{c}$, centered matrices §5.2
${\bf S}^{n}_{H}$, hollow matrices §5.2
SNL, sensor network localization problem §5.2.1
specified submatrix, $z\in{\bf R}^{p\times q}$ §5.3
strict feasibility §2.3
trace inner product §2.1
vectorization $x:=\operatorname{{vec}}(X)\in{\bf R}^{n^{2}}$ §6.1
weak duality inequality §2.3
$\omega[\alpha]$ §5.1

Bibliography

[1] A. Alfakih. Graph rigidity via Euclidean distance matrices. Linear Algebra Appl., 310(1-3):149–165, 2000.
[2] A. Alfakih, M.F. Anjos, V. Piccialli, and H. Wolkowicz. Euclidean distance matrices, semidefinite programming, and sensor network localization. Portug. Math., 68(1):53–102, 2011.
[3] A. Alfakih, A. Khandani, and H. Wolkowicz. Solving Euclidean distance matrix completion problems via semidefinite programming. Comput. Optim. Appl., 12(1-3):13–30, 1999. A tribute to Olvi Mangasarian.
[4] A. Alfakih and H. Wolkowicz. Matrix completion problems. In Handbook of semidefinite programming, volume 27 of Internat. Ser. Oper. Res. Management Sci., pages 533–545. Kluwer Acad. Publ., Boston, MA, 2000.
[5] A.Y. Alfakih. A remark on the faces of the cone of Euclidean distance matrices. Linear Algebra Appl., 414(1):266–270, 2006.
[6] B. Alipanahi, N. Krislock, A. Ghodsi, H. Wolkowicz, L. Donaldson, and M. Li. Determining protein structures from NOESY distance constraints by semidefinite programming. J. Comput. Biol., 20(4):296–310, 2013.
[7] B. Alipanahi, N. Krislock, A. Ghodsi, H. Wolkowicz, L. Donaldson, and M. Li. Determining protein structures from NOESY distance constraints by semidefinite programming. J. Comput. Biol., 20(4):296–310, 2013.
[8] E.D. Andersen and K.D. Andersen. The Mosek interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, volume 33 of Appl. Optim., pages 197–232. Kluwer Acad. Publ., Dordrecht, 2000.

The many faces of degeneracy

Abstract

Contents

List of Figures

List of Tables

Chapter 1 What this paper is about

1.1 Related work

1.2 Outline of the paper

1.3 Reflections on Jonathan Borwein and FR

Part I Theory

Chapter 2 Convex geometry

2.1 Notation

Example 2.1.1 (Adjoints of mappings between ${\bf R}^{n}$ and ${\bf R}^{m}$).

Example 2.1.2 (Adjoints of mappings between ${\bf S}^{n}$ and ${\bf R}^{m}$).

Example 2.1.3 (The nonnegative orthant ${\bf R}^{n}_{+}$).

Example 2.1.4 (The positive semi-definite cone ${\bf S}^{n}_{+}$).

Lemma 2.1.5 (Self-duality).

Proof 2.1.6.

Lemma 2.1.7 (Dual cone of a sum).

Lemma 2.1.8 (Double dual).

2.2 Facial geometry

Definition 2.2.1 (Faces).

Definition 2.2.2 (Minimal face).

Definition 2.2.3 (Exposed faces).

Example 2.2.4 (Nonexposed faces).

Proposition 2.2.5 (Exposing the intersection of exposed faces).

Proof 2.2.6.

Definition 2.2.7 (Conjugate face).

Example 2.2.8 (Faces of ${\bf R}^{n}_{+}$).

Example 2.2.9 (Faces of ${\bf S}^{n}_{+}$).

2.3 Conic optimization problems

Proposition 2.3.1 (Complementary slackness).

Definition 2.3.2 (Strict feasibility/Slater condition).

Theorem 2.3.3 (Strong duality).

Example 2.3.4 (Infinite gap).

Example 2.3.5 (Positive duality gap).

Example 2.3.6 (Zero duality gap, but no attainment).

Example 2.3.7 (Convergence to dual optimal value).

2.4 Commentary

Chapter 3 Virtues of strict feasibility

3.1 Theorem of the alternative

Theorem 3.1.1 (Hyperplane separation theorem).

Theorem 3.1.2 (Homogeneous separation).

Proof 3.1.3.

Theorem 3.1.4 (Theorem of the alternative for the primal).

Proof 3.1.5.

Theorem 3.1.6 (Theorem of the alternative for the dual).

Proof 3.1.7.

3.2 Stability of the solution

Definition 3.2.1 (Mangasarian-Fromovitz CQ).

Theorem 3.2.2 (Directional derivative of the value function).

Lemma 3.2.3 (Condition number and MFCQ).

Proof 3.2.4.

Theorem 3.2.5 (Boundedness of the dual solution set).

Proof 3.2.6.

Theorem 3.2.7 (Stability of the solution set).

3.3 Distance to infeasibility

Definition 3.3.1 (Distance to infeasibility).

Theorem 3.3.2 (Strict feasibility and distance to infeasibility).

3.4 Commentary

Chapter 4 Facial reduction

4.1 Preprocessing in linear programming

Example 4.1.1 (primal facial reduction).

Example 4.1.2 (dual facial reduction).

4.2 Facial reduction in conic optimization

Definition 4.2.1 (Singularity degree).

4.3 Facial reduction in semi-definite programming

Example 4.3.1 (Singularity degree larger than one).

4.4 What facial reduction actually does

Proposition 4.4.1 (Range space characterization of Slater).

Theorem 4.4.2 (Fundamental description of the minimal face).

Example 4.4.3 (Linear image not closed).

Example 4.4.4 (Linear image that is not facially exposed).

Theorem 4.5.1 (Hölderian

Example 4.5.2 (Worst-case example).

Theorem 5.1.1 (PSD completable and chordal graphs).

Proposition 5.1.2 (Closure of the image).

Example 5.1.3 (Reducing the PSD completion problem).

Lemma 5.1.4 (Completion of banded all ones matrices).

Proof 5.1.5.

Theorem 5.1.6 (Finding the minimal face on chordal graphs).

Example 5.1.7 (Slater condition & nonchordal graphs, [47]).

Corollary 5.1.8 (Singularity degree of chordal completions).

Theorem 5.2.1 (Parametrization of the EDM cone).

Theorem 5.2.2 (EDM completability & chordal graphs).

Theorem 5.2.3 (Closedness of the projected EDM cone).

Theorem 5.2.4 (Clique facial reduction for EDM completions).

Theorem 5.2.5 (Clique facial reduction for EDM is sufficient).

Corollary 5.2.6 (Singularity degree of chordal completions).

Corollary 5.2.7 (Singularity degree of chordal completions).

Example 6.3.1 (Strictly feasible SDP lifts).

Theorem 6.4.1.

Proof 6.4.2.

Theorem 6.4.3 (Type I face).

Theorem 6.4.4 (Type II face).

Theorem 6.4.5.

Theorem 6.4.6.