TL;DR
This paper explores the causes and implications of degeneracy in conic optimization, emphasizing the role of facial reduction techniques in addressing issues arising from the loss of strict feasibility.
Contribution
It provides a comprehensive analysis of degeneracy causes in conic optimization and highlights the effectiveness of facial reduction methods for overcoming these challenges.
Findings
Loss of strict feasibility affects optimality conditions and numerical methods.
Facial reduction offers a mathematically elegant way to handle degeneracy.
Rich problem structures can be exploited to improve optimization outcomes.
| # sensors | # anchors | radio range | RMSD | Time |
|---|---|---|---|---|
| 9 | s | |||
| 9 | m s | |||
| 9 | m s | |||
| 9 | m s |

| Specifications | | | Time (s) | Rank | Residual (%) |
|---|---|---|---|---|---|
| | | mean() | | | |
| 700 | 2000 | 0.36 | 12.80 | 4.0 | 1.5217e-12 |
| 1000 | 5000 | 0.36 | 49.66 | 4.0 | 1.0910e-12 |
| 1400 | 9000 | 0.36 | 131.53 | 4.0 | 6.0304e-13 |
| 1900 | 14000 | 0.36 | 291.22 | 4.0 | 3.4847e-11 |
| 2500 | 20000 | 0.36 | 798.70 | 4.0 | 7.2256e-08 |
The many faces of degeneracy in conic optimization
Dmitriy Drusvyatskiy
Department of Mathematics
University of Washington
Henry Wolkowicz
Faculty of Mathematics
University of Waterloo
Abstract
Slater’s condition – existence of a “strictly feasible solution” – is a common assumption in conic optimization. Without strict feasibility, first-order optimality conditions may be meaningless, the dual problem may yield little information about the primal, and small changes in the data may render the problem infeasible. Hence, failure of strict feasibility can negatively impact off-the-shelf numerical methods, such as primal-dual interior point methods, in particular. New optimization modelling techniques and convex relaxations for hard nonconvex problems have shown that the loss of strict feasibility is a more pronounced phenomenon than has previously been realized. In this text, we describe various reasons for the loss of strict feasibility, whether due to poor modelling choices or (more interestingly) rich underlying structure, and discuss ways to cope with it and, in many pronounced cases, how to use it as an advantage. In large part, we emphasize the facial reduction preprocessing technique due to its mathematical elegance, geometric transparency, and computational potential.
List of Figures
- 2.1 The set .
- 4.1 Nonexposed face of the image set
- 5.1 Instance of EDMC
- 6.1 Difference in cpu seconds (without FR - with FR)
- 6.2 Difference in accuracy values (without FR - with FR)
Chapter 1 What this paper is about
Conic optimization has proven to be an elegant and powerful modeling tool with surprisingly many applications. The classical linear programming problem revolutionized operations research and is still the most widely used optimization model. This is due to the elegant theory and the ability to solve in practice both small and large scale problems efficiently and accurately by the well-known simplex method of Dantzig [35] and by more recent interior-point methods, e.g., [148, 98]. The size (number of variables) of linear programs that could be solved before the interior-point revolution was on the order of tens of thousands, whereas it immediately increased to millions for many applications. A large part of modern success is due to preprocessing, which aims to identify (primal and dual slack) variables that are identically zero on the feasible set. The article [96] is a good reference.
The story does not end with linear programming. Dantzig himself recounts in [36]: “the world is nonlinear”. Nonlinear models can significantly improve on linear programs if they can be solved efficiently. Conic optimization has shown its worth in its elegant theory, efficient algorithms, and many applications; see e.g., [146, 9, 20]. Preprocessing to rectify possible loss of “strict feasibility” in the primal or the dual problems is appealing for general conic optimization as well. In contrast to linear programming, however, the area of preprocessing for conic optimization is in its infancy; see e.g., [29, 138, 30, 107, 109] and Section 1.1, below. Moreover, unlike in linear programming, numerical error makes preprocessing difficult in full generality. This being said, surprisingly, there are many specific applications of conic optimization, where the rich underlying structure makes preprocessing possible, leading to greatly simplified models and strengthened algorithms. Indeed, exploiting structure is essential for making preprocessing viable. In this article, we present the background and the elementary theory of such regularization techniques in the framework of facial reduction (FR). We focus on notable case studies, where such techniques have proven to be useful.
1.1 Related work
To put this text in perspective, it is instructive to consider nonlinear programming. Nontrivial statements in constrained nonlinear optimization always rely on some regularity of the constraints. To illustrate, consider a minimization problem over a set of the form $\{x : f(x)=0\}$ for some smooth $f$. How general are such constraints? A celebrated result of Whitney [143] shows that any closed set in a Euclidean space can be written as a zero-set of some $C^\infty$-smooth function $f$. Thus, in this generality, there is little difference between minimizing over arbitrary closed sets and sets of the form $\{x : f(x)=0\}$, for smooth $f$. Since little can be said about optimizing over arbitrary closed sets, one must make an assumption on the equality constraint. The simplest one, eliminating Whitney’s construction, is that the gradient of $f$ is nonzero on the feasible region – the earliest form of a constraint qualification. There have been numerous papers developing weakened versions of regularity (and optimality conditions) in nonlinear programming; some good examples are [62, 25, 22].
The Slater constraint qualification that we discuss in this text is in a similar spirit, but in the context of (convex) conic optimization. Some good early references on the geometry of the Slater condition, and weakened variants, are [57, 93, 94, 144, 19]. The concept of facial reduction for general convex programs was introduced in [23, 24], while an early application to a semi-definite type best-approximation problem was given in [145]. Recently, there has been a significant renewed interest in facial reduction, in large part due to the success in applications for graph related problems, such as Euclidean distance matrix completion and molecular conformation [76, 75, 46, 6] and in polynomial optimization [110, 111, 74, 141, 140]. In particular, a more modern explanation of the facial reduction procedure can be found in [88, 104, 107, 136, 142].
We note in passing that numerous papers show that strict feasibility holds “generically” with respect to unstructured perturbations. In contrast, optimization problems appearing in applications are often highly structured and such genericity results are of little practical use.
1.2 Outline of the paper
The paper is divided into two parts. In Part I, we present the necessary theoretical grounding in conic optimization, including basic optimality and duality theory, connections of Slater’s condition to the distance to infeasibility and sensitivity theory, the facial reduction procedure, and the singularity degree. In Part II, we concentrate on illustrative examples and applications, including matrix completion problems (semi-definite, low-rank, and Euclidean distance), relaxations of hard combinatorial problems (quadratic assignment and max-cut), and sum of squares relaxations of polynomial optimization problems.
1.3 Reflections on Jonathan Borwein and FR
These are some reflections on Jonathan Borwein and his role in the development of the facial reduction technique, by Henry Wolkowicz. Jonathan Borwein passed away unexpectedly on Aug. 2, 2016. Jon was an extraordinary mathematician who made significant contributions in an amazing number of very diverse areas. Many details and personal memories by myself and many others including family, friends, and colleagues, are presented at the memorial website jonborwein.org. This was a terrible loss to his family and all his friends and colleagues, including myself. The facial reduction process we use in this monograph originates in the work of Jon and the second author (myself). This work took place from July of 1978 to July of 1979 when I went to Halifax to work with Jon at Dalhousie University in a lectureship position. The optimality conditions for the general abstract convex program using the facially reduced problem are presented in the two papers [23, 22]. The facial reduction process is then derived in [24].
Part I Theory
Chapter 2 Convex geometry
This section collects preliminaries of linear algebra and convex geometry that will be routinely used in the rest of the manuscript. The main focus is on convex duality, facial structure of convex cones, and the primal-dual conic optimization pair. The two running examples of linear and semi-definite programming illustrate the concepts. We have tried to include proofs of important theorems, when they are both elementary and enlightening. We have omitted arguments that are longer or that are less transparent, so as not to distract the reader from the narrative.
2.1 Notation
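Throughout, we will fix a Euclidean space $\mathbb{E}$ with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|x\|=\sqrt{\langle x,x\rangle}$. When referencing another Euclidean space (with its own inner product), we will use the letter $\mathbb{F}$. An open ball of radius $r>0$ around a point $x\in\mathbb{E}$ will be denoted by $B_r(x)$. The two most important examples of Euclidean spaces for us will be the space of $n$-vectors $\mathbb{R}^n$ with the dot product $\langle x,y\rangle=\sum_i x_iy_i$ and the space of $n\times n$ symmetric matrices $\mathcal{S}^n$ with the trace inner product $\langle X,Y\rangle=\operatorname{tr}(XY)$. Throughout, we let $e_i$ be the $i$'th coordinate vector of $\mathbb{R}^n$. Note that the trace inner product can be equivalently written as $\operatorname{tr}(XY)=\sum_{i,j}X_{ij}Y_{ij}$. Thus the trace inner product is itself the dot product between the two matrices stretched into vectors. A key property of the trace is invariance under permutations of the arguments: $\operatorname{tr}(AB)=\operatorname{tr}(BA)$ for any two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times m}$.
For any linear mapping $\mathcal{A}\colon\mathbb{E}\to\mathbb{F}$, between Euclidean spaces $\mathbb{E}$ and $\mathbb{F}$, the adjoint mapping $\mathcal{A}^*\colon\mathbb{F}\to\mathbb{E}$ is the unique mapping satisfying
$$\langle\mathcal{A}x,y\rangle=\langle x,\mathcal{A}^*y\rangle\qquad\text{for all } x\in\mathbb{E},\ y\in\mathbb{F}.$$
Notice that the angle brackets on the left refer to the inner product in $\mathbb{F}$, while those on the right refer to the inner product in $\mathbb{E}$.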
Let us look at two important examples of adjoint maps.
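**Example 2.1.1** (Adjoints of mappings between $\mathbb{R}^n$ and $\mathbb{R}^m$).
Consider a matrix $A\in\mathbb{R}^{m\times n}$ as a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. Then the adjoint $A^*$ is simply the transpose $A^T$. To make the parallel with the next example, it is useful to make this description more explicit. Suppose that the linear operator $\mathcal{A}\colon\mathbb{R}^n\to\mathbb{R}^m$ is defined by
$$\mathcal{A}(x)=(\langle a_1,x\rangle,\ldots,\langle a_m,x\rangle),\tag{2.1}$$
where $a_1,\ldots,a_m$ are some vectors in $\mathbb{R}^n$. When thinking of $\mathcal{A}$ as a matrix, the vectors $a_i$ would be its rows, and the description (2.1) corresponds to a “row-space” view of matrix-vector multiplication $Ax$. The adjoint $\mathcal{A}^*\colon\mathbb{R}^m\to\mathbb{R}^n$ is simply the map
$$\mathcal{A}^*y=\sum_{i=1}^m y_ia_i.\tag{2.2}$$
Again, when thinking of $\mathcal{A}$ as a matrix with $a_i$ as its rows, the description (2.2) corresponds to the “column-space” view of matrix-vector multiplication $A^Ty$.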
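**Example 2.1.2** (Adjoints of mappings between $\mathcal{S}^n$ and $\mathbb{R}^m$).
Consider a set of symmetric matrices $A_1,\ldots,A_m$ in $\mathcal{S}^n$, and define the linear map $\mathcal{A}\colon\mathcal{S}^n\to\mathbb{R}^m$ by
$$\mathcal{A}(X)=(\langle A_1,X\rangle,\ldots,\langle A_m,X\rangle).$$
We note that any linear map $\mathcal{A}\colon\mathcal{S}^n\to\mathbb{R}^m$ can be written in this way for some matrices $A_i\in\mathcal{S}^n$. Notice the parallel to (2.1). The adjoint $\mathcal{A}^*\colon\mathbb{R}^m\to\mathcal{S}^n$ is given by
$$\mathcal{A}^*y=\sum_{i=1}^m y_iA_i.$$
Notice the parallel to (2.2). To verify that this indeed is the adjoint, simply observe the equation
$$\langle X,\mathcal{A}^*y\rangle=\Big\langle X,\sum_{i}y_iA_i\Big\rangle=\sum_{i}y_i\langle A_i,X\rangle=\langle\mathcal{A}(X),y\rangle$$
for any $X\in\mathcal{S}^n$ and $y\in\mathbb{R}^m$.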
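The adjoint identity above can be checked by hand on small data. The following is a minimal sketch in plain Python, assuming a map from $2\times 2$ symmetric matrices to $\mathbb{R}^2$; the matrices `A1`, `A2`, `X`, the vector `y`, and all helper names are made up for illustration and do not come from the text.

```python
# Sketch: check <A(X), y> = <X, A*(y)> for a linear map A from S^2 to R^2.
# All data and helper names below are hypothetical illustration values.

def trace_inner(X, Y):
    """Trace inner product tr(XY) = sum_ij X_ij * Y_ij for symmetric X, Y."""
    return sum(x * y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))

def A_map(mats, X):
    """A(X) = (<A_1, X>, ..., <A_m, X>)."""
    return [trace_inner(Ai, X) for Ai in mats]

def A_adjoint(mats, y):
    """A*(y) = sum_i y_i A_i."""
    n = len(mats[0])
    return [[sum(yi * Ai[r][c] for yi, Ai in zip(y, mats)) for c in range(n)]
            for r in range(n)]

A1 = [[1, 2], [2, 0]]
A2 = [[0, 1], [1, 3]]
X = [[2, -1], [-1, 5]]
y = [3, -2]

lhs = sum(a * b for a, b in zip(A_map([A1, A2], X), y))  # dot product in R^m
rhs = trace_inner(X, A_adjoint([A1, A2], y))             # trace product in S^n
print(lhs, rhs)
```

Both sides evaluate to the same number, as the adjoint identity requires. Integer data keeps the comparison exact.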
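The interior, boundary, and closure of any set $C\subseteq\mathbb{E}$ will be denoted by $\operatorname{int}C$, $\operatorname{bd}C$, and $\operatorname{cl}C$, respectively.
A set $C$ is convex if it contains the line segment joining any two points in $C$:
$$x,y\in C,\ \lambda\in[0,1]\quad\Longrightarrow\quad\lambda x+(1-\lambda)y\in C.$$
The minimal affine space containing a convex set $C$ is called the affine hull of $C$, and is denoted by $\operatorname{aff}C$. We define the relative interior of $C$, written $\operatorname{ri}C$, to be the interior of $C$ relative to $\operatorname{aff}C$. It is straightforward to show that for a nonempty convex set $C$, the relative interior $\operatorname{ri}C$ is never empty.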
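A subset $\mathcal{K}$ of $\mathbb{E}$ is a convex cone if $\mathcal{K}$ is convex and is positively homogeneous, meaning $\lambda\mathcal{K}\subseteq\mathcal{K}$ for all $\lambda\ge0$. Equivalently, $\mathcal{K}$ is a convex cone if, and only if, for any two points $x$ and $y$ in $\mathcal{K}$ and any nonnegative constants $\alpha,\beta\ge0$, the sum $\alpha x+\beta y$ lies in $\mathcal{K}$. We say that a convex cone $\mathcal{K}$ is proper if $\mathcal{K}$ is closed, has nonempty interior, and contains no lines. The symbol $\mathcal{K}^\perp$ refers to the orthogonal complement of $\operatorname{aff}\mathcal{K}$. Let us look at the two most important examples of proper cones for this article.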
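**Example 2.1.3** (The nonnegative orthant $\mathbb{R}^n_+$).
The nonnegative orthant
$$\mathbb{R}^n_+:=\{x\in\mathbb{R}^n : x_i\ge0\ \text{for all}\ i=1,\ldots,n\}$$
is a proper convex cone in $\mathbb{R}^n$. The interior of $\mathbb{R}^n_+$ is the set
$$\mathbb{R}^n_{++}:=\{x\in\mathbb{R}^n : x_i>0\ \text{for all}\ i=1,\ldots,n\}.$$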
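**Example 2.1.4** (The positive semi-definite cone $\mathcal{S}^n_+$).
Consider the set of positive semi-definite matrices
$$\mathcal{S}^n_+:=\{X\in\mathcal{S}^n : v^TXv\ge0\ \text{for all}\ v\in\mathbb{R}^n\}.$$
It is immediate from the definition that $\mathcal{S}^n_+$ is a convex cone containing no lines. Let us quickly verify that $\mathcal{S}^n_+$ is proper. To see this, observe
$$v^TXv=\operatorname{tr}(v^TXv)=\operatorname{tr}(Xvv^T)=\langle X,vv^T\rangle.$$
Thus $\mathcal{S}^n_+$ is closed because it is the intersection of the halfspaces $\{X\in\mathcal{S}^n : \langle X,vv^T\rangle\ge0\}$ for all $v\in\mathbb{R}^n$, and arbitrary intersections of closed sets are closed. The interior of $\mathcal{S}^n_+$ is the set of positive definite matrices
$$\mathcal{S}^n_{++}:=\{X\in\mathcal{S}^n : v^TXv>0\ \text{for all}\ 0\ne v\in\mathbb{R}^n\}.$$
Let us quickly verify this description. Showing that $\mathcal{S}^n_{++}$ is open is straightforward; we leave the details to the reader. Conversely, consider a matrix $X\in\mathcal{S}^n_+\setminus\mathcal{S}^n_{++}$ and let $v$ be a nonzero vector satisfying $v^TXv=0$. Then the matrix $X-tvv^T$ lies outside of $\mathcal{S}^n_+$ for every $t>0$, and therefore $X$ must lie on the boundary of $\mathcal{S}^n_+$. To summarize, we have shown that $\mathcal{S}^n_+$ is a proper convex cone.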
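Given a convex cone $\mathcal{K}$ in $\mathbb{E}$, we introduce two binary relations $\succeq_{\mathcal{K}}$ and $\succ_{\mathcal{K}}$ on $\mathbb{E}$:
$$x\succeq_{\mathcal{K}}y\iff x-y\in\mathcal{K},\qquad x\succ_{\mathcal{K}}y\iff x-y\in\operatorname{int}\mathcal{K}.$$
Assuming that $\mathcal{K}$ is proper makes the relation $\succeq_{\mathcal{K}}$ into a partial order, meaning that for any three points $x,y,z\in\mathbb{E}$, the three conditions hold:
1. $x\succeq_{\mathcal{K}}x$ (reflexivity)
2. $x\succeq_{\mathcal{K}}y$ and $y\succeq_{\mathcal{K}}x$ imply $x=y$ (antisymmetry)
3. $x\succeq_{\mathcal{K}}y$ and $y\succeq_{\mathcal{K}}z$ imply $x\succeq_{\mathcal{K}}z$ (transitivity)
As is standard in the literature, we denote the partial order $\succeq_{\mathbb{R}^n_+}$ on $\mathbb{R}^n$ by $\ge$ and the partial order $\succeq_{\mathcal{S}^n_+}$ on $\mathcal{S}^n$ by $\succeq$. In particular, the relation $x\ge y$ means $x_i\ge y_i$ for each coordinate $i$, while the relation $X\succeq Y$ means that the matrix $X-Y$ is positive semi-definite.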
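Central to conic geometry is duality. The dual cone of $\mathcal{K}$ is the set
$$\mathcal{K}^*:=\{y\in\mathbb{E} : \langle x,y\rangle\ge0\ \text{for all}\ x\in\mathcal{K}\}.$$
The following lemma will be used extensively.
**Lemma 2.1.5** (Self-duality).
Both $\mathbb{R}^n_+$ and $\mathcal{S}^n_+$ are self-dual, meaning $(\mathbb{R}^n_+)^*=\mathbb{R}^n_+$ and $(\mathcal{S}^n_+)^*=\mathcal{S}^n_+$.
**Proof 2.1.6.**
The equality $(\mathbb{R}^n_+)^*=\mathbb{R}^n_+$ is elementary and we leave the proof to the reader. To see that $\mathcal{S}^n_+$ is self-dual, recall that a matrix is positive semi-definite if, and only if, all of its eigenvalues are nonnegative. Fix two matrices $X,Y\succeq0$ and let $X=\sum_i\lambda_iv_iv_i^T$ be the eigenvalue decomposition of $X$. Then we deduce
$$\langle X,Y\rangle=\operatorname{tr}(XY)=\sum_i\lambda_i\operatorname{tr}(v_iv_i^TY)=\sum_i\lambda_iv_i^TYv_i\ge0.$$
Therefore the inclusion $\mathcal{S}^n_+\subseteq(\mathcal{S}^n_+)^*$ holds. Conversely, for any $Y\in(\mathcal{S}^n_+)^*$ and any $v\in\mathbb{R}^n$ the inequality, $v^TYv=\langle vv^T,Y\rangle\ge0$, holds. The reverse inclusion follows, and the proof is complete.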
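Both inclusions in the proof can be seen on small matrices. The sketch below works with $2\times2$ symmetric matrices in plain Python. It first pairs two positive semi-definite matrices, then shows how a matrix with a negative eigenvalue is detected by a rank-one matrix $vv^T$. The matrices and helper names are hypothetical and chosen only for illustration.

```python
# Sketch of the self-duality argument on 2x2 symmetric matrices.
# All matrices and helper names here are made up for this illustration.

def trace_inner(X, Y):
    """Trace inner product <X, Y> = tr(XY) for symmetric matrices."""
    return sum(x * y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))

def outer(v):
    """Rank-one positive semi-definite matrix v v^T."""
    return [[vi * vj for vj in v] for vi in v]

def is_psd_2x2(M):
    """A symmetric 2x2 matrix is PSD iff its diagonal and determinant are >= 0."""
    return M[0][0] >= 0 and M[1][1] >= 0 and M[0][0] * M[1][1] - M[0][1] ** 2 >= 0

# First inclusion: two PSD matrices have a nonnegative inner product.
X = [[2, 1], [1, 1]]
Y = [[1, -1], [-1, 2]]
pairing = trace_inner(X, Y)

# Reverse inclusion (contrapositive): a non-PSD matrix has a negative
# pairing with some rank-one PSD matrix v v^T.
Y_bad = [[1, 2], [2, 1]]                 # eigenvalues 3 and -1
v = [1, -1]                              # eigenvector for the eigenvalue -1
witness = trace_inner(outer(v), Y_bad)   # equals v^T Y_bad v

print(pairing, witness)
```

The first value is nonnegative and the second is negative, so `Y_bad` cannot belong to the dual cone.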
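Finally, we end this section with the following two useful results of convex geometry.
**Lemma 2.1.7** (Dual cone of a sum).
For any two closed convex cones $\mathcal{K}_1$ and $\mathcal{K}_2$, equalities hold:
$$(\mathcal{K}_1+\mathcal{K}_2)^*=\mathcal{K}_1^*\cap\mathcal{K}_2^*\qquad\text{and}\qquad(\mathcal{K}_1\cap\mathcal{K}_2)^*=\operatorname{cl}(\mathcal{K}_1^*+\mathcal{K}_2^*).$$
**Lemma 2.1.8** (Double dual).
A set $\mathcal{K}$ is a closed convex cone if, and only if, equality $\mathcal{K}=(\mathcal{K}^*)^*$ holds.
In particular, if $\mathcal{K}$ is a proper convex cone, then so is its dual $\mathcal{K}^*$, as the reader can verify.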
2.2 Facial geometry
Central to this paper is the decomposition of a cone into faces.
**Definition 2.2.1** (Faces).
Let $\mathcal{K}$ be a convex cone. A convex cone $F\subseteq\mathcal{K}$ is called a face of $\mathcal{K}$, denoted $F\unlhd\mathcal{K}$, if the implication holds:
$$x,y\in\mathcal{K},\ x+y\in F\quad\Longrightarrow\quad x,y\in F.$$
Let $\mathcal{K}$ be a closed convex cone. Vacuously, the empty set and $\mathcal{K}$ itself are faces. A face $F\unlhd\mathcal{K}$ is proper if it is neither empty nor all of $\mathcal{K}$. One can readily verify from the definition that the intersection of an arbitrary collection of faces of $\mathcal{K}$ is itself a face of $\mathcal{K}$. A fundamental result of convex geometry shows that relative interiors of all faces of $\mathcal{K}$ form a partition of $\mathcal{K}$: every point of $\mathcal{K}$ lies in the relative interior of some face and relative interiors of any two distinct faces are disjoint. In particular, any proper face of $\mathcal{K}$ is disjoint from $\operatorname{ri}\mathcal{K}$.
**Definition 2.2.2** (Minimal face).
The minimal face of a convex cone $\mathcal{K}$ containing a set $S\subseteq\mathcal{K}$ is the intersection of all faces of $\mathcal{K}$ containing $S$, and is denoted by $\operatorname{face}(S,\mathcal{K})$.
A convenient alternate characterization of minimal faces is as follows. If $S\subseteq\mathcal{K}$ is a convex set, then $\operatorname{face}(S,\mathcal{K})$ is the smallest face of $\mathcal{K}$ intersecting the relative interior of $S$. In particular, equality $\operatorname{face}(S,\mathcal{K})=\operatorname{face}(x,\mathcal{K})$ holds for any point $x\in\operatorname{ri}S$.
There is a special class of faces that admit “dual” descriptions. Namely, for any vector $v\in\mathcal{K}^*$ one can readily verify that the set $F=v^\perp\cap\mathcal{K}$ is a face of $\mathcal{K}$.
**Definition 2.2.3** (Exposed faces).
Any set of the form $F=v^\perp\cap\mathcal{K}$, for some vector $v\in\mathcal{K}^*$, is called an exposed face of $\mathcal{K}$. The vector $v$ is then called an exposing vector of $F$.
The classical hyperplane separation theorem shows that any point $x$ in the relative boundary of $\mathcal{K}$ lies in some proper exposed face. Not all faces are exposed, however, as the following example shows.
**Example 2.2.4** (Nonexposed faces).
Consider the set , and let be the closed convex cone generated by . Then the ray is a face of but it is not exposed.
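The following is a very useful property of exposed faces we will use.
**Proposition 2.2.5** (Exposing the intersection of exposed faces).
For any closed convex cone $\mathcal{K}$ and vectors $v_1,v_2\in\mathcal{K}^*$, equality holds:
$$(v_1^\perp\cap\mathcal{K})\cap(v_2^\perp\cap\mathcal{K})=(v_1+v_2)^\perp\cap\mathcal{K}.$$
**Proof 2.2.6.**
The inclusion $\subseteq$ is trivial. To see the converse, note for any $x\in(v_1+v_2)^\perp\cap\mathcal{K}$ we have $\langle v_1,x\rangle\ge0$, $\langle v_2,x\rangle\ge0$, while $\langle v_1+v_2,x\rangle=0$. We deduce $x\in v_1^\perp\cap v_2^\perp$ as claimed.
In other words, if the faces $F_1\unlhd\mathcal{K}$ and $F_2\unlhd\mathcal{K}$ are exposed by $v_1$ and $v_2$, respectively, then the intersection $F_1\cap F_2$ is a face exposed by the sum $v_1+v_2$.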
A convex cone is called facially exposed if all of its faces are exposed. The distinction between faces and exposed faces may appear mild at first sight; however, we will see that it is exactly this distinction that can cause difficulties for preprocessing techniques for general conic problems.
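**Definition 2.2.7** (Conjugate face).
With any face $F$ of a convex cone $\mathcal{K}$, we associate a face of the dual cone $\mathcal{K}^*$, called the conjugate face, $F^\triangle:=\mathcal{K}^*\cap F^\perp$.
Equivalently, $F^\triangle$ is the face of $\mathcal{K}^*$ exposed by any point $x\in\operatorname{ri}F$, that is $F^\triangle=\mathcal{K}^*\cap x^\perp$. Thus, in particular, conjugate faces are always exposed. Not surprisingly, one can readily verify that equality $(F^\triangle)^\triangle=F$ holds if, and only if, the face $F\unlhd\mathcal{K}$ is exposed.
We illustrate the concepts with our two running examples, $\mathbb{R}^n_+$ and $\mathcal{S}^n_+$, keeping in mind the parallels between the two.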
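**Example 2.2.8** (Faces of $\mathbb{R}^n_+$).
For any index set $I\subseteq\{1,\ldots,n\}$, the set
$$F_I:=\{x\in\mathbb{R}^n_+ : x_i=0\ \text{for all}\ i\in I\}$$
is a face of $\mathbb{R}^n_+$, and all faces of $\mathbb{R}^n_+$ are of this form. In particular, observe that all faces of $\mathbb{R}^n_+$ are linearly isomorphic to $\mathbb{R}^k_+$ for some positive integer $k$. In this sense, $\mathbb{R}^n_+$ is “self-replicating”. The relative interior of $F_I$ consists of all points in $F_I$ with $x_i>0$ for indices $i\notin I$. The face $F_I$ is exposed by the vector $v\in\mathbb{R}^n_+$ with $v_i=1$ for all $i\in I$ and $v_i=0$ for all $i\notin I$. In particular, $\mathbb{R}^n_+$ is a facially exposed convex cone. The face conjugate to $F_I$ is $F_{I^c}$.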
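For the nonnegative orthant, faces can be handled purely combinatorially: a face is determined by the set of coordinates forced to zero, and a nonnegative vector exposes the face whose zero set is the support of that vector. The sketch below, in plain Python with made-up vectors and hypothetical helper names, also illustrates Proposition 2.2.5: the sum of two exposing vectors exposes the intersection of the two faces.

```python
# Faces of the nonnegative orthant, indexed by the coordinates forced to zero.
# Function names and sample vectors are assumptions of this sketch.

def exposed_zero_set(v):
    """Index set I of the face exposed by a vector v >= 0.

    For x >= 0 and v >= 0, <v, x> = 0 forces x_i = 0 exactly where v_i > 0.
    """
    assert all(vi >= 0 for vi in v), "exposing vectors must lie in the dual cone"
    return {i for i, vi in enumerate(v) if vi > 0}

def in_face(x, I):
    """Membership test for the face {x >= 0 : x_i = 0 for all i in I}."""
    return all(xi >= 0 for xi in x) and all(x[i] == 0 for i in I)

v1 = [1, 0, 0, 2]
v2 = [0, 0, 3, 1]
v_sum = [a + b for a, b in zip(v1, v2)]

I1 = exposed_zero_set(v1)
I2 = exposed_zero_set(v2)
I12 = exposed_zero_set(v_sum)   # zero set of the face exposed by v1 + v2

x = [0, 5, 0, 0]                # a point lying in both exposed faces
pairing = sum(a * b for a, b in zip(v_sum, x))
```

Intersecting two faces of the orthant corresponds to taking the union of their zero sets, which is exactly the support of `v1 + v2`.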
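**Example 2.2.9** (Faces of $\mathcal{S}^n_+$).
There are a number of different ways to think about (and represent) faces of the PSD cone $\mathcal{S}^n_+$. In particular, one can show that faces are in correspondence with linear subspaces of $\mathbb{R}^n$. More precisely, for any $r$-dimensional linear subspace $\mathcal{R}$ of $\mathbb{R}^n$, the set
$$F_{\mathcal{R}}:=\{X\in\mathcal{S}^n_+ : \operatorname{range}X\subseteq\mathcal{R}\}\tag{2.3}$$
is a face of $\mathcal{S}^n_+$. Conversely, any face of $\mathcal{S}^n_+$ can be written in the form (2.3), where $\mathcal{R}$ is the range space of any matrix $X$ lying in the relative interior of the face. The relative interior of $F_{\mathcal{R}}$ consists of all matrices $X\in\mathcal{S}^n_+$ whose range space coincides with $\mathcal{R}$. Moreover, for any matrix $U\in\mathbb{R}^{n\times r}$ satisfying $\operatorname{range}U=\mathcal{R}$, we have the equivalent description
$$F_{\mathcal{R}}=U\mathcal{S}^r_+U^T.$$
In particular, $F_{\mathcal{R}}$ is linearly isomorphic to the $r$-dimensional positive semi-definite cone $\mathcal{S}^r_+$. The face conjugate to $F_{\mathcal{R}}$ is $F_{\mathcal{R}^\perp}$ and can be equivalently written as
$$F_{\mathcal{R}}^\triangle=V\mathcal{S}^{n-r}_+V^T$$
for any matrix $V\in\mathbb{R}^{n\times(n-r)}$ satisfying $\operatorname{range}V=\mathcal{R}^\perp$. Notice that then the matrix $VV^T$ lies in the relative interior of $F_{\mathcal{R}}^\triangle$ and therefore $VV^T$ exposes the face $F_{\mathcal{R}}$. In particular, $\mathcal{S}^n_+$ is facially exposed and also self-replicating.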
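A small numerical instance may help fix the picture. The sketch below, in plain Python with integer data, takes a hypothetical $3\times2$ matrix `U` whose columns span a two-dimensional subspace, a vector `V` spanning its orthogonal complement, and a positive definite `A`. It builds a point `X = U A U^T` of the face and checks that `Z = V V^T` is orthogonal to it in the trace inner product. All names and numbers are assumptions made for the example.

```python
# Sketch: a face of the 3x3 PSD cone and an exposing vector for it.
# U, V, A and the helper names are made up for this illustration.

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace_inner(X, Y):
    return sum(x * y for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))

U = [[1, 0], [0, 1], [1, 1]]      # columns span the subspace R
V = [[1], [1], [-1]]              # spans the orthogonal complement of R
A = [[2, 1], [1, 1]]              # positive definite 2x2 matrix

X = matmul(matmul(U, A), transpose(U))   # a point of the face, range(X) = R
Z = matmul(V, transpose(V))              # candidate exposing vector V V^T

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(trace_inner(X, Z), trace_inner(identity, Z))
```

The pairing with `X` vanishes, while the pairing with the identity matrix is positive, so the identity lies outside the exposed face. For positive semi-definite matrices, a vanishing trace inner product forces the matrix product `X Z` to be zero as well, which the data above also satisfy.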
2.3 Conic optimization problems
Modern conic optimization draws fundamentally from “duality”: every conic optimization problem gives rise to a related conic optimization problem, called its dual. Consider the primal-dual pair:
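$$
\text{(P)}\quad\begin{array}{ll}\min & \langle c,x\rangle\\ \text{s.t.} & \mathcal{A}x=b\\ & x\succeq_{\mathcal{K}}0\end{array}
\qquad\qquad
\text{(D)}\quad\begin{array}{ll}\max & \langle b,y\rangle\\ \text{s.t.} & \mathcal{A}^*y\preceq_{\mathcal{K}^*}c\end{array}
\tag{2.4}
$$
Here, $\mathcal{K}$ is a closed convex cone in $\mathbb{E}$ and the mapping $\mathcal{A}\colon\mathbb{E}\to\mathbb{F}$ is linear. Eliminating the trivial case that the system $\mathcal{A}x=b$ has no solution, we will always assume that $b$ lies in $\operatorname{range}\mathcal{A}$, and that $\mathcal{K}$ has nonempty interior. Two examples of conic optimization are of main importance for us: linear programming (LP) corresponds to $\mathcal{K}=\mathbb{R}^n_+$, $\mathbb{F}=\mathbb{R}^m$, and semi-definite programming (SDP) corresponds to $\mathcal{K}=\mathcal{S}^n_+$, $\mathbb{F}=\mathbb{R}^m$. The adjoint $\mathcal{A}^*$ in both cases was computed in Examples 2.1.1 and 2.1.2. We will also use the following notation for the primal and dual feasible regions:
$$\mathcal{F}_p:=\{x\succeq_{\mathcal{K}}0 : \mathcal{A}x=b\}\qquad\text{and}\qquad\mathcal{F}_d:=\{y : \mathcal{A}^*y\preceq_{\mathcal{K}^*}c\}.$$
It is important to note that the dual can be put in a primal form by introducing slack variables $s\in\mathbb{E}$, leading to the equivalent formulation
$$\begin{array}{ll}\max & \langle b,y\rangle\\ \text{s.t.} & \mathcal{A}^*y+s=c\\ & y\in\mathbb{F},\ s\in\mathcal{K}^*.\end{array}\tag{2.5}$$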
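To a reader unfamiliar with conic optimization, it may be unclear how the dual arises naturally from the primal. Let us see how it can be done. The dual problem (D) can be discovered through “Lagrangian duality” in convex optimization. Define the Lagrangian function
$$L(x,y):=\langle c,x\rangle+\langle y,b-\mathcal{A}x\rangle,$$
and observe the equality
$$\max_{y}L(x,y)=\begin{cases}\langle c,x\rangle & \text{if }\mathcal{A}x=b,\\ +\infty & \text{otherwise.}\end{cases}$$
Thus the primal problem (P) is equivalent to
$$\min_{x\succeq_{\mathcal{K}}0}\Big(\max_{y}L(x,y)\Big).$$
Formally exchanging min/max yields exactly the dual problem (D):
$$\max_{y}\Big(\min_{x\succeq_{\mathcal{K}}0}L(x,y)\Big)=\max_{y}\Big(\langle b,y\rangle+\min_{x\succeq_{\mathcal{K}}0}\langle c-\mathcal{A}^*y,x\rangle\Big)=\max_{y}\,\{\langle b,y\rangle : \mathcal{A}^*y\preceq_{\mathcal{K}^*}c\}.$$
The primal-dual pair always satisfies the weak duality inequality: for any primal feasible $x$ and any dual feasible $y$, we have
$$\langle c,x\rangle-\langle b,y\rangle=\langle c,x\rangle-\langle\mathcal{A}x,y\rangle=\langle c-\mathcal{A}^*y,x\rangle\ge0.\tag{2.6}$$
Thus for any feasible point $y$ of the dual, its objective value $\langle b,y\rangle$ lower-bounds the optimal value of the primal. The weak duality inequality (2.6) leads to the following sufficient conditions for optimality.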
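**Proposition 2.3.1** (Complementary slackness).
Suppose that $(x,y)$ are a primal-dual feasible pair for (P), (D) and suppose that complementary slackness holds:
$$\langle c-\mathcal{A}^*y,x\rangle=0.$$
Then $x$ is a minimizer of (P) and $y$ is a maximizer of (D).
The sufficient conditions for optimality of Proposition 2.3.1 are often summarized as the primal-dual system:
$$\begin{aligned}\mathcal{A}^*y+s-c&=0\\ \mathcal{A}x-b&=0\\ \langle s,x\rangle&=0\\ s\succeq_{\mathcal{K}^*}0,\quad x&\succeq_{\mathcal{K}}0.\end{aligned}\tag{2.7}$$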
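To see weak duality and complementary slackness at work, consider the linear programming case, where the primal reads $\min\,\langle c,x\rangle$ subject to $Ax=b$, $x\ge0$ and the dual reads $\max\,\langle b,y\rangle$ subject to $A^Ty\le c$. The following plain Python sketch uses a tiny made-up instance. The data `A`, `b`, `c`, the chosen points, and the helper names are all assumptions for illustration.

```python
# Weak duality and complementary slackness on a tiny LP (cone = R^3_+).
# The data and the chosen feasible points are made up for this sketch.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply_A(A, x):
    """A x, with A stored as a list of rows."""
    return [dot(row, x) for row in A]

def apply_A_adj(A, y):
    """A^T y, the adjoint applied to y."""
    return [dot(col, y) for col in zip(*A)]

A = [[1, 1, 1]]
b = [4]
c = [1, 2, 3]

def slack(y):
    """Dual slack s = c - A^T y; y is dual feasible iff s >= 0."""
    return [ci - zi for ci, zi in zip(c, apply_A_adj(A, y))]

x = [2, 1, 1]                  # primal feasible: A x = b and x >= 0
y = [1]                        # dual feasible: slack(y) = [0, 1, 2] >= 0
gap = dot(c, x) - dot(b, y)    # equals <s, x>, as in the weak duality identity

x_opt = [4, 0, 0]              # supported only where the slack vanishes
print(gap, dot(slack(y), x_opt), dot(c, x_opt), dot(b, y))
```

The pair `(x, y)` leaves a positive gap equal to `<s, x>`. The pair `(x_opt, y)` satisfies complementary slackness, so both points are optimal and the two objective values coincide.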
Derivations of algorithms generally require necessary optimality conditions, i.e., failure of the necessary conditions at a current approximation of the optimum leads to improvement steps. When are sufficient conditions for optimality expressed in Proposition 2.3.1 necessary? In other words, when can we be sure that optimality of a primal solution $x$ can be certified by the existence of some dual feasible point $y$, such that the pair $(x,y)$ satisfies the complementary slackness condition? Conditions guaranteeing existence of Lagrange multipliers are called constraint qualifications. The most important condition of this type is called strict feasibility, or Slater’s condition, and is the main topic of this article.
**Definition 2.3.2** (Strict feasibility/Slater condition).
We say that (P) is strictly feasible if there exists a point $x\succ_{\mathcal{K}}0$ satisfying $\mathcal{A}x=b$. The dual (D) is strictly feasible if there exists a point $y$ satisfying $\mathcal{A}^*y\prec_{\mathcal{K}^*}c$.
The following result is the cornerstone of conic optimization.
**Theorem 2.3.3** (Strong duality).
If the primal objective value is finite and the problem (P) is strictly feasible, then the primal and dual optimal values are equal, and the dual (D) admits an optimal solution. In addition, for any $x$ that is optimal for (P), there exists a vector $y$ such that $(x,y)$ satisfies complementary slackness.
Similarly, if the dual objective value is finite and the dual (D) satisfies strict feasibility, then the primal and dual optimal values are equal and the primal (P) admits an optimal solution. In addition, for any $y$ that is optimal for (D), there exists a point $x$ such that $(x,y)$ satisfies complementary slackness.
Without a constraint qualification such as strict feasibility, the previous theorem is decisively false. The following examples show that without strict feasibility, the primal and dual optimal values may not even be equal, and even if they are equal, the optimal values may be unattained.
**Example 2.3.4** (Infinite gap).
Consider the following primal SDP in (2.4):
[TABLE]
The corresponding dual SDP is the infeasible problem
[TABLE]
Both the primal and the dual fail strict feasibility in this example.
**Example 2.3.5** (Positive duality gap).
Consider the following primal SDP in (2.4):
[TABLE]
The constraint with implies equality , and hence . Therefore, and the matrix is optimal.
The corresponding dual SDP is
[TABLE]
This time the SDP constraint implies . We deduce that is optimal for the dual and hence . There is a finite duality gap between the primal and dual problems. The culprit again is that both the primal and the dual fail strict feasibility.
**Example 2.3.6** (Zero duality gap, but no attainment).
Consider the dual SDP
[TABLE]
The only feasible point is . Thus the optimal value is and is attained. The primal SDP is
[TABLE]
Notice for all feasible . On the other hand, the sequence is feasible and satisfies . Thus there is no duality gap, meaning , but the primal optimal value is not attained. The culprit is that the dual SDP is not strictly feasible.
**Example 2.3.7** (Convergence to dual optimal value).
Numerical solutions of problems inevitably suffer from some perturbations of the data, that is, a perturbed problem is in fact solved. Moreover, it is often tempting to explicitly perturb a constraint in the problem, so that strict feasibility holds. This example shows that this latter strategy results in the dual of the problem being solved, as opposed to the problem under consideration.
We consider the primal-dual SDP pair in Example 2.3.5. In particular, suppose first that we want to solve the dual problem. We canonically perturb the right-hand side of the dual in (2.8)
[TABLE]
for some matrix and real . Strict feasibility now holds and we hope that the optimal values of the perturbed problems converge to that of the original one . We can rewrite feasibility for the perturbed problem as
[TABLE]
A triple is strictly feasible if, and only if, the leading principal minors of the left-hand side matrix in (2.9) are all positive. We have . The second leading principal minor as a function of is
[TABLE]
In particular, rearranging we have whenever
[TABLE]
The last minor is positive for sufficiently negative by the Schur complement. Consequently the perturbed problems satisfy
[TABLE]
and therefore
[TABLE]
That is, the primal optimal value is obtained in the limit rather than the dual optimal value that is sought.
Let us look at an analogous perturbation to the primal problem. Let be the linear operator and set . Consider the perturbed problems
[TABLE]
for some fixed real and a matrix . Each such problem is strictly feasible, since the positive definite matrix is feasible for any matrix that is feasible for the original primal problem.
In long form, the perturbed primal problems are
[TABLE]
Consider the matrix
[TABLE]
This matrix satisfies the linear system by construction and is positive semi-definite for all sufficiently large . We deduce for all . Again, as tends to zero we obtain the dual optimal value rather than the sought-after primal optimal value.
2.4 Commentary
We follow here well-established notation in convex optimization, as illustrated for example in the monographs of Barvinok [17], Ben-Tal-Nemirovski [20], Borwein-Lewis [21], and Rockafellar [123]. The handbook of SDP [146] and online lecture notes [85] are other excellent sources in the context of semi-definite programming. These include discussion on the facial structure. The relevant results stated in the text can all be found for instance in Rockafellar [123]. Example 2.3.5 is a modification of the example in [114]. In addition, note that the three Examples 2.3.4, 2.3.5, and 2.3.6 have matrices with the special perdiagonal structure. The universality of such special structure in “ill-posed” SDPs has recently been investigated at great length by Pataki [105].
Chapter 3 Virtues of strict feasibility
We have already seen in Theorem 2.3.3 that strict feasibility is essential to guarantee dual attainment and therefore for making the primal-dual optimality conditions (2.7) meaningful. In this section, we continue discussing the impact of strict feasibility on numerical stability. We begin with the theorems of the alternative, akin to the Farkas’ Lemma in linear programming, which quantify the extent to which strict feasibility holds. We then show how such systems appear naturally in stability measures of the underlying problem.
3.1 Theorem of the alternative
The definition we have given of strict feasibility (Slater) is qualitative in nature, that is, it involves no measurements. A convenient way to measure the extent to which strict feasibility holds (i.e., its strength) arises from dual characterizations of the property. We will see that strict feasibility corresponds to inconsistency of a certain auxiliary system. Measures of how close the auxiliary system is to being consistent yield estimates of “stability” of the problem.
The aforementioned dual characterizations stem from the basic hyperplane separation theorem for convex sets.
**Theorem 3.1.1** (Hyperplane separation theorem).
Let $Q_1$ and $Q_2$ be two disjoint nonempty convex sets. Then there exists a nonzero vector $v$ and a real number $\beta$ satisfying
$$\langle v,x\rangle\le\beta\le\langle v,y\rangle\qquad\text{for all } x\in Q_1,\ y\in Q_2.$$
When one of the sets is a cone, the separation theorem takes the following “homogeneous” form.
**Theorem 3.1.2** (Homogeneous separation).
Consider a nonempty closed convex set $Q$ and a closed convex cone $\mathcal{K}$ with nonempty interior. Then exactly one of the following alternatives holds.
1. The set $Q$ intersects the interior of $\mathcal{K}$.
2. There exists a vector $0\ne v\in\mathcal{K}^*$ satisfying $\langle v,x\rangle\le0$ for all $x\in Q$.
Moreover, for any vector $v$ satisfying the alternative 2, the region $Q\cap\mathcal{K}$ is contained in the proper face $v^\perp\cap\mathcal{K}$.
**Proof 3.1.3.**
Suppose that does not intersect the interior of . Then the convex cone generated by , denoted by , does not intersect either. The hyperplane separation theorem (Theorem 3.1.1) shows that there is a nonzero vector and a real number satisfying
[TABLE]
Setting , we deduce . Hence lies in and 2 holds.
Conversely, suppose that 2 holds. Then the inequalities hold for all . Thus we deduce that the intersection lies in the proper face . Hence the alternative 1 can not hold.
Let us now specialize the previous theorem to the primal problem (P), by letting $Q$ be the affine space $\{x : \mathcal{A}x=b\}$. The resulting theorem is the main result of this subsection, and it will be used extensively in what follows.
**Theorem 3.1.4** (Theorem of the alternative for the primal).
Suppose that $\mathcal{K}$ is a closed convex cone with nonempty interior. Then exactly one of the following alternatives holds.
1. The primal (P) is strictly feasible.
2. The auxiliary system is consistent:
$$0\ne\mathcal{A}^*y\succeq_{\mathcal{K}^*}0\quad\text{and}\quad\langle b,y\rangle\le0.\tag{3.1}$$
Suppose that the primal (P) is feasible. Then the auxiliary system (3.1) is equivalent to the system
$$0\ne\mathcal{A}^*y\succeq_{\mathcal{K}^*}0\quad\text{and}\quad\langle b,y\rangle=0.\tag{3.2}$$
Moreover, then any vector $y$ satisfying either of the equivalent systems, (3.1) and (3.2), yields a proper face $\mathcal{K}\cap(\mathcal{A}^*y)^\perp$ containing the primal feasible region $\mathcal{F}_p$.
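**Proof 3.1.5.**
Set $Q:=\{x : \mathcal{A}x=b\}$. Clearly strict feasibility of (P) is equivalent to alternative 1 of Theorem 3.1.2. Thus it suffices to show that the auxiliary system (3.1) is equivalent to the alternative 2 of Theorem 3.1.2. To this end, note that for any vector $y$ satisfying (3.1), the vector $v:=\mathcal{A}^*y$ satisfies the alternative 2 of Theorem 3.1.2. Conversely, consider a vector $0\ne v\in\mathcal{K}^*$ satisfying $\langle v,x\rangle\le0$ for all $x\in Q$. Fix a point $x_0\in Q$ and observe the equality $Q=x_0+\operatorname{null}\mathcal{A}$. An easy argument then shows that $v$ is orthogonal to $\operatorname{null}\mathcal{A}$, and therefore can be written as $v=\mathcal{A}^*y$ for some vector $y$. We deduce $\langle b,y\rangle=\langle\mathcal{A}x_0,y\rangle=\langle x_0,v\rangle\le0$, and therefore $y$ satisfies (3.1).
Next, assume that (P) is feasible. Suppose $y$ satisfies (3.1). Then for any feasible point $x$ of (P), we deduce $0\le\langle\mathcal{A}^*y,x\rangle=\langle y,b\rangle\le0$. Thus $y$ satisfies the system (3.2). It follows that the two systems (3.1) and (3.2) are equivalent and that the proper face $\mathcal{K}\cap(\mathcal{A}^*y)^\perp$ contains the primal feasible region $\mathcal{F}_p$, as claimed.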
Suppose that is nonempty. Then if strict feasibility fails, there always exists a “witness” (or “short certificate”) satisfying the auxiliary system (3.2). Indeed, given such a vector , one immediately deduces, as in the proof, that is contained in the proper face of . Such certificates will in a later section be used constructively to regularize the conic problem through the FR procedure.
The analogue of Theorem 3.1.4 for the dual (D) quickly follows.
Theorem 3.1.6** (Theorem of the alternative for the dual).**
Suppose that has nonempty interior. Then exactly one of the following alternatives holds.
1. The dual (D) is strictly feasible.
2. The auxiliary system is consistent:
[TABLE]
Suppose that the dual (D) is feasible. Then the auxiliary system (3.3) is equivalent to the system
[TABLE]
Moreover, any vector satisfying either of the equivalent systems, (3.3) and (3.4), yields a proper face containing the feasible slacks .
Proof 3.1.7**.**
Apply Theorem 3.1.4 to the equivalent formulation (2.5) of the dual (D).
3.2 Stability of the solution
In this section, we explain the impact of strict feasibility on stability of the conic optimization problem through quantities naturally arising from the auxiliary system (3.1). For simplicity we focus on the primal problem (P), though an entirely analogous development is possible for the dual, for example by introducing slack variables.
We begin with a basic question: at what rate does the optimal value of the primal problem (P) change relative to small perturbations of the right-hand-side of the linear equality constraints? To formalize this question, define the value function
[TABLE]
The value function thus defined is convex, meaning that its epigraph
[TABLE]
is a convex set. Seeking to understand stability of the primal (P) under perturbation of the right-hand-side , it is natural to examine the variational behavior of the value function. There is an immediate obstruction, however. If is not surjective, then there are arbitrarily small perturbations of making take on infinite values. As a result, in conjunction with strict feasibility, we will often make the mild assumption that is surjective. These two properties taken together are referred to as the Mangasarian-Fromovitz Constraint Qualification (MFCQ).
Definition 3.2.1** (Mangasarian-Fromovitz CQ).**
We say that the Mangasarian-Fromovitz Constraint Qualification (MFCQ) holds for (P) if is surjective and (P) is strictly feasible.
The following result describes directional derivatives of the value function.
Theorem 3.2.2** (Directional derivative of the value function).**
Suppose the primal problem (P) is feasible and its optimal value is finite. Let be the set of optimal solutions of the dual (D). Then is nonempty and bounded if, and only if, MFCQ holds. Moreover, under MFCQ, the directional derivative of the value function at in direction admits the representation
[TABLE]
In particular, in the notation of the above theorem, the local Lipschitz constant of at the origin,
[TABLE]
coincides with the norm of the maximal-norm dual optimal solution, and is finite if, and only if, MFCQ holds. Is there then an upper-bound on the latter that we can easily write down? Clearly, such a quantity must measure the strength of MFCQ, and is therefore intimately related to the auxiliary system (3.1). To this end, let us define the condition number
[TABLE]
This number is a quantitative measure of MFCQ, and will appear in later sections as well. Some thought shows that it is in essence measuring how close the auxiliary system (3.1) is to being consistent.
Lemma 3.2.3** (Condition number and MFCQ).**
The condition number cond(P) is nonzero if, and only if, MFCQ holds.
Proof 3.2.4**.**
Suppose cond(P) is nonzero. Then clearly is surjective, since otherwise we could find a unit vector with and . Moreover, the auxiliary system (3.1) is clearly inconsistent, and therefore (P) is strictly feasible.
Conversely, suppose MFCQ holds. Assume for the sake of contradiction . Then there exists a unit vector satisfying and . Thus (3.1) is consistent, a contradiction.
Theorem 3.2.5** (Boundedness of the dual solution set).**
Suppose the problem (P) is feasible with a finite optimal value val. If the condition number is nonzero, then the inequality
[TABLE]
holds for all dual optimal solutions .
Proof 3.2.6**.**
Consider an optimal solution of the dual (D). The inclusion implies . Moreover, we have . We deduce and the result follows.
Thus the Lipschitz constant of the value function depends on the extent to which MFCQ holds through the condition number. What about stability of the solution set itself? The following theorem, whose proof we omit, answers this question.
Theorem 3.2.7** (Stability of the solution set).**
Suppose (P) satisfies MFCQ. Let be the solution set of the perturbed system
[TABLE]
Fix a putative solution . Then there exist constants and so that the inequality
[TABLE]
holds for any and . The infimal value of over all choices of so that the above inequalities hold is exactly
[TABLE]
In particular, under MFCQ, we can be sure that for any point , there exist and satisfying
[TABLE]
In other words, the distance , which measures how far has moved relative to , is bounded by a multiple of the perturbation parameter . The proportionality constant is fully governed by the strength of MFCQ, as measured by the quantity (3.5).
3.3 Distance to infeasibility
In numerical analysis, the notion of stability is closely related to the “distance to infeasibility” – the smallest perturbation needed to make the problem infeasible. A simple example is the problem of solving an equation for an invertible matrix . Then the Eckart-Young theorem shows the equality
[TABLE]
Here denotes the operator norm of . The left-hand-side measures the smallest perturbation needed to make the system singular, while the right-hand-side measures the Lipschitz dependence of the solution to the linear system relative to perturbations in , and yet the two quantities are equal. An entirely analogous situation holds in conic optimization, with MFCQ playing the role of invertibility.
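To make the equality concrete, here is a small worked instance. The symbols $A$ (the invertible matrix) and $G$ (the perturbation) are our own labels for this illustration, and the diagonal data are chosen only for ease of computation.

```latex
A = \begin{pmatrix} 2 & 0 \\ 0 & \tfrac{1}{4} \end{pmatrix}, \qquad
A^{-1} = \begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & 4 \end{pmatrix}, \qquad
\|A^{-1}\| = 4 .
% The perturbation G below has operator norm 1/4 and makes A + G singular.
G = \begin{pmatrix} 0 & 0 \\ 0 & -\tfrac{1}{4} \end{pmatrix}, \qquad
A + G = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\|G\| = \tfrac{1}{4} = \frac{1}{\|A^{-1}\|} .
```

No perturbation of operator norm strictly smaller than $\tfrac{1}{4}$, the smallest singular value of $A$, can make $A+G$ singular, so the two sides agree in this instance.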
Definition 3.3.1** (Distance to infeasibility).**
The distance to infeasibility of (P) is the infimum of the quantity over linear mappings and vectors such that the system
[TABLE]
This quantity does not change if instead of the loss of feasibility, we consider the loss of strict feasibility. The following fundamental result equates the condition number (measuring the strength of MFCQ) and the distance to infeasibility.
Theorem 3.3.2** (Strict feasibility and distance to infeasibility).**
The following exact equation is always true:
[TABLE]
3.4 Commentary
The classical theorem of the alternative is Farkas' Lemma, which appears in proofs of duality in linear programming, as well as in more general nonlinear programming, after linearization. This and more general theorems of the alternative are given in, e.g., the 1969 book by Mangasarian [92] and the 1969 survey paper by Ben-Israel [18]. The specific theorems of the alternative that we use are similar to those used in the FR development in [23, 22, 24].
The Mangasarian-Fromovitz CQ was introduced in [91]. This condition and its equivalence to stability with respect to perturbations in the data and compactness of the multiplier set have been the focus of extensive research, e.g., [54]. The analogous condition for general nonlinear convex constraint systems is the Robinson regularity condition, e.g., [121, 122]. The study of the distance to infeasibility and its relation to condition numbers was initiated by Renegar, e.g., [120, 118, 119, 108]. The relation with Slater’s condition is clear. Theorem 3.3.2, as stated, appears in [44], though in essence it is present in Renegar’s work [120, 118].
Chapter 4 Facial reduction
Theorems 3.1.4 and 3.1.6 have already set the stage for the “Facial Reduction” procedure, used for regularizing degenerate conic optimization problems by restricting the problem to smaller and smaller dimensional faces of the cone . In this chapter, we formalize this viewpoint, emphasizing semi-definite programming. Before we proceed with a detailed description, it is instructive to look at the simplest example of Linear Programming. In this case, a single iteration of the facial reduction procedure corresponds to finding redundant variables (in the primal) and implicit equality constraints (in the dual).
4.1 Preprocessing in linear programming
Improvements in the solution methods for large-scale linear programming problems have been dramatic since the late 1980’s. A technique that has become essential in commercial software is a preprocessing step for the linear program before sending it to the solver. The preprocessing has many essential features. For example, it removes redundant variables (in the primal) and implicit equality constraints (in the dual) thus potentially dramatically reducing the size of the problem while simultaneously improving the stability of the model. These steps in linear programming are examples of the Facial Reduction procedure, which we will formalize shortly.
Example 4.1.1** (primal facial reduction).**
Consider the problem
[TABLE]
If we sum the two constraints we see
[TABLE]
Thus the coordinates , , and are identically zero on the entire feasible set. In other words, the feasible region is contained in the proper face of the cone . The zero coordinates can easily be eliminated and the corresponding columns discarded, yielding the equivalent simplified problem in the smaller face:
[TABLE]
The second equality can now also be discarded as it is equivalent to the first.
How can such implicit zero coordinates be discovered systematically? Not surprisingly, the auxiliary system (3.2) provides the answer:
[TABLE]
Suppose is feasible for this auxiliary system. Then for any feasible for the problem, we deduce . Thus all the coordinates , for which the strict inequality holds, must be zero.
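The check just described can be sketched in a few lines. This is a minimal illustration for a linear program with constraints Ax = b, x ≥ 0, where the auxiliary system asks for a vector y with Aᵀy ≥ 0, Aᵀy ≠ 0 and ⟨b, y⟩ = 0. The helper name `zero_coordinates` and the data are our own assumptions, chosen for illustration and not taken from Example 4.1.1.

```python
from fractions import Fraction

def zero_coordinates(A, b, y):
    """Check that y satisfies the auxiliary system and return the
    (0-based) indices j with (A^T y)_j > 0.

    A is a list of rows; b and y are lists. Exact rational arithmetic is used.
    """
    m, n = len(A), len(A[0])
    # z = A^T y
    z = [sum(Fraction(A[i][j]) * y[i] for i in range(m)) for j in range(n)]
    # <b, y>
    by = sum(Fraction(b[i]) * y[i] for i in range(m))
    if any(zj < 0 for zj in z) or all(zj == 0 for zj in z) or by != 0:
        raise ValueError("y does not satisfy the auxiliary system")
    # Every feasible x has <x, z> = <b, y> = 0 with x >= 0 and z >= 0,
    # so x_j = 0 whenever z_j > 0.
    return [j for j, zj in enumerate(z) if zj > 0]

# Illustrative data (an assumption made for this sketch).
A = [[1, 1, 1, 1, 0],
     [1, -1, -1, 0, 1]]
b = [1, -1]
y = [1, 1]  # corresponds to summing the two equality constraints
forced_zero = zero_coordinates(A, b, y)
print(forced_zero)
```

The sketch only verifies a given certificate; finding one in general requires solving the auxiliary system, for instance with an LP solver.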
Example 4.1.2** (dual facial reduction).**
A similar procedure applies to the dual. Consider the problem
[TABLE]
Twice the third row plus the fourth row sums to zero. We conclude that the last two constraints are implicit equality constraints. Thus after substituting , we obtain a simple univariate problem. Again, this discovery of implicit equality constraints can be done systematically by considering the auxiliary system (3.4):
[TABLE]
Suppose we find such a vector . Then for any feasible vector we deduce . Thus for each positive component , the corresponding inequality is fulfilled with equality along the entire feasible region.
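The dual check admits an equally short sketch. We assume the dual constraints are written as a list of inequalities M y ≤ c, so that the auxiliary system asks for a nonzero vector x ≥ 0 with Mᵀx = 0 and ⟨c, x⟩ = 0. The helper name `implicit_equalities` and the data are hypothetical and are not the data of Example 4.1.2.

```python
def implicit_equalities(M, c, x):
    """Check that x satisfies the auxiliary system for M y <= c and return
    the (0-based) indices of rows that hold with equality at every feasible y."""
    rows, cols = len(M), len(M[0])
    # Nonnegative combination of the rows: M^T x
    combo = [sum(x[i] * M[i][j] for i in range(rows)) for j in range(cols)]
    cx = sum(x[i] * c[i] for i in range(rows))  # <c, x>
    ok = (all(xi >= 0 for xi in x) and any(xi > 0 for xi in x)
          and all(v == 0 for v in combo) and cx == 0)
    if not ok:
        raise ValueError("x does not satisfy the auxiliary system")
    # For feasible y the slack s = c - M y is nonnegative and <x, s> = 0,
    # so s_i = 0 whenever x_i > 0.
    return [i for i, xi in enumerate(x) if xi > 0]

# Illustrative constraints in two variables (an assumption for this sketch):
#    y1 +  y2 <=  4
#   -y1       <=  0
#    y1 -  y2 <=  1
#  -2y1 + 2y2 <= -2
M = [[1, 1], [-1, 0], [1, -1], [-2, 2]]
c = [4, 0, 1, -2]
x = [0, 0, 2, 1]  # twice the third row plus the fourth row
tight = implicit_equalities(M, c, x)
print(tight)
```

Here the last two inequalities together force y1 - y2 = 1, which is exactly what the certificate reports.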
4.2 Facial reduction in conic optimization
Keeping in mind the example of Linear Programming, we now formally describe the Facial Reduction procedure. To do this, consider the primal problem (P) failing Slater’s condition. Our goal is to find an equivalent problem that does satisfy Slater’s condition. To this end, suppose that we had a description of – the minimal face of containing the feasible region . Then we could replace with , with , and with its restriction to . The resulting smaller dimensional primal problem would automatically satisfy Slater’s condition, since intersects the relative interior of .
The Facial Reduction procedure is a conceptual method that at termination discovers . Suppose that has nonempty interior. In the first iteration, the scheme determines any vector satisfying the auxiliary system (3.2). If no such vector exists, Slater’s condition holds and the method terminates. Else, Theorem 3.1.4 guarantees that lies in the proper face . Treating as a subset of the ambient Euclidean space yields the smaller dimensional reformulation of (P):
[TABLE]
We can now repeat the process on this problem. Since the dimension of the problem decreases with each facial reduction iteration, the procedure will terminate after at most steps.
Definition 4.2.1** (Singularity degree).**
*The singularity degree of (P), denoted , is the minimal number of iterations that are necessary for the Facial Reduction to terminate, over all possible choices of certificates generated by the auxiliary systems in each iteration.*
The singularity degree of linear programs is at most one, as we will see shortly. For more general problems, such as semi-definite programs, the singularity degree can be much higher.
The Facial Reduction procedure applies to the dual problem (D) by using the equivalent primal form (2.5) and using Theorem 3.1.6. We leave the details to the reader.
4.3 Facial reduction in semi-definite programming
Before discussing further properties of the Facial Reduction algorithm in conic optimization, let us illustrate the procedure in semi-definite programming. To this end, consider the primal problem (P) with . Suppose that we have available a vector feasible for the auxiliary system (3.2). Form now an eigenvalue decomposition
[TABLE]
where is an orthogonal matrix and is a diagonal matrix. Then as we have seen in Example 2.2.9, the matrix exposes the face of . Consequently, defining the linear map , the primal problem (P) is equivalent to the smaller dimensional SDP
[TABLE]
Thus one step of facial reduction is complete. Similarly let us look at the dual problem (D). Suppose that is feasible for the auxiliary system (3.4). Let us form an eigenvalue decomposition
[TABLE]
where is an orthogonal matrix and is a diagonal matrix. The face exposed by , namely , contains all the feasible slacks by Theorem 3.1.6. Thus defining the linear map , the dual (D) is equivalent to the problem
[TABLE]
Thus one step of Facial Reduction is complete and the process can continue.
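A single primal reduction step can be sketched as follows. Note one substitution: instead of an eigenvalue decomposition, the sketch computes an arbitrary (not necessarily orthonormal) basis V of the null space of the exposing matrix by exact row reduction, which describes the same face. The function names, the exposing matrix Z, and the constraint matrix A1 are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def nullspace(Z):
    """Basis of {v : Z v = 0} as a list of vectors, via exact row reduction."""
    rows = [[Fraction(v) for v in row] for row in Z]
    n = len(rows[0])
    pivots, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][col] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            v[p] = -rows[row_idx][free]
        basis.append(v)
    return basis

def reduce_constraint(A, basis):
    """Return V^T A V, where the columns of V are the vectors in `basis`."""
    n, k = len(A), len(basis)
    AV = [[sum(Fraction(A[i][l]) * basis[c][l] for l in range(n))
           for c in range(k)] for i in range(n)]
    return [[sum(basis[r][i] * AV[i][c] for i in range(n))
             for c in range(k)] for r in range(k)]

# Hypothetical exposing matrix: it forces the second row and column of X to vanish.
Z = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
# Hypothetical constraint matrix, encoding <A1, X> = 2*X_12 + 2*X_33.
A1 = [[0, 1, 0], [1, 0, 0], [0, 0, 2]]
basis = nullspace(Z)
reduced = reduce_constraint(A1, basis)
print(basis, reduced)
```

Every feasible X can then be written as V R Vᵀ with R positive semi-definite of smaller size, and each constraint ⟨A_i, X⟩ = b_i becomes ⟨Vᵀ A_i V, R⟩ = b_i.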
To drive the point home, the following simple example shows that for SDP the singularity degree can indeed be strictly larger than one.
Example 4.3.1** (Singularity degree larger than one).**
Consider the primal SDP feasible region
[TABLE]
Notice the equality forces the second row and column of to be zero, i.e. they are redundant. Let us see how this will be discovered by Facial Reduction.
The linear map has the form
[TABLE]
for the matrices
[TABLE]
Notice that is nonempty since it contains the rank matrix . The auxiliary system (3.2) then reads
[TABLE]
Looking at the second principal minor, we see . Thus all feasible are positive multiples of the vector . One step of Facial Reduction using the exposing vector yields the equivalent reduced region
[TABLE]
This reduced problem clearly fails Slater’s condition, and Facial Reduction can continue. Thus the singularity degree of this problem is exactly two.
The pathological Example 4.3.1 can be generalized to higher dimensional space with , by nesting, leading to problems with singularity degree ; the construction is explained in Tunçel [136, page 43].
4.4 What facial reduction actually does
There is a direct and enlightening connection between Facial Reduction and the geometry of the image set . To elucidate this relationship, we first note the following equivalent characterization of Slater’s condition.
Proposition 4.4.1** (Range space characterization of Slater).**
The primal problem (P) is strictly feasible if, and only if, the vector lies in the relative interior of .
The following is the central result of this section.
Theorem 4.4.2** (Fundamental description of the minimal face).**
*Assume that the primal (P) is feasible. Then a vector exposes a proper face of containing if, and only if, satisfies the auxiliary system (3.2). Defining for notational convenience , the following are true.
(I) We always have:*
[TABLE]
(II) For any vector the following equivalence holds:
[TABLE]
In particular, the inequality holds if, and only if, is an exposed face of .
Some commentary is in order. First, as noted in Proposition 4.4.1, the primal (P) is strictly feasible if, and only if, the right-hand-side lies in . Thus when strict feasibility fails, the set is a proper face of the image . The theorem above yields the exact description of the object we are after. On the other hand, determining a facial description of is a difficult proposition. Indeed, even when is a simple cone, the image can be a highly nontrivial object. For instance, the image may fail to be facially exposed or even closed; examples are forthcoming.
Seeking to obtain a description of , one can instead try to find “certificates” exposing a proper face of containing . Such vectors are precisely those satisfying the auxiliary system (3.2). In particular, part II of Theorem 4.4.2 yields a direct obstruction to having low singularity degree: if, and only if, is an exposed face of . Thus the lack of facial exposedness of the image can become an obstruction. For the cone , the image is polyhedral and is therefore facially exposed. On the other hand, linear images of the cone can easily fail to be facially exposed. This in essence is the reason why preprocessing for general conic optimization is much more difficult than its linear programming counterpart (having singularity degree at most one).
The following two examples illustrate the possibly complex geometry of image sets .
Example 4.4.3** (Linear image not closed).**
Define the linear map by
[TABLE]
Then the image is not closed, since
[TABLE]
More broadly, it is easy to see the equality
[TABLE]
Example 4.4.4** (Linear image that is not facially exposed).**
Consider the feasible region in Example 4.3.1. There we showed that the singularity degree is equal to two. Consequently, by Theorem 4.4.2 we know that the minimal face of containing must be nonexposed.
Let us verify this directly. To this end, we can without loss of generality treat as mapping into via
[TABLE]
and identify with . Then the image is simply the sum,
[TABLE]
See Figure 4.1.
Consider the set
[TABLE]
We claim that is a face of and is therefore the minimal face containing . Indeed, suppose we may write
[TABLE]
for some matrices . Comparing the -entries, we deduce and consequently . Comparing the 1,2-entries yields . Thus both summands lie in ; therefore is a face of the image . Next, using Lemma 2.1.7, observe
[TABLE]
Consequently, any matrix exposing must lie in the set
[TABLE]
On the other hand, the set
[TABLE]
is clearly strictly larger than . Hence is not an exposed face.
4.5 Singularity degree and the Hölder error bound in SDP
For semi-definite programming, the singularity degree plays an especially important role, controlling the Hölderian stability of the feasible region. Consider two sets and in . A convenient way to understand the regularity of the intersection is to determine the extent to which the computable residuals, and , bound the error . Relationships of this sort are commonly called error bounds of the intersection and play an important role for convergence and stability of algorithms. Of particular importance are Hölderian error bounds – those asserting the inequalities
[TABLE]
on compact sets, for some powers . For semi-definite programming, the singularity degree precisely dictates the Hölder exponent .
Theorem 4.5.1** (Hölderian error bounds from the singularity degree).**
Consider a feasible primal SDP problem (P) and define the affine space
[TABLE]
Set . Then for any compact set , there is a real so that
[TABLE]
What is remarkable about this result is that neither the dimension of the matrices , the number of inequalities , nor the rank of the matrices in the region determines the error bound. Instead, it is only the single quantity, the singularity degree, that drives this regularity concept.
Example 4.5.2** (Worst-case example).**
Consider the SDP feasible region
[TABLE]
For any feasible , the constraint forces . By an inductive argument, then we deduce and for all . Thus the feasible region coincides with the ray .
Given , define the matrix
[TABLE]
Observe that violates the linear constraints only in the requirement . Consequently, the distance of to the linear space is on the order of . On the other hand, the distance of to the solution set is at least on the order of . This example shows that the Hölder exponent in this case is at least . Combined with Theorem 4.5.1 and the fact that the feasible region contains rank one matrices, we deduce and the Hölder exponent guaranteed by the theorem is sharp.
4.6 Towards computation
The Facial Reduction procedure is conceptual. To implement it, since the error compounds along the iterations, one must be able to either solve the auxiliary systems (3.2) (resp. (3.4)) to machine precision in each iteration or certify that the systems are inconsistent. On the other hand, in general, there is no reason to believe that solving a single auxiliary system is any easier than solving the original problem (P).
One computational approach for facial reduction in SDP, explored by Permenter-Parrilo [110], is to relax the auxiliary problems to ones that are solvable. Instead of considering (3.2), one can choose a convex cone so that consistency of the system can be checked:
[TABLE]
If a vector satisfying the system is found, then one can perform one step of facial reduction. If not, the scheme quits, possibly without having successfully deduced that Slater’s condition holds. Simple examples of are the sets and the cone dual to , where PSD denotes positive semi-definite.
The above feasibility problem is then an instance of linear programming in the first case and of second-order cone programming in the second. More details are provided in [110]. Readers may be skeptical of this strategy since this technique will work only for special types of degeneracy. For example, the first relaxation can only detect that some diagonal elements of are identically zero on the feasible region . On the other hand, it does appear that degeneracy typically arising in applications is highly structured, and promising numerical results have been reported in [110].
There exist other influential techniques for regularizing conic optimization problems that are different from the facial reduction procedure. Two notable examples are Ramana’s extended dual [114] and the homogeneous self-dual embedding, e.g., [39, 109]. The latter, in particular, is used by MOSEK [8] and SeDuMi [128]. A dual approach, called the conic expansion approach, is discussed at length in [142]; see also [88, 104, 90, 127].
We do not discuss these techniques here. Instead, we focus on the most promising class of conic optimization problems – those having singularity degree at most one. In the rest of the manuscript, we provide a series of influential examples, where the structure of the problem enables one to obtain feasible points of the auxiliary systems without having to invoke any solvers. Numerical illustrations show that the resulting reduced subproblems are often much smaller and more stable than the original.
4.7 Commentary
Preprocessing is essential in making LP algorithms efficient. A main ingredient is identifying primal and/or dual slack variables that are identically zero on the feasible set. This is equivalent to facial reduction that reduces the problem to faces of the nonnegative orthant, e.g., [67]. The facial reduction procedure for general conic optimization started in [23, 22, 24]. The procedure provides a primal-dual pair of conic optimization problems that are proper in the sense that the dual of the dual yields the primal. Example 4.3.1 and extensions can be found in Tunçel [136, page 43]. The notion of singularity degree and its connection to error bounds (Theorem 4.5.1) was discovered by Sturm [129, Sect. 4]; Example 4.5.2 appears in this paper as well. Example 4.4.4 is motivated by Example 1 in [106]. Theorem 4.4.2 appeared in [47]. As mentioned previously, there are many approaches to “regularization” of conic optimization problems, aside from facial reduction, including the self-dual embedding and the approximation approaches in [110, 109, 112]. An alternate view of obtaining a dual without a constraint qualification was given in [114, 115], though a relationship to facial reduction was later explained in [116].
Part II Applications and illustrations
In this part, we discuss a number of diverse and important computational problems. These include various matrix completion problems and discrete optimization problems such as the quadratic assignment problem, graph partitioning, and the strengthened relaxation of the maximum cut problem. In the final section, we also discuss sum of squares relaxations for polynomial optimization problems. In each case, we use the structure of the problem to determine a face of the positive semi-definite cone containing the entire feasible region. One exception is the matrix completion problem, where we instead determine a face containing the optimal face, as opposed to the entire feasible region. Numerical experiments illustrate the efficacy of the approach.
Chapter 5 Matrix completions
We begin with various matrix completion problems. Broadly speaking, the goal is to complete a partially specified matrix, while taking into account a priori known structural properties such as a given rank or sparsity. There is a great variety of references on matrix completion problems; see for example [95, 84, 81, 68].
5.1 Positive semi-definite matrix completion
We begin with a classical problem of completing a PSD matrix from partially observed entries. To model this problem, consider an undirected graph with a vertex set and graph edge set . The symbols and always refer to the same edge. Notice we allow self-loops . For simplicity, we in fact assume that contains for each node . Elements of are called partial matrices, as they specify entries of a partially observed symmetric matrix. Given a partial matrix , the *PSD completion problem* asks to determine, if possible, a matrix in the set
[TABLE]
That is, we seek to complete the partial matrix to an positive semi-definite matrix. When do such PSD completions exist, that is, when is nonempty? Clearly, a necessary condition is that is a partial PSD matrix, meaning that its restriction to any specified principal submatrix is PSD. This condition, however, is not always sufficient.
A graph is called *PSD completable* if every partial PSD matrix is completable to a PSD matrix. It turns out that PSD completable graphs are precisely those that are chordal. Recall that a graph is called chordal if any cycle of four or more nodes has a chord – an edge joining any two nodes that are not adjacent in the cycle.
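Chordality is easy to test in practice. The sketch below uses a standard technique that is not described in this text: a maximum cardinality search followed by a check that the reverse of the visiting order is a perfect elimination ordering. The function name `is_chordal` and the vertex labelling 0, ..., n-1 are our own choices, and self-loops are ignored since they play no role in chordality.

```python
def is_chordal(n, edges):
    """Return True if the graph on vertices 0..n-1 with the given edges is chordal."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:  # self-loops are irrelevant for chordality
            adj[u].add(v)
            adj[v].add(u)

    # Maximum cardinality search: repeatedly visit an unvisited vertex
    # having the largest number of already visited neighbours.
    weight = {v: 0 for v in range(n)}
    unvisited = set(range(n))
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        order.append(v)
        unvisited.remove(v)
        for w in adj[v]:
            if w in unvisited:
                weight[w] += 1

    # The graph is chordal exactly when, for every vertex v, its neighbours
    # visited before v form a clique. It suffices to check that they are all
    # adjacent to the most recently visited one among them.
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = [w for w in adj[v] if position[w] < position[v]]
        if not earlier:
            continue
        parent = max(earlier, key=lambda w: position[w])
        if any(w != parent and w not in adj[parent] for w in earlier):
            return False
    return True

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
path4 = [(0, 1), (1, 2), (2, 3)]
cycle_is_chordal = is_chordal(4, cycle4)
path_is_chordal = is_chordal(4, path4)
cycle_with_chord_is_chordal = is_chordal(4, cycle4 + [(0, 2)])
print(cycle_is_chordal, path_is_chordal, cycle_with_chord_is_chordal)
```

A cycle on four nodes has no chord and is rejected, while a path, or the same cycle with one diagonal added, is accepted.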
Theorem 5.1.1** (PSD completable and chordal graphs).**
The graph is PSD completable if, and only if, is chordal. [Footnote 1: If not all of the self-loops are included in , one needs to add that the two subgraphs with self-loops and without are disconnected from each other; see, e.g., [47].]
Chordal graphs, or equivalently those that are PSD completable, play a fundamental role for the PSD completion problem. For example, on such graphs, the completion problem admits an efficient combinatorial algorithm [61, 83, 124].
Next, we turn to Slater’s condition. Consider the completion problem
[TABLE]
The question marks ? denote the unknown entries. The underlying graph on four nodes is a path and is therefore chordal. The known entries make up a partial PSD matrix since the three specified principal minors are PSD. Thus by Theorem 5.1.1, the completion problem is solvable. Does Slater’s condition hold? The answer is no. The first leading principal minor is singular, and therefore any PSD completion must be singular.
By the same logic, any singular specified principal minor of a partial matrix certifies that strict feasibility fails. Much more is true, however. We now show how any singular specified principal minor of a partial matrix yields a face of containing the entire feasible region, allowing one to reduce the dimension of the problem.
To see how this can be done, let us introduce some notation. Define the coordinate projection map by setting
[TABLE]
In this notation, we can write
[TABLE]
We will now see how the geometry of the image set , along with Theorem 4.4.2, helps us discover a face of containing the feasible region. We note in passing that the image is always closed. [Footnote 2: If some elements do not lie in , contrary to our simplifying assumption, then the image can fail to be closed. A precise characterization is given in [47].]
Proposition 5.1.2** (Closure of the image).**
The image is closed.
The reader can check that the adjoint simply pads partial matrices in with zeros:
[TABLE]
For any subset of vertices , we let be the edge set induced by on and we set to be the restriction of to . Define the relaxed region
[TABLE]
Clearly means we have fewer constraints and
[TABLE]
A subset is called a clique if for any two nodes the edge lies in . Specified principal minors of correspond precisely to cliques in the graph . We can moreover clearly identify with the matrix space . Suppose now that has rank , i.e., the principal submatrix of indexed by is singular. Then the right-hand-side of (5.3) lies in the boundary of the image set . Let be an exposing vector of . Then by Theorem 4.4.2, we can be sure that exposes the minimal face of containing the entire region . Given a collection of cliques , we can perform the same procedure and deduce that the entire feasible region lies in the face
[TABLE]
which by Proposition 2.2.5 admits the equivalent description
[TABLE]
The following example will clarify the strategy.
Example 5.1.3** (Reducing the PSD completion problem).**
Let consist of all matrices solving the PSD completion problem (5.1). There are three nontrivial cliques in the graph, all of size . The minimal face of containing the matrix
[TABLE]
is exposed by
[TABLE]
Moreover, the matrix is definite and hence the minimal face of containing this matrix is exposed by the all-zero matrix.
The intersection of exposed faces is exposed by the sum of their exposing vectors. We deduce that is contained in the face of exposed by the sum
[TABLE]
After finding the nullspace of this matrix, we deduce
[TABLE]
The following lemma is another nice consequence of the procedure described in the above example.
Lemma 5.1.4** (Completion of banded all ones matrices).**
The matrix of all ones is the unique positive semi-definite matrix satisfying for all indices with .
Proof 5.1.5**.**
Consider the edge set and let be a partial matrix of all ones. Observe has specified -principal submatrices, each having rank 1. By the same logic as in Example 5.1.3, it follows that the feasible region is zero-dimensional, as claimed.
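The argument can be checked numerically on a small instance. The sketch below uses only the specified entries on the diagonal and the first off-diagonal, builds one exposing matrix per 2×2 clique (here a sum of outer products of null vectors of the specified block, which is one valid choice of exposing vector), adds them after zero padding, and computes the null space of the sum. The size n = 5 and all function names are assumptions made for illustration.

```python
from fractions import Fraction

def nullspace(M):
    """Basis of {v : M v = 0} as a list of vectors, via exact row reduction."""
    rows = [[Fraction(v) for v in row] for row in M]
    n = len(rows[0])
    pivots, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][col] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            v[p] = -rows[row_idx][free]
        basis.append(v)
    return basis

def exposing_sum(n, cliques, partial):
    """Sum of zero-padded exposing matrices, one per clique.

    partial[(i, j)] with i <= j holds the specified entry in position (i, j).
    """
    Y = [[Fraction(0)] * n for _ in range(n)]
    for clique in cliques:
        block = [[partial[(min(i, j), max(i, j))] for j in clique] for i in clique]
        for v in nullspace(block):  # v v^T is PSD and orthogonal to the block
            for a, i in enumerate(clique):
                for b, j in enumerate(clique):
                    Y[i][j] += v[a] * v[b]
    return Y

n = 5
partial = {(i, j): 1 for i in range(n) for j in range(i, min(i + 2, n))}
cliques = [(i, i + 1) for i in range(n - 1)]
Y = exposing_sum(n, cliques, partial)
face_basis = nullspace(Y)
print(face_basis)
```

The null space is spanned by the all-ones vector, so every completion is a nonnegative multiple of the all-ones matrix, and the unit diagonal pins that multiple down to one.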
The strategy outlined above suggests an algorithm for finding the minimal face based on exploiting cliques in the graph. This strategy is well-founded at least for chordal graphs.
Theorem 5.1.6** (Finding the minimal face on chordal graphs).**
*Suppose that is chordal and consider a partial PSD matrix . Then the equality*
[TABLE]
*holds, where denotes the set of all maximal cliques in .*
On the other hand, it is important to realize that when the graph is not chordal, the minimal face is not always guaranteed to be found from cliques alone. The following example shows a PSD completion problem that fails Slater’s condition but where all the faces arising from cliques are trivial.
Example 5.1.7** (Slater condition & nonchordal graphs, [47]).**
Let be the graph with and . Define the corresponding PSD completion problems , parametrized by :
[TABLE]
Let denote the corresponding partial matrices. From Lemma 5.1.4, the PSD completion problem is infeasible, that is lies outside of . On the other hand, for all sufficiently large , the partial matrices lie in by diagonal dominance. Since is closed by Proposition 5.1.2, we deduce that there exists , so that lies on the boundary of , that is Slater’s condition fails for the completion problem . In fact, it can be shown that the smallest such is with completion values of [math] in the positions in . On the other hand, all specified principal matrices of for are clearly positive definite, and therefore all the corresponding faces are trivial. Thus we have found a partial matrix that has a singular completion but the minimal face cannot be found from an intersection using the cliques of the graph.
Given the importance of singularity degree, the following question arises naturally. Which graphs have the property that the cone is facially exposed? Equivalently, on which graphs does every feasible PSD completion problem have singularity degree at most one? Let us make the following definition. The singularity degree of a graph is the maximal singularity degree among all completion problems with PSD completable partial matrices .
Chordal graphs have singularity degree one [47], and surprisingly these are the only graphs with this property [132].
Corollary 5.1.8** (Singularity degree of chordal completions).**
The graph has singularity degree one if, and only if, is chordal.
5.2 Euclidean distance matrix completion, EDMC
In this section, we discuss a problem that is closely related to the PSD completion problem of the previous section, namely the Euclidean distance matrix completion, EDMC, problem. As we will see, the EDMC problem inherently fails Slater’s condition, and facial reduction once again becomes applicable by analyzing certain cliques in a graph.
Setting the stage, fix an undirected graph on a vertex set with an edge set . Given a partial matrix , the Euclidean distance matrix completion problem asks to determine, if possible, an integer and a collection of points satisfying
[TABLE]
See Figure 5.1 for an illustration.
We now see how this problem can be modeled as an SDP. To this end, let us introduce the following notation. A matrix is called a Euclidean distance matrix, EDM, if there exists an integer and points satisfying
[TABLE]
Such points are said to realize in . The smallest integer such that there exist points in realizing is called the embedding dimension of , and is denoted by .
We let denote the set of all EDM matrices. In this language the EDM completion problem reads: given a partial matrix determine a matrix in the set
[TABLE]
Thus the EDM completion problem is a conic feasibility problem. Since we are interested in facial reduction, the facial structure of is central.
Notice that has empty interior since it is contained in the space of hollow matrices
[TABLE]
A fundamental fact often used in the literature is that is linearly isomorphic to . More precisely, consider the mapping
[TABLE]
defined by
[TABLE]
Clearly maps into the space of hollow matrices . One can quickly verify that the adjoint is given by
[TABLE]
Moreover, the range of the adjoint is the space of centered matrices
[TABLE]
The following result is fundamental.
Theorem 5.2.1** (Parametrization of the EDM cone).**
The map is a linear isomorphism carrying onto . The inverse is the map , where is the orthogonal projection onto . (Footnote 3: In fact, we can consider the map as . Then we still have and the Moore-Penrose pseudoinverse , where zeros out the diagonal.)
In particular, the cone is linearly isomorphic to the cone . On the other hand, observe for any matrix the equivalence
[TABLE]
Thus is linearly isomorphic to the face
[TABLE]
as claimed. More explicitly, forming an orthogonal matrix yields the equality .
Thus the EDM completion problem amounts to finding a matrix in the set
[TABLE]
To see how to recover the realizing points of , consider a matrix and form a factorization for some matrix with . Let be the rows of . Then lying in implies , that is, the points are centered around the origin, while the constraint implies
[TABLE]
for all . Hence the points solve the EDM completion problem.
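The correspondence between Gram matrices and distance matrices, and the recovery of realizing points, can be sketched numerically. The block below is a minimal illustration, assuming the standard formulas $\mathcal{K}(X)_{ij} = X_{ii} + X_{jj} - 2X_{ij}$ and $\mathcal{K}^{\dagger}(D) = -\tfrac{1}{2} J\,\mathrm{offDiag}(D)\,J$ with $J = I - \tfrac{1}{n}ee^T$; the function names `edm_map`, `edm_map_pinv`, and `realize` are hypothetical and chosen only for this example.

```python
import numpy as np

def edm_map(X):
    """Assumed form of the map K: K(X)_ij = X_ii + X_jj - 2 X_ij."""
    d = np.diag(X)
    return d[:, None] + d[None, :] - 2.0 * X

def edm_map_pinv(D):
    """Assumed pseudoinverse: -1/2 * J * offDiag(D) * J with J = I - ee^T/n."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    D0 = D - np.diag(np.diag(D))          # zero out the diagonal
    return -0.5 * J @ D0 @ J

def realize(D, tol=1e-9):
    """Return centered points (as rows) whose squared distances reproduce D."""
    X = edm_map_pinv(D)
    w, U = np.linalg.eigh(X)
    keep = w > tol * max(w.max(), 1.0)    # numerically positive eigenvalues
    return U[:, keep] * np.sqrt(w[keep])  # factorization X = P P^T

# Six random points in the plane, their Gram matrix, and the induced EDM.
rng = np.random.default_rng(0)
P_true = rng.standard_normal((6, 2))
G = P_true @ P_true.T
D = edm_map(G)
P = realize(D)
```

The recovered rows of `P` agree with the original points only up to a rigid motion, since distances do not determine position or orientation.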
Let us turn now to understanding (strict) feasibility of . A vector is called a *partial EDM* if the restriction of to every specified principal submatrix is an EDM. The graph is *EDM completable* if every partial EDM is EDM completable. The following result, proved in [15], is a direct analogue of Theorem 5.1.1 for the PSD completion problem.
Theorem 5.2.2** (EDM completability & chordal graphs).**
The graph is EDM completable if, and only if, is chordal.
We also mention in passing the following observation from [47].
Theorem 5.2.3** (Closedness of the projected EDM cone).**
The projected image is always closed. ∎
Given a clique in , we let denote the set of Euclidean distance matrices indexed by . In what follows, given a partial matrix , the restriction can then be thought of either as a vector in or as a hollow matrix in . We also use the symbol to indicate the mapping acting on . The following recipe provides a simple way to discover faces of the PSD cone containing the feasible region from specified cliques in the graph.
Theorem 5.2.4** (Clique facial reduction for EDM completions).**
Let be any -clique in the graph . Let be a partial Euclidean distance matrix and define the relaxation
[TABLE]
Then for any matrix exposing $\operatorname{face}\big(\mathcal{K}^{\dagger}_{\alpha}(d[\alpha]),\, \mathbf{S}^{\alpha}_{+}\cap \mathbf{S}^{\alpha}_{c}\big)$, the matrix
[TABLE]
In other words, the recipe is as follows. Given a clique in , consider the matrix . Let be an exposing vector of $\operatorname{face}\big(\mathcal{K}^{\dagger}_{\alpha}(d[\alpha]),\, \mathbf{S}^{\alpha}_{+}\cap \mathbf{S}^{\alpha}_{c}\big)$. Then is an extension of to obtained by padding with zeroes. The above theorem guarantees that the entire feasible region of (5.5) is contained in the face of exposed by .
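The recipe can be illustrated on a small noiseless instance. The sketch below makes one particular choice of exposing matrix, namely the orthogonal projector onto the complement of $\operatorname{range}(X_\alpha) + \operatorname{span}\{e\}$, where $X_\alpha$ is the centered Gram matrix of the clique; other exposing vectors are possible. The helper name `clique_exposing_matrix` and the test data are assumptions made for this example.

```python
import numpy as np

def clique_exposing_matrix(D_alpha, alpha, n, tol=1e-9):
    """Padded n x n exposing matrix built from a fully specified clique alpha.

    D_alpha holds the squared distances among the clique's points.
    """
    k = len(alpha)
    J = np.eye(k) - np.ones((k, k)) / k
    X = -0.5 * J @ D_alpha @ J                     # centered Gram matrix of the clique
    w, U = np.linalg.eigh(X)
    keep = w > tol * max(w.max(), 1.0)
    # Orthonormal basis of range(X) together with the normalized all-ones vector.
    B = np.column_stack([U[:, keep], np.ones(k) / np.sqrt(k)])
    W_alpha = np.eye(k) - B @ B.T                  # projector onto the complement
    W = np.zeros((n, n))
    W[np.ix_(alpha, alpha)] = W_alpha              # pad with zeroes
    return W

# Eight centered points in the plane and a 5-clique among them.
rng = np.random.default_rng(4)
n, r = 8, 2
P = rng.standard_normal((n, r))
P -= P.mean(axis=0)
G = P @ P.T                                        # true centered Gram matrix
alpha = [1, 3, 4, 6, 7]
diff = P[alpha][:, None, :] - P[alpha][None, :, :]
D_alpha = (diff ** 2).sum(axis=-1)
W = clique_exposing_matrix(D_alpha, alpha, n)
inner = float(np.sum(G * W))                       # <G, W>, vanishes up to rounding
```

Including $e$ in the basis `B` is what makes the zero-padded matrix orthogonal to the full Gram matrix, whose block indexed by the clique is generally not centered.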
In direct analogy with Theorem 5.1.6 for PSD completions, the minimal face is sure to be discovered in this way for chordal graphs.
Theorem 5.2.5** (Clique facial reduction for EDM is sufficient).**
Suppose that is chordal, and consider a partial Euclidean distance matrix and the region
[TABLE]
Let denote the set of all maximal cliques in , and for each define
[TABLE]
Then the equality
[TABLE]
Corollary 5.2.6** (Singularity degree of chordal completions).**
If the graph is chordal, then the EDM completion problem has singularity degree at most one, when feasible.
Finally, in analogy with Corollary 5.1.8, the following is true. Define the singularity degree of a graph to be the maximal singularity degree among all EDM completion problems with EDM completable partial matrices .
Corollary 5.2.7** (Singularity degree of chordal completions).**
The graph has singularity degree one if, and only if, is chordal.
5.2.1 EDM and SNL with exact data
The material above explains in part the surprising success of the algorithm in [76] for the sensor network localization problem, SNL.
The SNL problem differs from the EDM completion problem only in that some of the points or sensors that define the problem are in fact anchors and their positions are known. The algorithm proceeds by iteratively finding faces of the PSD cone from cliques and intersecting them two at a time, thereby decreasing the dimension of the problem in each step. In practice, this procedure often terminates with a unique solution of the problem. We should mention that the anchors are a red herring. Indeed, they should only be treated differently than the other sensors after all the sensors have been localized. In the post-processing step, a so-called Procrustes problem is solved to bring the putative anchors as close as possible to their original (known) positions and thus rotating the sensor positions appropriately. Another important point in applications is that the distances for sensors that are close enough to each other are often known. This suggests that there are often many local cliques in the graph. This means that the resulting SDP relaxation is highly degenerate but this degeneracy can be exploited as we have seen above.
Some numerical results from the year 2010 in [76] appear in Table 5.1.
These results are on random, noiseless problems using a 2.16 GHz Intel Core 2 Duo, 2 GB of RAM. The embedding dimension is and the sensors are in a square region with anchors. We use the Root Mean Square Deviation to measure the quality of the solution:
[TABLE]
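The post-processing step and the error measure can be sketched as follows. This is a minimal illustration, assuming that the RMSD is the square root of the mean squared distance between computed and true positions, and that the alignment is an orthogonal Procrustes fit on the anchors; `rmsd` and `procrustes_align` are hypothetical names.

```python
import numpy as np

def rmsd(P, P_true):
    """Root mean square deviation between two point sets given as rows."""
    return np.sqrt(np.mean(np.sum((P - P_true) ** 2, axis=1)))

def procrustes_align(P, anchor_idx, anchors):
    """Rigidly move all rows of P so that P[anchor_idx] best matches anchors."""
    Pa = P[anchor_idx]
    ca, cb = Pa.mean(axis=0), anchors.mean(axis=0)
    M = (Pa - ca).T @ (anchors - cb)
    U, _, Vt = np.linalg.svd(M)
    Q = U @ Vt            # orthogonal matrix minimizing ||(Pa - ca) Q - (anchors - cb)||_F
    return (P - ca) @ Q + cb

# Rotate and recenter a known configuration, then undo it using three anchors.
rng = np.random.default_rng(3)
P_true = rng.uniform(-0.5, 0.5, size=(10, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
P_found = (P_true - P_true.mean(axis=0)) @ R
anchor_idx = [0, 1, 2]
P_aligned = procrustes_align(P_found, anchor_idx, P_true[anchor_idx])
err = rmsd(P_aligned, P_true)
```

The orthogonal factor `Q` may include a reflection, which is appropriate here because distance data cannot distinguish a configuration from its mirror image.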
The huge expected numbers of constraints and variables in the four problems in Table 5.1 are
[TABLE]
respectively. (Footnote 4: A 2016 tarfile with the MATLAB codes is available.)
5.2.2 Extensions to noisy EDM and SNL problems
When there is noise in the distance measurements (the much more realistic setting), the approach requires an intriguing modification. Let us see what goes wrong in the standard approach. Given a clique , let us form as in Theorem 5.2.4. The difficulty is that this matrix is no longer PSD. On the other hand, it is simple enough to find the nearest matrix of to . Let then be a vector exposing . Letting be the collection of cliques under consideration, we thus obtain faces exposed by for . In the noiseless regime, the entire feasible region is contained in the intersection . In the noisy regime, this intersection likely consists only of the origin for the simple reason that randomly perturbed faces typically intersect only at the origin. Here is an elementary fix that makes the algorithm robust to noise. Form the sum
[TABLE]
Again in the noiseless regime, Proposition 2.2.5 implies that exposes precisely the intersection . When noise is present, the matrix will likely have only one zero eigenvalue corresponding to the vector of all ones and the rest of the eigenvalues will be strictly positive. Suppose we know that the realization of the graph should lie in -dimensional space. Then we can find a rank best PSD approximation of and use it to expose a face of the PSD cone. Under appropriate conditions, this procedure is indeed provably robust to noise and extremely effective in practice. A detailed description of such a scheme is presented in [46].
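The summation and truncation steps can be sketched as follows. This is only an illustration under stated assumptions: each clique contributes the projector onto the complement of the top-$r$ eigenspace of its (possibly indefinite) centered Gram matrix together with $e$, and the truncation rank is taken to be $n - r - 1$. The function names, the clique list, and the noise model are hypothetical; the scheme analyzed in [46] may differ in its details.

```python
import numpy as np

def exposing_from_clique(D_a, idx, n, r):
    """Padded exposing matrix from the top-r eigenspace of the clique Gram matrix."""
    k = len(idx)
    J = np.eye(k) - np.ones((k, k)) / k
    X = -0.5 * J @ D_a @ J                        # may be indefinite for noisy D_a
    _, U = np.linalg.eigh((X + X.T) / 2)
    B = np.column_stack([U[:, -r:], np.ones(k) / np.sqrt(k)])
    W = np.zeros((n, n))
    W[np.ix_(idx, idx)] = np.eye(k) - B @ B.T
    return W

def summed_exposing_matrix(D, cliques, n, r):
    """Sum of the padded exposing matrices over all cliques."""
    return sum(exposing_from_clique(D[np.ix_(c, c)], c, n, r) for c in cliques)

def best_psd_rank_k(W, k):
    """Best rank-k PSD approximation of a symmetric matrix (Frobenius norm)."""
    w, U = np.linalg.eigh(W)
    w_top = np.clip(w[-k:], 0.0, None)
    return (U[:, -k:] * w_top) @ U[:, -k:].T

rng = np.random.default_rng(5)
n, r = 8, 2
P = rng.standard_normal((n, r))
D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
cliques = [[0, 1, 2, 3, 4], [2, 3, 4, 5, 6], [4, 5, 6, 7, 0], [1, 2, 3, 5, 7]]

# Noiseless data: the null space of the sum has dimension r + 1.
W_exact = summed_exposing_matrix(D, cliques, n, r)
null_dim_exact = int(np.sum(np.linalg.eigvalsh(W_exact) < 1e-8))

# Noisy data: perturb the distances, sum again, then truncate the rank.
E = rng.standard_normal((n, n))
E = (E + E.T) / 2
D_noisy = D * (1.0 + 1e-2 * E) ** 2
W_noisy = summed_exposing_matrix(D_noisy, cliques, n, r)
W_trunc = best_psd_rank_k(W_noisy, n - r - 1)
```

By construction the all-ones vector stays in the null space of the noisy sum, while the truncated matrix exposes a face of the dimension expected for a realization in $r$-dimensional space.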
5.3 Low-rank matrix completions
In this section, we consider another example inspired by facial reduction. We will be considering matrices ; for convenience, we will index the rows of by and the columns using . Consider two vertex sets and and a bipartite graph .
Given a partial matrix , the low-rank matrix completion problem, LRMC, aims to find a rank matrix from the partially observed elements . A common approach (with statistical guarantees) is to instead solve the convex problem:
[TABLE]
where is the nuclear norm – the sum of the singular values of . Throughout the section, we will make the following assumption: the solution of the convex problem (5.6) coincides with the rank matrix that we seek. There are standard statistical assumptions that one makes in order to guarantee this to be the case [50, 117, 27].
It is known that this problem (5.6) can be solved efficiently using SDP. At first glance it appears that this does not fit into our framework for problems where strict feasibility fails; indeed strict feasibility holds under the appropriate reformulation below. We will see, however, that one can exploit the special structure at the optimum and discover a face of the PSD cone containing an optimal solution, thereby decreasing the dimension of the problem. Even though this is not exactly facial reduction, the ideas behind facial reduction play the main role.
Let us first show that the problem (5.6) can be written equivalently as the SDP:
[TABLE]
To see this, we recall a classical fact that the operator norm of the matrix is dual to the nuclear norm , that is
[TABLE]
Note the equivalence
[TABLE]
Thus we may represent the nuclear norm through an SDP:
[TABLE]
The dual of this SDP is
[TABLE]
Thus the problems (5.6) and (5.7) are indeed equivalent. Let us moreover make the following important observation. Suppose that is optimal for (5.6). Let be a compact SVD of and set and . Then the triple is feasible for (5.7) since
[TABLE]
Moreover and . Thus is optimal for (5.7).
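The construction of the feasible triple from a compact SVD is easy to check numerically. The sketch below assumes the SDP objective is half the trace of the block matrix, so that the objective value at this triple equals the nuclear norm; the function name `lift_from_svd` is hypothetical.

```python
import numpy as np

def lift_from_svd(Z, tol=1e-10):
    """Build W1 = U S U^T, W2 = V S V^T and Y = [[W1, Z], [Z^T, W2]] from a compact SVD."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))        # numerical rank
    U, s, Vt = U[:, :r], s[:r], Vt[:r]
    W1 = (U * s) @ U.T
    W2 = (Vt.T * s) @ Vt
    Y = np.block([[W1, Z], [Z.T, W2]])
    return Y, s.sum(), r

rng = np.random.default_rng(1)
Z = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))   # a rank-2 matrix
Y, nuc, r = lift_from_svd(Z)
min_eig = np.linalg.eigvalsh(Y).min()      # Y is PSD up to rounding
half_trace = 0.5 * np.trace(Y)             # equals the nuclear norm of Z
```

Note that `Y` factors as $[U; V]\,\Sigma\,[U; V]^T$, so its rank equals the rank of `Z`.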
Let us see now how we can exploit the structure and target rank of the problem to find an exposing vector of a face containing an optimal solution of the SDP. Fix two numbers and let be any complete bipartite subgraph of . Let also be the restriction of to . Thus corresponds to a fully specified submatrix.
For almost any rank underlying matrix , it will be the case that . (Footnote 5: This is in the sense of Lebesgue measure on the factors , satisfying .)
Without loss of generality, after row and column permutations if needed, we can assume that encodes the bottom left corner of :
[TABLE]
that is . Form now the factorization obtained using the compact SVD. Both have rank .
Let be a compact SVD of and define
[TABLE]
As we saw previously, is optimal for the SDP (5.7). Subdividing and into two blocks each, we deduce
[TABLE]
Therefore, we conclude that . Taking into account that has rank and the matrices have exactly columns we deduce
[TABLE]
We can now use the exposing vector form of FR formed from and/or . Using the calculated , let and satisfy and . Define then the PSD matrix
[TABLE]
By construction . Hence exposes a face of the PSD cone containing the optimal .
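A numerical sketch of this construction follows. It assumes the block matrix is ordered with the row indices of the data matrix first and the column indices second, and it accepts arbitrary index sets rather than a corner block; the name `lrmc_exposing_matrix` and the test sizes are hypothetical.

```python
import numpy as np

def lrmc_exposing_matrix(X, rows, cols, m, n, r):
    """(m+n) x (m+n) exposing matrix from a fully observed rank-r block X = Z[rows, cols]."""
    U, _, Vt = np.linalg.svd(X)            # full SVD of the observed block
    Ubar = U[:, r:]                        # orthonormal basis of range(X)^perp
    Vbar = Vt[r:].T                        # orthonormal basis of range(X^T)^perp
    W = np.zeros((m + n, m + n))
    W[np.ix_(rows, rows)] = Ubar @ Ubar.T
    shifted = [m + j for j in cols]        # column indices live in the second block
    W[np.ix_(shifted, shifted)] = Vbar @ Vbar.T
    return W

rng = np.random.default_rng(6)
m, n, r = 7, 6, 2
Z = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Optimal SDP point built from the compact SVD of Z.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r]
Y = np.block([[(U * s) @ U.T, Z], [Z.T, (Vt.T * s) @ Vt]])

rows, cols = [0, 2, 3, 5], [1, 2, 4]
W = lrmc_exposing_matrix(Z[np.ix_(rows, cols)], rows, cols, m, n, r)
inner = float(np.sum(Y * W))               # <Y, W>, vanishes up to rounding
```

Only the observed block enters the construction of `W`; the full matrix `Z` is used here solely to verify orthogonality.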
Performing this procedure for many specified submatrices can yield a dramatic decrease in the dimension of the final SDP that needs to be solved. When noise is present, the strategy can be made robust in exactly the same way as for the EDM problem in Section 5.2.2.
We include one of the tables of numerics from [66] in Table 5.2. Results are averaged over five instances. We have recovered the correct rank each time without calling an SDP solver at all. Note that the largest matrices recovered have elements.
5.4 Commentary
The work using chordal graphs for PSD completions was done in [61] and extended the special case of banded structure in [48]. Surveys for matrix completion are given in e.g., [70, 68, 4, 63, 32, 33, 71, 69]. A survey specifically related to chordality is given in [139]. More details on early algorithms for PSD completion are in e.g., [72, 4].
The origin of distance geometry problems can be traced back to the work of Grassmann in 1896 [60]. More recent work appeared in e.g., [59, 58, 34, 64, 45]. Many of these papers emphasized the relationships with molecular conformation. Chordality and relations with positive definiteness are studied in [73] and more recently in [80, 82]. The book [16] has a chapter on matrix completions with the connections to EDM completions, see also [134] for the relations with faces. An excellent online reference for EDM is the book by Dattorro [37]. In addition, there are many excellent survey articles, e.g., [77, 43, 87, 2, 97]. The survey [86] contains many open problems in EDMC and references for application areas.
Early work using SDP interior point algorithms for EDM completion problems is given in [3]. Exploiting the clique structure for SNL type problems is done in e.g., [42, 75, 76]. The improved robust algorithm based on averaging approximate exposing vectors was developed in [46], while a parallel viewpoint based on rigidity theory was developed in [126]. In fact, a parallel view on facial reduction is based on rigidity theory, e.g., [55, 31, 1, 56]. The facial structure for the EDM cone is studied in e.g., [5, 133]. Applications of the technique to molecular conformation are in [7].
The LRMC problem has parallels in the compressed sensing framework that is currently of great interest. The renewed interest followed the work in [51, 50, 27, 117] that used the nuclear norm as a convex relaxation of the rank function. Exploiting the structure of the optimal face using FR was introduced recently in [66]. An alternative approach, which applies much more broadly, is described in [112].
Chapter 6 Hard combinatorial problems
6.1 Quadratic assignment problem, QAP
The quadratic assignment problem, QAP, is arguably the hardest of the so-called NP-hard combinatorial optimization problems. The problem can best be described in terms of facility location. We are given facilities that need to be located among specified locations. As input data, we have information on the distances between pairs of locations and the flow values (weights) between pairs of facilities . The (quadratic) cost of a possible location is the flow between each pair of facilities multiplied by the distance between their assigned locations. Surprisingly, problems of size are still considered hard to solve. As well, we can have a (linear) cost of locating facility in location . The unknown variable that decides which facility goes into which location is an permutation matrix with
[TABLE]
This problem has the elegant trace formulation
[TABLE]
Notice that the objective is a quadratic function, and typically the quadratic form, , is indefinite. (Footnote 1: One can perturb the objective function by exploiting the structure of the permutation matrices and obtain positive definiteness of the quadratic form. However, this can result in deterioration of the bounds from any relaxations.)
Notice also that the feasible region consists of permutation matrices, a discrete set. There is a standard strategy for forming a semi-definite programming relaxation for such a problem. Consider the vectorization and define the lifting to the rank one block matrix
[TABLE]
where the matrix consists of blocks beginning in row and column . The idea is then to reformulate the objective and a relaxation of the feasible region linearly in terms of , and then simply insist that is PSD, though not necessarily rank one. In particular, the objective function can easily be rewritten as a linear function of , namely , where
[TABLE]
and we denote the Kronecker product, .
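The rewriting of the objective as a linear function of the lifted matrix can be checked directly. The sketch below assumes the objective has the form $\operatorname{trace}(AXBX^T - 2CX^T)$ with symmetric $A$ and $B$, a column-major vectorization $x = \operatorname{vec}(X)$, and a lifted cost matrix with blocks $0$, $-\operatorname{vec}(C)$, and $B \otimes A$; the sign and scaling of the linear term are assumptions, and `lifted_cost_matrix` is a hypothetical name.

```python
import numpy as np

def lifted_cost_matrix(A, B, C):
    """Assumed lifted cost: [[0, -vec(C)^T], [-vec(C), kron(B, A)]]."""
    n = A.shape[0]
    c = C.flatten(order="F")               # column-major vectorization
    L = np.zeros((n * n + 1, n * n + 1))
    L[0, 1:] = -c
    L[1:, 0] = -c
    L[1:, 1:] = np.kron(B, A)
    return L

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T       # symmetric flow matrix
B = rng.standard_normal((n, n)); B = B + B.T       # symmetric distance matrix
C = rng.standard_normal((n, n))                    # linear cost

X = np.eye(n)[rng.permutation(n)]                  # a random permutation matrix
x = X.flatten(order="F")
v = np.concatenate(([1.0], x))
Y = np.outer(v, v)                                 # rank one lifting

direct = np.trace(A @ X @ B @ X.T - 2.0 * C @ X.T)
lifted = float(np.sum(lifted_cost_matrix(A, B, C) * Y))
```

The identity behind the quadratic part is $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$, which reduces to $B \otimes A$ when $B$ is symmetric.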
Next we turn to the constraints. We seek to replace the set of permutation matrices by more favorable constraints that permutation matrices satisfy. For example, observe that the permutation matrices are doubly stochastic and hence the row sums and column sums are one, yielding the following linear assignment constraints
[TABLE]
There are of course many more possible constraints one can utilize; the greater their number, even if redundant, the tighter the SDP relaxation in general. Some prominent ones, including the ones above, are
[TABLE]
where denotes the Hadamard (elementwise) product. Note that including both equivalent orthogonality constraints is not redundant in the relaxations.
Let us see how to reformulate the constraints linearly in . We first consider the linear row sum constraints in (6.2a). To this end, observe
[TABLE]
We obtain
[TABLE]
Defining now the matrix
[TABLE]
we obtain the equivalent linear homogeneous constraint . Similarly the linear column sum constraints amount to the equality , where
[TABLE]
Thus the feasible region of the SDP relaxation lies in the face of exposed by . Henceforth, let be full column rank and satisfying .
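The exposing matrix coming from the assignment constraints can be formed explicitly for small sizes. The sketch below assumes the lifting $Y = (1; x)(1; x)^T$ with $x = \operatorname{vec}(X)$ taken columnwise, so that the row-sum and column-sum constraints read $(e^T \otimes I)x = e$ and $(I \otimes e^T)x = e$; the names `assignment_exposing_matrix` and `face_basis` are hypothetical.

```python
import numpy as np
from itertools import permutations

def assignment_exposing_matrix(n):
    """Sum of T^T T over the row-sum and column-sum constraint matrices T."""
    e = np.ones((n, 1))
    I = np.eye(n)
    Tr = np.hstack([-e, np.kron(e.T, I)])   # Tr @ (1; x) = X e - e
    Tc = np.hstack([-e, np.kron(I, e.T)])   # Tc @ (1; x) = X^T e - e
    return Tr.T @ Tr + Tc.T @ Tc

def face_basis(D, tol=1e-9):
    """Orthonormal basis of the null space of D, i.e. a full column rank facial range matrix."""
    w, U = np.linalg.eigh(D)
    return U[:, w < tol]

n = 3
D = assignment_exposing_matrix(n)
V_hat = face_basis(D)

# Every lifted permutation matrix is orthogonal to D.
residuals = []
for p in permutations(range(n)):
    x = np.eye(n)[list(p)].flatten(order="F")
    v = np.concatenate(([1.0], x))
    residuals.append(abs(v @ D @ v))
max_residual = max(residuals)
```

For this construction the null space has dimension $(n-1)^2 + 1$, so the facially reduced matrix variable is of that order rather than $n^2 + 1$.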
The other three constraints in (6.2) can be rephrased linearly in as well: (6.2b) results in the so-called arrow constraint (the first row (column) and the diagonal of are equal); the constraints (6.2c) yield the block-diagonal constraint (diagonal blocks sum to the identity matrix) and the off-diagonal constraint (the traces of the off-diagonal blocks are zero); and the Hadamard orthogonality constraints (6.2d) are called the gangster constraints and guarantee that the diagonal blocks are diagonal matrices and the diagonals of the off-diagonal blocks are zero. We omit further details (see [149]) but denote the resulting constraints with the additional in the form . We note that the transformation without the Hadamard orthogonality constraints is onto while is not. Later in this section, we numerically test the settings with and without the gangster constraints, each with and without facial reduction.
Now the standard relaxation of the problem is obtained by letting be a positive semi-definite matrix with no constraint on its rank:
[TABLE]
All in all, the number of linear constraints is
[TABLE]
i.e.,
[TABLE]
As discussed above, the matrix certifies that this relaxation fails strict feasibility. Indeed the entire feasible region lies in the face of exposed by . Surprisingly, after restricting to the face exposed by , the constraints simplify dramatically. The resulting equivalent formulation becomes
[TABLE]
where is the first unit vector, as we start indexing at 0, and
[TABLE]
and is an appropriately defined index set; see [149]. Roughly speaking, this index set guarantees that the diagonal blocks of are diagonal matrices and the diagonal elements of the off-diagonal blocks of are all zero. In particular, one can show that the resulting linear constraint is surjective.
In fact, the gangster operator and gangster constraints guarantee that most of the Hadamard product constraints in (6.2d) hold. And the constraints corresponding to the linear constraints in (6.2a), the arrow constraint in (6.2b), the block-diagonal and off-diagonal constraints in (6.2c) and some of the gangster constraints in (6.2d) have all become redundant, thereby illuminating the strength of the facial reduction together with the gangster constraints (6.2d).
Moreover, we can rewrite the linear constraints in (6.4) as
[TABLE]
These constraints are rank two, and low rank constraints can be exploited in several of the current software packages for SDP. We see that these low rank constraints are linearly independent and the number of constraints has been reduced from to
[TABLE]
i.e., the number of constraints is still but has decreased by .
Finally, we should mention that the singularity degree of the SDP relaxation of the QAP is . The problem (6.4) has a strictly feasible point . Moreover, one can show that the dual of (6.4) also has a strictly feasible point, see [149, 100].
Let us illustrate empirically the improvement in accuracy and cputime for the facially reduced SDP relaxation of the QAP. We use the model in (6.3) and compare it to the simplified facially reduced model in (6.4). See Figure 6.1 and Figure 6.2. The improvement in accuracy and cputime is evident.
6.2 Second lift of Max-Cut
Recall that for a given weighted undirected graph , the maximum cut problem is to determine a vertex set such that the total weight of the edges between and its complement is as large as possible. Thus enumerating the vertices , we are interested in the problem
[TABLE]
Here, we have for and for . Notice the constraints can equivalently be written with the quadratic constraint
[TABLE]
Relaxing to a positive semi-definite matrix , we arrive at the celebrated SDP relaxation of Max-Cut:
[TABLE]
Here denotes the weighted Laplacian matrix of the graph, which will not play a role in our discussion. This SDP is clearly strictly feasible.
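The link between cuts and the Laplacian quadratic form can be verified by brute force on a small graph. The sketch below assumes the Laplacian is $L = \operatorname{Diag}(We) - W$ and that the cut weight equals $\tfrac{1}{4}x^T L x$ for $x \in \{\pm 1\}^n$; the scaling is an assumption about the displayed formulation, and the function names are hypothetical.

```python
import numpy as np
from itertools import product

def laplacian(Wt):
    """Weighted graph Laplacian Diag(W e) - W."""
    return np.diag(Wt.sum(axis=1)) - Wt

def cut_weight(Wt, x):
    """Total weight of edges whose endpoints receive different signs."""
    n = len(x)
    return sum(Wt[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

rng = np.random.default_rng(8)
n = 5
Wt = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), 1)
Wt = Wt + Wt.T                                   # symmetric weights, zero diagonal
L = laplacian(Wt)

signs = [np.array(x) for x in product([-1, 1], repeat=n)]
agree = all(np.isclose(cut_weight(Wt, x), 0.25 * x @ L @ x) for x in signs)
best_cut = max(cut_weight(Wt, x) for x in signs)
best_quadratic = max(0.25 * x @ L @ x for x in signs)

# The identity matrix has unit diagonal and is positive definite,
# which is one way to see strict feasibility of the relaxation.
I_feasible = np.allclose(np.diag(np.eye(n)), 1.0)
```

Each lifted point $X = xx^T$ has unit diagonal, which is the only constraint retained in the relaxation besides positive semi-definiteness.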
Another idea now to improve the accuracy of the relaxation is to “extend the lifting”. Namely, with the goal of tightening the approximation to the original Max-Cut problem, we can certainly add the following quadratic constraints to the SDP relaxation:
[TABLE]
Let us see how to form a relaxation with these nonlinear constraints.
For , let denote the vector formed from the upper triangular part of taken columnwise with the strict upper triangular part multiplied by . By abuse of notation, we let and define the matrix . We can now form a new SDP relaxation by insisting that is PSD (though not rank one) and rewriting the constraints linearly in . The nonlinear constraints (6.7) can indeed be written linearly in ; we omit the details. On the other hand, note that the -th constraint in the original SDP relaxation (6.6) is equivalent to
[TABLE]
In exactly the same way as in Section 6.1, we can define the matrix
[TABLE]
which certifies that strict feasibility fails and that the entire feasible region lies in the face of exposed by . It turns out that this second lift of max-cut, in practice, provides much tighter bounds than the original SDP relaxation (6.6), and the elementary facial reduction step using serves to stabilize the problem.
6.3 General semi-definite lifts of combinatorial problems
Let us next look at a general recipe often used for obtaining SDP relaxations of NP-hard problems; elements of this technique were already used in the previous sections. Consider a nonconvex feasible region of the form
[TABLE]
where is a linear transformation. An SDP relaxation of this region is the set
[TABLE]
Indeed, is the image of a linear projection of the intersection of with rank one matrices. For this reason is often called an SDP *lift* of .
In applications, such as the ones in the previous sections, the affine hull of may not be full dimensional. For example, the affine hull of the set of permutation matrices (used for QAP) has empty interior. To this end, suppose that the affine hull of is given by , where is a linear transformation and a vector. Define the matrix . Then clearly there is no harm in including the redundant constraint
[TABLE]
in the very definition of . Notice then is clearly contained in the face of exposed by . Indeed, this is the minimal face of containing . To see this, suppose that the affine span of has dimension , and consider any affinely independent vectors . Then the vectors are linearly independent, and therefore the barycenter
[TABLE]
is a rank matrix lying in . On the other hand, it is immediate that the face of exposed by also has dimension . The claimed minimality follows. It is curious to note that if the constraint (6.8) were not explicitly included in the definition , then the SDP lift could nevertheless be strictly feasible, and hence unnecessarily large.
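The rank count in this argument can be observed on the permutation matrices of order three. The sketch below averages the lifted rank one matrices over all feasible points rather than over a chosen affinely independent subset, which gives the same rank; `lifted_barycenter` is a hypothetical name.

```python
import numpy as np
from itertools import permutations

def lifted_barycenter(n):
    """Average of (1; x)(1; x)^T over all vectorized n x n permutation matrices."""
    pts = np.array([np.eye(n)[list(p)].flatten(order="F")
                    for p in permutations(range(n))])
    lifted = [np.outer(np.r_[1.0, x], np.r_[1.0, x]) for x in pts]
    return np.mean(lifted, axis=0), pts

n = 3
Y_bar, pts = lifted_barycenter(n)
affine_dim = np.linalg.matrix_rank(pts - pts[0])   # dimension of the affine hull
bary_rank = np.linalg.matrix_rank(Y_bar)           # affine dimension plus one
```

Here the affine hull has dimension $(n-1)^2$, so the barycenter has rank $(n-1)^2 + 1$, well below the order $n^2 + 1$ of the lifted matrix.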
Example 6.3.1** (Strictly feasible SDP lifts).**
Consider the region:
[TABLE]
There are only four feasible points, namely , and they affinely span the two dimensional subspace perpendicular to the vector . If this constraint is not included explicitly, then the SDP lift is given by
[TABLE]
In particular, the identity matrix is feasible.
6.4 Elimination method for sparse SOS polynomials
Checking whether a polynomial is always nonnegative is a ubiquitous task in computational mathematics. This problem is NP-hard, as it encompasses, for example, a great variety of hard combinatorial problems. Instead, a common approach utilizes sum of squares formulations. Indeed, checking whether a polynomial is a sum of squares of polynomials can be modeled as an SDP. A certain hierarchy of sum of squares problems [78, 103] can then be used to determine the nonnegativity of the original polynomial. The size of the SDP arising from a sum of squares problem depends on the number of monomials that must be used in the formulation. In this section, we show how facial reduction iterations on the cone of sums of squares polynomials can be used to eliminate monomials, yielding a smaller and better conditioned equivalent SDP formulation. A rigorous explanation of the material in this section requires some heavier notation; therefore we only outline the techniques.
Let denote the vector space of polynomials in variable with real coefficients of degree at most . We will write a polynomial using multi-index notation
[TABLE]
where is some subset of , we set , and are some real coefficients. We will think of as a Euclidean space with the inner product being the usual dot product between coefficient vectors. Let be the set of polynomials that are sums of squares, meaning that can be written as for some polynomials . Clearly is a closed convex cone, often called the SOS cone.
A fundamental fact is that membership in the SOS cone can be checked by solving an SDP.
Theorem 6.4.1**.**
Fix a set of monomials . Then a polynomial is a sum of squares of polynomials over the monomial set if and only if there exists a matrix so that , where is a vector of monomials in .
Proof 6.4.2**.**
If is a sum of squares , then we can form a matrix whose rows are the coefficient vectors of . Then is the PSD matrix we seek. Conversely, given a PSD matrix satisfying , we can form a factorization , and read off the coefficients of each polynomial from the rows of .
Notice that the relation can be easily rewritten as a linear relation on by matching coefficient of the left and right-hand-sides. The size of is completely dictated by the number of monomials.
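For a univariate polynomial the coefficient matching is a short computation. The sketch below uses the monomial vector $v(x) = (1, x, x^2)$ and two different PSD matrices representing $p(x) = x^4 + 2x^2 + 1$; the helper names `poly_from_gram` and `sos_factors` are hypothetical.

```python
import numpy as np

def poly_from_gram(Q):
    """Coefficients (ascending degree) of p(x) = v(x)^T Q v(x), v = (1, x, ..., x^d)."""
    d = Q.shape[0] - 1
    c = np.zeros(2 * d + 1)
    for i in range(d + 1):
        for j in range(d + 1):
            c[i + j] += Q[i, j]            # x^i * x^j contributes to degree i + j
    return c

def sos_factors(Q, tol=1e-10):
    """Rows are coefficient vectors of polynomials q_i with p = sum of q_i^2."""
    w, U = np.linalg.eigh(Q)
    keep = w > tol
    return (U[:, keep] * np.sqrt(w[keep])).T

# Two Gram matrices for the same polynomial x^4 + 2 x^2 + 1.
Q1 = np.array([[1.0, 0.0, 1.0],
               [0.0, 0.0, 0.0],
               [1.0, 0.0, 1.0]])           # p = (1 + x^2)^2
Q2 = np.diag([1.0, 2.0, 1.0])              # p = 1 + (sqrt(2) x)^2 + (x^2)^2
c1, c2 = poly_from_gram(Q1), poly_from_gram(Q2)

# Rebuild p from the squares read off the factorization of Q2.
rebuilt = sum(np.convolve(q, q) for q in sos_factors(Q2))
```

The two matrices show that the Gram matrix of a sum of squares is generally not unique, which is why membership is a feasibility problem over an affine slice of the PSD cone.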
More generally, instead of certifying whether a polynomial is SOS, we might be interested in minimizing a linear functional over an affine slice of the SOS cone. More precisely, consider a problem of the form:
[TABLE]
where is the decision variable, are specified polynomials and is a fixed vector. Clearly this problem can be converted to an SDP. The size of the decision matrix is determined by the number of monomials. Parsing algorithms attempt to choose a (small) set of monomials so that every feasible for (6.9) can be written as a sum of squares over the monomial set , thereby decreasing the size of the SDP. Not surprisingly, some parsing strategies can be interpreted as facial reduction iterations on (6.9).
We next outline such a strategy closely following [111]. To this end, we must first explain which faces of the SOS cone correspond to eliminating monomials. Indeed, there are faces of that do not have such a description.
To answer this question, we will need extra notation. Henceforth, fix a set of monomials and set to be the maximal degree of monomials in . Let be the set of all polynomials that can be written as sums of squares over the monomial set . Finally, set to be the set of points in that are not midpoints of any points in , namely
[TABLE]
Let us now look at two types of faces that arise from elimination of monomials.
Theorem 6.4.3** (Type I face).**
If equality,
[TABLE]
holds, then is a face of .
In other words, if the convex hull contains no grid points other than those already in , then is a face of .
Theorem 6.4.4** (Type II face).**
If is a face of , then is a face of for any .
Thus given a face , we can recursively make the face smaller by deleting any .
Let us now turn to facial reduction. In the first step of facial reduction for (6.9), we must find an exposing vector that is orthogonal to all the affine constraints. Doing so in full generality is a difficult proposition. Instead, let us try to replace by a polyhedral inner approximation. Then the search for is a linear program.
Theorem 6.4.5**.**
The polyhedral set
[TABLE]
satisfies
Thus if we can find that is orthogonal to the affine constraints (6.9), then we can use to expose a face of containing the feasible region. Remarkably, this face can indeed be represented by eliminating monomials from .
Theorem 6.4.6**.**
Consider a vector
[TABLE]
for some and nonnegative numbers . Define the monomial set . Then the face coincides with .
Thus we can inductively use this procedure to eliminate monomials. At the end, one would hope that we would be left with a small dimensional SDP to solve. Promising numerical results and further explanations of methods of this type can be found in [110, 111, 74, 141, 140].
6.5 Commentary
Quadratic assignment problem, QAP
Many survey articles and books have appeared on the QAP, e.g., [101, 102, 28, 89]. More recent work on implementation of SDP relaxations includes [41, 100, 150]. That the quadratic assignment problem is NP-hard is shown in [125]. The elegant trace formulation we used was introduced in [49].
The classic Nugent test set for QAP is given in [99]; it is maintained within QAPLIB [26], currently online. These problems have proven to be extremely hard to solve to optimality, see e.g., [28]. The difficulty of these problems is illustrated by the fact that many of them were not solved for odd years, see e.g., [13].
The semi-definite relaxation described here was introduced in [149]. It was derived by using the Lagrangian relaxation after modelling the permutation matrix constraint by various quadratic constraints. The semi-definite relaxation is then the dual of the Lagrangian relaxation, i.e., the dual of the dual. Application of FR then results in the surprisingly simplified gangster operator formulation.
This gangster formulation along with symmetry for certain QAP models is exploited in [41, 40] to significantly increase the size of QAP problems that can be solved. Other relaxations of QAP based on e.g., eigenvalue bounds are studied in e.g., [52, 53, 14].
Graph partitioning, GP
The graph partitioning, GP, problem is very similar to the QAP in that it involves a trace quadratic objective with a matrix variable , i.e. the matrix whose columns are the incidence vectors of the sets for the partition. A similar successful SDP relaxation can be found in [147]. More recently, successful bounding results have been found in [113, 130, 38].
Second lift of Max-Cut
The second lifting from Section 6.2 is derived in [11, 12, 10] but in a different way, i.e. using the nullspace of the barycenter approach. The resulting bounds were extremely successful and, in fact, yielded the optimal solution of the MC in almost all cases, the exceptions being very special. The algorithm used for the SDP relaxation was the spectral bundle approach [65], and only problems of limited size could be solved. More recently, an ADMM approach in [131] was much more successful in solving larger problems.
Lifts of combinatorial problems
The SDP lifting of combinatorial regions described in Section 6.3 is standard; see [137] for many examples and references. The material on the minimal face of the SDP lift follows [135], though our explanation here is stated in dual terms, i.e. using exposing vectors.
Monomial elimination from SOS problems
The topic of eliminating monomials from sum of squares problems has a rich history. The section in the current text follows entirely the exposition in [111]. The technique of solving linear programs in order to approximate exposing vectors was extensively studied in [110]. Important earlier references on monomial elimination include [74, 141, 140]. For an exposition of how to use SOS hierarchies to solve polynomial optimization problems see the monograph [79].
Acknowledgements.
We would like to thank Jiyoung (Haesol) Im for her helpful comments and help with proofreading the manuscript. Research of the first author was partially supported by the AFOSR YIP award FA9550-15-1-0237. Research of the second author was supported by The Natural Sciences and Engineering Research Council of Canada.
Index
- adjoint §2.1
- adjoint mapping §2.1
- anchors §5.2.1
- arrow constraint §6.1
- block-diagonal §6.1
- centered matrices §5.2
- chordal §5.1
- clique §5.1
- complementary slackness Proposition 2.3.1
- conic expansion approach §4.6
- conjugate face Definition 2.2.7
- constraint qualifications §2.3
- convex cone §2.1
- coordinate projection §5.1
- dual cone §2.1
- EDM completable §5.2
- EDM, Euclidean distance matrix §5.2
- EDMC, Euclidean distance matrix completion §5.2
- embedding dimension §5.2
- Euclidean distance matrix completion, EDMC §5.2
- Euclidean distance matrix, EDM §5.2
- exposed face Definition 2.2.3
- exposing vector Definition 2.2.3, §5.3
- face of a cone Definition 2.2.1
- facial reduction (FR) Chapter 1
- facially exposed §2.2, Example 2.2.9
- FR, facial reduction Chapter 1
- gangster constraints §6.1
- gangster operator §6.1, §6.5
- graph §5.1
- graph partitioning, GP §6.5
- Hadamard (elementwise) product §6.1
- Hölder error bound §4.5
- hollow matrices §5.2
- Kronecker product §6.1
- Lagrangian function §2.3
- lifting §6.1
- linear assignment constraints §6.1
- linear program, LP Chapter 1
- low-rank matrix completion problem, LRMC §5.3
- LP, linear program Chapter 1
- LRMC, low-rank matrix completion problem §5.3
- Mangasarian-Fromovitz Constraint Qualification (MFCQ) Definition 3.2.1
- minimal face §2.2
- Moore-Penrose pseudoinverse footnote 3
- nuclear norm §5.3
- off-diagonal §6.1
- optimal face Part II
- partial order §2.1
- perdiagonal §2.4
- permutation matrix §6.1
- perturbed problems Example 2.3.7
- polar cone §2.1
- positive semi-definite, PSD §4.6
- preprocessing Chapter 1
- preprocessing step §4.1
- primal-dual pair §2.3
- proper convex cone §2.1
- proper face §2.2
- PSD completion problem §5.1
- PSD, positive semi-definite §4.6
- QAP, quadratic assignment problem §6.1
- quadratic assignment problem, QAP §6.1
- Robinson regularity condition §3.4
- second polar cone §2.1
- self-replicating Example 2.2.8, Example 2.2.9
- sensor network localization problem, SNL §5.2.1
- sensors §5.2.1
- singularity degree of (P) Definition 4.2.1
- singularity degree of a graph §5.1, §5.2
- SNL, sensor network localization problem §5.2.1
- specified submatrix §5.3
- strict feasibility §2.3
- trace inner product §2.1
- vectorization §6.1
- weak duality inequality §2.3
- [1] A. Alfakih. Graph rigidity via Euclidean distance matrices. Linear Algebra Appl., 310(1-3):149–165, 2000.
- [2] A. Alfakih, M.F. Anjos, V. Piccialli, and H. Wolkowicz. Euclidean distance matrices, semidefinite programming, and sensor network localization. Portug. Math., 68(1):53–102, 2011.
- [3] A. Alfakih, A. Khandani, and H. Wolkowicz. Solving Euclidean distance matrix completion problems via semidefinite programming. Comput. Optim. Appl., 12(1-3):13–30, 1999. A tribute to Olvi Mangasarian.
- [4] A. Alfakih and H. Wolkowicz. Matrix completion problems. In Handbook of semidefinite programming, volume 27 of Internat. Ser. Oper. Res. Management Sci., pages 533–545. Kluwer Acad. Publ., Boston, MA, 2000.
- [5] A.Y. Alfakih. A remark on the faces of the cone of Euclidean distance matrices. Linear Algebra Appl., 414(1):266–270, 2006.
- [6] B. Alipanahi, N. Krislock, A. Ghodsi, H. Wolkowicz, L. Donaldson, and M. Li. Determining protein structures from NOESY distance constraints by semidefinite programming. J. Comput. Biol., 20(4):296–310, 2013.
- [7] B. Alipanahi, N. Krislock, A. Ghodsi, H. Wolkowicz, L. Donaldson, and M. Li. Determining protein structures from NOESY distance constraints by semidefinite programming. J. Comput. Biol., 20(4):296–310, 2013.
- [8] E.D. Andersen and K.D. Andersen. The Mosek interior point optimizer for linear programming: an implementation of the homogeneous algorithm. In High performance optimization, volume 33 of Appl. Optim., pages 197–232. Kluwer Acad. Publ., Dordrecht, 2000.
