On the Burer-Monteiro method for general semidefinite programs

Diego Cifuentes

arXiv:1904.07147·math.OC·March 3, 2020·Optim. Lett.

TL;DR

This paper extends theoretical guarantees for the Burer-Monteiro nonconvex approach to solve general semidefinite programs, including those with inequalities and multiple constraints, with applications to matrix sensing and quadratic minimization.

Contribution

It generalizes existing results to broader classes of SDPs, providing new guarantees for the Burer-Monteiro method with fixed cost matrices and generic constraints.

Findings

1. Guarantees for the Burer-Monteiro method extend to arbitrary SDPs.

2. The guarantees cover SDPs with inequalities and multiple semidefinite constraints.

3. Applications are illustrated for matrix sensing and integer quadratic minimization.

Abstract

Consider a semidefinite program (SDP) involving an $n \times n$ positive semidefinite matrix $X$. The Burer-Monteiro method uses the substitution $X = Y Y^{T}$ to obtain a nonconvex optimization problem in terms of an $n \times p$ matrix $Y$. Boumal et al. showed that this nonconvex method provably solves equality-constrained SDPs with a generic cost matrix when $p \gtrsim \sqrt{2m}$, where $m$ is the number of constraints. In this note we extend their result to arbitrary SDPs, possibly involving inequalities or multiple semidefinite constraints. We derive similar guarantees for a fixed cost matrix and generic constraints. We illustrate applications to matrix sensing and integer quadratic minimization.

Equations

$$\min_{X \in \mathscr{X}} C \bullet X, \qquad \mathscr{X} := \{X \in \SS_{+}^{n} : \mathcal{A}(X) - b \in \{0\}^{m_{1}} \times \mathbb{R}_{+}^{m_{2}}\}$$

$$\min_{Y \in \mathbb{R}^{n \times p}} C \bullet Y Y^{T} \quad \text{such that} \quad Y Y^{T} \in \mathscr{X}.$$

$$\min_{y} \{f(y) : h(y) \in \{0\}^{m_{1}} \times \mathbb{R}_{+}^{m_{2}}\}.$$

$$y \text{ feasible}, \quad \lambda \in \mathbb{R}^{m_{1}} \times \mathbb{R}_{+}^{m_{2}}, \quad \lambda_{i} = 0 \text{ for } i \notin I(y), \quad \nabla_{y} L(y, \lambda) = 0,$$

$$u^{T} \nabla_{yy}^{2} L(y, \lambda)\, u \geq 0, \quad \forall u \text{ such that } \nabla_{y} h_{i}(y)\, u = 0 \text{ for } i \in I(y).$$

$$\{\nabla h_{i}(y) : i \in I(y)\} \text{ are linearly independent}.$$

$$\min_{x, y} \{f(x, y) : h(x, y) = 0, \; x \in \mathcal{K}\},$$

$$(x, y) \text{ feasible}, \quad s \in \mathcal{K}^{*}, \quad \langle s, x \rangle = 0, \quad \nabla_{x, y} L(x, y, \lambda, s) = 0,$$

$$u^{T} \nabla_{yy}^{2} L(x, y, \lambda, s)\, u \geq 0, \quad \forall u \text{ such that } \nabla_{y} h(x, y)\, u = 0.$$

$$\min_{X} C \bullet X \quad \text{s.t.} \quad X_{i,i} \geq X_{i,n+1} \text{ for } i \in [n], \quad X_{n+1,n+1} = 1, \quad X \in \SS_{+}^{n+1}.$$

$$\SS_{n-p}^{n} := \{X : \operatorname{rank} X \leq n - p\} \subset \SS^{n}.$$

$$\mathcal{L} := \bigcup_{I} \mathcal{L}_{I} \subset \SS^{n}, \quad \text{with } \mathcal{L}_{I} := \operatorname{span}\{A_{i} : i \in I\},$$

$$\dim(\SS_{n-p}^{n} + \mathcal{L}) \leq (\tau(n) - \tau(p)) + m^{\prime} < \tau(n) = \dim \SS^{n}.$$

$$S(\lambda) := C - \mathcal{A}^{*}(\lambda) \in \SS^{n} \text{ is the slack matrix},$$

$$Y Y^{T} \in \mathscr{X}, \quad \lambda \in \mathbb{R}^{m_{1}} \times \mathbb{R}_{+}^{m_{2}}, \quad \lambda_{i} = 0 \text{ for } i \notin I(Y), \quad S(\lambda) Y = 0,$$

$$S(\lambda) \bullet U U^{T} \geq 0, \quad \forall U \in \mathbb{R}^{n \times p} \text{ such that } A_{i} \bullet U Y^{T} = 0 \text{ for } i \in I(Y).$$

$$\min_{X \in \SS^{n}} \|X\|_{*} \quad \text{such that} \quad \mathcal{A}(X) = b.$$

$$(\mathit{SDP}_{\bf n})$$

$$\underline{\bf n} := (n_{1}, \dots, n_{k}), \qquad \overline{\bf n} := (n_{k+1}, \dots, n_{\ell}), \qquad \overline{X} := (X_{k+1}, \dots, X_{\ell}).$$

$$(\mathit{BM}_{\bf n})$$

$$m^{\prime} := \max_{r_{k+1}, \dots, r_{\ell}} \; m - d - \tau(r_{k+1}) - \tau(r_{k+2}) - \dots - \tau(r_{\ell}),$$

$$\underline{\mathcal{V}} := \bigcup_{j \in [k] :\, p_{j} \leq n_{j}} \bigl(\SS^{n_{1}} \times \dots \times \SS^{n_{j-1}} \times \SS_{n_{j} - p_{j}}^{n_{j}} \times \SS^{n_{j+1}} \times \dots \times \SS^{n_{k}}\bigr) \subset \SS^{\underline{\bf n}},$$

$$\overline{\mathcal{V}} := \bigcup_{r_{k+1}, \dots, r_{\ell}} \bigl(\SS_{n_{k+1} - r_{k+1}}^{n_{k+1}} \times \dots \times \SS_{n_{\ell} - r_{\ell}}^{n_{\ell}}\bigr) \subset \SS^{\overline{\bf n}},$$

$$\dim \underline{\mathcal{V}} = \Bigl(\sum_{j \leq k} \tau(n_{j})\Bigr) - \tau(p_{\min}), \qquad \dim \overline{\mathcal{V}} = \max_{r_{k+1}, \dots, r_{\ell}} \sum_{j > k} \bigl(\tau(n_{j}) - \tau(r_{j})\bigr).$$

$$\begin{aligned} \dim(\underline{\mathcal{V}} \times \overline{\mathcal{V}} \times \{0\} + \operatorname{Im} \mathcal{A}^{*}) &\leq m + \Bigl(\sum_{j \leq k} \tau(n_{j})\Bigr) - \tau(p_{\min}) + \max_{r_{k+1}, \dots, r_{\ell}} \sum_{j > k} \bigl(\tau(n_{j}) - \tau(r_{j})\bigr) \\ &= D - \tau(p_{\min}) + \max_{r_{k+1}, \dots, r_{\ell}} \Bigl\{m - d - \sum_{j > k} \tau(r_{j})\Bigr\} = D - \tau(p_{\min}) + m^{\prime} < D, \end{aligned}$$

$$(q(Y), \overline{X}, x) \in \mathscr{X}, \quad \overline{S}(\lambda) \in \SS_{+}^{\overline{\bf n}}, \quad \langle \overline{S}(\lambda), \overline{X} \rangle = 0, \quad s(\lambda) = 0, \quad S_{j}(\lambda) Y_{j} = 0,$$

$$S_{j}(\lambda) \bullet U_{j} U_{j}^{T} \geq 0, \quad \forall U_{j} \in \mathbb{R}^{n_{j} \times p_{j}} \text{ s.t. } \mathcal{A}_{j}(U_{j} Y_{j}^{T}) = 0 \quad (\text{for } j \in [k]).$$

$$\|X\|_{*} = \min_{Z} I_{n} \bullet Z \quad \text{such that} \quad Z + X \in \SS_{+}^{n}, \quad Z - X \in \SS_{+}^{n}.$$

$$\min_{X_{1}, X_{2}} I_{n} \bullet X_{1} + I_{n} \bullet X_{2} \quad \text{such that} \quad \mathcal{A}(X_{1}) - \mathcal{A}(X_{2}) = b, \quad X_{1} \in \SS_{+}^{n}, \quad X_{2} \in \SS_{+}^{n}.$$

$$(I_{n}, I_{n}) \notin \underline{\mathcal{V}} + (1, -1) \otimes \operatorname{Im} \mathcal{A}^{*}, \quad \text{where } \underline{\mathcal{V}} := \SS_{n-p}^{n} \times \SS^{n} \,\cup\, \SS^{n} \times \SS_{n-p}^{n}.$$

$$\mathcal{M}_{I} := \{Y \in \mathbb{R}^{n \times p} : A_{i} \bullet Y Y^{T} = b_{i} \text{ for } i \in I\}.$$

$$f_{I} : \mathbb{R}^{n \times p} \to \mathbb{R}^{I}, \quad Y \mapsto (\bar{A}_{i} \bullet Y Y^{T} : i \in I).$$

Full text

On the Burer-Monteiro method

for general semidefinite programs

Diego Cifuentes

Massachusetts Institute of Technology

Cambridge, MA, USA

[email protected]

Abstract.

Consider a semidefinite program (SDP) involving an $n\times n$ positive semidefinite matrix $X$. The Burer-Monteiro method uses the substitution $X=YY^{T}$ to obtain a nonconvex optimization problem in terms of an $n\times p$ matrix $Y$. Boumal et al. showed that this nonconvex method provably solves equality-constrained SDPs with a generic cost matrix when $p\gtrsim\sqrt{2m}$, where $m$ is the number of constraints. In this note we extend their result to arbitrary SDPs, possibly involving inequalities or multiple semidefinite constraints. We derive similar guarantees for a fixed cost matrix and generic constraints. We illustrate applications to matrix sensing and integer quadratic minimization.

Key words and phrases:

Semidefinite programming, Burer-Monteiro method, Low rank factorization, Nonconvex optimization, Spurious local minima

1. Introduction

Consider a semidefinite program (SDP) in $\SS^{n}$ , the space of $n{\times}n$ symmetric matrices, with $m\!=\!m_{1}\!+\!m_{2}$ constraints ( $m_{1}$ equalities and $m_{2}$ inequalities):

$$\min_{X\in\mathscr{X}}\; C\bullet X,\qquad \mathscr{X}:=\{X\in\SS_{+}^{n}:\mathcal{A}(X)-b\in\{0\}^{m_{1}}\times\mathbb{R}_{+}^{m_{2}}\} \tag{SDP}$$

where $C\!\in\!\SS^{n}$, $b\!\in\!\mathbb{R}^{m}$ and $\mathcal{A}:\SS^{n}\!\to\!\mathbb{R}^{m}$, $X\!\mapsto\!(A_{1}\!\bullet\!X,\dots,A_{m}\!\bullet\!X)$ is a linear map. We assume that $\mathscr{X}$ is nonempty and that the minimum is achieved. Though interior point methods can solve (SDP) in polynomial time, they often run into memory problems for large values of $n$. This has motivated a surge of newer, more scalable techniques; see the recent survey [18]. We study here the low rank factorization method, pioneered by Burer and Monteiro [10, 11].

The Burer-Monteiro method consists in writing $X=YY^{T}$ for some $Y\in\mathbb{R}^{n\times p}$ , and solving the following nonconvex optimization problem:

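$$\min_{Y\in\mathbb{R}^{n\times p}}\; C\bullet YY^{T}\quad\text{such that}\quad YY^{T}\in\mathscr{X}. \tag{BM}$$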
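As a rough illustration of why the factorized problem is cheaper to work with, the sketch below evaluates the objective $C\bullet YY^{T}$ and the constraint values $A_{i}\bullet YY^{T}$ directly from $Y$, using the identity $A\bullet YY^{T}=\operatorname{tr}(Y^{T}AY)$, so that the $n\times n$ matrix $YY^{T}$ is never formed. The function names `bm_objective` and `bm_constraints` are hypothetical and not taken from the paper.

```python
import numpy as np

def bm_objective(C, Y):
    """Value of C . (Y Y^T), computed as sum((C Y) * Y) = tr(Y^T C Y)."""
    return float(np.sum((C @ Y) * Y))

def bm_constraints(A_list, Y):
    """Vector (A_i . (Y Y^T))_i, again without forming Y Y^T."""
    return np.array([np.sum((A @ Y) * Y) for A in A_list])

# Small random instance (illustrative data only).
rng = np.random.default_rng(0)
n, p = 5, 2
C = rng.standard_normal((n, n))
C = (C + C.T) / 2                       # symmetric cost matrix
A_list = [np.diag(np.eye(n)[i]) for i in range(n)]   # A_i = e_i e_i^T
Y = rng.standard_normal((n, p))

X = Y @ Y.T                             # formed here only to cross-check
obj_direct = float(np.trace(C @ X))
obj_factored = bm_objective(C, Y)
diag_values = bm_constraints(A_list, Y)
```

With $A_{i}=e_{i}e_{i}^{T}$ the constraint values are the squared row norms of $Y$, which is the diagonal of $YY^{T}$.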

Let $\tau(k):=\binom{k+1}{2}$ be the $k$-th triangular number. Barvinok [3] and Pataki [22] independently showed that (SDP) has an optimal solution of rank $r$, with $\tau(r)\!\leq\!m$. Consequently, problems (SDP) and (BM) are equivalent for any $p$ with $\tau(p)\!\geq\!m$. But due to nonconvexity, local optimization methods may not always recover the global optimum of (BM). Nonetheless, the Burer-Monteiro method performs very well in several applications, see e.g., [10, 16, 24].
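The rank thresholds in this note are all stated through $\tau$, so a small helper makes them concrete. The sketch below computes $\tau(k)$ and the smallest $p$ with $\tau(p)\geq m$ (the equivalence condition above) or with $\tau(p)>m$ (the strict condition used in the guarantees of this note). The names `tau` and `smallest_rank` are hypothetical.

```python
def tau(k: int) -> int:
    """k-th triangular number: tau(k) = binom(k + 1, 2) = k (k + 1) / 2."""
    return k * (k + 1) // 2

def smallest_rank(m: int, strict: bool = False) -> int:
    """Smallest p with tau(p) >= m, or with tau(p) > m when strict=True."""
    p = 0
    while tau(p) < m or (strict and tau(p) == m):
        p += 1
    return p

# With m = 6 constraints: tau(3) = 6, so p = 3 suffices for equivalence,
# while the strict inequality tau(p) > m needs p = 4.
p_equiv = smallest_rank(6)
p_strict = smallest_rank(6, strict=True)
```

Since $\tau(p)\approx p^{2}/2$, both thresholds grow like $\sqrt{2m}$, which matches the $p\gtrsim\sqrt{2m}$ scaling quoted in the abstract.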

There has been much recent work in proving global guarantees for (BM). Most remarkably, Boumal et al. [8, 9] showed that equality-constrained SDPs ( $m_{2}{=}0$ ) have no spurious 2nd-order critical points when $\tau(p)\!>\!m$ under certain assumptions. Concretely, they require that the cost matrix $C$ is generic and that the feasible set of (BM) is sufficiently regular. By generic we mean that the result holds outside a set of measure zero. Though other global guarantees for (BM) exist, e.g. [13, 20], their setting is more restrictive.

In this note we generalize the result from Boumal et al. [8, 9] to arbitrary SDPs, possibly involving inequalities or multiple positive semidefinite (PSD) constraints. For the inequality-constrained problem (SDP), we show in Theorem 1 that if $\tau(p)\!>\!m$ and the cost is generic, then any 2nd-order critical point of (BM) is globally optimal. Similar guarantees might be derived even when the cost matrix is fixed, see Theorem 3. We show applications to integer quadratic minimization and PSD matrix sensing.

Our proof of Theorem 1 is simpler than the one in [8, 9], as it relies on nonlinear programming instead of Riemannian optimization. This simplicity is reflected in the fact that Theorem 1 does not require any regularity assumptions on the domain (constraint qualifications). Nevertheless, regularity conditions might still be needed to prevent the existence of local minima that do not satisfy the 2nd-order criticality conditions.

We also consider SDPs involving multiple PSD variables, and study the Burer-Monteiro method applied to a subset of these variables. We prove in Theorem 4 that, for a generic cost, any 2nd-order critical point is globally optimal when $p$ satisfies a bound due to Pataki [22]. We present an application to symmetric matrix sensing (the restricted isometry property is not needed).

The structure of this note is as follows. Section 2 reviews the notion of 2nd-order critical points in nonlinear programming. Section 3 analyzes the Burer-Monteiro method for the inequality-constrained problem (SDP). Section 4 studies SDPs involving multiple PSD constraints.

Related work. The guarantees from Boumal et al. have been further studied in [25, 5, 23, 12], but all these papers focus on the equality-constrained case. The bound $\tau(p)>m$ was shown to be optimal up to lower order terms in [25]. Guarantees for approximate 2-critical points were derived in [5, 23, 12]. The first polynomial time bounds for the Burer-Monteiro method were recently proved in [12]. We hope that the techniques developed in this paper may lead to polynomial time guarantees for arbitrary SDPs.

2. Criticality conditions

We review the notion of critical points. Consider the nonlinear program

$$\min_{y}\;\{f(y):h(y)\in\{0\}^{m_{1}}\times\mathbb{R}_{+}^{m_{2}}\}. \tag{NLP}$$

Let $L(y,\lambda)\!=\!f(y)\!-\!\lambda\!\cdot\!h(y)$ be the Lagrangian function. Let $I(y)\!\subset\![m]$ be the indices of the active constraints at $y$ , i.e., the indices for which $h_{i}(y)\!=\!0$ . The 1st-order and 2nd-order necessary optimality conditions are:

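$$y\text{ feasible},\quad \lambda\in\mathbb{R}^{m_{1}}\times\mathbb{R}_{+}^{m_{2}},\quad \lambda_{i}=0\text{ for }i\notin I(y),\quad \nabla_{y}L(y,\lambda)=0, \tag{1a}$$

$$u^{T}\nabla_{yy}^{2}L(y,\lambda)\,u\geq 0,\quad \forall u\text{ such that }\nabla_{y}h_{i}(y)\,u=0\text{ for }i\in I(y). \tag{1b}$$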

A point $y$ is 1st-order critical for (NLP), abbreviated 1-critical, if there exist multipliers $\lambda$ satisfying (1a). The point is 2nd-order critical, abbreviated 2-critical, if (1b) also holds. A critical point is spurious if it is not the global minimum of (NLP).

Given a local minimum $y$ of (NLP), it is known that $y$ satisfies (1) under suitable regularity assumptions. Different regularity conditions, known as constraint qualifications, have been proposed [4]. One of the simplest is:

$$\{\nabla h_{i}(y):i\in I(y)\}\text{ are linearly independent}. \tag{LICQ}$$

Various algorithms with provable convergence guarantees to 2-critical points are known, see e.g., [2, 14, 6] and the references therein. These results rely either on (LICQ) or a weaker constraint qualification.

More generally, consider the nonlinear conic program

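$$\min_{x,y}\;\{f(x,y):h(x,y)=0,\;x\in\mathcal{K}\}, \tag{NLCP}$$

where $\mathcal{K}$ is a closed convex cone. The Lagrangian function is $L(x,y,\lambda,s)=f(x,y)-\lambda{\cdot}h(x,y)-s{\cdot}x$. The following 1st-order conditions are necessary for optimality under suitable regularity conditions, see e.g., [7, §3.1]:

$$(x,y)\text{ feasible},\quad s\in\mathcal{K}^{*},\quad \langle s,x\rangle=0,\quad \nabla_{x,y}L(x,y,\lambda,s)=0, \tag{2a}$$

$$u^{T}\nabla_{yy}^{2}L(x,y,\lambda,s)\,u\geq 0,\quad \forall u\text{ such that }\nabla_{y}h(x,y)\,u=0. \tag{2b}$$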

A point $(x,y)$ is 1-critical for (NLCP) if it satisfies (2a) for some $\lambda,s$ . The point is 2-critical if (2b) also holds.

There are several algorithms for (NLCP) for the case $\mathcal{K}\!=\!\SS_{+}^{n}$ , see the survey paper [26]. Symmetric cones (e.g., products of PSD cones) were studied in [17]. These methods are provably convergent to 1-critical points. In order to escape from points that do not satisfy (2b) we may rely on 2nd-order methods for the (NLP) given by fixing the $x$ coordinate.

3. Inequality constrained SDPs

Consider problems (SDP) and (BM). For $X\!\in\!\mathscr{X}$ , recall that the $i$ -th constraint is active at $X$ if $A_{i}\!\bullet\!X\!=\!b_{i}$ . Let $m^{\prime}\!=\!m^{\prime}(\mathscr{X})$ be the largest number of linearly independent constraints that can be simultaneously active. For instance, if $m_{2}\!=\!0$ then $m^{\prime}\!=\!\operatorname{rank}\mathcal{A}$ . We will show the following theorem.

Theorem 1.

Let $p$ be such that $\tau(p)>m^{\prime}$. For a generic $C$, problem (BM) has no spurious 2-critical points. This means that any 2-critical point $Y$ for (BM) is also globally optimal, and hence $YY^{T}$ is optimal for (SDP).

Example 1 (Integer quadratic minimization).

Consider the optimization problem $\min\{f(x):x\!\in\!\mathbb{Z}^{n}\}$ where $f(x)$ is a convex quadratic function. Denoting $\tilde{x}\!:=\!(x,1)\!\in\!\mathbb{Z}^{n+1}$ , we may write $f(x)\!=\!\tilde{x}^{T}C\tilde{x}$ for some $C\!\in\!\SS^{n+1}$ . The following SDP relaxation for this problem was proposed in [21]:

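$$\min_{X}\; C\bullet X\quad\text{s.t.}\quad X_{i,i}\geq X_{i,n+1}\text{ for }i\in[n],\quad X_{n+1,n+1}=1,\quad X\in\SS_{+}^{n+1}.$$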

By Theorem 1, for a generic cost function any 2-critical point of the Burer-Monteiro problem is globally optimal when $\tau(p)\!>\!n{+}1$ .
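To get a feel for the size of $p$ in Example 1, here is a worked instance with the illustrative choice $n=100$ (this value is an assumption made for the example, not taken from the paper). The relaxation has $n$ inequalities and one equality, so the threshold is $\tau(p)>n+1=101$:

```latex
\tau(13) = \tfrac{13\cdot 14}{2} = 91 \;\leq\; 101 \;<\; 105 = \tfrac{14\cdot 15}{2} = \tau(14)
\quad\Longrightarrow\quad p = 14 .
% Storage comparison for this instance:
%   Y \in \mathbb{R}^{(n+1)\times p} has 101 \cdot 14 = 1414 entries,
%   X \in \SS^{n+1} has \tau(101) = \tfrac{101\cdot 102}{2} = 5151 free entries.
```

So in this instance the factorized variable $Y$ has roughly a quarter as many entries as the symmetric matrix $X$, and the gap widens as $n$ grows because $p$ scales like $\sqrt{2n}$.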

By generic, we mean the following. For fixed $\mathcal{A}$ , $b$ , the set of all cost matrices $C\!\in\!\SS^{n}$ for which (BM) has a spurious 2-critical point has measure zero. We can provide an explicit characterization of this measure-zero set in $\SS^{n}$ . This set is contained in the Minkowski sum of two special algebraic sets. The first algebraic set is given by a rank constraint:

$$\SS_{n-p}^{n}:=\{X:\operatorname{rank}X\leq n-p\}\subset\SS^{n}. \tag{3}$$

It is known that $\dim\SS^{n}_{n-p}=\tau(n)\!-\!\tau(p)$ , see e.g., [15, Prop.2.1]. The second algebraic set is a union of linear subspaces:

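$$\mathcal{L}:=\bigcup_{I}\mathcal{L}_{I}\subset\SS^{n},\quad\text{with }\mathcal{L}_{I}:=\operatorname{span}\{A_{i}:i\in I\}, \tag{4}$$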

where the union is over the possible subsets of constraints $I\subset[m]$ that can be simultaneously active. Note that $\dim\mathcal{L}=m^{\prime}$ by definition of $m^{\prime}$ .

Theorem 2.

If (BM) has a spurious 2-critical point then $C\in\SS^{n}_{n{-}p}+\mathcal{L}$ .

Theorem 1 follows directly from Theorem 2. Indeed, if $\tau(p)\!>\!m^{\prime}$ then

$$\dim(\SS_{n-p}^{n}+\mathcal{L})\leq(\tau(n)-\tau(p))+m^{\prime}<\tau(n)=\dim\SS^{n}. \tag{5}$$

Therefore $\SS^{n}_{n-p}\!+\!\mathcal{L}$ is a proper algebraic set in $\SS^{n}$, and has measure zero.

We proceed to prove Theorem 2. We first derive the criticality conditions for (BM). This is a special instance of (NLP), so we need to specialize (1). We have $h(Y)\!=\!\mathcal{A}(YY^{T}){-}b$ and $L(Y,\lambda)\!=\!S(\lambda){\bullet}YY^{T}\!+\!b^{T}\lambda$ , where

$$S(\lambda):=C-\mathcal{A}^{*}(\lambda)\in\SS^{n}\text{ is the slack matrix},$$

and $\mathcal{A}^{*}:\mathbb{R}^{m}\!\to\!\SS^{n}$ , $\lambda\!\mapsto\!\sum_{i}\lambda_{i}A_{i}$ is the adjoint of $\mathcal{A}$ . The 1st-order and 2nd-order criticality conditions are:

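$$YY^{T}\in\mathscr{X},\quad \lambda\in\mathbb{R}^{m_{1}}\times\mathbb{R}_{+}^{m_{2}},\quad \lambda_{i}=0\text{ for }i\notin I(Y),\quad S(\lambda)Y=0, \tag{6a}$$

$$S(\lambda)\bullet UU^{T}\geq 0,\quad \forall U\in\mathbb{R}^{n\times p}\text{ such that }A_{i}\bullet UY^{T}=0\text{ for }i\in I(Y). \tag{6b}$$

The following lemma establishes sufficient conditions for a critical point to be globally optimal. The lemma is known, see [11, 16, 9], but our assumptions are slightly different since we allow inequalities.

Lemma 1.

Either of the following conditions implies global optimality:

(i) $Y$ is 1-critical and the multiplier $\lambda$ satisfies $S(\lambda)\in\SS_{+}^{n}$,

(ii) or $Y$ is 2-critical and $Y$ is column rank deficient.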

Proof.

(i) The conic dual of (SDP) is $\,\max_{\lambda}\{b^{T}\lambda:S(\lambda)\!\in\!\SS_{+}^{n},\,\lambda\!\in\!\mathbb{R}^{m_{1}}\!{\times}\mathbb{R}_{+}^{m_{2}}\}.$ Let $(Y,\lambda)$ satisfy (6a), and let $X:=YY^{T}$ . We will show that the primal/dual pair $(X,\lambda)$ is optimal for the SDP. It suffices to verify three conditions: $X$ is primal feasible, $\lambda$ is dual feasible, and complementary slackness holds (i.e., $\lambda_{i}{=}0$ for $i{\notin}I(X)$ and $S(\lambda)X{=}0$ ). Primal feasibility and complementary slackness follow from (6a), while dual feasibility corresponds to $S(\lambda)\!\in\!\SS_{+}^{n}$ .

(ii) Let $(Y,\lambda)$ satisfy (6). By the above item, it suffices to show that $S(\lambda)\!\in\!\SS_{+}^{n}$. Let $x\!\in\!\mathbb{R}^{n}$, and let us see that $x^{T}S(\lambda)x\!\geq\!0$. Since $Y$ is rank deficient, there is a nonzero vector $z\!\in\!\mathbb{R}^{p}$ such that $Yz\!=\!0$. The matrix $U\!:=\!xz^{T}$ satisfies $UY^{T}\!=\!x(Yz)^{T}\!=\!0$, so $S(\lambda)\!\bullet\!UU^{T}\!\geq\!0$ by (6b). Since $UU^{T}\!=\!\|z\|^{2}xx^{T}$, we have $S(\lambda)\!\bullet\!UU^{T}\!=\!\|z\|^{2}(x^{T}S(\lambda)x)$, and therefore $x^{T}S(\lambda)x\geq 0$. ∎
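Lemma 1(i) suggests a simple numerical certificate, sketched below under simplifying assumptions: only equality constraints ($m_{2}=0$, so no sign conditions on $\lambda$), and multipliers recovered by least squares from the linear system $\sum_{i}\lambda_{i}A_{i}Y=CY$, which is the stationarity condition $S(\lambda)Y=0$ rewritten. The function name `bm_certificate` and the toy instance are hypothetical. If the gradients $A_{i}Y$ are linearly dependent the multiplier is not unique, and the least-squares choice may fail the test even when another multiplier would pass.

```python
import numpy as np

def bm_certificate(C, A_list, b, Y, tol=1e-8):
    """Check (6a) and S(lam) PSD at Y, for an equality-constrained SDP."""
    X = Y @ Y.T
    feas_res = np.array([np.sum(A * X) for A in A_list]) - b
    # Stationarity S(lam) Y = 0 is linear in lam: sum_i lam_i (A_i Y) = C Y.
    M = np.stack([(A @ Y).ravel() for A in A_list], axis=1)
    lam, *_ = np.linalg.lstsq(M, (C @ Y).ravel(), rcond=None)
    S = C - sum(l * A for l, A in zip(lam, A_list))
    return {
        "feasible": bool(np.max(np.abs(feas_res)) <= tol),
        "stationary": bool(np.linalg.norm(S @ Y) <= tol),
        "dual_psd": bool(np.linalg.eigvalsh(S).min() >= -tol),
        "lam": lam,
    }

# Toy instance: minimize 2 X_12 subject to X_11 = X_22 = 1, X PSD.
C = np.array([[0.0, 1.0], [1.0, 0.0]])
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
b = np.array([1.0, 1.0])

good = bm_certificate(C, A_list, b, np.array([[1.0, 0.0], [-1.0, 0.0]]))
bad = bm_certificate(C, A_list, b, np.array([[1.0, 0.0], [1.0, 0.0]]))
```

For the first point the recovered slack matrix is the all-ones matrix, which is PSD, so Lemma 1(i) certifies global optimality. The second point is feasible and 1-critical, but its slack matrix has a negative eigenvalue; since that point is also column rank deficient and not optimal, Lemma 1(ii) implies it cannot be 2-critical.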

We are ready to prove Theorem 2 (which implies Theorem 1).

Proof of Theorem 2.

Let $(Y,\lambda)$ be a spurious point satisfying (6). Lemma 1(ii) gives that $\operatorname{rank}Y\!=\!p$. By (6a) we have $S(\lambda)Y\!=\!0$, which implies $S(\lambda)\!\in\!\SS^{n}_{n-p}$, and also $\lambda_{i}{=}0$ for $i\!\notin\!I(Y)$. Thus $C=S(\lambda)\!+\!\mathcal{A}^{*}(\lambda)\in\SS^{n}_{n-p}\!+\mathcal{L}$. ∎

To finish this section, we observe that Theorem 2 can be used even if the cost matrix $C$ is not generic. For instance, the next theorem assumes that both $b,C$ are fixed and $\mathcal{A}$ is generic (i.e., $A_{1},\dots,A_{m}$ are generic).

Theorem 3.

Let $p$ be such that $\tau(p)\!>\!m$ and $\operatorname{rank}C\!>\!n{-}p$. For a generic $\mathcal{A}$, problem (BM) has no spurious 2-critical points.

Proof.

By Theorem 2, it suffices to see that $C\notin\SS^{n}_{n-p}\!+\!\mathcal{L}$. Fix $I\!\subset\![m]$, and let $\mathcal{L}_{I}\!\subset\!\SS^{n}$ be as in (4). Note that $\mathcal{L}_{I}$ is generic among the subspaces of dimension $|I|$, as it depends on the generic matrices $A_{i}$. Recall that $\dim(\SS^{n}_{n-p}{+}\mathcal{L}_{I})\!<\!\dim\SS^{n}$ by (5). Since $C\!\notin\!\SS^{n}_{n-p}$, it follows that $C\notin\SS^{n}_{n-p}\!+\!\mathcal{L}_{I}$ for a generic $\mathcal{L}_{I}$. The result follows from $\mathcal{L}=\bigcup_{I}\mathcal{L}_{I}$. ∎

An additional advantage of having generic constraints is that regularity is always satisfied. Therefore any local minimum of (BM) is also 2-critical, and hence is subject to Theorem 3. The next proposition is shown in Appendix A.

Proposition 1.

Assume that the entries of $b$ are nonzero. For a generic $\mathcal{A}$ , any feasible point of (BM) satisfies (LICQ).

Example 2 (Matrix sensing).

Given a linear map $\mathcal{A}:\SS^{n}\!\to\!\mathbb{R}^{m}$ and a vector $b\!\in\!\mathbb{R}^{m}$ , consider finding a low rank matrix $X\!\in\!\SS^{n}$ such that $\mathcal{A}(X)\!=\!b$ . A standard technique to promote low rank is to minimize the nuclear norm:

$$\min_{X\in\SS^{n}}\;\|X\|_{*}\quad\text{such that}\quad\mathcal{A}(X)=b. \tag{7}$$

If we further assume that $X$ is PSD, the cost function is $I_{n}\bullet X$. By Theorem 3, if $\mathcal{A}$ is generic and $\tau(p)\!>\!m$, then any local minimum of (BM) is globally optimal. The PSD assumption will be relaxed in the next section.

Remark.

Different guarantees about the Burer-Monteiro method for matrix sensing were obtained in [20], relying on the restricted isometry property.

4. General SDPs

Let $\textbf{n}\!:=\!(n_{1},\dots,n_{\ell})\in\mathbb{N}^{\ell}$ and $d\in\mathbb{N}$ . We consider an SDP involving PSD matrices of sizes $n_{1},\dots,n_{\ell}$ and a free variable of dimension $d$ . Let the Euclidean space $\SS^{\bf n}:=\SS^{n_{1}}{\times}\cdots{\times}\SS^{n_{\ell}}$ and the convex cone $\SS^{\bf n}_{+}:=\SS_{+}^{n_{1}}{\times}\cdots{\times}\SS_{+}^{n_{\ell}}$ . Given $C\in\SS^{\bf n}{\times}\mathbb{R}^{d}$ , $b\in\mathbb{R}^{m}$ , and a linear map $\mathcal{A}:\SS^{\bf n}{\times}\mathbb{R}^{d}\!\to\!\mathbb{R}^{m}$ , consider:

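$$\min_{X\in\mathscr{X}}\; C\bullet X,\qquad \mathscr{X}:=\{X\in\SS_{+}^{\bf n}\times\mathbb{R}^{d}:\mathcal{A}(X)=b\}, \tag{$\mathit{SDP}_{\bf n}$}$$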

where $X:=(X_{1},\dots,X_{\ell},x)$ with $X_{j}\in\SS^{n_{j}}$ , $x\in\mathbb{R}^{d}$ . As before, we assume that $\mathscr{X}$ is nonempty and that the minimum is achieved.

We apply the Burer-Monteiro method to the first $k$ matrices. Let $Y:=(Y_{1},\dots,Y_{k})$, with $Y_{j}\!\in\!\mathbb{R}^{n_{j}\times p_{j}}$, and let $q(Y):=(Y_{1}Y_{1}^{T},\dots,Y_{k}Y_{k}^{T})$. We denote

$$\underline{\bf n}:=(n_{1},\dots,n_{k}),\qquad \overline{\bf n}:=(n_{k+1},\dots,n_{\ell}),\qquad \overline{X}:=(X_{k+1},\dots,X_{\ell}).$$

In particular, $\SS^{\bf n}=\SS^{\underline{\bf n}}\times\SS^{\overline{\bf n}}$. The Burer-Monteiro problem is:

$$\min_{Y,\overline{X},x}\; C\bullet(q(Y),\overline{X},x)\quad\text{such that}\quad(q(Y),\overline{X},x)\in\mathscr{X}. \tag{$\mathit{BM}_{\bf n}$}$$

Pataki [22] showed that ( ${\mathit{SDP}}_{\!\bf n}$ ) always has an optimal solution such that $\sum_{j=1}^{\ell}\tau(r_{j})\!\leq\!m\!-\!d$ , where $r_{j}\!:=\!\operatorname{rank}X_{j}$ . We can ensure that there is a solution with $r_{j}\!\leq\!p_{j}$ for all $j\!\in\![k]$ if either $p_{j}\!\geq\!n_{j}$ or $\tau(p_{j})\!\geq\!m^{\prime}$ , with

$$m^{\prime}:=\max_{r_{k+1},\dots,r_{\ell}}\; m-d-\tau(r_{k+1})-\tau(r_{k+2})-\dots-\tau(r_{\ell}),$$

where the maximum is over the possible ranks $r_{k+1},\dots,r_{\ell}$ . Hence, problems ( ${\mathit{SDP}}_{\!\bf n}$ ) and ( ${\mathit{BM}}_{\!\bf n}$ ) agree when $\tau(p_{j})\!\geq\!\min\{m^{\prime},\tau(n_{j})\}$ for $j\!\in\![k]$ .

Theorem 4.

Assume that $\tau(p_{j})\!>\!\min\{m^{\prime},\tau(n_{j})\}$ for $j\!\in\![k]$ . For a generic $C$ , problem ( ${\mathit{BM}}_{\!\bf n}$ ) has no spurious 2-critical points.

Example 3 (Inequalities).

Consider the inequality constrained problem (SDP). We may view each of the $m_{2}$ inequalities as a PSD constraint on a $1{\times}1$ matrix. So this is a special instance of ( ${\mathit{SDP}}_{\!\bf n}$ ) with $k{=}1$ , $\ell{=}m_{2}{+}1$ , $d{=}0$ , and $n_{2}{=}\dots=n_{\ell}{=}1$ . Note that $r_{i+1}{=}1$ when the $i$ -th inequality constraint is inactive, and is zero otherwise. Hence $m^{\prime}=m-\#(\text{inactive constrs})=\#(\text{active constrs})$ . This is consistent with the results from Section 3.

Example 4 (Second-order cone).

Let $\mathcal{Q}^{n}:=\{x\!\in\!\mathbb{R}^{n}:\|(x_{2},\dots,x_{n})\|\!\leq\!x_{1}\}$ be the second-order cone. Consider minimizing a linear cost on $\SS_{+}^{n_{1}}\!\times\!\mathcal{Q}^{n_{2}}$ subject to $m_{1}$ linear equalities. Apply the Burer-Monteiro factorization to the matrix in $\SS_{+}^{n_{1}}$ . We can embed $\mathcal{Q}^{n_{2}}$ inside $\SS^{n_{2}}_{+}$ by adding $\tau(n_{2}{-}1)$ new linear equalities, see [1, pg.7]. So this is a special case of ( ${\mathit{BM}}_{\!\bf n}$ ) with $\ell{=}2$ , $k{=}1$ , $d{=}0$ , $m{=}m_{1}{+}\tau(n_{2}{-}1)$ . Given $x\!\in\!\mathcal{Q}^{n_{2}}$ , the rank of the corresponding PSD matrix is $r_{2}{=}0$ if $x{=}0$ , $r_{2}{=}n_{2}{-}1$ if $x$ lies in the boundary, and $r_{2}{=}n_{2}$ if $x$ lies in the interior. So Theorem 4 applies when $\tau(p_{1})\!>\!m_{1}{+}\tau(n_{2}{-}1){-}\tau(r_{2})$ , where $r_{2}$ is the smallest feasible rank. We point out that embedding $\mathcal{Q}^{n_{2}}$ inside $\SS_{+}^{n_{2}}$ is used for the analysis, but we do not need to do this in practice. The reason is that the embedding preserves critical points.
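The three rank values in Example 4 can be checked numerically. One standard way to embed the second-order cone in the PSD cone is the arrow matrix with $x_{1}$ on the diagonal and $(x_{2},\dots,x_{n})$ in the first row and column. Whether this is exactly the construction intended in [1] is an assumption here, although it is cut out by $n-1$ equal-diagonal equalities plus $\binom{n-1}{2}$ vanishing entries, which adds up to $\tau(n{-}1)$ as in the example. The function name `arrow` is hypothetical.

```python
import numpy as np

def arrow(x):
    """Arrow matrix of x: x[0] on the diagonal, x[1:] in first row/column."""
    x = np.asarray(x, dtype=float)
    M = x[0] * np.eye(x.size)
    M[0, 1:] = x[1:]
    M[1:, 0] = x[1:]
    return M

# n = 3: the origin, a boundary point (||(x2, x3)|| = x1), an interior point.
rank_origin = int(np.linalg.matrix_rank(arrow([0.0, 0.0, 0.0])))
rank_boundary = int(np.linalg.matrix_rank(arrow([1.0, 1.0, 0.0])))
rank_interior = int(np.linalg.matrix_rank(arrow([2.0, 1.0, 0.0])))
min_eig_boundary = float(np.linalg.eigvalsh(arrow([1.0, 1.0, 0.0])).min())
```

The eigenvalues of the arrow matrix are $x_{1}\pm\|(x_{2},\dots,x_{n})\|$ together with $x_{1}$ repeated $n-2$ times, which explains the ranks $0$, $n-1$ and $n$ at the origin, on the boundary and in the interior.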

We also provide an explicit characterization of the costs $C$ for which spurious 2-critical points may exist. These costs lie in the Minkowski sum of two algebraic sets, which are closely related to the ones in (3) and (4).

Theorem 5.

If ( ${\mathit{BM}}_{\!\bf n}$ ) has a spurious 2-critical point, then $C$ lies in the algebraic set $\,\underline{\mathcal{V}}\!\times\!\overline{\mathcal{V}}\!\times\!\{0^{d}\}+\operatorname{Im}\mathcal{A}^{*}\subset\SS^{\bf n}\!\times\!\mathbb{R}^{d}$ , with

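$$\underline{\mathcal{V}}:=\bigcup_{j\in[k]:\,p_{j}\leq n_{j}}\bigl(\SS^{n_{1}}\times\dots\times\SS^{n_{j-1}}\times\SS_{n_{j}-p_{j}}^{n_{j}}\times\SS^{n_{j+1}}\times\dots\times\SS^{n_{k}}\bigr)\subset\SS^{\underline{\bf n}},$$

$$\overline{\mathcal{V}}:=\bigcup_{r_{k+1},\dots,r_{\ell}}\bigl(\SS_{n_{k+1}-r_{k+1}}^{n_{k+1}}\times\dots\times\SS_{n_{\ell}-r_{\ell}}^{n_{\ell}}\bigr)\subset\SS^{\overline{\bf n}},$$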

where the last union is over the possible ranks $r_{k+1},\dots,r_{\ell}$ in ( ${\mathit{SDP}}_{\!\bf n}$ ).

Theorem 4 follows from Theorem 5 by counting dimensions. Let $p_{\min}$ be the minimum of $p_{1},\dots,p_{k}$ , ignoring the values with $p_{j}\!>\!n_{j}$ . Note that

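$$\dim\underline{\mathcal{V}}=\Bigl(\sum_{j\leq k}\tau(n_{j})\Bigr)-\tau(p_{\min}),\qquad \dim\overline{\mathcal{V}}=\max_{r_{k+1},\dots,r_{\ell}}\sum_{j>k}\bigl(\tau(n_{j})-\tau(r_{j})\bigr).$$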

Let $D:=\dim(\SS^{\bf n}\!\times\!\mathbb{R}^{d})=\tau(n_{1}){+}\cdots{+}\tau(n_{\ell}){+}d$ . If $\tau(p_{\min})\!>\!m^{\prime}$ , then

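$$\begin{aligned}\dim(\underline{\mathcal{V}}\times\overline{\mathcal{V}}\times\{0\}+\operatorname{Im}\mathcal{A}^{*})&\leq m+\Bigl(\sum_{j\leq k}\tau(n_{j})\Bigr)-\tau(p_{\min})+\max_{r_{k+1},\dots,r_{\ell}}\sum_{j>k}\bigl(\tau(n_{j})-\tau(r_{j})\bigr)\\&=D-\tau(p_{\min})+\max_{r_{k+1},\dots,r_{\ell}}\Bigl\{m-d-\sum_{j>k}\tau(r_{j})\Bigr\}=D-\tau(p_{\min})+m^{\prime}<D.\end{aligned}$$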

Hence $\underline{\mathcal{V}}\!\times\!\overline{\mathcal{V}}\!\times\!\{0\}\!+\!\operatorname{Im}\mathcal{A}^{*}$ has measure zero.

We proceed to prove Theorem 5. We first derive the optimality conditions for ( ${\mathit{BM}}_{\!\bf n}$ ). This is a special instance of (NLCP), so we need to specialize (2). For $\lambda\!\in\!\mathbb{R}^{m}$, consider the slack variable $S(\lambda):=C{-}\mathcal{A}^{*}(\lambda)\in\SS^{\bf n}{\times}\mathbb{R}^{d}$. Let $S_{j}(\lambda)\in\SS^{n_{j}}$ be the $j$-th component of $S(\lambda)$. Similarly define $\overline{S}(\lambda)\in\SS^{\overline{\bf n}}$ and $s(\lambda)\in\mathbb{R}^{d}$. The criticality conditions are:

$$(q(Y),\overline{X},x)\in\mathscr{X},\quad \overline{S}(\lambda)\in\SS_{+}^{\overline{\bf n}},\quad \langle\overline{S}(\lambda),\overline{X}\rangle=0,\quad s(\lambda)=0,\quad S_{j}(\lambda)Y_{j}=0, \tag{8a}$$

$$S_{j}(\lambda)\bullet U_{j}U_{j}^{T}\geq 0,\quad \forall U_{j}\in\mathbb{R}^{n_{j}\times p_{j}}\text{ s.t. }\mathcal{A}_{j}(U_{j}Y_{j}^{T})=0\quad(\text{for }j\in[k]). \tag{8b}$$

We now provide sufficient conditions for global optimality of critical points.

Lemma 2.

Either of the following conditions implies global optimality:

(i) $(Y,\overline{X},x)$ is 1-critical and $S_{j}(\lambda)\!\in\!\SS_{+}^{n_{j}}$ for $j\!\in\![k]$,

(ii) or $(Y,\overline{X},x)$ is 2-critical and $Y_{j}$ is column rank deficient for $j\!\in\![k]$.

Proof.

The proof is analogous to Lemma 1. For (i) we compare (8a) with the primal/dual optimality conditions for ( ${\mathit{SDP}}_{\!\bf n}$ ). For (ii) we use a vector $z_{j}\!\in\!\mathbb{R}^{p_{j}}$ in the right kernel of $Y_{j}$ in order to show that $S_{j}(\lambda)\in\SS_{+}^{n_{j}}$ . ∎

Proof of Theorem 5.

Let $(Y,\overline{X},x,\lambda)$ be a spurious point satisfying (8). By Lemma 2(ii) we have that $\operatorname{rank}Y_{j}\!=\!p_{j}$ for some $j\!\in\![k]$. As $S_{j}(\lambda)Y_{j}\!=\!0$, it follows that $S_{j}(\lambda)\!\in\!\SS^{n_{j}}_{n_{j}-p_{j}}$. Let $(r_{k+1},\dots,r_{\ell})$ be the ranks of $\overline{X}$. Since $\langle\overline{S}(\lambda),\overline{X}\rangle\!=\!0$ and both lie in $\SS^{\overline{\bf n}}_{+}$, we get $\overline{S}(\lambda)\in\SS^{n_{k+1}}_{n_{k+1}-r_{k+1}}\!\!\times\!\cdots\!\times\!\SS^{n_{\ell}}_{n_{\ell}-r_{\ell}}$. Hence $S(\lambda)\in\underline{\mathcal{V}}\!\times\!\overline{\mathcal{V}}\!\times\!\{0\}$, as $s(\lambda)\!=\!0$. The result follows from $C=S(\lambda)\!+\!\mathcal{A}^{*}(\lambda)$. ∎

As illustrated next, Theorem 5 can be used even when $C$ is not generic.

Example 5 (Matrix sensing).

We revisit the problem of sensing symmetric matrices from Example 2. For $X\!\in\!\SS^{n}$ , its nuclear norm satisfies:

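$$\|X\|_{*}=\min_{Z}\; I_{n}\bullet Z\quad\text{such that}\quad Z+X\in\SS_{+}^{n},\quad Z-X\in\SS_{+}^{n}.$$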

Let $X_{1}\!:=\!\frac{1}{2}(Z{+}X)$ , $X_{2}\!:=\!\frac{1}{2}(Z{-}X)$ . We can rewrite problem (7) as follows:

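$$\min_{X_{1},X_{2}}\; I_{n}\bullet X_{1}+I_{n}\bullet X_{2}\quad\text{such that}\quad\mathcal{A}(X_{1})-\mathcal{A}(X_{2})=b,\quad X_{1}\in\SS_{+}^{n},\quad X_{2}\in\SS_{+}^{n}.$$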
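The change of variables can be sanity-checked numerically. Taking $Z$ to be the matrix absolute value of $X$ (same eigenvectors, absolute values of the eigenvalues) is feasible for the nuclear norm SDP, and then $X_{1}=\frac{1}{2}(Z+X)$ and $X_{2}=\frac{1}{2}(Z-X)$ are the positive and negative parts of $X$. The sketch below builds them from an eigendecomposition and compares $\operatorname{tr}X_{1}+\operatorname{tr}X_{2}$ with NumPy's nuclear norm. The name `psd_split` and the random test matrix are illustrative assumptions.

```python
import numpy as np

def psd_split(X):
    """Return PSD matrices (X1, X2) with X = X1 - X2 and X1 + X2 = |X|."""
    w, V = np.linalg.eigh(X)                  # X = V diag(w) V^T
    X1 = (V * np.clip(w, 0.0, None)) @ V.T    # positive part of X
    X2 = (V * np.clip(-w, 0.0, None)) @ V.T   # negative part of X
    return X1, X2

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
X = (B + B.T) / 2                             # random symmetric matrix
X1, X2 = psd_split(X)

trace_sum = float(np.trace(X1) + np.trace(X2))
nuclear = float(np.linalg.norm(X, "nuc"))
```

Here `V * w` scales column $j$ of `V` by `w[j]`, so `(V * w) @ V.T` equals $\sum_{j}w_{j}v_{j}v_{j}^{T}$.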

Consider the Burer-Monteiro method applied to both matrices $X_{1},X_{2}$ , so that $k\!=\!\ell\!=\!2$ , using the same rank $p$ for both matrices. We will prove that there are no spurious 2-critical points when $\mathcal{A}:\SS^{n}\!\to\!\mathbb{R}^{m}$ is generic and $\tau(p)\!>\!m$ . By Theorem 5, we need to show that

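$$(I_{n},I_{n})\notin\underline{\mathcal{V}}+(1,-1)\otimes\operatorname{Im}\mathcal{A}^{*},\quad\text{where }\underline{\mathcal{V}}:=\SS_{n-p}^{n}\times\SS^{n}\,\cup\,\SS^{n}\times\SS_{n-p}^{n}.$$

It suffices to see that $I_{n}\notin\SS^{n}_{n-p}\!+\!\operatorname{Im}\mathcal{A}^{*}$. But this was shown in the proof of Theorem 3, which applies since $\operatorname{rank}I_{n}\!=\!n\!>\!n{-}p$.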

Acknowledgments

The author thanks Nicolas Boumal, Ankur Moitra, Pablo Parrilo, and David Rosen for helpful discussions and comments.

Appendix A Regularity with generic constraints

In this section we prove Proposition 1. Our proof relies on Sard’s theorem from differential geometry, see e.g., [19, §2].

Theorem 6 (Sard).

Let $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ be a smooth map, with $n\!\geq\!m$ . Let $v\!\in\!\mathbb{R}^{m}$ be a generic point. Then $\operatorname{rank}(\nabla f(y))\!=\!m$ for any $y\!\in\!f^{-1}(v)$ .

Proof of Proposition 1.

Fix a set of indices $I\!\subset\![m]$ , and let

$$\mathcal{M}_{I}:=\{Y\in\mathbb{R}^{n\times p}:A_{i}\bullet YY^{T}=b_{i}\text{ for }i\in I\}.$$

We claim that (LICQ) holds at all points on $\mathcal{M}_{I}$ (i.e., the gradients are linearly independent). If this happens for each $I\!\subset\![m]$ , then (LICQ) also holds for the feasible set of (BM). So it suffices to show the claim.

We prove the claim under a more restrictive genericity setting. We assume that each $b_{i}\!\neq\!0$ and that $A_{i}\!=\!\alpha_{i}\bar{A}_{i}$ , where $\{\bar{A}_{i}\}$ are fixed matrices and $\{\alpha_{i}\}$ are generic scalars. Let

$$f_{I}:\mathbb{R}^{n\times p}\to\mathbb{R}^{I},\quad Y\mapsto(\bar{A}_{i}\bullet YY^{T}:i\in I).$$

The vector $v\!:=\!(b_{i}/\alpha_{i}:i\!\in\!I)$ is generic since $\{\alpha_{i}\}$ are generic. By Theorem 6, $\nabla f_{I}(Y)$ is full rank for any $Y\!\in\!f_{I}^{-1}(v)=\mathcal{M}_{I}$ . So (LICQ) holds on $\mathcal{M}_{I}$ . ∎
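For a concrete instance, (LICQ) on (BM) can be tested numerically. The gradient of $Y\mapsto A_{i}\bullet YY^{T}$ is $2A_{i}Y$ for symmetric $A_{i}$, so the condition asks that the matrices $\{A_{i}Y:i\in I\}$ be linearly independent. The sketch below checks this with a rank computation on the vectorized gradients. The function name `licq_holds`, the tolerance and the toy data are assumptions made for illustration.

```python
import numpy as np

def licq_holds(A_list, Y, active, tol=1e-10):
    """True if the gradients 2 A_i Y, i in `active`, are linearly independent."""
    if not active:
        return True                        # no active constraints: vacuous
    G = np.stack([(2.0 * A_list[i] @ Y).ravel() for i in active])
    return int(np.linalg.matrix_rank(G, tol=tol)) == len(active)

# Diagonal constraints A_i = e_i e_i^T, so A_i . (Y Y^T) = ||row i of Y||^2.
A_list = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

Y_regular = np.array([[1.0, 0.0], [-1.0, 0.0]])   # both rows nonzero
Y_zero_row = np.array([[1.0, 0.0], [0.0, 0.0]])   # second row is zero

licq_regular = licq_holds(A_list, Y_regular, active=[0, 1])
licq_zero_row = licq_holds(A_list, Y_zero_row, active=[0, 1])
```

The second point satisfies $A_{2}\bullet YY^{T}=0$, so it is feasible only when $b_{2}=0$, and there the gradient $2A_{2}Y$ vanishes. This toy case is in line with the assumption in Proposition 1 that the entries of $b$ are nonzero.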

Bibliography

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95:3–51, 2003.
[2] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. Second-order negative-curvature methods for box-constrained and general constrained optimization. Comput. Optim. Appl., 45(2):209–236, 2010.
[3] A. I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete Comput. Geom., 13(2):189–202, 1995.
[4] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear programming: theory and algorithms. John Wiley & Sons, 2013.
[5] S. Bhojanapalli, N. Boumal, P. Jain, and P. Netrapalli. Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form. In Conf. Learn. Theory, pages 3243–3270, 2018.
[6] E. G. Birgin, G. Haeser, and A. Ramos. Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Comput. Optim. Appl., 69(1):51–75, 2018.
[7] J. F. Bonnans and A. Shapiro. Perturbation analysis of optimization problems. Springer Science & Business Media, 2013.
[8] N. Boumal, V. Voroninski, and A. Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Adv. Neural Inf. Process. Syst., pages 2757–2765, 2016.
