On the Burer-Monteiro method for general semidefinite programs
Diego Cifuentes

TL;DR
This paper extends theoretical guarantees for the Burer-Monteiro nonconvex approach to solve general semidefinite programs, including those with inequalities and multiple constraints, with applications to matrix sensing and quadratic minimization.
Contribution
It generalizes existing results to broader classes of SDPs, providing new guarantees for the Burer-Monteiro method with fixed cost matrices and generic constraints.
Findings
Guarantees for the Burer-Monteiro method extend to arbitrary SDPs.
Applicable to SDPs with inequalities and multiple semidefinite constraints.
Demonstrates effectiveness in matrix sensing and quadratic minimization.
Massachusetts Institute of Technology
Cambridge, MA, USA
Abstract.
Consider a semidefinite program (SDP) involving an $n\times n$ positive semidefinite matrix $X$. The Burer-Monteiro method uses the substitution $X = YY^T$ to obtain a nonconvex optimization problem in terms of an $n\times p$ matrix $Y$. Boumal et al. showed that this nonconvex method provably solves equality-constrained SDPs with a generic cost matrix when $p \gtrsim \sqrt{2m}$, where $m$ is the number of constraints. In this note we extend their result to arbitrary SDPs, possibly involving inequalities or multiple semidefinite constraints. We derive similar guarantees for a fixed cost matrix and generic constraints. We illustrate applications to matrix sensing and integer quadratic minimization.
Key words and phrases:
Semidefinite programming, Burer-Monteiro method, Low rank factorization, Nonconvex optimization, Spurious local minima
1. Introduction
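Consider a semidefinite program (SDP) in $\mathbb{S}^n$, the space of $n\times n$ symmetric matrices, with $m$ constraints ($m_1$ equalities and $m_2$ inequalities):
$$\min_{X\in\mathbb{S}^n}\; C\bullet X \quad\text{such that}\quad A_i\bullet X = b_i \;\,(1\le i\le m_1),\quad A_i\bullet X\le b_i \;\,(m_1<i\le m),\quad X\succeq 0, \tag{SDP}$$
where $C\in\mathbb{S}^n$, $b\in\mathbb{R}^m$, and $\mathcal{A}:\mathbb{S}^n\to\mathbb{R}^m$, $X\mapsto(A_1\bullet X,\dots,A_m\bullet X)$, is a linear map. We assume that the feasible set is nonempty and that the minimum is achieved. Though interior point methods can solve (SDP) in polynomial time, they often run into memory problems for large values of $n$. This has motivated a surge of newer, more scalable techniques; see the recent survey [18]. We study here the low rank factorization method, pioneered by Burer and Monteiro [10, 11].
The Burer-Monteiro method consists in writing $X = YY^T$ for some $Y\in\mathbb{R}^{n\times p}$, and solving the following nonconvex optimization problem:
$$\min_{Y\in\mathbb{R}^{n\times p}}\; C\bullet YY^T \quad\text{such that}\quad A_i\bullet YY^T = b_i \;\,(1\le i\le m_1),\quad A_i\bullet YY^T\le b_i \;\,(m_1<i\le m). \tag{BM}$$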
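To make the substitution concrete, here is a minimal Python sketch using only the standard library. The helper names (`gram`, `inner`, `bm_objective`, `quadratic_form`) and the sample matrices are illustrative assumptions, not taken from the paper. It forms $X = YY^T$ from an $n\times p$ factor and checks that the cost $C\bullet YY^T$ equals the sum of the quadratic forms $y_j^T C y_j$ over the columns $y_j$ of $Y$.

```python
def gram(Y):
    """Return X = Y Y^T for an n x p matrix Y stored as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in Y]
            for row_i in Y]


def inner(C, X):
    """Trace inner product C . X = sum_{i,j} C_ij X_ij."""
    return sum(c * x for row_c, row_x in zip(C, X)
               for c, x in zip(row_c, row_x))


def bm_objective(C, Y):
    """Cost of the factorized problem at Y, namely C . (Y Y^T)."""
    return inner(C, gram(Y))


def quadratic_form(C, y):
    """Return y^T C y for a vector y."""
    n = len(y)
    return sum(C[i][k] * y[i] * y[k] for i in range(n) for k in range(n))


# Illustrative data (assumed): n = 3, p = 2.
C = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
Y = [[1, 0], [1, 1], [0, 2]]

X = gram(Y)                      # 3 x 3, positive semidefinite, rank <= 2
columns = list(zip(*Y))          # the p columns of Y
by_columns = sum(quadratic_form(C, y) for y in columns)
```

Any matrix produced by `gram` is automatically positive semidefinite with rank at most $p$, which is why the constraint $X\succeq 0$ disappears in the factorized problem.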
Let $\tau(r) := \binom{r+1}{2}$ be the $r$-th triangular number. Barvinok [3] and Pataki [22] independently showed that (SDP) has an optimal solution of rank $r$, with $\tau(r)\le m$. Consequently, problems (SDP) and (BM) are equivalent for any $p$ with $\tau(p)\ge m$. But due to nonconvexity, local optimization methods may not always recover the global optimum of (BM). Nonetheless, the Burer-Monteiro method performs very well in several applications; see, e.g., [10, 16, 24].
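The rank thresholds involved are easy to tabulate. The sketch below is a hedged illustration in plain Python; the function names are hypothetical, and it assumes the standard form of the Barvinok-Pataki bound, namely that some optimal solution has rank $r$ with $r(r+1)/2\le m$. It computes the largest such $r$ and the smallest $p$ whose triangular number exceeds $m$.

```python
def tau(r):
    """r-th triangular number, r (r + 1) / 2."""
    return r * (r + 1) // 2


def largest_rank_with_tau_at_most(m):
    """Largest r with tau(r) <= m (rank bound of Barvinok-Pataki type)."""
    r = 0
    while tau(r + 1) <= m:
        r += 1
    return r


def smallest_p_with_tau_above(m):
    """Smallest p with tau(p) > m."""
    p = 0
    while tau(p) <= m:
        p += 1
    return p


# Example: with m = 1000 constraints both thresholds are close to sqrt(2 m).
rank_bound = largest_rank_with_tau_at_most(1000)
p_needed = smallest_p_with_tau_above(1000)
```

For $m = 1000$ the factor $Y$ needs only about $45$ columns, compared with the $n$ columns of a full square factor.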
There has been much recent work in proving global guarantees for (BM). Most remarkably, Boumal et al. [8, 9] showed that equality-constrained SDPs ($m_2 = 0$) have no spurious 2nd-order critical points when $\tau(p) > m$ under certain assumptions. Concretely, they require that the cost matrix $C$ is generic and that the feasible set of (BM) is sufficiently regular. By generic we mean that the result holds outside a set of measure zero. Though other global guarantees for (BM) exist, e.g. [13, 20], their setting is more restrictive.
In this note we generalize the result from Boumal et al. [8, 9] to arbitrary SDPs, possibly involving inequalities or multiple positive semidefinite (PSD) constraints. For the inequality-constrained problem (SDP), we show in Theorem 1 that if $\tau(p) > m$ and the cost $C$ is generic, then any 2nd-order critical point of (BM) is globally optimal. Similar guarantees can be derived even when the cost matrix is fixed, see Theorem 3. We show applications to integer quadratic minimization and PSD matrix sensing.
Our proof of Theorem 1 is simpler than the one in [8, 9], as it relies on nonlinear programming instead of Riemannian optimization. This simplicity is reflected in the fact that Theorem 1 does not require any regularity assumptions on the domain (constraint qualifications). Nevertheless, regularity conditions might still be needed to prevent the existence of local minima that do not satisfy the 2nd-order criticality conditions.
We also consider SDPs involving multiple PSD variables, and study the Burer-Monteiro method applied to a subset of these variables. We prove in Theorem 4 that, for a generic cost, any 2nd-order critical point is globally optimal when the factorization rank satisfies a bound due to Pataki [22]. We present an application to symmetric matrix sensing (the restricted isometry property is not needed).
The structure of this note is as follows. Section 2 reviews the notion of 2nd-order critical points in nonlinear programming. Section 3 analyzes the Burer-Monteiro method for the inequality-constrained problem (SDP). Section 4 studies SDPs involving multiple PSD constraints.
Related work. The guarantees from Boumal et al. have been further studied in [25, 5, 23, 12], but all these papers focus on the equality-constrained case. The bound $\tau(p) > m$ was shown to be optimal up to lower order terms in [25]. Guarantees for approximate 2-critical points were derived in [5, 23, 12]. The first polynomial time bounds for the Burer-Monteiro method were recently proved in [12]. We hope that the techniques developed in this paper may lead to polynomial time guarantees for arbitrary SDPs.
2. Criticality conditions
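We review the notion of critical points. Consider the nonlinear program
$$\min_{x\in\mathbb{R}^N}\; f(x) \quad\text{such that}\quad h_i(x) = 0 \;\,(1\le i\le m_1),\quad h_i(x)\le 0 \;\,(m_1<i\le m). \tag{NLP}$$
Let $L(x,\lambda) := f(x) + \sum_{i=1}^m \lambda_i h_i(x)$ be the Lagrangian function. Let $I(x)\subseteq\{1,\dots,m\}$ be the indices of the active constraints at $x$, i.e., the indices $i$ for which $h_i(x) = 0$. The 1st-order and 2nd-order necessary optimality conditions are:
$$\nabla_x L(x,\lambda) = 0,\quad x \text{ feasible},\quad \lambda_i\ge 0 \text{ and } \lambda_i h_i(x) = 0 \text{ for } m_1<i\le m, \tag{1a}$$
$$u^T\,\nabla^2_{xx}L(x,\lambda)\,u \ge 0 \quad\text{for all } u \text{ with } \nabla h_i(x)^T u = 0 \text{ for } i\in I(x). \tag{1b}$$
A point $x$ is 1st-order critical for (NLP), abbreviated 1-critical, if there exist multipliers $\lambda\in\mathbb{R}^m$ satisfying (1a). The point is 2nd-order critical, abbreviated 2-critical, if (1b) also holds. A critical point is spurious if it is not the global minimum of (NLP).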
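A toy instance may help separate the two notions. The sketch below is an illustration under stated assumptions, not the paper's method. It takes the hypothetical problem of minimizing $x^T C x$ subject to $x^T x - 1 = 0$ in $\mathbb{R}^2$ with $C = \operatorname{diag}(1,2)$, uses the Lagrangian $x^TCx + \lambda\,(x^Tx-1)$, and tests curvature only along directions orthogonal to the constraint gradient. Every unit eigenvector of $C$ is 1-critical, but only the one for the smallest eigenvalue is 2-critical.

```python
C = [[1.0, 0.0], [0.0, 2.0]]   # assumed cost matrix for the toy problem


def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def multiplier(x):
    # Stationarity 2 (C + lam I) x = 0 with x^T x = 1 forces lam = -x^T C x.
    return -dot(x, matvec(C, x))


def is_1_critical(x, tol=1e-12):
    lam = multiplier(x)
    grad = [2 * (g + lam * xi) for g, xi in zip(matvec(C, x), x)]
    feasible = abs(dot(x, x) - 1.0) <= tol
    return feasible and all(abs(g) <= tol for g in grad)


def is_2_critical(x, tol=1e-12):
    if not is_1_critical(x, tol):
        return False
    lam = multiplier(x)
    # In the plane, directions orthogonal to the constraint gradient 2x
    # are spanned by the single vector u below.
    u = [-x[1], x[0]]
    curvature = 2 * (dot(u, matvec(C, u)) + lam * dot(u, u))
    return curvature >= -tol
```

Here `[1.0, 0.0]` passes both tests, while `[0.0, 1.0]` is 1-critical but has negative curvature along the tangent direction, so a 2nd-order method can escape it.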
Given a local minimum $x$ of (NLP), it is known that $x$ satisfies (1) under suitable regularity assumptions. Different regularity conditions, known as constraint qualifications, have been proposed [4]. One of the simplest is:
$$\text{the gradients } \{\nabla h_i(x)\}_{i\in I(x)} \text{ are linearly independent.} \tag{LICQ}$$
Various algorithms with provable convergence guarantees to 2-critical points are known, see e.g., [2, 14, 6] and the references therein. These results rely either on (LICQ) or a weaker constraint qualification.
More generally, consider the nonlinear conic program
[TABLE]
where is a closed convex cone. The Lagrangian function is . The following 1st-order conditions are necessary for optimality under suitable regularity conditions, see e.g., [7, §3.1]:
[TABLE]
A point is 1-critical for (NLCP) if it satisfies (2a) for some . The point is 2-critical if (2b) also holds.
There are several algorithms for (NLCP) for the case , see the survey paper [26]. Symmetric cones (e.g., products of PSD cones) were studied in [17]. These methods are provably convergent to 1-critical points. In order to escape from points that do not satisfy (2b) we may rely on 2nd-order methods for the (NLP) given by fixing the coordinate.
3. Inequality constrained SDPs
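Consider problems (SDP) and (BM). For $i\in\{1,\dots,m\}$, recall that the $i$-th constraint is active at $Y$ if $A_i\bullet YY^T = b_i$. Let $m'$ be the largest number of linearly independent constraints that can be simultaneously active. For instance, if there are no inequalities ($m_2 = 0$) then every constraint is active at every feasible point, so $m'$ is the number of linearly independent constraints, which is at most $m$. We will show the following theorem.
Theorem 1.
Let $p$ be such that $\tau(p) > m'$. For a generic $C\in\mathbb{S}^n$, problem (BM) has no spurious 2-critical points. This means that any 2-critical point $Y$ for (BM) is also globally optimal, and hence $YY^T$ is optimal for (SDP).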
Example 1 (Integer quadratic minimization).
Consider the optimization problem where is a convex quadratic function. Denoting , we may write for some . The following SDP relaxation for this problem was proposed in [21]:
[TABLE]
By Theorem 1, for a generic cost function any 2-critical point of the Burer-Monteiro problem is globally optimal when .
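By generic, we mean the following. For fixed $\mathcal{A}$, $b$, the set of all cost matrices $C$ for which (BM) has a spurious 2-critical point has measure zero. We can provide an explicit characterization of this measure-zero set in $\mathbb{S}^n$. This set is contained in the Minkowski sum of two special algebraic sets. The first algebraic set is given by a rank constraint:
$$\mathcal{R} := \{\, S\in\mathbb{S}^n : \operatorname{rank} S\le n-p \,\}. \tag{3}$$
It is known that $\dim\mathcal{R} = \tau(n)-\tau(p)$, see e.g., [15, Prop. 2.1]. The second algebraic set is a union of linear subspaces:
$$\mathcal{L} := \bigcup_{I}\,\mathcal{L}_I, \qquad \mathcal{L}_I := \operatorname{span}\{A_i : i\in I\}, \tag{4}$$
where the union is over the possible subsets $I\subseteq\{1,\dots,m\}$ of constraints that can be simultaneously active. Note that $\dim\mathcal{L}\le m'$ by definition of $m'$.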
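Theorem 2.
If (BM) has a spurious 2-critical point then $C\in\mathcal{R}+\mathcal{L}$, with $\mathcal{R}$ as in (3) and $\mathcal{L}$ as in (4).
Theorem 1 follows directly from Theorem 2. Indeed, if $\tau(p) > m'$ then
$$\dim(\mathcal{R}+\mathcal{L}) \;\le\; \dim\mathcal{R}+\dim\mathcal{L} \;\le\; \tau(n)-\tau(p)+m' \;<\; \tau(n) \;=\; \dim\mathbb{S}^n. \tag{5}$$
Therefore $\mathcal{R}+\mathcal{L}$ is contained in a proper algebraic set in $\mathbb{S}^n$, and has measure zero.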
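The dimension count can be checked numerically. The following sketch uses hypothetical function names and relies on the standard fact that the symmetric $n\times n$ matrices of rank at most $r$ form a set of dimension $nr - r(r-1)/2$. It confirms that for $r = n-p$ this equals $\tau(n)-\tau(p)$, so adding a subspace spanned by `m_active` constraint matrices stays below $\dim\mathbb{S}^n = \tau(n)$ exactly when $\tau(p)$ exceeds `m_active`.

```python
def tau(r):
    """r-th triangular number."""
    return r * (r + 1) // 2


def dim_rank_at_most(n, r):
    """Dimension of {S symmetric n x n : rank S <= r}."""
    return n * r - r * (r - 1) // 2


def exceptional_dim_bound(n, p, m_active):
    """Upper bound on the dimension of (rank <= n - p set) + (span of m_active matrices)."""
    return dim_rank_at_most(n, n - p) + m_active


def bound_is_proper(n, p, m_active):
    """True when the bound is strictly below dim S^n = tau(n)."""
    return exceptional_dim_bound(n, p, m_active) < tau(n)
```

For example, with $n = 10$ and $p = 4$ the rank-constrained set has dimension $45$, so the bound is proper for up to $9$ simultaneously active constraints and fails at $10$.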
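We proceed to prove Theorem 2. We first derive the criticality conditions for (BM). This is a special instance of (NLP), so we need to specialize (1). We have $L(Y,\lambda) = S(\lambda)\bullet YY^T - b^T\lambda$ and $\nabla_Y L(Y,\lambda) = 2\,S(\lambda)\,Y$, where
$$S(\lambda) := C + \mathcal{A}^*(\lambda),$$
and $\mathcal{A}^*:\mathbb{R}^m\to\mathbb{S}^n$, $\lambda\mapsto\sum_{i=1}^m\lambda_i A_i$, is the adjoint of $\mathcal{A}$. The 1st-order and 2nd-order criticality conditions are:
$$S(\lambda)\,Y = 0,\quad Y \text{ feasible for (BM)},\quad \lambda_i\ge 0 \text{ and } \lambda_i\,(A_i\bullet YY^T - b_i) = 0 \text{ for each inequality constraint } i, \tag{6a}$$
$$S(\lambda)\bullet UU^T \ge 0 \quad\text{for all } U\in\mathbb{R}^{n\times p} \text{ with } A_i\bullet YU^T = 0 \text{ for each active constraint } i. \tag{6b}$$
The following lemma establishes sufficient conditions for a critical point to be globally optimal. The lemma is known, see [11, 16, 9], but our assumptions are slightly different since we allow inequalities.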
Lemma 1.
Either of the following conditions implies global optimality:
(i) $Y$ is 1-critical and the multiplier $\lambda$ satisfies $S(\lambda)\succeq 0$, or
(ii) $Y$ is 2-critical and $Y$ is column rank deficient.
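Proof.
(i) The conic dual of (SDP) is
$$\max_{\lambda\in\mathbb{R}^m}\; -b^T\lambda \quad\text{such that}\quad S(\lambda) = C+\mathcal{A}^*(\lambda)\succeq 0,\quad \lambda_i\ge 0 \text{ for each inequality constraint } i.$$
Let $(Y,\lambda)$ satisfy (6a), and let $X := YY^T$. We will show that the primal/dual pair $(X,\lambda)$ is optimal for the SDP. It suffices to verify three conditions: $X$ is primal feasible, $\lambda$ is dual feasible, and complementary slackness holds (i.e., $\lambda_i\,(A_i\bullet X - b_i) = 0$ for each inequality constraint $i$ and $S(\lambda)\bullet X = 0$). Primal feasibility and complementary slackness follow from (6a), while dual feasibility corresponds to $S(\lambda)\succeq 0$.
(ii) Let $(Y,\lambda)$ satisfy (6), and write $S := S(\lambda)$. By the above item, it suffices to show that $S\succeq 0$. Let $u\in\mathbb{R}^n$, and let us see that $u^T S u\ge 0$. Since $Y$ is rank deficient, there is a nonzero vector $z\in\mathbb{R}^p$ such that $Yz = 0$. The matrix $U := uz^T$ satisfies $YU^T = 0$, so $S\bullet UU^T\ge 0$ by (6b). Since $UU^T = \|z\|^2\,uu^T$, then $u^T S u\ge 0$. ∎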
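The rank-deficiency argument in part (ii) rests on two matrix identities that can be checked on a small example. The sketch below uses assumed sample data and helper names. It takes a rank-deficient $Y$ with a kernel vector $z$, an arbitrary direction $u$, and the rank-one matrix $U = uz^T$, and verifies that $YU^T = 0$ and $UU^T = \|z\|^2 uu^T$, so that $S\bullet UU^T = \|z\|^2\,u^T S u$ for any symmetric $S$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def outer(u, z):
    """Rank-one matrix u z^T."""
    return [[ui * zj for zj in z] for ui in u]


def inner(S, X):
    """Trace inner product S . X."""
    return sum(s * x for rs, rx in zip(S, X) for s, x in zip(rs, rx))


# Assumed sample data: n = 3, p = 2, second column of Y is twice the first.
Y = [[1, 2], [2, 4], [3, 6]]
z = [2, -1]                      # nonzero vector with Y z = 0
u = [1, -1, 2]                   # arbitrary direction in R^n
S = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]

U = outer(u, z)
Y_Ut = matmul(Y, transpose(U))   # equals (Y z) u^T, hence zero
U_Ut = matmul(U, transpose(U))   # equals ||z||^2 u u^T
z_norm_sq = sum(zj * zj for zj in z)
uSu = sum(S[i][j] * u[i] * u[j] for i in range(3) for j in range(3))
```

Because $YU^T = 0$, the direction $U$ satisfies every linearized constraint at once, which is what allows the 2nd-order condition to be applied to it.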
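We are ready to prove Theorem 2 (which implies Theorem 1).
Proof of Theorem 2.
Let $Y$ be a spurious point satisfying (6) with multiplier $\lambda$. Lemma 1(ii) gives that $\operatorname{rank} Y = p$. By (6a) we have $S(\lambda)\,Y = 0$, which implies $\operatorname{rank} S(\lambda)\le n-p$, and also $\lambda_i = 0$ for each inactive constraint $i$. Thus $C = S(\lambda) - \mathcal{A}^*(\lambda)$ lies in the Minkowski sum of the sets in (3) and (4). ∎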
To finish this section, we observe that Theorem 2 can be used even if the cost matrix is not generic. For instance, the next theorem assumes that both $C, b$ are fixed and $\mathcal{A}$ is generic (i.e., $A_1,\dots,A_m$ are generic).
Theorem 3.
Let such that and . For a generic , problem (BM) has no spurious 2-critical points.
Proof.
By Theorem 2, it suffices to see that . Fix , and let as in (4). Note that is generic among the subspaces of dimension , as it depends on the generic matrices . Recall that by (5). Since , then for a generic . The result follows from . ∎
An additional advantage of having generic constraints is that regularity is always satisfied. Therefore any local minimum of (BM) is also 2-critical, and hence is subject to Theorem 3. The next proposition is shown in Appendix A.
Proposition 1.
Assume that the entries of $b$ are nonzero. For a generic $\mathcal{A}$, any feasible point of (BM) satisfies (LICQ).
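Example 2 (Matrix sensing).
Given a linear map $\mathcal{A}:\mathbb{S}^n\to\mathbb{R}^m$ and a vector $b\in\mathbb{R}^m$, consider finding a low rank matrix $X$ such that $\mathcal{A}(X) = b$. A standard technique to promote low rank is to minimize the nuclear norm:
$$\min_{X\in\mathbb{S}^n}\; \|X\|_* \quad\text{such that}\quad \mathcal{A}(X) = b. \tag{7}$$
If we further assume that $X$ is PSD, the cost function is $\|X\|_* = \operatorname{tr}(X)$. By Theorem 3, if $\mathcal{A}$ is generic and $\tau(p) > m$, then any local minimum of (BM) is globally optimal. The PSD assumption will be relaxed in the next section.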
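The reduction of the nuclear norm to a trace can be illustrated on $2\times 2$ symmetric matrices, where eigenvalues have a closed form. This is a minimal sketch with assumed helper names and sample matrices. For a symmetric matrix the nuclear norm is the sum of absolute eigenvalues, which equals the trace when the matrix is PSD and exceeds it otherwise. After the substitution $X = YY^T$ the trace cost becomes the squared Frobenius norm of $Y$.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius


def nuclear_norm_sym_2x2(a, b, d):
    """Sum of absolute eigenvalues, i.e. the nuclear norm of a symmetric matrix."""
    lo, hi = eig_sym_2x2(a, b, d)
    return abs(lo) + abs(hi)


def frobenius_sq(Y):
    """Squared Frobenius norm of Y, which equals tr(Y Y^T)."""
    return sum(v * v for row in Y for v in row)


psd = (2.0, 1.0, 2.0)          # [[2, 1], [1, 2]], eigenvalues 1 and 3
indefinite = (1.0, 2.0, 1.0)   # [[1, 2], [2, 1]], eigenvalues -1 and 3

Y = [[1.0, 1.0], [1.0, -1.0]]  # Y Y^T = 2 I, whose trace is 4
```

For the PSD sample the nuclear norm and the trace are both $4$; for the indefinite sample the nuclear norm is $4$ while the trace is only $2$, which is why the PSD assumption matters here.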
Remark.
Different guarantees about the Burer-Monteiro method for matrix sensing were obtained in [20], relying on the restricted isometry property.
4. General SDPs
Let and . We consider an SDP involving PSD matrices of sizes and a free variable of dimension . Let the Euclidean space and the convex cone . Given , , and a linear map , consider:
[TABLE]
where with , . As before, we assume that is nonempty and that the minimum is achieved.
We apply the Burer-Monteiro method to the first matrices. Let , with , and let . We denote
[TABLE]
In particular, . The Burer-Monteiro problem is:
[TABLE]
Pataki [22] showed that () always has an optimal solution such that , where . We can ensure that there is a solution with for all if either or , with
[TABLE]
where the maximum is over the possible ranks . Hence, problems () and () agree when for .
Theorem 4.
Assume that for . For a generic , problem () has no spurious 2-critical points.
Example 3 (Inequalities).
Consider the inequality constrained problem (SDP). We may view each of the inequalities as a PSD constraint on a matrix. So this is a special instance of () with , , , and . Note that when the -th inequality constraint is inactive, and is zero otherwise. Hence . This is consistent with the results from Section 3.
Example 4 (Second-order cone).
Let be the second-order cone. Consider minimizing a linear cost on subject to linear equalities. Apply the Burer-Monteiro factorization to the matrix in . We can embed inside by adding new linear equalities, see [1, pg.7]. So this is a special case of () with , , , . Given , the rank of the corresponding PSD matrix is if , if lies in the boundary, and if lies in the interior. So Theorem 4 applies when , where is the smallest feasible rank. We point out that embedding inside is used for the analysis, but we do not need to do this in practice. The reason is that the embedding preserves critical points.
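The rank pattern described here can be seen on a small case. The sketch below assumes the embedding is the standard arrow-matrix one, in which a vector $x = (x_0,\bar x)\in\mathbb{R}^d$ with $d\ge 2$ is mapped to the $d\times d$ matrix with $x_0$ on the diagonal and $\bar x$ in the first row and column; this is an assumption, since the text only points to [1]. The function names are hypothetical. The eigenvalues of that matrix are $x_0\pm\|\bar x\|$ and $x_0$ (repeated $d-2$ times), so the rank is $0$ at the origin, $d-1$ on the rest of the boundary, and $d$ in the interior.

```python
import math


def arrow(x):
    """Arrow matrix [[x0, xbar^T], [xbar, x0 * I]] of a vector x with len(x) >= 2."""
    d = len(x)
    A = [[0.0] * d for _ in range(d)]
    for i in range(d):
        A[i][i] = x[0]
    for i in range(1, d):
        A[0][i] = A[i][0] = x[i]
    return A


def arrow_eigenvalues(x):
    """Closed-form eigenvalues: x0 - ||xbar||, x0 (d - 2 times), x0 + ||xbar||."""
    r = math.sqrt(sum(t * t for t in x[1:]))
    return [x[0] - r] + [x[0]] * (len(x) - 2) + [x[0] + r]


def arrow_rank(x, tol=1e-9):
    """Number of eigenvalues of the arrow matrix that are nonzero up to tol."""
    return sum(1 for lam in arrow_eigenvalues(x) if abs(lam) > tol)


def matvec(M, v):
    return [sum(m * t for m, t in zip(row, v)) for row in M]


boundary = [5.0, 3.0, 4.0]   # x0 = ||xbar|| = 5
interior = [6.0, 3.0, 4.0]   # x0 > ||xbar||
origin = [0.0, 0.0, 0.0]
```

The vector $(\|\bar x\|, \bar x)$ is an eigenvector for the eigenvalue $x_0+\|\bar x\|$, which can be confirmed with `matvec(arrow(boundary), [5.0, 3.0, 4.0])`.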
We also provide an explicit characterization of the costs for which spurious 2-critical points may exist. These costs lie in the Minkowski sum of two algebraic sets, which are closely related to the ones in (3) and (4).
Theorem 5.
If () has a spurious 2-critical point, then lies in the algebraic set , with
[TABLE]
where the last union is over the possible ranks in ().
Theorem 4 follows from Theorem 5 by counting dimensions. Let be the minimum of , ignoring the values with . Note that
[TABLE]
Let . If , then
[TABLE]
Hence has measure zero.
We proceed to prove Theorem 5. We first derive the optimality conditions for (). This is a special instance of (NLCP), so we need to specialize (2). For , consider the slack variable . Let be the -th component of . Similarly define and . The criticality conditions are:
[TABLE]
We now provide sufficient conditions for global optimality of critical points.
Lemma 2.
Either of the following conditions implies global optimality:
(i) is 1-critical and for , or
(ii) is 2-critical and is column rank deficient for .
Proof.
The proof is analogous to Lemma 1. For (i) we compare (8a) with the primal/dual optimality conditions for (). For (ii) we use a vector in the right kernel of in order to show that . ∎
Proof of Theorem 5.
Let a spurious point satisfying (8). By Lemma 2(ii) we have that for some . As then . Let be the ranks of . Since and both lie in , then . Hence , as . The result follows from . ∎
As illustrated next, Theorem 5 can be used even when is not generic.
Example 5 (Matrix sensing).
We revisit the problem of sensing symmetric matrices from Example 2. For , its nuclear norm satisfies:
[TABLE]
Let , . We can rewrite problem (7) as follows:
[TABLE]
Consider the Burer-Monteiro method applied to both matrices , so that , using the same rank for both matrices. We will prove that there are no spurious 2-critical points when is generic and . By Theorem 5, we need to show that
[TABLE]
It suffices to see that . But this was shown in Theorem 3.
Acknowledgments
The author thanks Nicolas Boumal, Ankur Moitra, Pablo Parrilo, and David Rosen for helpful discussions and comments.
Appendix A Regularity with generic constraints
In this section we prove Proposition 1. Our proof relies on Sard’s theorem from differential geometry, see e.g., [19, §2].
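Theorem 6 (Sard).
Let $F:\mathbb{R}^N\to\mathbb{R}^k$ be a smooth map, with $N\ge k$. Let $y\in\mathbb{R}^k$ be a generic point. Then the Jacobian $\nabla F(x)$ has rank $k$ for any $x\in F^{-1}(y)$.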
Proof of Proposition 1.
Fix a set of indices , and let
[TABLE]
We claim that (LICQ) holds at all points on (i.e., the gradients are linearly independent). If this happens for each , then (LICQ) also holds for the feasible set of (BM). So it suffices to show the claim.
We prove the claim under a more restrictive genericity setting. We assume that each and that , where are fixed matrices and are generic scalars. Let
[TABLE]
The vector is generic since are generic. By Theorem 6, is full rank for any . So (LICQ) holds on . ∎
References
- [1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95:3–51, 2003.
- [2] R. Andreani, E. G. Birgin, J. M. Martínez, and M. L. Schuverdt. Second-order negative-curvature methods for box-constrained and general constrained optimization. Comput. Optim. Appl., 45(2):209–236, 2010.
- [3] A. I. Barvinok. Problems of distance geometry and convex properties of quadratic maps. Discrete Comput. Geom., 13(2):189–202, 1995.
- [4] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear programming: theory and algorithms. John Wiley & Sons, 2013.
- [5] S. Bhojanapalli, N. Boumal, P. Jain, and P. Netrapalli. Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form. In Conf. Learn. Theory, pages 3243–3270, 2018.
- [6] E. G. Birgin, G. Haeser, and A. Ramos. Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Comput. Optim. Appl., 69(1):51–75, 2018.
- [7] J. F. Bonnans and A. Shapiro. Perturbation analysis of optimization problems. Springer Science & Business Media, 2013.
- [8] N. Boumal, V. Voroninski, and A. Bandeira. The non-convex Burer-Monteiro approach works on smooth semidefinite programs. In Adv. Neural Inf. Process. Syst., pages 2757–2765, 2016.
