A Penalty Method for Rank Minimization Problems in Symmetric Matrices

Xin Shen; John E. Mitchell

arXiv:1701.03218·math.OC·February 2, 2018·Comput. Optim. Appl.

TL;DR

This paper introduces a penalty method for solving rank minimization problems in symmetric matrices by reformulating them as a nonconvex semidefinite program, analyzing solution properties, and proposing an algorithm with computational validation.

Contribution

It develops a penalty approach for symmetric rank minimization, analyzes solution calmness, and proposes a PALM-based algorithm with momentum for improved performance.

Findings

1. Locally optimal solutions of the SDCMPCC formulation are KKT points.
2. Calmness results support the penalty approach.
3. Computational experiments demonstrate the effectiveness of the approach.

Abstract

The problem of minimizing the rank of a symmetric positive semidefinite matrix subject to constraints can be cast equivalently as a semidefinite program with complementarity constraints (SDCMPCC). The formulation requires two positive semidefinite matrices to be complementary. This is a continuous and nonconvex reformulation of the rank minimization problem. We investigate calmness of locally optimal solutions to the SDCMPCC formulation and hence show that any locally optimal solution is a KKT point. We develop a penalty formulation of the problem. We present calmness results for locally optimal solutions to the penalty formulation. We also develop a proximal alternating linearized minimization (PALM) scheme for the penalty formulation, and investigate the incorporation of a momentum term into the algorithm. Computational results are presented.

Full text

A Penalty Method for Rank Minimization Problems in Symmetric Matrices*

*This work was supported in part by the Air Force Office of Sponsored Research under grants FA9550-08-1-0081 and FA9550-11-1-0260 and by the National Science Foundation under Grants Number CMMI-1334327 and DMS-1736326.

Xin Shen, Monsanto, St. Louis, MO.

John E. Mitchell, Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180 ([email protected], http://www.rpi.edu/~mitchj).

Keywords: Rank minimization, penalty methods, alternating minimization

AMS Classification: 90C33, 90C53

1 Introduction to the Rank Minimization Problem

Rank-constrained optimization problems have recently received increasing interest because of their wide application in many fields, including statistics, communication and signal processing fazel2001rank ; srebro2003weighted . In this paper we mainly consider one class of these problems, whose objective is to minimize the rank of a matrix subject to a given set of constraints. We consider the slightly more general form below:

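$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{X\in\mathbb{R}^{m\times n}} & \mathrm{rank}(X)+\psi(X)\\ \mbox{subject to} & X\in\mathcal{C}\end{array}\tag{1}$$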

where $\mathbb{R}^{m\times n}$ is the space of real $m\times n$ matrices, $\psi(X)$ is a Lipschitz continuous function, and $\mathcal{C}$ is the feasible region for $X$; $\mathcal{C}$ is not necessarily convex.

This class of problems is considered computationally challenging because of its nonconvex nature, and the discontinuity of the rank function makes rank minimization problems especially hard to solve. Methods using nonconvex optimization to solve rank minimization problems include kmOh2010 ; llsWei2016 ; sLuo2015 ; tWei2016 . In contrast to the method in this paper, these references work with an explicit low-rank factorization of the matrix of interest. Other methods based on a low-rank factorization include the thresholding methods candes1 ; cambierabsil2016 ; vandereycken2013 ; weiCCL2016b . Our approach instead works with a nonconvex nonlinear optimization problem that is an exact reformulation of the rank minimization problem.

The exact reformulation of the rank minimization problem is a mathematical program with semidefinite cone complementarity constraints (SDCMPCC). Similar to the LPCC formulation for the $\ell_{0}$ minimization problem bkSchwartz2014 ; FMPSWachter13 , the advantage of the SDCMPCC formulation is that it can be expressed as a smooth nonlinear program, so it can be solved by general nonlinear programming algorithms. The purpose of this paper is to investigate whether nonlinear semidefinite programming algorithms can be applied to solve the SDCMPCC formulation, and to examine the quality of the solutions they return. We face two challenges. The first is the nonconvexity of the SDCMPCC formulation, which means that we can only ensure that the solutions we find are locally optimal. The second is that most nonlinear algorithms use the KKT conditions as their termination criteria. Because of the complementarity constraints, a general SDCMPCC is not well-posed, in the sense that the KKT conditions may fail to hold at local optima, so these algorithms may have difficulty converging. We show in Theorem 3.2 that any locally optimal point for the SDCMPCC formulation of the rank minimization problem does indeed satisfy the KKT conditions.

When $\psi(X)\equiv 0$, a popular approach to choosing $X$ is to use the nuclear norm approximation fazel2001rank ; vandenberghe6 ; MaGoldfarbChenMP2010 ; candes1 , a convex approximation of the original rank minimization problem. The nuclear norm of a matrix $X\in\mathbb{R}^{m\times n}$ is defined as the sum of its singular values:

$$\|X\|_{*}=\sum_{i}\sigma_{i}(X)=\mathrm{trace}\left(\sqrt{X^{T}X}\right)$$
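As a quick numerical check of this definition, the sketch below (Python with NumPy; the function names `nuclear_norm` and `nuclear_norm_via_trace` are our own, chosen for illustration) computes the nuclear norm from the singular values and, alternatively, as the trace of $\sqrt{X^{T}X}$, whose eigenvalues are the square roots of the eigenvalues of $X^{T}X$. It also shows that for a symmetric matrix that is not positive semidefinite, the nuclear norm differs from the trace.

```python
import numpy as np


def nuclear_norm(X: np.ndarray) -> float:
    """Sum of the singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())


def nuclear_norm_via_trace(X: np.ndarray) -> float:
    """trace(sqrt(X^T X)): the eigenvalues of sqrt(X^T X) are the square roots
    of the eigenvalues of the PSD matrix X^T X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)
    # Clip tiny negative round-off before taking square roots.
    return float(np.sqrt(np.clip(eigvals, 0.0, None)).sum())


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # 5 x 4, rank 3
print(nuclear_norm(X), nuclear_norm_via_trace(X))

D = np.diag([3.0, -2.0, 0.0])  # symmetric but indefinite
print(nuclear_norm(D), np.trace(D))  # mathematically: nuclear norm 5, trace 1
```

For positive semidefinite $X$, as in the rest of the paper, the two quantities coincide with $\mathrm{trace}(X)$.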

In the approximated problem, the objective is to find a matrix with the minimal nuclear norm

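$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{X\in\mathbb{R}^{m\times n}} & \|X\|_{*}\\ \mbox{subject to} & X\in\mathcal{C}\end{array}$$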

The nuclear norm is convex and continuous. Many algorithms have been developed to find the optimal solution to the nuclear norm minimization problem, including interior point methods vandenberghe6 , singular value thresholding candes1 , augmented Lagrangian methods lin2010augmented , proximal gradient methods liu2012implementable , subspace selection methods hsieh2014nuclear , and reweighting methods mFazel2012 . These methods have been shown to be efficient and robust in solving large-scale nuclear norm minimization problems in some applications. Previous work explains the good performance of the convex approximation by showing that nuclear norm minimization and rank minimization are equivalent under certain assumptions. Recht et al. recht2 presented a version of the restricted isometry property for rank minimization problems; under this property, the solution to the original rank minimization problem can be recovered exactly by solving the nuclear norm minimization problem. However, such properties are strong and hard to verify, and the equivalence result does not extend to the general case. Zhang et al. zhang2013counterexample gave a counterexample in which nuclear norm minimization fails to find the matrix with minimal rank.

In this paper, we focus on the case of symmetric matrices $X$ . Let $\mathbb{S}^{n}$ denote the set of symmetric $n\times n$ matrices, and $\mathbb{S}^{n}_{+}$ denote the cone of $n\times n$ symmetric positive semidefinite matrices. The set $\mathcal{C}$ is taken to be the intersection of $\mathbb{S}^{n}_{+}$ with another convex set, taken to be an affine manifold in our computational testing. Unless otherwise stated, the norms we use are the Euclidean 2-norm for vectors and the Frobenius norm for matrices.

To improve the performance of the nuclear norm minimization scheme in the case of symmetric positive semidefinite matrices, a reweighted nuclear norm heuristic was put forward by Mohan et al. mohan2010reweighted . In each iteration of the heuristic a reweighted nuclear norm minimization problem is solved, which takes the form:

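$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{X\in\mathbb{S}^{n}} & \langle W,X\rangle\\ \mbox{subject to} & X\in\mathcal{C}\cap\mathbb{S}^{n}_{+}\end{array}$$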

where $W$ is a positive semidefinite matrix chosen based on the result of the previous iteration. As with standard nuclear norm minimization, theoretical guarantees for the method apply only to problems with special structure. The lack of theoretical guarantees for these convex approximations on general problems motivates us to turn to an exact formulation of the rank function. In our computational testing, we compare the results obtained with our approach to those obtained by minimizing the nuclear norm.

We now summarize the contents of the paper. Throughout, we work with the set of symmetric positive semidefinite matrices $\mathbb{S}^{n}_{+}$ . The equivalent continuous reformulation of (1) is presented in Section 2. We show that any local minimizer for the continuous reformulation satisfies appropriate KKT conditions in Section 3 when $\mathcal{C}$ is given by the intersection of $\mathbb{S}^{n}_{+}$ and the set of solutions to a collection of continuous inequalities. A penalty approach is described in Section 4 in the case when $\mathcal{C}$ is the intersection of $\mathbb{S}^{n}_{+}$ and an affine manifold. An alternating approach to solve the penalty formulation is presented in Section 5, with test results on Euclidean distance matrix completion problems given in Section 6.

2 Semidefinite Cone Complementarity Formulation for Rank Minimization Problems

A mathematical program with semidefinite cone complementarity constraints (SDCMPCC) is a special case of a mathematical program with complementarity constraints (MPCC). In SDCMPCC problems the constraints include complementarity between matrices rather than vectors. The general SDCMPCC program takes the following form:

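$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{x\in\mathbb{R}^{q}} & f(x)\\ \mbox{subject to} & g(x)\leq 0\\ & h(x)=0\\ & \mathbb{S}^{n}_{+}\ni G(x)\perp H(x)\in\mathbb{S}^{n}_{+}\end{array}\tag{4}$$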

where $f:\mathbb{R}^{q}\rightarrow\mathbb{R}$, $h:\mathbb{R}^{q}\rightarrow\mathbb{R}^{p}$, $g:\mathbb{R}^{q}\rightarrow\mathbb{R}^{m}$, $G:\mathbb{R}^{q}\rightarrow\mathbb{S}^{n}$ and $H:\mathbb{R}^{q}\rightarrow\mathbb{S}^{n}$. The requirement $G(x)\perp H(x)$ for $G(x),H(x)\in\mathbb{S}^{n}_{+}$ means that the Frobenius inner product of $G(x)$ and $H(x)$ equals 0, where the Frobenius inner product of two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{m\times n}$ is defined as

$$\langle A,B\rangle=\mathrm{trace}(A^{T}B).$$

We define

$$c(x):=\langle G(x),H(x)\rangle.$$
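To illustrate the inner product and the complementarity residual $c(x)$, the sketch below (Python with NumPy; the names `frob_inner`, `G` and `H` are illustrative, not taken from the paper) evaluates $\langle A,B\rangle$ both as $\mathrm{trace}(A^{T}B)$ and as the sum of entrywise products, and builds two positive semidefinite matrices supported on orthogonal eigenvector blocks, so that $c=\langle G,H\rangle=0$. For positive semidefinite matrices, $\langle G,H\rangle=0$ holds exactly when $GH=0$, which the last checks illustrate.

```python
import numpy as np


def frob_inner(A: np.ndarray, B: np.ndarray) -> float:
    """Frobenius inner product <A, B> = trace(A^T B)."""
    return float(np.trace(A.T @ B))


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # orthogonal 4 x 4 matrix

# PSD matrices supported on complementary blocks of eigenvectors.
G = Q[:, :2] @ np.diag([2.0, 0.5]) @ Q[:, :2].T
H = Q[:, 2:] @ Q[:, 2:].T

c = frob_inner(G, H)                # complementarity residual c = <G, H>
print(c, np.sum(G * H))             # the two forms of the inner product agree
print(np.abs(G @ H).max())          # G H = 0 for this complementary pair
print(frob_inner(G, np.eye(4)))     # <G, I> = trace(G) = 2.5: not complementary
```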

It is shown in Bai et al. lijie2 that (4) can be reformulated as a convex conic completely positive optimization problem. However, the cone in the completely positive formulation does not have a polynomial-time separation oracle.

An SDCMPCC can be written as a nonlinear semidefinite program. Nonlinear semidefinite programming has recently received much attention because of its wide applicability. Yamashita and Yabe yamashita2015survey surveyed numerical methods for solving nonlinear SDPs, including augmented Lagrangian methods kocvara2 , sequential SDP methods, and primal-dual interior point methods. However, there is still much room for research on both the theory and the practice of solution methods, especially when the problem size becomes large.

An SDCMPCC is a special case of a nonlinear SDP and is hard to solve in general. In addition to the difficulties of general nonlinear semidefinite programming, the complementarity constraints make it hard to find locally optimal solutions, since the KKT conditions may not hold at local optima. Previous work showed that optimality conditions for MPCCs, such as M-stationarity, C-stationarity and strong stationarity, can be generalized to the class of SDCMPCCs. Ding et al. dsye2010 discussed various first-order optimality conditions for an SDCMPCC and the relationships among them.

An exact reformulation of the rank minimization problem using semidefinite cone constraints is due to Ding et al. dsye2010 . We work with a special case of (1), in which the matrix variable $X\in\mathbb{R}^{n\times n}$ is restricted to be symmetric and positive semidefinite. The special case takes the form:

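$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{X\in\mathbb{S}^{n}} & \mathrm{rank}(X)+\psi(X)\\ \mbox{subject to} & X\in\tilde{\mathcal{C}}\\ & X\succeq 0\end{array}\tag{6}$$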

By introducing an auxiliary variable $U\in\mathbb{R}^{n\times n}$, we can model Problem (6) as a mathematical program with semidefinite cone complementarity constraints:

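$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{X,U\in\mathbb{S}^{n}} & n-\langle I,U\rangle+\psi(X)\\ \mbox{subject to} & X\in\tilde{\mathcal{C}}\\ & 0\preceq X\perp U\succeq 0\\ & 0\preceq I-U\\ & X\succeq 0\\ & U\succeq 0\end{array}\tag{7}$$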

The equivalence between Problem (6) and Problem (7) can be verified by a proper assignment of U for given feasible X. Suppose X has the eigenvalue decomposition:

$$X=P\Sigma P^{T}$$

where $P$ is orthogonal and $\Sigma$ is the diagonal matrix of eigenvalues. Let $P_{0}$ be the matrix composed of the columns of $P$ corresponding to zero eigenvalues. We can set:

$$U=P_{0}P_{0}^{T}\tag{9}$$

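Then $U$ is feasible in (7): it is positive semidefinite, satisfies $U\preceq I$, and is complementary to $X$. Moreover,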

$$\mathrm{rank}(X)=n-\langle I,U\rangle.$$

It follows that:

$$\mathrm{Opt}(6)\ \geq\ \mathrm{Opt}(7).$$

The reverse inequality follows from the complementarity constraints. For $X,U\succeq 0$, the condition $\langle X,U\rangle=0$ implies $XU=0$, so $\mbox{rank}(X)+\mbox{rank}(U)\leq n$. Since $0\preceq U\preceq I$, every eigenvalue of $U$ lies in $[0,1]$, so the rank of $U$ is at least as large as its trace. Hence $\mbox{rank}(X)\leq n-\langle I,U\rangle$ for every feasible pair $(X,U)$ of (7), so the objective value of (7) at $(X,U)$ is at least $\mbox{rank}(X)+\psi(X)\geq\mathrm{Opt}(6)$. Therefore the two problems have the same optimal value.
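To make the construction of $U$ and the argument above concrete, the following sketch (Python with NumPy; `complementary_U` and its tolerance are our own illustrative choices) builds $U=P_{0}P_{0}^{T}$ from an eigendecomposition of a rank-2 positive semidefinite $X$ and checks that $\mbox{rank}(X)=n-\langle I,U\rangle$ and $\langle X,U\rangle=0$. It then shows a feasible but fractional $U$, for which $\mbox{rank}(U)\geq\langle I,U\rangle$ and the objective $n-\langle I,U\rangle$ overestimates $\mbox{rank}(X)$.

```python
import numpy as np


def complementary_U(X: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """U = P0 P0^T, where the columns of P0 are eigenvectors of the symmetric
    PSD matrix X whose eigenvalues are (numerically) zero."""
    eigvals, P = np.linalg.eigh(X)        # X = P diag(eigvals) P^T, eigenvectors in columns
    P0 = P[:, eigvals <= tol * max(1.0, eigvals.max())]
    return P0 @ P0.T


rng = np.random.default_rng(2)
B = rng.standard_normal((5, 2))
X = B @ B.T                               # 5 x 5, PSD, rank 2
n = X.shape[0]
U = complementary_U(X)
print(np.linalg.matrix_rank(X), n - np.trace(U), np.trace(X @ U))

# A feasible but fractional U: complementary to X2, eigenvalues in [0, 1].
X2 = np.diag([3.0, 1.0, 0.0, 0.0])
U2 = np.diag([0.0, 0.0, 1.0, 0.4])
# rank(U2) = 2 >= trace(U2) = 1.4, so n - <I, U2> = 2.6 exceeds rank(X2) = 2.
print(np.linalg.matrix_rank(U2), np.trace(U2), 4 - np.trace(U2))
```

At an optimal solution the eigenvalues of $U$ are therefore pushed to 0 or 1, which is exactly the projector given by (9).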

The complementarity formulation can be extended to cases where the matrix variable $X\in\mathbb{R}^{m\times n}$ is neither positive semidefinite nor symmetric. One way to deal with nonsymmetric $X$ is to introduce an auxiliary variable $Z$:

$$Z=\begin{bmatrix}G & X^{T}\\ X & B\end{bmatrix}\succeq 0$$

Liu et al. vandenberghe6 have shown that for any matrix $X$, we can find matrices $G$ and $B$ such that $Z\succeq 0$ and $\mbox{rank}(Z)=\mbox{rank}(X)$. The objective is then to minimize the rank of $Z$ instead of $X$.
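One way to obtain such $G$ and $B$ (an illustrative choice on our part, not necessarily the construction used by Liu et al.) starts from a thin singular value decomposition $X=LSR^{T}$ and sets $G=RSR^{T}$ and $B=LSL^{T}$, so that $Z=\begin{bmatrix}R\\ L\end{bmatrix}S\begin{bmatrix}R\\ L\end{bmatrix}^{T}$. The sketch below (Python with NumPy; `psd_lift` is a hypothetical helper) checks that this $Z$ is positive semidefinite, has the same rank as $X$, and has trace $2\|X\|_{*}$.

```python
import numpy as np


def psd_lift(X: np.ndarray, tol: float = 1e-10):
    """Return (G, B, Z) with Z = [[G, X^T], [X, B]] PSD and rank(Z) = rank(X).

    Uses the thin SVD X = L diag(s) R^T, G = R diag(s) R^T, B = L diag(s) L^T.
    """
    L, s, Rt = np.linalg.svd(X, full_matrices=False)
    keep = s > tol * max(1.0, s[0])        # drop numerically zero singular values
    L, s, R = L[:, keep], s[keep], Rt[keep].T
    G = R @ np.diag(s) @ R.T                # n x n
    B = L @ np.diag(s) @ L.T                # m x m
    Z = np.block([[G, X.T], [X, B]])        # (n + m) x (n + m)
    return G, B, Z


rng = np.random.default_rng(3)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # 4 x 3, rank 2
G, B, Z = psd_lift(X)
print(np.linalg.eigvalsh(Z).min(), np.linalg.matrix_rank(Z), np.linalg.matrix_rank(X))
```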

A drawback of the above extension is that it may introduce many additional variables. An alternative is to modify the complementarity constraint. If $m>n$, the rank of $X$ is bounded by $n$ and equals the rank of the matrix $X^{T}X\in\mathbb{S}^{n}_{+}$. Since $X^{T}X$ is symmetric and positive semidefinite, we impose the following constraint:

$$\langle U,X^{T}X\rangle=0$$

where $U\in\mathbb{S}^{n}$. The objective is then to minimize the rank of $X^{T}X$, or equivalently to minimize $n-\langle I,U\rangle$.
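A sketch of this variant (Python with NumPy; `null_projector` is an illustrative name): taking $U$ to be the orthogonal projector onto the null space of $X$ gives $\langle U,X^{T}X\rangle=0$ and $n-\langle I,U\rangle=\mbox{rank}(X)$. The computation itself does not rely on $m>n$.

```python
import numpy as np


def null_projector(X: np.ndarray, tol: float = 1e-9) -> np.ndarray:
    """Orthogonal projector onto the null space of X (an n x n matrix)."""
    _, s, Vt = np.linalg.svd(X)            # full SVD: Vt is n x n
    rank = int(np.sum(s > tol * max(1.0, s[0])))
    V0 = Vt[rank:].T                        # orthonormal basis of null(X)
    return V0 @ V0.T


rng = np.random.default_rng(4)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # 6 x 4, rank 2
U = null_projector(X)
n = X.shape[1]
print(np.sum(U * (X.T @ X)), n - np.trace(U), np.linalg.matrix_rank(X))
```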

3 Constraint Qualification of the SDCMPCC Formulation

SDCMPCC problems are generally hard to solve, and potential solution methods, including relaxation and penalty methods, have been discussed in wu2015properties ; zhang2011convergence . The original SDCMPCC formulation and all its variations fall into the class of nonlinear semidefinite programs. Most existing algorithms use the KKT conditions as criteria for checking local optimality, and they terminate at KKT stationary points. The validity of the KKT conditions at local optima can be guaranteed by a constraint qualification. However, as pointed out in dsye2010 , common constraint qualifications such as LICQ and Robinson's CQ are violated for SDCMPCC. The question arises as to whether any constraint qualification holds for the SDCMPCC formulation of a rank minimization problem. In this section we show that a constraint qualification called calmness holds at any local optimum of the SDCMPCC formulation. We assume only that $\mathcal{C}$ is given by the intersection of a closed convex cone and the set of solutions to a collection of continuous inequalities.

3.1 Introduction of Calmness

Calmness was first defined by Clarke clarke1990optimization . If calmness holds then a first order KKT necessary condition holds at a local minimizer. Thus, calmness plays the role of a constraint qualification, although it involves both the objective function and the constraints. It has been discussed in the context of conic optimization problems in huang2006 ; jjYe15 ; zhai2014 , in addition to Ding et al. dsye2010 . Here, we give the definition from clarke1990optimization , adapted to our setting.

Definition 1

Let $K\subseteq\mathbb{R}^{n}$ be a convex cone. Let $f:\mathbb{R}^{q}\rightarrow\mathbb{R}$, $h:\mathbb{R}^{q}\rightarrow\mathbb{R}^{p}$, $g:\mathbb{R}^{q}\rightarrow\mathbb{R}^{m}$, and $G:\mathbb{R}^{q}\rightarrow\mathbb{R}^{n}$ be continuous functions. A feasible point $\bar{x}$ to the conic optimization problem

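$$\begin{array}{ll}\displaystyle\min_{x\in\mathbb{R}^{q}} & f(x)\\ \mbox{subject to} & g(x)\leq 0\\ & h(x)=0\\ & G(x)\in K\end{array}$$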

is Clarke calm if there exist positive $\epsilon$ and $\mu$ such that

$$f(x)-f(\bar{x})+\mu\,\|(r,s,P)\|\ \geq\ 0$$

whenever $||(r,s,P)||\leq\epsilon$ , $||x-\bar{x}||\leq\epsilon$ , and $x$ satisfies the following conditions:

$$h(x)+r=0,\qquad g(x)+s\leq 0,\qquad G(x)+P\in K.$$

The idea of calmness is that when there is a small perturbation in the constraints, the improvement in the objective value in a neighborhood of $\bar{x}$ must be bounded by some constant times the magnitude of perturbation.

Theorem 3.1

(dsye2010) If calmness holds at a local minimizer $\bar{x}$ of (4), then the following first-order necessary KKT conditions hold at $\bar{x}$:

there exist multipliers $\lambda^{h}\in\mathbb{R}^{p}$, $\lambda^{g}\in\mathbb{R}^{m}$, $\Omega^{G}\in\mathbb{S}^{n}_{+}$, $\Omega^{H}\in\mathbb{S}^{n}_{+}$, and $\lambda^{c}\in\mathbb{R}$ such that the subdifferentials of the constraints and objective function of (4) satisfy

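$$\begin{array}{l}0\in\partial f(\bar{x})+\partial\langle h,\lambda^{h}\rangle(\bar{x})+\partial\langle g,\lambda^{g}\rangle(\bar{x})+\partial\langle G,\Omega^{G}\rangle(\bar{x})+\partial\langle H,\Omega^{H}\rangle(\bar{x})+\lambda^{c}\,\partial c(\bar{x}),\\ \lambda^{g}\geq 0,\quad\langle g(\bar{x}),\lambda^{g}\rangle=0,\quad\Omega^{G}\in\mathbb{S}^{n}_{+},\quad\Omega^{H}\in\mathbb{S}^{n}_{+},\\ \langle\Omega^{G},G(\bar{x})\rangle=0,\quad\langle\Omega^{H},H(\bar{x})\rangle=0.\end{array}$$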

In the framework of general nonlinear programming, previous results lu2012relation show that the Mangasarian-Fromovitz constraint qualification (MFCQ) and the constant rank constraint qualification (CRCQ) imply local calmness. When all the constraints are linear, CRCQ holds. However, in the case of SDCMPCC, calmness may fail at locally optimal points. Linear semidefinite programs are a special case of SDCMPCC, obtained by taking $H(x)$ identically equal to the zero matrix, and even in this case calmness may not hold. For linear SDP, the optimality conditions in Theorem 3.1 correspond to primal and dual feasibility together with complementary slackness, so, for example, any linear SDP that has a duality gap or in which dual attainment fails does not satisfy calmness. The following example shows explicitly that calmness does not hold:

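$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{x_{1},x_{2}} & x_{2}\\ \mbox{subject to} & G(x)=\begin{bmatrix}x_{2}+1 & 0 & 0\\ 0 & x_{1} & x_{2}\\ 0 & x_{2} & 0\end{bmatrix}\succeq 0\end{array}$$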

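Since the $(3,3)$ entry of $G(x)$ is 0, positive semidefiniteness forces the off-diagonal entry $x_{2}$ to vanish, and then $x_{1}\geq 0$ is required. Hence the feasible set is $\{(x_{1},0):x_{1}\geq 0\}$, and every feasible point is globally optimal, with objective value 0. However: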

Proposition 1

Calmness does not hold at any point $(x_{1},0)$ with $x_{1}\geq 0$ .

Proof

We omit the case $x_{1}>0$ and give the proof only for $x_{1}=0$. For $0<\delta<1$, take

$$x_{1}=\delta\quad\mbox{and}\quad x_{2}=-\delta^{2}.$$

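The matrix

$$M=\begin{bmatrix}1-\delta^{2} & 0 & 0\\ 0 & \delta & -\delta^{2}\\ 0 & -\delta^{2} & \delta^{3}\end{bmatrix}$$

is then positive semidefinite (its diagonal entries are positive and its lower-right $2\times 2$ block has determinant $\delta\cdot\delta^{3}-(-\delta^{2})^{2}=0$), and since $G(\delta,-\delta^{2})$ and $M$ differ only in their $(3,3)$ entries,

$$\|G(\delta,-\delta^{2})-M\|=\delta^{3}.$$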

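However, the objective value at $(\delta,-\delta^{2})$ is $-\delta^{2}$, while the objective value at $(0,0)$ is 0. Thus we have:

$$\frac{f(x_{1},x_{2})-f(0,0)}{\|G(\delta,-\delta^{2})-M\|}=\frac{-\delta^{2}}{\delta^{3}}=-\frac{1}{\delta}\ \to\ -\infty$$

as $\delta\rightarrow 0$. Hence no finite $\mu$ satisfies the calmness inequality in any neighborhood of $(0,0)$, so calmness does not hold at $(0,0)$.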

3.2 Calmness of SDCMPCC Formulation

In this subsection we show that in Problem (7), calmness holds for each pair $(X,U)$ with $X$ feasible and $U$ given by (9). The variable $x$ in (4) corresponds to the pair $(X,U)$ from (7), so $G(x)=X$ and $H(x)=U$. We assume

$$\mathcal{C}=\mathbb{S}^{n}_{+}\cap\{X:g_{i}(X)\leq 0,\ i=1,\dots,m_{1}\}$$

for continuous functions $g_{i}(X)$ ; these functions are incorporated into $g(x)$ in the formulation (4). Before presenting the propositions, we introduce the rank constrained problem for any positive integer $k$ :

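$$\begin{array}{ll}\displaystyle\min_{X\in\mathbb{S}^{n}_{+}} & \psi(X)\\ \mbox{subject to} & X\in\mathcal{C}\\ & \mathrm{rank}(X)\leq k\end{array}\tag{RC(k)}$$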

Proposition 2

Let $X$ be a local minimizer of $(RC(k))$ for some choice of $k$ and let $U$ be given by (9). Then $(X,U)$ is a local optimal solution in Problem (7). Conversely, if $(X,U)$ is a local optimal solution to (7) then $U$ is given by (9) and $X$ is a local minimizer of $(RC(k))$ with $k=\mbox{rank}(X)$ .

Proof

The proposition follows from the fact that $\mbox{rank}(X^{\prime})\geq\mbox{rank}(X)$ for all $X^{\prime}$ close enough to $X$, that is, from the lower semicontinuity of the rank function.

Proposition 3

For any local minimizer $X$ of $(RC(k))$ for some choice of $k$ with $U$ given by (9), let $(\hat{X},\hat{U})$ be a feasible point to the optimization problem below:

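$$\begin{array}{ll}\displaystyle\operatorname*{minimize}_{\hat{X},\hat{U}\in\mathbb{S}^{n}} & n-\langle I,\hat{U}\rangle+\psi(\hat{X})\\ \mbox{subject to} & \hat{X}+p\in\tilde{\mathcal{C}}\\ & |\langle\hat{X},\hat{U}\rangle|\leq q\\ & \lambda_{\min}(I-\hat{U})\geq -r\\ & \lambda_{\min}(\hat{X})\geq -h_{1}\\ & \lambda_{\min}(\hat{U})\geq -h_{2}\end{array}\tag{12}$$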

where $p$ , $q$ , $r$ , $h_{1}$ and $h_{2}$ are perturbations to the constraints and $\lambda_{min}(M)$ denotes the minimum eigenvalue of matrix $M$ . Assume $X$ has at least one positive eigenvalue. For $||(p,q,r,h_{1},h_{2})||$ , $||X-\hat{X}||$ , and $||U-\hat{U}||$ all sufficiently small, we have

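$$\langle I,U\rangle-\langle I,\hat{U}\rangle\ \geq\ -\frac{2q}{\tilde{\lambda}}-\big(n-\mathrm{rank}(X)\big)\Big(r+(1+r)\frac{2}{\tilde{\lambda}}h_{1}\Big)-h_{2}\Big(\frac{4}{\tilde{\lambda}}\|X\|_{*}-\mathrm{rank}(X)\Big)\tag{13}$$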

where $||X||_{*}$ is the nuclear norm of X and $\tilde{\lambda}$ is the smallest positive eigenvalue of X.

Proof

The general scheme is to determine a lower bound for $\langle I,U\rangle$ and an upper bound for $\langle I,\hat{U}\rangle$ . A lower bound of $\langle I,U\rangle$ can be easily found by exploiting the complementarity constraints and its value is $n-\mbox{rank}(X)$ . To find an upper bound of $\langle I,\hat{U}\rangle$ , the approach we take is to fix $\hat{X}$ in Problem (12) and estimate a lower bound for the objective value of the following problem:

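$$\begin{array}{lll}\displaystyle\operatorname*{minimize}_{\tilde{U}\in\mathbb{S}^{n}} & n-\langle I,\tilde{U}\rangle & \\ \mbox{subject to} & -\langle\hat{X},\tilde{U}\rangle\leq q, & y_{1}\\ & \langle\hat{X},\tilde{U}\rangle\leq q, & y_{2}\\ & I-\tilde{U}\succeq -rI, & \Omega_{1}\\ & \tilde{U}\succeq -h_{2}I, & \Omega_{2}\end{array}\tag{14}$$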

where $y_{1}$, $y_{2}$, $\Omega_{1}$ and $\Omega_{2}$ are the Lagrange multipliers for the corresponding constraints. Clearly $\hat{U}$ is feasible for Problem (14). We find an upper bound for $\langle I,\hat{U}\rangle$ by finding a feasible solution to the dual of Problem (14), which is:

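$$\begin{array}{ll}\displaystyle\operatorname*{maximize}_{y_{1},y_{2}\in\mathbb{R},\,\Omega_{1},\Omega_{2}\in\mathbb{S}^{n}} & n+q\,y_{1}+q\,y_{2}-(1+r)\,\mathrm{trace}(\Omega_{1})-h_{2}\,\mathrm{trace}(\Omega_{2})\\ \mbox{subject to} & -y_{1}\hat{X}+y_{2}\hat{X}-\Omega_{1}+\Omega_{2}=-I\\ & y_{1},y_{2}\leq 0\\ & \Omega_{1},\Omega_{2}\succeq 0\end{array}\tag{15}$$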

We obtain a lower bound on the dual objective value from a tightened version of the dual, obtained by diagonalizing $\hat{X}$ with an orthogonal similarity transformation and restricting the off-diagonal entries of $\Omega_{1}$ and $\Omega_{2}$ to be 0. Let $\{f_{i}\}$ and $\{g_{i}\}$ be the diagonal entries of $\Omega_{1}$ and $\Omega_{2}$ after the transformation, respectively, and let $\{\hat{\lambda}_{i}\}$ be the eigenvalues of $\hat{X}$. The tightened problem is:

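$$\begin{array}{ll}\displaystyle\operatorname*{maximize}_{y_{1},y_{2},\{f_{i}\},\{g_{i}\}} & n+q\,y_{1}+q\,y_{2}-(1+r)\sum_{i=1}^{n}f_{i}-h_{2}\sum_{i=1}^{n}g_{i}\\ \mbox{subject to} & -y_{1}\hat{\lambda}_{i}+y_{2}\hat{\lambda}_{i}-f_{i}+g_{i}=-1,\quad i=1,\dots,n\\ & y_{1},y_{2}\leq 0\\ & f_{i},g_{i}\geq 0,\quad i=1,\dots,n\end{array}\tag{16}$$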

By a suitable choice of $y_{1},y_{2},f,g$, we can construct a feasible solution to the tightened problem and hence a lower bound on the optimal value of the dual problem. Let $\{\lambda_{i}\}$ be the eigenvalues of $X$, with $\tilde{\lambda}$ the smallest positive eigenvalue, and set:

[TABLE]

For $f$ and $g$:

•

If $\hat{\lambda}_{i}<\frac{\tilde{\lambda}}{2}$, take $f_{i}=1+y_{2}\hat{\lambda}_{i}$ and $g_{i}=0$.

•

If $\hat{\lambda}_{i}\geq\frac{\tilde{\lambda}}{2}$, take $f_{i}=0$ and $g_{i}=\frac{2}{\tilde{\lambda}}\hat{\lambda}_{i}-1$.

It is straightforward to verify that this assignment yields a feasible solution to Problem (16), and hence a lower bound for the dual objective is:

[TABLE]

By weak duality, the primal objective value is greater than or equal to the dual objective value, so:

[TABLE]

Since we can write

[TABLE]

it follows that for $||U-\hat{U}||$ sufficiently small we have

[TABLE]

For $\hat{\lambda}_{i}<\frac{\tilde{\lambda}}{2}$, by the constraints $\hat{\lambda}_{i}\geq-h_{1}$ and setting $y_{2}=-\frac{2}{\tilde{\lambda}}$, we have:

[TABLE]

For $\hat{\lambda}_{i}\geq\frac{\tilde{\lambda}}{2}$, recalling the definition of the nuclear norm, we have:

[TABLE]

for $||X-\hat{X}||$ sufficiently small. Since exactly $n-\mbox{rank}(X)$ eigenvalues of $\hat{X}$ converge to 0, we can simplify inequality (18) to obtain:

[TABLE]

This proves the inequality.
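The assignment of $f$ and $g$ in the proof above can be checked numerically. The sketch below is ours, not part of the proof: it takes $y_{2}=-\frac{2}{\tilde{\lambda}}$, the value fixed later in the proof, and since the constraints of Problem (16) are not reproduced here, it only checks properties visible from the formulas themselves: $f_{i},g_{i}\geq 0$, at most one of $f_{i},g_{i}$ is nonzero, and $f_{i}-g_{i}=1+y_{2}\hat{\lambda}_{i}$ holds in both cases. The helper name `dual_assignment` and the sample eigenvalues are hypothetical.

```python
import numpy as np


def dual_assignment(lam_hat, lam_tilde):
    """Diagonal dual entries f, g from the two cases in the proof (hypothetical helper).

    lam_hat   : eigenvalues of the perturbed matrix X-hat
    lam_tilde : smallest positive eigenvalue of X
    y2 is fixed to -2/lam_tilde, the value chosen later in the proof.
    """
    lam_hat = np.asarray(lam_hat, dtype=float)
    y2 = -2.0 / lam_tilde
    small = lam_hat < lam_tilde / 2.0          # first case: eigenvalue below lam_tilde/2
    f = np.where(small, 1.0 + y2 * lam_hat, 0.0)
    g = np.where(small, 0.0, (2.0 / lam_tilde) * lam_hat - 1.0)
    return f, g, y2


# A slightly negative eigenvalue mimics the perturbed constraint lambda_min >= -h1.
lam_hat = np.array([-1e-3, 0.0, 0.2, 0.25, 0.9, 3.0])
f, g, y2 = dual_assignment(lam_hat, lam_tilde=0.5)
print(f, g)
```

Both branches reduce to the same expression for $f_{i}-g_{i}$ once $y_{2}=-\frac{2}{\tilde{\lambda}}$, which is why a single choice of $y_{2}$ serves all eigenvalues.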

There is one case not covered by Proposition 3, namely $X=0$. This case is also calm, as we show in the next lemma.

Lemma 1

Assume $X=0$ is feasible in (7), with $U$ given by (9). Let $(\hat{X},\hat{U})$ be a feasible point of (12). We have

[TABLE]

Proof

Note that $\langle I,U\rangle=n$, since $X=0$ and $U$ satisfies (9). In addition, each eigenvalue of $\hat{U}$ is no larger than $1+r$, so the result follows.

Proposition 4

Calmness holds at each $(X,U)$ provided (i) $X$ is a local minimizer of $(RC(k))$ for some choice of $k$ and (ii) $U$ is given by (9).

Proof

This follows directly from Proposition 3, Lemma 1, and the Lipschitz continuity assumption on $\psi(X)$.

It follows that any local minimizer of the SDCMPCC formulation of the rank minimization problem is a KKT point. Note that no assumptions are necessary regarding ${\mathcal{C}}$.

Theorem 3.2

The KKT condition of Theorem 3.1 holds at each local optimum of Problem (7).

Proof

This follows directly from Theorem 3.1 and Propositions 2 and 4.

The above results show that, as with the exact complementarity formulation of $\ell_{0}$ minimization, the exact SDCMPCC formulation of rank minimization has many KKT stationary points, so an algorithm may terminate at a stationary point that is far from a global optimum. As we showed for the complementarity formulation of the $\ell_{0}$ minimization problem FMPSWachter13, one possible way to overcome this difficulty is to relax the complementarity constraints. In the following sections we investigate whether this approach works for the SDCMPCC formulation.

4 A Penalty Scheme for SDCMPCC Formulation

In this section and those that follow, we present a penalty scheme for the original SDCMPCC formulation. The penalty formulation has the form:

[TABLE]

We denote this problem by $SDCPNLP(\rho)$. We discuss properties of the formulation in this section; an algorithm is described in Section 5 and computational results are given in Section 6. First, we note that, by standard results for penalty methods, as $\rho\rightarrow\infty$ any limit point of a sequence of global minimizers of (20) is a global optimizer of (7); see, for example, Luenberger LuenOld.

From now on, we assume

[TABLE]

where each $A_{i}\in\mathbb{S}^{n}$, so $\tilde{\mathcal{C}}$ is an affine manifold.

4.1 Constraint Qualification of Penalty Formulation

The penalty formulation of the rank minimization problem is a nonlinear semidefinite program. As with the exact SDCMPCC formulation, we would like to know whether algorithms for general nonlinear semidefinite programming can be applied to the penalty formulation. As far as we know, most algorithms in nonlinear semidefinite programming use the first order KKT stationarity conditions as the termination criterion. The KKT stationarity conditions at a local optimum of the penalty formulation are:

[TABLE]

for $X\in\tilde{\mathcal{C}}$, where $\lambda\in\mathbb{R}^{m_{2}}$ are the dual multipliers corresponding to the linear constraints, $Y\in\mathbb{S}^{n}_{+}$ are the dual multipliers corresponding to the constraint $I-U\succeq 0$, and $\Psi$ is a subgradient of $\psi(X)$. Unfortunately, the counterexample below shows that the KKT conditions do not hold for the penalty problem (20) in general:

[TABLE]

Every feasible solution requires $x=0$ . It is obvious that if $\rho=0.5$ then the optimal solution to the above problem is:

[TABLE]

and the optimal objective value is $1.5$. However, no KKT multipliers exist at this point. We can show explicitly that calmness is violated here. If we allow $\lambda_{min}(X)\geq-h_{1}$ and $\lambda_{\max}(U)\leq 1+h_{1}$, then we can take

[TABLE]

Clearly $(x,X,U)\rightarrow(\bar{x},\bar{X},\bar{U})$ as $h_{1}\rightarrow 0$. The resulting objective value at $(x,X,U)$ is $1.5-0.5\sqrt{h_{1}}-0.5h_{1}^{1.5}$. Thus, for small $h_{1}$, the difference in objective function value is $O(\sqrt{h_{1}})$, while the perturbation in the constraints is only $O(h_{1})$, so calmness does not hold.
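The order-of-magnitude argument can be checked directly from the objective value just stated. This snippet uses only that formula (the perturbed point itself is not reproduced here); the function name is ours. Calmness would require the decrease to be bounded by a fixed multiple of $h_{1}$, but the ratio grows like $0.5/\sqrt{h_{1}}$.

```python
import math


def objective_decrease(h1):
    """1.5 minus the perturbed objective value 1.5 - 0.5*sqrt(h1) - 0.5*h1**1.5."""
    return 0.5 * math.sqrt(h1) + 0.5 * h1 ** 1.5


# Calmness would need objective_decrease(h1) <= M * h1 for some fixed M;
# the ratio below grows without bound as h1 -> 0 instead.
ratios = {h1: objective_decrease(h1) / h1 for h1 in (1e-2, 1e-4, 1e-6, 1e-8)}
for h1, ratio in ratios.items():
    print(f"h1 = {h1:.0e}   decrease / h1 = {ratio:.1f}")
```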

The lack of a constraint qualification suggests that algorithms such as the augmented Lagrangian method may fail to converge in general when applied to the penalty formulation. However, if we impose a Slater constraint qualification on the feasible region of $X$, we show below that calmness holds for the penalty problem (20) at locally optimal points.

Proposition 5

Calmness holds at the local optima of the penalty formulation (20) if $\cal{C}$ contains a positive definite matrix.

Proof

Since the Slater condition holds for the feasible regions of both $X$ and $U$, for each pair $(X_{l},U_{l})$ in the perturbed problem we can find a feasible pair $(\tilde{X}_{l},\tilde{U}_{l})$, with $\tilde{X}_{l}\in\tilde{\mathcal{C}}\cap\mathbb{S}^{n}_{+}$, whose distance from $(X_{l},U_{l})$ is bounded by some constant times the magnitude of the perturbation.

In particular, let $(\hat{X},\hat{U})$ be a strictly feasible solution to (20) with minimum eigenvalue $\delta>0$. Let the minimum eigenvalue of $(X_{l},U_{l})$ be $-\epsilon<0$. We construct

[TABLE]

Note that for $\epsilon\ll\delta$, we have

[TABLE]

exploiting the Lipschitz continuity of $\psi(X)$. As $(X_{l},U_{l})$ converges to $(\bar{X},\bar{U})$, we also have $(\tilde{X}_{l},\tilde{U}_{l})\rightarrow(\bar{X},\bar{U})$, so by the local optimality of $(\bar{X},\bar{U})$ we can bound the optimal value of the perturbed problem, and the statement holds.

It follows that the KKT conditions hold at local minimizers of the penalty formulation.

Proposition 6

The first order KKT conditions hold at locally optimal solutions of the penalty formulation (20) if the Slater condition holds for the feasible region $\tilde{\mathcal{C}}\cap\mathbb{S}^{n}_{+}$ of $X$.

4.2 Local Optimality Condition of Penalty Formulation

4.2.1 Property of KKT Stationary Points of Penalty Formulation

The KKT conditions of the penalty formulation behave similarly to some thresholding methods candes1; cambierabsil2016; vandereycken2013; weiCCL2016b. The objective function counts not only the number of zero eigenvalues but also the number of eigenvalues below a certain threshold, as the following proposition illustrates.

Proposition 7

Let $(\bar{X},\bar{U})$ be a locally optimal solution to the penalty formulation. Let $\sigma_{i}$ be an eigenvalue of $\bar{U}$ and $v_{i}$ a corresponding eigenvector of norm one. Then:

•

If $\sigma_{i}=1$, then $v_{i}^{T}(\Psi+\rho\bar{X})v_{i}\leq 1$.

•

If $\sigma_{i}=0$, then $v_{i}^{T}(\Psi+\rho\bar{X})v_{i}\geq 1$.

•

If $0<\sigma_{i}<1$, then $v_{i}^{T}(\Psi+\rho\bar{X})v_{i}=1$.

Proof

If $\sigma_{i}=1$, then since $v_{i}$ is an eigenvector of $\bar{U}$, the complementarity in the KKT conditions gives:

[TABLE]

As $v_{i}^{T}Yv_{i}\geq 0$, we have $v_{i}^{T}(\Psi+\rho\bar{X})v_{i}\leq 1$.

If $\sigma_{i}=0$, then $v_{i}$ is an eigenvector of $I-U$ with eigenvalue 1. By the complementarity of $I-U$ and $Y$, we have $v_{i}^{T}Yv_{i}=0$ and

[TABLE]

The above inequality holds if and only if $v_{i}^{T}(\Psi+\rho\bar{X})v_{i}\geq 1$.

If $0<\sigma_{i}<1$, then $v_{i}$ is an eigenvector of $I-U$ corresponding to an eigenvalue in $(0,1)$. The complementarity in the KKT conditions implies that $v_{i}^{T}Yv_{i}=0=v_{i}^{T}(-I+\Psi+\rho\bar{X}+Y)v_{i}$. It follows that $v_{i}^{T}(\Psi+\rho\bar{X})v_{i}=1$.
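To see the thresholding behaviour of Proposition 7 concretely, consider the special case $\psi\equiv 0$ (so $\Psi=0$) and assume, for illustration only, that $\bar{U}$ shares its eigenvectors with $\bar{X}$, so that $v_{i}^{T}(\rho\bar{X})v_{i}=\rho\lambda_{i}$. The sketch below picks eigenvalues $\sigma_{i}$ of $\bar{U}$ that satisfy the necessary conditions of the proposition; it does not verify local optimality, and the function names are hypothetical. The count $n-\langle I,\bar{U}\rangle$ then counts the eigenvalues of $\bar{X}$ at or above $1/\rho$, not the nonzero ones.

```python
import numpy as np


def sigma_from_prop7(eig_x, rho):
    """Eigenvalues of a U-bar compatible with Proposition 7 when psi == 0 (illustrative).

    Assumes U-bar shares its eigenvectors with X-bar, so v_i^T (rho X-bar) v_i = rho*lambda_i.
    Prop. 7 then needs rho*lambda_i <= 1 where sigma_i = 1 and rho*lambda_i >= 1 where
    sigma_i = 0; we take sigma_i = 1 exactly when rho*lambda_i < 1.
    """
    lam = np.asarray(eig_x, dtype=float)
    return np.where(rho * lam < 1.0, 1.0, 0.0)


def thresholded_count(eig_x, rho):
    """n - <I, U-bar>: the number of eigenvalues of X-bar with lambda_i >= 1/rho."""
    sigma = sigma_from_prop7(eig_x, rho)
    return int(sigma.size - sigma.sum())


eig_x = [2.0, 0.5, 0.01]
print(thresholded_count(eig_x, rho=1.0))    # only 2.0 clears the threshold 1/rho = 1
print(thresholded_count(eig_x, rho=101.0))  # rho > 1/0.01, so every positive eigenvalue counts
```

With $\psi\equiv 0$, the choice of $\rho$ in the proof of Proposition 8 (smallest positive eigenvalue of $\rho\bar{X}$ strictly above 1) reduces to $\rho>1/\tilde{\lambda}$, which is why $\rho=101$ recovers the true rank of 3 in this example.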

Using the proposition above, we can relate the stationary points of the SDCMPCC formulation to those of the penalty formulation.

Proposition 8

If $(\bar{X},\bar{U})$ is a stationary point of the SDCMPCC formulation with corresponding subgradient $\Psi$, then it is a stationary point of the penalty formulation for all sufficiently large values of the penalty parameter $\rho$.

Proof

Choose $\rho$ so that the minimum positive eigenvalue of $\Psi+\rho\bar{X}$ is strictly greater than $1$. Setting $\lambda=0$ and choosing $Y$ appropriately, we see that the first order optimality condition (21) holds at $(\bar{X},\bar{U})$ for such a choice of $\rho$; thus $(\bar{X},\bar{U})$ is a KKT stationary point of the penalty formulation.

4.2.2 Local Convergence of KKT Stationary Points

We would like to know whether local convergence results can be established for the penalty formulation, that is, whether limit points of KKT stationary points of the penalty scheme are KKT stationary points of the SDCMPCC formulation. Unfortunately, local convergence does not hold for the penalty formulation, although the limit points are feasible for the original SDCMPCC formulation.

Proposition 9

Let $(X_{k},U_{k})$ be a KKT stationary point of the penalty scheme with subgradient $\Psi_{k}$ and penalty parameter $\rho_{k}$. As $\rho_{k}\rightarrow\infty$, any limit point $(\bar{X},\bar{U})$ of the sequence $\{(X_{k},U_{k})\}$ is a feasible solution to the original problem.

Proof

We argue by contradiction. Note that the norm of $\Psi_{k}$ is bounded since $\psi(X)$ is Lipschitz continuous. If the Frobenius inner product of $\bar{X}$ and $\bar{U}$ is greater than 0, then for $k$ large enough we have:

[TABLE]

which violates the complementarity in the KKT conditions of the penalty formulation.

We now show that a limit point need not be a KKT stationary point. Consider the following problem:

[TABLE]

The penalty formulation takes the form:

[TABLE]

Let $X_{k}$ and $U_{k}$ take the values:

[TABLE]

so $(X_{k},U_{k})$ is a KKT stationary point for the penalty formulation. However, the limit point:

[TABLE]

is not a KKT stationary point for the original SDCMPCC formulation.

5 Proximal Alternating Linearized Minimization

The proximal alternating linearized minimization (PALM) algorithm of Bolte et al. bolte2014proximal is used to solve a wide class of nonconvex and nonsmooth problems of the form

$$\min_{x\in\mathbb{R}^{n},\,y\in\mathbb{R}^{m}}\ \Phi(x,y):=f(x)+g(y)+H(x,y)$$

where $f(x)$, $g(y)$ and $H(x,y)$ satisfy the following smoothness and continuity assumptions:

•

$f:\mathbb{R}^{n}\rightarrow(-\infty,+\infty]$ and $g:\mathbb{R}^{m}\rightarrow(-\infty,+\infty]$ are proper and lower semicontinuous functions.

•

$H:\mathbb{R}^{n}\times\mathbb{R}^{m}\rightarrow\mathbb{R}$ is a $C^{1}$ function.

No convexity assumptions are made. Iterates are updated using a proximal map with respect to a function $\sigma$ and weight parameter $t$:

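$$\mbox{prox}_{t}^{\sigma}(x):=\operatorname{argmin}\left\{\sigma(u)+\frac{t}{2}||u-x||^{2}\,:\,u\in\mathbb{R}^{d}\right\}$$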

When $\sigma$ is a convex function, the objective is strongly convex and the map returns a unique solution. The PALM algorithm is given in Procedure 1.

It was shown in bolte2014proximal that PALM converges to a stationary point of $\Phi(x,y)$ under the following assumptions on the functions:

•

$\inf_{\mathbb{R}^{n}\times\mathbb{R}^{m}}\Phi>-\infty$, $\inf_{\mathbb{R}^{n}}f>-\infty$ and $\inf_{\mathbb{R}^{m}}g>-\infty$.

•

The partial gradient $\nabla_{x}H(x,y)$ is globally Lipschitz with modulus $L_{1}(y)$, that is:

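$$||\nabla_{x}H(x_{1},y)-\nabla_{x}H(x_{2},y)||\leq L_{1}(y)\,||x_{1}-x_{2}||\qquad\forall\,x_{1},x_{2}\in\mathbb{R}^{n}$$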

Similarly, the partial gradient $\nabla_{y}H(x,y)$ is globally Lipschitz with modulus $L_{2}(x)$.

•

For $i=1,2$, there exist $\lambda_{i}^{-},\lambda_{i}^{+}>0$ such that:

–

$\inf\{L_{1}(y^{k}):k\in\mathbb{N}\}\geq\lambda_{1}^{-}$ and $\inf\{L_{2}(x^{k}):k\in\mathbb{N}\}\geq\lambda_{2}^{-}$

–

$\sup\{L_{1}(y^{k}):k\in\mathbb{N}\}\leq\lambda_{1}^{+}$ and $\sup\{L_{2}(x^{k}):k\in\mathbb{N}\}\leq\lambda_{2}^{+}$.

•

$\nabla H$ is Lipschitz continuous on bounded subsets of $\mathbb{R}^{n}\times\mathbb{R}^{m}$, i.e., for each bounded subset $B_{1}\times B_{2}$ of $\mathbb{R}^{n}\times\mathbb{R}^{m}$ there exists $M>0$ such that for all $(x_{i},y_{i})\in B_{1}\times B_{2}$, $i=1,2$:

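$$||(\nabla_{x}H(x_{1},y_{1})-\nabla_{x}H(x_{2},y_{2}),\ \nabla_{y}H(x_{1},y_{1})-\nabla_{y}H(x_{2},y_{2}))||\leq M\,||(x_{1}-x_{2},\ y_{1}-y_{2})||$$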
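Procedure 1 itself is not reproduced in this extraction. As a hedged sketch, the loop below follows the PALM iteration of Bolte et al. as summarized above: a proximal gradient step in $x$ with weight $c_{k}=\gamma_{1}L_{1}(y^{k})$, followed by one in $y$ with weight $d_{k}=\gamma_{2}L_{2}(x^{k+1})$, where $\gamma_{1},\gamma_{2}>1$. The callables `prox_f` and `prox_g`, the stopping rule, the toy problem, and all parameter defaults are our own illustrative choices, not the authors' settings.

```python
import numpy as np


def palm(prox_f, prox_g, grad_x_H, grad_y_H, L1, L2, x0, y0,
         gamma1=1.1, gamma2=1.1, max_iter=500, tol=1e-10):
    """Sketch of the PALM loop; prox_f(z, c) returns argmin_u f(u) + (c/2)||u - z||^2."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        c = gamma1 * L1(y)                              # weight c_k for the x-block
        x_new = prox_f(x - grad_x_H(x, y) / c, c)
        d = gamma2 * L2(x_new)                          # weight d_k for the y-block
        y_new = prox_g(y - grad_y_H(x_new, y) / d, d)
        step = np.linalg.norm(x_new - x) + np.linalg.norm(y_new - y)
        x, y = x_new, y_new
        if step < tol:                                  # iterates have stopped moving
            break
    return x, y


# Toy instance: f(x) = 0.5||x - a||^2, g(y) = 0.5||y - b||^2, H(x, y) = 0.5||x - y||^2.
a, b = np.array([0.0, 1.0]), np.array([3.0, -2.0])
x_star, y_star = palm(
    prox_f=lambda z, c: (a + c * z) / (1.0 + c),        # closed-form prox of f
    prox_g=lambda z, d: (b + d * z) / (1.0 + d),        # closed-form prox of g
    grad_x_H=lambda x, y: x - y,
    grad_y_H=lambda x, y: y - x,
    L1=lambda y: 1.0, L2=lambda x: 1.0,
    x0=np.zeros(2), y0=np.zeros(2))
print(x_star, y_star)   # the minimizer is x = (2a + b)/3, y = (a + 2b)/3
```

In the rank minimization setting, $x$ and $y$ become the matrices $X$ and $U$, and the two proximal maps are the subproblems discussed in Sections 5.2 and 5.3.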

The PALM method can be applied to the penalty formulation (20) of the SDCMPCC formulation of rank minimization with the following assignment of the functions:

[TABLE]

where $\mathcal{I}(\cdot)$ is an indicator function taking the value 0 or $+\infty$, as appropriate. Note that we have added a regularization term $||X||_{F}^{2}$ to the objective function. When the feasible region for $X$ is bounded, the assumptions required for the convergence of the PALM procedure hold for the penalty formulation of SDCMPCC, as the following propositions show.

Proposition 10

The function $\Phi(X,U)=f(X)\,+\,g(U)\,+\,H(X,U)$ is bounded below for $X\in\mathbb{S}^{n}$ and $U\in\mathbb{S}^{n}$ if $\psi(X)$ is bounded below.

Proof

Since the eigenvalues of $U$ are bounded by 1 and the Frobenius norms of $X$ and $U$ are nonnegative, the statement follows immediately.

Proposition 11

If the feasible region of $X$ is bounded, then $H(X,U)$ is globally Lipschitz.

Proof

The gradient of $H(X,U)$ is:

$$\nabla H(X,U)=\big(\nabla_{X}H(X,U),\ \nabla_{U}H(X,U)\big)=(\rho U,\ \rho X)$$

The statement results directly from the boundedness of the feasible region of $X$ and $U$ .

Proposition 12

$\nabla H(X,U)=\rho(U,\,X)$ is Lipschitz continuous on bounded subsets of $\mathbb{S}^{n}\times\mathbb{S}^{n}$.
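The gradient $\nabla H(X,U)=\rho(U,X)$ corresponds to the bilinear coupling $H(X,U)=\rho\langle X,U\rangle$ (up to a constant; the exact assignment of $H$ is not reproduced in this extraction, so this form is our inference). The sketch checks that gradient by central differences and confirms that $\nabla H$ is in fact globally Lipschitz with constant $\rho$, since $||\nabla H(X_{1},U_{1})-\nabla H(X_{2},U_{2})||=\rho\,||(X_{1}-X_{2},U_{1}-U_{2})||$. The random test matrices and helper names are ours.

```python
import numpy as np


def H(X, U, rho):
    """Assumed coupling term rho * <X, U> (Frobenius inner product)."""
    return rho * np.sum(X * U)


def grad_H(X, U, rho):
    """(grad_X H, grad_U H) = (rho * U, rho * X)."""
    return rho * U, rho * X


def random_sym(rng, n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2.0


rng = np.random.default_rng(0)
n, rho, t = 4, 3.0, 1e-6
X, U, D = (random_sym(rng, n) for _ in range(3))

# Directional derivative of H in X along D, by central differences.
fd = (H(X + t * D, U, rho) - H(X - t * D, U, rho)) / (2.0 * t)
exact = np.sum(grad_H(X, U, rho)[0] * D)

# Lipschitz ratio on two random pairs: equals rho up to rounding.
X2, U2 = random_sym(rng, n), random_sym(rng, n)
gX1, gU1 = grad_H(X, U, rho)
gX2, gU2 = grad_H(X2, U2, rho)
num = np.sqrt(np.sum((gX1 - gX2) ** 2) + np.sum((gU1 - gU2) ** 2))
den = np.sqrt(np.sum((X - X2) ** 2) + np.sum((U - U2) ** 2))
print(fd, exact, num / den)
```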

The proximal subproblems in Procedure 1 are both convex quadratic semidefinite programs. The update of $U^{k+1}$ has a closed-form expression based on the eigenvalues and eigenvectors of $U^{k}-(\rho/d_{k})X^{k}$. Rather than solving the update problem for $X^{k+1}$ directly using, for example, CVX dcp06, we found it more effective to solve the dual problem by Newton’s method, using a conjugate gradient approach to approximately solve the direction-finding subproblem. This approach was motivated by the work of Qi and Sun qi2006quadratically on finding nearest correlation matrices. The structure of the Hessian in our problem favors the conjugate gradient approach over a direct factorization: matrix-vector products are relatively cheap to compute compared with forming and factorizing the Hessian. The updates of $X$ and $U$ are discussed in Sections 5.2 and 5.3, respectively. First, we discuss accelerating the PALM method.

5.1 Adding Momentum Terms to the PALM Method

One drawback of proximal gradient methods is their slow convergence. Nesterov nesterov2013introductory; nesterov22 proposed accelerating gradient methods for convex programming by adding a momentum term. The accelerated method attains an $O(1/k^{2})$ convergence rate in objective value, compared with the $O(1/k)$ rate of the standard gradient method. Recent accelerated proximal gradient methods include schmidt2011convergence; toh2010accelerated.

Bolte et al. bolte2014proximal showed that the PALM method can be applied to nonconvex programs under certain assumptions, and that it converges to a stationary point. A natural question is whether an accelerated version exists for nonconvex programming. Ghadimi and Lan ghadimi2016accelerated presented an accelerated gradient method for nonconvex nonlinear and stochastic programming, with convergence to points satisfying the first order KKT condition.

There are various ways to set up the momentum term li2015accelerated; lin2015accelerated. Here we adopt the following strategy when updating $x^{k}$ and $y^{k}$:

[TABLE]

where $\beta^{k}=\frac{k-2}{k+1}$ (borrowed from nesterov22). We refer to this as the Fast PALM method.
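The displayed momentum update is not reproduced in this extraction. A common form of such a step, which we assume here, is $\tilde{x}^{k}=x^{k}+\beta^{k}(x^{k}-x^{k-1})$, with the proximal gradient step then taken from $\tilde{x}^{k}$ instead of $x^{k}$ (and likewise for $y$). The helper names are hypothetical. Note that the formula gives $\beta^{1}<0$; the sketch switches momentum off until $k\geq 2$, which is also our assumption.

```python
import numpy as np


def momentum_coefficient(k):
    """beta^k = (k - 2)/(k + 1); clipped at 0 for k < 2 (our assumption)."""
    return max(0.0, (k - 2) / (k + 1))


def extrapolate(x_k, x_prev, k):
    """Assumed inertial point x^k + beta^k (x^k - x^{k-1})."""
    return x_k + momentum_coefficient(k) * (x_k - x_prev)


for k in (1, 2, 3, 5, 10, 100):
    print(k, momentum_coefficient(k))
print(extrapolate(np.array([1.0, 2.0]), np.array([0.0, 2.0]), k=5))
```

Since $\beta^{k}\rightarrow 1$ as $k$ grows, later iterations carry over more of the previous step.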

5.2 Updating $X$

Assume $\mathcal{C}=\{X\in\mathbb{S}^{n}:\mathcal{A}(X)=b,\,X\succeq 0\}$ and the Slater condition holds for $\mathcal{C}$. The proximal point $X$ of $\tilde{X}$, $\mbox{prox}_{c_{k}}^{f}(\tilde{X})$, can be calculated as the optimal solution to the problem:

[TABLE]

The objective can be replaced by:

[TABLE]

With the Fast PALM method, we use

[TABLE]

We observe that the subproblem for the proximal point has a structure very similar to the nearest correlation matrix problem when $\psi(X)\equiv 0$. Qi and Sun qi2006quadratically showed that, for the nearest correlation matrix problem, a semismooth Newton method is numerically very efficient compared with other existing methods, and that it converges quadratically if the Hessian at the optimal solution is positive definite. Provided Slater’s condition holds for the feasible region of the subproblem and its dual program, strong duality holds, so instead of solving the primal program we can solve the dual program, which has the form:

[TABLE]

where $(M)_{+}$ denotes the projection of the symmetric matrix $M\in\mathbb{S}^{n}$ onto the cone of positive semidefinite matrices $\mathbb{S}^{n}_{+}$, and

[TABLE]

One advantage of the dual program over the primal is that it is unconstrained. Newton’s method can be applied to obtain the solution $y^{*}$, which satisfies the first order optimality condition:

[TABLE]

Note that $\nabla\theta(y)=F(y)-b$. The algorithm is given in Procedure 2.

In each iteration, one key step is to construct the Hessian matrix $V_{k}$ . Given the eigenvalue decomposition

[TABLE]

let $\alpha$, $\beta$, $\gamma$ be the index sets corresponding to the positive, zero and negative eigenvalues $\lambda_{i}$, respectively. Set:

[TABLE]

where all entries of $E_{\alpha\alpha}$, $E_{\alpha\beta}$ and $E_{\beta\alpha}$ equal 1 and

[TABLE]

The operator $V_{y}:\mathbb{R}^{m}\rightarrow\mathbb{R}^{m}$ is defined as:

[TABLE]

where $\circ$ denotes the Hadamard product. Qi and Sun qi2006quadratically showed the following.

Proposition 13

The operator $V_{y}$ is positive semidefinite, i.e., $\langle h,V_{y}h\rangle\geq 0$ for any $h\in\mathbb{R}^{m}$.

Since the conjugate gradient method requires $V_{y}$ to be positive definite, a perturbation term is added to the linear system:

[TABLE]

After obtaining the dual optimal solution $y^{*}$, we recover the primal optimal solution $X^{*}$ by:

[TABLE]
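Procedure 2 is not reproduced in this extraction. The following is a minimal sketch in the spirit of Qi and Sun's method for the case $\psi(X)\equiv 0$, where we write the $X$-subproblem as $\min\frac{1}{2}||X-G||_{F}^{2}$ subject to $\mathcal{A}(X)=b$, $X\succeq 0$; the name $G$ for the point being projected is our notation (in the text it would be built from $\tilde{X}$ and $c_{k}$). Under these assumptions the dual objective can be written as $\theta(y)=\frac{1}{2}||(G+\mathcal{A}^{*}y)_{+}||_{F}^{2}-b^{T}y$, so $F(y)=\mathcal{A}((G+\mathcal{A}^{*}y)_{+})$ and $X^{*}=(G+\mathcal{A}^{*}y^{*})_{+}$. The matrix `Om` reproduces the block pattern described above ($1$ on the $\alpha\alpha$, $\alpha\beta$, $\beta\alpha$ blocks, $\lambda_{i}/(\lambda_{i}-\lambda_{j})$ on the $\alpha\gamma$ blocks, $0$ elsewhere). The perturbation rule $\epsilon=\min\{10^{-2},||\nabla\theta(y)||\}$, the Armijo parameters, and all function names are assumptions, not the authors' settings.

```python
import numpy as np


def psd_part(M):
    """(M)_+: projection of a symmetric matrix onto S^n_+ by clipping eigenvalues."""
    w, P = np.linalg.eigh(M)
    return (P * np.maximum(w, 0.0)) @ P.T


def omega_matrix(w, tol=1e-12):
    """Divided differences of max(., 0) over the eigenvalues w (Hadamard weights)."""
    wp = np.maximum(w, 0.0)
    diff = w[:, None] - w[None, :]
    same = np.abs(diff) < tol
    ratio = (wp[:, None] - wp[None, :]) / np.where(same, 1.0, diff)
    return np.where(same, (w[:, None] > 0).astype(float), ratio)


def conjugate_gradient(matvec, rhs, rtol=1e-10, max_iter=200):
    """Plain CG for a symmetric positive definite operator given as a function."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rs = r @ r
    stop = (rtol * np.linalg.norm(rhs)) ** 2
    for _ in range(max_iter):
        if rs <= stop:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def dual_newton_projection(G, As, b, tol=1e-9, max_iter=50):
    """min 0.5||X - G||_F^2 s.t. <A_i, X> = b_i, X PSD, solved through the dual in y."""
    def A(X):                                   # A(X) = (<A_i, X>)_i
        return np.einsum('kij,ij->k', As, X)

    def A_adj(y):                               # A^*(y) = sum_i y_i A_i
        return np.einsum('k,kij->ij', y, As)

    def theta(y):
        Xp = psd_part(G + A_adj(y))
        return 0.5 * np.sum(Xp * Xp) - b @ y

    y = np.zeros(len(b))
    for _ in range(max_iter):
        w, P = np.linalg.eigh(G + A_adj(y))
        grad = A((P * np.maximum(w, 0.0)) @ P.T) - b    # grad theta(y) = F(y) - b
        gnorm = np.linalg.norm(grad)
        if gnorm < tol:
            break
        Om = omega_matrix(w)
        eps = min(1e-2, gnorm)                          # perturbation (assumed rule)

        def V(h):                                       # (V_y + eps I) h
            return A(P @ (Om * (P.T @ A_adj(h) @ P)) @ P.T) + eps * h

        d = conjugate_gradient(V, -grad)
        t, f0, slope = 1.0, theta(y), grad @ d
        while theta(y + t * d) > f0 + 1e-4 * t * slope and t > 1e-10:
            t *= 0.5                                    # Armijo backtracking
        y = y + t * d
    return psd_part(G + A_adj(y)), y


# Nearest correlation matrix: A(X) = diag(X), b = ones; G is indefinite.
G = np.array([[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]])
As = np.array([np.outer(e, e) for e in np.eye(3)])
X, y = dual_newton_projection(G, As, np.ones(3))
print(np.round(X, 4))
```

Only matrix-vector products with $V_{y}$ are needed, which reflects the design choice in the text of preferring conjugate gradients over forming and factorizing the Hessian.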

5.3 Updating $U$

The subproblem to update $U$ is:

[TABLE]

with

[TABLE]

with the Fast PALM method. The objective is equivalent to minimizing $||U-(\tilde{U}+\frac{1}{d_{k}}I)||_{F}^{2}$, and this problem has an analytical solution. Given the eigenvalue decomposition of $\tilde{U}+\frac{1}{d_{k}}I$:

[TABLE]

the optimal solution $U^{*}$ is:

[TABLE]

where the eigenvalue $\sigma^{U^{*}}_{i}$ takes the value:

[TABLE]

Note also that

[TABLE]

It may be more efficient to work with this representation when many more eigenvalues are at least one than are less than zero.
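The closed-form $U$-update amounts to a Frobenius-norm projection, computed by clipping eigenvalues. The sketch assumes, consistent with the surrounding text, that the feasible set for $U$ is $\{U:0\preceq U\preceq I\}$ and that $\tilde{U}$ is the (possibly extrapolated) point defined above; the function names are hypothetical.

```python
import numpy as np


def project_unit_spectral_box(M):
    """Frobenius-nearest U with 0 <= U <= I: clip the eigenvalues of sym(M) to [0, 1]."""
    w, Q = np.linalg.eigh((M + M.T) / 2.0)
    sigma = np.clip(w, 0.0, 1.0)
    return (Q * sigma) @ Q.T


def update_U(U_tilde, d_k):
    """U-step: minimize ||U - (U_tilde + I/d_k)||_F^2 over {U : 0 <= U <= I}."""
    n = U_tilde.shape[0]
    return project_unit_spectral_box(U_tilde + np.eye(n) / d_k)


# U_tilde + I/2 = diag(0.7, -0.5, 2.5); clipping gives diag(0.7, 0, 1).
print(np.round(update_U(np.diag([0.2, -1.0, 2.0]), d_k=2.0), 6))
```

Only one symmetric eigendecomposition is needed per iteration, so the $U$-step is much cheaper than the $X$-step.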

6 Test Problems

Our experiments included tests on coordinate recovery problems aapwolkowicz2011; ding1; laurent8; pongTsengMP; so1. In these problems, some of the distances between points in $\mathbb{R}^{3}$ are given and the positions of the points must be recovered. Given an incomplete Euclidean distance matrix $D=(d_{ij}^{2})$, where:

[TABLE]

we want to recover the coordinates $x_{i}$, $i=1,\cdots,n$. Since the points lie in 3-dimensional space, each $x_{i}$ is a $1\times 3$ vector. Let $X=(x_{1},x_{2},\cdots,x_{n})^{T}\in\mathbb{R}^{n\times 3}$. The problem then becomes recovering the matrix $X$. One way to solve the problem is to lift $X$ by introducing $B=XX^{T}$; we would like to find $B$ that satisfies

[TABLE]

where $\Omega$ is the set of pairs of points whose distance has been observed. Since $B=XX^{T}$ has the same rank as $X$, which is 3 when the points are in general position, we seek to minimize the rank of $B$ in the objective.
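The lifting rests on the identity $||x_{i}-x_{j}||^{2}=B_{ii}+B_{jj}-2B_{ij}$ for $B=XX^{T}$, which is presumably what the constraints above express. The sketch below builds random points in $\mathbb{R}^{3}$, samples observed pairs, and checks both that identity and the rank of $B$; the rank count uses the eigenvalue threshold 0.01 from the experiments described below. The sizes (50 points, 150 pairs drawn from the upper triangle) mirror the test setup, but the random data and helper names are ours.

```python
import numpy as np


def sample_pairs(n, m, rng):
    """m distinct pairs (i, j) with i < j, drawn uniformly at random."""
    iu, ju = np.triu_indices(n, k=1)
    idx = rng.choice(iu.size, size=m, replace=False)
    return list(zip(iu[idx].tolist(), ju[idx].tolist()))


def sq_dist_from_gram(B, i, j):
    """d_ij^2 = ||x_i - x_j||^2 = B_ii + B_jj - 2 B_ij for B = X X^T."""
    return B[i, i] + B[j, j] - 2.0 * B[i, j]


def numerical_rank(B, threshold=0.01):
    """Number of eigenvalues of the symmetric matrix B above the threshold."""
    return int(np.sum(np.linalg.eigvalsh(B) > threshold))


rng = np.random.default_rng(1)
points = rng.standard_normal((50, 3))        # rows are the coordinates x_i
B = points @ points.T                         # lifted Gram matrix B = X X^T
omega = sample_pairs(50, 150, rng)            # observed pairs (Omega)
d2 = {(i, j): float(np.sum((points[i] - points[j]) ** 2)) for i, j in omega}

print(numerical_rank(B))                      # 3 for points in general position
print(max(abs(sq_dist_from_gram(B, i, j) - d2[i, j]) for i, j in omega))
```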

We generated 20 instances; in each instance we randomly sampled 150 entries from a $50\times 50$ distance matrix. We applied the PALM method and the Fast PALM method to each instance, limiting the number of iterations to 200. Figures 1 and 2 each show the results for 10 of the instances. As can be seen, the Fast PALM approach dramatically speeds up the algorithm. The computational tests were conducted using Matlab R2013b running in Windows 10 on an Intel Core i7-4719HQ CPU @2.50GHz with 16GB of RAM.

We compared the rank of the solutions returned by the Fast PALM method for (20) with that of the solutions returned by the convex nuclear norm approximation to (6). When computing the rank of the resulting $X$ in both the convex approximation and the penalty formulation, we count the number of eigenvalues above the threshold 0.01.

Figure 3 shows that when 150 distances are sampled, the solutions returned by the penalty formulation have notably lower rank than those of the convex approximation. In only one instance did the penalty formulation fail to find a solution of rank 3; in contrast, the lowest-rank solution found by the nuclear norm approach had rank 5, and this occurred in only one instance.

We also experimented with numbers of sampled distances ranging from 150 to 195. For each number, we randomly sampled that many distances and compared the average rank returned by the penalty formulation and by the convex approximation. Figure 4 shows that, for the same number of sampled distances, the penalty formulation is more likely to recover a low rank matrix. The nuclear norm approach appears to need about 200 sampled distances to obtain a solution of rank 3 in most cases. There has been some research on maximizing the nuclear norm of symmetric matrices to approximately minimize rank krislock3. However, for our (very sparse) instances we obtained similar ranks whether minimizing or maximizing the nuclear norm.

For the 20 instances with 150 sampled distances, the average time for CVX is 0.6590 seconds, while the average time for the Fast PALM method on the penalty formulation is 10.21 seconds. Although the Fast PALM method is slower than CVX, it solves the problem in a reasonable amount of time and produces lower rank solutions for our test instances.

7 Conclusions

The SDCMPCC approach gives an equivalent nonlinear programming formulation for the combinatorial optimization problem of minimizing the rank of a symmetric positive semidefinite matrix subject to convex constraints. The disadvantage of the SDCMPCC approach lies in the nonconvex complementarity constraint, a type of constraint for which constraint qualifications may fail to hold. We showed that the calmness constraint qualification holds for the SDCMPCC formulation of the rank minimization problem, provided the convex constraints satisfy an appropriate condition. We developed a penalty formulation for the problem which satisfies calmness under the same condition on the convex constraints. The penalty formulation can be solved effectively using an alternating direction approach, accelerated through the use of a momentum term. For our test problems, our formulation outperformed a nuclear norm approach: it was able to recover a low rank matrix using fewer samples than the nuclear norm approach.

There are alternative nonlinear approaches to rank minimization problems, and it would be interesting to explore the relationships between the methods. The formulation we have presented minimizes the rank; the approach could be extended to problems with upper bounds on the rank of the matrix by imposing a constraint on the trace of the additional variable $U$. Also of interest would be the extension of the approach to the nonsymmetric case.

Bibliography


[1] A. Y. Alfakih, M. F. Anjos, V. Piccialli, and H. Wolkowicz, Euclidean distance matrices, semidefinite programming, and sensor network localization, Portugaliae Mathematica, 68 (2011), pp. 53–102.
[2] L. Bai, J. E. Mitchell, and J. Pang, On conic QPCCs, conic QCQPs and completely positive programs, Mathematical Programming, 159 (2016), pp. 109–136.
[3] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Mathematical Programming, 146 (2014), pp. 459–494.
[4] O. Burdakov, C. Kanzow, and A. Schwartz, Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method, SIAM Journal on Optimization, 26 (2016), pp. 397–425.
[5] J.-F. Cai, E. J. Candès, and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20 (2010), pp. 1956–1982.
[6] L. Cambier and P.-A. Absil, Robust low-rank matrix completion by Riemannian optimization, SIAM Journal on Scientific Computing, 38 (2016), pp. S440–S460.
[7] F. H. Clarke, Optimization and Nonsmooth Analysis, SIAM, Philadelphia, PA, USA, 1990.
[8] C. Ding, D. Sun, and J. Ye, First order optimality conditions for mathematical programs with semidefinite cone complementarity constraints, Mathematical Programming, 147 (2014), pp. 539–579.
