Several classes of stationary points for rank regularized minimization problems

Yulan Liu; Shaohua Pan

arXiv:1906.08922·math.OC·June 27, 2019·SIAM J. Optim.


Open Access

TL;DR

This paper introduces various stationary points for rank regularized minimization problems through different reformulations, providing a relation chart to guide low-rank solution search and characterizing conditions for local minimizers.

Contribution

It defines multiple stationary points for the problem and its reformulations, establishing their relations and offering conditions for local optimality in the PSD cone context.

Findings

01 Established a relation chart for stationary points across reformulations

02 Provided weaker conditions for local minimizers to be M-stationary

03 Characterized the directional limiting normal cone for the PSD cone

Abstract

For the rank regularized minimization problem, we introduce several kinds of stationary points by the problem itself and its equivalent reformulations, including the mathematical program with an equilibrium constraint (MPEC), the global exact penalty of the MPEC, and the surrogate yielded by eliminating the dual part in the exact penalty. A clear relation chart is established for these stationary points, which guides the user to choose an appropriate reformulation for seeking a low-rank solution. As a byproduct, we also provide a weaker condition for a local minimizer of the MPEC to be the M-stationary point by characterizing the directional limiting normal cone to the graph of the normal cone mapping of the positive semidefinite (PSD) cone.

Equations (283)

\min_{X\in\mathbb{R}^{m\times n}}F(X):=\nu f(X)+{\rm rank}(X)+\delta_{\Omega}(X)

{\rm int}({\rm dom}\,\phi)\supseteq[0,1],\ \ \phi(1)=1>t^{*}:=\mathop{\arg\min}_{0\leq t\leq 1}\phi(t)\ \ {\rm and}\ \ \phi(t^{*})=0,

\psi(t):=\left\{\begin{array}{cl}\phi(t)&{\rm if}\ t\in[0,1],\\ +\infty&{\rm otherwise}.\end{array}\right.

\min_{X,W\in\mathbb{R}^{m\times n}}\ \nu f(X)+\sum_{i=1}^{m}\phi(\sigma_{i}(W))+\delta_{\Omega}(X)\quad{\rm s.t.}\ \ \|X\|_{*}-\langle W,X\rangle=0,\ \|W\|\leq 1

\min_{X,W\in\mathbb{R}^{m\times n}}\ \nu f(X)+\sum_{i=1}^{m}\phi(\sigma_{i}(W))+\rho(\|X\|_{*}-\langle W,X\rangle)\quad{\rm s.t.}\ \ X\in\Omega,\ \|W\|\leq 1

\min_{X\in\Omega}\Big\{\nu f(X)+\rho\|X\|_{*}-{\textstyle\sum_{i=1}^{m}}\psi^{*}(\rho\sigma_{i}(X))\Big\}.

\widehat{\mathcal{N}}_{S}(\overline{z}):=\Big\{v\in\mathbb{Z}\ |\ \limsup_{z\xrightarrow[S]{}\overline{z}}\frac{\langle v,z-\overline{z}\rangle}{\|z-\overline{z}\|}\leq 0\Big\}

\mathcal{N}_{S}(\overline{z}):=\Big\{v\in\mathbb{Z}\ |\ \exists\,z^{k}\xrightarrow[S]{}\overline{z},\,v^{k}\to v\ {\rm with}\ v^{k}\in\widehat{\mathcal{N}}_{S}(z^{k})\Big\}.

\mathcal{T}_{S}(\overline{z}):=\big\{h\in\mathbb{Z}\ |\ \exists\,t_{k}\downarrow 0,\,h^{k}\to h\ {\rm with}\ \overline{z}+t_{k}h^{k}\in S\big\}.

\mathcal{N}_{S}(\overline{z};u):=\Big\{z^{*}\in\mathbb{Z}\ |\ \exists\,t_{k}\downarrow 0,\,u^{k}\to u,\,z^{k*}\to z^{*}\ {\rm with}\ z^{k*}\in\widehat{\mathcal{N}}_{S}(\overline{z}+t_{k}u^{k})\Big\}.

\widehat{\partial}g(\overline{z}):=\bigg\{z^{*}\in\mathbb{Z}\ \big|\ \liminf_{z\to\overline{z}\atop z\neq\overline{z}}\frac{g(z)-g(\overline{z})-\langle z^{*},z-\overline{z}\rangle}{\|z-\overline{z}\|}\geq 0\bigg\};

\partial g(\overline{z})=\Big\{z^{*}\in\mathbb{Z}\ |\ \exists\,z^{k}\xrightarrow[g]{}\overline{z},\,z^{k,*}\to z^{*}\ {\rm such\ that}\ z^{k,*}\in\widehat{\partial}g(z^{k})\Big\}.

\widehat{\mathcal{N}}_{S}(z)=\widehat{\partial}\delta_{S}(z)\ \ {\rm and}\ \ \mathcal{N}_{S}(z)=\partial\delta_{S}(z)\ \ {\rm for}\ z\in S.

\mathcal{F}(z)\cap\mathbb{B}_{\delta}(\overline{w})\subseteq\mathcal{F}(z^{\prime})+\kappa\|z-z^{\prime}\|\mathbb{B}_{\mathbb{W}}.

\mathcal{F}(z)\cap\mathbb{B}_{\delta}(\overline{w})\subseteq\mathcal{F}(\overline{z})+\kappa\|z-\overline{z}\|\mathbb{B}_{\mathbb{W}}.

u\in D^{*}\mathcal{F}(\overline{z}|\overline{w})(v)\Longleftrightarrow(u,-v)\in\mathcal{N}_{{\rm gph}\mathcal{F}}(\overline{z},\overline{w}),

v\in D\mathcal{F}(\overline{z}|\overline{w})(u)\Longleftrightarrow(u,v)\in\mathcal{T}_{{\rm gph}\mathcal{F}}(\overline{z},\overline{w}).

\partial\|X\|_{*}=\Big\{[U_{1}\ \ U_{2}]\left[\begin{matrix}I&0\\ 0&Z\end{matrix}\right][V_{1}\ \ V_{2}]^{\mathbb{T}}\ |\ \|Z\|\leq 1\Big\}

\alpha:=\big\{i\in[1,m]\ |\ \sigma_{i}(\overline{Z})>1\big\},\ \ \beta:=\big\{i\in[1,m]\ |\ \sigma_{i}(\overline{Z})=1\big\},\ \ c:=[m+1,n],

\gamma:=\gamma_{1}\cup\gamma_{0}\ {\rm for}\ \gamma_{1}:=\big\{i\in[1,m]\ |\ 0<\sigma_{i}(\overline{Z})<1\big\},\ \gamma_{0}:=\big\{i\in[1,m]\ |\ \sigma_{i}(\overline{Z})=0\big\},

\big(\Omega_{1}\big)_{ij}

\big(\Omega_{2}\big)_{ij}

\big(\Omega_{3}\big)_{ij}

\Theta_{1}:=\left[\begin{matrix}0_{\alpha\alpha}&0_{\alpha\beta}&(\Omega_{1})_{\alpha\gamma}\\ 0_{\beta\alpha}&0_{\beta\beta}&E_{\beta\gamma}\\ (\Omega_{1})_{\gamma\alpha}&E_{\gamma\beta}&E_{\gamma\gamma}\end{matrix}\right],\quad\Theta_{2}:=\left[\begin{matrix}E_{\alpha\alpha}&E_{\alpha\beta}&E_{\alpha\gamma}-(\Omega_{1})_{\alpha\gamma}\\ E_{\beta\alpha}&0_{\beta\beta}&0_{\beta\gamma}\\ E_{\gamma\alpha}-(\Omega_{1})_{\gamma\alpha}&0_{\gamma\beta}&0_{\gamma\gamma}\end{matrix}\right],

\Sigma_{1}:=\left[\begin{matrix}(\Omega_{2})_{\alpha\alpha}&(\Omega_{2})_{\alpha\beta}&(\Omega_{2})_{\alpha\gamma}\\ (\Omega_{2})_{\beta\alpha}&0_{\beta\beta}&E_{\beta\gamma}\\ (\Omega_{2})_{\gamma\alpha}&E_{\gamma\beta}&E_{\gamma\gamma}\end{matrix}\right],\quad\Sigma_{2}:=\left[\begin{matrix}E_{\alpha\alpha}-(\Omega_{2})_{\alpha\alpha}&E_{\alpha\beta}-(\Omega_{2})_{\alpha\beta}&E_{\alpha\gamma}-(\Omega_{2})_{\alpha\gamma}\\ E_{\beta\alpha}-(\Omega_{2})_{\beta\alpha}&0_{\beta\beta}&0_{\beta\gamma}\\ E_{\gamma\alpha}-(\Omega_{2})_{\gamma\alpha}&0_{\gamma\beta}&0_{\gamma\gamma}\end{matrix}\right].

\mathbb{R}_{>}^{|\beta|}:=\big\{z\in\mathbb{R}^{|\beta|}:\ z_{1}\geq\cdots\geq z_{|\beta|}>0\big\}.

(D(z))_{ij}:=\left\{\begin{array}{cl}\frac{\min(1,z_{i})-\min(1,z_{j})}{z_{i}-z_{j}}\in[0,1]&{\rm if}\ z_{i}\neq z_{j},\\ 0&{\rm if}\ z_{i}=z_{j}\geq 1,\\ 1&{\rm otherwise}.\end{array}\right.

\Xi_{1}=\left[\begin{matrix}0_{\beta_{+}\beta_{+}}&0_{\beta_{+}\beta_{0}}&(\Xi_{1})_{\beta_{+}\beta_{-}}\\ 0_{\beta_{0}\beta_{+}}&0_{\beta_{0}\beta_{0}}&E_{\beta_{0}\beta_{-}}\\ (\Xi_{1})_{\beta_{+}\beta_{-}}^{\mathbb{T}}&E_{\beta_{-}\beta_{0}}&E_{\beta_{-}\beta_{-}}\end{matrix}\right],

\Xi_{2}=\left[\begin{matrix}E_{\beta_{+}\beta_{+}}&E_{\beta_{+}\beta_{0}}&E_{\beta_{+}\beta_{-}}-(\Xi_{1})_{\beta_{+}\beta_{-}}\\ E_{\beta_{0}\beta_{+}}&0_{\beta_{0}\beta_{0}}&0_{\beta_{0}\beta_{-}}\\ E_{\beta_{-}\beta_{+}}-(\Xi_{1})_{\beta_{+}\beta_{-}}^{\mathbb{T}}&0_{\beta_{-}\beta_{0}}&0_{\beta_{-}\beta_{-}}\end{matrix}\right].

\Theta_{1}\circ\mathcal{S}(\widetilde{H}_{1})+\Theta_{2}\circ\mathcal{S}(\widetilde{G}_{1})+\Sigma_{1}\circ\mathcal{X}(\widetilde{H}_{1})+\Sigma_{2}\circ\mathcal{X}(\widetilde{G}_{1})=0,

\widetilde{G}_{\alpha c}+(\Omega_{3})_{\alpha c}\circ(\widetilde{H}_{\alpha c}-\widetilde{G}_{\alpha c})=0,\ \ \widetilde{H}_{\beta c}=0,\ \ \widetilde{H}_{\gamma c}=0,

(\widetilde{G}_{\beta\beta},\widetilde{H}_{\beta\beta})\in\bigcup_{Q\in\mathbb{O}^{|\beta|}\atop\Xi_{1}\in\mathcal{U}_{|\beta|}}\left\{(M,N)\ \middle|\ \begin{array}{l}\Xi_{1}\circ\widehat{N}+\Xi_{2}\circ\mathcal{S}(\widehat{M})+\Xi_{2}\circ\mathcal{X}(\widehat{N})=0\\ \quad{\rm with}\ \widehat{N}=Q^{\mathbb{T}}NQ,\,\widehat{M}=Q^{\mathbb{T}}MQ,\\ \quad Q_{\beta_{0}}^{\mathbb{T}}MQ_{\beta_{0}}\preceq 0,\ Q_{\beta_{0}}^{\mathbb{T}}NQ_{\beta_{0}}\succeq 0\end{array}\right\}


Taxonomy

Topics: Advanced Optimization Algorithms Research · Optimization and Variational Analysis · Sparse and Compressive Sensing Techniques

Full text

Several classes of stationary points for rank regularized minimization problems

Yulan Liu (School of Applied Mathematics, Guangdong University of Technology, Guangzhou) and Shaohua Pan (corresponding author, [email protected]; School of Mathematics, South China University of Technology, Guangzhou)

Abstract

For the rank regularized minimization problem, we introduce several classes of stationary points by the problem itself and its equivalent reformulations, including the mathematical program with an equilibrium constraint (MPEC), the global exact penalty of the MPEC, and the surrogate yielded by eliminating the dual part of the exact penalty. A clear relation chart is established among these stationary points, which offers guidance for choosing an appropriate reformulation when seeking a low-rank solution. As a byproduct, for the positive semidefinite (PSD) rank regularized minimization problem, we also provide a weaker condition for a local minimizer of its MPEC reformulation to be the M-stationary point by characterizing the directional limiting normal cone to the graph of the normal cone mapping of the PSD cone.

Keywords: Rank regularized minimization problems; stationary points; matrix MPECs; calmness; directional limiting normal cone

Mathematics Subject Classification(2010): 90C26, 49J52, 49J53

1 Introduction

Let $\mathbb{R}^{m\times n}$ be the linear space of all $m\times n\ (m\leq n)$ real matrices equipped with the trace inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|_{F}$ , i.e., $\langle X,Y\rangle={\rm tr}(X^{\mathbb{T}}Y)$ for $X,Y\in\mathbb{R}^{m\times n}$ . Given a function $f\!:\mathbb{R}^{m\times n}\to\mathbb{R}$ , we are interested in the rank regularized problem:

$$\min_{X\in\mathbb{R}^{m\times n}}F(X):=\nu f(X)+{\rm rank}(X)+\delta_{\Omega}(X)\tag{1}$$

where $\nu>0$ is the regularization parameter and $\Omega\subseteq\mathbb{R}^{m\times n}$ is a closed convex set. Unless otherwise stated, we assume that $f$ is locally Lipschitz and $\widehat{\partial}f(X)=\partial\!f(X)$ for any $X\in\Omega$ , where $\widehat{\partial}f(X)$ and $\partial\!f(X)$ are the regular and limiting subdifferential of $f$ at $X$ , respectively; see Section 2.1 for their definitions. Such a model is frequently used to seek a low-rank matrix under the scenario where a tight estimation is unavailable for the rank of the target matrix, and is found to have a host of applications in a variety of fields such as statistics [26], control and system identification [8, 9], signal and image processing [3], finance [30], quantum tomography [12], and so on.

Owing to the combinatorial property of the rank function, the problem (1) is generally NP-hard, so no polynomial-time algorithm can be expected to find a global optimal solution unless P = NP. It is therefore common to seek a desirable local optimal, or even merely feasible, solution by solving a convex relaxation or surrogate problem. Although the nuclear-norm convex relaxation method [7] is very popular, it has a weak ability to promote low-rank solutions and even fails to yield low-rank solutions in some cases [23]. After recognizing this deficiency, some researchers have turned their attention to nonconvex surrogates of low-rank optimization problems, such as the log-determinant surrogate (see [8, 24]) and the Schatten $p\ (0<p<1)$-norm surrogate [15]. As illustrated in [27], the efficiency of a nonconvex surrogate depends on its approximation quality.

Recently, by the variational characterization of the rank function, the authors of [1, 21] reformulated the rank regularized problem (1) as an equivalent MPEC and derived an equivalent surrogate from its global exact penalty. In order to illustrate this, let $\mathscr{L}$ denote the family of proper lower semi-continuous (lsc) convex functions $\phi\!:\mathbb{R}\to(-\infty,+\infty]$ with

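$${\rm int}({\rm dom}\,\phi)\supseteq[0,1],\ \ \phi(1)=1>t^{*}:=\mathop{\arg\min}_{0\leq t\leq 1}\phi(t)\ \ {\rm and}\ \ \phi(t^{*})=0,\tag{2}$$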

and for each $\phi\in\!\mathscr{L}$ let $\psi\!:\mathbb{R}\to(-\infty,+\infty]$ be the associated lsc convex function given by

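$$\psi(t):=\left\{\begin{array}{cl}\phi(t)&{\rm if}\ t\in[0,1],\\ +\infty&{\rm otherwise}.\end{array}\right.\tag{3}$$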

With $\phi\in\mathscr{L}$ , the rank regularized problem (1) can be equivalently reformulated as

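$$\begin{array}{cl}\displaystyle\min_{X,W\in\mathbb{R}^{m\times n}}&\nu f(X)+\sum_{i=1}^{m}\phi(\sigma_{i}(W))+\delta_{\Omega}(X)\\ {\rm s.t.}&\|X\|_{*}-\langle W,X\rangle=0,\ \|W\|\leq 1\end{array}\tag{4}$$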

which is a matrix MPEC since the constraints $\|X\|_{*}-\langle W,X\rangle=0$ and $\|W\|\leq 1$ are equivalent to $X\in\mathcal{N}_{\mathbb{B}}(W)$ with $\mathbb{B}\!:=\{Z\in\mathbb{R}^{m\times n}\,|\,\|Z\|\leq 1\},$ i.e., the optimality condition of $W\in\mathop{\arg\max}_{Z\in\mathbb{B}}\langle X,Z\rangle$ . Under a mild condition, it was shown in [1, 21] that the following penalized problem

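$$\begin{array}{cl}\displaystyle\min_{X,W\in\mathbb{R}^{m\times n}}&\nu f(X)+\sum_{i=1}^{m}\phi(\sigma_{i}(W))+\rho(\|X\|_{*}-\langle W,X\rangle)\\ {\rm s.t.}&X\in\Omega,\ \|W\|\leq 1\end{array}\tag{5}$$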

is a global exact penalty of the MPEC (4) in the sense that there exists $\overline{\rho}>0$ such that the penalized problem (5) associated to each $\rho\geq\overline{\rho}$ has the same global optimal solution set as the MPEC (4) does. With the conjugate function $\psi^{*}(s):=\sup_{t\in\mathbb{R}}\big\{st-\psi(t)\big\}$ of $\psi$, one may eliminate the dual variable $W$ in the penalized problem (5) and get the following equivalent surrogate of the problem (1)

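$$\min_{X\in\Omega}\Big\{\nu f(X)+\rho\|X\|_{*}-{\textstyle\sum_{i=1}^{m}}\psi^{*}(\rho\sigma_{i}(X))\Big\}.\tag{6}$$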
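To make the surrogate concrete, the following minimal sketch takes the particular choice $\phi(t)=t$, which satisfies the conditions defining $\mathscr{L}$ with $t^{*}=0$. This choice, the function names, and the sample singular values are illustrative assumptions, not something fixed by the text. For this $\phi$ one gets $\psi^{*}(s)=\max(s-1,0)$, so each singular value contributes $\rho\sigma-\psi^{*}(\rho\sigma)=\min(\rho\sigma,1)$ to the rank-related part of the surrogate. The code works directly on a vector of singular values, which in practice would come from an SVD of $X$.

```python
def psi_conjugate(s):
    """Conjugate of psi(t) = t on [0, 1] (+inf elsewhere): sup_{0<=t<=1} (s*t - t)."""
    return max(s - 1.0, 0.0)


def surrogate_rank_term(sigma, rho):
    """rho * ||X||_* - sum_i psi^*(rho * sigma_i), given the singular values of X."""
    nuclear_norm = sum(sigma)
    return rho * nuclear_norm - sum(psi_conjugate(rho * s) for s in sigma)


def rank_from_singular_values(sigma, tol=1e-12):
    """Number of singular values above a small tolerance (assumed threshold)."""
    return sum(1 for s in sigma if s > tol)


# Hypothetical singular values of some matrix X.
sigma = [3.0, 0.5, 0.02, 0.0]
for rho in (1.0, 10.0, 100.0):
    # The term never exceeds rank(X) and reaches it once rho * sigma_i >= 1
    # for every nonzero singular value.
    print(rho, surrogate_rank_term(sigma, rho), rank_from_singular_values(sigma))
```

Under this choice the rank-related part is a capped sum of scaled singular values, which suggests why a large enough $\rho$ is needed before the surrogate distinguishes small nonzero singular values from zero.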

As is well known, when an algorithm is applied to nonconvex and nonsmooth optimization problems, one generally expects to achieve a stationary point, while the stationary points of equivalent reformulations may differ greatly. Thus, it is necessary to clarify the relation among the stationary points of (1) defined by its equivalent reformulations. Moreover, such a clarification is a prerequisite for describing the landscape of stationary points for the rank regularized problem (1). Motivated by this, in Section 3 we introduce the R(egular)-stationary point, the M-stationary point, the EP-stationary point and the DC-stationary point by the problem (1) itself and its reformulations (4)-(6), respectively, and explore the relation among the four classes of stationary points. Figure 1 in Section 3 shows that the set of M-stationary points is almost the same as that of R-stationary points, the latter includes that of EP-stationary points under a rank condition, and the set of EP-stationary points coincides with that of DC-stationary points for some appropriate $\phi$. As a byproduct, for the PSD rank regularized minimization problem, we also provide a weaker condition than the one in [5] for a local minimizer of its MPEC reformulation to be the M-stationary point, by the directional limiting normal cone to the graph of the normal cone mapping of the PSD cone $\mathbb{S}_{+}^{n}$.

We notice that some active research has been done on the stationary points of zero-norm constrained optimization problems (see, e.g., [2, 28, 10]); for example, Burdakov et al. [2] discussed the relation between the M-stationary point and the $S$-stationary point of their equivalent MPEC reformulation, and Pan et al. [28] characterized the first-order optimality condition, which actually defines a class of stationary points by the tangent cone to the zero-norm constrained set. To the best of our knowledge, few works study the stationary points of rank regularized optimization problems. For the special case $\Omega\subseteq\mathbb{S}_{+}^{n}$, the rank regularized problem (1) can be reduced to a mathematical program with semidefinite conic complementarity constraints (MPSCCC), and Ding et al. [5] have established the connection among several classes of stationary points for the MPSCCC, which are defined by the equivalent reformulations of the complementarity constraints. However, this work is concerned with the relation among the stationary points defined by different equivalent reformulations of the rank regularized problem (1), and aims to establish a clear relation chart for these stationary points so that the user can be guided to choose an appropriate reformulation to seek a low-rank solution.

2 Notation and preliminaries

Throughout this paper, a hollow capital means a finite dimensional vector space equipped with the inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. The notation $\mathbb{S}^{n}$ denotes the vector space of all $n\times n$ real symmetric matrices equipped with the Frobenius norm, and $\mathbb{S}_{+}^{n}$ means the set of all positive semidefinite matrices in $\mathbb{S}^{n}$. Let $\mathbb{O}^{m\times n}$ be the set of $m\times n$ matrices with orthonormal columns and denote $\mathbb{O}^{m\times m}$ by $\mathbb{O}^{m}$. For a given $X\in\mathbb{R}^{m\times n}$, we denote by $\|X\|_{*}$ and $\|X\|$ the nuclear norm and the spectral norm of $X$, respectively, and by $\sigma(X)\in\mathbb{R}^{m}$ the singular value vector arranged in nonincreasing order; and write $\mathbb{O}^{m,n}(X):=\{(U,V)\in\mathbb{O}^{m}\times\mathbb{O}^{n}\,|\,X=U{\rm Diag}(\sigma(X))V^{\mathbb{T}}\}$. For a given $X\in\mathbb{R}^{m\times n}$ and two index sets $\alpha\subseteq\{1,\ldots,m\}$ and $\beta\subseteq\{1,\ldots,n\}$, $X_{\alpha\beta}$ means the submatrix consisting of those entries $X_{ij}$ with $i\in\alpha$ and $j\in\beta$. We denote by $E$ and $e$ the matrix and the vector of all ones, respectively, whose dimensions are known from the context, and by $I$ an identity matrix whose dimension is known from the context. For a given set $S$, $\delta_{S}$ denotes the indicator function of $S$, i.e., $\delta_{S}(x)=0$ if $x\in S$, otherwise $\delta_{S}(x)=+\infty$. For a given vector space $\mathbb{Z}$, $\mathbb{B}_{\mathbb{Z}}$ denotes the closed unit ball centered at the origin of $\mathbb{Z}$, and $\mathbb{B}_{\delta}(z)$ means the closed ball of radius $\delta$ centered at $z\in\mathbb{Z}$.

2.1 Normal cones and generalized differentials

Let $S\subset\mathbb{Z}$ be a given set. The regular normal cone to $S$ at a point $\overline{z}\in S$ is defined by

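$$\widehat{\mathcal{N}}_{S}(\overline{z}):=\Big\{v\in\mathbb{Z}\ |\ \limsup_{z\xrightarrow[S]{}\overline{z}}\frac{\langle v,z-\overline{z}\rangle}{\|z-\overline{z}\|}\leq 0\Big\},$$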

where the symbol $z\xrightarrow[S]{}\overline{z}$ signifies $z\to\overline{z}$ with $z\in S$ , while the limiting normal cone to $S$ at $\overline{z}$ is defined as the outer limit of $\widehat{\mathcal{N}}_{S}(z)$ as $z\xrightarrow[S]{}\overline{z}$ , i.e.,

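$$\mathcal{N}_{S}(\overline{z}):=\Big\{v\in\mathbb{Z}\ |\ \exists\,z^{k}\xrightarrow[S]{}\overline{z},\,v^{k}\to v\ {\rm with}\ v^{k}\in\widehat{\mathcal{N}}_{S}(z^{k})\Big\}.$$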

The limiting normal cone $\mathcal{N}_{S}(\overline{z})$ is generally not convex, but the regular normal cone $\widehat{\mathcal{N}}_{S}(\overline{z})$ is always closed and convex, being the negative polar of the contingent cone to $S$ at $\overline{z}$:

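$$\mathcal{T}_{S}(\overline{z}):=\big\{h\in\mathbb{Z}\ |\ \exists\,t_{k}\downarrow 0,\,h^{k}\to h\ {\rm with}\ \overline{z}+t_{k}h^{k}\in S\big\}.$$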

When $S$ is convex, $\mathcal{N}_{S}(\overline{z})$ and $\widehat{\mathcal{N}}_{S}(\overline{z})$ both coincide with the normal cone in the sense of convex analysis [31]. The directional limiting normal cone to $S$ at $\overline{z}$ in a direction $u\in\mathbb{Z}$ is defined by

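$$\mathcal{N}_{S}(\overline{z};u):=\Big\{z^{*}\in\mathbb{Z}\ |\ \exists\,t_{k}\downarrow 0,\,u^{k}\to u,\,z^{k*}\to z^{*}\ {\rm with}\ z^{k*}\in\widehat{\mathcal{N}}_{S}(\overline{z}+t_{k}u^{k})\Big\}.$$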

Comparing with the definition of $\mathcal{N}_{S}(\overline{z})$, it is clear that $\mathcal{N}_{S}(\overline{z};u)\subseteq\mathcal{N}_{S}(\overline{z})$ for any $u\in\mathbb{Z}$.
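A small planar example may help separate the three cones. The sketch below takes the complementarity set $S=\{(a,b)\in\mathbb{R}^{2}\,|\,a\geq 0,\,b\geq 0,\,ab=0\}$ and the point $\overline{z}=(0,0)$; this set is chosen here purely for illustration and is not taken from the paper.

```latex
\begin{align*}
\widehat{\mathcal{N}}_{S}(\overline{z}) &= \mathbb{R}_{-}\times\mathbb{R}_{-},\\
\mathcal{N}_{S}(\overline{z}) &= (\mathbb{R}_{-}\times\mathbb{R}_{-})\cup(\{0\}\times\mathbb{R})\cup(\mathbb{R}\times\{0\}),\\
\mathcal{N}_{S}(\overline{z};(1,0)) &= \{0\}\times\mathbb{R},\qquad
\mathcal{N}_{S}(\overline{z};(1,1)) = \emptyset.
\end{align*}
```

For $u=(1,0)$, any sequence $\overline{z}+t_{k}u^{k}\in S$ with $u^{k}\to u$ eventually lies on the positive $a$-axis, where the regular normal cone is $\{0\}\times\mathbb{R}$. For $u=(1,1)$ no such sequence stays in $S$, so the directional cone is empty. In both cases the directional cone is a strict subset of $\mathcal{N}_{S}(\overline{z})$, which is the source of the sharper conditions obtained from directional objects.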

Let $g\!:\mathbb{Z}\to[-\infty,+\infty]$ be an extended real-valued lsc function with $g(\overline{z})$ finite. The regular subdifferential of $g$ at $\overline{z}$ , denoted by $\widehat{\partial}g(\overline{z})$ , is defined as

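$$\widehat{\partial}g(\overline{z}):=\bigg\{z^{*}\in\mathbb{Z}\ \big|\ \liminf_{z\to\overline{z}\atop z\neq\overline{z}}\frac{g(z)-g(\overline{z})-\langle z^{*},z-\overline{z}\rangle}{\|z-\overline{z}\|}\geq 0\bigg\};$$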

and the (limiting) subdifferential of $g$ at $\overline{z}$ , denoted by $\partial g(\overline{z})$ , is defined as

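$$\partial g(\overline{z})=\Big\{z^{*}\in\mathbb{Z}\ |\ \exists\,z^{k}\xrightarrow[g]{}\overline{z},\,z^{k,*}\to z^{*}\ {\rm such\ that}\ z^{k,*}\in\widehat{\partial}g(z^{k})\Big\}.$$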

From [32, Theorem 8.9] we know that there is a close relation between the subdifferentials of $g$ at $\overline{z}$ and the normal cones of its epigraph at $(\overline{z},g(\overline{z}))$. Also, from [32, Exercise 8.14],

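$$\widehat{\mathcal{N}}_{S}(z)=\widehat{\partial}\delta_{S}(z)\ \ {\rm and}\ \ \mathcal{N}_{S}(z)=\partial\delta_{S}(z)\ \ {\rm for}\ z\in S.$$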

In the sequel, a point $z$ at which $0\in\partial g(z)$ (respectively, $0\in\widehat{\partial}g(z)$) is called a limiting (respectively, regular) critical point of $g$. By [32, Theorem 10.1], a local minimizer of $g$ is necessarily a regular critical point of $g$, and hence a limiting critical point.
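The gap between the two subdifferentials can be seen on a one-dimensional function. The following worked example uses $g(t)=-|t|$ on $\mathbb{R}$ at $\overline{z}=0$, chosen here only as an illustration of the definitions above.

```latex
\begin{align*}
z^{*}\in\widehat{\partial}g(0)
 &\iff \liminf_{t\to 0,\,t\neq 0}\frac{-|t|-z^{*}t}{|t|}
   =\min\{-1-z^{*},\,-1+z^{*}\}\geq 0,
   \quad\text{which is impossible, so}\ \widehat{\partial}g(0)=\emptyset;\\
\widehat{\partial}g(t) &= \{-1\}\ (t>0),\qquad \widehat{\partial}g(t)=\{1\}\ (t<0),
   \qquad\text{hence}\ \partial g(0)=\{-1,\,1\}.
\end{align*}
```

Here $0\notin\partial g(0)$, so $0$ is neither a regular nor a limiting critical point, consistent with $0$ being a local maximizer rather than a local minimizer of $g$.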

2.2 Lipschitz-like properties of multifunctions

Let $\mathcal{F}\!:\mathbb{Z}\rightrightarrows\mathbb{W}$ be a given multifunction. Consider an arbitrary point $(\overline{z},\overline{w})\in{\rm gph}\mathcal{F}$ at which $\mathcal{F}$ is locally closed, where ${\rm gph}\mathcal{F}$ denotes the graph of $\mathcal{F}$ . We recall from [32, 6] the concepts of the Aubin property, calmness and metric subregularity of $\mathcal{F}$ .

Definition 2.1

The multifunction $\mathcal{F}$ is said to have the Aubin property at $\overline{z}$ for $\overline{w}$ with modulus $\kappa>0$ , if there exist $\varepsilon>0$ and $\delta>0$ such that for all $z,z^{\prime}\in\mathbb{B}_{\varepsilon}(\overline{z})$ ,

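$$\mathcal{F}(z)\cap\mathbb{B}_{\delta}(\overline{w})\subseteq\mathcal{F}(z^{\prime})+\kappa\|z-z^{\prime}\|\mathbb{B}_{\mathbb{W}}.$$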

Definition 2.2

The multifunction $\mathcal{F}$ is said to be calm at $\overline{z}$ for $\overline{w}$ with modulus $\kappa>0$ if there exist $\varepsilon>0$ and $\delta>0$ such that for all $z\in\mathbb{B}_{\varepsilon}(\overline{z})$ ,

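$$\mathcal{F}(z)\cap\mathbb{B}_{\delta}(\overline{w})\subseteq\mathcal{F}(\overline{z})+\kappa\|z-\overline{z}\|\mathbb{B}_{\mathbb{W}}.$$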

If in addition $\mathcal{F}(\overline{z})\cap\mathbb{B}_{\delta}(\overline{w})=\{\overline{w}\}$ , $\mathcal{F}$ is said to be isolated calm at $\overline{z}$ for $\overline{w}$ .

By [6, Exercise 3H.4], the restriction on $z\in\mathbb{B}_{\varepsilon}(\overline{z})$ in Definition 2.2 can be removed. It is easily seen that the calmness of $\mathcal{F}$ is a “one-point” variant of the Aubin property, and the calmness of $\mathcal{F}$ at $(\overline{z},\overline{w})\in{\rm gph}\mathcal{F}$ is implied by its Aubin property or isolated calmness at this point. Notice that the calmness of $\mathcal{F}$ at $\overline{z}$ for $\overline{w}\in\mathcal{F}(\overline{z})$ is equivalent to the metric subregularity of $\mathcal{F}^{-1}$ at $\overline{w}$ for $\overline{z}\in\mathcal{F}^{-1}(\overline{w})$ by [6, Theorem 3H.3].
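That calmness is strictly weaker than the Aubin property can be checked on a single-valued map. The example below, with $\mathcal{F}(z)=\{z\sin(1/z)\}$ for $z\neq 0$ and $\mathcal{F}(0)=\{0\}$ on $\mathbb{R}$, is an illustrative choice and not part of the paper.

```latex
\begin{align*}
&\text{Calmness at } \overline{z}=0 \text{ for } \overline{w}=0:\quad
  |z\sin(1/z)-0|\leq 1\cdot|z-0|\quad\text{for all } z\neq 0,\ \text{so } \kappa=1 \text{ works}.\\
&\text{Failure of the Aubin property: take }
  z_{k}=\frac{1}{2\pi k+\pi/2},\quad z_{k}^{\prime}=\frac{1}{2\pi k-\pi/2}.\\
&\text{Then } |z_{k}\sin(1/z_{k})-z_{k}^{\prime}\sin(1/z_{k}^{\prime})|=z_{k}+z_{k}^{\prime},
  \qquad |z_{k}-z_{k}^{\prime}|=\pi z_{k}z_{k}^{\prime},\\
&\text{and}\quad \frac{z_{k}+z_{k}^{\prime}}{\pi z_{k}z_{k}^{\prime}}
  =\frac{1}{\pi}\Big(\frac{1}{z_{k}}+\frac{1}{z_{k}^{\prime}}\Big)=4k\to\infty .
\end{align*}
```

Since both sequences converge to $0$, no modulus $\kappa$ can satisfy the two-point inequality of Definition 2.1 near $(0,0)$, while the one-point inequality of Definition 2.2 holds globally.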

The coderivative and graphical derivative of $\mathcal{F}$ are convenient tools to characterize the Aubin property and the isolated calmness of $\mathcal{F}$, respectively. Recall from [32] that the coderivative of $\mathcal{F}$ at $\overline{z}$ for $\overline{w}$ is the mapping $D^{*}\mathcal{F}(\overline{z}|\overline{w})\!:\mathbb{W}\rightrightarrows\mathbb{Z}$ defined by

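$$u\in D^{*}\mathcal{F}(\overline{z}|\overline{w})(v)\Longleftrightarrow(u,-v)\in\mathcal{N}_{{\rm gph}\mathcal{F}}(\overline{z},\overline{w}),$$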

and the graphical derivative of $\mathcal{F}$ at $\overline{z}$ for $\overline{w}$ is the mapping $D\mathcal{F}(\overline{z}|\overline{w})\!:\mathbb{Z}\rightrightarrows\mathbb{W}$ given by

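$$v\in D\mathcal{F}(\overline{z}|\overline{w})(u)\Longleftrightarrow(u,v)\in\mathcal{T}_{{\rm gph}\mathcal{F}}(\overline{z},\overline{w}).$$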

Lemma 2.1

(See [25, Theorem 5.7] or [32, Theorem 9.40]) Suppose that $\mathcal{F}$ is locally closed at $(\overline{z},\overline{w})$ . Then $\mathcal{F}$ has the Aubin property at $\overline{z}$ for $\overline{w}$ iff $D^{*}\mathcal{F}(\overline{z}|\overline{w})(0)=\{0\}$ .

Lemma 2.2

(See [14, Proposition 2.1] or [18, Proposition 4.1]) Suppose that $\mathcal{F}$ is locally closed at $(\overline{z},\overline{w})$ . Then $\mathcal{F}$ is isolated calm at $\overline{z}$ for $\overline{w}$ iff $D\mathcal{F}(\overline{z}|\overline{w})(0)=\{0\}$ .

2.3 Coderivative of the subdifferential mapping $\partial\|\cdot\|_{*}$

For a given $X\in\mathbb{R}^{m\times n}$ with SVD as $U[{\rm Diag}(\sigma(X))\ \ 0]V^{\mathbb{T}}$ , by [35, Example 2] we have

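$$\partial\|X\|_{*}=\Big\{[U_{1}\ \ U_{2}]\left[\begin{matrix}I&0\\ 0&Z\end{matrix}\right][V_{1}\ \ V_{2}]^{\mathbb{T}}\ |\ \|Z\|\leq 1\Big\}$$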

where $U_{1}$ and $V_{1}$ are the submatrices consisting of the first $r={\rm rank}(X)$ columns of $U$ and $V$, respectively, and $U_{2}$ and $V_{2}$ are the submatrices consisting of the last $m-r$ columns of $U$ and the last $n-r$ columns of $V$, respectively. In this part we recall from [22] the coderivative of the subdifferential mapping $\partial\|\cdot\|_{*}$. For this purpose, in the sequel for two positive integers $k_{1}$ and $k_{2}$ with $k_{2}\geq k_{1}$, we denote by $[k_{1},k_{2}]$ the set $\{k_{1},k_{1}\!+\!1,\ldots,k_{2}\}$. For a given $\overline{Z}\in\mathbb{R}^{m\times n}$, define the following index sets associated to its singular values:

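$$\alpha:=\big\{i\in[1,m]\ |\ \sigma_{i}(\overline{Z})>1\big\},\ \ \beta:=\big\{i\in[1,m]\ |\ \sigma_{i}(\overline{Z})=1\big\},\ \ c:=[m+1,n],\tag{10a}$$

$$\gamma:=\gamma_{1}\cup\gamma_{0}\ \ {\rm for}\ \ \gamma_{1}:=\big\{i\in[1,m]\ |\ 0<\sigma_{i}(\overline{Z})<1\big\},\ \gamma_{0}:=\big\{i\in[1,m]\ |\ \sigma_{i}(\overline{Z})=0\big\},\tag{10b}$$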

and let $\Omega_{1},\Omega_{2}\in\mathbb{S}^{m}$ and $\Omega_{3}\in\mathbb{R}^{m\times(n-m)}$ be the matrices associated to $\sigma(\overline{Z})$ given by

[TABLE]

With the matrices $\Omega_{1},\Omega_{2}\in\mathbb{S}^{m}$ and $\Omega_{3}\in\mathbb{R}^{m\times(n-m)}$ , we define the following matrices

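$$\Theta_{1}:=\left[\begin{matrix}0_{\alpha\alpha}&0_{\alpha\beta}&(\Omega_{1})_{\alpha\gamma}\\ 0_{\beta\alpha}&0_{\beta\beta}&E_{\beta\gamma}\\ (\Omega_{1})_{\gamma\alpha}&E_{\gamma\beta}&E_{\gamma\gamma}\end{matrix}\right],\quad\Theta_{2}:=\left[\begin{matrix}E_{\alpha\alpha}&E_{\alpha\beta}&E_{\alpha\gamma}-(\Omega_{1})_{\alpha\gamma}\\ E_{\beta\alpha}&0_{\beta\beta}&0_{\beta\gamma}\\ E_{\gamma\alpha}-(\Omega_{1})_{\gamma\alpha}&0_{\gamma\beta}&0_{\gamma\gamma}\end{matrix}\right],$$

$$\Sigma_{1}:=\left[\begin{matrix}(\Omega_{2})_{\alpha\alpha}&(\Omega_{2})_{\alpha\beta}&(\Omega_{2})_{\alpha\gamma}\\ (\Omega_{2})_{\beta\alpha}&0_{\beta\beta}&E_{\beta\gamma}\\ (\Omega_{2})_{\gamma\alpha}&E_{\gamma\beta}&E_{\gamma\gamma}\end{matrix}\right],\quad\Sigma_{2}:=\left[\begin{matrix}E_{\alpha\alpha}-(\Omega_{2})_{\alpha\alpha}&E_{\alpha\beta}-(\Omega_{2})_{\alpha\beta}&E_{\alpha\gamma}-(\Omega_{2})_{\alpha\gamma}\\ E_{\beta\alpha}-(\Omega_{2})_{\beta\alpha}&0_{\beta\beta}&0_{\beta\gamma}\\ E_{\gamma\alpha}-(\Omega_{2})_{\gamma\alpha}&0_{\gamma\beta}&0_{\gamma\gamma}\end{matrix}\right].$$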

For the index set $\beta$ , we denote the set of all partitions of $\beta$ by $\mathscr{P}(\beta)$ . Define the set

$$\mathbb{R}_{>}^{|\beta|}:=\big\{z\in\mathbb{R}^{|\beta|}:\ z_{1}\geq\cdots\geq z_{|\beta|}>0\big\}.$$

For any $z\in\mathbb{R}_{>}^{|\beta|}$ , let $D(z)\in\mathbb{S}^{|\beta|}$ denote the first generalized divided difference matrix of $h(t)=\min(1,t)$ at $z$ , which is defined as

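$$(D(z))_{ij}:=\left\{\begin{array}{cl}\frac{\min(1,z_{i})-\min(1,z_{j})}{z_{i}-z_{j}}\in[0,1]&{\rm if}\ z_{i}\neq z_{j},\\ 0&{\rm if}\ z_{i}=z_{j}\geq 1,\\ 1&{\rm otherwise}.\end{array}\right.\tag{13}$$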

Write $\mathcal{U}_{|\beta|}:=\big{\{}\overline{\Omega}\in\mathbb{S}^{|\beta|}\!:\ \overline{\Omega}=\lim_{k\to\infty}D(z^{k}),\,z^{k}\to e_{|\beta|},\,z^{k}\in\mathbb{R}_{>}^{|\beta|}\big{\}}.$ For each $\Xi_{1}\in\mathcal{U}_{|\beta|}$ , by equation (13) there exists a partition $(\beta_{+},\beta_{0},\beta_{-})\in\mathscr{P}(\beta)$ such that

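$$\Xi_{1}=\left[\begin{matrix}0_{\beta_{+}\beta_{+}}&0_{\beta_{+}\beta_{0}}&(\Xi_{1})_{\beta_{+}\beta_{-}}\\ 0_{\beta_{0}\beta_{+}}&0_{\beta_{0}\beta_{0}}&E_{\beta_{0}\beta_{-}}\\ (\Xi_{1})_{\beta_{+}\beta_{-}}^{\mathbb{T}}&E_{\beta_{-}\beta_{0}}&E_{\beta_{-}\beta_{-}}\end{matrix}\right],$$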

where each entry of $(\Xi_{1})_{\beta_{+}\beta_{-}}$ belongs to $[0,1]$ . Let $\Xi_{2}$ be the matrix associated to $\Xi_{1}$ :

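$$\Xi_{2}=\left[\begin{matrix}E_{\beta_{+}\beta_{+}}&E_{\beta_{+}\beta_{0}}&E_{\beta_{+}\beta_{-}}-(\Xi_{1})_{\beta_{+}\beta_{-}}\\ E_{\beta_{0}\beta_{+}}&0_{\beta_{0}\beta_{0}}&0_{\beta_{0}\beta_{-}}\\ E_{\beta_{-}\beta_{+}}-(\Xi_{1})_{\beta_{+}\beta_{-}}^{\mathbb{T}}&0_{\beta_{-}\beta_{0}}&0_{\beta_{-}\beta_{-}}\end{matrix}\right].$$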
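The block pattern of the limits in $\mathcal{U}_{|\beta|}$ can be observed numerically. The sketch below evaluates the generalized divided difference matrix of $h(t)=\min(1,t)$ at a point close to the all-ones vector with one entry above $1$, one equal to $1$, and one below $1$. The function name and the test vector are assumptions made for illustration; the resulting matrix has zero blocks for indices at or above $1$, ones for the block below $1$, and an entry in $[0,1]$ coupling the "above" and "below" indices.

```python
def divided_difference_matrix(z):
    """First generalized divided difference matrix of h(t) = min(1, t) at z.

    z is expected to be positive and sorted in nonincreasing order.
    """
    def h(t):
        return min(1.0, t)

    n = len(z)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if z[i] != z[j]:
                D[i][j] = (h(z[i]) - h(z[j])) / (z[i] - z[j])
            elif z[i] >= 1.0:
                D[i][j] = 0.0   # equal entries at or above 1
            else:
                D[i][j] = 1.0   # equal entries strictly below 1
    return D


# A power of two keeps the floating-point differences exact.
eps = 2.0 ** -20
D = divided_difference_matrix([1.0 + eps, 1.0, 1.0 - eps])
for row in D:
    print(row)
```

With this symmetric perturbation the coupling entry between the first and last index is one half; other ways of approaching the all-ones vector give other values in $[0,1]$, which is why the set of limits is not a singleton.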

Now we are in a position to give the coderivative of the subdifferential mapping $\partial\|\cdot\|_{*}$ .

Lemma 2.3

(See [22, Theorem 3.2]) Fix an arbitrary $(X,W)\in\!{\rm gph}\,\partial\|\cdot\|_{*}$ and let $\alpha,\beta,\gamma$ and $c$ be defined by (10a)-(10b) with $\overline{Z}\!=X\!+W$ . Let $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{Z})$ with $\overline{V}=[\overline{V}_{1}\ \ \overline{V}_{2}]$ where $\overline{V}_{1}\in\mathbb{O}^{n\times m}$ and $\overline{V}_{2}\in\mathbb{O}^{n\times(n-m)}$ , and for each $H\in\mathbb{R}^{m\times n}$ write $\widetilde{H}=\overline{U}^{\mathbb{T}}\!H\overline{V}$ and $\widetilde{H}_{1}=\overline{U}^{\mathbb{T}}\!H\overline{V}_{1}$ . Then, $(G,H)\in\mathcal{N}_{{\rm gph}\,\partial\|\cdot\|_{*}}(X,W)$ iff the following relations hold

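$$\Theta_{1}\circ\mathcal{S}(\widetilde{H}_{1})+\Theta_{2}\circ\mathcal{S}(\widetilde{G}_{1})+\Sigma_{1}\circ\mathcal{X}(\widetilde{H}_{1})+\Sigma_{2}\circ\mathcal{X}(\widetilde{G}_{1})=0,$$

$$\widetilde{G}_{\alpha c}+(\Omega_{3})_{\alpha c}\circ(\widetilde{H}_{\alpha c}-\widetilde{G}_{\alpha c})=0,\ \ \widetilde{H}_{\beta c}=0,\ \ \widetilde{H}_{\gamma c}=0,$$

$$(\widetilde{G}_{\beta\beta},\widetilde{H}_{\beta\beta})\in\bigcup_{Q\in\mathbb{O}^{|\beta|}\atop\Xi_{1}\in\mathcal{U}_{|\beta|}}\left\{(M,N)\ \middle|\ \begin{array}{l}\Xi_{1}\circ\widehat{N}+\Xi_{2}\circ\mathcal{S}(\widehat{M})+\Xi_{2}\circ\mathcal{X}(\widehat{N})=0\\ \quad{\rm with}\ \widehat{N}=Q^{\mathbb{T}}NQ,\,\widehat{M}=Q^{\mathbb{T}}MQ,\\ \quad Q_{\beta_{0}}^{\mathbb{T}}MQ_{\beta_{0}}\preceq 0,\ Q_{\beta_{0}}^{\mathbb{T}}NQ_{\beta_{0}}\succeq 0\end{array}\right\}$$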

where $\mathcal{S}\!:\mathbb{R}^{m\times m}\to\mathbb{S}^{m}$ and $\mathcal{X}\!:\mathbb{R}^{m\times m}\to\mathbb{R}^{m\times m}$ are the linear mappings defined by

[TABLE]

and the notation “$\circ$” denotes the Hadamard product of two matrices.
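The index sets used in Lemma 2.3 depend only on the singular values of $\overline{Z}=X+W$. By the formula for $\partial\|X\|_{*}$ above, a pair $(X,W)\in{\rm gph}\,\partial\|\cdot\|_{*}$ satisfies $X+W=U_{1}({\rm Diag}(\sigma_{1}(X),\ldots,\sigma_{r}(X))+I)V_{1}^{\mathbb{T}}+U_{2}ZV_{2}^{\mathbb{T}}$ with $\|Z\|\leq 1$, so the singular values of $\overline{Z}$ are $\sigma_{i}(X)+1$ for $i\leq r$ together with those of $Z$. The following minimal sketch classifies the indices accordingly; the function name, the sample values, and the tolerance `tol` are assumptions of this illustration (the definitions in the text use exact comparisons).

```python
def index_sets(sigma_zbar, n, tol=1e-10):
    """Split 1-based indices by the singular values of Zbar = X + W.

    sigma_zbar: the m singular values of Zbar in nonincreasing order.
    n: number of columns of Zbar (n >= m).
    tol: numerical tolerance for the comparisons with 1 and 0 (assumed).
    """
    m = len(sigma_zbar)
    indexed = list(enumerate(sigma_zbar, 1))
    alpha = [i for i, s in indexed if s > 1.0 + tol]
    beta = [i for i, s in indexed if abs(s - 1.0) <= tol]
    gamma1 = [i for i, s in indexed if tol < s < 1.0 - tol]
    gamma0 = [i for i, s in indexed if s <= tol]
    return {
        "alpha": alpha,
        "beta": beta,
        "gamma1": gamma1,
        "gamma0": gamma0,
        "gamma": gamma1 + gamma0,
        "c": list(range(m + 1, n + 1)),
    }


# Hypothetical data: rank(X) = 2, and Z has singular values 1, 0.3, 0.
sigma_x = [2.0, 0.5]
sigma_z_block = [1.0, 0.3, 0.0]
sigma_zbar = [s + 1.0 for s in sigma_x] + sigma_z_block
sets = index_sets(sigma_zbar, n=7)
print(sets)
```

In this setting $|\alpha|={\rm rank}(X)$, while $\beta$, $\gamma_{1}$ and $\gamma_{0}$ record how the free block $Z$ of the subgradient $W$ sits relative to the unit spectral ball.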

3 Four classes of stationary points and their relations

To introduce the four classes of stationary points for the problem (1), with each $\phi\in\!\mathscr{L}$ , we write $\widehat{\phi}(t)\!:=\!\phi(|t|)$ for $t\in\mathbb{R}$ and $\widehat{\Phi}(x)\!:={\textstyle\sum_{i=1}^{m}}\widehat{\phi}(x_{i})$ for $x\in\mathbb{R}^{m}$ ; and with the associated $\psi$ , write $\widehat{\psi}(t)\!:=\!\psi(|t|)$ for $t\in\mathbb{R}$ and $\widehat{\Psi}(x)\!:={\textstyle\sum_{i=1}^{m}}\widehat{\psi}(x_{i})$ for $x\in\mathbb{R}^{m}$ . Clearly, $\widehat{\Phi}$ and $\widehat{\Psi}$ are absolutely symmetric, i.e., $\widehat{\Phi}(Px)=\widehat{\Phi}(x)$ and $\widehat{\Psi}(Px)=\widehat{\Psi}(x)$ for any $m\times m$ signed permutation matrix $P$ . Also, $\widehat{\Phi}\circ\sigma$ is globally Lipschitz continuous over the ball $\mathbb{B}$ . The following equivalent relations are often used in the subsequent analysis

[TABLE]

3.1 R-stationary point

Recall that $\overline{X}\in\mathbb{R}^{m\times n}$ is a regular critical point of $F$ if $0\in\widehat{\partial}F(\overline{X})$ . Since the rank function is regular by [37, Lemma 2.1] and [19, Corollary 7.5], by combining with the assumption on $f$ , from [32, Corollary 10.9] we have $\widehat{\partial}F(\overline{X})\supseteq\partial\!f(\overline{X})+\partial{\rm rank}(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ . In view of this, we introduce the following R-stationary point of the problem (1).

Definition 3.1

A matrix $\overline{X}\in\mathbb{R}^{m\times n}$ is called an R-stationary point of the problem (1) if

[TABLE]

Remark 3.1

Clearly, every R-stationary point of (1) is a regular critical point of $F$ . By the given assumption on $f$ and [32, Exercise 10.10], for any $X\in\Omega$ it holds that

[TABLE]

Thus, when $\partial({\rm rank}+\delta_{\Omega})(\overline{X})\subset\partial{\rm rank}(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$, a limiting critical point of $F$ is the same as a regular critical point of $F$, and coincides with an R-stationary point of (1).
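A scalar instance may make the notion tangible. The sketch below assumes that R-stationarity reads $0\in\nu\partial\!f(\overline{X})+\partial{\rm rank}(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$, as suggested by the inclusion preceding Definition 3.1, and takes $m=n=1$, $\Omega=\mathbb{R}$ and the hypothetical loss $f(x)=\frac{1}{2}(x-a)^{2}$ with $a\neq 0$; none of these choices comes from the paper.

```latex
\begin{align*}
&{\rm rank}(x)=\begin{cases}0 & x=0,\\ 1 & x\neq 0,\end{cases}
\qquad
\widehat{\partial}\,{\rm rank}(x)=\partial\,{\rm rank}(x)=\begin{cases}\mathbb{R} & x=0,\\ \{0\} & x\neq 0,\end{cases}\\
&0\in\nu(x-a)+\partial\,{\rm rank}(x)
 \iff x=0\ \ \text{or}\ \ x=a,\\
&F(0)=\tfrac{1}{2}\nu a^{2},\qquad F(a)=1 .
\end{align*}
```

Both points are local minimizers of $F$, and $x=0$ is a global minimizer exactly when $\nu a^{2}\leq 2$. Even in one dimension, then, the stationarity condition cannot tell the low-rank candidate from the data-fitting one, which is the kind of ambiguity a relation chart among stationary points is meant to organize.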

3.2 M-stationary point

By invoking the relation (18b), clearly, the MPEC (4) can be compactly written as

[TABLE]

Moreover, under a suitable constraint qualification (CQ), the following inclusion holds:

[TABLE]

Motivated by this, we introduce the M-stationary point of the problem (1) as follows.

Definition 3.2

A matrix $\overline{X}\in\mathbb{R}^{m\times n}$ is called an M-stationary point of the problem (1) associated to $\phi\in\mathscr{L}$ if there exist $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ and $\Delta W\in\partial(\widehat{\Phi}\circ\sigma)(\overline{W})$ such that

[TABLE]

Remark 3.2

When $\Omega\subseteq\mathbb{S}_{+}^{n}$ , the rank regularized problem (1) can be reformulated as

[TABLE]

Notice that $\langle I-W,X\rangle=0,X\in\mathbb{S}_{+}^{n},W\in\mathbb{S}_{+}^{n}$ and $I-W\in\mathbb{S}_{+}^{n}$ iff $(X,W\!-I)\in{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}$ and $W\in\mathbb{S}_{+}^{n}$ . So, for this case, $\overline{X}\in\Omega$ is an M-stationary point if and only if there exist $(\overline{Y},\Delta Y)\in\mathbb{S}_{+}^{n}\times\mathbb{S}^{n}$ with $\overline{Y}\!-I\in\mathcal{N}_{\mathbb{S}_{+}^{n}}(\overline{X})$ and $\Delta Y\in\partial(\widehat{\Phi}\circ\sigma)(\overline{Y})+\mathcal{N}_{\mathbb{S}_{+}^{n}}(\overline{Y})$ such that

[TABLE]

or equivalently, there exist $\overline{Y}\!\in\mathbb{S}_{+}^{n}$ with $\overline{Y}\!-I\in\mathcal{N}_{\mathbb{S}_{+}^{n}}(\overline{X})$ and $(\overline{\Gamma}_{1},\overline{\Gamma}_{2})\in\mathbb{S}^{n}\times\mathbb{S}^{n}$ such that

[TABLE]

For this class of stationary points, we have the following proposition, which is the key to establishing the relation between the M-stationary point and the R-stationary point.

Proposition 3.1

Let $\mathscr{L}_{1}$ denote the family of those $\phi\in\!\mathscr{L}$ that are differentiable on $(0,1]$. If $\overline{X}$ is an M-stationary point of the problem (1) associated to $\phi\in\!\mathscr{L}_{1}$, then there exist $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ and $\Delta\Gamma\in\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ such that for the index sets $\alpha,\beta,c,\gamma,\gamma_{1}$ and $\gamma_{0}$ defined as in (10a)-(10b) with $\overline{Z}=\overline{X}+\overline{W}$ and $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{Z})$,

[TABLE]

In particular, if $\phi^{\prime}(t)\neq 0$ for all $t\in(0,1)$ , then $\gamma_{1}=\emptyset$ ; and if $0\notin\partial\widehat{\phi}(0)$ , then $\gamma_{0}=\emptyset$ .

Proof: Let $\overline{X}$ be an M-stationary point of the problem (1) associated to $\phi\in\!\mathscr{L}_{1}$ . By Definition 3.2, there exist $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ and $\Delta W\in\partial(\widehat{\Phi}\circ\sigma)(\overline{W})$ such that (20) holds. So, there exists $\Delta\Gamma\in\nu\partial f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ such that $-\Delta\Gamma\in D^{*}\partial\|\cdot\|_{*}(\overline{X}|\overline{W})(\Delta W)$ . We argue that $\Delta\Gamma$ has the form of (23). Since $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{Z})$ , from $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ ,

[TABLE]

Since $\widehat{\Phi}$ is absolutely symmetric and $\Delta W\!\in\partial(\widehat{\Phi}\circ\sigma)(\overline{W})$ , by [20, Corollary 2.5] and equation (24b) there exist $(\widehat{U},\widehat{V})\in\mathbb{O}^{m,n}(\overline{W})$ and $\overline{w}\in\partial\widehat{\Phi}(\sigma(\overline{W}))$ such that

[TABLE]

From $\overline{w}\in\partial\widehat{\Phi}(\sigma(\overline{W}))$ , the definition of $\widehat{\Phi}$ and equation (24b), it follows that

[TABLE]

Without loss of generality, we assume that the matrix $\overline{Z}$ has $r$ distinct singular values belonging to $(0,1)$ . Let $\overline{\mu}_{1}>\overline{\mu}_{2}>\cdots>\overline{\mu}_{r}$ be the $r$ distinct singular values and write

[TABLE]

Since $(\widehat{U},\widehat{V})\in\mathbb{O}^{m,n}(\overline{W})$ , from equation (24b) and [4, Proposition 5], there exist a block diagonal matrix $\widehat{Q}={\rm Diag}(Q_{0},Q_{1},\ldots,Q_{r})$ with $Q_{0}\in\mathbb{O}^{|\alpha|+|\beta|}$ and $Q_{k}\in\mathbb{O}^{|a_{k}|}$ for $k=1,2,\ldots,r$ , and orthogonal matrices $Q^{\prime}\in\mathbb{O}^{|\gamma_{0}|}$ and $Q^{\prime\prime}\in\mathbb{O}^{|\gamma_{0}\cup c|}$ such that

[TABLE]

Together with equations (25) and (26), it is not difficult to obtain that

[TABLE]

and consequently

[TABLE]

Since $(-\Delta\Gamma,-\Delta W)\in\!\mathcal{N}_{{\rm gph}\,\partial\|\cdot\|_{*}}(\overline{X},\overline{W})$ , by equation (16a)-(16b) of Lemma 2.3, we get

[TABLE]

where $\Delta\widetilde{\Gamma}_{1}\!:=\overline{U}^{\mathbb{T}}\Delta\Gamma\overline{V}_{\!\alpha\cup\beta\cup\gamma},\Delta\widetilde{W}_{1}\!:=\overline{U}^{\mathbb{T}}\Delta W\overline{V}_{\!\alpha\cup\beta\cup\gamma}$ , and the matrices $\Theta_{1},\Theta_{2},\Sigma_{1}$ and $\Sigma_{2}$ are defined as in Section 2.3. Notice that $[\Delta\widetilde{W}_{1}]_{\alpha\cup\beta\cup\gamma_{1},\alpha\cup\beta\cup\gamma_{1}}$ is a diagonal matrix by equation (27). Together with (28a)-(28b) and (11c)-(11i), it follows that

[TABLE]

Notice that (29b) is equivalent to $(E+\Sigma_{2})_{\alpha\alpha}(\Delta\widetilde{\Gamma}_{1})_{\alpha\alpha}+(E-\Sigma_{2})_{\alpha\alpha}(\Delta\widetilde{\Gamma}_{1}^{\mathbb{T}})_{\alpha\alpha}=0$ which, by the fact that the entries of $\Sigma_{2}$ belong to $(0,1)$ , implies that $(\Delta\widetilde{\Gamma}_{1})_{\alpha\alpha}=0$ . Notice that equations (29c) and (29d) can be equivalently written as

[TABLE]

Since $[(\Delta\widetilde{\Gamma}_{1})_{\beta\alpha}]^{\mathbb{T}}=(\Delta\widetilde{\Gamma}_{1}^{\mathbb{T}})_{\alpha\beta}$ and $[(\Delta\widetilde{\Gamma}_{1}^{\mathbb{T}})_{\beta\alpha}]^{\mathbb{T}}=(\Delta\widetilde{\Gamma}_{1})_{\alpha\beta}$ , by transposing both sides of equality (30b) we immediately obtain that

[TABLE]

where “ $\oslash$ ” denotes the entrywise division of two matrices. Substituting this equality into (30a) yields that $(\Delta\widetilde{\Gamma}_{1})_{\alpha\beta}=0$ , and then $(\Delta\widetilde{\Gamma}_{1})_{\beta\alpha}=0$ . Similarly, from (29e) and (29f), we can obtain $(\Delta\widetilde{\Gamma}_{1})_{\alpha\gamma}=0$ and $(\Delta\widetilde{\Gamma}_{1})_{\beta\gamma}=0$ . Thus,

[TABLE]

Thus, to complete the proof of the first part, we only need to argue that $\mathcal{S}[(\Delta\widetilde{\Gamma})_{\beta\beta}]=0$ . Since $(-\Delta\Gamma,-\Delta W)\in\!\mathcal{N}_{{\rm gph}\,\partial\|\cdot\|_{*}}(\overline{X},\overline{W})$ , by (16f) there exist $Q\in\mathbb{O}^{|\beta|}$ and $\Xi_{1}\in\mathcal{U}_{|\beta|}$ having the form (14) for some partition $(\beta_{+},\beta_{0},\beta_{-})$ of $\beta$ such that

[TABLE]

where the matrix $\Xi_{2}$ associated with $\Xi_{1}$ has the form of (15). From (27) and the first equality in (26), $(\Delta\widetilde{W})_{\beta\beta}={\rm Diag}(\overline{w}_{\beta})=\phi^{\prime}(1)I$ . Notice that $\phi^{\prime}(1)>0$ by (2). We deduce $\beta_{0}=\emptyset$ from the second inequality of (32). Since $\mathcal{X}[Q^{\mathbb{T}}(\Delta\widetilde{W})_{\beta\beta}Q]=0$ , (31) reduces to

[TABLE]

Since $Q^{\mathbb{T}}{\rm Diag}(\overline{w}_{\beta})Q\succ 0$ , by using the expressions of $\Xi_{1}$ and $\Xi_{2}$ we have $\beta_{-}=\emptyset$ , and then the last equality reduces to $0=\mathcal{S}\big{[}Q^{\mathbb{T}}(\Delta\widetilde{\Gamma})_{\beta\beta}Q\big{]}=\mathcal{S}[(\Delta\widetilde{\Gamma})_{\beta\beta}]$ . Thus, we complete the proof of the first part. By combining $(\Delta\widetilde{W})_{\gamma\gamma}=0$ with (27) and (26), it is easy to see that if $\phi^{\prime}(t)\neq 0$ for any $t\in(0,1)$ , then $\gamma_{1}=\emptyset$ ; and if $0\notin\partial\widehat{\phi}(0)$ , then $\gamma_{0}=\emptyset$ . $\Box$

Now we state the relation between the M-stationary point and the R-stationary point.

Theorem 3.1

If $\overline{X}$ is an M-stationary point of the problem (1) associated to $\phi\in\!\mathscr{L}_{1}$ , then it is also an R-stationary point. Conversely, if $\overline{X}$ is an R-stationary point of (1), then it is an M-stationary point associated to every $\phi\in\!\mathscr{L}$ with $0\in\partial\widehat{\phi}(0)$ .

Proof: Let $\overline{X}$ be an M-stationary point of (1) associated to $\phi\in\!\mathscr{L}_{1}$ . By Proposition 3.1, there exist $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ and $\Delta\Gamma\in\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ such that for the index sets $\alpha,\beta,c,\gamma,\gamma_{1},\gamma_{0}$ defined as in (10a)-(10b) with $\overline{Z}=\overline{X}+\overline{W}$ and $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{Z})$ , the matrix $\Delta\Gamma$ takes the form of (23). Let $\Delta\widetilde{Z}=\left[\begin{matrix}(\Delta\widetilde{\Gamma})_{\beta\beta}&(\Delta\widetilde{\Gamma})_{\beta\gamma}&(\Delta\widetilde{\Gamma})_{\beta c}\\ (\Delta\widetilde{\Gamma})_{\gamma\beta}&(\Delta\widetilde{\Gamma})_{\gamma\gamma}&(\Delta\widetilde{\Gamma})_{\gamma c}\end{matrix}\right].$ Take $(P,P^{\prime})\in\mathbb{O}^{m-|\alpha|,n-|\alpha|}(\Delta\widetilde{Z})$ . Write $\widetilde{U}=[\overline{U}_{\alpha}\ \ \overline{U}_{\!\beta\cup\gamma}P]$ and $\widetilde{V}=[\overline{V}_{\alpha}\ \ \overline{V}_{\!\beta\cup\gamma\cup c}P^{\prime}]$ . Then,

[TABLE]

By the definitions of $\widetilde{U}$ and $\widetilde{V}$ and (24a), it is easy to check that $(\widetilde{U},\widetilde{V})\in\mathbb{O}^{m,n}(\overline{X})$ . Notice that ${\rm rank}(\overline{X})=|\alpha|$ . From [17, Theorem 4], we have $-\Delta\Gamma\in\partial{\rm rank}(\overline{X})$ . Thus, $0\in\partial{\rm rank}(\overline{X})+\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ . From Definition 3.1, $\overline{X}$ is an R-stationary point.

Now let $\overline{X}$ be an R-stationary point of (1) with ${\rm rank}(\overline{X})=\overline{r}$ . Suppose first that $\overline{r}\geq 1$ . Take $\phi\in\!\mathscr{L}$ with $0\in\partial\widehat{\phi}(0)$ . By Definition 3.1, there is $\Delta\Gamma\in\!\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ such that $-\Delta\Gamma\in\partial{\rm rank}(\overline{X})$ . By [17, Theorem 4], there exists $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{X})$ such that

[TABLE]

Next we consider the two cases $t^{*}=0$ and $t^{*}\neq 0$ separately, where $t^{*}$ is the same as in (2).

Case 1: $t^{*}=0$ . Take $\overline{W}:=\overline{U}_{1}\overline{V}_{1}^{\mathbb{T}}$ , where $\overline{U}_{1}$ and $\overline{V}_{1}$ are the matrices consisting of the first $\overline{r}$ columns of $\overline{U}$ and $\overline{V}$ , respectively. Clearly, $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ and $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{Z})$ with $\overline{Z}=\overline{X}+\overline{W}$ . Let $\alpha,\beta,c,\gamma_{0},\gamma_{1}$ be defined as before. Clearly, $\beta=\emptyset=\gamma_{1}$ . Take

[TABLE]

Since $\phi$ is convex, from [32, Proposition 10.19(i)] it follows that $\overline{w}_{i}\in\partial\widehat{\phi}(1)$ for $i\in\alpha$ . Then $\Delta W\!=\overline{U}[{\rm Diag}(\overline{w})\ \ 0]\overline{V}^{\mathbb{T}}\!\in\partial(\widehat{\Phi}\circ\sigma)(\overline{W})$ . Let $\Delta\widetilde{\Gamma}\!:=\overline{U}^{\mathbb{T}}\Delta\Gamma\overline{V}$ and $\Delta\widetilde{W}\!:=\overline{U}^{\mathbb{T}}\Delta W\overline{V}$ . Clearly, $\mathcal{X}(\Delta\widetilde{\Gamma}_{1})=\mathcal{X}(\Delta\widetilde{W}_{1})=0$ where $\Delta\widetilde{\Gamma}_{1}:=\overline{U}^{\mathbb{T}}\Delta\Gamma\overline{V}_{1}$ and $\Delta\widetilde{W}_{1}:=\overline{U}^{\mathbb{T}}\Delta W\overline{V}_{1}$ with $\overline{V}_{1}$ being the matrix consisting of the first $m$ columns of $\overline{V}$ . Together with $\Theta_{2}$ and $\Sigma_{2}$ defined as in Section 2.3, it is immediate to verify that $(-\Delta\widetilde{\Gamma},-\Delta\widetilde{W})$ satisfies

[TABLE]

Since $\beta=\emptyset$ , from Lemma 2.3 it follows that $(-\Delta\Gamma,-\Delta W)\in\mathcal{N}_{{\rm gph}\,\partial\|\cdot\|_{*}}(\overline{X},\overline{W})$ , i.e., $-\Delta\Gamma\in\!D^{*}\partial\|\cdot\|_{*}(\overline{X}|\overline{W})(\Delta W)$ . By Definition 3.2, $\overline{X}$ is M-stationary associated to $\phi$ .

Case 2: $t^{*}\neq 0$ . Now $t^{*}\in(0,1)$ . Take $\overline{W}:=\overline{U}_{1}\overline{V}_{1}^{\mathbb{T}}+t^{*}\overline{U}_{2}\overline{V}_{2}^{\mathbb{T}}$ , where $\overline{U}_{2}$ and $\overline{V}_{2}$ are the matrices consisting of the last $m-\overline{r}$ and $n-\overline{r}$ columns of $\overline{U}$ and $\overline{V}$ , respectively. Clearly, $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ and $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{Z})$ with $\overline{Z}=\overline{X}+\overline{W}$ . Let $\alpha,\beta,c$ and $\gamma=\gamma_{0}\cup\gamma_{1}$ be defined as before. Then $\beta=\emptyset$ and $\gamma_{0}=\emptyset$ . Let $\Delta W=\overline{U}[{\rm Diag}(\overline{w})\ \ 0]\overline{V}^{\mathbb{T}}$ with

[TABLE]

Using the same arguments as those for Case 1, one can prove that $\overline{X}$ is M-stationary.

When $\overline{r}=0$ , choose $\overline{W}=0$ . Clearly, $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ since $\overline{X}=0$ . Write $\overline{Z}=\overline{X}+\overline{W}$ . Then, $\alpha=\beta=\emptyset=\gamma_{1}$ . Take $\Delta W=0$ . Since $0\in\partial\widehat{\phi}(0)$ , we have $\Delta W\in\partial(\widehat{\Phi}\circ\sigma)(\overline{W})$ . Moreover, by Lemma 2.3 it is easy to check that $D^{*}\partial\|\cdot\|_{*}(\overline{X}|\overline{W})(\Delta W)=\mathbb{R}^{m\times n}.$ Thus, $\overline{X}$ is M-stationary associated to $\phi$ . The proof is then completed. $\Box$
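The choice of $\overline{W}$ in Case 2 can be illustrated numerically. The sketch below is a hypothetical instance, assuming NumPy, a made-up value `t_star = 0.3`, and the reading of $\overline{W}$ as $\overline{U}[{\rm Diag}(\overline{w})\ \ 0]\overline{V}^{\mathbb{T}}$ with $\overline{w}_{i}=1$ on the first $\overline{r}$ indices and $\overline{w}_{i}=t^{*}$ on the rest. It checks $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ through the standard characterization $\|\overline{W}\|\leq 1$ and $\langle\overline{W},\overline{X}\rangle=\|\overline{X}\|_{*}$. The index sets are computed under the assumption that $\alpha,\beta,\gamma_{1},\gamma_{0}$ collect the singular values of $\overline{Z}$ that are greater than 1, equal to 1, in $(0,1)$, and equal to 0, respectively; definitions (10a)-(10b) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r_bar = 4, 6, 2
t_star = 0.3          # hypothetical value of t* in (0, 1)
TOL = 1e-10

# A common pair of orthogonal factors (U, V), as in the proof.
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

def from_singular_values(s):
    """Return U [Diag(s) 0] V^T for a vector s of length m."""
    S = np.zeros((m, n))
    S[:, :m] = np.diag(s)
    return U @ S @ V.T

sigma_x = np.array([3.0, 1.5, 0.0, 0.0])             # rank(X) = r_bar
X = from_singular_values(sigma_x)
w = np.concatenate([np.ones(r_bar), t_star * np.ones(m - r_bar)])
W = from_singular_values(w)

# W in the subdifferential of the nuclear norm at X:
# spectral norm at most 1 and <W, X> equal to the nuclear norm of X.
spectral_norm_W = np.linalg.norm(W, 2)
inner_WX = np.sum(W * X)
nuclear_norm_X = np.linalg.norm(X, "nuc")
in_subdifferential = bool(spectral_norm_W <= 1 + TOL
                          and abs(inner_WX - nuclear_norm_X) <= TOL)

# Index sets from the singular values of Z = X + W (assumed definitions).
sigma_z = np.linalg.svd(X + W, compute_uv=False)
alpha = [i for i, s in enumerate(sigma_z) if s > 1 + TOL]
beta = [i for i, s in enumerate(sigma_z) if abs(s - 1) <= TOL]
gamma_1 = [i for i, s in enumerate(sigma_z) if TOL < s < 1 - TOL]
gamma_0 = [i for i, s in enumerate(sigma_z) if s <= TOL]
print(in_subdifferential, alpha, beta, gamma_1, gamma_0)
```

In this instance $\beta$ and $\gamma_{0}$ come out empty, which matches the claim made in Case 2.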

To close this subsection, we provide a condition for a local minimizer of the MPEC (1) associated with $\phi\in\!\mathscr{L}$ to be an M-stationary point associated to $\phi$ .

Proposition 3.2

Let $(\overline{W},\overline{X})$ be a local minimizer of the MPEC (1) associated to $\phi\in\!\mathscr{L}$ . Then $\overline{X}$ is an M-stationary point of the problem (1) associated to $\phi$ , provided that

[TABLE]

where $\mathbb{B}:=\{Z\in\mathbb{R}^{m\times n}\,|\,\|Z\|\leq 1\}$ , and if in addition $\phi\in\!\mathscr{L}_{1}$ , then $\overline{X}$ is an R-stationary point.

Proof: By invoking the relation (18a), $(W,X)$ is a feasible point of (1) if and only if $(W,X)\in{\rm gph}\,\mathcal{N}_{\mathbb{B}}\cap(\mathbb{R}^{m\times n}\times\Omega)$ . This implies that (1) can be compactly written as

[TABLE]

From the local optimality of $(\overline{W},\overline{X})$ , the assumption on $f$ , the Lipschitz continuity of $\widehat{\Phi}\circ\sigma$ over the ball $\mathbb{B}$ , and [32, Theorem 10.1 $\&$ Exercise 10.10], it follows that

[TABLE]

where $\widetilde{f}(W,X)\equiv\nu f(X)$ . Together with the inclusion (36) and [32, Exercise 10.10],

[TABLE]

which is equivalent to saying that there exists $(-\Delta W,\Delta X)\in\mathcal{N}_{{\rm gph}\,\mathcal{N}_{\mathbb{B}}}(\overline{W},\overline{X})$ such that

[TABLE]

Notice that $(-\Delta W,\Delta X)\in\mathcal{N}_{{\rm gph}\,\mathcal{N}_{\mathbb{B}}}(\overline{W},\overline{X})$ if and only if $\Delta X\in D^{*}\partial\|\cdot\|_{*}(\overline{X}|\overline{W})(\Delta W)$ . So, equation (37) is equivalent to saying that there exists $\Delta W\in\partial(\widehat{\Phi}\circ\sigma)(\overline{W})$ such that

[TABLE]

In addition, notice that $\overline{X}\in\mathcal{N}_{\mathbb{B}}(\overline{W})$ , which is equivalent to $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ by (18b). Thus, by Definition 3.2, $\overline{X}$ is an M-stationary point of the problem (1) associated to $\phi$ . The second part is a direct consequence of Theorem 3.1. The proof is completed. $\Box$
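The equivalence between $\overline{X}\in\mathcal{N}_{\mathbb{B}}(\overline{W})$ and $\overline{W}\in\partial\|\cdot\|_{*}(\overline{X})$ used in the proof can be probed by sampling. The following is a minimal sketch, assuming NumPy and a hypothetical pair $(X,W)$ with shared singular vectors; it tests the defining inequality $\langle X,Z-W\rangle\leq 0$ of the normal cone to the convex set $\mathbb{B}$ on random points $Z\in\mathbb{B}$. Random sampling can only fail to find a violation; it is not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 5
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))

def from_singular_values(s):
    """Return U [Diag(s) 0] V^T for a vector s of length m."""
    S = np.zeros((m, n))
    S[:, :m] = np.diag(s)
    return U @ S @ V.T

X = from_singular_values(np.array([2.0, 0.7, 0.0]))
W = from_singular_values(np.array([1.0, 1.0, 0.4]))   # element of the subdifferential at X

def sample_spectral_ball():
    """Random matrix Z with spectral norm at most 1."""
    G = rng.standard_normal((m, n))
    return rng.uniform() * G / np.linalg.norm(G, 2)

# X in N_B(W) means <X, Z - W> <= 0 for every Z in B.
worst = max(np.sum(X * (sample_spectral_ball() - W)) for _ in range(2000))

# Counterexample: 0.5 * W lies in B but is not a subgradient at X,
# and Z = W itself violates the normal cone inequality at 0.5 * W.
W_scaled = 0.5 * W
violation = np.sum(X * (W - W_scaled))
print(worst, violation)
```

No sampled $Z$ violates the inequality at $W$, while the scaled matrix $0.5W$ fails it with the explicit witness $Z=W$.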

Remark 3.3

(i) If $\Omega=\mathbb{R}^{m\times n}$ , the inclusion (36) automatically holds. If $\Omega\subset\mathbb{R}^{m\times n}$ , by [13, Page 211] the inclusion (36) is implied by the calmness of the following multifunction

[TABLE]

at the origin for $(\overline{W},\overline{X})$ , where $(\overline{W},\overline{X})$ is an arbitrary feasible point of the MPEC (1).

(ii) When $\Omega\subseteq\mathbb{S}^{n}_{+}$ , together with (3.2) and the Lipschitz continuity of $\widehat{\Phi}\circ\sigma$ in $\mathbb{B}$ , in order to achieve the conclusion of Proposition 3.2, we need to replace the inclusion (36) by

[TABLE]

where $C:=\{(W,X)\in\mathbb{S}_{+}^{n}\times\Omega\ |\ (X,W-I)\in{\rm gph}\,\mathcal{N}_{\mathbb{S}_{+}^{n}}\}$ . By invoking [13, Page 211], this inclusion is implied by the calmness of the following multifunction

[TABLE]

at the origin for $(\overline{W},\overline{X})$ , where $(\overline{W},\overline{X})$ is an arbitrary feasible point of the MPEC (3.2). By the definition of calmness, it is easy to check that the calmness of $\mathcal{M}$ at the origin for $(\overline{W},\overline{X})$ is implied by that of $\mathcal{\widetilde{M}}$ in (4) with $\Omega_{x}=\Omega$ and $\Omega_{y}=\mathbb{S}_{+}^{n}$ at the corresponding point, while by Theorem 4.2 the latter holds if for any $0\neq H=(H_{1};H_{2})\in\!\mathcal{T}_{\Omega}(\overline{X})\times\mathcal{T}_{\mathbb{S}^{n}_{+}}(\overline{W})$ such that $(H_{1},H_{2})\in\!\mathcal{T}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}(\overline{X},\overline{W}-I),$ the following implication relation holds:

[TABLE]

For the characterization of $\mathcal{N}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}((\overline{X},\overline{W}-I);(H_{1},H_{2}))$ , see the Appendix.

3.3 EP-stationary points

By the definition of the function $\widehat{\Psi}$ , clearly, the problem (1) can be compactly written as

[TABLE]

Based on this equivalent reformulation, we introduce the following stationary point.

Definition 3.3

A matrix $\overline{X}\in\mathbb{R}^{m\times n}$ is said to be an EP-stationary point of the problem (1) associated to $\phi\in\!\mathscr{L}$ if there exist a constant $\rho>0$ and $\overline{W}\!\in\mathbb{B}$ such that

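$$\rho\overline{X}\in\partial(\widehat{\Psi}\circ\sigma)(\overline{W})\quad{\rm and}\quad 0\in\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})+\rho\big[\partial\|\cdot\|_{*}(\overline{X})-\overline{W}\big].\qquad(44)$$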

Remark 3.4

By the given assumption on $f$ and the Lipschitz continuity of $\widehat{\Phi}\circ\sigma$ in $\mathbb{B}$ , if $(\overline{X},\overline{W})$ is a limiting critical point of the objective function of (43), it is an EP-stationary point of (1). Thus, every local optimal solution of (1) is an EP-stationary point of (1).

The following proposition characterizes a key property of the EP-stationary point.

Proposition 3.3

Suppose that $\overline{X}\in\mathbb{R}^{m\times n}$ is an EP-stationary point of (1) associated to $\phi\in\!\mathscr{L}$ . Then, there exist $\overline{W}\in\mathbb{B}$ and $(\overline{U},\overline{V})\in\mathbb{O}^{m,n}(\overline{W})\cap\mathbb{O}^{m,n}(\overline{X})$ such that ${\rm rank}(\overline{X})\geq|\{i\ |\ \sigma_{i}(\overline{W})>t^{*}\}|$ , and there exists $\Delta\Gamma\in\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ such that

[TABLE]

Proof: Since $\overline{X}$ is an EP-stationary point of the problem (1), there exist a constant $\rho>0$ and a matrix $\overline{W}\in\mathbb{B}$ such that the inclusions in (44) hold. Define the index sets

[TABLE]

Since $\rho\overline{X}\in\partial(\widehat{\Psi}\circ\sigma)(\overline{W})$ , by [20, Corollary 2.5] there exists $(\overline{U},\overline{V})\!\in\!\mathbb{O}^{m,n}(\overline{W})$ such that

[TABLE]

Notice that $\sigma_{1}(X)\geq\cdots\geq\sigma_{m}(X)$ with $\sigma_{i}(X)\in\partial\psi(1)$ for $i\in\theta_{1}$ , $\sigma_{i}(X)\in\partial\psi(\sigma_{i}(\overline{W}))$ for $i\in\theta_{2}$ and $\sigma_{i}(X)\in\partial\widehat{\psi}(0)$ for $i\in\theta_{0}$ . Since $\partial\psi(t)\subset(0,+\infty)$ for any $t>t^{*}$ , we have ${\rm rank}(\overline{X})\geq|\{i\ |\ \sigma_{i}(\overline{W})>t^{*}\}|$ , and the first part follows. Since $(\overline{X},\overline{W})$ satisfies the second inclusion of (44), there exists $\Delta\Gamma\in\nu\partial f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})$ such that $-\Delta\Gamma\in\rho[\partial\|\cdot\|_{*}(\overline{X})-\overline{W}]$ . Write $\overline{r}={\rm rank}(\overline{X})$ . From the SVD of $\overline{X}$ in the last equation and equation (9), we have

[TABLE]

where $\overline{U}_{1}$ and $\overline{V}_{1}$ are the matrices consisting of the first $\overline{r}$ columns of $\overline{U}$ and $\overline{V}$ , respectively, and $\overline{U}_{2}$ and $\overline{V}_{2}$ are the matrices consisting of the last $m-\overline{r}$ and $n-\overline{r}$ columns of $\overline{U}$ and $\overline{V}$ , respectively. Together with $-\Delta\Gamma\in\rho[\partial\|\cdot\|_{*}(\overline{X})-\overline{W}]$ and $\overline{W}=\overline{U}[{\rm Diag}(\sigma(\overline{W}))\ \ 0]\overline{V}^{\mathbb{T}}$ , the inclusion in (45) holds. In fact, the matrix $Z$ in the set of (45) has the following form

[TABLE]

for some $\Gamma\in\mathbb{R}^{(m-\overline{r})\times(n-\overline{r})}$ with $\|\Gamma\|\leq 1$ , where $\theta_{2}^{\prime}:=\{i\in\theta_{2}\ |\ \sigma_{i}(\overline{W})\leq t^{*}\}$ . $\Box$

Remark 3.5

If $\overline{X}$ is an EP-stationary point of (1) and the associated $\overline{W}\in\mathbb{B}$ is such that $|\theta_{1}|={\rm rank}(\overline{X})$ , then by Definition 3.1 and [17, Theorem 4] $\overline{X}$ is an R-stationary point of (1). However, when $\overline{X}$ is an R-stationary point, it is not necessarily EP-stationary.

3.4 DC-stationary point

With the conjugate $\widehat{\Psi}^{*}$ of $\widehat{\Psi}$ , the surrogate problem (6) can be equivalently written as

[TABLE]

By [20, Lemma 2.3], we know that $\widehat{\Psi}^{*}$ is also absolutely symmetric. Along with its lsc and convexity, from [20, Corollary 2.6] it follows that $\widehat{\Psi}^{*}\circ\sigma$ is an absolutely symmetric convex function on $\mathbb{R}^{m}$ . Thus, $\delta_{\Omega}(X)+\rho\|X\|_{*}-(\widehat{\Psi}^{*}\circ\sigma)(\rho X)$ is a DC function on $\mathbb{R}^{m\times n}$ . In view of this, we present the following DC-stationary point by the reformulation (46).

Definition 3.4

A matrix $\overline{X}\in\mathbb{R}^{m\times n}$ is called a DC-stationary point of the problem (1) associated to $\phi\in\!\mathscr{L}$ if there exists a constant $\rho>0$ such that

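$$0\in\nu\partial\!f(\overline{X})+\mathcal{N}_{\Omega}(\overline{X})+\rho\big[\partial\|\cdot\|_{*}(\overline{X})-\partial(\widehat{\Psi}^{*}\circ\sigma)(\rho\overline{X})\big].\qquad(47)$$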

When $f$ is convex, the problem (46) is a DC program, and now $\overline{X}\in\mathbb{R}^{m\times n}$ is a DC-stationary point if and only if it is a critical point of the objective function of (46) in the sense of Pang et al. [29]. It is worthwhile to point out that every limiting critical point of the objective function of (46) is a DC-stationary point, but the converse does not hold. For the discussion on the DC-stationary point, the reader may refer to [29]. Here, we focus on the relation between the DC-stationary point and the EP-stationary point.

Theorem 3.2

Let $\overline{X}$ be a DC-stationary point of (1) associated to $\phi\in\!\mathscr{L}$ . Suppose that

[TABLE]

Then $\overline{X}$ is an EP-stationary point. Conversely, if $\overline{X}$ is an EP-stationary point associated to $\phi\in\!\mathscr{L}$ with $\phi$ nondecreasing on $[0,1]$ , then $\overline{X}$ is necessarily a DC-stationary point.

Proof: From the symmetry of $\widehat{\psi}$ , it follows that $\widehat{\psi}^{*}(s)=\psi^{*}(|s|)$ for any $s\in\mathbb{R}$ . Together with the given assumption, we have $(\widehat{\psi}^{*})^{\prime}(s)=(\psi^{*})^{\prime}(s)$ for any $s\geq 0$ . By the differentiability of $\widehat{\psi}^{*}$ on $\mathbb{R}_{+}$ , clearly, $\widehat{\Psi}^{*}$ is differentiable on $\mathbb{R}_{+}^{m}$ . Along with its absolute symmetry and convexity, from [20, Theorem 3.1] it follows that $\widehat{\Psi}^{*}\circ\sigma$ is differentiable in $\mathbb{R}^{m\times n}$ , and consequently $\partial(\widehat{\Psi}^{*}\circ\sigma)(\rho\overline{X})=\{\nabla(\widehat{\Psi}^{*}\circ\sigma)(\rho\overline{X})\}.$ Since $\overline{X}$ is a DC-stationary point of (1), there exists a constant $\rho>0$ such that (47) holds. Take $(\overline{U},\overline{V})\in\!\mathbb{O}^{m,n}(\overline{X})$ . Let

[TABLE]

Since $\psi$ is a closed proper convex function, we have ${\rm range}\,\partial\psi^{*}\subseteq{\rm dom}\psi=[0,1]$ by [31, Section 23], which implies that $\overline{w}_{i}\in[0,1]$ for $i=1,\ldots,m$ and consequently $\|\overline{W}\|\leq 1$ . Combining $\overline{w}_{i}=(\psi^{*})^{\prime}(\rho\sigma_{i}(\overline{X}))$ with [31, Corollary 23.5.1], we obtain

[TABLE]

where the second inclusion is due to $\overline{w}_{i}\in[0,1]$ and $\partial\psi(0)=\partial\widehat{\psi}(0)$ . By the definition of $\widehat{\Psi}$ , it is not hard to obtain $\rho\overline{X}\in\partial(\widehat{\Psi}\circ\sigma)(\overline{W})$ . Thus, by Definition 3.3 and (47), to achieve the first part we only need to argue that $\overline{W}=\nabla(\widehat{\Psi}^{*}\circ\sigma)(\rho\overline{X})$ . Since $\overline{w}_{i}=(\psi^{*})^{\prime}(\rho\sigma_{i}(\overline{X}))$ for each $i$ and $(\psi^{*})^{\prime}(s)=(\widehat{\psi}^{*})^{\prime}(s)$ for all $s\geq 0$ , we have $\overline{w}_{i}=(\widehat{\psi}^{*})^{\prime}(\rho\sigma_{i}(\overline{X}))$ for each $i$ . This along with the expression of $\widehat{\Psi}^{*}$ means that $\overline{W}=\nabla(\widehat{\Psi}^{*}\circ\sigma)(\rho\overline{X})$ .

Now suppose $\overline{X}$ is an EP-stationary point associated to $\phi\in\!\mathscr{L}$ with $\phi$ nondecreasing on $[0,1]$ . Then, there exist $\rho>0$ and $\overline{W}\in\mathbb{B}$ such that the inclusions in (44) hold. Notice that $\psi$ is nondecreasing and convex. Hence, $\widehat{\psi}$ is convex. Together with its symmetry, it follows that $\widehat{\Psi}$ is absolutely symmetric and convex. From [20, Corollary 2.5] it follows that $\widehat{\Psi}\circ\sigma$ is convex over $\mathbb{R}^{m\times n}$ . From $\rho\overline{X}\in\partial(\widehat{\Psi}\circ\sigma)(\overline{W})$ , we get $\overline{W}\in\partial(\widehat{\Psi}\circ\sigma)^{*}(\rho\overline{X})$ . By the von Neumann trace inequality, it is easy to check that $(\widehat{\Psi}\circ\sigma)^{*}=\widehat{\Psi}^{*}\circ\sigma$ , and then $\overline{W}\in\partial(\widehat{\Psi}^{*}\circ\sigma)(\rho\overline{X})$ . Together with the second inclusion in (44) and Definition 3.4, we conclude that $\overline{X}$ is a DC-stationary point of (1). $\Box$
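The von Neumann trace inequality invoked above states that $\langle X,Y\rangle\leq\langle\sigma(X),\sigma(Y)\rangle$, with equality when $X$ and $Y$ admit a simultaneous ordered SVD; this is what allows the supremum defining $(\widehat{\Psi}\circ\sigma)^{*}$ to be reduced to a supremum over singular value vectors. The following is a minimal numerical sketch of the inequality and of the equality case, assuming NumPy and randomly generated test matrices (all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 6

def singular_values(A):
    """Singular values of A in nonincreasing order."""
    return np.linalg.svd(A, compute_uv=False)

# Inequality: <X, Y> <= <sigma(X), sigma(Y)> on random pairs.
gaps = []
for _ in range(200):
    X = rng.standard_normal((m, n))
    Y = rng.standard_normal((m, n))
    gaps.append(singular_values(X) @ singular_values(Y) - np.sum(X * Y))
min_gap = min(gaps)

# Equality: Y_aligned shares the singular vectors of X_ref, in the same order.
X_ref = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(X_ref, full_matrices=False)
t = np.sort(rng.uniform(0.0, 1.0, m))[::-1]      # nonincreasing singular values
Y_aligned = U @ np.diag(t) @ Vt
equality_gap = abs(s @ t - np.sum(X_ref * Y_aligned))
print(min_gap, equality_gap)
```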

To sum up the previous discussions, we obtain the relations shown in Figure 1, where $\theta_{1}$ is the index set defined as in (45) and $\mathscr{L}_{2}$ denotes the family of those $\phi\in\!\mathscr{L}$ that are nondecreasing on $[0,1]$ . We see that the set of R-stationary points is almost the same as that of M-stationary points and includes that of EP-stationary points under the rank condition $|\theta_{1}|={\rm rank}(\overline{X})$ , while for some $\phi$ the set of EP-stationary points coincides with that of DC-stationary points, for example, the following special $\phi$ .

Example 3.1

Let $\phi(t)=\frac{a-1}{a+1}t^{2}+\frac{2}{a+1}t\ (a>1)$ for $t\in\mathbb{R}$ . Clearly, $\phi\in\!\mathscr{L}_{1}\cap\mathscr{L}_{2}$ . Also,

[TABLE]

After an elementary calculation, the conjugate $\psi^{*}$ and $\widehat{\psi}^{*}$ of $\psi$ and $\widehat{\psi}$ take the form of

[TABLE]

It is easy to check that $\phi$ satisfies the conditions in (48) and is nondecreasing in $[0,1]$ .
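The properties claimed in Example 3.1 can be verified numerically. The sketch below assumes NumPy, a hypothetical parameter value `a = 3.0`, and that $\psi$ equals $\phi$ on $[0,1]$ and $+\infty$ elsewhere, so that $\psi^{*}(s)=\sup_{t\in[0,1]}\{st-\phi(t)\}$. The closed form in `psi_conj_closed` is derived here by hand from this definition (zero for $s\leq\frac{2}{a+1}$, $\frac{((a+1)s-2)^{2}}{4(a^{2}-1)}$ in between, and $s-1$ for $s\geq\frac{2a}{a+1}$); it is not copied from the paper's display.

```python
import numpy as np

a = 3.0   # hypothetical parameter value; any a > 1 is admissible

def phi(t):
    """phi(t) = (a-1)/(a+1) * t^2 + 2/(a+1) * t from Example 3.1."""
    return (a - 1) / (a + 1) * t**2 + 2 / (a + 1) * t

def psi_conj_closed(s):
    """Hand-derived conjugate of psi = phi + indicator of [0, 1]."""
    lower, upper = 2 / (a + 1), 2 * a / (a + 1)   # phi'(0) and phi'(1)
    if s <= lower:
        return 0.0                # supremum attained at t = 0
    if s >= upper:
        return s - 1.0            # supremum attained at t = 1, phi(1) = 1
    return ((a + 1) * s - 2) ** 2 / (4 * (a**2 - 1))

t_grid = np.linspace(0.0, 1.0, 200001)

def psi_conj_numeric(s):
    """Grid approximation of sup over t in [0, 1] of s*t - phi(t)."""
    return float(np.max(s * t_grid - phi(t_grid)))

s_values = np.linspace(-1.0, 3.0, 41)
max_err = max(abs(psi_conj_numeric(s) - psi_conj_closed(s)) for s in s_values)

# Membership checks: phi(1) = 1, phi(0) = 0 with t* = 0, and monotonicity on [0, 1].
phi_at_one = phi(1.0)
phi_at_zero = phi(0.0)
nondecreasing = bool(np.all(np.diff(phi(t_grid)) >= 0))
print(max_err, phi_at_one, phi_at_zero, nondecreasing)
```

The hand-derived $\psi^{*}$ is continuously differentiable on $\mathbb{R}$, and for $s\geq 0$ it agrees with $\widehat{\psi}^{*}(s)=\psi^{*}(|s|)$.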

4 M-stationary point of MPSCCC

In Section 3.2, the MPEC (1) is the key to characterize the M-stationary point of (1). When $\Omega\subseteq\mathbb{S}_{+}^{n}$ , it corresponds to (3.2) which is a special case of the following MPSCCC

[TABLE]

where $\Omega_{x}\subseteq\mathbb{X}$ and $\Omega_{y}\subseteq\mathbb{Y}$ are closed sets, $\varphi\!:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}$ and $f,g\!:\mathbb{X}\times\mathbb{Y}\to\mathbb{S}^{n}$ are smooth functions. For this class of problems, since the Robinson CQ does not hold, it is common to seek an M-stationary point which is weaker than the classical KKT point (also called the strong stationary point). In this section, we shall provide a weaker condition for a local minimizer of (4) to be the M-stationary point. For this purpose, we need the multifunction $\mathcal{\widetilde{M}}\!:\mathbb{X}\times\mathbb{Y}\times\mathbb{S}^{n}\times\mathbb{S}^{n}\rightrightarrows\mathbb{X}\times\mathbb{Y}$ defined as follows:

[TABLE]

By [5, Proposition 2.1 and Theorem 2.1], it is immediate to have the following result.

Theorem 4.1

Let $(\overline{x},\overline{y})$ be a local minimizer of (4). If the perturbed mapping $\mathcal{\widetilde{M}}$ is calm at the origin for $(\overline{x},\overline{y})$ , then $(\overline{x},\overline{y})$ is an M-stationary point of the problem (4).

By [11, Corollary 1], one may achieve the calmness of $\mathcal{\widetilde{M}}$ at the origin for $(\overline{x},\overline{y})$ by the directional limiting normal cone to ${\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}$ . That is, the following result holds.

Theorem 4.2

Consider an arbitrary $(\overline{x},\overline{y})\in\mathcal{\widetilde{M}}(0,0,0,0)$ . If for any $0\neq w=(w_{1};w_{2})\in\!\mathcal{T}_{\Omega_{x}}(\overline{x})\times\mathcal{T}_{\Omega_{y}}(\overline{y})$ such that $\left(\begin{matrix}f^{\prime}(\overline{x},\overline{y})\\ g^{\prime}(\overline{x},\overline{y})\end{matrix}\right)w\in\!\mathcal{T}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}(f(\overline{x},\overline{y}),g(\overline{x},\overline{y})),$ the implication holds:

[TABLE]

then the multifunction $\mathcal{\widetilde{M}}$ is calm at the origin for $(\overline{x},\overline{y})$ .

Remark 4.1

(i) Notice that $(\Delta x,\Delta y)\in D\mathcal{\widetilde{M}}((0,0,0,0)|(\overline{x},\overline{y}))(\Delta u,\Delta v,\Delta\xi,\Delta\eta)$ iff

[TABLE]

Together with Lemma 2.2, there is no nonzero $w=(w_{1};w_{2})\in(\mathcal{T}_{\Omega_{x}}(\overline{x})\times\mathcal{T}_{\Omega_{y}}(\overline{y}))$ such that $\left(\begin{matrix}f^{\prime}(\overline{x},\overline{y})\\ g^{\prime}(\overline{x},\overline{y})\end{matrix}\right)w\in\mathcal{T}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}(f(\overline{x},\overline{y}),g(\overline{x},\overline{y}))$ if and only if $\mathcal{\widetilde{M}}$ is isolated calm at the origin for $(\overline{x},\overline{y})$ . Thus, Theorem 4.2 states that if $\mathcal{\widetilde{M}}$ is not isolated calm and the implication in (55) holds, then $\mathcal{\widetilde{M}}$ is necessarily calm at the origin for $(\overline{x},\overline{y})$ .

(ii) Notice that $(\Delta u,\Delta v,\Delta\xi,\Delta\eta)\in D^{*}\mathcal{\widetilde{M}}((\overline{x},\overline{y})|(0,0,0,0))(-\Delta x,-\Delta y)$ if and only if

[TABLE]

Together with Lemma 2.1, the Aubin property of $\mathcal{\widetilde{M}}$ is equivalent to the implication

[TABLE]

Since $\mathcal{N}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}((f(\overline{x},\overline{y}),g(\overline{x},\overline{y}));(d_{1},d_{2}))\subset\!\mathcal{N}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}(f(\overline{x},\overline{y}),g(\overline{x},\overline{y}))$ for $(d_{1},d_{2})\in\!\mathbb{S}^{n}\times\mathbb{S}^{n}$ , the implication in (55) is weaker than the one in (62) which is precisely the M-stationary point condition given in [5, Theorem 6.1(i)]. For the characterization of the directional normal cone $\mathcal{N}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}((f(\overline{x},\overline{y}),g(\overline{x},\overline{y}));(d_{1},d_{2}))$ , the reader may refer to the Appendix.

To close this section, we illustrate Theorem 4.2 by the following special example

[TABLE]

where $C={\rm Diag}(1,0,0)$ , $D={\rm Diag}(0,0,-1)$ , and $\mathcal{A}\!:\mathbb{R}^{3}\to\mathbb{S}^{3}$ is the linear mapping

[TABLE]

Consider $\overline{x}=(0,0,0)^{\mathbb{T}}$ and $\overline{y}=(0,0,0)^{\mathbb{T}}$ . Write $X:=f(\overline{x},\overline{y})$ and $Y:=g(\overline{x},\overline{y})$ . Clearly, $(X,Y)=(C,D)\in{\rm gph}\mathcal{N}_{\mathbb{S}_{+}^{3}}$ . Moreover, the index sets $\alpha,\beta$ and $\gamma$ defined by (66) with $A=X+Y$ satisfy $\alpha=\{1\},\beta=\{2\}$ and $\gamma=\{3\}.$ Fix an arbitrary $0\neq w=(w_{1};w_{2})\in\mathbb{R}^{3}\times\mathbb{R}^{3}$ with $w_{1}=(w_{11},w_{12},w_{13})^{\mathbb{T}}$ and $w_{2}=(w_{21},w_{22},w_{23})^{\mathbb{T}}$ such that $(G,H)\in\!\mathcal{T}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}(f(\overline{x},\overline{y}),g(\overline{x},\overline{y}))$ , where $G=f_{x}^{\prime}(\overline{x},\overline{y})w_{1}+f_{y}^{\prime}(\overline{x},\overline{y})w_{2}$ and $H=g_{x}^{\prime}(\overline{x},\overline{y})w_{1}+g_{y}^{\prime}(\overline{x},\overline{y})w_{2}$ . Since $(G,H)\in\!\mathcal{T}_{{\rm gph}\mathcal{N}_{\mathbb{S}^{n}_{+}}}(f(\overline{x},\overline{y}),g(\overline{x},\overline{y}))$ , by the expressions of $f$ and $g$ it is not hard to obtain

[TABLE]

with $0\leq w_{12}\perp w_{22}\leq 0$ and $w_{12}+w_{22}\neq 0$ . Then $B:=\!P_{\beta}^{\mathbb{T}}(G\!+\!H)P_{\beta}=w_{12}+w_{22}$ . Let $(d_{1},d_{2})\in\mathbb{R}^{3}\times\mathbb{R}^{3}$ and $(\Lambda,\Delta)\in\mathbb{S}^{3}\times\mathbb{S}^{3}$ satisfy the conditions on the left hand side of (55). Since $\Omega_{x}=\Omega_{y}=\mathbb{R}^{3}$ , we have $(d_{1},d_{2})=0$ . Thus,

[TABLE]

Case 1: $w_{12}>0$ . Now we have $w_{22}=0$ , and the index sets $\pi,\delta$ and $\nu$ defined by (67) with $B$ satisfy $\pi=\{1\},\delta=\emptyset$ and $\nu=\emptyset$ . From Theorem 1 in Appendix, it follows that $(\Lambda,\Delta)\in\!\mathcal{N}_{{\rm gph}\mathcal{N}_{\mathbb{S}_{+}^{3}}}((X,Y);(G,H))$ if and only if

[TABLE]

Together with (64) and (65), we get $\Lambda=0$ and $\Delta=0$ . Thus, $(d_{1},d_{2},\Lambda,\Delta)=(0,0,0,0)$ .

Case 2: $w_{12}=0$ . Now we have $w_{22}<0$ , and consequently the index sets $\pi,\delta$ and $\nu$ defined by (67) with $B$ satisfy $\pi=\emptyset,\delta=\emptyset$ and $\nu=\{1\}$ . From Theorem 1 in Appendix, it follows that $(\Lambda,\Delta)\in\!\mathcal{N}_{{\rm gph}\mathcal{N}_{\mathbb{S}_{+}^{3}}}((X,Y);(G,H))$ if and only if

[TABLE]

Together with (64) and (65), we get $\Lambda=0$ and $\Delta=0$ . Thus, $(d_{1},d_{2},\Lambda,\Delta)=(0,0,0,0)$ .

The above arguments show that the implication (55) holds, and then the condition in Theorem 4.2 is satisfied. Thus, the global minimizer $(\overline{x},\overline{y})$ is an M-stationary point of (4), whereas by [5, Theorem 6.1(i)] we cannot judge whether $(\overline{x},\overline{y})$ is M-stationary or not.
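The starting data of this example can be checked directly. The sketch below assumes NumPy, uses the standard fact that $(X,Y)\in{\rm gph}\,\mathcal{N}_{\mathbb{S}_{+}^{n}}$ if and only if $X=\Pi_{\mathbb{S}_{+}^{n}}(X+Y)$, and assumes that the index sets $\alpha,\beta,\gamma$ in (66) collect the positions of the positive, zero and negative eigenvalues of $A=X+Y$ listed in nonincreasing order (definition (66) is in the Appendix and is not reproduced here).

```python
import numpy as np

C = np.diag([1.0, 0.0, 0.0])     # X = f(x_bar, y_bar)
D = np.diag([0.0, 0.0, -1.0])    # Y = g(x_bar, y_bar)
A = C + D
TOL = 1e-12

# Eigen-decomposition of A with eigenvalues in nonincreasing order.
eigvals, P = np.linalg.eigh(A)   # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, P = eigvals[order], P[:, order]

# Projection of A onto the PSD cone; (C, D) lies in gph N_{S^3_+}
# exactly when this projection equals C and the remainder equals D.
proj_psd = P @ np.diag(np.maximum(eigvals, 0.0)) @ P.T
in_graph = bool(np.allclose(proj_psd, C) and np.allclose(A - proj_psd, D))

# Index sets (1-based, as in the text) under the assumed definition.
alpha = [i + 1 for i, lam in enumerate(eigvals) if lam > TOL]
beta = [i + 1 for i, lam in enumerate(eigvals) if abs(lam) <= TOL]
gamma = [i + 1 for i, lam in enumerate(eigvals) if lam < -TOL]
print(in_graph, alpha, beta, gamma)
```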

Acknowledgement This work is supported by the National Natural Science Foundation of China under project No.11571120 and No.11701186.

Bibliography

[1] S. J. Bi and S. H. Pan, Multistage convex relaxation approach to rank regularized minimization problems based on equivalent mathematical program with a generalized complementarity constraint, SIAM Journal on Control and Optimization, 55(2017): 2493-2518.
[2] O. P. Burdakov, C. Kanzow and A. Schwartz, Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method, SIAM Journal on Optimization, 26(2016): 397-425.
[3] M. A. Davenport and J. Romberg, An overview of low-rank matrix recovery from incomplete observations, IEEE Journal of Selected Topics in Signal Processing, 10(2016): 608-622.
[4] C. Ding, D. F. Sun and K. C. Toh, An introduction to a class of matrix cone programming, Mathematical Programming, 144(2014): 141-179.
[5] C. Ding, D. F. Sun and J. J. Ye, First order optimality conditions for mathematical programs with semidefinite cone complementarity constraints, Mathematical Programming, 147(2014): 539-579.
[6] A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings, Springer Monographs in Mathematics, Springer, New York, 2009.
[7] M. Fazel, Matrix Rank Minimization with Applications, PhD thesis, Stanford University, 2002.
[8] M. Fazel, H. Hindi and S. Boyd, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, Proceedings of the 2003 American Control Conference, 3: 2156-2162.
