Low-rank matrix recovery with Ky Fan 2-k-norm

Xuan Vinh Doan; Stephen Vavasis

arXiv:1904.05590·math.OC·April 12, 2019·J. Glob. Optim.

TL;DR

This paper introduces Ky Fan 2-k-norm models for low-rank matrix recovery, utilizing a difference of convex algorithm to enhance recoverability, with promising numerical results demonstrating effectiveness.

Contribution

The paper presents a novel Ky Fan 2-k-norm-based approach and a DCA algorithm for nonconvex low-rank matrix recovery, improving upon existing methods.

Findings

1. High recoverability rates achieved in numerical experiments
2. Effective application of Ky Fan 2-k-norm models to matrix recovery
3. Successful implementation of DCA for nonconvex optimization

Abstract

We propose Ky Fan 2-k-norm-based models for the nonconvex low-rank matrix recovery problem. A general difference of convex algorithm (DCA) is developed to solve these models. Numerical results show that the proposed models achieve high recoverability rates.

Equations

\begin{array}[]{rl}\displaystyle\min_{{\boldsymbol{X}}}&\left\lVert{\boldsymbol{X}}\right\rVert_{*}\\ \mathop{\rm s.t.}&{\cal{A}}({\boldsymbol{X}})=\mbox{\boldmath$b$},\end{array}

\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}=\left(\sum_{i=1}^{k}\sigma_{i}^{2}(\mbox{\boldmath$A$})\right)^{\frac{1}{2}},

\begin{array}[]{rl}\displaystyle\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}^{\star}=\max_{{\boldsymbol{X}}}&\langle\mbox{\boldmath$A$},{\boldsymbol{X}}\rangle\\ \mathop{\rm s.t.}&\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}\leq 1.\end{array}

\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}=\left(\sum_{i=1}^{k}\sigma_{i}^{2}(\mbox{\boldmath$A$})\right)^{\frac{1}{2}}\leq\left\lVert\mbox{\boldmath$A$}\right\rVert_{F}=\left(\sum_{i=1}^{\min\{m,n\}}\sigma_{i}^{2}(\mbox{\boldmath$A$})\right)^{\frac{1}{2}},

\left\lVert\mbox{\boldmath$A$}\right\rVert_{F}^{2}=\langle\mbox{\boldmath$A$},\mbox{\boldmath$A$}\rangle\leq\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}^{\star}\cdot\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}.

\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}\leq\left\lVert\mbox{\boldmath$A$}\right\rVert_{F}\leq\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}^{\star},\quad k\leq\min\{m,n\}.

\begin{array}[]{rl}\min&\displaystyle\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}}\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\mbox{\boldmath$b$},\end{array}

\begin{array}[]{rl}\min&\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\left\lVert{\boldsymbol{X}}\right\rVert_{F}\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\mbox{\boldmath$b$}.\end{array}

(1-\delta_{k}({\cal A}))\left\lVert{\boldsymbol{X}}\right\rVert_{F}\leq\left\lVert{\cal A}({\boldsymbol{X}})\right\rVert_{2}\leq(1+\delta_{k}({\cal A}))\left\lVert{\boldsymbol{X}}\right\rVert_{F}

\begin{array}[]{rl}\displaystyle\max_{{\boldsymbol{Z}},z}&\left\lVert\boldsymbol{Z}\right\rVert_{F}^{2}\\ \mathop{\rm s.t.}&\lVert\mkern-2.0mu|\boldsymbol{Z}|\mkern-2.0mu\rVert_{k,2}^{\star}\leq 1,\\ &{\cal A}(\boldsymbol{Z})-z\cdot\mbox{\boldmath$b$}=\mbox{\boldmath$0$},\\ &z>0,\end{array}

\min_{\boldsymbol{Z},z}\;\delta_{{\cal Z}}(\boldsymbol{Z},z)-\frac{\left\lVert\boldsymbol{Z}\right\rVert_{F}^{2}}{2},

\begin{array}[]{rl}\displaystyle\max_{{\boldsymbol{Z}},z}&\langle\boldsymbol{Z}^{s},\boldsymbol{Z}\rangle\\ \mathop{\rm s.t.}&\lVert\mkern-2.0mu|\boldsymbol{Z}|\mkern-2.0mu\rVert_{k,2}^{\star}\leq 1\\ &{\cal A}(\boldsymbol{Z})-z\cdot\mbox{\boldmath$b$}=\mbox{\boldmath$0$}\\ &z>0.\end{array}

\begin{array}[]{rl}\displaystyle\min_{{\boldsymbol{Y}}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}^{2}}\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{Y}})=\mbox{\boldmath$0$}.\end{array}

\left(\frac{{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}},\frac{1}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}}\right)\in{\cal Z}.

\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}^{2}}\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\geq\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}.

\begin{array}[]{rl}\begin{array}[]{rl}\displaystyle\min_{{\boldsymbol{Y}}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{1}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}}\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{Y}})=\mbox{\boldmath$0$}.\end{array}\end{array}

\begin{array}[]{rl}\displaystyle\min_{{\boldsymbol{Y}}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\alpha({\boldsymbol{X}}^{s})\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{Y}})=\mbox{\boldmath$0$},\end{array}

\begin{array}[]{rl}\displaystyle\min_{{\boldsymbol{X}}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\alpha({\boldsymbol{X}}^{s})\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{X}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\mbox{\boldmath$b$}.\end{array}

\frac{1}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}}\leq\alpha({\boldsymbol{X}})\leq\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}^{2}}.

\partial\lVert\mkern-2.0mu|\mbox{\boldmath$A$}|\mkern-2.0mu\rVert_{k,2}^{\star}=\arg\max_{{\boldsymbol{X}}:\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}\leq 1}\langle{\boldsymbol{X}},\mbox{\boldmath$A$}\rangle,

\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}\geq\langle\alpha({\boldsymbol{X}}^{s})\cdot{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle.

\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}\geq\langle\alpha({\boldsymbol{X}}^{s})\cdot{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle,\quad\forall\,{\boldsymbol{Y}}\,:\,{\cal A}({\boldsymbol{Y}})=\mbox{\boldmath$0$}.

\begin{array}[]{rl}\lVert\mkern-2.0mu|\mbox{\boldmath$X$}|\mkern-2.0mu\rVert_{k,2}^{\star}=\min&p+\mbox{trace}(\mbox{\boldmath$R$})\\ \mathop{\rm s.t.}&kp-\mbox{trace}(\mbox{\boldmath$P$})=0,\\ &p\mbox{\boldmath$I$}-\mbox{\boldmath$P$}\succeq 0,\\ &\begin{pmatrix}\mbox{\boldmath$P$}&-\frac{1}{2}\mbox{\boldmath$X$}^{T}\\ -\frac{1}{2}\mbox{\boldmath$X$}&\mbox{\boldmath$R$}\end{pmatrix}\succeq 0.\end{array}

\begin{array}[]{rl}\displaystyle\min_{{\boldsymbol{X}}}&\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}\\ \mathop{\rm s.t.}&{\cal{A}}({\boldsymbol{X}})=\mbox{\boldmath$b$}.\end{array}

{\cal A}({\boldsymbol{X}})=\mbox{\boldmath$b$}\,\Leftrightarrow\,\langle\mbox{\boldmath$A$}_{i},{\boldsymbol{X}}\rangle=\langle\mbox{\boldmath$A$}_{i},\mbox{\boldmath$M$}\rangle=b_{i},\quad i=1,\ldots,s.

Full text

Low-rank matrix recovery with Ky Fan $2$-$k$-norm

Xuan Vinh Doan (1, 2), Stephen Vavasis (3)

(1) Operations Group, Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom
(2) The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, United Kingdom
(3) Combinatorics and Optimization, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada

This work is partially supported by the Alan Turing Fellowship of the first author.

Abstract

We propose Ky Fan $2$ - $k$ -norm-based models for the non-convex low-rank matrix recovery problem. A general difference of convex algorithm (DCA) is developed to solve these models. Numerical results show that the proposed models achieve high recoverability rates.

Keywords: Rank minimization, Ky Fan $2$-$k$-norm, Matrix recovery.

1 Introduction

The matrix recovery problem concerns the construction of a matrix from incomplete information about its entries. This problem has a wide range of applications, such as recommendation systems with incomplete information on users' ratings or sensor localization with partially observed distance matrices (see, e.g., [3]). In these applications, the matrix is usually known to be (approximately) low-rank. Finding these low-rank matrices is theoretically difficult due to the non-convex nature of the problem. Computationally, it is important to study the tractability of these problems given the large scale of datasets considered in practical applications. Recht et al. [11] studied the low-rank matrix recovery problem using a convex relaxation approach which is tractable. More precisely, in order to recover a low-rank matrix $\boldsymbol{X}\in\mathbb{R}^{m\times n}$ which satisfies ${\cal A}(\boldsymbol{X})=\boldsymbol{b}$, where the linear map ${\cal A}:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}^{p}$ and $\boldsymbol{b}\in\mathbb{R}^{p}$, $\boldsymbol{b}\neq\boldsymbol{0}$, are given, the following convex optimization problem is proposed:

$$\begin{array}{rl}\displaystyle\min_{\boldsymbol{X}}&\left\lVert\boldsymbol{X}\right\rVert_{*}\\ \mathop{\rm s.t.}&{\cal A}(\boldsymbol{X})=\boldsymbol{b},\end{array}$$

where $\displaystyle\left\lVert{\boldsymbol{X}}\right\rVert_{*}=\sum_{i}\sigma_{i}({\boldsymbol{X}})$ is the nuclear norm, the sum of all singular values of ${\boldsymbol{X}}$ . Recht et al. [11] showed the recoverability of this convex approach using some restricted isometry conditions of the linear map $\cal A$ . In general, these restricted isometry conditions are not satisfied and the proposed convex relaxation can fail to recover the matrix ${\boldsymbol{X}}$ .

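Low-rank matrices also appear to be appropriate representations of data in other applications such as biclustering of gene expression data. Doan and Vavasis [5] proposed a convex approach to recover low-rank clusters using the dual Ky Fan $2$-$k$-norm instead of the nuclear norm. The Ky Fan $2$-$k$-norm is defined as

$$\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}=\left(\sum_{i=1}^{k}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}},$$

where $\sigma_{1}\geq\ldots\geq\sigma_{k}\geq 0$ are the $k$ largest singular values of $\boldsymbol{A}$, $k\leq k_{0}=\mbox{rank}(\boldsymbol{A})$. The dual norm of the Ky Fan $2$-$k$-norm is denoted by $\lVert\mkern-2.0mu|\,\cdot\,|\mkern-2.0mu\rVert_{k,2}^{\star}$,

$$\begin{array}{rl}\displaystyle\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}^{\star}=\max_{\boldsymbol{X}}&\langle\boldsymbol{A},\boldsymbol{X}\rangle\\ \mathop{\rm s.t.}&\lVert\mkern-2.0mu|\boldsymbol{X}|\mkern-2.0mu\rVert_{k,2}\leq 1.\end{array}$$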

These unitarily invariant norms (see, e.g., Bhatia [2]) and their gauge functions have been used in sparse prediction problems [1], low-rank regression analysis [6] and multi-task learning regularization [7]. When $k=1$, the Ky Fan $2$-$k$-norm is the spectral norm, $\left\lVert\boldsymbol{A}\right\rVert=\sigma_{1}(\boldsymbol{A})$, the largest singular value of $\boldsymbol{A}$, whose dual norm is the nuclear norm. Similar to the nuclear norm, the dual Ky Fan $2$-$k$-norm with $k>1$ can be used to compute the $k$-approximation of a matrix $\boldsymbol{A}$ (Proposition 2.9, [5]), which demonstrates its low-rank property. Motivated by this low-rank property of the (dual) Ky Fan $2$-$k$-norm, which is more general than that of the nuclear norm, and by its usage in other applications, we propose in this paper a Ky Fan $2$-$k$-norm-based non-convex approach for the matrix recovery problem, which aims to recover matrices that are not recoverable by the convex relaxation formulation $(\ref{eq:nuc})$. In Section 2, we discuss the proposed models in detail, and in Section 3 we develop numerical algorithms to solve those models and present some numerical results.
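The two norms defined above can be evaluated directly from the singular values. The following is a minimal sketch, not the authors' code: it assumes the singular values are already available (for instance from an SVD routine), and it assumes that the dual norm coincides with the $k$-support norm of the singular value vector, using the closed form from Argyriou et al. [1]. The function names and the tolerance `tol` are illustrative choices.

```python
import math


def ky_fan_2k_norm(sigma, k):
    """Ky Fan 2-k-norm: l2-norm of the k largest singular values in `sigma`."""
    top = sorted((abs(s) for s in sigma), reverse=True)[:k]
    return math.sqrt(sum(s * s for s in top))


def dual_ky_fan_2k_norm(sigma, k, tol=1e-12):
    """Dual Ky Fan 2-k-norm of a matrix with singular values `sigma`.

    Assumption: this equals the k-support norm of the singular values,
    evaluated with the closed form of Argyriou et al. [1].
    """
    w = sorted((abs(s) for s in sigma), reverse=True)
    if not 1 <= k <= len(w):
        raise ValueError("k must satisfy 1 <= k <= len(sigma)")
    for r in range(k):
        split = k - r - 1              # leading entries that enter with their squares
        tail = sum(w[split:])          # remaining entries are pooled together
        avg = tail / (r + 1)
        upper = math.inf if split == 0 else w[split - 1]
        # Condition of the closed form (with a small tolerance for ties).
        if upper > avg - tol and avg >= w[split] - tol:
            head = sum(s * s for s in w[:split])
            return math.sqrt(head + tail * tail / (r + 1))
    raise RuntimeError("no valid split found; check the input")
```

For singular values `[3, 2, 1]` and `k = 1`, the first function returns the spectral norm 3 and the second returns the nuclear norm 6, in line with the special case discussed above. For a matrix with only two non-zero singular values, such as `[4, 3, 0, 0]`, both functions with `k = 2` return the Frobenius norm 5.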

2 Ky Fan $2$ - $k$ -Norm-Based Models

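The Ky Fan $2$-$k$-norm is the $\ell_{2}$-norm of the vector of the $k$ largest singular values with $k\leq\min\{m,n\}$. Thus we have:

$$\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}=\left(\sum_{i=1}^{k}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}}\leq\left\lVert\boldsymbol{A}\right\rVert_{F}=\left(\sum_{i=1}^{\min\{m,n\}}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}},$$

where $\left\lVert\cdot\right\rVert_{F}$ is the Frobenius norm. Considering the dual Ky Fan $2$-$k$-norm and using the definition of the dual norm, we obtain the following inequality:

$$\left\lVert\boldsymbol{A}\right\rVert_{F}^{2}=\langle\boldsymbol{A},\boldsymbol{A}\rangle\leq\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}^{\star}\cdot\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}.$$

Combining it with $\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}\leq\left\lVert\boldsymbol{A}\right\rVert_{F}$, we have:

$$\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}\leq\left\lVert\boldsymbol{A}\right\rVert_{F}\leq\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}^{\star},\quad k\leq\min\{m,n\}.$$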

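It is clear that these inequalities become equalities if and only if $\text{rank}(\boldsymbol{A})\leq k$. This shows that to find a low-rank matrix ${\boldsymbol{X}}$ that satisfies ${\cal A}({\boldsymbol{X}})=\boldsymbol{b}$ with $\text{rank}({\boldsymbol{X}})\leq k$, we can solve either the following optimization problem

$$\begin{array}{rl}\min&\displaystyle\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}}\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\boldsymbol{b},\end{array}$$

or

$$\begin{array}{rl}\min&\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\left\lVert{\boldsymbol{X}}\right\rVert_{F}\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\boldsymbol{b}.\end{array}$$

Given the norm inequalities in $(\ref{eq:normin})$, it is straightforward to see that these non-convex optimization problems can be used to recover low-rank matrices, as stated in the following theorem.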

Theorem 2.1

If there exists a matrix ${\boldsymbol{X}}\in\mathbb{R}^{m\times n}$ such that $\mbox{rank}({\boldsymbol{X}})\leq k$ and ${\cal A}({\boldsymbol{X}})=\mbox{\boldmath$ b $}$ , then ${\boldsymbol{X}}$ is an optimal solution of $(\ref{eq:ratio})$ and $(\ref{eq:diff})$ .
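The argument behind Theorem 2.1 is short enough to spell out. The following sketch only uses the norm inequalities of this section and the assumption $\boldsymbol{b}\neq\boldsymbol{0}$ (so that every feasible matrix is non-zero); $\boldsymbol{X}'$ denotes an arbitrary feasible matrix and is notation introduced here for illustration.

```latex
% For every feasible X' (A(X') = b, hence X' \neq 0):
\frac{\lVert|\boldsymbol{X}'|\rVert_{k,2}^{\star}}{\lVert\boldsymbol{X}'\rVert_{F}} \geq 1
\quad\text{and}\quad
\lVert|\boldsymbol{X}'|\rVert_{k,2}^{\star} - \lVert\boldsymbol{X}'\rVert_{F} \geq 0 .
% If rank(X) \leq k, the inequalities hold with equality:
\lVert|\boldsymbol{X}|\rVert_{k,2} = \lVert\boldsymbol{X}\rVert_{F} = \lVert|\boldsymbol{X}|\rVert_{k,2}^{\star}
\;\Longrightarrow\;
\frac{\lVert|\boldsymbol{X}|\rVert_{k,2}^{\star}}{\lVert\boldsymbol{X}\rVert_{F}} = 1 ,
\qquad
\lVert|\boldsymbol{X}|\rVert_{k,2}^{\star} - \lVert\boldsymbol{X}\rVert_{F} = 0 .
```

A feasible matrix of rank at most $k$ therefore attains the lower bound of both objectives, which is why it is optimal for both problems.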

Given the result in Theorem 2.1, the exact recovery of a low-rank matrix using $(\ref{eq:ratio})$ or $(\ref{eq:diff})$ relies on the uniqueness of the low-rank solution of ${\cal A}({\boldsymbol{X}})=\boldsymbol{b}$. Recht et al. [11] generalized the restricted isometry property of vectors introduced by Candès and Tao [4] to matrices and used it to provide sufficient conditions for the uniqueness of these solutions.

Definition 1 (Recht et al. [11])

For every integer $k$ with $1\leq k\leq\min\{m,n\}$ , the $k$ -restricted isometry constant is defined as the smallest number $\delta_{k}({\cal A})$ such that

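$$(1-\delta_{k}({\cal A}))\left\lVert{\boldsymbol{X}}\right\rVert_{F}\leq\left\lVert{\cal A}({\boldsymbol{X}})\right\rVert_{2}\leq(1+\delta_{k}({\cal A}))\left\lVert{\boldsymbol{X}}\right\rVert_{F}$$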

holds for all matrices ${\boldsymbol{X}}$ of rank at most $k$ .

Using Theorem 3.2 in Recht et al. [11], we can obtain the following exact recovery result for $(\ref{eq:ratio})$ and $(\ref{eq:diff})$ .

Theorem 2.2

Suppose that $\delta_{2k}<1$ and there exists a matrix ${\boldsymbol{X}}\in\mathbb{R}^{m\times n}$ which satisfies ${\cal A}({\boldsymbol{X}})=\mbox{\boldmath$ b $}$ and $\text{rank}({\boldsymbol{X}})\leq k$ , then ${\boldsymbol{X}}$ is the unique solution to $(\ref{eq:ratio})$ and $(\ref{eq:diff})$ , which implies exact recoverability.

The condition in Theorem 2.2 is indeed better than those obtained for the nuclear norm approach (see, e.g., Theorem 3.3 in Recht et al. [11]). The non-convex optimization problems $(\ref{eq:ratio})$ and $(\ref{eq:diff})$ use a norm ratio and a norm difference. When $k=1$, the norm ratio and difference are computed between the nuclear and Frobenius norm. The idea of using these norm ratios and differences with $k=1$ has been used to generate non-convex sparse generalizer in the vector case, i.e., $m=1$. Yin et al. [13] investigated the ratio $\ell_{1}/\ell_{2}$ while Yin et al. [14] analyzed the difference $\ell_{1}-\ell_{2}$ in compressed sensing. Note that even though optimization formulations based on these norm ratios and differences are non-convex, they are still relaxations of the $\ell_{0}$-norm minimization problem unless the sparsity level of the optimal solution is $s=1$.

Our proposed approach is similar to the idea of the truncated difference of the nuclear norm and Frobenius norm discussed in Ma et al. [8]. Given a parameter $t\geq 0$, the truncated difference is defined as $\displaystyle\left\lVert\boldsymbol{A}\right\rVert_{*,t-F}=\sum_{i=t+1}^{\min\{m,n\}}\sigma_{i}(\boldsymbol{A})-\left(\sum_{i=t+1}^{\min\{m,n\}}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}}\geq 0$. For $t\geq k-1$, the problem of truncated difference minimization can be used to recover matrices with rank at most $k$ given that $\left\lVert{\boldsymbol{X}}\right\rVert_{*,t-F}=0$ if $\text{rank}({\boldsymbol{X}})\leq t+1$. Similar results for exact recovery as in Theorem 2.2 are provided in Theorem 3.7(a) in Ma et al. [8]. Despite the similarity with respect to the recovery results, the problems $(\ref{eq:ratio})$ and $(\ref{eq:diff})$ are motivated from a different perspective. We next discuss how to solve these problems.

3 Numerical Algorithm

3.1 Difference of Convex Algorithms

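We start with the problem $(\ref{eq:ratio})$. It can be reformulated as

$$\begin{array}{rl}\displaystyle\max_{\boldsymbol{Z},z}&\left\lVert\boldsymbol{Z}\right\rVert_{F}^{2}\\ \mathop{\rm s.t.}&\lVert\mkern-2.0mu|\boldsymbol{Z}|\mkern-2.0mu\rVert_{k,2}^{\star}\leq 1,\\ &{\cal A}(\boldsymbol{Z})-z\cdot\boldsymbol{b}=\boldsymbol{0},\\ &z>0,\end{array}$$

with the change of variables $z=1/\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}$ and $\boldsymbol{Z}={\boldsymbol{X}}/\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}$. The compact formulation is

$$\min_{\boldsymbol{Z},z}\;\delta_{{\cal Z}}(\boldsymbol{Z},z)-\frac{\left\lVert\boldsymbol{Z}\right\rVert_{F}^{2}}{2},$$

where $\cal Z$ is the feasible set of the problem $(\ref{eq:zprob})$ and $\delta_{{\cal Z}}(\cdot)$ is the indicator function of $\cal Z$. The problem $(\ref{eq:czprob})$ is a difference of convex (d.c.) optimization problem (see, e.g., [9]). The difference of convex algorithm (DCA) proposed in [9] can be applied to the problem $(\ref{eq:czprob})$ as follows.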

Step 1. Start with $(\boldsymbol{Z}^{0},z^{0})=({\boldsymbol{X}}^{0}/\lVert\mkern-2.0mu|{\boldsymbol{X}}^{0}|\mkern-2.0mu\rVert_{k,2}^{\star},1/\lVert\mkern-2.0mu|{\boldsymbol{X}}^{0}|\mkern-2.0mu\rVert_{k,2}^{\star})$ for some ${\boldsymbol{X}}^{0}$ such that ${\cal A}({\boldsymbol{X}}^{0})=\mbox{\boldmath$ b $}$ and set $s=0$ .

Step 2. Update $(\boldsymbol{Z}^{s+1},z^{s+1})$ as an optimal solution of the following convex optimization problem

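$$\begin{array}{rl}\displaystyle\max_{\boldsymbol{Z},z}&\langle\boldsymbol{Z}^{s},\boldsymbol{Z}\rangle\\ \mathop{\rm s.t.}&\lVert\mkern-2.0mu|\boldsymbol{Z}|\mkern-2.0mu\rVert_{k,2}^{\star}\leq 1\\ &{\cal A}(\boldsymbol{Z})-z\cdot\boldsymbol{b}=\boldsymbol{0}\\ &z>0.\end{array}$$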

Step 3. Set $s\leftarrow s+1$ and repeat Step 2.
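The steps above can be read off from the change of variables and the standard DCA linearization. The following worked derivation is a sketch of that reasoning; the symbols $g$ and $h$ for the two convex parts are notation introduced here for illustration.

```latex
% Change of variables: Z = X / |||X|||*, z = 1 / |||X|||*, so that
\lVert|\boldsymbol{Z}|\rVert_{k,2}^{\star} = 1, \qquad
\mathcal{A}(\boldsymbol{Z}) = z\,\boldsymbol{b}, \qquad
\lVert\boldsymbol{Z}\rVert_{F}
  = \frac{\lVert\boldsymbol{X}\rVert_{F}}{\lVert|\boldsymbol{X}|\rVert_{k,2}^{\star}} .
% Minimizing the ratio therefore corresponds to maximizing ||Z||_F (or ||Z||_F^2).
%
% d.c. decomposition of the compact formulation:
g(\boldsymbol{Z},z) = \delta_{\mathcal{Z}}(\boldsymbol{Z},z), \qquad
h(\boldsymbol{Z}) = \tfrac{1}{2}\lVert\boldsymbol{Z}\rVert_{F}^{2}, \qquad
\nabla h(\boldsymbol{Z}^{s}) = \boldsymbol{Z}^{s} .
% DCA replaces h by its linearization at Z^s:
(\boldsymbol{Z}^{s+1},z^{s+1}) \in \arg\min_{\boldsymbol{Z},z}\;
  \delta_{\mathcal{Z}}(\boldsymbol{Z},z) - \langle\boldsymbol{Z}^{s},\boldsymbol{Z}\rangle
\;=\; \arg\max_{(\boldsymbol{Z},z)\in\mathcal{Z}} \langle\boldsymbol{Z}^{s},\boldsymbol{Z}\rangle .
```

The last line is the convex problem solved in Step 2.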

Letting ${\boldsymbol{X}}^{s}=\boldsymbol{Z}^{s}/z^{s}$ and using the general convergence analysis of DCA (see, e.g., Theorem 3.7 in [10]), we obtain the following convergence results.

Proposition 1

Given the sequence $\{{\boldsymbol{X}}^{s}\}$ obtained from the DCA algorithm for the problem $(\ref{eq:czprob})$ , the following statements are true.

(i)

The sequence $\displaystyle\left\{\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}}\right\}$ is non-increasing and convergent.

(ii)

$\displaystyle\left\lVert\frac{{\boldsymbol{X}}^{s+1}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s+1}|\mkern-2.0mu\rVert_{k,2}^{\star}}-\frac{{\boldsymbol{X}}^{s}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}\right\rVert_{F}\rightarrow 0$ when $s\rightarrow\infty$.

The convergence results show that the DCA algorithm improves the objective of the ratio minimization problem $(\ref{eq:ratio})$. The DCA algorithm can stop if $(\boldsymbol{Z}^{s},z^{s})\in{\cal O}(\boldsymbol{Z}^{s})$, where ${\cal O}(\boldsymbol{Z}^{s})$ is the set of optimal solutions of the sub-problem (10) in Step 2; a point $(\boldsymbol{Z}^{s},z^{s})$ which satisfies this condition is called a critical point. Note that (local) optimal solutions of $(\ref{eq:czprob})$ can be shown to be critical points. The following proposition gives an equivalent condition for critical points.

Proposition 2

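$(\boldsymbol{Z}^{s},z^{s})\in{\cal O}(\boldsymbol{Z}^{s})$ if and only if ${\boldsymbol{Y}}=\boldsymbol{0}$ is an optimal solution of the following optimization problem

$$\begin{array}{rl}\displaystyle\min_{\boldsymbol{Y}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}^{2}}\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{Y}})=\boldsymbol{0}.\end{array}$$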

Proof

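Consider ${\boldsymbol{Y}}\in\mbox{Null}({\cal A})$, i.e., ${\cal A}({\boldsymbol{Y}})=\boldsymbol{0}$. We then have:

$$\left(\frac{{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}},\frac{1}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}}\right)\in{\cal Z}.$$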

Clearly, $\displaystyle\langle\frac{{\boldsymbol{X}}^{s}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}},\frac{{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}}\rangle\leq\langle\frac{{\boldsymbol{X}}^{s}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}},\frac{{\boldsymbol{X}}^{s}}{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}\rangle$ is equivalent to

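$$\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}^{2}}\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\geq\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}.$$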

When ${\boldsymbol{Y}}=\boldsymbol{0}$, we achieve the equality. We have: $(\boldsymbol{Z}^{s},z^{s})\in{\cal O}(\boldsymbol{Z}^{s})$ if and only if the above inequality holds for all ${\boldsymbol{Y}}\in\mbox{Null}({\cal A})$, which means $f({\boldsymbol{Y}};{\boldsymbol{X}}^{s})\geq f(\boldsymbol{0};{\boldsymbol{X}}^{s})$ for all ${\boldsymbol{Y}}\in\mbox{Null}({\cal A})$, where $f({\boldsymbol{Y}};{\boldsymbol{X}})=\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}^{2}}\cdot\langle{\boldsymbol{X}},{\boldsymbol{Y}}\rangle$. Clearly, it is equivalent to the fact that ${\boldsymbol{Y}}=\boldsymbol{0}$ is an optimal solution of $(\ref{eq:null})$.

The result of Proposition 2 shows the similarity between the norm ratio minimization problem $(\ref{eq:ratio})$ and the norm difference minimization problem $(\ref{eq:diff})$ with respect to the implementation of the DCA algorithm. Indeed, the problem $(\ref{eq:diff})$ is a d.c. optimization problem and the DCA algorithm can be applied as follows.

Step 1. Start with some ${\boldsymbol{X}}^{0}$ such that ${\cal A}({\boldsymbol{X}}^{0})=\mbox{\boldmath$ b $}$ and set $s=0$ .

Step 2. Update ${\boldsymbol{X}}^{s+1}={\boldsymbol{X}}^{s}+{\boldsymbol{Y}}$ , where ${\boldsymbol{Y}}$ is an optimal solution of the following convex optimization problem

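$$\begin{array}{rl}\displaystyle\min_{\boldsymbol{Y}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\frac{1}{\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}}\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{Y}})=\boldsymbol{0}.\end{array}$$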

Step 3. Set $s\leftarrow s+1$ and repeat Step 2.
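The iteration pattern of these steps (linearize the subtracted norm at the current iterate, solve a convex sub-problem, repeat) can be illustrated on the vector analogue $\ell_{1}-\ell_{2}$ mentioned in Section 2. The following is a toy sketch under strong assumptions, not the matrix algorithm of this paper: the feasible set is assumed to be a line $\{x_{0}+t\,d\}$ that does not pass through the origin, so the convex sub-problem is piecewise linear in $t$ and can be solved by checking its breakpoints. The function name and the example data are hypothetical.

```python
import math


def dca_l1_minus_l2_on_line(x0, d, max_iter=100, tol=1e-12):
    """DCA for min ||x||_1 - ||x||_2 over the line {x0 + t*d} (toy setting).

    Assumption: the line does not contain the origin, so ||x||_2 is
    differentiable at every iterate.
    """
    def point(t):
        return [a + t * b for a, b in zip(x0, d)]

    # The sub-problem objective is convex, piecewise linear in t and bounded
    # below, so a minimizer can be found among the breakpoints where one
    # coordinate of x0 + t*d vanishes.
    candidates = [-a / b for a, b in zip(x0, d) if b != 0.0]
    if not candidates:
        raise ValueError("direction d must have a non-zero entry")

    x = list(x0)
    for _ in range(max_iter):
        norm2 = math.sqrt(sum(v * v for v in x))
        grad = [v / norm2 for v in x]          # gradient of ||x||_2 at x

        def sub_objective(t):
            y = point(t)
            return sum(abs(v) for v in y) - sum(g * v for g, v in zip(grad, y))

        x_new = point(min(candidates, key=sub_objective))
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_new, x)))
        x = x_new
        if step <= tol:                        # iterate no longer moves: critical point
            break
    return x
```

For example, `dca_l1_minus_l2_on_line([1.0, 0.5], [2.0, -1.0])` moves from the dense starting point to the 1-sparse point `[2.0, 0.0]`, where the difference $\ell_{1}-\ell_{2}$ is zero, and then stops because the next sub-problem returns the same point.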

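It is clear that ${\boldsymbol{X}}^{s}$ is a critical point for the problem $(\ref{eq:diff})$ if and only if ${\boldsymbol{Y}}=\boldsymbol{0}$ is an optimal solution of $(\ref{eq:dsprob})$. Both problems $(\ref{eq:rsprob})$ and $(\ref{eq:dsprob})$ can be written in the general form as

$$\begin{array}{rl}\displaystyle\min_{\boldsymbol{Y}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\alpha({\boldsymbol{X}}^{s})\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{Y}})=\boldsymbol{0},\end{array}$$

where $\displaystyle\alpha({\boldsymbol{X}})=\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}^{2}}$ for $(\ref{eq:rsprob})$ and $\displaystyle\alpha({\boldsymbol{X}})=\frac{1}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}}$ for $(\ref{eq:dsprob})$, respectively. Given that ${\cal A}({\boldsymbol{X}}^{s})=\boldsymbol{b}$, with ${\boldsymbol{X}}={\boldsymbol{X}}^{s}+{\boldsymbol{Y}}$ this problem can be written (up to a constant term in the objective) as

$$\begin{array}{rl}\displaystyle\min_{\boldsymbol{X}}&\displaystyle\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\alpha({\boldsymbol{X}}^{s})\cdot\langle{\boldsymbol{X}}^{s},{\boldsymbol{X}}\rangle\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\boldsymbol{b}.\end{array}$$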

The following proposition shows that ${\boldsymbol{X}}^{s}$ is a critical point of the problem $(\ref{eq:xsprob})$ for many functions $\alpha(\cdot)$ if $\text{rank}({\boldsymbol{X}}^{s})\leq k$ .

Proposition 3

If $\text{rank}({\boldsymbol{X}}^{s})\leq k$ , ${\boldsymbol{X}}^{s}$ is a critical point of the problem $(\ref{eq:xsprob})$ for any function $\alpha(\cdot)$ which satisfies

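$$\frac{1}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}}\leq\alpha({\boldsymbol{X}})\leq\frac{\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}}{\left\lVert{\boldsymbol{X}}\right\rVert_{F}^{2}}.$$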

Proof

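If $\text{rank}({\boldsymbol{X}}^{s})\leq k$, we have: $\alpha({\boldsymbol{X}}^{s})=1/\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}$ since $\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}=\left\lVert{\boldsymbol{X}}^{s}\right\rVert_{F}=\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}$. Given that

$$\partial\lVert\mkern-2.0mu|\boldsymbol{A}|\mkern-2.0mu\rVert_{k,2}^{\star}=\arg\max_{{\boldsymbol{X}}:\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}\leq 1}\langle{\boldsymbol{X}},\boldsymbol{A}\rangle,$$

we have: $\alpha({\boldsymbol{X}}^{s})\cdot{\boldsymbol{X}}^{s}\in\partial\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}$. Thus for all ${\boldsymbol{Y}}$, the following inequality holds:

$$\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}\geq\langle\alpha({\boldsymbol{X}}^{s})\cdot{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle.$$

It implies that ${\boldsymbol{Y}}=\boldsymbol{0}$ is an optimal solution of the problem $(\ref{eq:asprob})$ since the optimality condition is

$$\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}+{\boldsymbol{Y}}|\mkern-2.0mu\rVert_{k,2}^{\star}-\lVert\mkern-2.0mu|{\boldsymbol{X}}^{s}|\mkern-2.0mu\rVert_{k,2}^{\star}\geq\langle\alpha({\boldsymbol{X}}^{s})\cdot{\boldsymbol{X}}^{s},{\boldsymbol{Y}}\rangle,\quad\forall\,{\boldsymbol{Y}}\,:\,{\cal A}({\boldsymbol{Y}})=\boldsymbol{0}.$$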

Thus ${\boldsymbol{X}}^{s}$ is a critical point of the problem $(\ref{eq:xsprob})$ .
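Two small steps behind Proposition 3 may be worth making explicit. The following derivation is a sketch that uses only the norm inequalities of Section 2: it checks that $\alpha({\boldsymbol{X}})=1/\lVert|{\boldsymbol{X}}|\rVert_{k,2}$ lies in the admissible range, and that the range collapses to a single value when $\text{rank}({\boldsymbol{X}}^{s})\leq k$.

```latex
% (a) alpha(X) = 1 / |||X|||_{k,2} satisfies both bounds:
\lVert|\boldsymbol{X}|\rVert_{k,2} \leq \lVert\boldsymbol{X}\rVert_{F}
\;\Longrightarrow\;
\frac{1}{\lVert\boldsymbol{X}\rVert_{F}} \leq \frac{1}{\lVert|\boldsymbol{X}|\rVert_{k,2}},
\qquad
\lVert\boldsymbol{X}\rVert_{F}^{2} \leq
\lVert|\boldsymbol{X}|\rVert_{k,2}^{\star}\cdot\lVert|\boldsymbol{X}|\rVert_{k,2}
\;\Longrightarrow\;
\frac{1}{\lVert|\boldsymbol{X}|\rVert_{k,2}} \leq
\frac{\lVert|\boldsymbol{X}|\rVert_{k,2}^{\star}}{\lVert\boldsymbol{X}\rVert_{F}^{2}} .
% (b) If rank(X^s) \leq k, the three norms coincide and both bounds are equal:
\frac{1}{\lVert\boldsymbol{X}^{s}\rVert_{F}}
= \frac{\lVert|\boldsymbol{X}^{s}|\rVert_{k,2}^{\star}}{\lVert\boldsymbol{X}^{s}\rVert_{F}^{2}}
\;\Longrightarrow\;
\alpha(\boldsymbol{X}^{s}) = \frac{1}{\lVert\boldsymbol{X}^{s}\rVert_{F}}
= \frac{1}{\lVert|\boldsymbol{X}^{s}|\rVert_{k,2}} .
```

Step (b) is the reason the proof can replace any admissible $\alpha({\boldsymbol{X}}^{s})$ by $1/\lVert|{\boldsymbol{X}}^{s}|\rVert_{k,2}$.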

Proposition 3 shows that one can choose different functions $\alpha(\cdot)$ such as $\alpha({\boldsymbol{X}})=1/\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}$ for the sub-problem in the general DCA framework to solve the original problem. This generalized sub-problem $(\ref{eq:xsprob})$ is a convex optimization problem, which can be formulated as a semidefinite optimization problem given the following calculation of the dual Ky Fan $2$ - $k$ -norm provided in [5]:

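$$\begin{array}{rl}\lVert\mkern-2.0mu|\boldsymbol{X}|\mkern-2.0mu\rVert_{k,2}^{\star}=\min&p+\mbox{trace}(\boldsymbol{R})\\ \mathop{\rm s.t.}&kp-\mbox{trace}(\boldsymbol{P})=0,\\ &p\boldsymbol{I}-\boldsymbol{P}\succeq 0,\\ &\begin{pmatrix}\boldsymbol{P}&-\frac{1}{2}\boldsymbol{X}^{T}\\ -\frac{1}{2}\boldsymbol{X}&\boldsymbol{R}\end{pmatrix}\succeq 0.\end{array}$$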

In order to implement the DCA algorithm, one also needs to consider how to find the initial solution ${\boldsymbol{X}}^{0}$. We can use the nuclear norm minimization problem (1), the convex relaxation of the rank minimization problem, to find ${\boldsymbol{X}}^{0}$. A similar approach is to use the following dual Ky Fan $2$-$k$-norm minimization problem to find ${\boldsymbol{X}}^{0}$ given its low-rank properties:

$$\begin{array}{rl}\displaystyle\min_{\boldsymbol{X}}&\lVert\mkern-2.0mu|{\boldsymbol{X}}|\mkern-2.0mu\rVert_{k,2}^{\star}\\ \mathop{\rm s.t.}&{\cal A}({\boldsymbol{X}})=\boldsymbol{b}.\end{array}$$

This initial problem can be considered as an instance of $(\ref{eq:xsprob})$ with ${\boldsymbol{X}}^{s}=\mbox{\boldmath$ 0 $}$ (and $\alpha(\mbox{\boldmath$ 0 $})=1$ ), which is equivalent to starting the iterative algorithm with ${\boldsymbol{X}}^{0}=\mbox{\boldmath$ 0 $}$ one step ahead. We are now ready to provide some numerical results.

3.2 Numerical Results

Similar to Candès and Recht [3], we construct the following experiment. We generate $\boldsymbol{M}$, an $m\times n$ matrix of rank $r$, by sampling two $m\times r$ and $n\times r$ factors $\boldsymbol{M}_{L}$ and $\boldsymbol{M}_{R}$ with i.i.d. Gaussian entries and setting $\boldsymbol{M}=\boldsymbol{M}_{L}\boldsymbol{M}_{R}^{T}$. The linear map $\cal A$ is constructed with $s$ independent Gaussian matrices $\boldsymbol{A}_{i}$ whose entries follow ${\cal N}(0,1/s)$, i.e.,

$${\cal A}({\boldsymbol{X}})=\boldsymbol{b}\,\Leftrightarrow\,\langle\boldsymbol{A}_{i},{\boldsymbol{X}}\rangle=\langle\boldsymbol{A}_{i},\boldsymbol{M}\rangle=b_{i},\quad i=1,\ldots,s.$$
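The construction of a test instance can be sketched as follows. This is a minimal pure-Python illustration, not the authors' Matlab code: it assumes standard Gaussian entries for the two factors, reads ${\cal N}(0,1/s)$ as variance $1/s$, and uses hypothetical function names and a fixed seed for reproducibility.

```python
import math
import random


def gaussian_matrix(rows, cols, std, rng):
    """rows x cols matrix (list of lists) with i.i.d. N(0, std^2) entries."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def inner(A, B):
    """Trace inner product <A, B> of two matrices of the same shape."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def make_instance(m, n, r, s, seed=0):
    """Return (M, A, b): rank-r matrix M = M_L M_R^T, s Gaussian maps, data b."""
    rng = random.Random(seed)
    M_L = gaussian_matrix(m, r, 1.0, rng)      # m x r factor
    M_R = gaussian_matrix(n, r, 1.0, rng)      # n x r factor
    M = [[sum(M_L[i][l] * M_R[j][l] for l in range(r)) for j in range(n)]
         for i in range(m)]
    # Assumption: N(0, 1/s) denotes variance 1/s, i.e. standard deviation sqrt(1/s).
    A = [gaussian_matrix(m, n, math.sqrt(1.0 / s), rng) for _ in range(s)]
    b = [inner(A_i, M) for A_i in A]           # b_i = <A_i, M>
    return M, A, b
```

A call such as `make_instance(50, 40, 2, 180)` would produce one instance of the size used below; the recovery algorithms themselves only see `A` and `b`.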

We generate $K=50$ matrices $\boldsymbol{M}$ with $m=50$, $n=40$, and $r=2$. The dimension of these matrices, i.e., the number of degrees of freedom of an $m\times n$ matrix of rank $r$, is $d_{r}=r(m+n-r)=176$. For each $\boldsymbol{M}$, we generate $s$ matrices for the random linear map with $s$ ranging from $180$ to $500$. We set the maximum number of iterations of the algorithm to be $N_{\max}=100$. The instances are solved using the SDPT3 solver [12] for semidefinite optimization problems in Matlab. The computer used for these numerical experiments is a 64-bit Windows 10 machine with a 3.70GHz quad-core CPU and 32GB RAM. The performance measure is the relative error $\displaystyle\frac{\left\lVert{\boldsymbol{X}}-\boldsymbol{M}\right\rVert_{F}}{\left\lVert\boldsymbol{M}\right\rVert_{F}}$ and the threshold $\epsilon=10^{-6}$ is chosen. We run three different algorithms: nuclear uses the nuclear norm optimization formulation $(\ref{eq:nuc})$, k2-nuclear uses the proposed iterative algorithm with the initial solution obtained from $(\ref{eq:nuc})$, and k2-zero uses the same algorithm with the initial solution ${\boldsymbol{X}}^{0}=\boldsymbol{0}$. Figure 1 shows recovery probabilities and average computation times (in seconds) for different sizes of the linear map.

The results show that the proposed algorithm can recover exactly the matrix $\boldsymbol{M}$ with $100\%$ rate when $s\geq 250$ with both initial solutions while the nuclear norm approach cannot recover any matrix at all, i.e., $0\%$ rate, if $s\leq 300$. k2-nuclear is slightly better than k2-zero in terms of recoverability when $s$ is small while their average computational times are almost the same in all cases. The effectiveness of the proposed algorithm when $s$ is small comes with higher average computational times as compared to that of the nuclear norm approach. For example, when $s=180$, on average, one needs $80$ iterations to reach the solution when the proposed algorithm is used instead of $1$ with the nuclear norm optimization approach. Note that the average number of iterations is computed for all cases including cases when the matrix $\boldsymbol{M}$ cannot be recovered. For recoverable cases, the average number of iterations is much lower. For example, when $s=180$, the average number of iterations for recoverable cases is $40$ instead of $80$. When the size of the linear map increases, the average number of iterations decreases significantly. We only need $2$ extra iterations when $s=250$ or $1$ extra iteration on average when $s=300$ to obtain a $100\%$ recovery rate when the nuclear norm optimization approach still cannot recover any of the matrices ($0\%$ rate). These results show that the proposed algorithm achieves a significantly better recovery rate with a small number of extra iterations in many cases. We also test the algorithms with higher ranks including $r=5$ and $r=10$. Figure 2 shows the results when the size of the linear map is $s=\lceil 1.05d_{r}\rceil$.

These results show that when the size of the linear map is small, the proposed algorithms are significantly better than the nuclear norm optimization approach. With $s=\lceil 1.05d_{r}\rceil$, the recovery probability increases when $r$ increases, and it is close to 1 when $r=10$. The computational time increases when $r$ increases, since the size of the sub-problems depends on the size of the linear map. The number of iterations remains low: when $r=10$, the average numbers of iterations are 22 and 26 for k2-nuclear and k2-zero, respectively. Overall, k2-nuclear is slightly better than k2-zero in terms of both recovery probability and computational time.
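For reference, the sizes of the linear map implied by $s=\lceil 1.05d_{r}\rceil$ can be worked out directly. The arithmetic below assumes that the higher-rank experiments keep $m=50$ and $n=40$, which the text does not state explicitly.

$$
\begin{array}{lll}
r=2: & d_{2}=2(50+40-2)=176, & s=\lceil 184.8\rceil=185,\\
r=5: & d_{5}=5(50+40-5)=425, & s=\lceil 446.25\rceil=447,\\
r=10: & d_{10}=10(50+40-10)=800, & s=\lceil 840\rceil=840.
\end{array}
$$

Under this assumption, the number of measurements grows from $185$ to $840$ as $r$ goes from $2$ to $10$, which is consistent with the larger sub-problems and longer computational times reported for higher ranks.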

4 Conclusion

We have proposed non-convex models based on the dual Ky Fan $2$-$k$-norm for low-rank matrix recovery and developed a general DCA framework to solve the models. The computational results are promising. As a future research direction, we plan to develop first-order algorithms for the proposed models and to conduct numerical experiments with larger instances.

Bibliography

[1] Argyriou, A., Foygel, R., Srebro, N.: Sparse prediction with the $k$-support norm. In: NIPS. pp. 1466–1474 (2012)
[2] Bhatia, R.: Matrix Analysis, Graduate Texts in Mathematics, vol. 169. Springer-Verlag, New York (1997)
[3] Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9(6), 717–772 (2009)
[4] Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Transactions on Information Theory 51(12), 4203–4215 (2005)
[5] Doan, X.V., Vavasis, S.: Finding the largest low-rank clusters with Ky Fan $2$-$k$-norm and $\ell_{1}$-norm. SIAM Journal on Optimization 26(1), 274–312 (2016)
[6] Giraud, C.: Low rank multivariate regression. Electronic Journal of Statistics 5, 775–799 (2011)
[7] Jacob, L., Bach, F., Vert, J.P.: Clustered multi-task learning: a convex formulation. In: NIPS. vol. 21, pp. 745–752 (2009)
[8] Ma, T.H., Lou, Y., Huang, T.Z.: Truncated $\ell_{1-2}$ models for sparse recovery and rank minimization. SIAM Journal on Imaging Sciences 10(3), 1346–1380 (2017)

This work is partially supported by the Alan Turing Fellowship of the first author.
