Low-rank matrix recovery with Ky Fan 2-k-norm
Xuan Vinh Doan, Stephen Vavasis

TL;DR
This paper introduces Ky Fan 2-k-norm models for low-rank matrix recovery, utilizing a difference of convex algorithm to enhance recoverability, with promising numerical results demonstrating effectiveness.
Contribution
The paper presents a novel Ky Fan 2-k-norm-based approach and a DCA algorithm for nonconvex low-rank matrix recovery, improving upon existing methods.
Findings
High recoverability rates achieved in numerical experiments
Effective application of Ky Fan 2-k-norm models to matrix recovery
Successful implementation of DCA for nonconvex optimization
Abstract
We propose Ky Fan 2-k-norm-based models for the nonconvex low-rank matrix recovery problem. A general difference of convex algorithm (DCA) is developed to solve these models. Numerical results show that the proposed models achieve high recoverability rates.
Xuan Vinh Doan (1, 2), Stephen Vavasis (3)
(1) Operations Group, Warwick Business School, University of Warwick, Coventry, CV4 7AL, United Kingdom. Email: [email protected]
(2) The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, United Kingdom
(3) Combinatorics and Optimization, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada. Email: [email protected]
This work is partially supported by the Alan Turing Fellowship of the first author.
Keywords: Rank minimization, Ky Fan 2-k-norm, Matrix recovery.
1 Introduction
The matrix recovery problem concerns the construction of a matrix from incomplete information about its entries. This problem has a wide range of applications, such as recommendation systems with incomplete information on users' ratings or sensor localization problems with partially observed distance matrices (see, e.g., [3]). In these applications, the matrix is usually known to be (approximately) low-rank. Finding these low-rank matrices is theoretically difficult due to their non-convex properties. Computationally, it is important to study the tractability of these problems given the large scale of datasets considered in practical applications. Recht et al. [11] studied the low-rank matrix recovery problem using a convex relaxation approach which is tractable. More precisely, in order to recover a low-rank matrix $\boldsymbol{X}\in\mathbb{R}^{m\times n}$ which satisfies $\mathcal{A}(\boldsymbol{X})=\boldsymbol{b}$, where the linear map $\mathcal{A}:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}^{p}$ and $\boldsymbol{b}\in\mathbb{R}^{p}$, $\boldsymbol{b}\neq\boldsymbol{0}$, are given, the following convex optimization problem is proposed:
\[
\min_{\boldsymbol{X}}\ \|\boldsymbol{X}\|_{*}\quad\text{s.t.}\quad\mathcal{A}(\boldsymbol{X})=\boldsymbol{b},\qquad(1)
\]
where $\|\boldsymbol{X}\|_{*}=\sum_{i}\sigma_{i}(\boldsymbol{X})$ is the nuclear norm, the sum of all singular values of $\boldsymbol{X}$. Recht et al. [11] showed the recoverability of this convex approach using some restricted isometry conditions of the linear map $\mathcal{A}$. In general, these restricted isometry conditions are not satisfied and the proposed convex relaxation can fail to recover the matrix $\boldsymbol{X}$.
Low-rank matrices appear to be appropriate representations of data in other applications such as biclustering of gene expression data. Doan and Vavasis [5] proposed a convex approach to recover low-rank clusters using the dual Ky Fan 2-$k$-norm instead of the nuclear norm. The Ky Fan 2-$k$-norm is defined as
\[
|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}=\left(\sum_{i=1}^{k}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}},
\]
where $\sigma_{1}\geq\ldots\geq\sigma_{k}\geq 0$ are the first $k$ largest singular values of $\boldsymbol{A}$, $k\leq k_{0}=\mbox{rank}(\boldsymbol{A})$. The dual norm of the Ky Fan 2-$k$-norm is denoted by $|\!|\!|\cdot|\!|\!|_{k,2}^{\star}$,
\[
|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}^{\star}=\max_{\boldsymbol{X}}\left\{\langle\boldsymbol{A},\boldsymbol{X}\rangle\,:\,|\!|\!|\boldsymbol{X}|\!|\!|_{k,2}\leq 1\right\}.
\]
These unitarily invariant norms (see, e.g., Bhatia [2]) and their gauge functions have been used in sparse prediction problems [1], low-rank regression analysis [6] and multi-task learning regularization [7]. When $k=1$, the Ky Fan 2-$k$-norm is the spectral norm, $\left\lVert\boldsymbol{A}\right\rVert=\sigma_{1}(\boldsymbol{A})$, the largest singular value of $\boldsymbol{A}$, whose dual norm is the nuclear norm. Similar to the nuclear norm, the dual Ky Fan 2-$k$-norm with $k>1$ can be used to compute the $k$-approximation of a matrix $\boldsymbol{A}$ (Proposition 2.9, [5]), which demonstrates its low-rank property. Motivated by this low-rank property of the (dual) Ky Fan 2-$k$-norm, which is more general than that of the nuclear norm, and its usage in other applications, we propose in this paper a Ky Fan 2-$k$-norm-based non-convex approach for the matrix recovery problem, which aims to recover matrices that are not recoverable by the convex relaxation formulation (1). In Section 2, we discuss the proposed models in detail and in Section 3, we develop numerical algorithms to solve those models. Some numerical results will also be presented.
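As a quick numerical illustration of the definition above, the Ky Fan 2-k-norm can be evaluated directly from the singular values. The sketch below is a minimal NumPy version under stated assumptions: the function name `ky_fan_2k_norm` is hypothetical (it does not come from the paper), and for convenience it accepts any k between 1 and min(m, n).

```python
import numpy as np

def ky_fan_2k_norm(A, k):
    """l2-norm of the k largest singular values of A (hypothetical helper)."""
    # numpy returns the singular values sorted in descending order
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    if not 1 <= k <= s.size:
        raise ValueError("k must satisfy 1 <= k <= min(m, n)")
    return float(np.sqrt(np.sum(s[:k] ** 2)))

A = np.diag([3.0, 2.0, 1.0])
spectral = ky_fan_2k_norm(A, 1)    # k = 1 gives the largest singular value
frobenius = ky_fan_2k_norm(A, 3)   # k = min(m, n) gives the Frobenius norm
```

With k = 1 the helper returns the spectral norm, which matches the special case discussed in the text.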
2 Ky Fan 2-k-Norm-Based Models
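The Ky Fan 2-$k$-norm is the $\ell_{2}$-norm of the vector of the $k$ largest singular values with $k\leq\min\{m,n\}$. Thus we have:
\[
|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}=\left(\sum_{i=1}^{k}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}}\leq\left(\sum_{i=1}^{\min\{m,n\}}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}}=\left\lVert\boldsymbol{A}\right\rVert_{F},
\]
where $\left\lVert\cdot\right\rVert_{F}$ is the Frobenius norm. Now considering the dual Ky Fan 2-$k$-norm and using the definition of the dual norm, we obtain the following inequality:
\[
\left\lVert\boldsymbol{A}\right\rVert_{F}^{2}=\langle\boldsymbol{A},\boldsymbol{A}\rangle\leq|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}^{\star}\cdot|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}.
\]
Combining the two inequalities gives $\left\lVert\boldsymbol{A}\right\rVert_{F}^{2}\leq|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}^{\star}\cdot\left\lVert\boldsymbol{A}\right\rVert_{F}$. Thus we have:
\[
|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}\leq\left\lVert\boldsymbol{A}\right\rVert_{F}\leq|\!|\!|\boldsymbol{A}|\!|\!|_{k,2}^{\star}.
\]
It is clear that these inequalities become equalities if and only if $\text{rank}(\boldsymbol{A})\leq k$. This shows that to find a low-rank matrix $\boldsymbol{X}$ that satisfies $\mathcal{A}(\boldsymbol{X})=\boldsymbol{b}$ with $\text{rank}(\boldsymbol{X})\leq k$, we can solve either the following optimization problem
\[
\min_{\boldsymbol{X}}\ \frac{|\!|\!|\boldsymbol{X}|\!|\!|_{k,2}^{\star}}{\left\lVert\boldsymbol{X}\right\rVert_{F}}\quad\text{s.t.}\quad\mathcal{A}(\boldsymbol{X})=\boldsymbol{b},
\]
or
\[
\min_{\boldsymbol{X}}\ |\!|\!|\boldsymbol{X}|\!|\!|_{k,2}^{\star}-\left\lVert\boldsymbol{X}\right\rVert_{F}\quad\text{s.t.}\quad\mathcal{A}(\boldsymbol{X})=\boldsymbol{b}.
\]
It is straightforward to see that these non-convex optimization problems can be used to recover low-rank matrices, as stated in the following theorem, given the norm inequalities above.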
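The norm inequalities used here can be checked numerically on small examples. The sketch below rests on two assumptions that are not taken from this paper: (a) the dual of a unitarily invariant norm can be evaluated by applying the dual gauge function to the singular values, and (b) that dual gauge function is the k-support norm, computed with the closed form reported by Argyriou et al. [1]. All function names are hypothetical.

```python
import numpy as np

def k_support_norm(w, k):
    """k-support norm of a vector via the closed form in [1] (assumed here)."""
    w = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]   # |w| in descending order
    if not 1 <= k <= w.size:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    tail = np.cumsum(w[::-1])[::-1]          # tail[j] = w[j] + ... + w[-1]
    tol = 1e-12 * max(1.0, w[0])             # guards against rounding at ties
    for r in range(k):
        j = k - r - 1                        # 0-based position of the (k-r)-th largest entry
        avg = tail[j] / (r + 1)
        upper = np.inf if j == 0 else w[j - 1]
        if avg < upper + tol and avg >= w[j] - tol:
            return float(np.sqrt(np.sum(w[:j] ** 2) + (r + 1) * avg ** 2))
    raise RuntimeError("no valid split found")

def dual_ky_fan_2k_norm(A, k):
    """Dual Ky Fan 2-k-norm, evaluated on the singular values (assumption (a))."""
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    return k_support_norm(s, k)

def norm_chain(A, k):
    """Return (Ky Fan 2-k-norm, Frobenius norm, dual Ky Fan 2-k-norm) of A."""
    s = np.linalg.svd(A, compute_uv=False)
    ky_fan = float(np.sqrt(np.sum(s[:k] ** 2)))
    return ky_fan, float(np.linalg.norm(A, "fro")), dual_ky_fan_2k_norm(A, k)

rng = np.random.default_rng(0)
k = 2
full_rank = rng.standard_normal((5, 4))                               # rank 4 > k
low_rank = rng.standard_normal((5, k)) @ rng.standard_normal((k, 4))  # rank k
chain_full = norm_chain(full_rank, k)   # expected to be strictly increasing
chain_low = norm_chain(low_rank, k)     # expected to collapse to one value
```

On the rank-k example the three values coincide up to rounding, so both the ratio and the difference objectives attain their lower bounds (1 and 0, respectively) there.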
Theorem 2.1
If there exists a matrix $\boldsymbol{X}$ such that $\text{rank}(\boldsymbol{X})\leq k$ and $\mathcal{A}(\boldsymbol{X})=\boldsymbol{b}$, then $\boldsymbol{X}$ is an optimal solution of both optimization problems above.
Given the result in Theorem 2.1, the exact recovery of a low-rank matrix using either of the two problems relies on the uniqueness of the low-rank solution of $\mathcal{A}(\boldsymbol{X})=\boldsymbol{b}$. Recht et al. [11] generalized the restricted isometry property of vectors introduced by Candès and Tao [4] to matrices and used it to provide sufficient conditions for the uniqueness of these solutions.
Definition 1 (Recht et al. [11])
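For every integer $r$ with $1\leq r\leq\min\{m,n\}$, the $r$-restricted isometry constant $\delta_{r}(\mathcal{A})$ is defined as the smallest number such that
\[
(1-\delta_{r}(\mathcal{A}))\left\lVert\boldsymbol{X}\right\rVert_{F}\leq\left\lVert\mathcal{A}(\boldsymbol{X})\right\rVert_{2}\leq(1+\delta_{r}(\mathcal{A}))\left\lVert\boldsymbol{X}\right\rVert_{F}
\]
holds for all matrices $\boldsymbol{X}$ of rank at most $r$.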
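To get a feel for this definition, the ratio between the norm of the image under the linear map and the Frobenius norm can be sampled on random matrices of rank at most r. The sketch below is an illustration under stated assumptions: the map is Gaussian with entries of variance 1/p (a common normalization, assumed here), the sizes are arbitrary, and all names are hypothetical. Sampling yields only a lower bound on the restricted isometry constant, since the definition requires the inequality for all matrices of rank at most r.

```python
import numpy as np

def apply_map(A, X):
    """Apply the linear map X -> (<A_1, X>, ..., <A_p, X>); A has shape (p, m, n)."""
    return np.tensordot(A, X, axes=([1, 2], [0, 1]))

def sampled_isometry_ratios(A, r, trials, rng):
    """Ratios ||A(X)||_2 / ||X||_F on random matrices X of rank at most r."""
    _, m, n = A.shape
    ratios = np.empty(trials)
    for t in range(trials):
        X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
        ratios[t] = np.linalg.norm(apply_map(A, X)) / np.linalg.norm(X, "fro")
    return ratios

rng = np.random.default_rng(0)
m, n, r, p = 8, 6, 2, 40                           # illustrative sizes only
A = rng.standard_normal((p, m, n)) / np.sqrt(p)    # assumed N(0, 1/p) entries
ratios = sampled_isometry_ratios(A, r, trials=200, rng=rng)

# Smallest delta consistent with the sampled ratios: a lower bound on delta_r.
delta_lower_bound = max(1.0 - ratios.min(), ratios.max() - 1.0)
```

A value of `delta_lower_bound` close to or above 1 on such samples would already rule out small restricted isometry constants for that map.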
Using Theorem 3.2 in Recht et al. [11], we can obtain the following exact recovery result for the ratio and difference minimization problems.
Theorem 2.2
Suppose that $\delta_{2k}<1$ and there exists a matrix $\boldsymbol{X}$ which satisfies $\mathcal{A}(\boldsymbol{X})=\boldsymbol{b}$ and $\text{rank}(\boldsymbol{X})\leq k$. Then $\boldsymbol{X}$ is the unique solution to both problems, which implies exact recoverability.
The condition in Theorem 2.2 is indeed better than those obtained for the nuclear norm approach (see, e.g., Theorem 3.3 in Recht et al. [11]). The two non-convex optimization problems use a norm ratio and a norm difference, respectively. When $k=1$, the norm ratio and difference are computed between the nuclear and Frobenius norms. The idea of using these norm ratios and differences with $k=1$ has been used to generate non-convex sparse generalizers in the vector case. Yin et al. [13] investigated the ratio $\ell_{1}/\ell_{2}$ while Yin et al. [14] analyzed the difference $\ell_{1}-\ell_{2}$ in compressed sensing. Note that even though optimization formulations based on these norm ratios and differences are non-convex, they are still relaxations of the $\ell_{0}$-norm minimization problem unless the sparsity level of the optimal solution is one. Our proposed approach is similar to the idea of the truncated difference of the nuclear norm and the Frobenius norm discussed in Ma et al. [8]. Given a parameter $t$, the truncated difference is defined as
\[
\left\lVert\boldsymbol{A}\right\rVert_{*,t-F}=\sum_{i=t+1}^{\min\{m,n\}}\sigma_{i}(\boldsymbol{A})-\left(\sum_{i=t+1}^{\min\{m,n\}}\sigma_{i}^{2}(\boldsymbol{A})\right)^{\frac{1}{2}}\geq 0.
\]
For $t\geq k-1$, the problem of truncated difference minimization can be used to recover matrices with rank at most $k$ given that $\left\lVert\boldsymbol{A}\right\rVert_{*,t-F}=0$ if $\text{rank}(\boldsymbol{A})\leq t+1$. Similar results for exact recovery as in Theorem 2.2 are provided in Theorem 3.7(a) in Ma et al. [8]. Despite the similarity with respect to the recovery results, the two problems proposed here are motivated from a different perspective. We now discuss how to solve these problems.
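The truncated difference from Ma et al. [8] quoted above is easy to evaluate from the singular values, which makes its vanishing on low-rank matrices concrete. The following is a minimal sketch; the function name is hypothetical and the admissible range 0 <= t < min(m, n) is an assumption made here so that the tail sums are non-empty.

```python
import numpy as np

def truncated_nuclear_minus_frobenius(A, t):
    """Sum of the singular values after the t largest, minus their l2-norm."""
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)  # descending order
    if not 0 <= t < s.size:
        raise ValueError("t must satisfy 0 <= t < min(m, n)")
    tail = s[t:]                       # sigma_{t+1}, ..., sigma_{min(m, n)}
    return float(tail.sum() - np.sqrt(np.sum(tail ** 2)))

A = np.diag([3.0, 1.0, 0.0])                         # a rank-2 matrix
value_t1 = truncated_nuclear_minus_frobenius(A, 1)   # rank(A) <= t + 1: vanishes
value_t0 = truncated_nuclear_minus_frobenius(A, 0)   # nuclear minus Frobenius norm
```

For the rank-2 example, the value with t = 1 vanishes while the value with t = 0 stays positive, in line with the rank condition stated in the text.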
3 Numerical Algorithm
3.1 Difference of Convex Algorithms
We start with the ratio minimization problem. It can be reformulated as
[TABLE]
with the change of variables and . The compact formulation
[TABLE]
where is the feasible set of the problem and is the indicator function of . The problem is a difference of convex (d.c.) optimization problem (see, e.g., [9]). The difference of convex algorithm (DCA) proposed in [9] can be applied to the problem as follows.
Step 1. Start with for some such that $\mathcal{A}(\boldsymbol{X}^{0})=\boldsymbol{b}$ and set $s=0$.
Step 2. Update as an optimal solution of the following convex optimization problem
[TABLE]
Step 3. Set $s\leftarrow s+1$ and repeat Step 2.
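The three steps above follow the usual DCA pattern: linearize the concave part at the current iterate, solve the resulting convex sub-problem, and repeat. The sketch below shows that pattern on a one-dimensional toy d.c. function, f(x) = x^2 - 2|x|, rather than on the matrix problem, whose sub-problem requires a semidefinite solver. The toy function, the stopping rule and all names are assumptions made for illustration only.

```python
def dca(x0, subgradient_h, solve_subproblem, objective, max_iter=50, tol=1e-12):
    """Generic DCA loop for f = g - h with g, h convex (one-dimensional sketch)."""
    x = float(x0)
    values = [objective(x)]
    for _ in range(max_iter):
        y = subgradient_h(x)             # linearize the concave part -h at x^s
        x_next = solve_subproblem(y)     # argmin_x g(x) - y * x  (convex)
        values.append(objective(x_next))
        converged = abs(x_next - x) <= tol   # x^s solves its own sub-problem
        x = x_next
        if converged:
            break
    return x, values

# Toy d.c. decomposition: g(x) = x^2 and h(x) = 2|x|.
def objective(x):
    return x * x - 2.0 * abs(x)

def subgradient_h(x):
    # Any value in [-2, 2] is a valid subgradient at 0; 2 is chosen here.
    return 2.0 if x >= 0 else -2.0

def solve_subproblem(y):
    # Minimizer of x^2 - y * x, available in closed form.
    return y / 2.0

x_star, values = dca(0.3, subgradient_h, solve_subproblem, objective)
```

The recorded objective values never increase from one iterate to the next, and the loop stops once an iterate reproduces itself, which is the toy analogue of reaching a critical point.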
Let and use the general convergence analysis of DCA (see, e.g., Theorem 3.7 in [10]), we can obtain the following convergence results.
Proposition 1
Given the sequence obtained from the DCA algorithm for the problem , the following statements are true.
(i) The sequence is non-increasing and convergent.
(ii) when .
The convergence results show that the DCA algorithm improves the objective of the ratio minimization problem. The DCA algorithm can stop if , where is the set of optimal solutions of (10), and which satisfies this condition is called a critical point. Note that (local) optimal solutions of can be shown to be critical points. The following proposition gives an equivalent condition for critical points.
Proposition 2
if and only if $\boldsymbol{Y}=\boldsymbol{0}$ is an optimal solution of the following optimization problem
[TABLE]
Proof
Consider , i.e., $\mathcal{A}(\boldsymbol{Y})=\boldsymbol{0}$, we then have:
[TABLE]
Clearly, is equivalent to
[TABLE]
When $\boldsymbol{Y}=\boldsymbol{0}$, we achieve the equality. We have: if and only if the above inequality holds for all , which means $f(\boldsymbol{Y};\boldsymbol{X}^{s})\geq f(\boldsymbol{0};\boldsymbol{X}^{s})$ for all , where . Clearly, it is equivalent to the fact that $\boldsymbol{Y}=\boldsymbol{0}$ is an optimal solution of .
The result of Proposition 2 shows the similarity between the norm ratio minimization problem and the norm difference minimization problem with respect to the implementation of the DCA algorithm. Indeed, the norm difference minimization problem is a d.c. optimization problem and the DCA algorithm can be applied as follows.
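Step 1. Start with some $\boldsymbol{X}^{0}$ such that $\mathcal{A}(\boldsymbol{X}^{0})=\boldsymbol{b}$ and set $s=0$.
Step 2. Update $\boldsymbol{X}^{s+1}=\boldsymbol{X}^{s}+\boldsymbol{Y}$, where $\boldsymbol{Y}$ is an optimal solution of the following convex optimization problem
[TABLE]
Step 3. Set $s\leftarrow s+1$ and repeat Step 2.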
It is clear that is a critical point for the problem if and only if is an optimal solution of . Both problems and can be written in the general form as
[TABLE]
where for and for , respectively. Given that {\cal A}({\boldsymbol{X}}^{s})=\mbox{\boldmathb}, this problem can be written as
[TABLE]
The following proposition shows that is a critical point of the problem for many functions if .
Proposition 3
If , is a critical point of the problem for any function which satisfies
[TABLE]
Proof
If , we have: since . Given that
[TABLE]
we have: . Thus for all , the following inequality holds:
[TABLE]
It implies $\boldsymbol{Y}=\boldsymbol{0}$ is an optimal solution of the problem since the optimality condition is
[TABLE]
Thus is a critical point of the problem .
Proposition 3 shows that one can choose different functions such as for the sub-problem in the general DCA framework to solve the original problem. This generalized sub-problem is a convex optimization problem, which can be formulated as a semidefinite optimization problem given the following calculation of the dual Ky Fan 2-$k$-norm provided in [5]:
[TABLE]
In order to implement the DCA algorithm, one also needs to consider how to find the initial solution $\boldsymbol{X}^{0}$. We can use the nuclear norm minimization problem (1), the convex relaxation of the rank minimization problem, to find $\boldsymbol{X}^{0}$. A similar approach is to use the following dual Ky Fan 2-$k$-norm minimization problem to find $\boldsymbol{X}^{0}$ given its low-rank properties:
[TABLE]
This initial problem can be considered as an instance of with $\boldsymbol{X}^{s}=\boldsymbol{0}$ (and $\alpha(\boldsymbol{0})=1$), which is equivalent to starting the iterative algorithm with $\boldsymbol{X}^{0}=\boldsymbol{0}$ one step ahead. We are now ready to provide some numerical results.
3.2 Numerical Results
Similar to Candès and Recht [3], we construct the following experiment. We generate $\boldsymbol{M}$, an $m\times n$ matrix of rank $r$, by sampling two factors $\boldsymbol{M}_{L}$ (of size $m\times r$) and $\boldsymbol{M}_{R}$ (of size $r\times n$) with i.i.d. Gaussian entries and setting $\boldsymbol{M}=\boldsymbol{M}_{L}\boldsymbol{M}_{R}$. The linear map $\mathcal{A}$ is constructed with independent Gaussian matrices $\boldsymbol{A}_{i}$ whose entries follow , i.e.,
[TABLE]
We generate matrix with , , and . The dimension of these matrices is . For each , we generate matrices for the random linear map with ranging from to . We set the maximum number of iterations of the algorithm to be . The instances are solved using the SDPT3 solver [12] for semidefinite optimization problems in Matlab. The computer used for these numerical experiments is a 64-bit Windows 10 machine with a 3.70GHz quad-core CPU and 32GB RAM. The performance measure is the relative error $\left\lVert\boldsymbol{X}-\boldsymbol{M}\right\rVert_{F}/\left\lVert\boldsymbol{M}\right\rVert_{F}$ and the threshold is chosen. We run three different algorithms: nuclear uses the nuclear norm optimization formulation (1), k2-nuclear uses the proposed iterative algorithm with the initial solution obtained from (1), and k2-zero uses the same algorithm with the initial solution $\boldsymbol{X}^{0}=\boldsymbol{0}$. Figure 1 shows recovery probabilities and average computation times (in seconds) for different sizes of the linear map.
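The experimental setup described above (a random low-rank matrix, a Gaussian linear map, and the relative error as performance measure) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' Matlab code: the sizes are placeholders rather than the values used in the experiments, the entries of the measurement matrices are taken as standard normal because the variance is not recoverable from the text here, and all names are hypothetical.

```python
import numpy as np

def make_instance(m, n, r, p, rng):
    """Random rank-r matrix M = M_L M_R and Gaussian measurements b = A(M)."""
    M_L = rng.standard_normal((m, r))
    M_R = rng.standard_normal((r, n))
    M = M_L @ M_R
    A = rng.standard_normal((p, m, n))               # assumed unit-variance entries
    b = np.tensordot(A, M, axes=([1, 2], [0, 1]))    # b_i = <A_i, M>
    return M, A, b

def relative_error(X, M):
    """Relative error ||X - M||_F / ||M||_F used as the performance measure."""
    return float(np.linalg.norm(X - M, "fro") / np.linalg.norm(M, "fro"))

rng = np.random.default_rng(1)
M, A, b = make_instance(m=6, n=5, r=2, p=20, rng=rng)   # placeholder sizes
```

A recovered matrix would then be compared against the generated one with `relative_error` and counted as a success when the error falls below the chosen threshold.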
The results show that the proposed algorithm can recover exactly the matrix with rate when with both initial solutions while the nuclear norm approach cannot recover any matrix at all, i.e., rate, if . k2-nuclear is slightly better than k2-zero in terms of recoverability when is small while their average computational times are almost the same in all cases. The efficiency of the proposed algorithm when is small comes with higher average computational times as compared to that of the nuclear norm approach. For example, when , on average, one needs iterations to reach the solution when the proposed algorithm is used instead of with the nuclear norm optimization approach. Note that the average number of iterations is computed for all cases including cases when the matrix cannot be recovered. For recoverable cases, the average number of iterations is much smaller. For example, when , the average number of iterations for recoverable cases is instead of . When the size of the linear map increases, the average number of iterations decreases significantly. We only need extra iterations when or extra iteration on average when to obtain recovery rate when the nuclear norm optimization approach still cannot recover any of the matrices ( rate). These results show that the proposed algorithm achieves a significantly better recovery rate with a small number of extra iterations in many cases. We also test the algorithms with higher ranks including and . Figure 2 shows the results when the size of the linear map is .
These results show that when the size of linear maps is small, the proposed algorithms are significantly better than the nuclear norm optimization approach. With , the recovery probability increases when increases and it is close to 1 when . The computational time increases when increases given that the size of the sub-problems depends on the size of the linear map. With respect to the number of iterations, it remains low. When , the average numbers of iterations are 22 and 26 for k2-nuclear and k2-zero, respectively. It shows that k2-nuclear is slightly better than k2-zero both in terms of recovery probability and computational time.
4 Conclusion
We have proposed non-convex models based on the dual Ky Fan 2-$k$-norm for low-rank matrix recovery and developed a general DCA framework to solve the models. The computational results are promising. As a future research direction, numerical experiments with larger instances will be conducted together with the development of first-order algorithms for the proposed models.
References
- [1] Argyriou, A., Foygel, R., Srebro, N.: Sparse prediction with the k-support norm. In: NIPS. pp. 1466–1474 (2012)
- [2] Bhatia, R.: Matrix Analysis, Graduate Texts in Mathematics, vol. 169. Springer-Verlag, New York (1997)
- [3] Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Foundations of Computational Mathematics 9(6), 717–772 (2009)
- [4] Candès, E.J., Tao, T.: Decoding by linear programming. IEEE Transactions on Information Theory 51(12), 4203–4215 (2005)
- [5] Doan, X.V., Vavasis, S.: Finding the largest low-rank clusters with Ky Fan 2-k-norm and ℓ1-norm. SIAM Journal on Optimization 26(1), 274–312 (2016)
- [6] Giraud, C.: Low rank multivariate regression. Electronic Journal of Statistics 5, 775–799 (2011)
- [7] Jacob, L., Bach, F., Vert, J.P.: Clustered multi-task learning: a convex formulation. In: NIPS. vol. 21, pp. 745–752 (2009)
- [8] Ma, T.H., Lou, Y., Huang, T.Z.: Truncated ℓ1−2 models for sparse recovery and rank minimization. SIAM Journal on Imaging Sciences 10(3), 1346–1380 (2017)
