A multi-stage convex relaxation approach to noisy structured low-rank matrix recovery
Shujun Bi, Shaohua Pan, Defeng Sun

TL;DR
This paper introduces a multi-stage convex relaxation method for noisy structured low-rank matrix recovery, providing theoretical guarantees and demonstrating its effectiveness through numerical experiments.
Contribution
It proposes a novel multi-stage convex relaxation approach based on an exact penalty formulation, with proven convergence and error reduction guarantees.
Findings
The method achieves geometric convergence of the error sequence.
The approach outperforms standard nuclear norm relaxation in experiments.
Theoretical analysis confirms error reduction and rank approximation bounds.
Abstract
This paper concerns with a noisy structured low-rank matrix recovery problem which can be modeled as a structured rank minimization problem. We reformulate this problem as a mathematical program with a generalized complementarity constraint (MPGCC), and show that its penalty version, yielded by moving the generalized complementarity constraint to the objective, has the same global optimal solution set as the MPGCC does whenever the penalty parameter is over a threshold. Then, by solving the exact penalty problem in an alternating way, we obtain a multi-stage convex relaxation approach. We provide theoretical guarantees for our approach under a mild restricted eigenvalue condition, by quantifying the reduction of the error and approximate rank bounds of the first stage convex relaxation (which is exactly the nuclear norm relaxation) in the subsequent stages and establishing the geometric…
| 0 | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 | |||
| 0.819 | 0.766 | 0.658 | 0.547 | 0.433 | 0.316 | |||
| 0.818 | 0.763 | 0.652 | 0.537 | 0.420 | 0.302 | |||
| 0.818 | 0.763 | 0.651 | 0.536 | 0.420 | 0.302 | |||
| 0.818 | 0.763 | 0.651 | 0.536 | 0.420 | 0.302 | |||
| 0.975 | 0.969 | 0.955 | 0.934 | 0.905 | 0.856 | |||
| 0.967 | 0.958 | 0.931 | 0.888 | 0.816 | 0.689 | |||
| 0.965 | 0.954 | 0.920 | 0.760 | 0.752 | 0.572 | |||
| 0.965 | 0.953 | 0.915 | 0.744 | 0.714 | 0.516 | |||
| 0.817 | 0.759 | 0.644 | 0.528 | 0.413 | 0.297 |
| Algorithm 3.1 | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| | | eigr | The first stage | The first two stages | Final result | ||||
| relerr(rank) | iter | relerr(rank) | iter | relerr(rank) | iter | ||||
| 0 | 1.19 | 5.94e-1(1000) | 89 | 1.59e-1(6) | 1100 | 5 | 1.56e-1(5) | 4458 | |
| 0 | 2.86 | 4.43e-1(1000) | 101 | 1.51e-1(5) | 629 | 4 | 1.52e-1(5) | 1805 | |
| 0 | 4.36 | 3.57e-1(1000) | 98 | 1.49e-1(6) | 769 | 5 | 1.54e-1(5) | 2006 | |
| 5 | 100 | 1.17 | 5.81e-1(1000) | 88 | 1.53e-1(6) | 894 | 5 | 1.50e-1(5) | 4098 |
| 100 | 2.79 | 4.48e-1(1000) | 100 | 1.47e-1(5) | 697 | 4 | 1.48e-1(5) | 1734 | |
| 100 | 4.23 | 3.60e-1(1000) | 98 | 1.48e-1(6) | 832 | 5 | 1.49e-1(5) | 2004 | |
| 0 | 1.36 | 4.16e-1(1000) | 67 | 1.48e-1(10) | 608 | 4 | 1.43e-1(10) | 1987 | |
| 0 | 3.52 | 3.49e-1(1000) | 70 | 1.42e-1(10) | 464 | 4 | 1.41e-1(10) | 1152 | |
| 0 | 6.39 | 2.95e-1(1000) | 59 | 1.34e-1(10) | 373 | 4 | 1.37e-1(10) | 852 | |
| 10 | 100 | 1.42 | 3.97e-1(1000) | 67 | 1.46e-1(10) | 562 | 4 | 1.42e-1(10) | 1934 |
| 100 | 3.31 | 3.43e-1(1000) | 70 | 1.40e-1(10) | 470 | 4 | 1.40e-1(10) | 1218 | |
| 100 | 6.35 | 2.87e-1(1000) | 66 | 1.38e-1(10) | 344 | 5 | 1.42e-1(10) | 818 | |
| Algorithm 3.1 | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| | eigr | The first stage | The first two stages | Final result | |||||
| relerr(rank) | iter | relerr(rank) | iter | relerr(rank) | iter | ||||
| 1.18 | 4.80e-1(36) | 787 | 2.42e-1(7) | 1309 | 4 | 2.25e-1(5) | 1996 | ||
| 4.59 | 3.19e-1(32) | 676 | 1.92e-1(8) | 1214 | 4 | 1.82e-1(5) | 1800 | ||
| 5 | 1.20 | 4.86e-1(36) | 275 | 2.42e-1(7) | 451 | 4 | 2.21e-1(5) | 996 | |
| 4.26 | 3.24e-1(33) | 349 | 1.98e-1(7) | 523 | 4 | 1.90e-1(5) | 1016 | ||
| 1.21 | 4.74e-1(36) | 1038 | 2.42e-1(6) | 1734 | 4 | 2.24e-1(5) | 2746 | ||
| 4.07 | 3.33e-1(33) | 879 | 1.92e-1(7) | 1354 | 4 | 1.80e-1(5) | 2155 | ||
| 5.33 | 2.20e-1(53) | 253 | 1.56e-1(13) | 409 | 3 | 1.54e-1(13) | 554 | ||
| 7.72 | 1.90e-1(48) | 258 | 1.46e-1(13) | 393 | 3 | 1.45e-1(13) | 557 | ||
| 13 | 5.17 | 2.20e-1(53) | 177 | 1.58e-1(13) | 321 | 3 | 1.55e-1(13) | 466 | |
| 9.01 | 1.78e-1(45) | 172 | 1.44e-1(13) | 323 | 3 | 1.43e-1(13) | 484 | ||
| 4.58 | 2.30e-1(54) | 256 | 1.59e-1(13) | 402 | 3 | 1.56e-1(13) | 548 | ||
| 8.11 | 1.85e-1(48) | 221 | 1.47e-1(13) | 369 | 3 | 1.46e-1(13) | 530 | ||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Photoacoustic and Ultrasonic Imaging · Numerical methods in inverse problems
A multi-stage convex relaxation approach to noisy structured low-rank matrix recovery
Shujun Bi111School of Mathematics, South China University of Technology, Tianhe District of Guangzhou City, China ([email protected]).
Shaohua Pan222Corresponding author. School of Mathematics, South China University of Technology, Tianhe District of Guangzhou City, China ([email protected]). and Defeng Sun333Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 ([email protected]).
(March 1, 2017)
Abstract
This paper concerns with a noisy structured low-rank matrix recovery problem which can be modeled as a structured rank minimization problem. We reformulate this problem as a mathematical program with a generalized complementarity constraint (MPGCC), and show that its penalty version, yielded by moving the generalized complementarity constraint to the objective, has the same global optimal solution set as the MPGCC does whenever the penalty parameter is over a threshold. Then, by solving the exact penalty problem in an alternating way, we obtain a multi-stage convex relaxation approach. We provide theoretical guarantees for our approach under a mild restricted eigenvalue condition, by quantifying the reduction of the error and approximate rank bounds of the first stage convex relaxation (which is exactly the nuclear norm relaxation) in the subsequent stages and establishing the geometric convergence of the error sequence in a statistical sense. Numerical experiments are conducted for some structured low-rank matrix recovery examples to confirm our theoretical findings.
Keywords Structured rank minimization; MPGCC; Exact penalty; Convex relaxation
Mathematics Subject Classification(2010). 90C27, 90C33, 49M20
1 Introduction
The task of noisy structured low-rank matrix recovery is to find a low-rank matrix with a certain structure consistent with some noisy linear measurements. Let be the target matrix to be recovered and be the noisy measurement vector, where is the sampling operator and is the noisy vector with for some . The noisy structured low-rank matrix recovery problem can be modeled as
[TABLE]
where is a compact convex set to represent the structure of . Without loss of generality, we assume that for , where are the given matrices in . Such a structured rank minimization problem has wide applications in system identification and control [10, 12], signal and image processing [16, 6], machine learning [33], multi-dimensional scaling in statistics [28], finance [27], quantum tomography [14], and so on. For instance, one is often led to seek a low-rank Hankel matrix in system identification and control, a low-rank correlation matrix in finance and a low-rank density matrix in quantum tomography.
Due to the combinatorial property of the rank function, problem (1) is generally NP-hard. One popular way to deal with NP-hard problems is to use the convex relaxation technique, which typically yields a desirable local optimal or at least a feasible solution via solving a single or a sequence of numerically tractable convex optimization problems. Fazel [10] initiated the research for the nuclear norm relaxation method, motivated by the fact that the nuclear norm is the convex envelope of the rank function in the unit ball on the spectral norm. In the past ten years, this relaxation method has received much attention from many fields such as information, computer science, statistics, optimization, and so on (see, e.g., [4, 14, 30, 19, 20, 25, 35]), and it has been shown that a single nuclear norm minimization problem can recover the target matrix under a certain restricted isometry property (RIP) of when [30] or can yield a solution satisfying a certain error bound when [5]. For its recoverability and error bounds under other conditions, the interested readers may refer to the literature [9, 25, 31] and references therein.
Most of the existing low-rank matrix optimization models are focused on the case that . When the structure on the target matrix is known, it is reasonable to consider the rank minimization problem (1) with indicating the available information. However, the (hard) constraint often contradicts the role of the nuclear norm in promoting a low-rank solution. For example, when consists of the set of correlation matrices, the nuclear norm relaxation method for (1) may fail in generating a low-rank solution since the nuclear norm becomes a constant in the set . In addition, although some error bounds have been established for the nuclear norm relaxation method in the noisy setting [5, 25, 26], they are minimax-optimal up to a logarithmic factor of the dimension [26], instead of a constant factor like the -norm relaxation method for sparse regression [29]. These two considerations motivate us to seek more efficient convex relaxations.
1.1 Our main contribution
The main contribution of this work is the introduction of a multi-stage convex relaxation approach via an equivalent Lipschitz optimization reformulation. This approach can efficiently reduce the error bounds obtained from the nuclear norm convex relaxation. More specifically, we reformulate problem (1) as an equivalent MPGCC by using a variational characterization of the rank function and verify that its penalized version, yielded by moving the generalized complementarity constraint to the objective, has the same global optimal solution set as the MPGCC does once the penalty parameter is over a threshold. This exact penalty problem not only has a convex feasible set but also possesses a Lipschitz objective function with a bilinear structure, which offers a favorable Lipschitz reformulation for problem (1). To the best of our knowledge, this is the first equivalent Lipschitz characterization for low-rank matrix optimization problems. Then with this reformulation, we propose a multi-stage convex relaxation approach by solving the exact penalty problem in an alternating way. In particular, under a restricted eigenvalue condition weaker than the RIP condition used in [5, 23], we quantify the reduction of the error and approximate bounds derived from the first stage nuclear norm convex relaxation in the subsequent stages, and establish the geometric convergence of the error sequence in a statistical sense. Among others, the latter entails an upper estimation for the stage number of the convex relaxations to make the estimation error to reach the statistical error level. The analysis shows that the error and approximate rank bounds of the nuclear norm relaxation are reduced most in the second stage and the reduction rate is at least for those problems with a relatively worse restricted eigenvalue property, and the reduction becomes less as the number of stages increases and can be ignored after the fifth stage.
1.2 Related works
The idea of using the multi-stage convex relaxation for low-rank optimization problems is not new. In order to improve the solution quality of the nuclear norm relaxation method, some researchers pay their attention to nonconvex surrogates of low-rank optimization problems. Since seeking a global optimal solution of a nonconvex surrogate problem is almost as difficult as solving a low-rank optimization problem itself, they relax nonconvex surrogates into a sequence of simple matrix optimization problems, and develop the reweighted minimization methods (see [11, 24, 21]). In contrast to our multi-stage convex relaxation approach, such sequential convex relaxation methods are designed by solving a sequence of convex relaxation problems of nonconvex surrogates instead of the equivalent reformulation. We also notice that the theoretical analysis in [23] for the reweighted trace norm minimization method [11] depends on the special property of the log-determinant function, which is not applicable to general low-rank optimization problems, and the theoretical guarantees in [21] were established only for the noiseless recovery problem.
Additionally, some researchers have reformulated low-rank optimization problems as smooth nonconvex problems with the help of low-rank decomposition of matrices in the attempt to achieve a desirable solution by solving the smooth nonconvex problems in an alternating way (actually by solving a sequence of simple convex matrix optimization problems); see, e.g., [32, 18]. This class of convex relaxation methods has a theoretical guarantee, but is not applicable to those problems with hard constraints such as (1).
Finally, it is worthwhile to point out that our multi-stage convex relaxation approach is highly relevant to the one proposed by Zhang [39] for sparse regularization problems and the rank-corrected procedure for the matrix completion problem with fixed coefficients [22]. The former is designed via solving a sequence of convex relaxation problems for the nonconvex surrogates of the zero-norm regularization problem. Since the singular values vectors are involved in low-rank matrix recovery, the analysis technique in [39] is not applicable to our multi-stage convex approach to problem (1). In particular, for low-rank matrix recovery, it is not clear whether the error sequence yielded by the multi-stage convex relaxation approach shrinks geometrically or not in a statistical sense, and if it does, under what conditions. We will answer these questions affirmatively in Section 4. The rank-corrected procedure [22] is actually a two-stage convex relaxation approach in which the first-stage is to find a reasonably good initial estimator and the second-stage is to solve the rank-corrected problem. This procedure has already been applied to nonlinear dimensionality reduction problems [8] and tensor completion problems [3]. However, when the rank of the true matrix is unknown, the rank-corrected problem in [22] needs to be constructed heuristically with the knowledge of the initial estimator, while each subproblem in our multi-stage convex relaxation approach stems from the global exact penalty of the equivalent MPGCC. In addition, the analytical technique used in [22] is more reliant on concentration inequalities in probability analysis, whereas our analysis is deterministic and relies on the restricted eigenvalue property of linear operators.
1.3 Notation
We stipulate and let be the vector space of all real matrices endowed with the trace inner product and its induced norm . For , we denote by the singular value vector of with entries arranged in a non-increasing order, and and by the nuclear norm and the spectral norm of , respectively. Let be the set in consisting of all matrices whose columns are of unit length and are mutually orthogonal to each other, and denote . Let be the vector of all ones whose dimension is known from the context.
Let be the family of closed proper convex functions satisfying
[TABLE]
For each , let be the associated proper closed convex function:
[TABLE]
Then from convex analysis [34] we know that the conjugate of has the properties:
[TABLE]
We also need the eigenvalues of restricted to a set of low-rank matrices, where denotes the adjoint of . To this end, for a given positive integer , we define
[TABLE]
which can be viewed as the largest and the smallest rank -restricted eigenvalues of , respectively.
2 Exact penalty for an equivalent reformulation
First of all, we shall provide an equivalent reformulation of the rank minimization problem (1) with the help of the following variational characterization of the rank function.
Lemma 2.1
Let . Then, for any given , it holds that
[TABLE]
Proof: Fix an arbitrary matrix . Write and assume that has the SVD as , where and with and . Let be an arbitrary feasible point of the minimization problem (6). It then follows from [17, Equation (3.3.25)] that
[TABLE]
which implies that . Along with for , we obtain that if , and if . Consequently, i.e., is a lower bound for the optimal value of (6). Clearly, with defined in (2) is feasible to (6) with the objective value being . This shows that is an optimal solution of the minimization problem (6) with the optimal value equal to .
Recall that for . By Lemma 2.1, we readily have the following result.
Proposition 2.1
Let . Then, the rank minimization problem (1) is equivalent to
[TABLE]
in the following sense: if is a global optimal solution of (1), where and with and for , then is globally optimal to (2.1); and conversely, if is a global optimal solution to (2.1), then is globally optimal to (1).
The constraints and involve a complementarity relation that, for the positive semidefinite (PSD) rank minimization problem, is exactly the PSD cone complementarity relation. In view of this, we call problem (2.1) an MPGCC.
Due to the presence of the nonconvex constraint the MPGCC (2.1) is as difficult as the original problem (1). Nevertheless, it provides us a new view to tackle the difficult rank minimization problem (1). Since numerically it is usually more convenient to handle nonconvex objective functions than to handle nonconvex constraints, we are motivated to investigate the following penalized problem of the MPGCC (2.1):
[TABLE]
Next we shall verify that (2) is an exact penalty version for (2.1) in the sense that there exists a constant such that the global optimal solution set of (2) associated to any coincides with that of (2.1). To the best of our knowledge, there are only a few works devoted to mathematical programs with matrix cone complementarity constraints [7, 37], which mainly focus on the optimality conditions, but not the exact penalty conditions.
Theorem 2.1
Let the optimal value of (1) be . Then, there exists a constant such that for all , where is the feasible set of (1), and for the global optimal solution set of (2) associated to any is the same as that of (2.1).
Proof: Suppose that there exists a sequence such that . Notice that is bounded since is bounded. Let be an accumulation point of . By the closedness of and the continuity of , and , so that . This is a contradiction which establishes the existence of .
We denote by and the feasible set and the global optimal solution set of (2.1), respectively. For any given , let and be the feasible set and the global optimal solution set of the corresponding penalty problem (2), respectively. By the first part of our conclusion, there exists a constant such that for all . Let be an arbitrary constant with . Then, for any and each ,
[TABLE]
First we verify that each satisfies and . Indeed, since and is the optimal value of problem (2.1), it holds that
[TABLE]
In addition, from [17, Equation (3.3.25)], it follows that
[TABLE]
where the second inequality is by the nonnegativity of and for all , and the last one is due to (9). Together with (10), we obtain that
[TABLE]
This, along with (9), implies that for . Substituting for into the last equation and using the nonnegativity of in , we deduce that and . This means that for and , where is defined in (2). Then, and for . Since the global optimal value of (2.1) is , we have . For the reverse inclusion, let be an arbitrary point from . Then and . While the optimal value of (2) is by the last equation. Thus, we get .
Theorem 2.1 extends the exact penalty result of [1, Theorem 3.3] for the zero-norm minimization problem to the matrix setting, and further develops the exact penalty result of the rank-constrained minimization problems in [2, Theorem 3.1]. Observe that the objective function of problem (2) is globally Lipschitz continuous over its feasible set. Combining Theorem 2.1 with Proposition 2.1, we conclude that the rank minimization problem (1) is equivalent to the Lipschitz optimization problem (2).
3 A multi-stage convex relaxation approach
In the last section, we prove that the rank minimization problem (1) is equivalent to a single penalty problem (2). This penalty problem depends on the parameter , the lower bound for the th largest singular value of all , which can be difficult to estimate. This means that a sequence of penalty problems of the form (2) with non-decreasing should be solved so as to target achieving a global optimal solution of (1). The penalty problem (2) associated to a given is not globally solvable due to the nonconvexity of the objective function. However, it becomes a nuclear semi-norm minimization with respect to if the variable is fixed and has a closed form solution of (as will be shown later) if the variable is fixed. This motivates us to propose a multi-stage convex relaxation approach to (1) by solving a single penalty problem (2) in an alternating way.
Algorithm 3.1
(A multi-stage convex relaxation approach)
(S.0)
Choose a function . Let and set .
(S.1)
Solve the following nuclear semi-norm minimization problem
X^{k}\in\mathop{\arg\min}_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\big{\{}\|X\|_{*}\!-\!\langle W^{k-1},X\rangle\!:\ \|\mathcal{A}(X)\!-b\|\leq\delta,\ X\in\Omega\big{\}}.
(11)
If , select a suitable and go to Step (S.3); else go to Step (S.2).
(S.2)
Select a suitable ratio factor and set .
(S.3)
Solve the following minimization problem
W^{k}\in\mathop{\arg\min}_{W\in\mathbb{R}^{n_{1}\times n_{2}}}\big{\{}{\textstyle\sum_{i=1}^{n_{1}}}\phi(\sigma_{i}(W))\!-\rho_{k}\langle W,X^{k}\rangle\!:\ \|W\|\leq 1\big{\}}.
(12)
(S.4)
Let , and then go to Step (S.1).
The subproblem (11) corresponds to the penalty problem (2) associated to with the variable fixed to . Since the set is assumed to be compact, its solution is well defined. Assume that has the SVD as . By [17, Eq.(3.3.25)], it is easy to check that if is optimal to the convex minimization
[TABLE]
then is a global optimal solution to (12); and conversely if is globally optimal to (12), then is an optimal solution to (13). Write
[TABLE]
Then, together with [34, Theorem 23.5], it follows that such is optimal to the subproblem (12). This means that the main computational work of Algorithm 3.1 consists of solving a sequence of subproblems (11). Unless otherwise stated, in the sequel we choose when , which ensures that .
Since , the function defines a semi-norm over . So, the subproblem (11) is a nuclear semi-norm minimization problem. When , it reduces to the nuclear norm minimization problem, i.e., the first stage of Algorithm 3.1 is exactly the nuclear norm convex relaxation. It should be emphasized that Algorithm 3.1 is different from the reweighted trace norm minimization method [11, 23] and the iterative reweighted algorithm [21]. The former is proposed from the primal and dual viewpoint by solving an equivalent Lipschitz reformulation in an alternating way, whereas the latter is proposed from the primal viewpoint by relaxing a smooth nonconvex surrogate of (1).
To close this section, we illustrate the choice of in formula (14) with two .
Example 3.1
Let for . Clearly, with . Moreover, for the function defined by (3) with , an elementary calculation yields that
[TABLE]
Thus, one may choose w_{i}^{k}\!=\!\left\{\begin{array}[]{ll}1&\textrm{if}\ \sigma_{i}(X^{k})\geq\frac{1}{\rho_{k}};\\ 0&\textrm{otherwise}\end{array}\right. for the matrix in formula (14).
Example 3.2
Let for with , where is a constant. One can check that with . For the function defined by the equation (3) with , an elementary calculation yields that
[TABLE]
Hence, one may take w_{i}^{k}=\min\big{[}1+\epsilon-(\rho_{k}\sigma_{i}(X^{k})\!+\!1)^{q-1},1\big{]} for the matrix in (14),
Remark 3.1
A constant is introduced in the function so as to ensure that , and then problem (2) is a global exact penalization of (1). Thus, once yielded by Algorithm 3.1 satisfies , is at least a local minimum of the problem (1) since each feasible solution of (1) is locally optimal.
4 Theoretical guarantees of Algorithm 3.1
In this section, we shall provide the theoretical guarantees of Algorithm 3.1 under a mild condition for the restricted eigenvalues of , which is stated as follows.
Assumption 1
There exist a constant and an integer such that , where and are the functions defined by (5).
Assumption 1 requires the restricted eigenvalue ratio of to grow sublinearly in . This condition, extending the sparse eigenvalue condition used for the analysis of sparse regularization (see [38, 39]), is weaker than the RIP condition used in [5] for , where is the -restricted isometry constant of defined as in [5]. Indeed, from the definitions of and , it is immediate to have that
[TABLE]
This shows that is such that for . In addition, this condition is also weaker than the RIP condition used in [23] for , where is an even number or is an odd number greater than . To see this, assume that , and is an even number or is an odd number greater than . Then,
[TABLE]
So, and are respectively such that for and .
In the sequel, we let have the SVD as , where and with and for , and write where is the tangent space at associated to the rank constraint (see equation (28) for its definition). For convenience, for , let
[TABLE]
The proofs of all the results in the subsequent subsections are given in Appendix C.
4.1 Error and approximate rank bounds
Under Assumption 1, when for some , we can establish the following error bound and approximate rank bound for the solution of the th subproblem.
Proposition 4.1
Suppose that Assumption 1 holds and for some . Then
[TABLE]
where and are the increasing functions defined by
[TABLE]
Remark 4.1
(a)* Since implies that , it is reasonable to view as a measure for the approximate rank of . So, the second inequality in (18) provides an approximate rank bound for . The error and approximate rank bounds in (18) consist of two parts: one part is the statistical error from the noise and the operator , and the other part is the estimation error from .*
(b)* Since , we have . Hence, under Assumption 1, the error and approximate rank bounds of the nuclear norm convex relaxation are*
[TABLE]
Moreover, if Assumption 1 is satisfied with and for , then the error bound is tighter than the bound given by **[23, Theorem III.1]** with for the nuclear norm relaxation because
[TABLE]
Remark 4.1 (b) says that under Assumption 1 the solution of the first stage convex relaxation has the error and approximate rank bounds as in (19). However, it is not clear whether has such error and approximate rank bounds or not. The following theorem states that if in addition and and are appropriately chosen, all have the bounds as in (18), and more importantly, their error and approximate rank bounds are, respectively, smaller than those of . To achieve this result, we need the sequence , which is defined recursively with as
[TABLE]
Theorem 4.1
Suppose that Assumption 1 holds and . If the parameters and are respectively chosen such that and \mu_{k}\in\big{[}1,\frac{\Xi(\widetilde{\gamma}_{k-2})}{\Xi(\widetilde{\gamma}_{k-1})}\big{]}, then all satisfy the inequalities in (18) and for it also holds that
[TABLE]
Remark 4.2
(a)* Theorem 4.1 shows that under Assumption 1 and , if and are chosen appropriately, then the error and approximate rank bounds of improve those of at least by and , respectively.*
(b)* The choice of depends on . For instance, take the function in Example 3.1. If for , then by virtue of the definitions of and and equation (15) it is easy to check that and , and consequently . This means that \big{(}\frac{1}{(\alpha-1)\Xi(\gamma_{0})},\frac{1}{\Xi(\gamma_{0})}\big{)} is the range of choice for . For numerical computations, one may estimate and with the help of .*
To close this subsection, we illustrate the ratios and by the functions and with and . For this purpose, we suppose that Assumption 1 holds with and for . Then, for those in the first row of Table 1, one may compute the ratios and as those in the last six columns of Table 1 with chosen as the middle point of the interval and . We see that the error bound of the first stage is reduced most in the second stage, and as the number of stages increases, the reduction becomes less. For Algorithm 3.1 with , the reduction is close to the limit when , but for Algorithm 3.1 with , there is a little room for the reduction especially for those with .
4.2 Geometric convergence
Generally speaking, because of the presence of the noise, it is impossible for the error sequence to decrease and then converge geometrically. However, one may establish its geometric convergence in a statistical sense as in the following theorem.
Theorem 4.2
Suppose that Assumption 1 holds and for . If and are chosen as in Theorem 4.1, then for ,
[TABLE]
Remark 4.3
(a)* The requirement in Theorem 4.2 is bit stronger than . Take for example. When , this requirement is automatically satisfied. Also, now we have that .*
(b)* The first term of the sum on the right hand side of (21) represents the statistical error arising from the noise and the sampling operator , and the second term is the estimation error related to the multi-stage convex relaxation. Clearly, the statistical error is of a certain order of . Thus, to guarantee that the second term is less than the statistical error, at most stage convex relaxations are required, where*
[TABLE]
Take for example. When , one can calculate that if , and if . This means that, for those with a worse restricted eigenvalue condition, more than two stage convex relaxations are needed to yield a satisfactory solution.
For the analysis in the previous two subsections, the condition for a certain is required for the decreasing of the error and approximate rank bounds of the first stage convex relaxation and the contraction of the error sequence. Such a condition is necessary for the low-rank recovery since, when the smallest nonzero singular is mistaken as a zero, the additional singular vectors will yield a large error. In fact, in the geometric convergence analysis of sparse vector optimization (see [39]), the error bound of the first stage was implicitly assumed not to be too large. In addition, we observe that the structure information of does not lend any help to the low-rank matrix recovery in terms of convergence rates. However, when the true matrix has a certain structure, it is necessary to incorporate such structure information into model (1). Otherwise, the solution yielded by the multi-stage convex relaxation may not satisfy the structure constraint, and then it is impossible to control the error of to the true matrix .
Finally, we point out that when the components of the noisy vector are independent (but not necessarily identically distributed) sub-Gaussians, i.e., there exists a constant such that holds for all and any , by Lemma 7 in Appendix C, the conclusions of Theorems 4.1 and 4.2 hold with with probability at least for an absolute constant . For the random , the following result is immediate by [5, Theorem 2.3] and the first inequality in (16).
Theorem 4.3
Fix and let be a random measurement ensemble obeying the following conditions: for any given and any fixed ,
[TABLE]
for fixed constants (which may depend on ). If with , then Assumption 1 holds for and with probability exceeding where . Consequently, when , the bounds in (18) holds with probability at least for such random measurements.
As remarked after [5, Theorem 2.3], the condition in (22) holds when is a Gaussian random measurement ensemble (i.e., are independent from each other and each contains i.i.d. entries ); or when each entry of each has i.i.d. entries that are equally likely to take or ; or when is a random projection (see [30]).
5 Numerical experiments
In this section, we shall test the theoretical results in Section 4 by applying Algorithm 3.1 to some low-rank matrix recovery problems, including matrix sensing and matrix completion problems. During the testing, we chose with and for the function in Algorithm 3.1. Although Table 1 shows that Algorithm 3.1 with reduces the error faster than Algorithm 3.1 with does, our preliminary testing indicates that the latter has a little better performance in reducing the relative error. This accounts for choosing instead of for our numerical testing. In addition, we always chose and . All the results in the subsequent subsections were run on the Windows system with an Intel(R) Core(TM) i3-2120 CPU 3.30GHz.
5.1 Low-rank matrix sensing problems
We tested the performance of Algorithm 3.1 with some matrix sensing problems, for which some entries are known exactly. Specifically, we assumed that entries of the true of rank are known exactly, and generated in the following command
XR = randn(n1,r); XL = randn(n2,r); Xbar = XR*XL’.
We successively generated the matrices with i.i.d. standard normal entries to formulate the sampling operator . Such satisfies the RIP property with a high probability by [30], which means that the restricted eigenvalues of can satisfy Assumption 1 with a high probability from the discussions after Assumption 1. Then, we successively generated the standard Gaussian noises to formulate by
[TABLE]
Take and \Omega=\big{\{}X\in\mathbb{R}^{n_{1}\times n_{2}}\ |\ \mathcal{B}X=d,\,\|X\|\leq R\big{\}} for , where for with being the index set of known entries, and is the vector consisting of for . Let denote the indicator function over a set . The subproblem (11) in Algorithm 3.1 now has the form
[TABLE]
where and . The dual of (5.1) is
[TABLE]
During the testing, we solved the subproblems of the form (5.1) until the primal and dual relative infeasibility is less than and the difference between the primal objective value and the dual one is less than , with the powerful Schur-complement based semi-proximal ADMM (alternating direction method of multipliers) [15] for problem (5.1).
5.1.1 Performance of Algorithm 3.1 in different stages
We generated randomly a matrix sensing problem with some entries known as above with , and to test the performance of Algorithm 3.1 in different stages. Figure 1 plots the relative error of Algorithm 3.1 in the first fifteen stages. We see that Algorithm 3.1 reduces the relative error of the nuclear norm relaxation method most in the second stage, and after the third stage the reduction becomes insignificant. This performance coincides with the analysis results shown as in Table 1.
5.1.2 Performance of Algorithm 3.1 with different samples
We generated randomly a matrix sensing problem with some entries known as above with , and for to test the performance of Algorithm 3.1 under different samples. Figure 2 depicts the relative error curves and the rank curves of the first stage convex relaxation and the first five stages convex relaxation, respectively. We see that the relative errors of the first stage convex relaxation and the first five stages convex relaxation decrease as the number of samples increases, but the relative error of the latter is always smaller than that of the former. Moreover, the first five stages convex relaxation reduces those of the first stage convex relaxation at least for , and the reduction becomes less as the number of samples increases. In particular, the rank of is higher than that of even for sampling ratio, but the rank of equals that of even for sampling ratio.
5.2 Low-rank PSD matrix completion problems
We applied Algorithm 3.1 to two classes of low-rank PSD matrix completion problems. Although the sampling operators for such problems do not satisfy the RIP property, it is possible for the restricted eigenvalues of to satisfy Assumption 1. For these problems, \Omega=\big{\{}X\in\mathbb{S}_{+}^{n}\ |\ \mathcal{E}_{1}(X)=g_{1},\,\mathcal{E}_{2}(X)\leq g_{2}\big{\}} where and are the linear operators, and and are the given vectors. For this case, the subproblem (11) in Algorithm 3.1 now takes the form of
[TABLE]
After an elementary calculation, the dual problem of (5.2) has the following form
[TABLE]
During the testing, we solved the subproblems of the form (5.2) until the primal and dual relative infeasibility is less than and the difference between the primal objective value and the dual one is less than with the Schur-complement based semi-proximal ADMM [15] for problem (5.2), and stopped Algorithm 3.1 at the th iteration once
[TABLE]
where is the number of nonzero singular values of less than .
5.2.1 Low-rank correlation matrix completion problems
A correlation matrix is a real symmetric PSD matrix with all diagonals being . We generated the true correlation matrix of rank in the following command:
L = randn(n,r); W = weight*L(:,1:1); L(:,1:1) = W; G = L*L’;
M = diag(1./sqrt(diag(G)))*G*diag(1./sqrt(diag(G))); Xbar = (M+M’)/2.
In this way, one can control the ratio of the largest eigenvalue and the smallest nonzero eigenvalue of by weight. We assume that some off-diagonal entries of are known. Thus, for , , and , where the operator and the vector are defined as in Subsection 5.1. The noise vector and the observation vector are generated in the same way as in (23).
Table 2 reports the numerical results of Algorithm 3.1 for some examples generated randomly. The information of is reported in the first three columns, where the second column lists the number of known off-diagonal entries for , and the third column gives the ratio of the largest eigenvalue of to the smallest nonzero eigenvalue of . For each test example, we sampled partial unknown off-diagonal entries uniformly at random to formulate the operator , where the sample ratio is for and for . The fourth and the fifth columns report the results of the first stage convex relaxation and the first two stages convex relaxation, respectively, and the sixth column reports the final result, where relerr(rank) means the relative error and the rank of solutions, and iter is the total number of iterations required by the Schur-complement based semi-proximal ADMM for the corresponding convex relaxation.
We see that the solution given by the trace-norm relaxation method has a high relative error and a full rank, while the two-stage convex relaxation reduces the relative error of the trace-norm relaxation at least for all test problems. Although the two-stage convex relaxation may yield the desirable relative error for all the test problems, the ranks of some problems (for example, the third and the fourth) are higher than that of . With the number of stages increasing, Algorithm 3.1 yields the same rank as that of . This indicates that for the problems with suitable sample ratios, the two-stage convex relaxation is enough; while for the problem with very low sample ratios, more than two stages convex relaxation is needed. In addition, since all the constraints to define the set are of the hard type, some of the relative errors in the sixth column are little higher than those in the fifth column.
5.2.2 Low-rank covariance matrix completion problems
We generated the true covariance matrix of rank in the following command:
L = randn(n,r)/sqrt(sqrt(n)); W = weight*L(:,1:1);
L(:,1:1) = W; G = L*L’; Xbar = (G+G’)/2.
In this case, and where and are defined as in Subsection 5.1, for with being the index set of unknown diagonal entries of , and is the vector consisting of the upper bounds for unknown diagonal entries of . We set .
Table 3 reports the numerical results of Algorithm 3.1 for some problems generated randomly. The information of the true covariance matrix is reported in the first two columns, where the second column lists the number of known diagonal and off-diagonal entries of , and the third column reports the ratio of the largest eigenvalue of to the smallest nonzero eigenvalue of . For each test example, we sampled the upper triangular entries uniformly at random to formulate the sampling operator , where the sample ratio is for and for . The fourth and the fifth columns report the results of the first stage convex relaxation and the first two stages convex relaxation, respectively, and the last one lists the final results.
We see that the solution yielded by the trace-norm relaxation method has a high relative error and rank, and the solution given by the first two stages convex relaxation has the desirable relative error but its rank is still higher than that of the true matrix for those problems with low sample ratios. This shows that for these difficult problems, more than two stages convex relaxation is required. For the problems with with sample ratio, the two-stage convex relaxation reduces the error bounds of the trace-norm relaxation method at least , while for those problems with sample ratio, the reduction rate is over . In addition, combining with the results in Table 2, we see that the performance of our multi-stage convex relaxation has no direct like with the ratio of the largest eigenvalue and the smallest nonzero eigenvalue of the true .
6 Conclusions
We have proposed a multi-stage convex relaxation approach to the structured rank minimization problem (1) by solving the exact penalty problem of its equivalent MPGCC in an alternating way. It turned out that this approach not only has favorable theoretical guarantees but also reduces the error of the nuclear norm relaxation method effectively. There are several topics worthwhile to pursue, such as to develop the fast and effective algorithms for seeking the solution of subproblems, to establish the theoretical guarantee for the case where the subproblems are solved inexactly, and apply this approach to other classes of low-rank optimization problems, say, low-rank plus sparse problems.
Acknowledgements This work is supported by the National Natural Science Foundation of China under project No.11571120 and the Natural Science Foundation of Guangdong Province under project No. 2015A030313214. The authors would like to thank Dr Xudong Li on help us improve our computing codes.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] S. J. Bi, X. L. Liu and S. H. Pan , Exact penalty decomposition method for zero-norm minimization based on MPEC formulation , SIAM Journal on Scientific Computing, 36 (2014): A 1451-A 1477.
- 2[2] S. J. Bi and S. H. Pan , Error bounds for rank constrained optimization problems and applications , Operations Research Letters, 44 (2016): 336-341.
- 3[3] M. R. Bai, X. J. Zhang, G. Y. Ni and C. F. Cui , An adaptive correction approach for tensor completion , SIAM Journal on Imaging Sciences, 9 (2016): 1298-1323.
- 4[4] E. J. Candès and B. Recht , Exact matrix completion via convex optimization , Foundations of Computational Mathematics, 9 (2009): 717-772.
- 5[5] E. J. Candès and Y. Plain , Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements , IEEE Transactions on Information Theory, 57 (2011): 2342-2359.
- 6[6] Y. X. Chen and Y. J. Chi , Robust spectral compressed sensing via structured matrix completion , IEEE Transactions on Information Theory, 60 (2014): 6576-6601.
- 7[7] C. Ding, D. F. Sun and J. J. Ye , First order optimality conditions for mathematical programs with semidefinite cone complementarity constraints , Mathematical Programming, 147 (2014): 539-579.
- 8[8] C. Ding and H. D. Qi , Convex optimization learning of faithful Euclidean distance representations in nonlinear dimensionality reduction , Mathematical Programming, 2016, doi:10.1007/s 10107-016-1090-7.
