A hybrid penalty method for a class of optimization problems with multiple rank constraints
Tianxiang Liu, Ivan Markovsky, Ting Kei Pong, Akiko Takeda

TL;DR
This paper introduces a hybrid penalty method combining penalty and pseudo-projection techniques to efficiently solve optimization problems with multiple rank constraints on Hankel matrices, relevant in system identification and signal processing.
Contribution
The paper proposes a novel hybrid penalty approach that integrates pseudo-projection methods for handling multiple rank constraints, with convergence analysis and practical efficiency demonstrated.
Findings
Efficient computation of pseudo-projection onto low-rank Hankel matrices using existing software.
Successful application of the hybrid method to numerical examples showing improved performance.
Convergence results established for the proposed hybrid penalty method.
Abstract
In this paper, we consider the problem of minimizing a smooth objective over multiple rank constraints on Hankel-structured matrices. This kind of problems arises in system identification, system theory and signal processing, where the rank constraints are typically "hard constraints". To solve these problems, we propose a hybrid penalty method that combines a penalty method with a post-processing scheme. Specifically, we solve the penalty subproblems until the penalty parameter reaches a given threshold, and then switch to a local alternating "pseudo-projection'' method to further reduce constraint violation. Pseudo-projection is a generalization of the concept of projection. We show that a pseudo-projection onto a {\em single} low-rank Hankel-structured matrix constraint can be computed efficiently by existing softwares such as SLRA (Markovsky and Usevich, 2014), under mild…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistical and numerical algorithms · Image and Signal Denoising Methods · Sparse and Compressive Sensing Techniques
\newsiamremark
remarkRemark \newsiamremarkconjectureConjecture
\newsiamthmclaimClaim \headersHybrid method for problems with multiple rank constraintsT. Liu, I. Markovsky, T. K. Pong and A. Takeda
A hybrid penalty method for a class of optimization problems with multiple rank constraints
Tianxiang Liu RIKEN Center for Advanced Intelligence Project, 1-4-1, Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan (). [email protected]
Ivan Markovsky Department ELEC, Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050 Brussels, Belgium (). [email protected]
Ting Kei Pong Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong. This author was supported partly by Hong Kong Research Grants Council PolyU153004/18p. (). [email protected]
Akiko Takeda Department of Creative Informatics, Graduate School of Information Science and Technology, the University of Tokyo, Tokyo, Japan (), RIKEN Center for Advanced Intelligence Project, 1-4-1, Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan (). [email protected]
Abstract
In this paper, we consider the problem of minimizing a smooth objective over multiple rank constraints on Hankel-structured matrices. This kind of problems arises in system identification, system theory and signal processing, where the rank constraints are typically “hard constraints”. To solve these problems, we propose a hybrid penalty method that combines a penalty method with a post-processing scheme. Specifically, we solve the penalty subproblems until the penalty parameter reaches a given threshold, and then switch to a local alternating “pseudo-projection” method to further reduce constraint violation. Pseudo-projection is a generalization of the concept of projection. We show that a pseudo-projection onto a single low-rank Hankel-structured matrix constraint can be computed efficiently by existing softwares such as SLRA (Markovsky and Usevich, 2014), under mild assumptions. We also demonstrate how the penalty subproblems in the hybrid penalty method can be solved by pseudo-projection-based optimization methods, and then present some convergence results for our hybrid penalty method. Finally, the efficiency of our method is illustrated by numerical examples.
keywords:
Hankel-structure, system identification, hybrid penalty method, pseudo-projection.
{AMS}
15B05, 90C30.
1 Introduction
Many data modeling problems can be posed and solved as structured low-rank approximation problems, i.e., problems of approximating matrices by preserving the structure but reducing the rank [12]. The to-be-approximated matrices are constructed from data and the model’s complexity is related to the rank of the approximation—the lower the rank, the simpler the model. However, the simpler the model is, the higher the approximation error is. One way to deal with this fundamental trade-off between model complexity and model accuracy is to solve a sequence of low-rank approximation problems with increasing bounds on the rank.
In static linear data modeling problems, i.e., models defined by linear algebraic equations, the data matrices are unstructured. All spectral and Fröbenius norm optimal unstructured low-rank approximations can be obtained from truncation of the singular value decomposition [4]. This result, known as the Eckart–Young–Mirsky theorem [2], is at the heart of dimensionality reduction methods in machine learning [21]. Unstructured low-rank approximation is equivalent to the principal component analysis in statistics and the total least squares in numerical linear algebra [17].
The object of system theory, control, and signal processing is dynamical models. In linear time-invariant data modeling problems, i.e., for models defined by linear constant-coefficient difference equations, the data matrix is Hankel structured [1, 5, 10, 16]. To see this, consider a system defined by the equation
[TABLE]
By definition, the time series is a trajectory of the system if
[TABLE]
where is the parameter vector of the system and
[TABLE]
is a Hankel matrix, constructed from the time series. Therefore, . The resulting Hankel structured low-rank approximation problem does not admit an analytic solution in terms of the singular value decomposition. For this reason, numerous local optimization [11] as well as convex relaxation [3] methods are proposed for solving it.
In this paper, we consider a generalization of the Hankel structured low-rank approximation problem to multiple rank constraints. An application that motivates this generalization is the common dynamics estimation problem in multi-channel signal processing [13, 14, 18]. Modeling each channel separately requires an individual rank constraint of a Hankel matrix in the optimization problem. Imposing the assumption that the channels have common dynamics then leads to an additional (coupling) rank constraint. The problem of common dynamics estimation is closely related to the problem of approximate common factor computation of multiple polynomials in computer algebra [6, 23]. Specifically, we consider the following optimization problem with multiple rank constraints:
[TABLE]
where (see Section 2 for notation), and are positive integers satisfying (), and represents the loss function, which is nonnegative, level-bounded and smooth with Lipschitz continuous gradient. For example, , where is the noisy observation signal.
For constrained problems such as (1) with smooth objectives, a classical solution method is the gradient projection algorithm, whose iterations require projections onto the feasible set. However, the coupling structure of the last constraint in (1) makes projection onto the feasible set a challenging problem: indeed, even the projection onto the set defined by each single constraint in (1) does not admit a closed-form solution. Thus, variants of proximal gradient algorithms cannot be directly applied to solving (1). Fortunately, we can show that one can obtain a so-called “pseudo-projection” (see Definition 2.2) onto the set defined by each single constraint by some existing solvers such as SLRA [15], under mild assumptions.
Motivated by this, we adopt a penalty approach and construct penalty subproblems whose feasible regions are either , or defined by either the first constraints or the last constraint in (1): the pseudo-projections are easy to compute in all these cases. We then propose an algorithm vNPGmajor for the penalty subproblems, making explicit use of the difference-of-convex (DC) structure of the penalty functions. The algorithm vNPGmajor is a variant of NPGmajor in [8, Algorithm 2] and is based on computing pseudo-projections, which can be done efficiently for the feasible region of the penalty subproblems.
While approximate solutions to (1) can now be obtained by our penalty method, such solutions are typically not feasible for (1). This is not ideal for applications such as system identification in which solution feasibility is an important concern [10]. Even though constraint violation can theoretically be reduced via solving a sequence of penalty subproblems with increasing weights in the penalty functions, in practice this strategy results in high computational cost and numerical instability. To resolve this issue, we shift to a post-processing method after obtaining a moderately accurate solution by our penalty method. Specifically, starting from such a solution obtained from the penalty method, we apply an alternating pseudo-projection method, alternating between the set defined by the first constraints in (1) and that defined by the last constraint there, to reduce constraint violation.
Our main contributions are highlighted as follows:
- •
We propose a hybrid penalty method (Algorithm 2) for solving (1): A penalty scheme allowing three different kinds of penalty subproblems, followed by an alternating pseudo-projection method for post-processing. An algorithm, vNPGmajor (Algorithm 1), is proposed for the penalty subproblems.
- •
We prove some convergence results for the hybrid penalty method, including an error bound for the penalty method (Theorem 3.3) and the convergence rate for the alternating pseudo-projection method (Theorem 3.7).
- •
We demonstrate how a pseudo-projection can be obtained by the solver SLRA [15] in Section 4, under mild assumptions.
The rest of this paper is organized as follows. In Section 2, we introduce notation and some basic properties of Hankel operators. The hybrid penalty method and the corresponding convergence analysis are presented in Section 3. In Section 4, we demonstrate how to compute pseudo-projections. Numerical simulation results are presented in Section 5. Finally, we give some concluding remarks in Section 6.
2 Notation and preliminaries
Throughout this paper, we let denote the -dimensional Euclidean space and denote the Euclidean norm induced by vector inner product . For an , we let denote its th entry. For vectors , we let . Given a matrix , we let denote its Fröbenius norm, denote its spectral norm, denote its transpose and denote its th entry. For , , we denote the matrix inner product by . For a linear operator , we use , and to denote its adjoint, range and kernel, respectively.
For an extended-real-valued function , we say that is proper if , and is closed if it is lower semi-continuous. Following [20, Definition 8.3], for a proper closed function , the regular subdifferential of at is defined as
[TABLE]
and the (limiting) subdifferential of at is defined as
[TABLE]
where means both and . We say that is a stationary point of if . It is known from [20, Theorem 10.1] that any local minimizer of is a stationary point.
For a nonempty closed set , we let denote the indicator function of , which is zero in and is infinity otherwise. The regular normal cone and (limiting) normal cone of at are defined by and respectively. We use to denote the distance from an to and to denote the projection, i.e., and . For a nonempty closed set , the distance from an to and its projection are defined with respect to the Fröbenius norm:
[TABLE]
We next recall the definition of prox-regular sets; see [20, Exercise 13.31].
Definition 2.1** (Prox-regular sets).**
A closed set is prox-regular at for if there exist and such that whenever and with and , it holds that
[TABLE]
*Furthermore, is prox-regular at if it is prox-regular at for all . *
We now define the notion of pseudo-projection, which will be used in our subsequent discussions.
Definition 2.2** (Pseudo-projection).**
Let be a nonempty closed set, and . The pseudo-projection of onto with respect to is the collection of all satisfying:
- (a)
(Stationarity)* ; and* 2. (b)
(Function value improvement)* .*
Notice that any element of the pseudo-projection is a stationary point of the corresponding projection problem, i.e., it is a stationary point of the function . Also, each such element improves the function value of the corresponding projection problem relative to a given point . Pseudo-projection onto a nonempty closed set is always nonempty: indeed, in view of [20, Example 6.16] and [20, Proposition 6.5], we have for all and all .
For notational simplicity, we define linear operators () and as
[TABLE]
where , and () and are defined in (1). We now present some properties of the linear operators and .
Lemma 2.3**.**
For any ,
[TABLE]
Lemma 2.4**.**
For any , , it holds
[TABLE]
Proof 2.5**.**
Fix any , . According to the definition of adjoint, for any , we have
[TABLE]
*Then the conclusion follows from this and the arbitrariness of . This completes the proof. *
3 A hybrid penalty method
Notice that there are multiple rank constraints in (1), making it difficult to compute the projection onto the feasible set. To handle these constraints, one intuitive idea is to use a penalty method to “reduce” the number of constraints. Specifically, we replace some or all constraints by penalty functions which consist of penalty parameters and measures of constraint violation. However, approximate solutions returned by penalty methods are typically not feasible for (1). Although we can theoretically reduce constraint violation by increasing the weights in the penalty functions when feasibility is important (e.g., in applications such as system identification [10]), this strategy leads to high computational cost and numerical instability in practice. One way out would be to shift to a local refinement method after obtaining a moderately accurate solution by the penalty method.
Based on these intuitive ideas, our solution method will then consist of two stages: a penalty method, followed by a post-processing scheme. We will describe the penalty method in Section 3.1, the post-processing scheme in Section 3.2 and the hybrid penalty method and its convergence analysis in Section 3.3.
3.1 Stage 1: A penalty method
To describe the penalty method, we first rewrite (1) as follows, using notation in (2):
[TABLE]
This can be further equivalently rewritten as
[TABLE]
with three ways of setting , , and :
- •
Variant I: , and
[TABLE]
- •
Variant II: , () and
[TABLE]
- •
Variant III: , (), and
[TABLE]
Notice that for the above three variants, the projection onto has a closed-form solution. On the other hand, while the projection onto does not in general admit a closed-form solution, some kinds of stationary points of this projection problem can be approximately and efficiently obtained by some existing solvers such as SLRA [15], as we will show in Section 4, under mild assumptions.
Now we are ready to describe our penalty method. We first replace the constraints () in (3) by a penalty for violating the constraints to obtain the auxiliary function
[TABLE]
where is the penalty parameter. Then we approximately minimize the auxiliary function and update while decreasing .
Note that each term of the penalty function in (4) can be written as the Moreau envelope of indicator function . Using the DC decomposition of the Moreau envelope as in [8, Equation 6], we see that
[TABLE]
where is a smooth function and is a convex function with ; see [8, Equation 7]. Recall that the projection onto is easy to compute. Thus, for Variant III, in which , can be minimized via NPGmajor in [8, Algorithm 2]. However, for Variants I and II, the projection onto is not easy to compute. Fortunately, one can obtain some kind of stationary points for the corresponding projection problems via specific solvers: as we shall see in Section 4, such a point belongs to the set of pseudo-projection (see Definition 2.2) under mild assumptions. Thus, we propose a variant of NPGmajor as Algorithm 1, which we call vNPGmajor, where we replace the projection in the subproblem by pseudo-projection.
The well-definedness of (7), i.e., whether the line-search loop terminates after a finite number of iterations, will be discussed in Section 3.3.
3.2 Stage 2: Post-processing scheme
After we obtain an approximate solution by the penalty method, we shift to a post-processing method. A natural and simple choice for post-processing is the alternating projection method. Let
[TABLE]
In the classical alternating projection method, one has to find the global minimizers of the following problems in each iteration, for some .
[TABLE]
However, these problems are in general difficult to solve globally. Fortunately, as mentioned in Section 3.1, we can obtain some point in the set of pseudo-projection efficiently, under mild assumptions. Thus, we adopt the following alternating pseudo-projection method for post-processing: start at some and , let
[TABLE]
3.3 Hybrid penalty method for (1) and convergence analysis
The hybrid penalty method for solving (1), which consists of the penalty method discussed in Section 3.1 and the post-processing method discussed in Section 3.2, is presented as Algorithm 2.
For the rest of the section, we will analyze the convergence of the hybrid penalty method, including the convergence analysis for the penalty method in Section 3.3.2 and the convergence rate for the post-processing method in Section 3.3.3. Before proceeding, we first show that the criteria (7) and (12) are well-defined.
3.3.1 Well-definedness of (7) and (12)
The following theorem is about the well-definedness of the line-search criterion (7) and the termination criterion (12), i.e., they can be satisfied after finitely many number of inner iterations. The proof is similar to that in [8, Proposition 1].
Theorem 3.1**.**
*The line-search criterion (7) is well-defined. Moreover, is bounded. Furthermore, the termination criterion (12) for Algorithm 1 is well-defined. *
Proof 3.2**.**
We start by discussing the line-search criterion. First, we observe from (6) and Definition 2.2 that
[TABLE]
which is equivalent to
[TABLE]
Next, recall from the definition of and [8, Equation 7] that
[TABLE]
Using (14) and (15) together with , the -smoothness of and the convexity of gives (here, we let denote the Lipschitz continuity modulus of ):
[TABLE]
Thus, we see that (7) is satisfied whenever . From the definition of , this latter inequality must hold when satisfies , implying that the line-search criterion (7) is well-defined. Now, the boundedness of can be argued as in [8, Proposition 1].
Next, let be generated by Algorithm 1 starting at a in Step 2 of Algorithm 2. We show that the termination criteria (12) hold after finitely many iterations in Algorithm 1 (with in place of and in place of in (12)). First, from (7), it is easy to see that the second inequality in (12) holds. Moreover, using a similar line of arguments as in [24, Lemma 4], we can show that
[TABLE]
Thus, the first inequality in (12) also holds after a finite number of iterations in Algorithm 1. Finally, we note from (6) and Definition 2.2 that
[TABLE]
Using this together with the definition of in (5), we further obtain
[TABLE]
Combining this relation with (15) gives
[TABLE]
*This inequality together with (16) and the boundedness of shows that the third inequality in (12) holds after a finite number of iterations. This completes the proof. *
3.3.2 Convergence analysis for the penalty method in Algorithm 2
Notice that when , the penalty method in Algorithm 2 is exactly the same as [8, Algorithm 1]. Thus, we know from [8, Theorem 2] that the sequence is bounded and that, under some constraint qualifications, any accumulation point of sequence is a stationary point of (3).
We next estimate the violation of the constraints for the solution given by the penalty method in Algorithm 2 in the following theorem. It implies that the constraint violation can be suppressed by terminating the algorithm at a small .
Theorem 3.3**.**
Let be the sequence generated by the penalty method in Algorithm 2 for solving (3). Then we have for and that
[TABLE]
Proof 3.4**.**
Note from the nonnegativity of , the definition of , the second inequality in (12) and the choice of and that for ,
[TABLE]
*This completes the proof. *
3.3.3 Convergence analysis of the post-processing method in Algorithm 2
First, we present the following theorem which will be used later for the convergence analysis of the post-processing method in Algorithm 2.
Theorem 3.5**.**
*Let be defined as in (8). Then is prox-regular at any that satisfies . *
Proof 3.6**.**
First, we can rewrite as
[TABLE]
By [19, Corollary 2.3], we see that is prox-regular at if the following conditions hold:
- (a)
there is no in with ;
- (b)
for every , the set is prox-regular at for every with .
We will prove that the above two statements hold. First, we prove (a). Using and noting that by assumption, we have and hence , we see from [9, Proposition 3.6] that
[TABLE]
On the other hand, we see from Lemma 2.4 that for any with (), we have
[TABLE]
Suppose that there exists some with (). We then know from (17) and (18) that
[TABLE]
Now we fix any . Note from (19) and Lemma 2.3 that
[TABLE]
We claim that . To prove this, we establish the following equivalent statement: for each , all elements in the following set equal 0:
[TABLE]
First, it is easy to see from the equality in (20) that all elements in and are zero. Now we prove that every element in is zero by induction for each .
Suppose that there exists some so that every element in is zero. Let and be any two elements in with . We then know from the first inequality in (20) that the submatrix formed by , , and is singular. Since , we conclude that by the induction hypothesis. Consequently, there is at least one 0 in . By the arbitrariness of these two elements in , we see that there is at most one nonzero element in . This together with the equality in (20) implies that every element in equals 0. Thus, we have by induction. Since is arbitrary, we see further that . This proves that , which is equivalent to statement (a).
*Now we prove (b). Using , we know from [9, Proposition 3.8] that is prox-regular at . Then by the definition of prox-regularity, we see that (b) holds. This completes the proof. *
Since (13) involves the pseudo-projection instead of the actual projection, the post-processing method in Algorithm 2 is different from the classical alternating projection method. Nevertheless, we can still show that the post-processing method in Algorithm 2 has local linear convergence under commonly used assumptions for establishing local linear convergence of the alternating projection method (see, for example, the assumptions used in [7, Theorem 5.16] and [9, Theorem 4.2]). The proof follows the same line of arguments as in [7, Theorem 5.2]. We include the proof in the Appendix for the convenience of the readers.
Theorem 3.7**.**
Let and be defined as in (8) and suppose that there exists some such that and . Then for any initial points and near , any sequence generated by the following iterations converges to a point in -linearly:
[TABLE]
4 Subproblem: pseudo-projection
In this section, we consider the pseudo-projection subproblems (6) in Algorithm 1 and (13) in Algorithm 2. Recall that their corresponding projection problems can be put in the following general form:
[TABLE]
here, , and , , , and are given as in (23) or (24) below, corresponding to (9) and (10) respectively:
[TABLE]
The pseudo-projection problem corresponding to (22) can now be stated as follows: given and some reference point satisfying , compute
[TABLE]
In what follows, we will describe how such a can be obtained by the solver SLRA in [15]. Recall that SLRA was developed based on the following key observation:
[TABLE]
In view of this, algorithms were developed in [15] to approximately solve the following equivalent formulation of (22):
[TABLE]
where
[TABLE]
Notice that under the settings in (23) or (24), we have and hence (25) is an optimization problem in and the feasible set reduces to . We will show below in Section 4.1 that in (26) is smooth on . Thus, when gradient-based optimization methods such as those described in [15] are applied to solving (25), one obtains a stationary point of the following function:
[TABLE]
We will then discuss in Section 4.2 how an element of can be obtained from such a stationary point under mild assumptions.
4.1 Smoothness of
In this subsection, we will prove that is smooth on . We start with an auxiliary lemma.
Lemma 4.1**.**
*Consider (22) with setting (23) or (24). For any and any , if , then . *
Proof 4.2**.**
Assume that and satisfy . We need to show that .
We first consider (22) with setting (23). In this case, we have , , and . Notice that and . Write
[TABLE]
*and . Using Lemma 2.3, we obtain *
[TABLE]
Since , to show that , it suffices to show that the above has full column rank. To this end, we first note from that there is at least one nonzero element in . Let be the first integer in with . Then the submatrix of starting from the th row is lower triangular with all diagonal entries being . Consequently, this submatrix is nonsingular and thus has full column rank. This completes the proof for this case.
Now we consider (22) with setting (24). In this case, we have , , and with . Notice that and . Write
[TABLE]
where (). We then see from Lemma 2.4 that
[TABLE]
*Similar to the proof in setting (23), we can write the th block of as *
[TABLE]
Consequently, we have
[TABLE]
*Since , to prove that , we only need to show that the block diagonal matrix on the right-hand side of (28) has full column rank. But then it suffices to show that has full column rank, and this latter claim can be established by following a similar line of arguments as in the proof for setting (23). This completes the proof. *
Theorem 4.3**.**
*Consider (22) with setting (23) or (24). Then the function defined in (26) is smooth on . *
Proof 4.4**.**
In view of [22, Equation 5] and recall that (in both cases (23) and (24)), we only need to show that for any , the linear map defined as is surjective, or equivalently, is injective. To proceed, fix any and consider any with . Then we have for any that
[TABLE]
*Thus we have , which together with Lemma 4.1 implies that . This completes the proof. *
Since is smooth on , we can then apply standard gradient-based optimization methods to solving (25) and obtain a stationary point of in (27). We next discuss how one can obtain a pseudo-projection from such a stationary point.
4.2 Stationarity and improvement of function value
We discuss in this subsection how to obtain a pseudo-projection from a suitable stationary point of in (27), under mild assumptions. We start by showing how one can construct from a point satisfying the stationarity condition in Definition 2.2.
Theorem 4.5**.**
Consider (22) with setting (23) or (24). Let be a stationary point of in (27) and let achieve the infimum in (26) when . Then
[TABLE]
If in addition , then we have
[TABLE]
Proof 4.6**.**
First, we define
[TABLE]
Then we see from (27) and the definition of that
[TABLE]
On the other hand, we also have from the stationarity of that . Using this, (32) and [20, Theorem 10.13], we see further that
[TABLE]
Next, notice from Lemma 4.1 that for any , , and , the following implication holds:
[TABLE]
This corresponds to the linear independence constraint qualification for the following optimization problem:
[TABLE]
Using this, the definition of in (31), (33) and [20, Example 10.8], we deduce that there exist and a scalar such that the following Karash-Kuhn-Tucker conditions hold:
[TABLE]
Multiplying both sides of the second equation in (34) from the right by , and using the two equations in (35), we obtain and thus
[TABLE]
We now show that
[TABLE]
To proceed, recall that , which implies . According to [9, Proposition 3.6], in order to establish (37), it now remains to show that
[TABLE]
To this end, take any . Then we have in particular that . This together with (36) implies that . Thus, we must have and consequently . This proves (38) and hence (37). The desired relation (29) now follows immediately from (34) and (37).
Suppose in addition that . Then we have
[TABLE]
*where (a) follows from [9, Proposition 3.6] and the fact that proximal normal vectors are regular normal vectors [20, Example 6.16], (b) follows from [20, Theorem 10.6] and (c) follows from [20, Proposition 6.5]. This together with (29) proves (30). This completes the proof. *
We next show that if the stationary point of in (27) is obtained via a gradient-based descent optimization method with a suitably chosen initial point, then the that attains the infimum in (26) will satisfy the condition on function value improvement in Definition 2.2.
Theorem 4.7**.**
Consider (22) with setting (23) or (24). Let satisfy and let satisfy . Then for any with , we have
[TABLE]
*where attains the infimum in (26) when . *
Proof 4.8**.**
First, we see from and the definition of in (26) that . This together with the assumption and the fact that attains the infimum in (26) when shows that
[TABLE]
*This completes the proof. *
Remark 4.9** (Obtaining pseudo-projection in cases (23) or (24)).**
*Let satisfy and let satisfy . Then one can apply some standard gradient-based descent methods such as those implemented in SLRA [15] for solving (25) with as the initialization: these methods typically generate a sequence so that any accumulation point, say , is stationary for in (27) and satisfies . Suppose achieves the infimum in (26) when . Then we know from (30) in Theorem 4.5 and (39) in Theorem 4.7 that if holds, then . *
4.3 Conjecture related to Theorem 4.5
In this subsection, we revisit the assumption in Theorem 4.5. We would like to understand how likely such a condition is fulfilled by the that achieves the infimum in (26), with being a stationary point of in (27). Notice that if is indeed an optimal solution of , such a is an optimal solution of (22). Thus, we will first study whether when is an optimal solution of (22). Specifically, we make the following conjecture:
Conjecture 4.10**.**
Let be a positive integer. Suppose that satisfies the condition and let solve the following optimization problem:
[TABLE]
*Then we have . *
We do not know whether Conjecture 4.10 holds true for all positive numbers . However, we are able to prove that it holds true when .
Proposition 4.11**.**
*Conjecture 4.10 holds true when . *
Proof 4.12**.**
Since , we only need to show that there exists with and . First of all, since , we must have . We consider two cases:
[TABLE]
For case (i), we let when , and when . Then and
[TABLE]
Now we consider case (ii). Notice that there exists at least one nonzero element in because . Hence, there are at most distinct real roots for the polynomial equation . Let be a real number different from these roots. Then we have . Let
[TABLE]
Then and . Consequently,
[TABLE]
*This completes the proof. *
5 Numerical experiments
In this section, we will conduct numerical experiments for our hybrid penalty method, i.e., Algorithm 2. All numerical experiments are performed in Matlab R2019a on a 64-bit PC with 3.8 GHz Intel Core i5 Quad-Core and 8GB of DDR4 RAM.
We consider the following problem with two rank constraints:
[TABLE]
where , is the diagonal matrix so that equals when is odd, and equals when is even, , and are given positive integers, and and are known noisy signals.
Let HB_1, HB_2 and HB_3 represent the three hybrid penalty methods which solve (5) by Algorithm 2 via the reformulation (3) with Variant I, Variant II and Variant III discussed in Section 3.1 respectively. Let AP represent the alternating pseudo-projection algorithm (11) applied directly to the sets and defined in (8), constructed based on the data from (5).
Data generation: We set and consider two 3-tuples and . For each 3-tuple, we first randomly generate two signals and from two marginally stable linear time-invariant systems of order at most and respectively, which have common poles. Then we let and , where is the noise factor, and and are random vectors with i.i.d. standard Gaussian entries.
HB_1, HB_2 and HB_3: In Algorithm 1, we set , , , , , and for ,
[TABLE]
All pseudo-projection subproblems that arise are approximately solved by calling SLRA [15] with default setting (except that the is specified as in Remark 4.9). We terminate Algorithm 1 when the number of iterations exceeds 108 or
[TABLE]
For the penalty method in Algorithm 2, we set , with initial , and with initial . Let . We set the initial point for HB_1 and HB_2 as a pseudo-projection of onto and respectively, obtained by calling SLRA in [15] with default setting (the reference point is the origin). For HB_3, we set .
For the post-processing method in Algorithm 2, we also call SLRA in [15] with default settings to approximately compute a pseudo-projection (except that the is specified as in Remark 4.9), and terminate it when the number of iterations exceeds 105 or
[TABLE]
We output as the approximate solution.
AP: In this method, we start at and call SLRA in [15] with default setting (except that the is specified as in Remark 4.9) to approximately compute a pseudo-projection onto and defined in (8) (the initial reference points are the origin). We also output as the approximate solution.
Numerical results: In Figure 1, we compare the four methods AP, HB_1, HB_2 and HB_3 in terms of terminating function values over 100 random instances for and over 30 random instances for . 111For each 3-tuple, we first generate and as described above. For these two fixed signals, we generate 100 (and, resp., 30) random noisy signals and and solve the corresponding instances. One can see that while the three hybrid penalty methods HB_1, HB_2 and HB_3 have comparable performance, they always outperform AP.
In Figure 2, we compare the three hybrid penalty methods HB_1, HB_2 and HB_3 in terms of constraint violation (before and after post-processing) and CPU time over 30 random instances for . We measure constraint violation by , with vio given by
[TABLE]
where and are computed solutions, , , and . One can see that the post-processing scheme significantly reduces constraint violation. On the other hand, HB_2 is faster than HB_1 and HB_3.
6 Concluding remarks
In this paper, we propose a hybrid penalty method for solving (1). The hybrid penalty method consists of two parts: a penalty scheme which makes use of a special penalty function as in [8], and a post-processing method for reducing constraint violation. Both the penalty subproblems and the subproblems in the post-processing method involve the new concept of pseudo-projections: we discussed in Section 4 in detail how pseudo-projections can be computed efficiently by some existing software such as [15], under mild assumptions.
There are several open questions related to pseudo-projection computation. For instance, we still do not know how likely the condition holds for the that achieves the infimum in (26) (with being a stationary point of in (27)). 222In the numerical experiments in Section 5, the condition almost never fails for the solution returned by SLRA: For over 99.9% of our calls to SLRA, the th singular value of is significantly larger than its next singular value. Even assuming is a solution of (40), we can only establish when . The case for is still open.
Appendix A Proof of Theorem 3.7
Before proving Theorem 3.7, we first state two auxiliary lemmas without proofs. The proof of Lemma A.1 can be found in the first paragraph in the proof of [7, Theorem 5.16], and Lemma A.2 follows from Theorem 3.5 and the same argument as in the proof of [7, Theorem 5.16].
Lemma A.1**.**
Let and be defined as in (8), and define
[TABLE]
*where is the closed unit ball. Then if and only if . *
Lemma A.2**.**
Let and be defined as in (8). Suppose that there exists some such that and . Let be defined as in (42). Then for any , there exist some and such that
[TABLE]
[TABLE]
*where is the closed ball with centre and radius , and is the closed unit ball. *
We now prove Theorem 3.7. The proof follows the same line of arguments as in [7, Theorem 5.2].
Proof A.3**.**
Fix any with defined as in (42), and let and be given as in Lemma A.2. We first claim that
[TABLE]
where . To prove this, note from (21) and Definition 2.2 that
[TABLE]
If or , we then see from the second inequality in (47) that (45) holds trivially. Now we assume that and . We first notice from (47), and that
[TABLE]
Using (46), (48) and , we obtain further that
[TABLE]
Here, represents the closed unit ball and represents the closed ball with center and radius . Furthermore, we see from (43), (51) and (52) that
[TABLE]
On the other hand, in view of (48), (50) and (52), we can apply (44) with , and to obtain
[TABLE]
where the second inequality follows from (49). Adding (53) and (54), we obtain
[TABLE]
which proves (45).
Note from with and that . Choose initial points and such that . Next, we prove the following inequalities by induction:
[TABLE]
First, we prove that the above three inequalities hold for . Note from , the -update in (21) and the definition of that
[TABLE]
which proves (55) and (56) for . Then we see from , and (45) that
[TABLE]
which proves (57) for . To prove by induction, we assume that (55), (56) and (57) hold for some . We know from the -update, (55) and (57) that
[TABLE]
This together with (56) and (57) implies
[TABLE]
We then see from , and (45) that
[TABLE]
Thus, we proved (55), (56) and (57) for . This completes the induction.
Now we prove that the sequence is a Cauchy sequence. For any and , we know from (55) and (57) that
[TABLE]
Furthermore, by using (57), we have
[TABLE]
These prove that the sequence is a Cauchy sequence. Therefore, it converges to some and we have for any that
[TABLE]
*Thus the sequence converges -linearly. This completes the proof. *
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] M. Chu, N. Del Buono, L. Lopez and T. Politi , On the low-rank approximation of data on the unit sphere , SIAM J. Matrix Anal. Appl., 27 (2005), pp. 46–60.
- 2[2] C. Eckart and G. Young , The approximation of one matrix by another of lower rank , Psychometrika, 1 (1936), pp. 211–218.
- 3[3] M. Fazel , Matrix Rank Minimization with Applications , Ph D thesis, Elec. Eng. Dept., Stanford University, 2002.
- 4[4] G. Golub and C. Van Loan , Matrix Computations , Johns Hopkins University Press, 1996.
- 5[5] M. Ishteva, K. Usevich and I. Markovsky , Factorization approach to structured low-rank approximation with applications , SIAM J. Matrix Anal. Appl., 35 (2014), pp. 1180–1204.
- 6[6] N. K. Karmarkar and Y. N. Lakshman , On approximate GC Ds of univariate polynomials , J. Symbolic Comput., 26 (1998), pp. 653–666.
- 7[7] A. S. Lewis, D. R. Luke and J. Malick , Local linear convergence for alternating and averaged nonconvex projections , Found. Comput. Math., 9 (2007), pp. 485–513.
- 8[8] T. Liu, T. K. Pong and A. Takeda , A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems , To appear in Math. Program., DOI:10.1007/s 10107-018-1327-8.
