A Dictionary Based Generalization of Robust PCA
Sirisha Rambhatla, Xingguo Li, Jarvis Haupt

TL;DR
This paper extends Robust PCA to handle data decompositions where the sparse component is represented in a known dictionary, providing theoretical guarantees and empirical validation for successful recovery.
Contribution
It introduces a unified convex demixing framework for low-rank plus dictionary-sparse components, covering both undercomplete and overcomplete dictionaries.
Findings
Successful recovery under mild assumptions up to a certain sparsity level.
Empirical phase transition results validate theoretical guarantees.
Framework applicable to various dictionary sizes.
Abstract
We analyze the decomposition of a data matrix, assumed to be a superposition of a low-rank component and a component which is sparse in a known dictionary, using a convex demixing method. We provide a unified analysis, encompassing both undercomplete and overcomplete dictionary cases, and show that the constituent components can be successfully recovered under some relatively mild assumptions up to a certain global sparsity level. Further, we corroborate our theoretical results by presenting empirical evaluations in terms of phase transitions in rank and sparsity for various dictionary sizes.
**Index Terms:** Low-rank, dictionary sparse, Robust PCA.
1 Introduction
Exploiting the inherent structure of data for the recovery of relevant information is at the heart of data analysis. In this paper, we analyze a scenario where a data matrix arises as a result of a superposition of a rank- component , and a dictionary sparse component expressed here as . Here, is an a priori known dictionary with normalized columns, and is the unknown sparse coefficient matrix with at most total non-zeros. Specifically, we will study the following model,
[TABLE]
and identify the conditions under which the components and , can be successfully recovered, given and .
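As an illustration, the superposition described above can be written compactly as follows. The symbols are notation assumed here for concreteness: $\mathbf{M}$ for the data matrix, $\mathbf{L}$ for the low-rank component, $\mathbf{R}$ for the known dictionary, $\mathbf{A}$ for the sparse coefficient matrix, with dimensions $n$, $m$, $d$, rank $r$, and global sparsity $s$. This is a sketch of the model as stated in the text under that assumed notation.

```latex
% Assumed notation: M (data), L (low-rank part), R (known dictionary), A (sparse coefficients)
\mathbf{M} = \mathbf{L} + \mathbf{R}\mathbf{A}, \qquad
\mathbf{M}, \mathbf{L} \in \mathbb{R}^{n \times m}, \quad
\mathbf{R} \in \mathbb{R}^{n \times d}, \quad
\mathbf{A} \in \mathbb{R}^{d \times m}, \qquad
\operatorname{rank}(\mathbf{L}) = r, \quad
\|\mathbf{A}\|_{0} \le s .
```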
A wide range of problems can be expressed in the form described above. Perhaps the most celebrated of these is principal component analysis (PCA) [1], which can be viewed as a special case of eq.(1), with the matrix set to zero. In the absence of the component , the problem reduces to that of sparse recovery [2, 3, 4]; see [5] and references therein for an overview of related works. The popular framework of Robust PCA tackles the case where the dictionary is an identity matrix [6, 7]; variants include [8, 9, 10, 11]. In addition, other variants of Robust PCA, such as Outlier Pursuit [12], where and the sparse component is column sparse, and randomized adaptive sensing approaches [13, 14, 15, 16, 17], have also been explored.
Our work is most closely related to [18], which explores the application of the model shown in eq.(1) to detect traffic anomalies, and focuses on a case where the dictionary is overcomplete, i.e., fat. The model described therein is applicable to the case where the rows of are orthogonal, e.g., , and the coefficient matrix , has at most nonzero elements per row and column. In this paper, we analyze a more general case, where we relax some of the aforementioned assumptions for the fat case, and develop an analogous analysis for the thin case. Specifically, this paper makes the following contributions towards guaranteeing the recovery of and in eq.(1). First, we analyze the thin case, where we assume to be a frame [19] with a global sparsity of at most ; see [20] for a brief overview of frames. Next, for the fat case, we extend the analysis presented in [18], and assume that the dictionary satisfies the restricted isometry property (RIP) of order , with a global sparsity of at most , and a column sparsity of at most . Consequently, we eliminate the sparsity constraint on the rows of the coefficient matrix and the orthogonality constraint on the rows of the dictionary required by [18].
The model shown in eq.(1) is useful in a number of applications. For example, it can be used for target identification in hyperspectral imaging, and in topic modeling applications to identify documents with certain properties. Further, in source separation tasks, a variant of this model was used for singing voice separation in [21, 22]. In addition, we can also envision source separation tasks where is not low-rank, but can in turn be modeled as being sparse in a known [23] or unknown [24] dictionary.
The rest of the paper is organized as follows. We formulate the problem, introduce the notation and describe various considerations on the structure of the component matrices in section 2. In section 3, we present our main result and a proof sketch, followed by numerical simulations in section 4. Finally, we conclude in section 5 with some insights on future work.
2 Problem Formulation
Our aim is to recover the low-rank component , and the sparse coefficient matrix , given the dictionary , and samples generated according to the model described in eq.(1). Utilizing the assumed structure of the components and , we consider the following convex problem for ,
[TABLE]
where denotes the nuclear norm, and refers to the - norm, which serve as convex relaxations of rank and sparsity (i.e. -norm), respectively. Depending upon the number of dictionary elements, in , we analyze the problem described above for two cases: a) when , the thin case, and b) when , the fat case. For the thin case, we assume that the rows of the dictionary comprise a frame, i.e. for any vector , we have
[TABLE]
where and are the lower and upper frame bounds, respectively, with . Next, for , the fat case, we assume that obeys the restricted isometry property (RIP) of order , i.e. for any -sparse vector , we have
[TABLE]
where is the restricted isometry constant (RIC).
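The two dictionary conditions above can be probed numerically. The following is a minimal sketch, assuming the dictionary is stored as an `n x d` NumPy array `R` (a name chosen here for illustration) and that both conditions take the usual form of two-sided bounds on the squared norm of `R @ v` relative to the squared norm of `v`. For a thin dictionary, the tightest such bounds are the extreme squared singular values. For a fat dictionary, computing the RIC exactly is combinatorial, so the helper `empirical_ric` (hypothetical) only samples random sparse vectors and therefore returns a lower estimate of the true constant. All sizes are hypothetical.

```python
import numpy as np

def frame_bounds(R):
    """Tightest (lower, upper) frame bounds of a thin dictionary R (d <= n).

    They equal the smallest and largest eigenvalues of R^T R, i.e. the
    extreme squared singular values of R.
    """
    sv = np.linalg.svd(R, compute_uv=False)  # sorted in descending order
    return sv[-1] ** 2, sv[0] ** 2

def empirical_ric(R, k, n_trials=500, seed=0):
    """Monte Carlo lower estimate of the order-k restricted isometry constant.

    Samples random unit-norm k-sparse vectors v and records the worst
    deviation of ||R v||^2 from 1. The true RIC can only be larger.
    """
    rng = np.random.default_rng(seed)
    d = R.shape[1]
    worst = 0.0
    for _ in range(n_trials):
        v = np.zeros(d)
        idx = rng.choice(d, size=k, replace=False)
        v[idx] = rng.standard_normal(k)
        v /= np.linalg.norm(v)
        worst = max(worst, abs(np.linalg.norm(R @ v) ** 2 - 1.0))
    return worst

rng = np.random.default_rng(1)

# Thin case: more rows than columns, columns normalized to unit norm.
R_thin = rng.standard_normal((50, 10))
R_thin /= np.linalg.norm(R_thin, axis=0)
F_lower, F_upper = frame_bounds(R_thin)

# Fat case: more columns than rows, columns normalized to unit norm.
R_fat = rng.standard_normal((20, 40))
R_fat /= np.linalg.norm(R_fat, axis=0)
delta_hat = empirical_ric(R_fat, k=3)

print("frame bounds:", F_lower, F_upper)
print("sampled RIC estimate (order 3):", delta_hat)
```

Because the sampled estimate only bounds the RIC from below, a small value is suggestive rather than a certificate that the RIP holds.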
The aim of this paper is to answer the following question: given , under what conditions can we recover and from the mixture ? We observe that there are a few ways we can run into trouble right away, namely: a) the dictionary sparse part is low-rank, and b) the low-rank part, , is sparse in the dictionary, . Indeed, these relationships take center stage in our analysis. We begin by defining a few relevant subspaces, similar to those used in [18], which will help us to formalize the said relationships.
Let the pair be the solution to the problem shown in eq.(2). We define as the linear space of matrices spanning the row and column spaces of the low-rank component . Specifically, let denote the singular value decomposition of , then the space is defined as
[TABLE]
Next, let be the space spanned by matrices that have the same support (location of non-zero elements) as , and let be defined as
[TABLE]
In addition, we denote the corresponding complements of the spaces described above by appending ‘’. Next, let the orthogonal projection operators onto the spaces of matrices defined above be , and , respectively. Further, we will use and to denote the projection matrices corresponding to the column and row spaces of , respectively, i.e., implying the following for any matrix ,
[TABLE]
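To make the projection onto the space associated with the low-rank component concrete, here is a minimal sketch assuming the standard Robust PCA form: with column-space projector `P_U` and row-space projector `P_V`, a matrix `X` is projected as `P_U X + X P_V - P_U X P_V`, and the complement as `(I - P_U) X (I - P_V)`. The function and variable names are chosen here for illustration and the sizes are hypothetical.

```python
import numpy as np

def low_rank_space_projectors(L0, r):
    """Return (proj, proj_perp) for the space built from the rank-r SVD of L0."""
    U, _, Vt = np.linalg.svd(L0, full_matrices=False)
    U, V = U[:, :r], Vt[:r].T
    P_U = U @ U.T            # projector onto the column space of L0
    P_V = V @ V.T            # projector onto the row space of L0
    I_n = np.eye(L0.shape[0])
    I_m = np.eye(L0.shape[1])

    def proj(X):
        return P_U @ X + X @ P_V - P_U @ X @ P_V

    def proj_perp(X):
        return (I_n - P_U) @ X @ (I_m - P_V)

    return proj, proj_perp

rng = np.random.default_rng(0)
n, m, r = 12, 9, 2
L0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
proj, proj_perp = low_rank_space_projectors(L0, r)

X = rng.standard_normal((n, m))
print("idempotent:", np.allclose(proj(proj(X)), proj(X)))
print("splits X:  ", np.allclose(proj(X) + proj_perp(X), X))
```

The low-rank matrix itself lies in this space, so projecting `L0` leaves it unchanged up to floating-point error.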
As alluded to previously, there are indeed situations under which we cannot hope to recover the matrices and . To identify these scenarios, we will employ various notions of incoherence. We define the incoherence between the low-rank part, , and the dictionary sparse part, as,
[TABLE]
where is small when these components are incoherent (good for recovery). The next two measures of incoherence can be interpreted as a way to avoid the cases where for , (a) resembles the dictionary , and (b) resembles the sparse coefficient matrix . In this case, the low-rank part essentially mimics the dictionary sparse component. To this end, similar to [18], we define respectively the following to measure these properties,
[TABLE]
where . Also, we define .
3 Main Result
In this section, we present the conditions under which solving the problem stated in eq.(2) will successfully recover the true matrices and . As discussed in the previous section, the structure of the dictionary plays a crucial role in the analysis of the two paradigms, i.e. the thin case and the fat case. Consequently, we provide results corresponding to these cases separately. We begin by introducing a few definitions and assumptions applicable to both cases. To simplify the analysis we assume that . Specifically, we will assume that , where is a constant. In addition, our analysis is applicable to the case when .
**Definition D.1.**
We define
[TABLE]
where and are defined as and . Further, we define and as,
[TABLE]
**Definition D.2.**
[TABLE]
**Assumption A.1.**
**Assumption A.2.**
Let , then
[TABLE]
**Assumption A.3.**
For as above,
[TABLE]
**Theorem 1.**
Consider a superposition of a low-rank matrix of rank , and a dictionary sparse component , wherein the sparse coefficient matrix has at most non-zeros, i.e., , and , with parameters , , and . Then, solving the formulation shown in eq.(2) will recover matrices and if the following conditions hold for any , as defined in D.1 and D.2, respectively.
* For , the dictionary obeys the frame condition with frame bounds and assumptions A.1, and A.2 hold.*
* For , the dictionary obeys the RIP of order with RIC , a constant and assumptions A.1 and A.3 hold.*
Thm. 1 establishes the sufficient conditions for the existence of s to guarantee recovery of for both the thin and the fat case. For both cases, we see that the conditions are closely related to the various incoherence measures , and between the low-rank part, , the dictionary, , and the sparse component . Further, we observe that the theorem imposes an upper bound on the global sparsity, i.e., . This is similar to what was reported in [12], and seems to be a result of the deterministic analysis presented here. Moreover, the condition shown in assumption A.1, i.e., , translates to a relationship between rank , and the sparsity , namely,
[TABLE]
, for the thin case, and
[TABLE]
, for the fat case. These relationships are indeed what we observe in empirical evaluations; this will be revisited in the next section. We now present a brief proof sketch of the results presented in this section.
3.1 Proof Sketch
We use a dual certificate construction procedure to prove the main result in Thm. 1 [25]. To this end, we start by constructing a dual certificate for the convex problem shown in eq.(2). In our analysis, we use for the spectral norm, where denotes the maximum singular value of the matrix , , and . The following lemma shows the conditions the dual certificate needs to satisfy.
**Lemma 2 (from Lemma 2 in [18] and Thm. 3 in [12]).**
If there exists a dual certificate satisfying
[TABLE]
then the pair is the unique solution of eq (2).
We now proceed with the construction of a dual certificate that satisfies conditions C1-C4 of Lemma 2. Using an analysis similar to [18] (section V. B.), we construct the dual certificate as
[TABLE]
for arbitrary . The condition C2 then translates to
[TABLE]
Let and , then we can write the equation above as,
[TABLE]
Note that . Now, let , and let denote the rows of that correspond to support of , and correspond to the remaining rows of . Further, let be a length vector containing elements of corresponding to support of . Using these definitions and results, we conclude
[TABLE]
This implies that . Now, we look at the condition C3, i.e. ; this is where our analysis departs from [18]. We write
[TABLE]
where we have used the fact that and . Now, as is the pseudo-inverse of , i.e., , we have that , where is the smallest singular value of . Therefore, we have
[TABLE]
To obtain an upper bound on , we will now present the following lemmata.
**Lemma 3.**
The lower bound on is given by
[TABLE]
**Lemma 4.**
Upper bound on is given by
[TABLE]
Assembling the results of the lemmata to obtain the upper bound on and consequently C3, we arrive at the expression for defined in D.2. Now, we move on to finding conditions under which C4 is satisfied by our dual certificate. For this, we will bound . From eq.(16) in [18] we have,
[TABLE]
where . Our aim now will be to bound and for our case. For this, we present the following lemmata.
**Lemma 5 (from eq.(17) in [18]).**
The upper bound on is given by .
**Lemma 6.**
The upper bound on is given by , where is as defined in D.1.
Substituting these in eq.(7) and C4, we have
[TABLE]
The expression above is further upper bounded by due to C4. Here, and are as defined in D.1. Hence, we arrive at the following lower bound for ,
[TABLE]
Gleaning from the expressions for and , we observe that the following conditions need to be satisfied for the existence of s that can recover the desired matrices– a) , and b) . These conditions are satisfied by the assumptions A.1-A.3. This completes the proof.
4 Simulations
Our analysis in the previous section shows that depending upon the size of the dictionary , if the conditions of Thm. 1 are met, a convex program which solves eq.(2) will recover the components and . In this section, we empirically evaluate the claims of Thm. 1. To this end, we employ the accelerated proximal gradient algorithm outlined in Algorithm 1 of [18] to analyze the phase transition in rank and sparsity for different sizes of the dictionary . For our analysis, we consider the case where . Here, we generate the low-rank part as the outer product of two random matrices of sizes and , with entries drawn from the standard normal distribution. In addition, the non-zero entries ( in number), of the sparse component , are drawn from the Rademacher distribution. The dictionary is drawn from the standard normal distribution, and its columns are then normalized. Phase transition in rank and sparsity over trials for dictionaries of sizes (thin) and (fat), corresponding to our theoretical results, and for all admissible levels of sparsity are shown in Fig. 1 and Fig. 2, respectively.
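The data generation described above can be sketched as follows. The function name `generate_instance`, the variable names, and all sizes are assumptions made here for illustration; the support of the sparse coefficients is assumed to be chosen uniformly at random, which the text does not specify.

```python
import numpy as np

def generate_instance(n, m, d, r, s, seed=0):
    """Draw one synthetic instance: data M, low-rank L, dictionary R, coefficients A."""
    rng = np.random.default_rng(seed)

    # Low-rank part: product of two Gaussian factors of sizes n x r and m x r.
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    L = U @ V.T

    # Dictionary: Gaussian entries, then each column scaled to unit norm.
    R = rng.standard_normal((n, d))
    R /= np.linalg.norm(R, axis=0, keepdims=True)

    # Sparse coefficients: s Rademacher (+1/-1) entries on a random support
    # (uniform support selection is an assumption).
    A = np.zeros(d * m)
    support = rng.choice(d * m, size=s, replace=False)
    A[support] = rng.choice([-1.0, 1.0], size=s)
    A = A.reshape(d, m)

    M = L + R @ A
    return M, L, R, A

# Hypothetical sizes: a thin dictionary (d < n) and a fat one (d > n).
M_thin, L_thin, R_thin, A_thin = generate_instance(n=60, m=60, d=20, r=3, s=40)
M_fat, L_fat, R_fat, A_fat = generate_instance(n=60, m=60, d=90, r=3, s=40, seed=1)
print(M_thin.shape, R_thin.shape, R_fat.shape)
```

A phase transition plot would repeat this draw over a grid of rank and sparsity values, run the solver on each instance, and record the fraction of trials that recover both components.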
Fig. 1 shows the recovery of the low-rank part in panels (a-b), while panels (c-d) show the recovery for the sparse component , for , respectively. Next, panels (e-f) show the region of overlap between the low-rank recovery and sparse recovery plots, for and , respectively. This corresponds to the region in which both and are recovered successfully. Further, the red line in panels (a) and (b) shows the trend predicted by our analysis, i.e., eq.(5) and eq.(6), with the parameters hand-tuned for best fit. Indeed, the empirical relationship between rank and sparsity has the same trend as predicted by Thm. 1.
Similarly, Fig. 2 shows the recovery of the low-rank part in panels (a-b), while panels (c-d) show the recovery for the sparse component for , respectively, for a much wider range of global sparsity . Indeed, these phase transition plots show that we can successfully recover the components for sparsity levels much greater than those put forth by the theorem. This can be attributed to the deterministic analysis presented here. To this end, we conjecture that a randomized analysis of the problem can potentially improve the results presented here.
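For readers who want to experiment with the recovery problem, the following is a minimal sketch of a proximal gradient solver. It is not Algorithm 1 of [18]: it is a plain (non-accelerated) iteration applied to a penalized relaxation, `mu * (nuclear norm of L + lam * l1 norm of A) + 0.5 * squared Frobenius norm of (M - L - R A)`, which is an assumption made here in place of the equality-constrained program in eq.(2). The names `M`, `L`, `R`, `A`, `lam`, `mu`, and all sizes and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal map of tau * ||X||_1: entrywise shrinkage toward zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Proximal map of tau * ||X||_*: shrink the singular values by tau."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(sv - tau, 0.0)) @ Vt

def objective(M, R, L, A, lam, mu):
    """Penalized objective (assumed relaxation of the constrained problem)."""
    resid = M - L - R @ A
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    return mu * (nuclear + lam * np.abs(A).sum()) + 0.5 * np.sum(resid ** 2)

def proximal_gradient(M, R, lam, mu, n_iter=300):
    """Plain proximal gradient on (L, A) with a fixed step of 1 / Lipschitz constant."""
    n, m = M.shape
    d = R.shape[1]
    L = np.zeros((n, m))
    A = np.zeros((d, m))
    # The map (L, A) -> L + R A has squared operator norm 1 + ||R||_2^2.
    step = 1.0 / (1.0 + np.linalg.norm(R, 2) ** 2)
    history = []
    for _ in range(n_iter):
        resid = M - L - R @ A
        # Gradient step on the quadratic term, then the two proximal maps.
        L_next = singular_value_threshold(L + step * resid, step * mu)
        A_next = soft_threshold(A + step * (R.T @ resid), step * mu * lam)
        L, A = L_next, A_next
        history.append(objective(M, R, L, A, lam, mu))
    return L, A, history

# Small synthetic instance; every value below is hypothetical.
rng = np.random.default_rng(0)
n, m, d, r, s = 40, 40, 10, 2, 20
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
R = rng.standard_normal((n, d))
R /= np.linalg.norm(R, axis=0)
A_true = np.zeros(d * m)
A_true[rng.choice(d * m, size=s, replace=False)] = rng.choice([-1.0, 1.0], size=s)
A_true = A_true.reshape(d, m)
M = L_true + R @ A_true

L_hat, A_hat, history = proximal_gradient(M, R, lam=1.0 / np.sqrt(n), mu=1e-2)
print("objective at first and last iteration:", history[0], history[-1])
```

With this fixed step size the objective does not increase from one iteration to the next, which gives a simple sanity check. How closely `L_hat` and `A_hat` match the true components depends on `lam`, `mu`, and the iteration count, so those would need tuning before drawing any conclusions.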
5 Conclusions
We analyze a dictionary based generalization of Robust PCA. Specifically, we extend the theoretical guarantees presented in [18] to a setting wherein the dictionary may have an arbitrary number of columns, and the coefficient matrix has global sparsity of , i.e. . We generalize the results by assuming to be a frame for the thin case, to obey the RIP condition for the fat case, and eliminate the orthogonality constraints on the rows of the dictionary and the sparsity constraint on the rows of the coefficient matrix (as in [18]), rendering the results useful for a potentially wide range of applications. Further, we provide empirical evaluations via phase transition plots in rank and sparsity corresponding to our theoretical results. We draw motivation from the promising phase transition plots, which extend beyond the sparsity level tolerated by our analysis, and propose a randomized analysis of the problem to improve the upper bound on the sparsity as future work.
- [1] I. Jolliffe, Principal component analysis, Wiley Online Library, 2002.
- [2] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM Journal on Computing, vol. 24, no. 2, pp. 227–234, 1995.
- [3] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.
- [4] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
- [5] H. Rauhut, “Compressive sensing and structured random matrices,” Theoretical foundations and numerical methods for sparse recovery, vol. 9, pp. 1–92, 2010.
- [6] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of the ACM (JACM), vol. 58, no. 3, pp. 11, 2011.
- [7] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky, “Rank-sparsity incoherence for matrix decomposition,” SIAM Journal on Optimization, vol. 21, no. 2, pp. 572–596, 2011.
- [8] Z. Zhou, X. Li, J. Wright, E. J. Candès, and Y. Ma, “Stable principal component pursuit,” in Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1518–1522.
