A Dictionary-Based Generalization of Robust PCA with Applications to Target Localization in Hyperspectral Imaging
Sirisha Rambhatla, Xingguo Li, Jineng Ren, and Jarvis Haupt

TL;DR
This paper introduces a convex demixing method for decomposing data matrices into low-rank and dictionary-sparse components, enabling effective target localization in hyperspectral images by leveraging spectral signatures.
Contribution
It presents a unified theoretical framework for dictionary-based robust PCA accommodating both undercomplete and overcomplete dictionaries, with analysis of recovery conditions.
Findings
Successful recovery of constituent matrices under mild conditions.
Effective target localization in hyperspectral imaging using spectral signatures.
Experimental validation demonstrating the approach's advantages.
|  |  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|---|
| 4 | 0.01 | D-RPCA(E) | 0.300 | 0.979 | 0.023 | 0.989 |
|  |  | RPCA† | 0.650 | 0.957 | 0.049 | 0.974 |
|  |  | MF∗ | N/A | 0.957 | 0.036 | 0.994 |
|  |  | MF | N/A | 0.914 | 0.104 | 0.946 |
|  | 0.1 | D-RPCA(E) | 0.800 | 0.989 | 0.017 | 0.997 |
|  |  | RPCA† | 0.800 | 0.989 | 0.014 | 0.997 |
|  |  | MF | N/A | 0.989 | 0.016 | 0.998 |
|  |  | MF† | N/A | 0.989 | 0.010 | 0.998 |
|  | 0.5 | D-RPCA(E) | 0.600 | 0.968 | 0.031 | 0.991 |
|  |  | RPCA† | 0.600 | 0.935 | 0.067 | 0.988 |
|  |  | MF | N/A | 0.548 | 0.474 | 0.555 |
|  |  | MF | N/A | 0.849 | 0.119 | 0.939 |

|  |  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|---|
| 10 | 0.01 | D-RPCA(E) | 0.600 | 0.935 | 0.060 | 0.972 |
|  |  | RPCA† | 0.700 | 0.978 | 0.023 | 0.990 |
|  |  | MF∗ | N/A | 0.624 | 0.415 | 0.681 |
|  |  | MF | N/A | 0.569 | 0.421 | 0.619 |
|  | 0.1 | D-RPCA(E) | 0.500 | 0.968 | 0.029 | 0.993 |
|  |  | RPCA† | 0.500 | 0.871 | 0.144 | 0.961 |
|  |  | MF∗ | N/A | 0.688 | 0.302 | 0.713 |
|  |  | MF† | N/A | 0.527 | 0.469 | 0.523 |
|  | 0.5 | D-RPCA(E) | 1.000 | 0.978 | 0.031 | 0.996 |
|  |  | RPCA† | 2.200 | 0.849 | 0.113 | 0.908 |
|  |  | MF | N/A | 0.807 | 0.309 | 0.781 |
|  |  | MF | N/A | 0.527 | 0.465 | 0.539 |

|  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|
| 15 | D-RPCA(E) | 0.300 | 0.989 | 0.021 | 0.998 |
|  | RPCA† | 3.000 | 0.849 | 0.146 | 0.900 |
|  | MF | N/A | 0.957 | 0.085 | 0.978 |
|  | MF† | N/A | 0.796 | 0.217 | 0.857 |

| Method | TPR mean | TPR St. Dev. | FPR mean | FPR St. Dev. | AUC mean | AUC St. Dev. |
|---|---|---|---|---|---|---|
| D-RPCA(E) | 0.972 | 0.019 | 0.030 | 0.014 | 0.991 | 0.009 |
| RPCA† | 0.919 | 0.061 | 0.079 | 0.055 | 0.959 | 0.040 |
| MF | 0.796 | 0.179 | 0.234 | 0.187 | 0.814 | 0.178 |
| MF† | 0.739 | 0.195 | 0.258 | 0.192 | 0.775 | 0.207 |

|  |  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|---|
| 4 | 0.01 | D-RPCA(C) | 0.905 | 0.989 | 0.014 | 0.998 |
|  |  | OP† | 0.895 | 0.989 | 0.015 | 0.998 |
|  |  | MF∗ | N/A | 0.656 | 0.376 | 0.611 |
|  |  | MF | N/A | 0.624 | 0.373 | 0.639 |
|  | 0.1 | D-RPCA(C) | 0.805 | 0.989 | 0.013 | 0.998 |
|  |  | OP | 1.100 | 0.720 | 0.349 | 0.682 |
|  |  | MF∗ | N/A | 0.742 | 0.256 | 0.780 |
|  |  | MF† | N/A | 0.828 | 0.173 | 0.905 |
|  | 0.5 | D-RPCA(C) | 1.800 | 0.989 | 0.010 | 0.998 |
|  |  | OP† | 1.300 | 0.989 | 0.012 | 0.998 |
|  |  | MF | N/A | 0.548 | 0.474 | 0.556 |
|  |  | MF | N/A | 0.849 | 0.146 | 0.939 |

|  |  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|---|
| 10 | 0.01 | D-RPCA(C) | 0.800 | 0.946 | 0.016 | 0.993 |
|  |  | OP† | 1.300 | 0.946 | 0.060 | 0.988 |
|  |  | MF∗ | N/A | 0.946 | 0.060 | 0.987 |
|  |  | MF | N/A | 0.527 | 0.468 | 0.511 |
|  | 0.1 | D-RPCA(C) | 0.550 | 0.979 | 0.029 | 0.997 |
|  |  | OP† | 0.800 | 0.893 | 0.112 | 0.928 |
|  |  | MF∗ | N/A | 0.688 | 0.302 | 0.714 |
|  |  | MF† | N/A | 0.527 | 0.470 | 0.523 |
|  | 0.5 | D-RPCA(C) | 1.400 | 0.989 | 0.037 | 0.997 |
|  |  | OP† | 0.800 | 0.807 | 0.148 | 0.847 |
|  |  | MF | N/A | 0.807 | 0.309 | 0.781 |
|  |  | MF | N/A | 0.527 | 0.468 | 0.539 |

|  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|
| 15 | D-RPCA(C) | 0.800 | 0.989 | 0.018 | 0.998 |
|  | OP† | 2.200 | 0.882 | 0.126 | 0.900 |
|  | MF | N/A | 0.957 | 0.085 | 0.978 |
|  | MF† | N/A | 0.796 | 0.217 | 0.857 |

| Method | TPR mean | TPR St. Dev. | FPR mean | FPR St. Dev. | AUC mean | AUC St. Dev. |
|---|---|---|---|---|---|---|
| D-RPCA(C) | 0.981 | 0.016 | 0.020 | 0.010 | 0.997 | 0.002 |
| OP† | 0.889 | 0.099 | 0.117 | 0.115 | 0.906 | 0.114 |
| MF | 0.763 | 0.151 | 0.266 | 0.149 | 0.772 | 0.166 |
| MF† | 0.668 | 0.151 | 0.331 | 0.148 | 0.702 | 0.192 |

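The summary rows appear to be the mean and the sample standard deviation (denominator n − 1) of the seven per-configuration D-RPCA rows in the preceding tables (first-column values 4 and 10 with three settings each, plus 15). The short Python sketch below checks this reading for D-RPCA(E) and D-RPCA(C); the aggregation rule is our inference from the numbers, not something the tables state.

```python
import statistics

# Per-configuration D-RPCA rows transcribed from the tables above, in order:
# (4, 0.01), (4, 0.1), (4, 0.5), (10, 0.01), (10, 0.1), (10, 0.5), (15).
DRPCA_E = {
    "TPR": [0.979, 0.989, 0.968, 0.935, 0.968, 0.978, 0.989],
    "FPR": [0.023, 0.017, 0.031, 0.060, 0.029, 0.031, 0.021],
    "AUC": [0.989, 0.997, 0.991, 0.972, 0.993, 0.996, 0.998],
}
DRPCA_C = {
    "TPR": [0.989, 0.989, 0.989, 0.946, 0.979, 0.989, 0.989],
    "FPR": [0.014, 0.013, 0.010, 0.016, 0.029, 0.037, 0.018],
    "AUC": [0.998, 0.998, 0.998, 0.993, 0.997, 0.997, 0.998],
}


def summarize(results):
    """Mean and sample standard deviation of each metric, rounded to 3 decimals."""
    return {
        metric: (round(statistics.mean(values), 3), round(statistics.stdev(values), 3))
        for metric, values in results.items()
    }


print(summarize(DRPCA_E))
print(summarize(DRPCA_C))
```

Using `statistics.stdev` (sample standard deviation) rather than `statistics.pstdev` is what reproduces the reported spreads, for example 0.019 rather than 0.017 for the D-RPCA(E) TPR.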
|  |  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|---|
| 30 | 0.01 | D-RPCA(E) | 0.150 | 0.989 | 0.015 | 0.992 |
|  |  | RPCA† | 0.700 | 0.849 | 0.146 | 0.925 |
|  |  | MF | N/A | 0.929 | 0.073 | 0.962 |
|  |  | MF† | N/A | 0.502 | 0.498 | 0.498 |
|  | 0.1 | D-RPCA(E) | 0.050 | 0.982 | 0.019 | 0.992 |
|  |  | RPCA† | 3.000 | 0.638 | 0.374 | 0.664 |
|  |  | MF | N/A | 0.979 | 0.053 | 0.986 |
|  |  | MF† | N/A | 0.620 | 0.381 | 0.660 |
|  | 0.5 | D-RPCA(E) | 0.080 | 0.982 | 0.019 | 0.992 |
|  |  | RPCA† | 2.500 | 0.635 | 0.381 | 0.671 |
|  |  | MF | N/A | 0.980 | 0.159 | 0.993 |
|  |  | MF | N/A | 0.555 | 0.447 | 0.442 |

|  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|
| 60 | D-RPCA(E) | 0.060 | 0.986 | 0.016 | 0.995 |
|  | RPCA† | 1.000 | 0.799 | 0.279 | 0.793 |
|  | MF | N/A | 0.980 | 0.011 | 0.994 |
|  | MF† | N/A | 0.644 | 0.355 | 0.700 |

|  |  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|---|
| 30 | 0.01 | D-RPCA(C) | 0.065 | 0.990 | 0.015 | 0.991 |
|  |  | OP† | 0.800 | 0.7581 | 0.3473 | 0.705 |
|  |  | MF | N/A | 0.929 | 0.073 | 0.962 |
|  |  | MF† | N/A | 0.502 | 0.50 | 0.498 |
|  | 0.1 | D-RPCA(C) | 0.070 | 0.996 | 0.022 | 0.994 |
|  |  | OP† | 0.100 | 0.989 | 0.3312 | 0.904 |
|  |  | MF | N/A | 0.979 | 0.053 | 0.986 |
|  |  | MF† | N/A | 0.62 | 0.3814 | 0.66 |
|  | 0.5 | D-RPCA(C) | 0.035 | 0.983 | 0.017 | 0.995 |
|  |  | OP† | 0.200 | 0.940 | 0.264 | 0.887 |
|  |  | MF | N/A | 0.980 | 0.160 | 0.993 |
|  |  | MF | N/A | 0.555 | 0.447 | 0.442 |

|  | Method | Threshold | TPR (best operating point) | FPR (best operating point) | AUC |
|---|---|---|---|---|---|
| 60 | D-RPCA(C) | 0.020 | 0.993 | 0.022 | 0.994 |
|  | OP† | 0.250 | 0.963 | 0.264 | 0.907 |
|  | MF | N/A | 0.980 | 0.011 | 0.994 |
|  | MF† | N/A | 0.644 | 0.355 | 0.700 |

| Matrices | |
|---|---|
| The data matrix | |
| The low-rank matrix with rank- and singular value decomposition | |
| The known dictionary either thin () or fat () | |
| The sparse component with the following properties –(1) in case of entry-wise sparsity: non-zero entries and when has at most non-zeros per column, and (2) in case of column-wise sparsity: non-zero columns | |
| Regularization Parameters | |
| The regularization parameter for the entry-wise sparsity case | |
| The regularization parameter for the column sparsity case | |
| Subspaces | |
| The set of matrices which span the same column or row space as , i.e., for or . | |
| The set of matrices with the same support as (for the entry-wise sparse case). | |
| The set of matrices with the same column support as (for the column-wise sparse case). | |
| The set of matrices whose columns span the subspace spanned by columns of , i.e. | |
| The column space of | |
| The row space of | |
| Index Sets | |
| Support of matrix (entry-wise case) | |
| Column support of matrix (the outliers) | |
| Index set of the inliers (column-wise case) | |
| Projection | |
| Projection operator corresponding to any subspace | |
| Projection matrix corresponding to the operator | |
| Parameters for analysis | |
| The incoherence parameter between the low-rank component and the dictionary, defined as | |
| Defined as | |
| Defined as | |
| Defined as | |
| Defined as | |
| Defined as | |
| Lower generalized frame bound | |
| Upper generalized frame bound | |
Department of Electrical and Computer Engineering,
University of Minnesota – Twin Cities, Minneapolis, MN-55455
{rambh002, lixx1661, renxx282, jdhaupt}@umn.edu.
Sirisha Rambhatla†, Xingguo Li‡, Jineng Ren†, and Jarvis Haupt†
†Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, 55455, USA, e-mail: {rambh002, renxx282, jdhaupt}@umn.edu, respectively. ‡Computer Science Department, Princeton University, Princeton, NJ 08540, USA, email: [email protected]. The work was done when S. Rambhatla was at the University of Minnesota-Twin Cities. This work was supported by the DARPA YFA, Grant N66001-14-1-4047. Preliminary versions appeared in the proceedings of the 2016 IEEE Global Conference on Signal & Information Processing (GlobalSIP), the 2017 Asilomar Conference on Signals, Systems, & Computers, and the 2018 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP).
Abstract
We consider the decomposition of a data matrix assumed to be a superposition of a low-rank matrix and a component which is sparse in a known dictionary, using a convex demixing method. We consider two sparsity structures for the sparse factor of the dictionary sparse component, namely entry-wise and column-wise sparsity, and provide a unified analysis, encompassing both undercomplete and the overcomplete dictionary cases, to show that the constituent matrices can be successfully recovered under some relatively mild conditions on incoherence, sparsity, and rank. We leverage these results to localize targets of interest in a hyperspectral (HS) image based on their spectral signature(s) using the a priori known characteristic spectral responses of the target. We corroborate our theoretical results and analyze target localization performance of our approach via experimental evaluations and comparisons to related techniques.
Index Terms:
Low-rank, dictionary learning, target localization, Robust PCA, hyperspectral imaging, sparse representation.
I Introduction
Leveraging the structure of a given dataset is at the heart of machine learning and data analysis tasks. A priori knowledge about the structure often makes the problem well-posed, leading to improved solutions. Perhaps the most common such structure encountered in practice is approximate low-rankness of the dataset, which is exploited by the popular principal component analysis (PCA) [1]. The low-rank structure encapsulates the model assumption that the data in fact span a subspace of lower dimension than the ambient dimension of the data. However, in a number of applications, the data may not be inherently low-rank, but may instead be decomposed as a superposition of a low-rank component and a component that has a sparse representation in a known dictionary. This scenario arises in target identification applications in hyperspectral (HS) imaging [2, 3], where a priori knowledge of the target signatures (the dictionary) can be leveraged for localization.
Hyperspectral (HS) imaging is an imaging modality that senses the intensities of reflected electromagnetic waves (responses) at different wavelengths of the electromagnetic spectrum, many of which are invisible to the human eye. As the spectral response of an object or material depends on its composition, HS imaging can be used to identify target objects or materials via their characteristic spectra or signature responses, also referred to as endmembers.
Typical applications of HS imaging range from monitoring agricultural land use, river catchment areas, and water bodies, to food processing, surveillance, and climate science, to detecting various minerals and chemicals and searching for life-sustaining compounds on distant planets; see [4, 5, 6] and references therein for details. However, the spectral signatures of different materials are often highly correlated, which makes it difficult to detect regions of interest.
In this work, we present two techniques for target localization in HS images by posing the task as a matrix demixing problem. Here, we first analyze a matrix demixing problem where a data matrix $M \in \mathbb{R}^{n\times m}$ is assumed to be formed via a superposition of a low-rank component $L \in \mathbb{R}^{n\times m}$ of rank $r$, for $r \le \min(n, m)$, and a dictionary sparse part $DR$. Here, the matrix $D \in \mathbb{R}^{n\times d}$ is an a priori known dictionary, and $R \in \mathbb{R}^{d\times m}$ is an unknown sparse coefficient matrix. Specifically, we will study the following model for $M$:
$$M = L + DR, \tag{1}$$
and identify the conditions under which the components $L$ and $R$ can be recovered, given $M$ and $D$, by solving appropriate convex formulations. We then leverage these theoretical results for the target localization task in HS images; see Section VI.
We consider the demixing problem described above for two different sparsity models on the matrix $R$. First, we consider the case where $R$ has at most $s_e$ non-zero entries in total (entry-wise sparse case), and second, the case where $R$ has $s_c$ non-zero columns (column-wise sparse case). To this end, we develop the conditions under which solving
$$\min_{L,\,S}\ \|L\|_* + \lambda_e \|S\|_1 \quad \text{s.t.} \quad M = L + DS \tag{D-RPCA(E)}$$
for the entry-wise sparsity case, and
$$\min_{L,\,S}\ \|L\|_* + \lambda_c \|S\|_{1,2} \quad \text{s.t.} \quad M = L + DS \tag{D-RPCA(C)}$$
for the column-wise sparse case, will recover $L$ and $R$ for regularization parameters $\lambda_e$ and $\lambda_c$, respectively, given the data $M$ and the dictionary $D$. The known dictionary $D$ here can be overcomplete (fat, i.e., $d > n$) or undercomplete (thin, i.e., $d < n$). Here, “D-RPCA” refers to “Dictionary based Robust Principal Component Analysis”, while “E” and “C” indicate the entry-wise and column-wise sparsity patterns, respectively. In addition, $\|\cdot\|_*$, $\|\cdot\|_1$, and $\|\cdot\|_{1,2}$ refer to the nuclear norm, the $\ell_1$-norm of the vectorized matrix, and the $\ell_{1,2}$-norm (the sum of the $\ell_2$-norms of the columns), respectively, which serve as convex relaxations of the rank, sparsity, and column-sparsity regularizers, respectively.
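D-RPCA(E) and D-RPCA(C) differ only in the regularizer on the coefficient matrix, and each regularizer has a closed-form proximal operator: singular value thresholding for the nuclear norm, entry-wise soft thresholding for the $\ell_1$-norm, and column-wise shrinkage for the $\ell_{1,2}$-norm. The Python sketch below illustrates this under a stated simplification: instead of the equality-constrained programs, it runs plain proximal gradient (ISTA) on a penalized surrogate in which the constraint $M = L + DS$ is replaced by the quadratic penalty $\|M - L - DS\|_F^2/(2\rho)$. This is a swapped-in illustrative solver, not the authors' algorithm, and the names `penalized_drpca`, `rho`, and `column_wise` are hypothetical.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: the prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft_threshold(X, tau):
    """Entry-wise soft thresholding: the prox of tau * l1 norm (vectorized)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def column_shrink(X, tau):
    """Column-wise shrinkage: the prox of tau * l_{1,2} norm (sum of column l2 norms)."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale


def penalized_drpca(M, D, lam, rho=0.1, column_wise=False, iters=300):
    """Hypothetical ISTA solver for the penalized surrogate
        min_{L,S} ||L||_* + lam * g(S) + ||M - L - D S||_F^2 / (2 * rho),
    where g is the l1 norm (entry-wise) or the l_{1,2} norm (column-wise)."""
    prox_s = column_shrink if column_wise else soft_threshold
    if column_wise:
        g = lambda S: np.linalg.norm(S, axis=0).sum()
    else:
        g = lambda S: np.abs(S).sum()
    L = np.zeros_like(M)
    S = np.zeros((D.shape[1], M.shape[1]))
    # Step = 1 / Lip, with Lip = (1 + ||D||_2^2) / rho the gradient's Lipschitz constant.
    step = rho / (1.0 + np.linalg.norm(D, 2) ** 2)
    history = []
    for _ in range(iters):
        residual = M - L - D @ S
        # Gradient step on the smooth penalty, then the separable prox of each block.
        L_new = svt(L + step * residual / rho, step)
        S_new = prox_s(S + step * (D.T @ residual) / rho, step * lam)
        L, S = L_new, S_new
        fit = np.linalg.norm(M - L - D @ S) ** 2 / (2 * rho)
        history.append(np.linalg.norm(L, "nuc") + lam * g(S) + fit)
    return L, S, history
```

With the step size set to the inverse of the Lipschitz constant of the smooth term, the surrogate objective recorded in `history` is non-increasing; enforcing the exact constraint would typically call for an augmented-Lagrangian (ADMM-type) scheme instead.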
These two sparsity patterns capture different structural properties of the dictionary sparse component. The entry-wise sparsity model allows individual data points to lie in low-dimensional subspaces while still allowing the dataset as a whole to span the entire space. In the column-wise sparsity setting, by contrast, the component $DR$ is also column-wise sparse. As a result, this model effectively captures structured (dictionary-dependent) corruptions of the otherwise low-rank structured columns of $M$. Note that the columns of $R$ are not required to be sparse in the column-wise sparsity model.
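To make the two sparsity models concrete, the following hypothetical generator draws $M = L + DR$ under either model; `make_data`, the Gaussian factors, and the unit-norm atoms are our assumptions, not the paper's experimental protocol. In the column-wise model, $DR$ inherits the column support of $R$, whereas in the entry-wise model the non-zero entries are scattered across many columns.

```python
import numpy as np


def make_data(n, m, d, r, sparsity, column_wise=False, seed=0):
    """Hypothetical generator for M = L + D R (Gaussian factors, unit-norm atoms).

    Entry-wise model: R has `sparsity` non-zero entries at random positions.
    Column-wise model: R has `sparsity` non-zero columns, each of them dense.
    """
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # rank r (almost surely)
    D = rng.standard_normal((n, d))
    D /= np.linalg.norm(D, axis=0)  # normalize dictionary atoms
    R = np.zeros((d, m))
    if column_wise:
        cols = rng.choice(m, size=sparsity, replace=False)
        R[:, cols] = rng.standard_normal((d, sparsity))
    else:
        idx = rng.choice(d * m, size=sparsity, replace=False)
        R.flat[idx] = rng.standard_normal(sparsity)
    return L + D @ R, L, D, R


def nonzero_columns(X, tol=1e-12):
    """Number of columns of X whose l2 norm exceeds tol."""
    return int(np.sum(np.linalg.norm(X, axis=0) > tol))


M, L, D, R = make_data(40, 60, 10, 3, sparsity=5, column_wise=True)
print(nonzero_columns(R), nonzero_columns(D @ R))
```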
I-A Background
A wide range of problems can be expressed in the form described in (1). Perhaps the most celebrated of these is principal component analysis (PCA) [1], which can be viewed as a special case of (1) with the matrix $R$ set to zero. In the absence of $L$, the problem reduces to that of sparse recovery [7, 8, 9]; see [10] and references therein for an overview of related works. Further, the popular Robust PCA framework [11, 12] tackles the case where the dictionary is the identity, i.e., $D = I$ for the identity matrix $I$; related formulations include Outlier Pursuit (OP) [13] ($D = I$ and $R$ column-wise sparse) and others [14, 15, 16, 17, 18, 19, 20, 21, 22].
The model in (1) is also closely related to the one in [23], which explores the overcomplete dictionary setting with applications to network traffic anomaly detection. However, the analysis therein applies to the case where $D$ is overcomplete with orthogonal rows, and the coefficient matrix $R$ has a small number of non-zero elements per row and column, which may be restrictive assumptions in some applications. To this end, in recent works we analyzed the extension of [23] to the case where the dictionary has more rows than columns, i.e., is thin, while removing the orthogonality constraint for both the thin and the fat dictionary cases, for the entry-wise sparsity [24] and column-wise sparsity [3] cases, respectively.
In particular, the entry-wise case of (1) is propitious in a number of applications. For example, it can be used for target identification in hyperspectral imaging [2, 3], and in topic modeling applications to identify documents with certain properties, along the lines of [25]. Further, in source separation tasks, a variant of this model was used for singing voice separation in [26, 27]. We can also envision source separation tasks where $L$ is not low-rank, but can in turn be modeled as being sparse in a known [28] or unknown [29] dictionary. In the column-wise setting, model (1) is also closely related to outlier identification [13, 18, 19, 30], which is motivated by a number of contemporary “big data” applications. Here, the non-zero columns of the sparse matrix $R$ (known as “outliers”) can be used to identify malicious responses in collaborative filtering applications [31], find anomalous patterns in network traffic [32], or estimate visually salient regions of images [33, 34, 35]; see also [36].
In Section VI, we also analyze and demonstrate the application of model (1) to an HS demixing task. HS image analysis using sparse recovery-based techniques was explored in [37, 38, 39, 40]. Applications of compressive sampling have been explored in [41, 42], while [43] analyzes the case where HS images are noisy and incomplete. Further, in a recent work [44], the authors study the case where $L$ is absent and the sparse matrix $R$ is also low-rank for the demixing task (1). However, the techniques discussed above focus on identifying all materials in a given HS image. Although sparsity-based target detection was considered in [45, 46, 47, 48], these approaches use training samples from both the background and the targets for detection, and come with no recovery guarantees. In contrast, the goal of target localization is to identify only specific target(s) in a given HS image, while the background may be unknown or irrelevant. As a result, there is a need for techniques that localize targets based on their a priori known spectral signatures; see also [49] and [50].
I-B Our Contributions
As described above, we propose and analyze a dictionary-based generalization of robust PCA as shown in (1). Here, we consider two distinct sparsity patterns of $R$, i.e., entry-wise and column-wise sparse $R$, arising from different structural assumptions on the dictionary sparse component. Our specific contributions are summarized below.
Entry-wise case:
We make the following contributions towards guaranteeing the recovery of $L$ and $R$ via the convex optimization problem D-RPCA(E). First, we analyze the thin case (i.e., $d < n$), where we assume that the matrix $R$ has at most $s_e$ non-zero elements globally, i.e., $\|R\|_0 \le s_e$, where $\|R\|_0$ denotes the number of non-zero entries in $R$. Next, for the fat case, we first extend the analysis presented in [23] to eliminate the orthogonality constraint on the rows of the dictionary $D$. Further, we relax the sparsity constraints required by [23] on the rows and columns of the sparse coefficient matrix $R$ to study the case where $\|R\|_0 \le s_e$ with at most $k$ non-zero elements per column [24]. Hence, we provide a unified analysis for both the thin and the fat cases, making the model (1) amenable to a wide range of applications.
Column-wise case: We propose and analyze a dictionary-based generalization of Outlier Pursuit (OP) [13], wherein the coefficient matrix $R$ admits a column-sparse structure, with the non-zero columns referred to as “outliers”; see also [3]. Note that, in this case, there is an inherent ambiguity regarding the recovery of the true component pair $(L, R)$ corresponding to the low-rank part and the dictionary sparse component, respectively. Specifically, any pair $(L', S')$ satisfying $M = L' + DS'$, where $L'$ has the same column space as $L$ and the dictionary sparse components $DS'$ and $DR$ have identical column support, is a solution of D-RPCA(C). To this end, we develop sufficient conditions for the convex optimization task D-RPCA(C) to recover the column space of the low-rank component $L$ while identifying the outlier columns; see Section II-A for details. Here, D-RPCA(C) differs from OP in the inclusion of the known dictionary. Next, we demonstrate the advantages of leveraging knowledge of the dictionary via phase transitions in rank and sparsity for recovery of the outlier columns. Specifically, we show that, compared to OP, D-RPCA(C) succeeds for potentially higher ranks of $L$ when the number of outlier columns $s_c$ is a fixed proportion of the number of columns $m$.
The thin dictionary case – an interesting result: [23] suggests that when the dictionary is thin, i.e., $d < n$, one can envision a pseudo-inverse-based technique wherein we pre-multiply both sides of (1) by the Moore-Penrose pseudo-inverse $D^{\dagger}$, i.e., $D^{\dagger}M = D^{\dagger}L + R$, which holds because $D^{\dagger}D = I$ when $D$ has full column rank (this is not applicable to the fat case due to the non-trivial null space of $D$). This operation leads to formulations that resemble the robust PCA (RPCA) [11, 12] model for the entry-wise case and Outlier Pursuit (OP) [13] for the column-wise case, i.e.,
$$\min_{L',\,S}\ \|L'\|_* + \lambda \|S\|_1 \quad \text{s.t.} \quad D^{\dagger}M = L' + S, \tag{RPCA†}$$
$$\min_{L',\,S}\ \|L'\|_* + \lambda \|S\|_{1,2} \quad \text{s.t.} \quad D^{\dagger}M = L' + S. \tag{OP†}$$
An interesting finding of our work is that although this transformation algebraically reduces the entry-wise and column-wise sparsity cases to the Robust PCA and OP settings, respectively, the specific model assumptions of Robust PCA and OP may not hold for all choices of the dictionary size $d$ and the rank $r$. Specifically, we find that when $r$ is large relative to $d$, this pre-multiplication may not lead to a “low-rank” $D^{\dagger}L$. This suggests that the notion of “low” or “high” rank is relative to the maximum possible rank of $D^{\dagger}L \in \mathbb{R}^{d\times m}$, which in this case is $\min(d, m)$. Therefore, if $r \ge d$, $D^{\dagger}L$ can be full-rank, and the low-rank assumptions of RPCA and OP may no longer hold. As a result, these two models (the pseudo-inverse-based formulation and the one in the current work) cannot be used interchangeably for the thin dictionary case. We corroborate these findings via experimental evaluations presented in Section V.¹

¹The code is made available at github.com/srambhatla/Dictionary-based-Robust-PCA, and the results are reproducible.
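A small NumPy check illustrates both the pseudo-inverse identity $D^{\dagger}M = D^{\dagger}L + R$ and why $D^{\dagger}L$ stops being low-rank once $r \ge d$: its rank can never exceed $\min(r, d)$. The sketch assumes a Gaussian thin dictionary (full column rank with probability one) and arbitrary sizes; the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 50, 80, 10                 # thin dictionary: d < n (sizes are arbitrary)
D = rng.standard_normal((n, d))      # full column rank with probability one
D_pinv = np.linalg.pinv(D)           # Moore-Penrose pseudo-inverse, shape (d, n)


def random_low_rank(r):
    """A random n x m matrix of rank r (almost surely)."""
    return rng.standard_normal((n, r)) @ rng.standard_normal((r, m))


def rank_after_pinv(r):
    """Rank of D^+ L for a random rank-r L; it can never exceed min(r, d)."""
    return int(np.linalg.matrix_rank(D_pinv @ random_low_rank(r)))


# Pre-multiplying M = L + D R by D^+ gives D^+ M = D^+ L + R, since D^+ D = I_d.
L = random_low_rank(3)
R = rng.standard_normal((d, m))
M = L + D @ R
print(np.allclose(D_pinv @ M, D_pinv @ L + R))
print(rank_after_pinv(3), rank_after_pinv(25))
```

For $r = 3$ the transformed matrix keeps rank 3, while for $r = 25 > d = 10$ it reaches the maximum rank $d$, so a low-rank prior on $D^{\dagger}L$ is no longer meaningful.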
Techniques for HS demixing: Building on our theoretical results, we present two techniques for target detection in an HS image, based on different sparsity assumptions on the matrix $R$. Our techniques operate by forming the dictionary from the a priori known spectral signatures of the target of interest and leveraging the approximate low-rank structure of the data matrix [24, 3, 2]. We then analyze the performance of these techniques via extensive experimental evaluations on real-world demixing tasks over different datasets and dictionary choices, and compare them with related works.
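In practice, the data matrix is formed from an HS cube. A common layout, assumed here for illustration, stores each pixel's spectrum as one column, so that $n$ is the number of spectral bands and $m$ the number of pixels, while the columns of $D$ are the known target signatures. The helper names below are hypothetical.

```python
import numpy as np


def cube_to_matrix(cube):
    """Reshape a (rows, cols, bands) HS cube into a (bands, rows * cols) matrix,
    so that column p holds the spectrum of pixel p (row-major pixel order)."""
    rows, cols, bands = cube.shape
    return cube.reshape(rows * cols, bands).T


def scores_to_map(scores, rows, cols):
    """Fold a length rows * cols vector of per-pixel scores back into an image."""
    return np.asarray(scores).reshape(rows, cols)


def signature_dictionary(signatures):
    """Stack known target spectra (bands x d, one per column) and scale each to unit l2 norm."""
    D = np.asarray(signatures, dtype=float)
    return D / np.linalg.norm(D, axis=0, keepdims=True)


cube = np.arange(24, dtype=float).reshape(2, 3, 4)   # toy 2 x 3 image with 4 bands
M = cube_to_matrix(cube)
print(M.shape)
```

Keeping the row-major pixel order in both directions is what lets a per-pixel detection score be mapped back onto the image grid.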
The choice between the entry-wise and column-wise sparsity models for this task depends on the properties of the dictionary matrix $D$. In particular, if the target signature admits a sparse representation in the dictionary, the entry-wise sparsity structure is preferred; this is likely to be the case when the dictionary is overcomplete (fat, i.e., $d > n$). On the other hand, the column-wise sparsity structure is amenable to cases where the representation can use all columns of the dictionary, which potentially arises when the dictionary is undercomplete (thin, i.e., $d < n$). Note that, in the column-wise sparsity case, the non-zero columns of $R$ need not be sparse themselves. The applicability of these two modalities is also exhibited in our experimental analysis; see Section VI-B for further details.
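The result tables report, for each method, a detection threshold, the TPR and FPR at the best operating point, and the AUC. The sketch below shows one plausible way to compute such quantities. The per-pixel score (the $\ell_2$-norm of each column of the recovered dictionary-sparse component) and the choice of the best operating point as the threshold maximizing TPR minus FPR (Youden's J) are our assumptions, not details stated in this section.

```python
import numpy as np


def pixel_scores(DS):
    """Hypothetical per-pixel score: l2 norm of each column of the dictionary-sparse part."""
    return np.linalg.norm(DS, axis=0)


def tpr_fpr(scores, labels, threshold):
    """TPR and FPR when pixels with score > threshold are declared targets."""
    labels = np.asarray(labels).astype(bool)
    pred = scores > threshold
    tpr = np.sum(pred & labels) / np.sum(labels)
    fpr = np.sum(pred & ~labels) / np.sum(~labels)
    return tpr, fpr


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties between classes count 1/2)."""
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (wins + 0.5 * ties) / (pos.size * neg.size)


def best_operating_point(scores, labels):
    """(threshold, TPR, FPR) maximizing TPR - FPR over all distinct score cut-offs."""
    candidates = np.concatenate(([-np.inf], np.unique(scores)))
    results = [(t, *tpr_fpr(scores, labels, t)) for t in candidates]
    return max(results, key=lambda x: x[1] - x[2])


scores = np.array([0.9, 0.8, 0.3, 0.2, 0.85])
labels = np.array([1, 1, 0, 0, 0])   # 1 marks a ground-truth target pixel
print(best_operating_point(scores, labels), auc(scores, labels))
```

The AUC is threshold-free, which is why it complements the single best-operating-point TPR/FPR pair reported in the tables.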
Demixing Despite Correlated Signatures: Since the spectral signatures of even distinct classes are highly correlated with each other, this demixing task is particularly challenging. To illustrate, we plot the spectral signatures of different classes of the “Indian Pines” dataset [51] in Fig. 1, where the shaded region shows the upper and lower ranges of each class. For instance, in Fig. 1 we observe that the spectral signature of the “Stone-Steel” class is similar to that of the class “Wheat”. This correlation between the spectral signatures of different classes results in an approximate low-rank structure of the data, captured by the low-rank component $L$, while the dictionary-sparse component $DR$ is used to identify the target of interest; see also Fig. 8. We specifically show that such a decomposition successfully localizes the target despite the high correlation between spectral signatures. It is worth noting that although we consider thin dictionaries ($d < n$) for the purposes of this demixing task, our theoretical results are also applicable to the fat case ($d > n$) [24, 3].
The rest of the paper is organized as follows.² We formalize the problem and describe various considerations on the structure of the component matrices in Section II. In Section III, we present our main theorems for the entry-wise and column-wise cases along with a discussion of their implications, followed by an outline of the analysis in Section IV. Numerical evaluations on synthetic data are provided in Section V, while we explore the application to target localization in HS images in Section VI. Finally, we summarize our contributions and conclude this discussion in Section VII with future directions.

²Notation: Given a matrix $X$ and a vector $x$, we use $\|X\| = \sigma_{\max}(X)$ for the spectral norm, where $\sigma_{\max}(X)$ denotes the maximum singular value of the matrix. Here, $X_{ij}$ denotes the $(i,j)$-th element of $X$, and $e_i$ denotes the canonical basis vector with $1$ at the $i$-th location. We also use $\|\cdot\|$ to denote the $\ell_2$-norm for vectors and the spectral norm for matrices.
II Preliminaries
We start by formalizing the problem setup and introducing the model parameters pertinent to our analysis. We begin with our notion of optimality for the two sparsity modalities; the notation is also summarized in Table V in the appendix.
II-A Optimality of the Solution Pair
For the entry-wise case, we recover the low-rank component $L$ and the sparse coefficient matrix $R$, given the dictionary $D$ and data $M$ generated according to the model described in (1). Recall that $s_e$ is the global sparsity, and $k$ denotes the maximum number of non-zero entries in a column of $R$ when the dictionary is fat.
In the column-wise sparsity setting, due to the inherent ambiguity in the model (1) discussed in Section I-B, we can only hope to recover the column space of the low-rank matrix and the identities of the non-zero columns of the sparse matrix. Therefore, in this case, any solution in the Oracle Model (defined below) is deemed optimal.
**Definition D.1 (Oracle Model for Column-wise Sparsity Case).**
Let the pair be the matrices forming the data as per (1), and define the oracle model . Then, any pair is in the Oracle Model , if , and hold simultaneously, where and are projections onto the column space of and column support of , respectively.
II-B Conditions on the Dictionary
We require that the dictionary $D$ satisfy the generalized frame property (GFP), defined as follows.
**Definition D.2.**
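A matrix $D \in \mathbb{R}^{n\times d}$ satisfies the generalized frame property (GFP) on a set of vectors if, for any fixed non-zero vector $v$ in that set, we have
$$\alpha_\ell \|v\|_2^2 \le \|Dv\|_2^2 \le \alpha_u \|v\|_2^2,$$
where $\alpha_\ell$ and $\alpha_u$ are the lower and upper generalized frame bounds with $0 < \alpha_\ell \le \alpha_u < \infty$.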
The GFP shown above is met as long as the vectors $v$ under consideration have finite norm and do not lie in the null space of $D$. Therefore, in the thin dictionary setting, for both the entry-wise and column-wise cases, the set of admissible vectors can be the entire space $\mathbb{R}^d$, and the GFP is satisfied as long as $D$ has full column rank. For example, $D$ being a frame [52] suffices; see also [53]. On the other hand, in the fat dictionary setting, the set of admissible vectors needs to have a union-of-subspaces structure for the GFP to be met in both the entry-wise and column-wise sparsity cases. Specifically, for the entry-wise sparsity case, we also require the frame bounds $\alpha_\ell$ and $\alpha_u$ to be close to each other. To this end, we assume that $D$ satisfies the restricted isometry property (RIP) [9] of order $k$ with restricted isometry constant (RIC) $\delta$ in this case, and that $\alpha_\ell = 1 - \delta$ and $\alpha_u = 1 + \delta$.
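Assuming the squared-norm form of the GFP, $\alpha_\ell\|v\|_2^2 \le \|Dv\|_2^2 \le \alpha_u\|v\|_2^2$, the tightest bounds for a thin $D$ with full column rank are its squared extreme singular values, while for a fat $D$ with RIC $\delta$ over $k$-sparse vectors one may take $\alpha_\ell = 1 - \delta$ and $\alpha_u = 1 + \delta$. The hypothetical helpers below compute both; the brute-force RIC search enumerates every support and is only feasible for tiny dictionaries.

```python
import itertools

import numpy as np


def thin_frame_bounds(D):
    """Tightest (alpha_l, alpha_u) with alpha_l ||v||^2 <= ||D v||^2 <= alpha_u ||v||^2
    for all v, for a thin D: the squared smallest and largest singular values."""
    s = np.linalg.svd(D, compute_uv=False)  # sorted in descending order
    return s[-1] ** 2, s[0] ** 2


def restricted_isometry_constant(D, k):
    """Brute-force RIC of order k: the largest deviation from 1 of the eigenvalues
    of D_S^T D_S over all column supports S with |S| = k (exponential cost)."""
    delta = 0.0
    for support in itertools.combinations(range(D.shape[1]), k):
        sub = D[:, list(support)]
        eig = np.linalg.eigvalsh(sub.T @ sub)  # ascending eigenvalues
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta


rng = np.random.default_rng(0)
D_thin = rng.standard_normal((30, 5))
D_fat = rng.standard_normal((8, 12))
D_fat /= np.linalg.norm(D_fat, axis=0)  # unit-norm atoms
print(thin_frame_bounds(D_thin), restricted_isometry_constant(D_fat, 2))
```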
II-C Relevant Subspaces
We now define the subspaces relevant to our discussion. In what follows, we consider the solution pair of D-RPCA(E) in the entry-wise sparse case and, in the column-wise sparse setting, a solution pair in the oracle model of Definition D.1 obtained by solving D-RPCA(C). For the low-rank matrix $L$, let the compact singular value decomposition (SVD) be
$$L = U \Sigma V^\top,$$
where $U \in \mathbb{R}^{n\times r}$ and $V \in \mathbb{R}^{m\times r}$ contain the left and right singular vectors of $L$, respectively, and $\Sigma \in \mathbb{R}^{r\times r}$ is the diagonal matrix with the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$ on its diagonal. Here, $U$ and $V$ each have orthonormal columns. We define $\Phi$ as the linear subspace consisting of matrices spanning the same row or column space as $L$, i.e.,
$$\Phi = \{\, U W_1^\top + W_2 V^\top : W_1 \in \mathbb{R}^{m\times r},\ W_2 \in \mathbb{R}^{n\times r} \,\}.$$
Next, let ( for the column-wise sparsity setting) be the space spanned by matrices with the same non-zero support (column support, denoted as ) as , and, for the column-wise case, let denote the index set of the non-zero columns of . We then denote the space spanned by the dictionary sparse component as
[TABLE]
where for entry-wise case and for column-wise case. Also, we denote the corresponding complements of the spaces described above by appending ‘’. In addition, we use calligraphic ‘’ to denote the projection operator onto a subspace , and ‘’ to denote the corresponding projection matrix. For instance, we define and as the projection operators corresponding to the column space and row space of the low-rank component . Therefore, for a given matrix ,
[TABLE]
where and . With this, the projection operators onto, and orthogonal to, the subspace are respectively defined as
[TABLE]
and
[TABLE]
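For concreteness, the projections onto, and orthogonal to, the subspace of matrices sharing the row or column space of the low-rank part can be sketched as below. This uses the form standard in the robust PCA literature, P(X) = UUᵀX + XVVᵀ − UUᵀXVVᵀ and P⊥(X) = (I − UUᵀ)X(I − VVᵀ), where U and V come from the compact SVD of L; we assume it matches the operators defined above, and the helper names are hypothetical.

```python
import numpy as np

def proj_T(X, U, V):
    """Project X onto the span of matrices sharing the column or row space of L = U S V^T."""
    PU = U @ U.T    # projection matrix onto the column space of L
    PV = V @ V.T    # projection matrix onto the row space of L
    return PU @ X + X @ PV - PU @ X @ PV

def proj_T_perp(X, U, V):
    """Project X onto the orthogonal complement, i.e., (I - PU) X (I - PV)."""
    return X - proj_T(X, U, V)

rng = np.random.default_rng(1)
n, m, r = 30, 40, 3
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
U, s, Vt = np.linalg.svd(L, full_matrices=False)
U, V = U[:, :r], Vt[:r].T               # compact SVD: keep the r non-zero singular triplets
X = rng.standard_normal((n, m))
print(np.allclose(proj_T(L, U, V), L))                               # L lies in the subspace
print(np.allclose(proj_T(proj_T(X, U, V), U, V), proj_T(X, U, V)))   # idempotent
```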
II-D Incoherence Measures and Parameters
We employ various notions of incoherence to identify the conditions under which our procedures succeed. To this end, we first define the incoherence parameter , which characterizes the relationship between the low-rank part and the dictionary sparse part as
[TABLE]
The parameter is a measure of the degree of similarity between the low-rank part and the dictionary sparse component. Here, a larger implies that the dictionary sparse component is close to the low-rank part, while a small indicates otherwise. In addition, we also define the parameter as
[TABLE]
which measures the similarity between the orthogonal complement of the column-space and the dictionary .
The next two measures of incoherence can be interpreted as ways to identify the cases where, for with SVD , (a) resembles the dictionary , and/or (b) resembles the sparse coefficient matrix . In these cases, the low-rank part may mimic the dictionary sparse component. To this end, similar to [23], we define the following measures of these properties, respectively:
[TABLE]
Here, , and achieves the upper bound when a dictionary element is exactly aligned with the column space of . Moreover, achieves the upper bound when the row space of is “spiky,” i.e., a certain row of is -sparse, meaning that a column of is supported by (can be expressed as a linear combination of) a column of . The lower bound is attained when the row space is “spread out,” i.e., each column of is a linear combination of all columns of . In general, recovering the two components is easier when the incoherence parameters and are closer to their lower bounds.
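As a rough illustration of a quantity that reaches its upper bound of 1 when a dictionary atom aligns with the column space of the low-rank part, the sketch below computes maxᵢ‖Uᵀdᵢ‖₂ over unit-norm atoms dᵢ. This is an assumption-labeled proxy in the spirit of the definitions above, not necessarily the exact expression used in the paper; `column_space_coherence` is a hypothetical name.

```python
import numpy as np

def column_space_coherence(U, D):
    """max_i ||U^T d_i||_2 over unit-normalized columns d_i of D; lies in [0, 1]."""
    Dn = D / np.linalg.norm(D, axis=0)
    return np.max(np.linalg.norm(U.T @ Dn, axis=0))

rng = np.random.default_rng(2)
n, r, d = 50, 3, 8
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthonormal basis of a column space
D = rng.standard_normal((n, d))
print(column_space_coherence(U, D))                # generic atoms: well below 1
D[:, 0] = U[:, 0]                                  # align one atom with the column space
print(column_space_coherence(U, D))                # reaches the upper bound 1
```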
Further, for notational convenience, we define
[TABLE]
Here, is the maximum absolute entry of for the entry-wise case, which measures how close columns of are to the singular vectors of . Similarly, for the column-wise case, measures the closeness of columns of to the singular vectors of under a column-wise maximum -norm metric.
III Main Results
We present the main results corresponding to each sparsity structure of in this section.
III-A Exact Recovery for Entry-wise Sparsity Case
Our main result establishes the existence of a regularization parameter for which solving the optimization problem D-RPCA(E) will recover the components and exactly. To this end, we will show that such a belongs to a non-empty interval with and defined as
[TABLE]
where is a constant that captures the relationship between different model parameters, and is defined as
[TABLE]
and is defined as
[TABLE]
Given these definitions, we formalize the theorem for the entry-wise case as follows; a proof sketch is provided in Section IV-A.
Theorem 1.
Suppose , where and has at most non-zeros, i.e., . Given , , , defined in (2), (4), (5), and any with defined in (6), and assuming the dictionary obeys the generalized frame property D.2 with frame bounds , solving D-RPCA(E) will recover matrices and in the following cases:
* For , may contain the entire space and follows*
[TABLE]
* For for a constant , consists of all sparse vectors, and follows*
[TABLE]
Theorem 1 establishes the sufficient conditions for the existence of to guarantee recovery of for both the thin and the fat cases. The conditions on dictated by (8) and (9), for the thin and fat case, respectively, arise from ensuring that . Further, the condition , translates to the following sufficient condition on rank in terms of the sparsity for ,
[TABLE]
for the recovery of . This relationship matches with our empirical evaluations and will be revisited in Section V-A.
For both the thin and fat dictionary cases, small values of the incoherence measures (, , and ) between the low-rank part, , the dictionary, , and the sparse component suffice for recovery. Our theoretical results for the fat case are similar to those of [23], but without its restrictions (e.g., orthogonality of the rows and columns of , and its sparsity requirements). By extending the analysis to thin dictionaries, we consider the worst-case deterministic setting, as opposed to robust PCA analyses such as [12], which impose randomness assumptions on the components. In practice, the algorithm succeeds beyond these conditions, since they are sufficient conditions derived under the worst-case deterministic setting; see Section V. One sanity check is to consider the case when the low-rank part is orthogonal to the dictionary, i.e., . From (6), we see that the condition , no longer constrains rank and sparsity, and we need . However, the rank and sparsity are still implicitly restricted: as the rank increases, the choice of dictionary may be restricted in order to maintain orthogonality.
III-B Exact Recovery for Column-wise Sparsity Case
Recall that we consider the oracle model in this case as described in D.1 owing to the intrinsic ambiguity in recovery of ; see our discussion in Section I-B. To demonstrate its recoverability, the following lemma establishes the sufficient conditions for the existence of an optimal pair . The proof is provided in Appendix A-B.
Lemma 2.
Given , , and , any pair satisfies and if .
Analogous to the entry-wise case, we show the existence of a non-empty interval for the regularization parameter , for which solving D-RPCA(C) recovers an optimal pair as per Lemma 2. Here, for a constant , and are defined as
[TABLE]
Then, our main result for the column-wise case is as follows; a proof sketch is provided in Section IV-B.
Theorem 3.
Suppose with defining the oracle model , where , and for . Given , , , defined in (2), (3), (4), (5), and any , for defined in (11), solving D-RPCA(C) will recover a pair of components , if the space is structured such that the dictionary obeys the generalized frame property D.2 with frame bounds , for .
Theorem 3 states the conditions under which the solution to the optimization problem D-RPCA(C) will be in the oracle model defined in D.1. The condition on the column sparsity is a result of the constraint that . Similar to (10), requiring leads to the following sufficient condition on the rank in terms of the sparsity for ,
[TABLE]
For the conditions are similar to the entry-wise case, namely, that . Moreover, suppose that and are both close to , which can be easily met by a tight frame when , or an RIP-type condition when . Then, if is a constant, since , we have that . This is of the same order as the upper bound of in Outlier Pursuit (OP) [13]. Our numerical results in Section V further show that D-RPCA(C) can be much more robust than OP, and may recover even when the rank of is high and outliers are a constant proportion of .
Remark: In essence, Theorems 1 and 3 guarantee recovery of the components as long as the incoherence parameters, , , and are small. As stated in Section II-D, these parameters measure whether the low-rank component and the dictionary sparse component can be teased apart from the given data. Specifically, here measures how close the low-rank component is to the dictionary sparse component. Both and measure how close the column space of the low-rank part is to the dictionary , while measures whether the row space of is sparse. Small values of these measures ensure that the components can be identified successfully. Furthermore, we see that the global sparsity in the column-wise case can be higher than in the entry-wise case.
IV Proof of Main Results
IV-A Proof of Theorem 1
We use a dual certificate construction to prove Theorem 1; the proofs of all lemmata used here are given in Appendix A-A. To this end, we start by constructing a dual certificate for the convex problem D-RPCA(E). We first state the conditions the dual certificate needs to satisfy via the following lemma.
Lemma 4.
If there exists a dual certificate satisfying
[TABLE]
then the pair is the unique solution of D-RPCA(E).
We now construct a dual certificate that satisfies conditions (C1)-(C4) of Lemma 4. Following an analysis similar to that of [23] (Section V-B), we construct the dual certificate as
[TABLE]
for arbitrary . The condition (C1) is readily satisfied by our choice of . For (C2), we substitute the expression for to arrive at
[TABLE]
Letting and
[TABLE]
we can write (13) as . Further, we can vectorize the equation above as . Let be a length vector containing elements of corresponding to the support of . Now, note that can be represented in terms of a Kronecker product as follows,
[TABLE]
On defining , we have .
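The vectorization step relies on the identity vec(AXB) = (Bᵀ ⊗ A) vec(X), where vec stacks columns; restricting X to a support set then keeps only the matching columns of the Kronecker matrix. The NumPy check below uses generic random stand-ins A, B, and X rather than the specific matrices of the proof, and column-major (`order="F"`) reshaping to implement vec.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
X = rng.standard_normal((5, 6))
B = rng.standard_normal((6, 3))
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True

# Restricting X to a support set keeps only the matching columns of the Kronecker matrix.
mask = rng.random(X.shape) < 0.3
X_sparse = np.where(mask, X, 0.0)
idx = np.flatnonzero(vec(mask))
print(np.allclose(vec(A @ X_sparse @ B), np.kron(B.T, A)[:, idx] @ vec(X)[idx]))  # True
```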
Further, let denote the rows of that correspond to support of , and let correspond to the remaining rows of . Using these definitions and results, we have . Thus, for conditions (C1) and (C2) to be satisfied, we need
[TABLE]
Here, the following result ensures the existence of the inverse.
Lemma 5.
If and , satisfies the bound .
Now, we look at the condition (C3) . This is where our analysis departs from [23]; we write
[TABLE]
where we have used the fact that and . Now, as is the pseudo-inverse of , i.e., , we have that , where is the smallest singular value of . Therefore, we have
[TABLE]
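The step above uses the fact that the spectral norm of a pseudo-inverse equals the reciprocal of the smallest non-zero singular value. The small check below assumes a wide, full-row-rank matrix with right pseudo-inverse Aᵀ(AAᵀ)⁻¹ (the same identity holds for the left pseudo-inverse of a full-column-rank matrix); the matrix is a random stand-in, not the one in the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 12))             # wide; full row rank with probability one
A_pinv = A.T @ np.linalg.inv(A @ A.T)        # right pseudo-inverse, so A @ A_pinv = I
sigma_min = np.linalg.svd(A, compute_uv=False).min()
print(np.allclose(A_pinv, np.linalg.pinv(A)))
print(np.isclose(np.linalg.norm(A_pinv, 2), 1 / sigma_min))
```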
The following lemma establishes an upper bound on .
Lemma 6.
An upper-bound on is given by .
Combining (15), Lemma 5, and Lemma 6, we have
[TABLE]
Now, combining (16) and the upper bound on defined in (6), we have that (C3) holds. Next, we find conditions under which (C4) is satisfied by our dual certificate. For this, we bound . Our analysis follows a procedure similar to the one used to derive (16) in [23], reproduced here for completeness. First, by the definition of and properties of the norm, we have
[TABLE]
We now focus on simplifying the term . By definition of , and using the fact that , we have , which implies
[TABLE]
where we have used the result on shown in (14).
Further, we can write as
[TABLE]
Moving on, we derive an upper bound on .
Lemma 7.
An upper-bound on is given by .
Then, on defining we have
[TABLE]
where we have the following bound for .
Lemma 8.
An upper-bound on is given by , where
[TABLE]
where and is defined in (7).
Combining this with (17) and Lemma 8, we have
[TABLE]
By simplifying (18), we arrive at the lower bound for as in (6), from which (C4) holds. Gleaning from the expressions for and , we observe that for the existence of that can recover the desired matrices. This completes the proof. ∎
Characterizing : In the previous section, we characterized the and based on the dual certificate construction procedure. For the recovery of the true pair , we require . Since and by definition, we need for , i.e.,
[TABLE]
Conditions for thin : To simplify the analysis we assume, without loss of generality, that . Specifically, we will assume that , where is a constant. With this assumption in mind, we will analyze the following cases for the global sparsity, when and .
Case 1: .
which leads to
[TABLE]
As per the GFP of D.2, we also require that . Therefore we arrive at
[TABLE]
Further, since , we require the numerator to be positive, and since the lower bound on , we have
[TABLE]
which also implies . Now, the condition implies
[TABLE]
Since the R.H.S. of this inequality is upper bounded by (achieved when and are zero), this condition on is satisfied by our assumption that .
Case 2: .
Again, due to the requirement that , following a similar argument as in the previous case we conclude that
[TABLE]
Conditions for fat : To simplify the analysis, we suppose that . Note that in this case, we require that the coefficient matrix has -sparse columns. Now, . Using similar arguments as above
[TABLE]
Characterizing : Further, the condition translates to a relationship between rank , and the sparsity , as shown in (10) for .
IV-B Proof of Theorem 3
In this section we prove Theorem 3; the proofs of lemmata are provided in Appendix A-B. The Lagrangian of the nonsmooth optimization problem D-RPCA(C) is
[TABLE]
where is a dual variable. The subdifferentials of (20) with respect to are
[TABLE]
By the optimality conditions, a pair is an optimal point of D-RPCA(C) if and only if the following hold:
[TABLE]
The following lemma states the optimality conditions for the optimal solution pair .
Lemma 9.
*Given and , let define the oracle model . Then any solution is an optimal solution pair of D-RPCA(C) if there exists a dual certificate that satisfies
, , where for all ; , otherwise,
, and .*
We first propose as the dual certificate, where
[TABLE]
Hence, condition (C1) is readily satisfied by our choice of . Next, consider condition (C2), defined as , where for all , and , otherwise. Substituting the expression for , we need the following condition to hold
[TABLE]
Letting and , we have . Further, vectorizing the equation above, we have
[TABLE]
where . Next, by letting , using the definition of and the properties of the Kronecker product we have . Now, let denote the rows of corresponding to the non-zero rows of and denote the remaining rows, then
[TABLE]
From (25) and (26), we have . Therefore, we need the following
[TABLE]
which corresponds to the least-norm solution, i.e., , s.t. . For this choice of , (24) is satisfied, and consequently so is condition (C2). Here, the existence of the inverse is ensured by the following lemma.
Lemma 10.
If and , the minimum singular value of is bounded away from 0 and is given by
Given the existence of such a as defined in (27), (C3) is satisfied if the following condition holds
[TABLE]
From (27), this condition translates to
[TABLE]
Now, since (see the analogous analysis for the entry-wise case), we need
[TABLE]
Now, using Lemma 10 and the following bound on ,
Lemma 11.
An upper-bound on is given by .
we have that the condition (C3) holds if
[TABLE]
which is satisfied by our choice of (11). Now, for the condition (C4) we need the following condition to hold true:
[TABLE]
Note that, here . Further, the following result establishes an upper-bound on .
Lemma 12.
An upper bound on is given by
In light of this, the condition (C4) implies that,
[TABLE]
To this end, if we let , (C4) is satisfied by defined in (11). This completes the proof. ∎
Characterizing : From (11), we need , where . Then from , we require .
Characterizing : Since we need , substituting the expressions for and , and using the fact that , we arrive at (12).
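The choice in (27) is a least-norm solution of an underdetermined linear system. The generic sketch below (a random stand-in system, not the one from the proof) shows that x* = Aᵀ(AAᵀ)⁻¹b solves Ax = b, coincides with the pseudo-inverse solution, and has a smaller norm than any other solution, since every other solution differs from it by a null-space vector orthogonal to x*.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 10))             # wide, full row rank: infinitely many solutions
b = rng.standard_normal(4)
x_ln = A.T @ np.linalg.solve(A @ A.T, b)     # least-norm solution A^T (A A^T)^{-1} b

# Any other solution differs by a null-space vector z (A z = 0).
_, _, Vt = np.linalg.svd(A)
z = Vt[-1]                                   # a unit vector in the null space of A
x_other = x_ln + 0.5 * z
print(np.allclose(A @ x_ln, b), np.allclose(A @ x_other, b))
print(np.linalg.norm(x_ln) < np.linalg.norm(x_other))
```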
V Numerical Simulations on Synthetic Data
In this section, we empirically evaluate the properties of D-RPCA(E) and D-RPCA(C) via phase transitions in rank and sparsity, and compare their performance to related techniques and to the behavior predicted by Theorem 1 and Theorem 3 in (10) and (12), respectively.
V-A Entry-Wise Sparsity Case
Experimental Set-up: We employ the accelerated proximal gradient (APG) algorithm outlined in Algorithm 1 to solve the optimization problem D-RPCA(E). For these evaluations, we fix , and generate the low-rank part as the outer product of two column-normalized random matrices of sizes and , with entries drawn from the standard normal distribution. In addition, we choose the non-zero locations of the sparse component randomly and draw the values at these non-zero entries from the Rademacher distribution, while the entries of the dictionary are drawn from the standard normal distribution and its columns are normalized. We then run Monte-Carlo trials for each pair of rank and sparsity, and for each of these, we scan across values of s in the range of to find the best pair of to compile the results. For ease of computation, we use modest values of and . In the phase transition plots, the white and dark regions correspond to correct recovery and failure, respectively.
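The data-generation recipe above can be sketched as follows. The sizes n = m = 50, d = 10, r = 4, and s = 40 are purely illustrative (the values used in the experiments are not reproduced here), "column-normalized" is taken to mean unit ℓ2-norm columns, and the random factors are assumed to be n × r and m × r; `make_entrywise_instance` is a hypothetical helper.

```python
import numpy as np

def make_entrywise_instance(n, m, d, r, s, rng):
    """Generate M = L + D S for the entry-wise sparse model (illustrative sizes)."""
    # Low-rank part: product of two column-normalized Gaussian matrices.
    A = rng.standard_normal((n, r)); A /= np.linalg.norm(A, axis=0)
    B = rng.standard_normal((m, r)); B /= np.linalg.norm(B, axis=0)
    L = A @ B.T
    # Sparse coefficients: s random non-zero locations with Rademacher (+/-1) values.
    S = np.zeros(d * m)
    support = rng.choice(d * m, size=s, replace=False)
    S[support] = rng.choice([-1.0, 1.0], size=s)
    S = S.reshape(d, m)
    # Dictionary: Gaussian entries, unit-norm columns.
    D = rng.standard_normal((n, d)); D /= np.linalg.norm(D, axis=0)
    return L + D @ S, L, D, S

rng = np.random.default_rng(6)
M, L, D, S = make_entrywise_instance(n=50, m=50, d=10, r=4, s=40, rng=rng)
print(np.linalg.matrix_rank(L), np.count_nonzero(S))  # 4 40
```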
Discussion: Phase transitions in rank and sparsity, averaged over trials, for dictionaries of sizes (thin) and (fat) are shown in Fig. 2 and Fig. 3, respectively. We note from Fig. 2 that the empirical relationship between rank and sparsity for the recovery of indeed follows the trend predicted by (10) in Section III for . Here, the parameters of the predicted trend (shown in red) have been hand-tuned for the best fit. In fact, as shown in Fig. 3, this trend continues for sparsity levels much greater than . This can potentially be attributed to the worst-case deterministic analysis considered here.
Further, Fig. 4 shows the results of RPCA† (in green, marking the area where at least one of the Monte-Carlo simulations succeeds) in comparison to the results obtained by D-RPCA(E) for and . We observe that D-RPCA(E) outperforms RPCA† across the board. In fact, we notice that the RPCA† technique only succeeds when . We believe this is because, when , the component is not low-rank (it is full-rank in this case) with respect to the maximum potential rank of . As a result, the model assumptions of the robust PCA problem do not apply; see Section I-B. In contrast, the proposed framework of D-RPCA(E) can handle these cases effectively (see Fig. 4), since is low-rank irrespective of the dictionary size. This highlights the applicability of our approach to cases where , and its ability to recover the low-rank component simultaneously, in one shot.
V-B Column-wise Sparsity Case
We now present phase transition in rank and number of outliers to evaluate the performance of D-RPCA(C). In particular, we compare with Outlier Pursuit (OP) [13] that solves D-RPCA(C) with , and OP† to demonstrate that the a priori knowledge of the dictionary provides superior recovery properties.
Experimental Set-up:
Again, we employ a variant of the APG algorithm outlined in Algorithm 1 to solve the optimization problem D-RPCA(C). We set , , and for each pair of and we run Monte-Carlo trials for and . For our experiments, we form , where , have i.i.d. entries, which are then normalized column-wise. Next, we generate where each entry of is i.i.d. . Also, the known dictionary is formed by normalizing the columns of a random matrix with i.i.d. entries. For each method, we scan through values of the regularization parameter to find a solution pair with the best precision, i.e., . We declare an experiment successful if it achieves a precision of or higher. Here, we threshold the column norms at before we evaluate the precision.
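The success criterion above can be sketched as follows. We assume precision means |identified ∩ true| / |identified|, where a column counts as identified when its ℓ2 norm exceeds a threshold; `tol` is a hypothetical stand-in for the column-norm threshold (whose value is not reproduced here), and returning 0 when nothing is identified is our own convention.

```python
import numpy as np

def column_support(S, tol):
    """Indices of columns of S whose l2 norm exceeds tol (identified outlier columns)."""
    return set(int(j) for j in np.flatnonzero(np.linalg.norm(S, axis=0) > tol))

def precision(S_hat, true_cols, tol):
    """Fraction of identified columns that are true outlier columns."""
    found = column_support(S_hat, tol)
    if not found:
        return 0.0   # convention assumed here when no column is identified
    return len(found & set(true_cols)) / len(found)

# Toy example: true outliers in columns 1 and 3; the estimate also flags column 4.
S_hat = np.zeros((5, 6))
S_hat[:, [1, 3]] = 1.0
S_hat[0, 4] = 0.5
print(precision(S_hat, true_cols=[1, 3], tol=0.1))   # 2/3
```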
Discussion: Fig. 5 (a)-(c) shows the phase transition in rank and column sparsity for the outlier identification performance (in terms of precision) of OP for , D-RPCA(C) for (and OP† in green, marking the region where precision is greater than [math]), and D-RPCA(C) for , respectively. We observe that the a priori knowledge of the dictionary significantly boosts the performance of D-RPCA(C) as compared to OP. This showcases the superior outlier identification properties of the proposed technique D-RPCA(C). Further, similar to the entry-wise case, we note that the pseudo-inverse based technique OP† (in green) fails when . For the case , the proposed technique D-RPCA(C) is able to identify the outlier columns with high precision. This means that our technique succeeds even when the outlier columns are not sparse.
VI Evaluation of Real-World Dataset: Target Localization in Hyperspectral Imaging
A HS sensor records the response of a region to different frequencies of the electromagnetic spectrum. As a result, each HS image , can be viewed as a data-cube formed by stacking matrices of size , as shown in Fig. 6. Here, is determined by the number of channels or frequency bands across which measurements of the reflectances are made. Therefore, each volumetric element or voxel, of a HS image is a vector of length corresponding to response of the material to measurement channels.
HS images (when represented as a matrix) are approximately low-rank, since a particular scene is composed of only a limited number of types of objects/materials [54]. For instance, while imaging an agricultural area, one would expect to record responses from materials like biomass, farm vehicles, roads, houses, water bodies, and so on. Moreover, the spectra of complex materials can be assumed to be a linear mixture of the constituent materials [54, 55], i.e., the received HS responses can be viewed as being generated by a linear mixture model [43]. For the target localization task at hand, this approximate low-rank structure is used to decompose a given HS image into a low-rank part and a component that is sparse in a known dictionary (a dictionary sparse part), wherein the dictionary is composed of the spectral signatures of the target of interest. We consider the thin dictionary setting for the rest of this discussion, since we often aim to localize targets based on a few a priori known spectral signatures, although a similar analysis applies to the fat case; see Section III and [24].
Formally, let , where be formed by unfolding the HS image , such that, each column of corresponds to a voxel of the data-cube. We then model as a superposition of a low-rank component with rank , and a dictionary-sparse component, , i.e.,
[TABLE]
Here, represents an a priori known dictionary composed of appropriately normalized characteristic responses of the material/object (or the constituents of the material), we wish to localize, and refers to the sparse coefficient matrix (also referred to as abundances in the literature). Note that can also be constructed by learning a dictionary based on the known spectral signatures of a target; see [56, 57, 58, 59, 60].
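The unfolding step can be sketched as follows, assuming the cube is stored as an (n1, n2, f) array with the spectral axis last; the helpers `unfold_cube` and `fold_matrix` are hypothetical, and other storage orders (e.g., bands first) need a different transpose.

```python
import numpy as np

def unfold_cube(cube):
    """Unfold an (n1, n2, f) hyperspectral cube into an f x (n1*n2) matrix.

    Each column is one voxel, i.e., the length-f spectral response of one pixel.
    """
    n1, n2, f = cube.shape
    return cube.reshape(n1 * n2, f).T

def fold_matrix(M, n1, n2):
    """Inverse of unfold_cube: map an f x (n1*n2) matrix back to an (n1, n2, f) cube."""
    return M.T.reshape(n1, n2, M.shape[0])

cube = np.random.default_rng(7).random((4, 5, 6))   # toy cube: 4 x 5 pixels, 6 bands
M = unfold_cube(cube)
print(M.shape)                                      # (6, 20)
print(np.allclose(M[:, 5 * 2 + 3], cube[2, 3]))     # column index = row * n2 + col
```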
We now discuss the implementation specifics corresponding to the target localization task. We begin by presenting the algorithm used to solve the optimization problems D-RPCA(E) and D-RPCA(C), before discussing the experimental details.
VI-A Algorithmic Considerations
The optimization problems of interest, D-RPCA(E) and D-RPCA(C), for the entry-wise and column-wise cases, respectively, are convex but non-smooth. To solve for the components of interest, we adopt the accelerated proximal gradient (APG) algorithm, as shown in Algorithm 1. Here, we present a unified APG-based algorithm for D-RPCA(E) and D-RPCA(C) that covers both the thin and fat dictionary cases and includes the case considered by [23].
VI-A1 Discussion of Algorithm 1
For the optimization problems of interest, we solve an unconstrained problem by transforming the equality constraint into a least-squares term that penalizes the fit. In particular, we accomplish the demixing task by solving the following via the APG-based Algorithm 1.
[TABLE]
for the entry-wise sparsity case, and
[TABLE]
for the column-wise sparsity case.
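Algorithm 1 itself is not reproduced in this section, so the block below is only a sketch of a standard FISTA-type accelerated proximal gradient loop for the entry-wise problem. It assumes (29) has the form ½‖M − L − DS‖²_F + λ_*‖L‖_* + λ‖S‖₁, with hypothetical parameter names `lam_star` and `lam`, and uses 1 + ‖D‖₂² as the Lipschitz constant of the smooth part (our derivation for that assumed form). The actual Algorithm 1 may differ, for example in how the weights are parameterized or in using continuation.

```python
import numpy as np

def soft(X, t):
    """Entry-wise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svt(X, t):
    """Singular value thresholding: soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft(s, t)) @ Vt

def apg_entrywise(M, D, lam_star, lam, n_iter=300):
    """FISTA-style sketch for min 0.5*||M - L - D S||_F^2 + lam_star*||L||_* + lam*||S||_1."""
    Lf = 1.0 + np.linalg.norm(D, 2) ** 2     # Lipschitz constant of the smooth part's gradient
    L = np.zeros_like(M)
    S = np.zeros((D.shape[1], M.shape[1]))
    L_y, S_y, t = L, S, 1.0                  # extrapolated points and momentum scalar
    for _ in range(n_iter):
        R = L_y + D @ S_y - M                # gradient w.r.t. L is R; w.r.t. S it is D^T R
        L_new = svt(L_y - R / Lf, lam_star / Lf)      # proximal step for the nuclear norm
        S_new = soft(S_y - D.T @ R / Lf, lam / Lf)    # proximal step for the l1 norm
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        L_y = L_new + beta * (L_new - L)
        S_y = S_new + beta * (S_new - S)
        L, S, t = L_new, S_new, t_new
    return L, S

def objective(M, D, L, S, lam_star, lam):
    """Value of the assumed entry-wise objective."""
    return (0.5 * np.linalg.norm(M - L - D @ S) ** 2
            + lam_star * np.linalg.norm(L, "nuc") + lam * np.abs(S).sum())

rng = np.random.default_rng(12)
D = rng.standard_normal((20, 5)); D /= np.linalg.norm(D, axis=0)
M = rng.standard_normal((20, 15))
L_hat, S_hat = apg_entrywise(M, D, lam_star=1.0, lam=0.5)
print(objective(M, D, L_hat, S_hat, 1.0, 0.5) < objective(M, D, 0 * M, 0 * S_hat, 1.0, 0.5))
```

Because the non-smooth part is separable in L and S, the proximal step splits into an independent SVT update and a soft-thresholding update. For the column-wise objective (30), only the S-update would change, replacing entry-wise soft thresholding with column-norm soft thresholding.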
We note that although for the HS application at hand, the thin dictionary case with () might be more useful in practice, Algorithm 1 allows for the use of fat dictionaries () as well. Specifically, the APG algorithm requires that the gradient of the smooth part,
[TABLE]
of the convex objectives shown in (29) and (30) is Lipschitz continuous with minimum Lipschitz constant . Now, since the gradient with respect to is given by
[TABLE]
we have that the gradient is Lipschitz continuous as
[TABLE]
for all in the domain of , where
[TABLE]
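Assuming the smooth part has the form f(L, S) = ½‖M − L − DS‖²_F, write it as ½‖M − A z‖² with the linear map A z = L + DS, i.e., A = [I D]. Its gradient is Aᵀ(Az − M), so the smallest valid Lipschitz constant is ‖A‖₂² = λ_max(I + DDᵀ) = 1 + ‖D‖₂². This is our derivation under that assumed form; the expression in the paper (not reproduced here) may be stated differently, e.g., as a looser bound. The check below works for a fat dictionary as well.

```python
import numpy as np

def grad(L, S, M, D):
    """Gradient of f(L, S) = 0.5 * ||M - L - D S||_F^2 with respect to (L, S)."""
    R = L + D @ S - M
    return R, D.T @ R

rng = np.random.default_rng(8)
n, m, d = 20, 15, 30                       # a fat dictionary (d > n)
D = rng.standard_normal((n, d))
M = rng.standard_normal((n, m))
Lf = 1.0 + np.linalg.norm(D, 2) ** 2       # equals the largest eigenvalue of I + D D^T

# Check the Lipschitz inequality on a random pair of points.
L1, S1 = rng.standard_normal((n, m)), rng.standard_normal((d, m))
L2, S2 = rng.standard_normal((n, m)), rng.standard_normal((d, m))
gL1, gS1 = grad(L1, S1, M, D)
gL2, gS2 = grad(L2, S2, M, D)
num = np.sqrt(np.linalg.norm(gL1 - gL2) ** 2 + np.linalg.norm(gS1 - gS2) ** 2)
den = np.sqrt(np.linalg.norm(L1 - L2) ** 2 + np.linalg.norm(S1 - S2) ** 2)
print(num <= Lf * den)                                                 # True
print(np.isclose(Lf, np.linalg.eigvalsh(np.eye(n) + D @ D.T).max()))   # True
```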
The update of the low-rank component and the sparse matrix for the entry-wise case both involve a soft thresholding step, , where for a matrix , is defined as
[TABLE]
In case of the low-rank part we apply this function to the singular values (therefore referred to as singular value thresholding) [61], while for the update of the dictionary sparse component, we apply it to the sparse coefficient matrix .
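A minimal sketch of the two thresholding steps just described: entry-wise soft thresholding, sign(x)·max(|x| − τ, 0), and its application to the singular values (singular value thresholding). The function names are hypothetical.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entry-wise soft thresholding: sign(x) * max(|x| - tau, 0)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Apply soft thresholding to the singular values of X (SVT)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))   # values: -2, 0, 0, 1
X = np.diag([5.0, 2.0, 0.5])
print(np.linalg.matrix_rank(singular_value_threshold(X, 1.0)))  # 2
```

Singular values below τ are set to zero, so SVT both shrinks and reduces the rank; here the singular values 5, 2, 0.5 become 4, 1, 0.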
The low-rank update step remains the same as for the entry-wise case. For the update of the column-wise case, we threshold the columns of based on their column norms, i.e., for a column of a matrix , the column-norm based soft-thresholding function, is defined as
[TABLE]
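The column-norm soft-thresholding step can be sketched as below. Since the formula is not reproduced here, we assume the standard group form c ↦ max(0, 1 − τ/‖c‖₂)·c, which is the proximal operator of τ·Σⱼ‖cⱼ‖₂; `column_soft_threshold` is a hypothetical name.

```python
import numpy as np

def column_soft_threshold(X, tau):
    """Shrink each column c of X to max(0, 1 - tau / ||c||_2) * c (zero columns stay zero)."""
    norms = np.linalg.norm(X, axis=0)
    safe = np.where(norms > 0, norms, 1.0)             # avoid division by zero
    scale = np.where(norms > tau, 1.0 - tau / safe, 0.0)
    return X * scale

X = np.array([[3.0, 0.1, 0.0],
              [4.0, 0.2, 0.0]])      # column norms: 5, about 0.22, 0
Y = column_soft_threshold(X, 1.0)
print(np.linalg.norm(Y, axis=0))     # first norm shrinks from 5 to 4; the others become 0
```

Unlike entry-wise thresholding, a column is either kept (and shrunk as a whole) or zeroed entirely, which is what makes this step select outlier columns.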
VI-A2 Parameter Selection
We adopt a grid-search strategy over the range of admissible values to find the best values of the regularization parameters.
Selecting parameters for the entry-wise case: The choice of parameters and in Algorithm 1 is based on the optimality conditions of the optimization problem shown in (29). As presented in [23], the parameters and associated with the low-rank part and the sparse coefficient matrix lie in the ranges and , respectively, i.e., for Algorithm 1 .
These ranges for are derived using the optimization problem shown in (29). Specifically, we find the largest values of these regularization parameters which yield a solution for the pair by analyzing the optimality conditions of (29). This value of the regularization parameter then defines the upper bound on the range. For instance, the optimality condition for and , is given by
[TABLE]
where the sub-differential set is defined as
[TABLE]
Therefore, for a zero solution pair we have that
[TABLE]
which yields the condition that . Therefore, the maximum value of which drives the low-rank part to an all-zero solution is . Similarly, the optimality condition for the dictionary sparse component to choose is given by
[TABLE]
where the sub-differential set is defined as
[TABLE]
Again, for a zero solution pair we need that
[TABLE]
which implies that , i.e. the maximum value of that drives the dictionary sparse part to zero is .
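The zero-solution analysis can be checked numerically. Assuming (29) and (30) take the form ½‖M − L − DS‖²_F + λ_*‖L‖_* + λ·R(S), with R the ℓ1 norm (entry-wise) or the sum of column ℓ2 norms (column-wise), the subdifferential of each norm at zero is the unit ball of its dual norm. Hence (0, 0) is optimal once λ_* ≥ ‖M‖₂ (spectral norm) and λ ≥ ‖DᵀM‖_∞ (max absolute entry), or λ ≥ maxⱼ‖(DᵀM)ⱼ‖₂ in the column-wise case. The column-wise expression is our derivation, and the helper name is hypothetical.

```python
import numpy as np

def lambda_max_values(M, D):
    """Upper ends of the regularization ranges under the assumed form of (29)/(30).

    Returns the smallest low-rank, entry-wise sparse, and column-wise sparse weights
    for which (L, S) = (0, 0) satisfies the optimality conditions.
    """
    G = D.T @ M                                       # negative gradient w.r.t. S at zero
    lam_star_max = np.linalg.norm(M, 2)               # spectral norm (dual of nuclear norm)
    lam_entry_max = np.abs(G).max()                   # max |entry| (dual of l1 norm)
    lam_col_max = np.linalg.norm(G, axis=0).max()     # max column l2 norm (dual of l1,2 norm)
    return lam_star_max, lam_entry_max, lam_col_max

rng = np.random.default_rng(9)
M = rng.standard_normal((20, 12))
D = rng.standard_normal((20, 4)); D /= np.linalg.norm(D, axis=0)
ls, le, lc = lambda_max_values(M, D)

# From (0, 0), a unit-step proximal update on the negative gradient (M, D^T M) returns
# zero once the weights reach these values (any step size cancels out).
s = np.linalg.svd(M, compute_uv=False)
print(np.allclose(np.maximum(s - ls, 0.0), 0.0))                  # nuclear-norm prox gives L = 0
print(np.allclose(np.maximum(np.abs(D.T @ M) - le, 0.0), 0.0))    # l1 prox gives S = 0
print(np.maximum(s - 0.9 * ls, 0.0).max() > 0)                    # a smaller weight leaves L != 0
```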
Selecting parameters for the column-wise case: Again, the choice of parameters and is derived from the optimization problem shown in (30). In this case, the parameters and associated with the low-rank part and the sparse coefficient matrix lie in the ranges and , respectively, i.e., for Algorithm 1 . These ranges are derived via an analysis similar to that of the entry-wise case, using the optimality conditions of (30) instead of (29).
VI-B Experimental Evaluation
We now evaluate the performance of the proposed technique on real-world HS data. We begin by introducing the datasets used for these simulations (available via http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes), after which we describe the experimental set-up and present the results.
Data
Indian Pines Dataset: We first consider the “Indian Pines” dataset [51], which was collected over the Indian Pines test site in northwestern Indiana in June 1992 using the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [63] sensor, a popular choice for collecting HS images for various remote sensing applications. This dataset consists of spectral reflectances across bands in the wavelength range nm from a scene composed mostly of agricultural land, along with two major dual-lane highways, a rail line, and some built structures, as shown in Fig. 7(a). The dataset is further processed by removing the bands corresponding to water absorption, which results in a HS data-cube with dimensions , as visualized in Fig. 6. Here, , , and therefore . This modified dataset is available as the “corrected Indian Pines” dataset [51], with ground truth containing classes; we henceforth refer to it as the “Indian Pines Dataset.” We form the data matrix by stacking the voxels of the image side by side, which results in a data matrix . We analyze the performance of the proposed technique for identifying the stone-steel towers (class in the dataset), shown in Fig. 7(a), which constitute voxels.
Pavia University Dataset: Acquired using the Reflective Optics System Imaging Spectrometer (ROSIS) sensor, the Pavia University Dataset [62] consists of spectral reflectances across bands (in the range nm) of an urban landscape over northern Italy. The selected subset of the scene, a data-cube, mainly consists of buildings, roads, painted metal sheets, and trees, as shown in Fig. 7(b). Note that class- corresponding to “Gravel” is not present in the selected data-cube considered here. For our demixing task, we analyze the localization of target class , corresponding to the painted metal sheets, which constitutes voxels in the scene. Note that for this dataset , , and .
Further, in Fig. 8 we show the decay of the singular values of the Indian Pines and Pavia University datasets. We note that, indeed, the presence of a limited number of materials makes these datasets approximately low-rank.
Dictionary: We form the known dictionary in two ways: 1) by learning a (thin) dictionary based on the voxels by solving (31), and 2) by randomly sampling voxels from the target class. These emulate the two ways in which we can arrive at the dictionary corresponding to a target: 1) when the exact signatures are not available and/or there is noise, and 2) when we have access to the exact signatures of the target, respectively.
In our experiments for case 1), we learn a dictionary using the target class data by alternating between updating the sparse coefficients via FISTA [64] and dictionary via the Newton method [65], approximately solving the following optimization problem [56, 57, 58, 59].
[TABLE]
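As a rough illustration of the alternating scheme for case 1), the sketch below assumes (31) is the usual ℓ1-regularized dictionary learning problem, min ½‖Y − DA‖²_F + λ‖A‖₁ with unit-norm atoms. It deliberately swaps in simpler updates than the text: plain ISTA steps for the sparse codes (instead of FISTA [64]) and a least-squares (MOD-style) dictionary refit followed by column renormalization (instead of the Newton method [65]). `learn_dictionary` and its arguments are hypothetical, and the data are a random stand-in for target-class voxels.

```python
import numpy as np

def soft(X, t):
    """Entry-wise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def learn_dictionary(Y, k, lam, n_outer=20, n_inner=50, seed=0):
    """Alternating sketch for min_{D,A} 0.5*||Y - D A||_F^2 + lam*||A||_1 with unit-norm atoms.

    Codes: plain ISTA steps. Atoms: least-squares (MOD-style) refit plus renormalization.
    """
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], size=k, replace=False)].astype(float)  # init from voxels
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, Y.shape[1]))
    for _ in range(n_outer):
        step = 1.0 / np.linalg.norm(D, 2) ** 2        # 1 / Lipschitz constant of the code gradient
        for _ in range(n_inner):
            A = soft(A - step * D.T @ (D @ A - Y), step * lam)
        D = Y @ np.linalg.pinv(A)                     # least-squares fit of D for fixed A
        norms = np.linalg.norm(D, axis=0)
        used = norms > 1e-12                          # atoms unused by A come back as zero
        D[:, used] /= norms[used]
    return D, A

rng = np.random.default_rng(13)
Y = rng.random((30, 200))                 # toy stand-in for target voxels (bands x voxels)
D, A = learn_dictionary(Y, k=5, lam=0.1)
print(D.shape, A.shape)                   # (30, 5) (5, 200)
```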
For case 2), the columns of the dictionary are set as the known data voxels of the target class. Specifically, instead of learning a dictionary based on a target class of interest, we set it as the exact signatures observed previously. Note that for this case, the dictionary is not normalized at this stage since the specific normalization depends on the particular demixing problem of interest, discussed shortly. In practice, we can store the un-normalized dictionary (formed from the voxels), consisting of actual signatures of the target material, and can normalize it after the HS image has been acquired.
Experimental Setup
Normalization: For normalizing the data, we divide each element of the data matrix by to preserve the inter-voxel scaling. For the dictionary, in the learned dictionary case, i.e., case 1), the dictionary already has unit-norm columns. Further, when the dictionary is formed from the data directly, i.e., for case 2), we divide each element of by , and then normalize the columns of , such that they are unit-norm.
Dictionary selection for the Indian Pines Dataset: For the learned dictionary case, we evaluate the performance of the aforementioned techniques for both entry-wise and column-wise settings for two dictionary sizes, and , for three values of the regularization parameter , used for the initial dictionary learning step, i.e., and . Here, the parameter controls the sparsity during the initial dictionary learning step (31). For the case when dictionary is selected from the voxels directly, we randomly select voxels from the target class- to form our dictionary.
Dictionary selection for the Pavia University Dataset: Here, for the learned dictionary case, we evaluate the performance of the aforementioned techniques for both entry-wise and column-wise settings for a dictionary of size for three values of the regularization parameter , used for the initial dictionary learning step, i.e., and . Further, we randomly select voxels from the target class-, when the dictionary is formed from the data voxels.
Comparison with matched filtering (MF)-based approaches: In addition to the robust PCA-based and OP-based techniques introduced in Section I-B, we also compare the performance of our techniques with two MF-based approaches. These MF-based techniques are agnostic to our model assumptions, i.e., entry-wise or column-wise sparsity cases. Therefore, the following description applies to both sparsity cases.
For the first MF-based technique, referred to as MF, we form the inner-product of the column-normalized data matrix , denoted as , with the dictionary , i.e., , and select the maximum absolute inner-product per column. For the second MF-based technique, MF†, we perform matched filtering on the pseudo-inversed data . Here, the matched filtering corresponds to finding maximum absolute entry for each column of the column-normalized . Next, in both cases we scan through threshold values between to generate the results.
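The two matched-filtering scores can be sketched as follows. MF follows the description directly: normalize the data columns, take inner products with the dictionary, and keep the largest absolute value per column. For MF†, we assume "pseudo-inversed data" means pinv(D) @ M; this reading, and the function names, are our assumptions. A detection is then declared when the score exceeds a threshold, and the experiments scan a range of thresholds.

```python
import numpy as np

def mf_scores(M, D):
    """MF: per data column, the largest |inner product| with an atom, after
    normalizing the data columns to unit l2 norm."""
    Mn = M / np.linalg.norm(M, axis=0)
    return np.abs(D.T @ Mn).max(axis=0)

def mf_dagger_scores(M, D):
    """MF-dagger (assumed reading): per column, the largest |entry| of pinv(D) @ M
    after normalizing those columns to unit l2 norm."""
    C = np.linalg.pinv(D) @ M
    C = C / np.linalg.norm(C, axis=0)
    return np.abs(C).max(axis=0)

rng = np.random.default_rng(14)
D = rng.standard_normal((40, 3)); D /= np.linalg.norm(D, axis=0)
M = rng.standard_normal((40, 6))
M[:, 0] = 2.5 * D[:, 1]                  # a "target" voxel: a scaled copy of atom 1
print(mf_scores(M, D)[0])                # approximately 1: a perfect match
detections = mf_scores(M, D) >= 0.9      # one illustrative threshold
```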
Performance Metrics: We evaluate the performance of these techniques via receiver operating characteristic (ROC) plots. ROC plots are a staple for analyzing the classification performance of a binary classifier in machine learning; see also [66]. Specifically, an ROC plot shows the true positive rate (TPR) against the false positive rate (FPR); a higher TPR (close to 1) and a lower FPR (close to 0) indicate that the classifier detects most elements of the class of interest while rejecting those outside it.
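A minimal sketch of how one ROC point per threshold can be computed from detector scores, assuming NumPy and boolean ground-truth labels (True for voxels of the target class). The helper name `roc_points` is hypothetical.

```python
import numpy as np


def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) arrays with one point per threshold.

    scores: detector output per voxel (higher means more target-like).
    labels: boolean ground truth, True for voxels of the class of interest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = labels.sum(), (~labels).sum()
    fpr, tpr = [], []
    for tau in thresholds:
        pred = scores > tau
        tpr.append(np.sum(pred & labels) / pos)   # detected targets / all targets
        fpr.append(np.sum(pred & ~labels) / neg)  # false alarms / all non-targets
    return np.array(fpr), np.array(tpr)


fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [True, False, True, False],
                      thresholds=[0.85, 0.5, 0.2, 0.0])
```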
A natural metric for gauging performance is the area under the curve (AUC), i.e., the area under the ROC curve. It is maximized when TPR = 1 and FPR = 0; therefore, a higher AUC is preferred. An AUC of 0.5 indicates that the classifier performs, on average, roughly as well as a coin flip. As a result, if a classifier has an AUC < 0.5, one can improve its performance by simply inverting its decisions. Inverting the decisions maps each ROC point (FPR, TPR) to (1 − FPR, 1 − TPR), so the AUC is effectively evaluated after “flipping” the ROC curve and becomes 1 − AUC. In other words, an AUC below 0.5 means that the classifier is good at rejecting the class of interest, and taking the complement of its decision can be used to identify that class.
In our experiments, MF-based techniques often exhibit this phenomenon. Specifically, when the dictionary contains element(s) that resemble the average behavior of the spectral signatures, the inner product between the normalized data columns and these dictionary elements may be higher than that with the more distinguishing dictionary elements. Since MF-based techniques rely on the maximum inner product between the normalized data columns and the dictionary, and since the spectral signatures of even distinct classes are highly correlated (see, for instance, Fig. 1), MF-based approaches can in these cases effectively reject the class of interest. This leads to an AUC < 0.5. Therefore, as discussed above, we invert the result of the classifier (indicated as in the tables) to report the best performance. When using MF-based techniques in practice, this issue can potentially be resolved by removing the dictionary elements that resemble the average behavior of the spectral signatures.
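The AUC computation and the inversion ("flip") described above can be sketched with the trapezoidal rule, assuming NumPy and a set of ROC points. The curve is anchored at (0, 0) and (1, 1), and ties in FPR are broken by TPR so that the polyline is traversed in order. The helper names are hypothetical.

```python
import numpy as np


def auc(fpr, tpr):
    """Trapezoidal area under the ROC polyline through the given points,
    anchored at (0, 0) and (1, 1)."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    order = np.lexsort((tpr, fpr))  # sort by FPR, ties broken by TPR
    x = np.concatenate(([0.0], fpr[order], [1.0]))
    y = np.concatenate(([0.0], tpr[order], [1.0]))
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))


def auc_with_flip(fpr, tpr):
    """If AUC < 0.5, invert the decisions: each ROC point (FPR, TPR) becomes
    (1 - FPR, 1 - TPR), which for a monotone ROC curve gives area 1 - AUC."""
    a = auc(fpr, tpr)
    if a < 0.5:
        return auc(1.0 - np.asarray(fpr), 1.0 - np.asarray(tpr)), True
    return a, False


a_good, flipped_good = auc_with_flip([0, 0.5, 0.5, 1], [0.5, 0.5, 1, 1])
a_bad, flipped_bad = auc_with_flip([1, 0.5, 0.5, 0], [0.5, 0.5, 0, 0])
```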
Parameter Setup for the Algorithms
Entry-wise sparsity case: We evaluate and compare the performance of the proposed method D-RPCA(E) with RPCA† (described in Section I-B), MF, and MF†. Specifically, we evaluate these techniques via ROC plots for the Indian Pines and Pavia University datasets; the results are shown in Table I(a)-(d) and Table III(a)-(c), respectively.
For the proposed technique, we employ the accelerated proximal gradient (APG) algorithm shown in Algorithm 1 and discussed in Section VI-A to solve the optimization problem shown in D-RPCA(E). Similarly, for RPCA† we employ the APG algorithm with transformed data matrix , while setting .
For the tuning parameters of the APG solver for (D-RPCA(E)) (RPCA†, respectively), we choose , (), , and scan through values of in the range (), to generate the ROCs. We threshold the resulting estimate of the sparse part based on its column norm, choosing the threshold such that the AUC metric is maximized for both cases (D-RPCA(E) and RPCA†).
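For readers unfamiliar with APG, a generic sketch for the entry-wise model is given below. It is not a reproduction of Algorithm 1, which is not shown in this section. It assumes the common relaxation mu*||L||_* + mu*lam*||S||_1 + 0.5*||M − L − D S||_F^2, with no continuation on mu, and the names `lam`, `mu`, and `apg_dictionary_rpca` are hypothetical. Each iteration takes a gradient step on the quadratic term at an extrapolated point and then applies singular value thresholding to L and soft thresholding to S.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft(X, tau):
    """Entry-wise soft thresholding: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def apg_dictionary_rpca(M, D, lam, mu, n_iter=300):
    """FISTA-style APG for the assumed relaxation
        min_{L,S} mu*||L||_* + mu*lam*||S||_1 + 0.5*||M - L - D S||_F^2.
    """
    n, m = M.shape
    k = D.shape[1]
    # Step size 1/lip, where lip = ||[I, D]||_2^2 = 1 + ||D||_2^2 is the
    # Lipschitz constant of the gradient of the quadratic data-fit term.
    lip = 1.0 + np.linalg.norm(D, 2) ** 2
    L, S = np.zeros((n, m)), np.zeros((k, m))
    YL, YS = L, S  # extrapolated (momentum) points
    t = 1.0
    for _ in range(n_iter):
        R = M - YL - D @ YS  # gradients are -R (w.r.t. L) and -D^T R (w.r.t. S)
        L_new = svt(YL + R / lip, mu / lip)
        S_new = soft(YS + D.T @ R / lip, mu * lam / lip)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_new
        YL = L_new + beta * (L_new - L)
        YS = S_new + beta * (S_new - S)
        L, S, t = L_new, S_new, t_new
    return L, S


# Synthetic usage: rank-2 component plus a few dictionary-sparse coefficients.
rng = np.random.default_rng(0)
n, m, k = 20, 30, 10
D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)
L0 = rng.standard_normal((n, 2)) @ rng.standard_normal((2, m))
S0 = np.zeros((k, m))
S0[rng.integers(0, k, 5), rng.integers(0, m, 5)] = 3.0
M = L0 + D @ S0
L_hat, S_hat = apg_dictionary_rpca(M, D, lam=0.2, mu=0.1)
```

Scanning `lam` over a grid and thresholding the column norms of the recovered sparse part would then trace out one ROC curve per setting.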
Column-wise sparsity case: For this case, we evaluate and compare the performance of the proposed method D-RPCA(C) with OP† (as described in Section I-B), MF, and MF†. The results for the Indian Pines dataset and the Pavia University dataset are shown in Table II(a)-(d) and Table IV(a)-(c), respectively. As in the entry-wise sparsity case, we employ the APG algorithm presented in Algorithm 1 to solve the optimization problem shown in D-RPCA(C). Similarly, for OP† we employ the APG algorithm with transformed data matrix , while setting . For the tuning parameters of the APG solver for (D-RPCA(C)) (OP†, respectively), we choose , (), , and scan through values of in the range (), to generate the ROCs. We threshold the resulting estimate of the sparse part based on its column norm.
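In the column-wise case, the entry-wise soft-thresholding step of an APG iteration is replaced by the proximal operator of the sum of column l2 norms, i.e., group soft thresholding. Detection then thresholds column norms. The sketch below assumes NumPy. Whether "the sparse part" refers to the coefficient matrix or to its product with the dictionary is not stated here, so `detect_by_column_norm` simply takes whichever matrix is passed in. Both names are hypothetical.

```python
import numpy as np


def column_soft(X, tau):
    """Proximal operator of tau * sum_j ||X[:, j]||_2 (group soft thresholding):
    each column's l2 norm is reduced by tau, and columns whose norm is at most
    tau are set exactly to zero."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale  # broadcasting applies one scale factor per column


def detect_by_column_norm(X, threshold):
    """Flag voxel j as a target when the l2 norm of column j of X exceeds threshold."""
    return np.linalg.norm(X, axis=0) > threshold


X = np.array([[3.0, 0.1], [4.0, 0.0]])
X_shrunk = column_soft(X, 1.0)
flags = detect_by_column_norm(X, 1.0)
```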
Analysis: Tables I and III (entry-wise case) and Tables II and IV (column-wise case) show the ROC characteristics and the classification performance of the proposed techniques D-RPCA(E) and D-RPCA(C), respectively, for the two datasets under consideration, under various choices of the dictionary and of the regularization parameter for (31). We note that both proposed techniques, D-RPCA(E) and D-RPCA(C), on average outperform the competing techniques, emerging as the most reliable techniques across different dictionary choices; see Tables I(d), III(c), II(d), and IV(c).
Further, the performance of D-RPCA(C) is slightly better than D-RPCA(E). This can be attributed to the fact that the column-wise sparsity model does not require the columns of to be sparse themselves. As alluded to in Section I-B, this allows for higher flexibility in the choice of the dictionary elements for the thin dictionary case.
In addition, we see that the matched filtering-based techniques (and even the OP†-based technique for and in Table II) exhibit a “flip”, or inversion, of the ROC curve. As described in Section VI-B, this phenomenon indicates that a classifier is better at rejecting the target class. For the MF-based techniques, this results from a dictionary that contains an element resembling the average behavior of the spectral responses. A similar phenomenon is at play for OP† for and in Table II. Specifically, here the inversion indicates that the dictionary is capable of representing the columns of the data effectively, which leads to an increase in the corresponding column norms in their representation . Coupled with the fact that the component is no longer low-rank for this thin dictionary case (see our discussion in Section I-B), this results in rejection of the target class. On the other hand, our techniques D-RPCA(E) and D-RPCA(C) do not suffer from this issue. Moreover, note that across all the experiments, the thresholds for RPCA† and OP† are higher than those of their D-RPCA counterparts. This can also be attributed to the pre-multiplication by the pseudo-inverse of the dictionary , which increases column norms based on the leading singular values of . Therefore, using D-RPCA(E) when the target spectral response admits a sparse representation, and D-RPCA(C) otherwise, yields consistent and superior results compared to related techniques.
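The norm amplification caused by pre-multiplying with the pseudo-inverse can be illustrated on a toy thin dictionary with two highly correlated unit-norm atoms. The angle `theta` is an arbitrary illustrative choice, and this is not a reproduction of the experiments. For a vector in the range of the dictionary, the norm is scaled by a factor between the reciprocals of the largest and smallest singular values of the dictionary. Correlated atoms make the smallest singular value small, and hence the leading singular value of the pseudo-inverse large.

```python
import numpy as np

theta = 0.1  # angle between two highly correlated unit-norm atoms (illustrative)
D = np.array([[1.0, np.cos(theta)],
              [0.0, np.sin(theta)],
              [0.0, 0.0]])
# Two data columns in the range of D: along the smallest and the largest
# right singular directions of D, respectively.
X = np.column_stack([D[:, 0] - D[:, 1], D[:, 0] + D[:, 1]])

# Norm gain of each column under pre-multiplication by pinv(D).
gain = np.linalg.norm(np.linalg.pinv(D) @ X, axis=0) / np.linalg.norm(X, axis=0)
sigma = np.linalg.svd(D, compute_uv=False)
```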
There are other interesting recovery results that warrant attention. Fig. 9 shows the low-rank and the dictionary sparse components recovered by D-RPCA(E) for two different values of , for the case where we form the dictionary by randomly sampling the voxels (Table I(c)) for the Indian Pines dataset [51]. Interestingly, we recover the rail tracks/roads running diagonally in the top-right corner, along with some low-density housing; see Fig. 9(f). This is because the signatures we seek (stone-steel) are similar to the signatures of the materials used in these structures. This further corroborates the applicability of the proposed approach to detecting the presence of particular spectral signatures, as long as they are sufficiently distinct.
VII Discussion
We analyze a dictionary-based generalization of Robust PCA and use it for target localization in a hyperspectral (HS) image from the a priori known spectral signature of the material of interest. Here, we consider a case where the acquired data can be modeled as a superposition of a low-rank component and a dictionary sparse component, and analyze this model under two distinct sparsity modalities, entry-wise and column-wise, for both the thin and fat dictionary cases.
Our analysis shows that, contrary to existing intuition, in the thin dictionary case, pre-multiplication with the pseudo-inverse of the dictionary may not reduce the problem to that of Robust PCA. To this end, we theoretically analyze the thin dictionary case, extend the analysis to the fat dictionary case, and also analyze the column-wise sparsity case. As a result, to the best of our knowledge, our results are the most general for this model and facilitate its use in practical settings. Here, we consider a worst-case analysis in the deterministic setting; analysis of this model under additional randomness assumptions on the constituent factors constitutes future work. Additionally, recent results on non-convex low-rank matrix estimation formulations [67, 68] may lead to computationally efficient algorithms that avoid the expensive SVD step.
In this work, we also leverage our theoretical results for a target localization task in hyperspectral imaging to demonstrate the applicability of the proposed approach on real-world demixing tasks. Here, we show how the entry-wise and column-wise sparsity modalities can be used to detect targets depending on the dictionary structure. Future work on this thread will aim to further exploit local similarities (potentially by group sparsity constraints) in HS images to improve localization.
Overall, our algorithm-agnostic theoretical guarantees and the analysis of the corresponding application to HS image target detection using the proposed dictionary-based generalization of Robust PCA open up future theory-backed explorations of the model in various target detection applications.
Appendix A Proofs of Intermediate results
A-A Proofs for Entry-wise Case
We present the details of the proofs for the entry-wise case in this section. We begin by deriving the optimality conditions.
Proof of Lemma 4.
Let be a solution of the problem posed above. Notice that this pair is not necessarily unique. For example, as shown in the proof of Lemma 2 in [23], , with arbitrary , is another feasible solution of the problem satisfying the optimality conditions (derived in this section).
We begin by writing the Lagrangian, , for the given problem as follows.
[TABLE]
where are the Lagrange multipliers.
Let the singular value decomposition (SVD) of be represented as . Then the sub-differential set of can be represented as
[TABLE]
as shown in [69]. Also, the subdifferential set corresponding to is given by
[TABLE]
Using these results, we write the sub-differential of the Lagrangian with respect to and at as
[TABLE]
Then the optimality conditions are
[TABLE]
which implies that the dual solution must obey the following,
[TABLE]
Our aim here is to find the conditions on and such that the pair is a unique solution to the problem at hand.
Using these conditions, we see that and ; these correspond to conditions (C1) and (C2), respectively. Now consider a feasible solution for a non-zero . By duality of norms,
[TABLE]
We can choose which implies and and
[TABLE]
Further, let , with and , be such that
[TABLE]
where denotes the element of . Then, we arrive at the following simplification for by duality of norms,
[TABLE]
We first write the sub-gradient optimality condition,
[TABLE]
Next, we use the relationships derived above to simplify the following term:
[TABLE]
We now simplify using Hölder’s inequality.
[TABLE]
Finally, we simplify the optimality condition shown in (A-A),
[TABLE]
Here, we note that if and , then the pair is the unique solution of the problem. Consequently, these are the required sufficient conditions (C3) and (C4), respectively. ∎
Proof of Lemma 5.
First, note that we need to have full row rank, i.e., its smallest singular value should be greater than zero. To this end, we first derive a lower bound on the smallest singular value, of as follows:
[TABLE]
Now, using the definition of and properties of Kronecker products, namely, the transpose and the vectorization of a product of three matrices, we have
[TABLE]
Now, since ,
[TABLE]
Using the GFP, we have the following lower bound:
[TABLE]
Further, simplifying using properties of the projection operator, the reverse triangle inequality and the definition of ,
[TABLE]
Therefore, we note that if and , has full row rank, and the lower bound on the smallest singular value is given by . ∎
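The proof of Lemma 5 relies on standard Kronecker-product facts: the vectorization identity vec(ABC) = (C^T ⊗ A) vec(B), the transpose rule, and the characterization of full row rank by a positive smallest singular value. The snippet below checks them numerically on random stand-in matrices, assuming NumPy and column-major vectorization. The matrices are arbitrary and are not the operators of the lemma.

```python
import numpy as np


def vec(X):
    """Column-major vectorization: stack the columns of X into one vector."""
    return X.reshape(-1, order="F")


rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 2))

# vec(A B C) = (C^T kron A) vec(B)
lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)

# Transpose of a Kronecker product: (C^T kron A)^T = C kron A^T
G = np.kron(C.T, A)  # 6 x 20, a wide matrix
transpose_ok = np.allclose(G.T, np.kron(C, A.T))

# Full row rank <=> smallest singular value > 0. For a Kronecker product, the
# singular values are the pairwise products of the factors' singular values.
sigma_min = np.linalg.svd(G, compute_uv=False).min()
```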
Proof of Lemma 6.
We begin with the definition of . Since and ,
[TABLE]
Now, for an upper bound on , we start by analyzing ,
[TABLE]
Using properties of inner products and the fact that ,
[TABLE]
Further simplifying using the Cauchy-Schwarz inequality and the definition of , we have
[TABLE]
Now, since and using the GFP we have . Therefore, an upper bound for is given by . ∎
Proof of Lemma 7.
Since and , we have the upper bound . ∎
Proof of Lemma 8.
We begin by simplifying the quantity of interest as follows:
[TABLE]
Now, we derive appropriate bounds on the numerator and the denominator of (A-A) separately. Consider the numerator . Here, we are interested in the maximum -norm of the rows of , i.e.,
[TABLE]
Let refer to the support of , and to its complement. Then, the expression can be written in terms of and :
[TABLE]
Now, is defined as ; therefore, using the property of the product of two Kronecker products and the product of projection matrices, can be written as
[TABLE]
We are interested in the entry of . Since has a Kronecker product structure, an entry of is given by the product of elements of the matrices in the Kronecker product; therefore,
[TABLE]
where is given by
[TABLE]
Now, consider , which can be simplified as
[TABLE]
Since trace is invariant under cyclic permutations, we have
[TABLE]
Denote and , then we have
[TABLE]
Now, the following upper bound on can be evaluated by squaring both sides and simplifying
[TABLE]
First consider , which can be written as . Here, can be upper bounded as shown below using the GFP
[TABLE]
Further, we can derive an upper bound on using the parallelogram law for inner products as follows.
[TABLE]
Therefore, we have
[TABLE]
Now, consider ; since , and further, since , we have . Now, substituting in (35), i.e., the expression for , we have,
[TABLE]
and finally substituting in (34) and noting that since and , ,
[TABLE]
Now, for , the maximum number of non-zeros per row is , while those in a column are for the thin case and for the fat case. Then we have
[TABLE]
Here, the constant is as defined in (7). Now, to bound the denominator of (A-A), we have
[TABLE]
We proceed to bound . For this, we derive a lower bound on . Note that selects the -th row of , which has a Kronecker product structure. Therefore,
[TABLE]
Therefore, since and , if , we have . The analysis for deriving an upper bound on the second term in (A-A) closely follows that used in (37), as shown below
[TABLE]
Combining these results, we have the following bound for
[TABLE]
Finally, substituting these results in (A-A) we have , where is given by (7). ∎
A-B Proofs for Column-wise Case
Proof of Lemma 2.
We show that for any , if and do not hold simultaneously, then .
Let , as per our model shown in (1). Now, let be any other pair in our Oracle Model ,
[TABLE]
for some and , then we have that . This implies that . Further, this implies that and at least match in the columns indexed by the inliers, i.e., , and we have
[TABLE]
Therefore, . Specifically, this implies that there may exist a for which , which will imply that . This condition implies that . Therefore, we require and to hold simultaneously for . ∎
Proof of Lemma 9.
Let be an optimal solution pair of (D-RPCA(C)). From the optimality conditions (22) and (23), we seek such that
[TABLE]
Now consider a feasible solution for a non-zero . Then by the optimality of using the subgradient inequality, we have
[TABLE]
Let . We will show that if (q1)-(q4) hold, then , which proves the optimality of . Rewrite as
[TABLE]
Let , with and , then by duality of norms,
[TABLE]
Further, let , with and , be such that
[TABLE]
where denotes the column of . Then, we arrive at the following simplification for by duality of norms,
[TABLE]
Since and by optimality conditions of (39),
[TABLE]
where we use Hölder’s inequality in the last step.
Combining (40), (41), (42), and (44), we have
[TABLE]
Since we have an arbitrary with and , does not hold. Therefore, to ensure the uniqueness of the solution , we need and . Hence, any dual certificate which obeys the conditions (C1)-(C4) guarantees optimality of the solution. ∎
Proof of Lemma 10.
We begin by writing the definition of as
[TABLE]
By the definition of and using the property of Kronecker product for multiplication by a vector we have
[TABLE]
Further, , and we can write the expression above as follows
[TABLE]
Here, (i) is due to the GFP condition D.2 and the reverse triangle inequality, and (ii) follows from the incoherence property in (2). ∎
Proof of Lemma 11.
We start by using the correspondence between the vector and the matrix , i.e.,
[TABLE]
Now, since for all ; and is otherwise (i.e., when ), using the triangle inequality, we have
[TABLE]
Since we have
[TABLE]
where (i) follows from the subspace incoherence property and (ii) from the GFP D.2. Combining (45) and (46), we have
[TABLE]
∎
Proof of Lemma 12.
We begin by analyzing the quantity of interest – , i.e., we are interested in the maximum column norm of the matrix . Note that is defined as
[TABLE]
and we have . Further, we have that
[TABLE]
Now, observe that the columns of the matrix appear as blocks of size in the vector . Moreover, the elements of vector are formed by the inner products between the rows of the Kronecker product structured matrix and . Therefore, to identify a column of , we need to focus on the interaction between the corresponding rows of and .
Consider the Kronecker product structured matrix . Since the rows in correspond to all rows outside the column support , this corresponds to selecting those rows of matrix which correspond to , which we denote by i.e.,
[TABLE]
For simplicity of the upcoming analysis, we denote the matrix as
[TABLE]
Using this notation, the -th block of vector (which is also the -th column of ), can be written as
[TABLE]
for some . Further, since , we are interested in the maximum -norm of
[TABLE]
for some . Note that itself is a Kronecker product structured matrix given by
[TABLE]
Using the mixed product rule for Kronecker products we have
[TABLE]
for some . Further, since for two matrices and , , we have
[TABLE]
where we also use the fact that . We will now proceed to bound the first term in (47). Note that
[TABLE]
Now, each term in the summation can be bounded as
[TABLE]
This implies . Further, note that . Substituting this into (47), for a , we have
[TABLE]
We can further write as follows
[TABLE]
Substituting this result in (48), using Lemma 10 and Lemma 11,
[TABLE]
∎
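The proofs of Lemmas 10, 11, and 12 use the column-block structure of vectorization (the j-th column of a matrix occupies the j-th block of its vectorization) and the mixed-product rule for Kronecker products. The snippet below checks these facts numerically, together with the related standard fact that the spectral norm is multiplicative over Kronecker products. It assumes NumPy, column-major vectorization, and arbitrary random stand-in matrices; the helper names are hypothetical.

```python
import numpy as np


def vec(X):
    """Column-major vectorization: stack the columns of X into one vector."""
    return X.reshape(-1, order="F")


def column_block(v, n, j):
    """Column j of an n-row matrix X is the j-th length-n block of vec(X)."""
    return v[j * n:(j + 1) * n]


X = np.arange(12.0).reshape(3, 4)
block_ok = np.array_equal(column_block(vec(X), 3, 2), X[:, 2])

rng = np.random.default_rng(2)
A, B = rng.standard_normal((2, 3)), rng.standard_normal((4, 5))
C, E = rng.standard_normal((3, 2)), rng.standard_normal((5, 3))
# Mixed-product rule: (A kron B)(C kron E) = (A C) kron (B E)
mixed_ok = np.allclose(np.kron(A, B) @ np.kron(C, E), np.kron(A @ C, B @ E))
# Spectral norm is multiplicative over Kronecker products.
norm_ok = np.isclose(np.linalg.norm(np.kron(A, B), 2),
                     np.linalg.norm(A, 2) * np.linalg.norm(B, 2))
```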
References
- [1] I. Jolliffe, Principal component analysis, Wiley Online Library, 2002.
- [2] S. Rambhatla, X. Li, and J. Haupt, “Target-based hyperspectral demixing via generalized robust PCA,” in 51st Asilomar Conference on Signals, Systems, and Computers (ACSSC), 2017, pp. 420–424.
- [3] X. Li, J. Ren, S. Rambhatla, Y. Xu, and J. Haupt, “Robust PCA via dictionary based outlier pursuit,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2018.
- [4] M. Borengasser, W. S. Hungate, and R. Watkins, Hyperspectral remote sensing: principles and applications, CRC Press, 2007.
- [5] B. Park and R. Lu, Hyperspectral imaging technology in food and agriculture, Springer, 2015.
- [6] D. Rolnick, P. L. Donti, L. H. Kaack, et al., “Tackling climate change with machine learning,” arXiv preprint arXiv:1906.05433, 2019.
- [7] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM Journal on Computing, vol. 24, no. 2, pp. 227–234, 1995.
- [8] D. L. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp. 2845–2862, 2001.
