Hyperspectral Super-Resolution via Global-Local Low-Rank Matrix Estimation

Ruiyuan Wu, Wing-Kin Ma, Xiao Fu, and Qiang Li

arXiv:1907.01149·eess.IV·October 28, 2020

TL;DR

This paper introduces a novel low-rank matrix estimation method for hyperspectral super-resolution, leveraging global and local low-rank structures to improve image reconstruction from multispectral and hyperspectral data.

Contribution

It proposes a global-local low-rank regularization framework and an efficient optimization algorithm for hyperspectral super-resolution, accounting for local spectral variations.

Findings

01

Outperforms benchmark algorithms in recovery accuracy

02

Effective in synthetic, semi-real, and real data scenarios

03

Leverages recent non-convex optimization advances

Tables

Table 1: The local approximate ranks of the real HS datasets. The patch size refers to the spectral-spatial dimension, $M \times L_{i}$, of the local patches.

$P$	Chikusei		Cuprite
	($480 \times 480$ pixels, $128$ bands)		($480 \times 480$ pixels, $187$ bands)
	patch size	approx. rank	patch size	approx. rank
$1^{2}$	$128 \times 230{,}400$	10	$187 \times 230{,}400$	9
$2^{2}$	$128 \times 57{,}600$	8.75 $\pm$ 0.96	$187 \times 57{,}600$	8.25 $\pm$ 1.26
$3^{2}$	$128 \times 25{,}600$	8.33 $\pm$ 1.73	$187 \times 25{,}600$	7.44 $\pm$ 1.33
$4^{2}$	$128 \times 14{,}400$	8.19 $\pm$ 1.83	$187 \times 14{,}400$	7.13 $\pm$ 1.20
$5^{2}$	$128 \times 9{,}216$	7.92 $\pm$ 1.89	$187 \times 9{,}216$	6.76 $\pm$ 1.13
$6^{2}$	$128 \times 6{,}400$	7.81 $\pm$ 1.98	$187 \times 6{,}400$	6.64 $\pm$ 1.13
$8^{2}$	$128 \times 3{,}600$	7.38 $\pm$ 2.13	$187 \times 3{,}600$	6.38 $\pm$ 1.06
$10^{2}$	$128 \times 2{,}304$	7.08 $\pm$ 2.29	$187 \times 2{,}304$	6.12 $\pm$ 1.09
$12^{2}$	$128 \times 1{,}600$	6.89 $\pm$ 2.34	$187 \times 1{,}600$	6.03 $\pm$ 1.02
$15^{2}$	$128 \times 1{,}024$	6.52 $\pm$ 2.35	$187 \times 1{,}024$	5.84 $\pm$ 1.00
$16^{2}$	$128 \times 900$	6.47 $\pm$ 2.36	$187 \times 900$	5.79 $\pm$ 1.01
$P$	Indian Pines		University of Pavia
	($120 \times 120$ pixels, $178$ bands)		($240 \times 240$ pixels, $103$ bands)
	patch size	approx. rank	patch size	approx. rank
$1^{2}$	$178 \times 14{,}400$	22	$103 \times 57{,}600$	22
$2^{2}$	$178 \times 3{,}600$	19.75 $\pm$ 1.71	$103 \times 14{,}400$	20.75 $\pm$ 1.89
$3^{2}$	$178 \times 1{,}600$	17.56 $\pm$ 4.10	$103 \times 6{,}400$	20.44 $\pm$ 2.07
$4^{2}$	$178 \times 900$	16.63 $\pm$ 3.77	$103 \times 3{,}600$	20.44 $\pm$ 2.66
$5^{2}$	$178 \times 576$	15.20 $\pm$ 3.94	$103 \times 2{,}304$	20.28 $\pm$ 2.84
$6^{2}$	$178 \times 400$	14.69 $\pm$ 4.44	$103 \times 1{,}600$	20.03 $\pm$ 3.08
$8^{2}$	$178 \times 225$	13.36 $\pm$ 4.47	$103 \times 900$	19.80 $\pm$ 3.43
$P$	Washington DC Mall		Moffett Field
	($240 \times 240$ pixels, $191$ bands)		($240 \times 240$ pixels, $187$ bands)
	patch size	approx. rank	patch size	approx. rank
$1^{2}$	$191 \times 57{,}600$	6	$187 \times 57{,}600$	13
$2^{2}$	$191 \times 14{,}400$	5.50 $\pm$ 0.58	$187 \times 14{,}400$	12.00 $\pm$ 2.71
$3^{2}$	$191 \times 6{,}400$	5.22 $\pm$ 0.67	$187 \times 6{,}400$	11.56 $\pm$ 2.65
$4^{2}$	$191 \times 3{,}600$	4.50 $\pm$ 1.03	$187 \times 3{,}600$	11.00 $\pm$ 2.94
$5^{2}$	$191 \times 2{,}304$	4.52 $\pm$ 1.26	$187 \times 2{,}304$	10.68 $\pm$ 2.87
$6^{2}$	$191 \times 1{,}600$	4.39 $\pm$ 1.23	$187 \times 1{,}600$	10.56 $\pm$ 2.92
$8^{2}$	$191 \times 900$	4.20 $\pm$ 1.17	$187 \times 900$	10.08 $\pm$ 2.89
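
The approximate ranks in Table 1 rest on an energy criterion: the smallest $r$ such that the top $r$ squared singular values of a patch hold at least 99.99% of the total. The following is a minimal sketch of that criterion, assuming NumPy. The function name `approx_rank`, the toy rank-3 matrix, and the splitting of columns into equal contiguous patches are illustrative assumptions and are not taken from the paper's code.

```python
import numpy as np

def approx_rank(X, energy=0.9999):
    """Smallest r whose top-r squared singular values hold `energy` of the total."""
    s = np.linalg.svd(X, compute_uv=False)       # singular values, in descending order
    cum = np.cumsum(s**2) / np.sum(s**2)         # cumulative energy fraction
    return int(np.searchsorted(cum, energy) + 1) # first index reaching the threshold

rng = np.random.default_rng(0)
M, L, N = 20, 400, 3
X = rng.random((M, N)) @ rng.random((N, L))      # toy spectral-spatial matrix of rank 3
r_global = approx_rank(X)                        # the P = 1 row of such a table

# Assumption: columns are already ordered patch by patch, so equal column
# blocks play the role of the local patches X_1, ..., X_P.
P = 4
local_ranks = [approx_rank(X_i) for X_i in np.split(X, P, axis=1)]
print(r_global, np.mean(local_ranks), np.std(local_ranks))
```

Reporting the mean and standard deviation of `local_ranks` mirrors the "mean $\pm$ std" entries of the table; which standard deviation convention the authors used is not stated here.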

Table 2: Some settings of the semi-real data experiments.

	Chikusei	Indian Pines	Washington DC Mall	University of Pavia	Moffett Field
MS response	IKONOS	Landsat 4 TM	Landsat 4 TM	IKONOS	Landsat 4 TM
Image size ($M \times L$)	128 $\times$ 230,400	178 $\times$ 14,400	191 $\times$ 57,600	103 $\times$ 57,600	187 $\times$ 57,600
Patch size ($M \times L_{i}$)	128 $\times$ 3,600	178 $\times$ 900	191 $\times$ 3,600	103 $\times$ 3,600	187 $\times$ 3,600
Patch number $P$	64	16	16	16	16

Table 3: Average performance of the algorithms on semi-real datasets.

Method	Time (sec.)	PSNR	SAM	ERGAS	UIQI
Ideal value	0	$\infty$	0	0	1
Dataset - Chikusei
GSA	1.44 $\pm$ 0.15	33.11 $\pm$ 0.01	4.85 $\pm$ 0.01	6.59 $\pm$ 0.01	0.787 $\pm$ 0.000
GLP	43.44 $\pm$ 8.84	29.13 $\pm$ 0.00	5.36 $\pm$ 0.01	7.92 $\pm$ 0.01	0.733 $\pm$ 0.000
CNMF	147.54 $\pm$ 64.93	34.13 $\pm$ 0.11	4.21 $\pm$ 0.08	5.24 $\pm$ 0.11	0.811 $\pm$ 0.004
FUMI	293.31 $\pm$ 24.18	35.18 $\pm$ 0.04	2.55 $\pm$ 0.03	4.03 $\pm$ 0.03	0.882 $\pm$ 0.001
HySure	130.88 $\pm$ 11.80	36.15 $\pm$ 0.15	2.84 $\pm$ 0.09	4.31 $\pm$ 0.08	0.863 $\pm$ 0.003
LRSR	279.26 $\pm$ 43.04	35.84 $\pm$ 0.09	2.88 $\pm$ 0.03	4.41 $\pm$ 0.06	0.880 $\pm$ 0.002
NNM	91.95 $\pm$ 12.48	32.63 $\pm$ 0.01	4.96 $\pm$ 0.00	7.49 $\pm$ 0.01	0.764 $\pm$ 0.000
GLORIA	119.99 $\pm$ 21.78	37.73 $\pm$ 0.01	2.31 $\pm$ 0.00	3.59 $\pm$ 0.00	0.894 $\pm$ 0.000
Dataset - Indian Pines
GSA	0.44 $\pm$ 0.08	19.07 $\pm$ 2.24	7.19 $\pm$ 9.30	4.09 $\pm$ 2.78	0.512 $\pm$ 0.067
GLP	2.38 $\pm$ 0.51	18.06 $\pm$ 0.48	4.84 $\pm$ 0.09	3.66 $\pm$ 0.20	0.397 $\pm$ 0.034
CNMF	16.57 $\pm$ 4.04	22.64 $\pm$ 0.22	4.29 $\pm$ 0.10	2.43 $\pm$ 0.07	0.523 $\pm$ 0.009
FUMI	11.40 $\pm$ 0.30	24.85 $\pm$ 0.15	2.69 $\pm$ 0.05	1.76 $\pm$ 0.04	0.762 $\pm$ 0.007
HySure	10.96 $\pm$ 0.37	26.70 $\pm$ 0.16	2.74 $\pm$ 0.04	1.41 $\pm$ 0.03	0.698 $\pm$ 0.008
LRSR	18.80 $\pm$ 0.28	27.98 $\pm$ 0.14	2.53 $\pm$ 0.03	1.18 $\pm$ 0.02	0.796 $\pm$ 0.004
NNM	6.81 $\pm$ 0.89	26.43 $\pm$ 0.04	2.93 $\pm$ 0.01	1.71 $\pm$ 0.01	0.679 $\pm$ 0.002
GLORIA	14.56 $\pm$ 0.92	29.09 $\pm$ 0.03	2.28 $\pm$ 0.00	1.04 $\pm$ 0.00	0.804 $\pm$ 0.001
Dataset - Washington DC Mall
GSA	0.97 $\pm$ 0.09	20.09 $\pm$ 0.10	6.51 $\pm$ 0.03	20.35 $\pm$ 0.38	0.654 $\pm$ 0.004
GLP	7.68 $\pm$ 1.83	18.20 $\pm$ 0.09	7.00 $\pm$ 0.09	14.85 $\pm$ 0.06	0.582 $\pm$ 0.005
CNMF	31.74 $\pm$ 16.95	25.71 $\pm$ 0.19	3.84 $\pm$ 0.08	6.94 $\pm$ 0.22	0.772 $\pm$ 0.007
FUMI	44.33 $\pm$ 0.69	29.17 $\pm$ 0.10	2.49 $\pm$ 0.02	2.91 $\pm$ 0.06	0.915 $\pm$ 0.004
HySure	31.28 $\pm$ 8.81	28.82 $\pm$ 0.33	3.00 $\pm$ 0.07	3.74 $\pm$ 0.25	0.872 $\pm$ 0.009
LRSR	71.92 $\pm$ 1.55	27.91 $\pm$ 0.16	3.85 $\pm$ 0.10	3.59 $\pm$ 0.15	0.869 $\pm$ 0.007
NNM	27.63 $\pm$ 1.71	27.57 $\pm$ 0.12	3.28 $\pm$ 0.02	8.05 $\pm$ 0.03	0.813 $\pm$ 0.005
GLORIA	57.06 $\pm$ 8.14	29.36 $\pm$ 0.23	2.85 $\pm$ 0.02	3.18 $\pm$ 0.20	0.888 $\pm$ 0.002
Dataset - University of Pavia
GSA	0.49 $\pm$ 0.05	30.50 $\pm$ 0.02	6.79 $\pm$ 0.03	3.05 $\pm$ 0.01	0.900 $\pm$ 0.001
GLP	4.45 $\pm$ 1.14	25.71 $\pm$ 0.07	7.88 $\pm$ 0.11	5.54 $\pm$ 0.06	0.764 $\pm$ 0.005
CNMF	27.08 $\pm$ 21.05	35.86 $\pm$ 0.18	4.09 $\pm$ 0.09	1.84 $\pm$ 0.04	0.946 $\pm$ 0.002
FUMI	36.72 $\pm$ 0.81	35.29 $\pm$ 0.04	3.12 $\pm$ 0.01	1.76 $\pm$ 0.01	0.962 $\pm$ 0.000
HySure	27.46 $\pm$ 9.73	36.71 $\pm$ 0.14	3.34 $\pm$ 0.02	1.55 $\pm$ 0.01	0.961 $\pm$ 0.001
LRSR	58.29 $\pm$ 1.20	36.43 $\pm$ 0.09	3.45 $\pm$ 0.05	1.65 $\pm$ 0.03	0.959 $\pm$ 0.001
NNM	10.55 $\pm$ 0.41	36.81 $\pm$ 0.01	3.48 $\pm$ 0.00	1.55 $\pm$ 0.00	0.957 $\pm$ 0.000
GLORIA	16.68 $\pm$ 0.50	37.51 $\pm$ 0.01	3.16 $\pm$ 0.00	1.46 $\pm$ 0.00	0.964 $\pm$ 0.000
Dataset - Moffett Field
GSA	0.87 $\pm$ 0.06	30.56 $\pm$ 0.02	5.84 $\pm$ 0.01	2.79 $\pm$ 0.01	0.843 $\pm$ 0.000
GLP	6.50 $\pm$ 0.70	28.18 $\pm$ 0.19	5.30 $\pm$ 0.08	3.53 $\pm$ 0.07	0.775 $\pm$ 0.013
CNMF	17.38 $\pm$ 9.60	36.08 $\pm$ 0.12	3.19 $\pm$ 0.06	1.57 $\pm$ 0.03	0.919 $\pm$ 0.002
FUMI	35.07 $\pm$ 1.31	34.65 $\pm$ 0.02	2.48 $\pm$ 0.02	1.68 $\pm$ 0.01	0.946 $\pm$ 0.000
HySure	26.72 $\pm$ 7.39	36.90 $\pm$ 0.19	2.73 $\pm$ 0.05	1.41 $\pm$ 0.02	0.939 $\pm$ 0.002
LRSR	55.78 $\pm$ 0.69	36.97 $\pm$ 0.12	2.58 $\pm$ 0.04	1.36 $\pm$ 0.02	0.942 $\pm$ 0.001
NNM	22.79 $\pm$ 3.52	37.49 $\pm$ 0.10	2.51 $\pm$ 0.03	1.34 $\pm$ 0.01	0.940 $\pm$ 0.001
GLORIA	44.38 $\pm$ 5.46	38.23 $\pm$ 0.01	2.30 $\pm$ 0.00	1.22 $\pm$ 0.00	0.949 $\pm$ 0.000

Table 4: Average performance of the algorithms on the synthetic data.

$\mathsf{SNR}_{\rm M}$ / $\mathsf{SNR}_{\rm H}$ - 15 dB
Method		PSNR	SAM	ERGAS	UIQI
Ideal value		$\infty$	0	0	1
GSA		10.98 $\pm$ 1.27	19.29 $\pm$ 4.71	9.49 $\pm$ 1.83	0.216 $\pm$ 0.055
GLP		14.76 $\pm$ 0.32	11.09 $\pm$ 0.18	5.90 $\pm$ 0.17	0.273 $\pm$ 0.048
CNMF		15.73 $\pm$ 0.41	10.52 $\pm$ 0.52	5.43 $\pm$ 0.22	0.299 $\pm$ 0.050
FUMI		17.63 $\pm$ 0.16	7.69 $\pm$ 0.42	4.32 $\pm$ 0.17	0.436 $\pm$ 0.055
HySure		17.02 $\pm$ 0.51	9.21 $\pm$ 0.32	4.77 $\pm$ 0.17	0.363 $\pm$ 0.064
LRSR		18.82 $\pm$ 0.72	6.98 $\pm$ 0.54	3.92 $\pm$ 0.26	0.434 $\pm$ 0.074
NNM		17.29 $\pm$ 0.50	8.88 $\pm$ 0.20	4.43 $\pm$ 0.10	0.372 $\pm$ 0.065
GLORIA	$P = 1$	18.46 $\pm$ 0.23	7.90 $\pm$ 0.10	4.00 $\pm$ 0.06	0.432 $\pm$ 0.060
	$P = 9$	21.46 $\pm$ 0.12	5.03 $\pm$ 0.13	3.00 $\pm$ 0.07	0.583 $\pm$ 0.051
	$P = 16$	21.76 $\pm$ 0.14	4.61 $\pm$ 0.10	2.83 $\pm$ 0.06	0.598 $\pm$ 0.053
$\mathsf{SNR}_{\rm M}$ / $\mathsf{SNR}_{\rm H}$ - 25 dB
GSA		18.64 $\pm$ 2.05	9.43 $\pm$ 9.14	4.91 $\pm$ 3.69	0.608 $\pm$ 0.102
GLP		19.41 $\pm$ 0.47	4.01 $\pm$ 0.14	3.40 $\pm$ 0.21	0.604 $\pm$ 0.068
CNMF		24.71 $\pm$ 0.35	3.89 $\pm$ 0.20	1.97 $\pm$ 0.10	0.729 $\pm$ 0.048
FUMI		22.54 $\pm$ 0.34	2.47 $\pm$ 0.14	2.51 $\pm$ 0.08	0.853 $\pm$ 0.034
HySure		31.33 $\pm$ 0.26	1.26 $\pm$ 0.09	0.90 $\pm$ 0.04	0.923 $\pm$ 0.021
LRSR		30.63 $\pm$ 0.22	1.51 $\pm$ 0.12	1.01 $\pm$ 0.05	0.914 $\pm$ 0.021
NNM		27.30 $\pm$ 0.44	2.78 $\pm$ 0.05	1.42 $\pm$ 0.03	0.820 $\pm$ 0.045
GLORIA	$P = 1$	28.99 $\pm$ 0.21	2.21 $\pm$ 0.08	1.25 $\pm$ 0.03	0.870 $\pm$ 0.029
	$P = 9$	30.70 $\pm$ 0.22	1.62 $\pm$ 0.08	1.00 $\pm$ 0.03	0.909 $\pm$ 0.023
	$P = 16$	31.32 $\pm$ 0.21	1.42 $\pm$ 0.10	0.90 $\pm$ 0.04	0.924 $\pm$ 0.020
$\mathsf{SNR}_{\rm M}$ / $\mathsf{SNR}_{\rm H}$ - 35 dB
GSA		21.31 $\pm$ 2.55	7.85 $\pm$ 11.01	4.38 $\pm$ 4.35	0.796 $\pm$ 0.109
GLP		21.07 $\pm$ 0.53	1.68 $\pm$ 0.09	2.79 $\pm$ 0.19	0.782 $\pm$ 0.070
CNMF		32.25 $\pm$ 0.35	1.76 $\pm$ 0.12	0.89 $\pm$ 0.06	0.925 $\pm$ 0.020
FUMI		23.45 $\pm$ 0.30	1.47 $\pm$ 0.11	2.32 $\pm$ 0.09	0.949 $\pm$ 0.010
HySure		35.93 $\pm$ 0.42	1.02 $\pm$ 0.10	0.55 $\pm$ 0.04	0.977 $\pm$ 0.006
LRSR		34.89 $\pm$ 0.40	1.03 $\pm$ 0.11	0.63 $\pm$ 0.04	0.971 $\pm$ 0.007
NNM		34.87 $\pm$ 0.32	1.19 $\pm$ 0.02	0.59 $\pm$ 0.01	0.959 $\pm$ 0.014
GLORIA	$P = 1$	35.48 $\pm$ 0.51	1.09 $\pm$ 0.08	0.58 $\pm$ 0.05	0.966 $\pm$ 0.012
	$P = 9$	36.18 $\pm$ 0.22	1.01 $\pm$ 0.03	0.52 $\pm$ 0.01	0.970 $\pm$ 0.011
	$P = 16$	36.73 $\pm$ 0.49	0.96 $\pm$ 0.09	0.53 $\pm$ 0.04	0.974 $\pm$ 0.008

Table 5: Runtime performance of various MM schemes.

	Exact MM	Inexact MM via nominal PG	Inexact MM via APG (GLORIA)
Time (sec.)	214.16 $\pm$ 114.83	102.48 $\pm$ 2.44	24.25 $\pm$ 0.35

Equations

$${\bm{Y}}_{\rm M}={\bm{F}}{\bm{X}}+{\bm{V}}_{\rm M},\qquad {\bm{Y}}_{\rm H}={\bm{X}}{\bm{G}}+{\bm{V}}_{\rm H},$$

$${\bm{X}}={\bm{A}}{\bm{S}},$$

$$\min_{{\bm{A}}\geq{\bm{0}},\,{\bm{S}}\geq{\bm{0}}}\ \|{\bm{Y}}_{\rm M}-{\bm{F}}{\bm{A}}{\bm{S}}\|_{F}^{2}+\|{\bm{Y}}_{\rm H}-{\bm{A}}{\bm{S}}{\bm{G}}\|_{F}^{2},$$

$$\min_{{\bm{X}}\in\mathbb{R}^{M\times L}}\ \ell({\bm{X}})+\gamma\,{\rm rank}({\bm{X}}),$$

$$\ell({\bm{X}})=\frac{1}{2}\|{\bm{Y}}_{\rm M}-{\bm{F}}{\bm{X}}\|_{F}^{2}+\frac{1}{2}\|{\bm{Y}}_{\rm H}-{\bm{X}}{\bm{G}}\|_{F}^{2}.$$

$${\rm rank}({\bm{X}})=\sum_{i=1}^{\min\{M,L\}}u(\sigma_{i}({\bm{X}})),$$

$$\|{\bm{X}}\|_{*}=\sum_{i=1}^{\min\{M,L\}}\sigma_{i}({\bm{X}}).$$

$$\min_{{\bm{X}}\in\mathbb{R}^{M\times L}}\ \ell({\bm{X}})+\gamma\|{\bm{X}}\|_{*}.$$

$${\bm{X}}=[\,{\bm{X}}_{1}\ {\bm{X}}_{2}\ \dots\ {\bm{X}}_{P}\,],$$

$${\bm{X}}_{i}={\bm{A}}_{i}{\bm{S}}_{i},$$

$$\frac{\sum_{j=1}^{r}\sigma_{j}({\bm{X}}_{i})^{2}}{\sum_{j=1}^{\min\{M,L_{i}\}}\sigma_{j}({\bm{X}}_{i})^{2}}\geq 0.9999.$$

$$\min_{{\bm{X}}\in\mathcal{X}}\ \ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,{\rm rank}({\bm{X}}_{i}),$$

$${\bm{A}}^{p}={\bm{U}}{\bm{\Lambda}}^{p}{\bm{U}}^{T},$$

$$\phi_{p,\tau}({\bm{X}})=\sum_{i=1}^{M}(\sigma_{i}({\bm{X}})^{2}+\tau)^{p/2}={\rm tr}(({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{p/2}),$$

$$\min_{{\bm{X}}\in\mathcal{X}}\ \ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,\phi_{p,\tau}({\bm{X}}_{i}),$$

$$f({\bm{X}})=\ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,\phi_{p,\tau}({\bm{X}}_{i}).$$

$$g({\bm{X}};\bar{{\bm{X}}})\geq f({\bm{X}}),\quad g(\bar{{\bm{X}}};\bar{{\bm{X}}})=f(\bar{{\bm{X}}}),\quad\forall\,{\bm{X}},\bar{{\bm{X}}}\in\mathcal{X}.$$

$${\bm{X}}^{k+1}=\arg\min_{{\bm{X}}\in\mathcal{X}}\ g({\bm{X}};{\bm{X}}^{k}),\quad k=0,1,\dots$$

$$\phi_{p,\tau}({\bm{X}})=\min_{{\bm{W}}\in\mathbb{S}_{++}^{M}}\ \psi_{p,\tau}({\bm{X}},{\bm{W}}),$$

$$\psi_{p,\tau}({\bm{X}},{\bm{W}})=\frac{p}{2}\,{\rm tr}({\bm{W}}({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}}))+\frac{2-p}{2}\,{\rm tr}({\bm{W}}^{\frac{p}{p-2}}).$$

$${\bm{W}}^{\star}=({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{\frac{p}{2}-1}.$$

$$g({\bm{X}};{\bm{X}}^{k})=\ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,\psi_{p,\tau}({\bm{X}}_{i},{\bm{W}}_{i}^{k}),$$

$${\bm{W}}_{i}^{k}=({\bm{X}}_{i}^{k}({\bm{X}}_{i}^{k})^{T}+\tau{\bm{I}})^{\frac{p}{2}-1},\quad i=0,\dots,P.$$

$$\min_{{\bm{X}}\in\mathcal{X}}\ \ell({\bm{X}})+\sum_{i=0}^{P}\frac{p\gamma_{i}}{2}\,{\rm tr}({\bm{W}}_{i}^{k}{\bm{X}}_{i}{\bm{X}}_{i}^{T}).$$

$$g_{k}({\bm{X}})=\ell({\bm{X}})+\sum_{i=0}^{P}\frac{p\gamma_{i}}{2}\,{\rm tr}({\bm{W}}_{i}^{k}{\bm{X}}_{i}{\bm{X}}_{i}^{T})$$

$${\bm{X}}^{j+1}=\Pi_{\mathcal{X}}\Big({\bm{Z}}^{j}-\frac{1}{\beta_{j}}\nabla g_{k}({\bm{Z}}^{j})\Big),\quad j=0,1,\dots$$

$${\bm{Z}}^{j}={\bm{X}}^{j}+\alpha_{j}({\bm{X}}^{j}-{\bm{X}}^{j-1});$$

$$\alpha_{j}=\frac{\xi_{j-1}-1}{\xi_{j}},\quad\xi_{j}=\frac{1+\sqrt{1+4\xi_{j-1}^{2}}}{2},$$

$$\nabla g_{k}({\bm{X}})=\cdots$$
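
The last few updates in the list above (a projected gradient step taken at an extrapolated point, with the $\alpha_{j}$, $\xi_{j}$ recursion) follow the familiar accelerated projected gradient pattern. Below is a minimal sketch of that pattern, assuming NumPy, on a toy box-constrained least-squares problem rather than on the paper's surrogate $g_{k}$. The constraint set $[0,1]^{M\times L}$, the constant step size $1/\beta$ with $\beta$ the Lipschitz constant of the gradient, the initialization $\xi_{-1}=0$, ${\bm{X}}^{-1}={\bm{X}}^{0}$, and all function names are assumptions made for illustration.

```python
import numpy as np

def apg_box(grad, beta, X0, n_iter=500):
    """Accelerated projected gradient on the box [0, 1] with FISTA-type extrapolation."""
    X_prev = X0.copy()
    X = X0.copy()
    xi_prev = 0.0                                   # assumed xi_{-1} = 0
    for _ in range(n_iter):
        xi = (1.0 + np.sqrt(1.0 + 4.0 * xi_prev**2)) / 2.0
        alpha = (xi_prev - 1.0) / xi                # extrapolation weight alpha_j
        Z = X + alpha * (X - X_prev)                # extrapolated point Z^j
        X_next = np.clip(Z - grad(Z) / beta, 0.0, 1.0)  # gradient step, then projection
        X_prev, X, xi_prev = X, X_next, xi
    return X

# Toy smooth objective: 0.5 * ||Y - F X||_F^2 with a known feasible minimizer.
rng = np.random.default_rng(0)
F = rng.random((5, 8))
X_true = rng.random((8, 30))
Y = F @ X_true

def loss(X):
    return 0.5 * np.linalg.norm(Y - F @ X, "fro") ** 2

def grad(X):
    return F.T @ (F @ X - Y)

beta = np.linalg.norm(F, 2) ** 2                    # Lipschitz constant of grad
X0 = np.full_like(X_true, 0.5)
X_hat = apg_box(grad, beta, X0)
initial_loss, final_loss = loss(X0), loss(X_hat)
print(initial_loss, final_loss)
```

In an inexact MM scheme, a loop like `apg_box` would be run for a limited number of inner iterations on each surrogate before the weights are refreshed; how the paper chooses $\beta_{j}$ and the inner stopping rule is not shown in this excerpt.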

Full text

Hyperspectral Super-Resolution via Global-Local Low-Rank Matrix Estimation

Ruiyuan Wu†, Wing-Kin Ma†, Xiao Fu‡, and Qiang Li⋆

†Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR of China

‡School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, USA

⋆School of Information and Communications Engineering, University of Electronic Science and Technology of China, China

Abstract

Hyperspectral super-resolution (HSR) aims to estimate an image of high spectral and spatial resolutions from a pair of co-registered multispectral (MS) and hyperspectral (HS) images, which have coarser spectral and spatial resolutions, respectively. In this paper we pursue a low-rank matrix estimation approach for HSR. We assume that the spectral-spatial matrices associated with the whole image and with the local areas of the image have low-rank structures. The local low-rank assumption, in particular, aims to provide a more flexible model that accounts for local variation effects due to endmember variability. We formulate the HSR problem as a global-local rank-regularized least-squares problem. By leveraging recent advances in non-convex large-scale optimization, namely, the smooth Schatten-$p$ approximation and the accelerated majorization-minimization method, we develop an efficient algorithm for the global-local low-rank problem. Numerical experiments on synthetic, semi-real and real data show that the proposed algorithm outperforms a number of benchmark algorithms in terms of recovery performance.

1 Introduction

Hyperspectral (HS) sensors have limited spatial resolution as a tradeoff for achieving high spectral resolution. Such a tradeoff is due to hardware limitations and the measurement mechanism. In a nutshell, a certain amount of light energy reflected by the scene is required for each spectral band of an HS pixel to achieve sufficiently high SNRs. For a sensor with coarse spectral resolution, enough energy can be acquired from a small area by accumulating energy of a wide range of spectral bands. When the spectral resolution increases, the area sensed by a pixel needs to be enlarged to acquire the same amount of energy for each spectral band, which leads to a lower spatial resolution. How to enhance the spatial resolution of HS images has been a subject of great interest. Recently, the idea of using an additional multispectral (MS) image—which has finer spatial resolution than the HS but possesses only several coarse spectral bands—for HS spatial resolution enhancement has shed new light on the subject. This MS-aided enhancement problem is called hyperspectral super-resolution (HSR) or HS-MS data fusion. One approach for HSR is to adapt pansharpening techniques in fusion of panchromatic and HS images [1]. Another approach, which is currently more popular, is based on low-dimensional data models. The low-dimensional model approach assumes that the spectral pixels of the target high-spatial-resolution HS image, or the super-resolution (SR) image for short, lie in a low-dimensional subspace. This assumption is particularly reasonable in the linear spectral mixture scenario: Since the aforesaid scenario has every spectral pixel described as a linear combination of the spectral signatures of the underlying endmembers, the spectral pixels lie in a subspace spanned by the endmember spectral signatures. Also, since the number of endmembers is often much smaller than the number of HS spectral bands, the subspace dimension is low. 
The low-dimensional model may also be applicable to some classes of nonlinear spectral mixtures such as the bilinear mixture model [2, 3]. The low-dimensional model approach has strong connections to hyperspectral unmixing (HU). To be specific, insights and methods in HU are quite often used in the low-dimensional model approach. A comparative review has shown that the low-dimensional model approach can lead to better recovery than those from the pansharpening approach, assuming no or negligible HS-MS co-registration error [1].

To facilitate our discussion later, in this paper we taxonomize the existing low-dimensional model-based HSR methods into two types.

1. Matrix factorization: This type of method models the spectral-spatial matrix of the SR image as a product of two matrix factors, one being the spectral dictionary and the other the coefficients of the low-dimensional representation, and it seeks to jointly estimate the two factors from the observed HS-MS pair. Coupled non-negative matrix factorization (CNMF) [4], a pioneering HSR method, falls into this type. As its name suggests, CNMF exploits the non-negativity of the matrix factors. Subsequent research follows the same route and exploits other problem structures, such as sparsity [5], the sum-to-one abundance condition from the linear spectral mixture model [6, 7], non-local pixel similarity [8], and many more, in an attempt to improve recovery quality.

2. Dictionary-based regression: This type of method also assumes that the spectral-spatial matrix of the SR image is the product of a spectral dictionary and the associated coefficient matrix. The difference is that it does not seek to jointly estimate the spectral dictionary and the coefficients. It first determines the spectral dictionary by some simple means, and then uses that spectral dictionary to perform regression to recover the coefficients. A typical example is HySure [9], which extracts the spectral dictionary by applying vertex component analysis (VCA) [10] to the HS image, and then recovers the coefficients by applying spatial total variation-regularized linear regression to the HS-MS image pair. Other methods include [11, 12], which exploit the local low-rank structure; this will be further discussed later. The dictionary-based regression methods are easy to implement compared to the matrix factorization methods.

Research on these two types of methods is mostly focused on the practical aspects, and it is worthwhile to note that some specific methods have recently been shown to exhibit theoretical recovery guarantees as well [13, 14]—which supports the soundness of the low-dimensional model approach via a theoretical lens.

Matrix factorization and dictionary-based regression are considered most representative in HSR methods, although there are others. For example, tensor factorization has recently been studied for HSR [15, 16, 17, 18, 19, 20]. The tensor model is also a low-dimensional model, and it exploits not only the spectral-spatial structure but also the two-dimensional spatial structure. Tensor factorization is shown to exhibit favorable sufficient conditions on exact recovery guarantees [16, 17]. In addition, deep learning for HSR has most recently gained growing interest [21, 22, 23].

Under the low-dimensional model, HSR can be seen as a problem of recovering a low-rank matrix from incomplete observations; this will be elaborated upon in Sections 3 and 4. From this perspective, the problem is nearly the same as the matrix completion problem, which has drawn widespread interest in fields such as recommender systems, machine learning and mathematical optimization [24, 25, 26]. In matrix completion we have a matrix with many missing entries, and we aim to recover the missing entries from the available ones. The main assumption in matrix completion is that the matrix to be recovered has low-rank structure. This assumption is the same as the low-dimensional model assumption in HSR. In matrix completion we see two main types of methods. One is matrix factorization, which shares the same rationale as matrix factorization for HSR. The other is low-rank matrix estimation. Unlike matrix factorization, this approach does not pre-determine the target dimension of the low-dimensional subspace, or equivalently the target rank of the matrix to be recovered. Instead of using the matrix factorization model, it seeks the minimum-rank matrix that accomplishes the task. A well-known method in low-rank matrix estimation is the nuclear norm minimization method [24]. It is a convex solution, and the idea is to approximate the hard-to-handle rank function by the nuclear norm, which is convex. Non-convex rank approximation, such as the Schatten-$p$ approximation, has also been considered for approximating the low-rank problem better [27]. Back to HSR, while we have seen numerous studies on matrix factorization and dictionary-based regression, we see far fewer on low-rank matrix estimation.

In addition, and as a different issue, the existing low-dimensional model-based HSR methods are usually not designed to account for the endmember variability (EV) effects due, for instance, to illumination conditions and intrinsic spectral variability of the materials [28]. In low-dimensional models, EV means that the spectral dictionary can vary in space. Taking a step back to HU, we have seen studies that use the matrix factorization method to deal with endmember variability [29, 30, 31]. In that regard, a possibility one can consider is to adapt such matrix factorization methods to the HSR application. We are however unaware of such development as of the writing of this paper.

In this paper, our objective is to explore the potential of low-rank matrix estimation in HSR. Our study is not a direct application of the existing low-rank matrix estimation methods, such as the nuclear norm minimization method. Our formulation takes the possibility of EV into consideration. We posit a global-local low-rank structure with the SR image, in which not only the spectral-spatial matrix of the whole SR image has low-rank structure, but that of each local area also has another low-rank structure. This assumption means that each local area can have its low-dimensional representation. The local low-rank assumption provides the model with the flexibility to account for EV. Moreover, since the low-dimensional subspaces of the local areas should be related, particularly the neighboring ones, we also assume that the whole spectral-spatial matrix has low rank and utilize it to tie the local subspaces together. As we will see, our global-local low-rank matrix estimation leads to a fairly clean formulation. In comparison, if one applies the EV-present matrix factorization methods in HU to HSR, the resulting formulation would be more complicated.

We now describe the arising challenge and our proposed solution. The global-local low-rank matrix estimation problem is a non-convex large-scale problem. For example, to recover an SR image of $100$ spectral bands and of size $200\times 200$, our problem requires us to handle $100\times 200^{2}=4{,}000{,}000$ optimization variables. An efficient optimization strategy is clearly needed. We attack the problem by leveraging recent advances in non-convex large-scale optimization, namely, the smooth Schatten-$p$ approximation [27] and an accelerated version of the majorization-minimization (MM) method [32]. As mentioned, the smooth Schatten-$p$ approximation, albeit non-convex, approximates rank better than the convex nuclear norm does. Also, its smooth nature enables us to access powerful machinery in smooth optimization. The accelerated MM method is a combination of inexact MM and the accelerated projected gradient (PG) method. Our recent research in another context [32], which also deals with large problem sizes, has suggested that this type of accelerated method runs very fast in practice. Using the above two techniques, we develop a fast algorithm called Global-Local lOw-Rank promotIng Algorithm (GLORIA). As will be shown by numerical results, GLORIA gives competitive recovery performance compared to state-of-the-art and related methods. We conducted semi-real experiments on five different datasets, and GLORIA consistently ranks first or second in performance indicators such as peak SNR and spectral angle mapper. We also provide results on synthetic and real data experiments, in which GLORIA also exhibits promising performance.
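
To make the smooth Schatten-$p$ function concrete, the sketch below evaluates $\phi_{p,\tau}({\bm{X}})=\sum_{i=1}^{M}(\sigma_{i}({\bm{X}})^{2}+\tau)^{p/2}={\rm tr}(({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{p/2})$, the form given in this page's equation list, in two ways and checks that they agree. It assumes NumPy; the function names, the default values of $p$ and $\tau$, and the toy matrix are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def smooth_schatten_p(X, p=0.5, tau=1e-3):
    """phi_{p,tau}(X) computed from the singular values of an M x L matrix X."""
    M = X.shape[0]
    sigma = np.zeros(M)
    sv = np.linalg.svd(X, compute_uv=False)
    sigma[: sv.size] = sv                        # pad with zeros when M > L
    return float(np.sum((sigma**2 + tau) ** (p / 2)))

def smooth_schatten_p_trace(X, p=0.5, tau=1e-3):
    """Same quantity via the eigenvalues of X X^T + tau I (a matrix power under a trace)."""
    M = X.shape[0]
    lam = np.linalg.eigvalsh(X @ X.T + tau * np.eye(M))
    return float(np.sum(lam ** (p / 2)))

rng = np.random.default_rng(0)
X = rng.random((10, 3)) @ rng.random((3, 50))    # toy rank-3 matrix
print(smooth_schatten_p(X), smooth_schatten_p_trace(X))

# With p = 1 and tau = 0 the function reduces to the nuclear norm.
print(smooth_schatten_p(X, p=1.0, tau=0.0), np.linalg.norm(X, "nuc"))

# Problem size quoted in the text: 100 bands, 200 x 200 pixels.
n_vars = 100 * 200**2
```

The trace form is the one that is smooth in ${\bm{X}}$ for $\tau>0$, which is what gives access to gradient-based machinery.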

Before we proceed to the description of our method, we should mention related works. First, the dictionary-based regression method in [11], HSR-LDL-EIA, also utilizes some kind of local low-rank structures. HSR-LDL-EIA considers the linear spectral mixture model and assumes that the number of endmembers in each local area is very small and no greater than the number of MS bands. With that assumption, the HSR problem can be easily solved in a local-area-by-local-area fashion—which is what HSR-LDL-EIA does. The local low-rank assumption used in HSR-LDL-EIA is not the same as the one used by us. As discussed earlier, our local low-rank assumption is to cater for EV. Second, the recent work in [12] takes insight from the local low-rank assumption in HSR-LDL-EIA and proposes a dictionary-based regression method using local nuclear norm-regularized linear regression. Again, the local low-rank assumption in [12] is founded on the argument that the number of endmembers in each local area is small. We should also mention the works [33, 34] which follow similar rationales as those in [11, 12].

Let us summarize our contributions.

1. We consider low-rank matrix estimation for HSR, which has not been studied in prior works. We propose a global-local low-rank approach which aims to account for EV effects.

2. The global-local low-rank approach requires us to tackle a large-scale non-convex optimization problem. We custom-develop an efficient algorithm for this purpose, using recent advances in large-scale non-convex optimization. As will be shown in numerical experiments, our algorithm has competitive recovery performance.

Readers who are interested in trying our algorithm can find the source codes at https://github.com/REIYANG/GLORIA.

2 Background

2.1 The Measurement Model

Let us begin by providing a concise review of the background. Fig. 1 depicts the scenario. We have a scene observed by an HS sensor and an MS sensor. The MS sensor has a lower spectral resolution than the HS sensor, while the HS sensor has a lower spatial resolution than the MS sensor. The goal of HSR is to use the observed MS and HS images to construct a higher-resolution image whose spectral resolution is identical to that of the HS sensor and whose spatial resolution is identical to that of the MS sensor. For convenience, the image we seek to construct will be called the super-resolution (SR) image. As a common assumption (see, e.g., [4]), the HS image is modeled as a spatially degraded version of the SR image by means of spatial blurring and down-sampling. Also, the MS image is modeled as a spectrally degraded version of the SR image by means of spectral bandpass averaging.

The HSR data model is as follows. Assuming that the HS and MS images are co-registered, we model the HS and MS images as

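$${\bm{Y}}_{\rm M}={\bm{F}}{\bm{X}}+{\bm{V}}_{\rm M},\qquad{\bm{Y}}_{\rm H}={\bm{X}}{\bm{G}}+{\bm{V}}_{\rm H},\tag{1}$$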

where ${\bm{Y}}_{\rm M}\in\mathbb{R}^{M_{\rm M}\times L}$ and ${\bm{Y}}_{\rm H}\in\mathbb{R}^{M\times L_{\rm H}}$ are the spectral-spatial matrices of the observed MS and HS images, respectively (resp.); $M_{\rm M}$ and $M$ are the numbers of spectral bands of the MS and HS images, resp., with $M_{\rm M}<M$; $L$ and $L_{\rm H}$ are the numbers of pixels of the MS and HS images, resp., with $L_{\rm H}<L$; ${\bm{X}}\in\mathbb{R}^{M\times L}$ is the spectral-spatial matrix of the SR image; ${\bm{F}}\in\mathbb{R}^{M_{\rm M}\times M}$ and ${\bm{G}}\in\mathbb{R}^{L\times L_{\rm H}}$ describe the measurement responses that lead to the MS and HS images, resp.; ${\bm{V}}_{\rm M}$ and ${\bm{V}}_{\rm H}$ are noise. Note that ${\bm{F}}$ designates the relative spectral bandpass responses from the SR image to the MS image, while ${\bm{G}}$ specifies the spatial blurring and down-sampling responses that result in the HS image. The measurement response matrices ${\bm{F}}$ and ${\bm{G}}$ are assumed to be known, and in practice ${\bm{F}}$ and ${\bm{G}}$ can be acquired either by calibration [4] or by estimation from the HS and MS images [35, 9]. Furthermore, the MS and HS images are measured by means of reflectance, with values lying between $0$ and $1$. As such, we may assume that $x_{ij}\in[0,1]$, for all $i,j$.
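
As a minimal sketch of this measurement model, the snippet below simulates an MS-HS pair from a toy SR matrix, assuming NumPy. The specific choices are assumptions for illustration only: ${\bm{F}}$ averages contiguous groups of HS bands, ${\bm{G}}$ averages non-overlapping $d\times d$ pixel blocks (a box blur followed by down-sampling), pixels are stored in row-major order, and the noise is i.i.d. Gaussian. Real sensor responses and blur kernels would differ.

```python
import numpy as np

rng = np.random.default_rng(0)
M, M_M = 8, 2                 # number of HS bands and MS bands (toy sizes)
H = W = 8                     # SR image has H x W pixels
d = 4                         # spatial down-sampling factor along each axis
L = H * W                     # number of SR (and MS) pixels
L_H = (H // d) * (W // d)     # number of HS pixels

X = rng.random((M, L))        # SR spectral-spatial matrix, reflectance in [0, 1]

# F (M_M x M): each MS band is the average of M // M_M consecutive HS bands.
F = np.kron(np.eye(M_M), np.full((1, M // M_M), 1.0 / (M // M_M)))

# G (L x L_H): each HS pixel is the average of one d x d block of SR pixels.
G = np.zeros((L, L_H))
for r in range(H):
    for c in range(W):
        G[r * W + c, (r // d) * (W // d) + (c // d)] = 1.0 / d**2

sigma = 0.01                  # assumed noise standard deviation
Y_M = F @ X + sigma * rng.standard_normal((M_M, L))    # spectrally degraded image
Y_H = X @ G + sigma * rng.standard_normal((M, L_H))    # spatially degraded image
print(Y_M.shape, Y_H.shape)
```

Note that ${\bm{F}}$ acts on the left (across bands) and ${\bm{G}}$ on the right (across pixels), which is why the two observations are incomplete in complementary ways.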

2.2 The Matrix Factorization Model

Next, we describe the matrix factorization model which is the core assumption for matrix factorization and dictionary-based regression methods in HSR. In the matrix factorization model we assume that the SR image ${\bm{X}}$ can be factored as

$${\bm{X}}={\bm{A}}{\bm{S}},\qquad(2)$$

where ${\bm{A}}\in\mathbb{R}^{M\times N}$ is the spectral dictionary; ${\bm{S}}\in\mathbb{R}^{N\times L}$ is the coefficient matrix; $N$ is the target rank or model order, which is pre-fixed and is often chosen to be much less than $M$ and $L$ . In many existing studies, the model (2) is seen as the linear spectral mixture model in which the columns ${\bm{a}}_{i}$ ’s of ${\bm{A}}$ are interpreted as spectral signatures of the endmembers of the scene, and the columns ${\bm{s}}_{i}$ ’s of ${\bm{S}}$ the associated abundances of the pixels. Under the model (2), the matrix factorization methods seek to find $({\bm{A}},{\bm{S}})$ from $({\bm{Y}}_{\rm M},{\bm{Y}}_{\rm H})$ by minimizing a data-fitting loss function, and thereby reconstruct ${\bm{X}}$ by ${\bm{X}}={\bm{A}}{\bm{S}}$ . For example, in CNMF [4], the idea is to solve

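$$\min_{{\bm{A}}\geq{\bm{0}},\,{\bm{S}}\geq{\bm{0}}}~\|{\bm{Y}}_{\rm M}-{\bm{F}}{\bm{A}}{\bm{S}}\|_{F}^{2}+\|{\bm{Y}}_{\rm H}-{\bm{A}}{\bm{S}}{\bm{G}}\|_{F}^{2},\qquad(3)$$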

where ${\bm{X}}\geq{\bm{0}}$ means that ${\bm{X}}$ is element-wise non-negative; $\|\cdot\|_{F}$ denotes the Frobenius norm. CNMF, as well as other matrix factorization formulations, considers structured factors to better exploit the underlying problem structure. CNMF utilizes the fact that, under the linear spectral mixture model, ${\bm{A}}$ and ${\bm{S}}$ are non-negative. Moreover, in the dictionary-based regression methods, one first determines ${\bm{A}}$ from ${\bm{Y}}_{\rm H}$ by methods such as principal component analysis (PCA) or VCA. Then, ${\bm{S}}$ is estimated by solving the data-fitting problem like the one in (3), but with ${\bm{A}}$ fixed. In estimating ${\bm{S}}$ , regularization such as total variation may be added for problem structure exploitation [9]. A common, and key, concept behind the various matrix factorization and dictionary-based regression methods is that although ${\bm{X}}$ is a high-dimensional matrix, we hold the belief that every spectral pixel ${\bm{x}}_{i}$ of ${\bm{X}}$ should lie in a low-dimensional subspace spanned by ${\bm{a}}_{1},\ldots,{\bm{a}}_{N}$ .

As an alternative view, matrix factorization and dictionary-based regression may be regarded as methods for estimating a low-rank matrix ${\bm{X}}$ from the incomplete HS-MS observations. Specifically, the low-dimensional subspace assumption with ${\bm{X}}$ implies ${\rm rank}({\bm{X}})={\rm rank}({\bm{A}}{\bm{S}})\leq N$ .

3 Global-Local Low-Rank Formulation

This section describes the main development of this paper, global-local low-rank matrix estimation.

3.1 A Brief Review of Low-Rank Matrix Estimation

The low-rank matrix estimation methods have recently become popular in the context of matrix completion [24, 26]. Let us first describe how the de facto standard in low-rank matrix estimation, namely, the nuclear norm approximation, can be applied to HSR. In low-rank matrix estimation, we assume ${\bm{X}}$ to be a low-rank matrix. This assumption can be interpreted as requiring the columns ${\bm{x}}_{1},\ldots,{\bm{x}}_{L}$ to lie in a low-dimensional subspace. The matrix factorization model (2) can also be seen as constraining ${\bm{x}}_{1},\ldots,{\bm{x}}_{L}$ to lie in a low-dimensional subspace, with the subspace dimension no greater than $N$ . Hence, both the low-rank matrix estimation and matrix factorization methods exploit low-dimensional data structures. The idea with low-rank matrix estimation is to find a low-rank ${\bm{X}}$ whose data fitting loss is small. A common low-rank matrix estimation formulation is as follows

$$\min_{{\bm{X}}\in\mathbb{R}^{M\times L}}~\ell({\bm{X}})+\gamma\,{\rm rank}({\bm{X}}),\qquad(4)$$

where $\gamma>0$ is given and is called a regularization parameter; $\ell:\mathbb{R}^{M\times L}\rightarrow\mathbb{R}$ denotes the data-fitting loss function and is given by

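$$\ell({\bm{X}})=\frac{1}{2}\|{\bm{Y}}_{\rm M}-{\bm{F}}{\bm{X}}\|_{F}^{2}+\frac{1}{2}\|{\bm{Y}}_{\rm H}-{\bm{X}}{\bm{G}}\|_{F}^{2}.\qquad(5)$$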

Let us give a brief comparison of low-rank matrix estimation and matrix factorization. Matrix factorization methods, such as the one in (3), require one to pre-determine the target rank $N$ . There is no such rank constraint in low-rank matrix estimation. The rank of ${\bm{X}}$ itself serves as the regularization for the low-rank matrix recovery endeavor, and the parameter $\gamma$ controls the balance between low rankness and goodness of data fitting.

The challenge with solving problem (4) is that the rank function in (4) is hard to handle; it is non-convex and non-differentiable. The state of the art handles this issue by applying the nuclear norm approximation. The reader is referred to the literature [26] for a detailed description of the concept, and here we concisely explain the idea. The rank of ${\bm{X}}$ is identical to the number of nonzero singular values of ${\bm{X}}$ , and hence we can express ${\rm rank}({\bm{X}})$ as

$${\rm rank}({\bm{X}})=\sum_{i=1}^{\min\{M,L\}}u(\sigma_{i}({\bm{X}})),\qquad(6)$$

where $\sigma_{i}({\bm{X}})\geq 0$ denotes the $i$ th largest singular value of ${\bm{X}}$ ; $u(y)=1$ if $y>0$ , and $u(y)=0$ if $y=0$ . The idea with nuclear norm approximation is to approximate ${\rm rank}({\bm{X}})$ by removing $u$ from (6), which leads to the following approximate function:

$$\|{\bm{X}}\|_{*}=\sum_{i=1}^{\min\{M,L\}}\sigma_{i}({\bm{X}}).$$

The above function is called the nuclear norm and is known to be convex [24, 26]. Applying this nuclear norm approximation of rank to problem (4) gives rise to the following problem:

$$\min_{{\bm{X}}\in\mathbb{R}^{M\times L}}~\ell({\bm{X}})+\gamma\|{\bm{X}}\|_{*}.\qquad(7)$$

The advantage of the above approximation is that it is a convex problem. Moreover, problem (7) can be efficiently solved by methods such as the accelerated proximal gradient method [36] and ADMM [37].
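The relation between rank and nuclear norm can be checked numerically. The short sketch below builds a rank-2 matrix, counts its nonzero singular values, and sums them; the matrix sizes and the numerical tolerance are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 6 x 10 matrix of rank 2, built as a product of thin factors.
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 10))

sigma = np.linalg.svd(X, compute_uv=False)   # singular values, largest first
tol = 1e-10 * sigma[0]                       # numerical threshold for "nonzero"

rank_X = int(np.sum(sigma > tol))            # number of nonzero singular values
nuclear_norm = float(np.sum(sigma))          # sum of singular values
```

Dropping the indicator $u$ replaces the count by the sum, which is what makes the surrogate convex while still penalizing every nonzero singular value.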

3.2 The Global-Local Low-Rank Model

We consider a global-local low-rank assumption for the HSR problem. To explain the idea, consider the illustration in Fig. 2. We segment the SR image into a number of local patches. Our belief is that each local patch exhibits its own local rank structure. In other words, the low-dimensional subspace of one patch does not need to be the same as that of another. This assumption appears to make sense since real images may have local variation effects due to EV. Moreover, we still keep the global low-rank assumption on ${\bm{X}}$ . This is because the low-dimensional subspace of one patch should be related to those of its neighboring patches, and such correlations may result in a low-rank ${\bm{X}}$ in the global sense (with a higher global rank than the local ranks). Alternatively speaking, the low-dimensional subspace of the whole ${\bm{X}}$ plays the role of tying together the low-dimensional subspaces of the local patches.

To write down the global-local low-rank assumption, we assume that the pixel indices of the image are arranged such that ${\bm{X}}$ can be conveniently expressed as

$${\bm{X}}=[~{\bm{X}}_{1},{\bm{X}}_{2},\ldots,{\bm{X}}_{P}~],$$

where each ${\bm{X}}_{i}\in\mathbb{R}^{M\times L_{i}}$ is the spectral-spatial matrix of a local area, or local patch, of the image; $P$ is the number of patches; $L_{i}$ is the number of pixels of patch $i$ . For example, as illustrated in Fig. 2, we can divide the image into equal-space rectangular blocks as our local patches. Other ways to form the local patches, e.g., via segmentation [11], may also be considered. We assume that every ${\bm{X}}_{i}$ is a low-rank matrix, and ${\bm{X}}$ is also a low-rank matrix.

The global-local low-rank assumption stated above is fairly general and does not restrict itself to specific mixture models such as the linear spectral mixture model. On the other hand, we can better understand the assumption by a more concrete example, in which the linear spectral mixture model is used, as follows. Suppose we model each ${\bm{X}}_{i}$ to follow the linear spectral mixture model

$${\bm{X}}_{i}={\bm{A}}_{i}{\bm{S}}_{i},\quad i=1,\ldots,P,\qquad(8)$$

where ${\bm{A}}_{i}\in\mathbb{R}^{M\times N}$ and ${\bm{S}}_{i}\in\mathbb{R}^{N\times L_{i}}$ are the endmember and abundance matrices of patch $i$ ; $N$ is the total number of endmembers in the whole SR image ${\bm{X}}$ . In this model, we assume the presence of EV by allowing the endmember matrix ${\bm{A}}_{i}$ to be different for each patch. Note that our model assumes that EV appears at the patch level, not at the pixel level, and this can be justified if the local region of each patch is small enough such that the endmember spectral signatures experience little or no variation within the patch. We also want to impose an assumption that ${\bm{A}}_{1},\ldots,{\bm{A}}_{P}$ are correlated, since, in reality, they should be variations of one another. Such correlation would mean that ${\bm{A}}_{1},\ldots,{\bm{A}}_{P}$ can be linearly represented by a “global” basis ${\bm{B}}$ , whose dimension is no less than $N$ , but not significantly greater than $N$ owing to the correlations. This further means that ${\bm{X}}$ lives in a low-dimensional subspace with ${\bm{B}}$ as its basis. Our global low-rank assumption is to exploit the global low-dimensional structure.

Additionally, in the above motivating model example, at first sight one would be tempted to say that the rank of ${\bm{X}}_{i}$ is universally given by ${\rm rank}({\bm{X}}_{i})=N$ (under the slightly technical premise that ${\bm{A}}_{i}$ and ${\bm{S}}_{i}$ have full column rank and full row rank, resp.). In fact, it is reasonable to assume non-identical ${\rm rank}({\bm{X}}_{i})$ . In practice, it is likely that each local region is composed of a small number of endmembers, rather than all of the endmembers. Thus we can assume that among all the rows ${\bm{s}}_{i}^{1},\ldots,{\bm{s}}_{i}^{N}$ of ${\bm{S}}_{i}$ , only $N_{i}$ of them are nonzero (or active). Consequently we have ${\rm rank}({\bm{X}}_{i})=N_{i}$ (again, under the technical premise that ${\bm{A}}_{i}$ has full column rank and that the nonzero ${\bm{s}}_{i}^{j}$ ’s are linearly independent), which is our local low-rank assumption.

3.3 An Experiment

To support our argument that the global-local low-rank structure would be a reasonable assumption, we perform the following numerical experiment. We take real HS images and numerically evaluate their global and local rank values. The images come from six different datasets, namely, Chikusei, Cuprite, Indian Pines, University of Pavia, Washington DC Mall and Moffett Field. They are shown in Fig. 3 in color composite forms. For each image, we obtain the local patches ${\bm{X}}_{i}$ ’s by the equal-space rectangular segmentation in Fig. 2. For each local patch ${\bm{X}}_{i}$ we evaluate its rank in an approximate manner, specifically, by finding the smallest integer $r$ such that

[TABLE]

Table 1 shows the approximate ranks of the tested images under different patch sizes. One can clearly see that all the tested images exhibit global-local low-rank characteristics. For example, for the Chikusei image, the global rank is $10$ while the average local rank for $P=16^{2}$ is around $6.5$ .
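A rank evaluation of this kind can be sketched as follows. The criterion used here, namely the smallest $r$ whose leading singular values capture a fraction `energy` of $\|{\bm{X}}_{i}\|_{F}^{2}$, and its default value are assumptions made for illustration only; they are a common convention and are not claimed to be the exact rule behind Table 1.

```python
import numpy as np

def approx_rank(X, energy=0.999):
    """Smallest r whose top-r singular values hold `energy` of ||X||_F^2.
    Both the criterion and the 0.999 default are illustrative assumptions."""
    s2 = np.linalg.svd(X, compute_uv=False) ** 2
    cum = np.cumsum(s2) / np.sum(s2)
    return min(int(np.searchsorted(cum, energy)) + 1, len(s2))

def local_ranks(X, patch_cols, energy=0.999):
    """Approximate rank of each column block X_i of X = [X_1, ..., X_P]."""
    return [approx_rank(X[:, cols], energy) for cols in patch_cols]
```

Calling `approx_rank(X)` on the whole matrix gives the global value, and `local_ranks(X, patch_cols)` gives one value per patch, where `patch_cols` is a list of column-index arrays.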

3.4 The Global-Local Low-Rank Matrix Estimation Formulation

Under the global-local low-rank assumption in the preceding subsection, it is natural to formulate the HSR problem as the following global-local low-rank matrix estimation problem:

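$$\min_{{\bm{X}}\in\mathcal{X}}~\ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,{\rm rank}({\bm{X}}_{i}),\qquad(9)$$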

where $\gamma_{0},\gamma_{1},\cdots,\gamma_{P}>0$ are given regularization parameters; we denote ${\bm{X}}_{0}={\bm{X}}$ for notational convenience; $\ell$ has been defined in (5); $\mathcal{X}\subseteq\mathbb{R}^{M\times L}$ is given by $\mathcal{X}=[0,1]^{M\times L}$ .

As reviewed previously, the standard approach to handle problem (9) is to approximate each ${\rm rank}({\bm{X}}_{i})$ by the nuclear norm $\|{\bm{X}}_{i}\|_{*}$ . Here we pursue a different option, namely, the smooth Schatten- $p$ approximation [27]. To put it into context, let us first define a notation. Given a symmetric $n\times n$ positive definite matrix ${\bm{A}}$ and a number $p$ , we define

$${\bm{A}}^{p}={\bm{U}}{\bm{\Lambda}}^{p}{\bm{U}}^{T},\qquad(10)$$

where ${\bm{U}}$ and ${\bm{\Lambda}}$ constitute the eigen-decomposition ${\bm{A}}={\bm{U}}{\bm{\Lambda}}{\bm{U}}^{T}$ ; note that ${\bm{U}}$ is orthogonal, ${\bm{\Lambda}}=\mathrm{Diag}(\lambda_{1},\ldots,\lambda_{n})$ with $\lambda_{1}\geq\ldots\geq\lambda_{n}>0$ , and ${\bm{\Lambda}}^{p}=\mathrm{Diag}(\lambda_{1}^{p},\ldots,\lambda_{n}^{p})$ . The smooth Schatten- $p$ function of an $M\times L$ matrix ${\bm{X}}$ , with $M\leq L$ , is defined as

$$\phi_{p,\tau}({\bm{X}})={\rm Tr}\big(({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{p/2}\big),$$

where $p>0,\tau>0$ are given. This function has the following properties:

(i) $\phi_{p,\tau}$ is smooth (or has derivatives of all orders);

(ii) $\phi_{p,\tau}$ is convex for $p\geq 1$ , and non-convex for $p<1$ ;

(iii) as $\tau\rightarrow 0$ , $\phi_{1,\tau}({\bm{X}})\rightarrow\|{\bm{X}}\|_{*}$ ;

(iv) as $p\rightarrow 0,\tau\rightarrow 0$ , $\phi_{p,\tau}({\bm{X}})\rightarrow{\rm rank}({\bm{X}})$ .

As can be seen in the above properties, the smooth Schatten- $p$ function is a smooth approximation of rank. Ideally we would like to choose very small $p$ and $\tau$ so that $\phi_{p,\tau}({\bm{X}})$ closely approximates ${\rm rank}({\bm{X}})$ , but using very small $p$ and $\tau$ will also make $\phi_{p,\tau}$ poorly behaved (e.g., very large Lipschitz constant of the gradient of $\phi_{p,\tau}$ ).
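As a minimal numerical illustration, the sketch below evaluates the smooth Schatten- $p$ function through the eigenvalues of ${\bm{X}}{\bm{X}}^{T}$. It assumes the form $\phi_{p,\tau}({\bm{X}})={\rm Tr}(({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{p/2})$ from [27]; the function name is ours.

```python
import numpy as np

def smooth_schatten_p(X, p, tau):
    """phi_{p,tau}(X) = Tr((X X^T + tau I)^{p/2}), assumed form from [27]."""
    lam = np.linalg.eigvalsh(X @ X.T)    # eigenvalues of the M x M Gram matrix
    lam = np.maximum(lam, 0.0)           # guard against tiny negative round-off
    return float(np.sum((lam + tau) ** (p / 2.0)))
```

With $p=1$ and a very small $\tau$ the value is close to the nuclear norm, in line with property (iii). Each zero singular value still contributes $\tau^{p/2}$, which is one way to see why very small $p$ and $\tau$ make the function badly conditioned.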

By replacing ${\rm rank}({\bm{X}}_{i})$ in problem (9) with the smooth Schatten- $p$ function, we obtain the Schatten- $p$ approximation of the global-local low-rank matrix estimation formulation (9) as follows:

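$$\min_{{\bm{X}}\in\mathcal{X}}~\ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,\phi_{p,\tau}({\bm{X}}_{i}),\qquad(11)$$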

where $p$ and $\tau$ are given. We will be interested in the case of $p<1$ . The corresponding problem (11) is non-convex, but we found that, empirically, using $p<1$ results in better recovery performance than using $p=1$ (the smooth nuclear norm case).

4 Global-Local Low-Rank Algorithm

In this section, we develop an algorithm for the smooth Schatten- $p$ formulation (11) of global-local low-rank matrix estimation. Problem (11) is a large-scale optimization problem with $ML$ optimization variables, and computational efficiency is a main concern in algorithm design. The algorithm to be presented is custom-designed for problem (11), where we exploit the problem structure for computational efficiency. The main optimization concepts used in our algorithm design are majorization-minimization (MM) and the accelerated projected gradient method.

4.1 Majorization-Minimization

Firstly, we describe the MM method. For notational convenience, let

$$f({\bm{X}})=\ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,\phi_{p,\tau}({\bm{X}}_{i}).\qquad(12)$$

In MM we seek a function $g({\bm{X}};\bar{{\bm{X}}})$ , called a majorant of $f$ , that satisfies

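$$g({\bm{X}};\bar{{\bm{X}}})\geq f({\bm{X}})~~\text{for all }{\bm{X}},\bar{{\bm{X}}}\in\mathcal{X},\qquad g(\bar{{\bm{X}}};\bar{{\bm{X}}})=f(\bar{{\bm{X}}})~~\text{for all }\bar{{\bm{X}}}\in\mathcal{X}.$$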

We also require that, for any given $\bar{{\bm{X}}}\in\mathcal{X}$ , $g(\cdot;\bar{{\bm{X}}})$ is convex and continuously differentiable. Given a starting point ${\bm{X}}^{0}\in\mathcal{X}$ , the MM method handles problem (11) by iteratively solving

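$${\bm{X}}^{k+1}=\arg\min_{{\bm{X}}\in\mathcal{X}}~g({\bm{X}};{\bm{X}}^{k}),\quad k=0,1,2,\ldots\qquad(13)$$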

where the problem at each iteration in (13) is a convex problem. The MM iteration (13) is known to guarantee convergence to a stationary solution to problem (11) [38]. We identify a majorant for problem (11) by resorting to the following result.

Fact 1

[27, Sec. 3.1 and Appendix A] For $0<p\leq 1$ , the smooth Schatten- $p$ function admits an alternative characterization

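$$\phi_{p,\tau}({\bm{X}})=\min_{{\bm{W}}\in\mathbb{S}_{++}^{M}}\psi_{p,\tau}({\bm{X}},{\bm{W}}),\qquad(14)$$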

where $\mathbb{S}_{++}^{M}$ denotes the set of all $M\times M$ symmetric and positive definite matrices;

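$$\psi_{p,\tau}({\bm{X}},{\bm{W}})=\frac{p}{2}{\rm Tr}\big({\bm{W}}({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})\big)+\frac{2-p}{2}{\rm Tr}\big({\bm{W}}^{\frac{p}{p-2}}\big).$$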

Also, the minimum in (14) is uniquely attained at

$${\bm{W}}^{\star}=({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{\frac{p-2}{2}}.$$

By applying Fact 1 to (12), and noting $\psi_{p,\tau}({\bm{X}},{\bm{W}})\geq\psi_{p,\tau}({\bm{X}},{\bm{W}}^{\star})=\phi_{p,\tau}({\bm{X}})$ for any ${\bm{W}}\in\mathbb{S}_{++}^{M}$ , we obtain the following majorant

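$$g({\bm{X}};{\bm{X}}^{k})=\ell({\bm{X}})+\sum_{i=0}^{P}\gamma_{i}\,\psi_{p,\tau}({\bm{X}}_{i},{\bm{W}}_{i}^{k}),$$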

where

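$${\bm{W}}_{i}^{k}=\big({\bm{X}}_{i}^{k}({\bm{X}}_{i}^{k})^{T}+\tau{\bm{I}}\big)^{\frac{p-2}{2}},\quad i=0,1,\ldots,P.\qquad(15)$$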

Note that computing ${\bm{W}}_{i}^{k}$ requires computing the eigendecomposition of ${\bm{X}}_{i}^{k}({\bm{X}}_{i}^{k})^{T}+\tau{\bm{I}}$ , which takes ${\mathcal{O}}(M^{3})$ floating point operations; cf. (10). It is easy to show that this $g$ is convex and continuously differentiable. Also, solving the MM iteration $\min_{{\bm{X}}\in\mathcal{X}}g({\bm{X}};{\bm{X}}^{k})$ in (13) is the same as solving

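$$\min_{{\bm{X}}\in\mathcal{X}}~\ell({\bm{X}})+\frac{p}{2}\sum_{i=0}^{P}\gamma_{i}\,{\rm Tr}\big({\bm{X}}_{i}^{T}{\bm{W}}_{i}^{k}{\bm{X}}_{i}\big).\qquad(16)$$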

The above problem is a quadratically regularized least-squares problem, with an iteratively reweighted quadratic regularizer. Hence, the MM method developed above can be interpreted as an iteratively regularizer-reweighted least-squares method. In fact, if we remove the bound constraints ${\bm{X}}\in\mathcal{X}$ we can solve problem (16) in closed form. However, we would like to keep the bound constraints, and this leads to the development in the next subsection.
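The variational characterization in Fact 1 can be verified numerically. The sketch below assumes the standard surrogate from [27], $\psi_{p,\tau}({\bm{X}},{\bm{W}})=\frac{p}{2}{\rm Tr}({\bm{W}}({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}}))+\frac{2-p}{2}{\rm Tr}({\bm{W}}^{p/(p-2)})$, and checks that the weight ${\bm{W}}^{\star}=({\bm{X}}{\bm{X}}^{T}+\tau{\bm{I}})^{(p-2)/2}$ attains $\phi_{p,\tau}({\bm{X}})$. All function names are ours.

```python
import numpy as np

def sym_power(A, q):
    """A^q for a symmetric positive definite A via eigen-decomposition, cf. (10)."""
    lam, U = np.linalg.eigh(A)
    return (U * lam ** q) @ U.T          # U diag(lam^q) U^T

def psi(X, W, p, tau):
    """Assumed surrogate psi_{p,tau}(X, W); it upper-bounds phi_{p,tau}(X)."""
    B = X @ X.T + tau * np.eye(X.shape[0])
    return (0.5 * p * np.trace(W @ B)
            + 0.5 * (2 - p) * np.trace(sym_power(W, p / (p - 2))))

def mm_weight(X, p, tau):
    """Minimizer of psi(X, .): W* = (X X^T + tau I)^{(p-2)/2}."""
    return sym_power(X @ X.T + tau * np.eye(X.shape[0]), (p - 2) / 2)
```

Since the exponent $(p-2)/2$ is negative, directions in which a patch has large energy receive small weights and are penalized less at the next MM iteration, which is the reweighting effect mentioned above.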

4.2 Accelerated Projected Gradient for Solving the MM Iteration

Secondly, we describe an iterative method for solving the MM iteration in (16). We employ the accelerated projected gradient (APG) method [39, 40, 41], which is a fast first-order algorithm. The APG method for solving problem (16) is as follows. Let

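$$g_{k}({\bm{X}})=\ell({\bm{X}})+\frac{p}{2}\sum_{i=0}^{P}\gamma_{i}\,{\rm Tr}\big({\bm{X}}_{i}^{T}{\bm{W}}_{i}^{k}{\bm{X}}_{i}\big)\qquad(17)$$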

for convenience. Given a starting point ${\bm{X}}^{0}\in\mathcal{X}$ , we perform the recursion

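$${\bm{X}}^{j+1}=\Pi_{\mathcal{X}}\Big({\bm{Z}}^{j}-\frac{1}{\beta_{j}}\nabla g_{k}({\bm{Z}}^{j})\Big),\quad j=0,1,2,\ldots\qquad(18)$$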

Here, $1/\beta_{j}>0$ is the step size; $\nabla g_{k}({\bm{X}})$ denotes the gradient of $g_{k}$ at ${\bm{X}}$ ; $\Pi_{\mathcal{X}}({\bm{Y}})=\arg\min_{{\bm{X}}\in\mathcal{X}}\|{\bm{Y}}-{\bm{X}}\|_{F}^{2}$ denotes the projection onto $\mathcal{X}$ ; ${\bm{Z}}^{j}$ is the extrapolated point of the ${\bm{X}}^{j}$ ’s and is given by

$${\bm{Z}}^{j}={\bm{X}}^{j}+\alpha_{j}({\bm{X}}^{j}-{\bm{X}}^{j-1}),$$

where ${\bm{X}}^{-1}={\bm{X}}^{0}$ ; the sequence $\{\alpha_{j}\}$ , called the extrapolation sequence, is given by

$$\alpha_{j}=\frac{\xi_{j-1}-1}{\xi_{j}},\qquad\xi_{j}=\frac{1+\sqrt{1+4\xi_{j-1}^{2}}}{2},\qquad(19)$$

with $\xi_{-1}=0$ . The APG iteration (18) is efficient to implement. The gradient $\nabla g_{k}({\bm{X}})$ is given by

[TABLE]

Also, we have

[TABLE]

It can be verified that, given ${\bm{W}}_{0}^{k},\ldots,{\bm{W}}_{P}^{k}$ , computing (20)–(21) takes ${\mathcal{O}}(M(ML+{\rm nnz}({\bm{G}})))$ floating-point operations, where ${\rm nnz}({\bm{G}})$ denotes the number of nonzero elements of ${\bm{G}}$ . We should note that a large number of elements of ${\bm{G}}$ are zero. Recall that ${\bm{G}}$ describes the spatial degradation process of local spatial blurring and downsampling. One can show that the number of nonzero elements of each column of ${\bm{G}}$ depends on the size of the blurring kernel, and in practice the blurring kernel size is often small. Also, the projection operation $\Pi_{\mathcal{X}}$ for the bound set $\mathcal{X}=[0,1]^{M\times L}$ is merely a clipping function, i.e.,

$$\Pi_{\mathcal{X}}({\bm{X}})=\min\{{\bm{1}},\max\{{\bm{0}},{\bm{X}}\}\},\qquad(22)$$

where ${\bm{0}}$ and ${\bm{1}}$ denote all-zero and all-one matrices, resp., and $\min$ and $\max$ are taken in the element-wise manner.
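The APG recursion can be sketched in a few lines. The extrapolation weights below assume the standard FISTA-type rule $\xi_{j}=(1+\sqrt{1+4\xi_{j-1}^{2}})/2$, $\alpha_{j}=(\xi_{j-1}-1)/\xi_{j}$ with $\xi_{-1}=0$; the gradient is passed in as a callable so the sketch does not depend on a particular form of $g_{k}$. Function names are ours.

```python
import math
import numpy as np

def extrapolation_weights(n):
    """alpha_0, ..., alpha_{n-1} under the assumed FISTA-type rule."""
    alphas, xi_prev = [], 0.0                      # xi_{-1} = 0
    for _ in range(n):
        xi = (1.0 + math.sqrt(1.0 + 4.0 * xi_prev ** 2)) / 2.0
        alphas.append((xi_prev - 1.0) / xi)
        xi_prev = xi
    return alphas

def project_box(X):
    """Projection onto [0, 1]^{M x L}: element-wise min{1, max{0, X}}."""
    return np.minimum(1.0, np.maximum(0.0, X))

def apg(grad, beta, X0, n_iter):
    """Run n_iter accelerated projected-gradient steps with step size 1/beta."""
    X_prev, X = X0.copy(), X0.copy()               # X^{-1} = X^0
    for alpha in extrapolation_weights(n_iter):
        Z = X + alpha * (X - X_prev)               # extrapolated point Z^j
        X_prev, X = X, project_box(Z - grad(Z) / beta)
    return X
```

Under this rule $\alpha_{0}=-1$, but it multiplies ${\bm{X}}^{0}-{\bm{X}}^{-1}={\bm{0}}$ and so has no effect; the weights then grow from $0$ toward $1$.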

We complete the APG development by specifying our step-size rule. The APG method guarantees convergence to the optimal solution if $\beta_{k}$ is chosen to be a Lipschitz constant of $\nabla g_{k}$ [40, 41]. We have the following result.

Fact 2

A Lipschitz constant of $\nabla g_{k}$ in (17) is

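$$L_{g_{k}}=\lambda_{\rm max}({\bm{F}}^{T}{\bm{F}}+p\gamma_{0}{\bm{W}}_{0}^{k})+\lambda_{\rm max}({\bm{G}}{\bm{G}}^{T})+p\max_{i=1,\ldots,P}\gamma_{i}\lambda_{\rm max}({\bm{W}}_{i}^{k}),\qquad(23)$$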

where $\lambda_{\rm max}(\cdot)$ denotes the largest eigenvalue of the argument.

The proof of Fact 2 is shown in Appendix A. The computational cost of (23) lies mainly in computing the largest eigenvalues of $M\times M$ symmetric matrices, and there are $P+2$ such eigenvalues to compute. The eigenvalue $\lambda_{\rm max}({\bm{G}}{\bm{G}}^{T})$ can be computed offline, and the eigenvalues $\lambda_{\rm max}({\bm{W}}_{i}^{k})$ are byproducts of computing ${\bm{W}}_{i}^{k}$ in (15) (again, cf. (10)). It suffices to calculate $\lambda_{\max}({\bm{F}}^{T}{\bm{F}}+p\gamma_{0}{\bm{W}}_{0}^{k})$ at every MM iteration. Hence, computing (23) takes ${\mathcal{O}}(M^{3})$ floating-point operations.
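The bound in Fact 2 can be checked numerically. The sketch below assumes $g_{k}({\bm{X}})=\ell({\bm{X}})+\frac{p}{2}\sum_{i}\gamma_{i}{\rm Tr}({\bm{X}}_{i}^{T}{\bm{W}}_{i}^{k}{\bm{X}}_{i})$ with $\ell({\bm{X}})=\frac{1}{2}\|{\bm{Y}}_{\rm M}-{\bm{F}}{\bm{X}}\|_{F}^{2}+\frac{1}{2}\|{\bm{Y}}_{\rm H}-{\bm{X}}{\bm{G}}\|_{F}^{2}$, and represents patches as column-index arrays; the function names and argument layout are hypothetical.

```python
import numpy as np

def grad_gk(X, Y_M, Y_H, F, G, W, gammas, patches, p):
    """Gradient of the assumed g_k. W[0] pairs with the whole X;
    W[i], i >= 1, pairs with the column block patches[i-1]."""
    grad = F.T @ (F @ X - Y_M) + (X @ G - Y_H) @ G.T
    grad += p * gammas[0] * (W[0] @ X)
    for i, cols in enumerate(patches, start=1):
        grad[:, cols] += p * gammas[i] * (W[i] @ X[:, cols])
    return grad

def lipschitz_gk(F, G, W, gammas, p):
    """lmax(F^T F + p g_0 W_0) + lmax(G G^T) + p max_i g_i lmax(W_i)."""
    def lmax(A):
        return np.linalg.eigvalsh(A)[-1]
    # G^T G has the same largest eigenvalue as G G^T but is only L_H x L_H.
    local = max(gammas[i] * lmax(W[i]) for i in range(1, len(W)))
    return lmax(F.T @ F + p * gammas[0] * W[0]) + lmax(G.T @ G) + p * local
```

For any two points the gradient difference should then be no larger than `lipschitz_gk(...)` times the Frobenius distance between them, which is what the step-size rule relies on.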

4.3 Algorithm Speedup via Inexact MM

Let us summarize the MM method developed in the last two subsections: We perform the MM iteration (13). Every iteration requires solving a regularized least-squares (with bound constraints) exactly, and we do that by applying the APG method in (18)–(22). The setback with this exact MM approach is that while the APG method is considered fast, it still takes time to solve each MM problem exactly.

The algorithm we finally adopt is an inexact MM method. At each MM iteration, we apply the APG method with one iteration only. Specifically, we replace the exact MM iteration (13) by

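$${\bm{X}}^{k+1}=\Pi_{\mathcal{X}}\Big({\bm{Z}}^{k}-\frac{1}{\beta_{k}}\nabla g_{k}({\bm{Z}}^{k})\Big),\qquad{\bm{Z}}^{k}={\bm{X}}^{k}+\alpha_{k}({\bm{X}}^{k}-{\bm{X}}^{k-1}),\qquad(24)$$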

where $\alpha_{k}$ is given by (19); $\beta_{k}$ is chosen as $\beta_{k}=L_{g_{k}}$ , with $L_{g_{k}}$ given by (23), so that the step size is $1/L_{g_{k}}$ . By using this inexact MM update we hope that the total number of iterations (i.e., the sum of APG iterations incurred by all the MM iterations) may be reduced, and the runtime improved. Based on our numerical experience, the inexact MM method runs much faster than the exact MM. Also, it is shown in [32] that, under some technical assumptions, the above inexact MM method guarantees convergence to a stationary solution.

We summarize the inexact MM algorithm in pseudo-form in Algorithm 1. We call this algorithm Global-Local lOw-Rank promotIng Algorithm (GLORIA). It can be verified that the complexity of GLORIA is ${\mathcal{O}}(M(ML+PM^{2}+{\rm nnz}({\bm{G}})))$ per iteration.
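To give a feel for the overall loop, the following Python/NumPy sketch strings the pieces together: reweighting, the Lipschitz step size, extrapolation, and one projected gradient step per MM iteration. It is an illustrative sketch and not the MATLAB implementation used in our experiments. It assumes identical $\gamma_{i}=\gamma$, the FISTA-type extrapolation weights, the forms of $\ell$ and ${\bm{W}}_{i}^{k}$ given in the comments, a fixed iteration count in place of the stopping rule, and at least one local patch; `gloria_sketch` and `objective` are hypothetical names.

```python
import numpy as np

def objective(X, Y_M, Y_H, F, G, patches, gamma, p=0.5, tau=1.0):
    """f(X) = l(X) + gamma * sum_i phi_{p,tau}(X_i), block 0 being the whole X."""
    loss = (0.5 * np.linalg.norm(Y_M - F @ X) ** 2
            + 0.5 * np.linalg.norm(Y_H - X @ G) ** 2)
    reg = 0.0
    for cols in [np.arange(X.shape[1])] + list(patches):
        lam = np.linalg.eigvalsh(X[:, cols] @ X[:, cols].T)
        reg += np.sum((np.maximum(lam, 0.0) + tau) ** (p / 2))
    return loss + gamma * reg

def gloria_sketch(Y_M, Y_H, F, G, patches, gamma, p=0.5, tau=1.0,
                  iters=100, seed=0):
    """Inexact MM: one accelerated projected-gradient step per MM iteration."""
    M, L = F.shape[1], G.shape[0]
    blocks = [np.arange(L)] + list(patches)        # block 0 is the whole image
    X = np.random.default_rng(seed).uniform(size=(M, L))
    X_prev, xi_prev = X.copy(), 0.0
    lmax_G = np.linalg.eigvalsh(G.T @ G)[-1]       # equals lmax(G G^T); fixed
    for _ in range(iters):
        # MM weights W_i = (X_i X_i^T + tau I)^{(p-2)/2} and their top eigenvalues.
        W, w_max = [], []
        for cols in blocks:
            lam, U = np.linalg.eigh(X[:, cols] @ X[:, cols].T + tau * np.eye(M))
            d = lam ** ((p - 2) / 2)
            W.append((U * d) @ U.T)
            w_max.append(d.max())
        # beta is the Lipschitz bound of Fact 2; the step size is 1 / beta.
        beta = (np.linalg.eigvalsh(F.T @ F + p * gamma * W[0])[-1]
                + lmax_G + p * gamma * max(w_max[1:]))
        # Extrapolation with the assumed FISTA-type weights.
        xi = (1.0 + np.sqrt(1.0 + 4.0 * xi_prev ** 2)) / 2.0
        Z = X + ((xi_prev - 1.0) / xi) * (X - X_prev)
        # Gradient of the reweighted least-squares surrogate at Z.
        grad = F.T @ (F @ Z - Y_M) + (Z @ G - Y_H) @ G.T
        for W_i, cols in zip(W, blocks):
            grad[:, cols] += p * gamma * (W_i @ Z[:, cols])
        X_prev, X = X, np.clip(Z - grad / beta, 0.0, 1.0)
        xi_prev = xi
    return X
```

The $P+1$ eigen-decompositions of $M\times M$ matrices inside the loop account for the $PM^{3}$ term in the per-iteration complexity, while the dense products with ${\bm{F}}$ and ${\bm{W}}_{i}^{k}$ account for the $M^{2}L$ term.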

5 Numerical Results

We performed extensive numerical experiments to benchmark GLORIA against a number of existing algorithms. The benchmarked algorithms we choose are considered most representative in the context or are related to our method. Namely, they are GSA [42], GLP [43], CNMF [4], FUMI [6], HySure [9], the nuclear norm minimization (NNM) method in (7), and LRSR [12]. GSA and GLP are pansharpening-based methods; CNMF and FUMI are representative methods in matrix factorization-based HSR; HySure is a representative method in dictionary-based regression, and it employs spatial total variation regularization in its regression; LRSR is another dictionary-based regression method which uses local nuclear-norm regularization; NNM is regarded as the baseline method for low-rank matrix estimation in the context of matrix completion, and thus we include it in our experiments. All the algorithms are implemented on a desktop computer with Intel Core i7-5760X@3GHz CPU and 32GB memory. Codes are written in MATLAB R2015a.

The parameter settings of GLORIA are as follows, unless specified otherwise. The parameters of the smooth Schatten- $p$ function are $p=1/2$ , $\tau=1$ . The local patches are obtained by dividing the image into equal-spaced rectangular blocks, as in Fig. 2. The regularization parameters $\gamma_{i}$ ’s are chosen to be identical; i.e., $\gamma_{0}=\gamma_{1}=\cdots=\gamma_{P}=\gamma$ . We initialize the algorithm by using a $[0,1]$ -uniform i.i.d. generated ${\bm{X}}$ . We stop the algorithm when the relative change of the objective function is below $10^{-5}$ or when the number of iterations exceeds $100$ .

The parameter settings of the benchmarked algorithms basically follow the recommended settings in [44, 12]. In addition, for matrix factorization and dictionary-based regression methods, we fix the target rank as $N=30$ . We use VCA to initialize the matrix factorization algorithms, and we stop the algorithms by the same stopping rule as that for GLORIA. Also, the NNM method is implemented by applying the APG method in [45] to problem (7).

The performance measures employed for evaluating the recovery performance are the peak SNR (PSNR), spectral angle mapper (SAM), Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) and universal image quality index (UIQI). They have been extensively used in the HSR literature, and we refer the reader to [1] for their definitions.
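For orientation, two of these measures can be sketched as follows. The exact definitions are those in [1]; the forms below (mean spectral angle over pixels, and band-averaged PSNR with each band's maximum as the peak) are common conventions that we assume here, and they may differ from [1] in normalization.

```python
import numpy as np

def sam_degrees(X, X_hat, eps=1e-12):
    """Mean spectral angle in degrees between pixels (columns) of X and X_hat."""
    num = np.sum(X * X_hat, axis=0)
    den = np.linalg.norm(X, axis=0) * np.linalg.norm(X_hat, axis=0) + eps
    angles = np.arccos(np.clip(num / den, -1.0, 1.0))
    return float(np.degrees(np.mean(angles)))

def psnr_db(X, X_hat):
    """Mean over bands (rows) of 10 log10(peak^2 / MSE); peak = band maximum."""
    mse = np.mean((X - X_hat) ** 2, axis=1)
    peak = np.max(X, axis=1)
    return float(np.mean(10.0 * np.log10(peak ** 2 / mse)))
```

SAM is insensitive to per-pixel scaling and so isolates spectral distortion, whereas PSNR measures the error magnitude band by band.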

5.1 Semi-Real Data Experiments

First, we consider semi-real data experiments. The experiments were based on the widely-used Wald’s protocol [46], where we take a real HS image as the ground-truth SR image ${\bm{X}}$ and use it to generate the observed MS and HS images, ${\bm{Y}}_{\rm M}$ and ${\bm{Y}}_{\rm H}$ , through the model (1). We consider the following real HS datasets.

1. Hyperspec-VNIR-C Chikusei: This dataset was acquired by the Headwall Hyperspec-VNIR-C imaging sensor [47]. It covers 128 spectral bands whose wavelength range is from 363nm to 1,018nm. We take a 480 $\times$ 480 subimage from this dataset as our SR image.

2. AVIRIS Indian Pines: This dataset was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) HS sensor [48]. The wavelength range is from 400nm to 2,500nm. It has 178 spectral bands after dropping bands that are corrupted by water absorption. In the experiment, a 120 $\times$ 120 subimage is used.

3. HYDICE Washington DC Mall: This dataset was taken by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) HS sensor [49]. We take a subimage of this dataset, which has 240 $\times$ 240 pixels and 191 clean spectral bands. The wavelength range is from 400nm to 2,500nm.

4. ROSIS University of Pavia: This dataset was measured by the Reflective Optics System Imaging Spectrometer (ROSIS) HS sensor. This dataset has 103 spectral bands whose wavelength range is from 430nm to 850nm. We take a 240 $\times$ 240 subimage from this dataset as the SR image.

5. AVIRIS Moffett Field: This dataset, recorded by AVIRIS HS sensor, has 187 uncorrupted spectral bands. The wavelength range is from 400nm to 2,500nm. We take a 240 $\times$ 240 subimage from this dataset.

These five images have been displayed in Fig. 3. The settings with the spectral and spatial measurement response matrices ${\bm{F}}$ and ${\bm{G}}$ should also be described. The matrix ${\bm{F}}$ is chosen such that it is equivalent to the spectral response of either the Landsat 4 TM sensor [50] (6 bands, with spectral coverage from 400nm to 2,500nm) or the IKONOS sensor [51] (4 bands, with spectral coverage from 450nm to 900nm). We choose the Landsat 4 TM sensor response for Indian Pines, Washington DC Mall and Moffett Field, and the IKONOS sensor response for Chikusei and University of Pavia; this choice matches the spectral coverage of the HS images. As discussed previously, ${\bm{G}}$ corresponds to the process of spatial blurring and downsampling. The blurring function is an 11 $\times$ 11 Gaussian point spread function, with variance $1.7^{2}$ . The downsampling is done every $4$ pixels, both horizontally and vertically. Furthermore, the noise terms ${\bm{V}}_{\rm H}$ and ${\bm{V}}_{\rm M}$ are randomly generated following an i.i.d. mean-zero Gaussian distribution. We fix the SNRs at ${\sf SNR}_{\rm M}={\sf SNR}_{\rm H}=25{\rm dB}$ .
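The blurring kernel and the noise generation can be sketched as below. The kernel follows the stated 11 $\times$ 11 size and variance $1.7^{2}$; the normalization to unit sum and the SNR convention (signal power over noise power, averaged over all entries) are assumptions made for illustration, and the function names are ours.

```python
import numpy as np

def gaussian_psf(size=11, sigma=1.7):
    """size x size Gaussian point spread function, normalized to sum to one."""
    r = np.arange(size) - (size - 1) / 2          # offsets from the center
    g = np.exp(-r ** 2 / (2 * sigma ** 2))
    K = np.outer(g, g)                            # separable 2-D kernel
    return K / K.sum()

def add_noise(Y, snr_db, rng):
    """Add i.i.d. zero-mean Gaussian noise at the requested SNR (in dB),
    with SNR taken as mean signal power over noise variance (assumed)."""
    noise_var = np.mean(Y ** 2) / 10 ** (snr_db / 10)
    return Y + np.sqrt(noise_var) * rng.standard_normal(Y.shape)
```

Because the kernel has only 121 nonzero taps, each column of ${\bm{G}}$ has a small number of nonzero entries, which is what keeps products with ${\bm{G}}$ cheap.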

Table 2 summarizes some of the settings with the experiments. There, we also show the settings with the patch number $P$ of GLORIA. The regularization parameter of GLORIA is fixed as $\gamma=20/({\sf SNR}_{\rm M}+{\sf SNR}_{\rm H})$ . We ran $50$ independent trials in each image. The obtained performance is shown in Table 3, where, for each performance measure, we use blue, brown and red boldfaced letters to mark the best, second best and third best algorithms. To give the reader an additional reference on the performance, we show the SAM maps of the various algorithms in Figs. 5–8; note that the SAM maps shown are from one realization.

From Table 3 we see that, except for runtimes, GLORIA ranks best or second best in all of the performance measures. From Figs. 5–8, we also see that GLORIA yields good results compared to the other algorithms, and this is particularly so for the Indian Pines image in Fig. 5. In fact, we observe that even the NNM method, which is the baseline low-rank matrix estimation method and can be regarded as the precursor of our global-local low-rank pursuit, works reasonably well. The above reported results suggest that the exploitation of low-rank spectral-spatial data structures in HSR is a working idea. We speculate that the good performance of GLORIA compared to the other algorithms is because GLORIA exploits the local low-rank data structure, which may provide better robustness to the EV effects. We will use synthetic experiments to examine the EV effects in the next subsection.

We should also discuss the runtime performance. The fastest algorithms are GSA and GLP, which, however, do not perform very well in recovery performance. Let us compare GLORIA to the representative CNMF and HySure methods. GLORIA runs faster than CNMF and HySure for the cases of Chikusei and University of Pavia, and slower for the cases of Washington DC Mall and Moffett Field. On this issue, we should note that GLORIA deals with $ML$ optimization variables, e.g., $29,491,200$ variables in Chikusei. In comparison, CNMF and HySure require $(M+L)N$ and $NL$ optimization variables, resp., which amount to $6,915,840$ and $6,912,000$ variables, resp., in Chikusei. In terms of runtime per variable, GLORIA is considered efficient.
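The variable counts quoted above follow directly from the Chikusei settings ($M=128$ bands, $L=480^{2}$ pixels, $N=30$), as the short check below shows.

```python
M, L, N = 128, 480 * 480, 30    # Chikusei: bands, pixels, target rank

gloria_vars = M * L             # GLORIA optimizes every entry of X
cnmf_vars = (M + L) * N         # CNMF optimizes A (M x N) and S (N x L)
hysure_vars = N * L             # HySure optimizes S only (A is fixed)

print(gloria_vars, cnmf_vars, hysure_vars)   # 29491200 6915840 6912000
```

GLORIA therefore handles roughly $4.3$ times as many variables as the two factorization-based methods on this dataset.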

5.2 Synthetic Data Experiments

Second, we consider synthetic data experiments. The way we prepare the experiment is similar to the one in [9, 11], with the difference that EV is also involved. The procedure is described as follows. We use the local-patch-wise, and EV-present, linear spectral mixture model (8) to generate the SR image. The number of endmembers is $N=5$ . The generations of the local endmember matrix ${\bm{A}}_{i}$ and abundance matrix ${\bm{S}}_{i}$ will be described shortly. Each local patch is rectangular, but its horizontal and vertical lengths are, in each trial, random. Fig. 9(b) shows one such arrangement. We obtain $P=64$ such patches, and it is important to note that none of the algorithms, including GLORIA, has knowledge of such patch arrangement. In GLORIA we will apply the same equi-spaced rectangular segmentation as before, and there will be mismatches between the actual patches and the patches presumed by GLORIA. Doing so is to provide a more realistic simulation as, in reality, it is impossible to exactly know how EV changes in space.

The endmember matrix ${\bm{A}}_{i}$ is chosen as a collection of the spectral signatures of five materials, specifically, Actinolite, Albite, Muscovite, Olivine and Topaz. To simulate the EV effect, for each patch and for each material we randomly pick one variation of that material from the U.S. Geological Survey (USGS) spectral library [52]. The abundance matrix ${\bm{S}}$ is chosen as the abundance maps extracted from a real HS image, namely, the AVIRIS Cuprite dataset; the extraction is done by applying an HU method called SVMAX [53]. In each trial, we randomly crop a 120 $\times$ 120 submap from AVIRIS Cuprite; see the illustration in Fig. 9(a) where the submap is marked as a red rectangle. The abundance maps are then extracted from that submap.

Some other simulation settings are as follows. The spectral measurement response matrix corresponds to the spectral response of the Landsat 4 TM sensor. The regularization parameter of GLORIA is $\gamma=40/({\sf SNR}_{\rm M}+{\sf SNR}_{\rm H})$ . We ran $100$ independent trials. Table 4 shows the results, where again the best three algorithms are marked in blue, brown and red. As can be seen, GLORIA generally gives the best HSR recovery performance. This suggests that GLORIA has the flexibility to accommodate the EV effect. Also, GLORIA works better when the patch size is smaller, or when the number of patches $P$ increases. Another observation is that at the lower SNR, i.e., $15$ dB, GLORIA works considerably better than the state-of-the-art methods.

Previously we mentioned that GLORIA, an inexact MM scheme using accelerated projected gradient, runs very fast. To give the reader some idea, we conduct the following runtime test. We benchmark GLORIA against the exact MM scheme and the inexact MM scheme via the nominal projected gradient (PG). The exact MM scheme solves each MM iteration exactly via the accelerated PG method (see Section 4.2). The inexact MM scheme via the nominal PG removes the extrapolation by setting $\alpha_{k}=0$ in the inexact MM iteration (24). The previously described synthetic experiment is used to test the three MM schemes, and we consider SNR= $25$ dB and $P=16$ . The runtime results, shown in Table 5, indicate significant runtime advantages of GLORIA over the other two candidates.

5.3 Real Data Experiments

Finally, we test the algorithms on real data. The dataset is the one used in [54]. The HS image was acquired by the Hyperion HS sensor. It covers a spectral range from 400nm to 2,500nm, and has 89 spectral bands after removing 131 noisy bands. The MS image was captured by the MS sensor mounted on the Sentinel-2A satellite. It has 13 bands, and we adopt 4 bands whose central wavelengths are 490nm, 560nm, 665nm and 842nm, resp. Readers are referred to [54] for further details. After pre-processing such as co-registration and cropping, we obtain the HS-MS image pair. The image pair is illustrated in Fig. 10. The setting is $(M,M_{\rm M},L,L_{\rm H},N)=(89,4,360^{2},120^{2},30)$ .

We employ the algorithm in [9] to estimate the spectral and spatial measurement responses ${\bm{F}}$ and ${\bm{G}}$ . Empirically we found that, for the tested HS-MS pair, this algorithm happens to yield a poor estimate of ${\bm{G}}$ . To mend this issue, we consider a second-stage estimation: we fix the blurring kernel to be a Gaussian kernel and estimate its variance $\sigma^{2}$ by the nonlinear least-squares $\min_{\sigma^{2}>0}\|{\bm{y}}_{\rm M}^{1}\circ g(\sigma^{2})-\bm{f}^{1}{\bm{Y}}_{\rm H}\|^{2}$ , where $g(\sigma^{2})$ denotes the Gaussian kernel with variance $\sigma^{2}$ , $\circ$ corresponds to the 2-D convolution operation, and ${\bm{y}}_{\rm M}^{1}$ and $\bm{f}^{1}$ are the first row of ${\bm{Y}}_{\rm M}$ and the estimated ${\bm{F}}$ , resp.

We follow the same simulation settings as those in semi-real experiments, except that we use $\gamma=10$ for GLORIA. Figs. 11-12 illustrate the 5th and 20th bands of the original MS image and recovered images. From the figures, we note that GLP does not work well compared to the other algorithms. Moreover, if we zoom in on the recovered images, we can see that the images recovered by HySure and LRSR have stripe noise, while the images recovered by CNMF and NNM have pepper noise. In comparison, the images recovered by FUMI and GLORIA appear smoother.

6 Conclusion

In this paper we explored the route of low-rank matrix estimation for HSR. By positing a low-rank spectral-spatial data structure, both globally and locally, we built an algorithmic solution, called GLORIA, that exploits such structure for HSR. Our extensive numerical studies, which include semi-real data experiments on five different datasets, one synthetic experiment and one real experiment, show that exploiting global-local low-rank structure not only is a working idea but also provides satisfactory reconstruction results. Our global-local low-rank exploitation is made possible by customizing an efficient first-order strategy in large-scale structured optimization. We close this paper by naming a future direction. It would be interesting to study how low-rank matrix estimation would be useful in other hyperspectral problems such as multisource and multitemporal data fusion.

Appendix A

By definition, a constant $L$ is said to be a Lipschitz constant of $\nabla g_{k}$ on $\mathcal{X}$ if $\|\nabla g_{k}({\bm{X}})-\nabla g_{k}({\bm{Y}})\|_{F}\leq L\|{\bm{X}}-{\bm{Y}}\|_{F}$ for any ${\bm{X}},{\bm{Y}}\in\mathcal{X}$ . From (20) we have

[TABLE]

for any ${\bm{X}},{\bm{Y}}\in\mathbb{R}^{M\times L}$ , where (25b) is due to the triangle inequality; (25c) is due to i) the inequality $\|{\bm{A}}{\bm{B}}\|_{F}\leq\|{\bm{A}}\|_{2}\|{\bm{B}}\|_{F}$ in which $\|{\bm{A}}\|_{2}$ denotes the spectral norm of ${\bm{A}}$ , and ii) the identity $\|{\bm{A}}\|_{2}=\lambda_{\rm max}({\bm{A}})$ for positive semidefinite ${\bm{A}}$ . The proof of Fact 2 is complete.
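For completeness, the two norm facts invoked for (25c) can be checked directly. Writing ${\bm{b}}_{j}$ for the $j$th column of ${\bm{B}}$ (notation introduced here only for this remark), we have

$$
\|{\bm{A}}{\bm{B}}\|_{F}^{2}=\sum_{j}\|{\bm{A}}{\bm{b}}_{j}\|_{2}^{2}\leq\|{\bm{A}}\|_{2}^{2}\sum_{j}\|{\bm{b}}_{j}\|_{2}^{2}=\|{\bm{A}}\|_{2}^{2}\|{\bm{B}}\|_{F}^{2},
$$

and, for positive semidefinite ${\bm{A}}$,

$$
\|{\bm{A}}\|_{2}=\sqrt{\lambda_{\rm max}({\bm{A}}^{\top}{\bm{A}})}=\sqrt{\lambda_{\rm max}({\bm{A}}^{2})}=\lambda_{\rm max}({\bm{A}}),
$$

where the last equality holds because the eigenvalues of ${\bm{A}}^{2}$ are the squares of the nonnegative eigenvalues of ${\bm{A}}$.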

Bibliography


[1] L. Loncan, L. B. De Almeida, J. M. Bioucas-Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes et al., “Hyperspectral pansharpening: A review,” IEEE Geosci. Remote Sens. Mag., vol. 3, no. 3, pp. 27–46, 2015.
[2] N. Dobigeon, J.-Y. Tourneret, C. Richard, J. C. M. Bermudez, S. Mc Laughlin, and A. O. Hero, “Nonlinear unmixing of hyperspectral images: Models and algorithms,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 82–94, 2013.
[3] R. Heylen, M. Parente, and P. Gader, “A review of nonlinear hyperspectral unmixing methods,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 1844–1868, 2014.
[4] N. Yokoya, T. Yairi, and A. Iwasaki, “Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 528–537, 2012.
[5] E. Wycoff, T.-H. Chan, K. Jia, W.-K. Ma, and Y. Ma, “A non-negative sparse promoting algorithm for high resolution hyperspectral imaging,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), 2013, pp. 1409–1413.
[6] Q. Wei, J. Bioucas-Dias, N. Dobigeon, J.-Y. Tourneret, M. Chen, and S. Godsill, “Multiband image fusion based on spectral unmixing,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7236–7249, 2016.
[7] R. Wu, H.-T. Wai, and W.-K. Ma, “Hybrid inexact BCD for coupled structured matrix factorization in hyperspectral super-resolution,” to appear in IEEE Trans. Signal Process., 2020.
[8] R. Dian, S. Li, L. Fang, and Q. Wei, “Multispectral and hyperspectral image fusion with spatial-spectral sparse representation,” Inf. Fusion, vol. 49, pp. 262–270, 2019.