L1-norm Tucker Tensor Decomposition

Dimitris G. Chachlakis, Ashley Prater-Bennette, and Panos P. Markopoulos

arXiv:1904.06455·cs.NA·April 16, 2019


TL;DR

This paper introduces L1-Tucker, a robust tensor decomposition method based on the L1-norm, with algorithms that resist heavy data corruption, improving the analysis of multi-way data.

Contribution

It formulates L1-Tucker decomposition and proposes two algorithms, L1-HOSVD and L1-HOOI, with analysis of their complexity and convergence.

Findings

1. L1-Tucker performs comparably to standard Tucker on clean data.
2. L1-Tucker shows strong robustness against heavily corrupted data.
3. The proposed algorithms are effective for tensor reconstruction and classification.



Table 1. Computational costs of PCA [17], L1-PCA (alternating optimization with arbitrary initialization) [37], HOSVD [7], L1-HOSVD (proposed), HOOI [3], and L1-HOOI (proposed). PCA/L1-PCA costs are reported for input matrix $\mathbf{X}\in\mathbb{R}^{D_{1}\times D_{2}}$ and decomposition rank $d_{1}$. Tucker/L1-Tucker costs are reported for input tensor ${\mathbfcal X}\in\mathbb{R}^{D_{1}\times D_{2}\times\ldots\times D_{N}}$ and mode-$n$ ranks $\{d_{n}\}_{n\in[N]}$, for $P_{k}=\prod_{n\in[N]\setminus k}D_{n}$ and $p_{k}=\prod_{n\in[N]\setminus k}d_{n}$. $T$ is the maximum number of iterations conducted by HOOI and L1-HOOI.

| Method | Cost |
|---|---|
| PCA (SVD) | $\mathcal{O}(D_{1}D_{2}\min\{D_{1},D_{2}\})$ |
| L1-PCA (AO) | $\mathcal{O}(D_{1}^{2}D_{2}d_{1})$ |
| HOSVD | $\mathcal{O}(\max_{k\in[N]}\{D_{k}P_{k}\min\{D_{k},P_{k}\}\})$ |
| L1-HOSVD | $\mathcal{O}(\max_{k\in[N]}\{D_{k}^{2}P_{k}d_{k}\})$ |
| HOOI | $\mathcal{O}(T\max_{n\in[N]}\min_{k\in[N]\setminus n}\{D_{k}d_{k}P_{k}+D_{n}p_{n}\min\{D_{n},p_{n}\}\})$ |
| L1-HOOI | $\mathcal{O}(T\max_{n\in[N]}\min_{k\in[N]\setminus n}\{D_{k}d_{k}P_{k}+D_{n}^{2}p_{n}d_{n}\})$ |

Equations

$$\max_{\{\mathbf{U}_{n}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}}\ \left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{F}^{2}$$

$${\mathbfcal G}={\mathbfcal X}\times_{n\in[N]}{\mathbf{U}_{n}^{\text{tckr}}}^{\top}$$

$$\hat{{\mathbfcal X}}={\mathbfcal G}\times_{n\in[N]}\mathbf{U}_{n}^{\text{tckr}}$$

$$\mathbf{U}_{n}^{\text{hosvd}}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}[{\mathbfcal X}]_{(n)}\right\|_{F}^{2}$$

$$\mathbf{A}_{n,t}:=\left[{\mathbfcal X}\times_{m<n}{\mathbf{U}_{m,t}^{\text{hooi}}}^{\top}\times_{k>n}{\mathbf{U}_{k,t-1}^{\text{hooi}}}^{\top}\right]_{(n)}$$

$$\mathbf{U}_{n,t}^{\text{hooi}}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}\mathbf{A}_{n,t}\right\|_{F}^{2}$$

$$\max_{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}\ \left\|\mathbf{U}^{\top}\mathbf{X}\right\|_{1}$$

$$\max_{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}\ \left\|\mathbf{X}\mathbf{B}\right\|_{*}$$

$$\Phi(\mathbf{A})\in\underset{\mathbf{U}\in\mathbb{S}(D,d)}{\arg\max}\ \mathrm{Tr}(\mathbf{U}^{\top}\mathbf{A})=\underset{\mathbf{U}\in\mathbb{S}(D,d)}{\arg\min}\ \left\|\mathbf{U}-\mathbf{A}\right\|_{F}$$

$$\max_{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}\left\|\mathbf{U}^{\top}\mathbf{X}\right\|_{1}=\max_{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}\left\|\mathbf{X}\mathbf{B}\right\|_{*}$$

$$\mathbf{B}_{t}=\text{sgn}(\mathbf{X}^{\top}\mathbf{U}_{t-1})\in\underset{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}{\arg\max}\ \mathrm{Tr}(\mathbf{U}_{t-1}^{\top}\mathbf{X}\mathbf{B})$$

$$\mathbf{U}_{t}=\Phi(\mathbf{X}\mathbf{B}_{t})\in\underset{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}{\arg\max}\ \mathrm{Tr}(\mathbf{U}^{\top}\mathbf{X}\mathbf{B}_{t})$$

$$\mathbf{U}_{t}=\Phi\left(\mathbf{X}\,\text{sgn}(\mathbf{X}^{\top}\mathbf{U}_{t-1})\right)$$

$$\left\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\right\|_{1}\leq\mathrm{Tr}\left(\mathbf{U}_{t}^{\top}\mathbf{X}\mathbf{B}_{t}\right)\leq\left\|\mathbf{U}_{t}^{\top}\mathbf{X}\right\|_{1}$$

$$\frac{\left\|\mathbf{U}_{t}^{\top}\mathbf{X}\right\|_{1}-\left\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\right\|_{1}}{\left\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\right\|_{1}}$$

$$\max_{\{\mathbf{U}_{n}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}}\ \left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{1}$$

$$\left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{1}=\left\|\mathbf{U}_{m}^{\top}\mathbf{A}_{m}\right\|_{1}$$

$$\mathbf{U}_{n}^{\text{l1-hosvd}}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}[{\mathbfcal X}]_{(n)}\right\|_{1}$$

$$\mathbf{U}_{n}^{\text{l1-hosvd}}=\texttt{L1PCA-AO}\left([{\mathbfcal X}]_{(n)},\mathbf{U}_{n}^{\text{hosvd}}\right)$$

$$\mathbf{U}_{n}^{(q)}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|{\mathbfcal X}\times_{m<n}{\mathbf{U}_{m}^{(q)}}^{\top}\times_{n}\mathbf{U}^{\top}\times_{k>n}{\mathbf{U}_{k}^{(q-1)}}^{\top}\right\|_{1}$$

$$\mathbf{A}_{n}^{(q)}:=\left[{\mathbfcal X}\times_{m<n}{\mathbf{U}_{m}^{(q)}}^{\top}\times_{k>n}{\mathbf{U}_{k}^{(q-1)}}^{\top}\right]_{(n)}$$

$$\mathbf{U}_{n}^{(q)}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}$$

$$\mathbf{U}_{n}^{(q)}=\texttt{L1PCA-AO}\left(\mathbf{A}_{n}^{(q)},\mathbf{U}_{n}^{(q-1)}\right)$$

$$\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{n}^{(q-1)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}$$

$$\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{m}^{(q)}}^{\top}\mathbf{A}_{m}^{(q)}\right\|_{1}$$

$$\begin{aligned}\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}&\overset{\text{Lemma 2}}{\geq}\left\|{\mathbf{U}_{n}^{(q-1)}}^{\top}\left[{\mathbfcal X}\times_{k<n}{\mathbf{U}_{k}^{(q)}}^{\top}\times_{l>n}{\mathbf{U}_{l}^{(q-1)}}^{\top}\right]_{(n)}\right\|_{1}\\&=\left\|{\mathbf{U}_{n-1}^{(q)}}^{\top}\left[{\mathbfcal X}\times_{k<n-1}{\mathbf{U}_{k}^{(q)}}^{\top}\times_{n}{\mathbf{U}_{n}^{(q-1)}}^{\top}\times_{l>n}{\mathbf{U}_{l}^{(q-1)}}^{\top}\right]_{(n-1)}\right\|_{1}\\&=\left\|{\mathbf{U}_{n-1}^{(q)}}^{\top}\mathbf{A}_{n-1}^{(q)}\right\|_{1}\end{aligned}$$

$$\left\|{\mathbf{U}_{1}^{(q)}}^{\top}\mathbf{A}_{1}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{N}^{(q-1)}}^{\top}\mathbf{A}_{N}^{(q-1)}\right\|_{1}$$


Full text


Dimitris G. Chachlakis†, Ashley Prater-Bennette‡, and Panos P. Markopoulos†∗

†D. G. Chachlakis and P. P. Markopoulos are with the Department of Electrical and Microelectronic Engineering, Rochester Institute of Technology, Rochester, NY. ‡A. Prater-Bennette is with the Air Force Research Laboratory, Information Directorate, Rome, NY. ∗Corresponding author. This is a preprint of an article that has been submitted for peer-reviewed publication. This preprint may contain errata. Some preliminary results were presented in [1].

Abstract

Tucker decomposition is a common method for the analysis of multi-way/tensor data. Standard Tucker has been shown to be sensitive to heavy corruptions, due to its L2-norm-based formulation, which places squared emphasis on peripheral entries. In this work, we explore L1-Tucker, an L1-norm-based reformulation of standard Tucker decomposition. After formulating the problem, we present two algorithms for its solution, namely L1-norm Higher-Order Singular Value Decomposition (L1-HOSVD) and L1-norm Higher-Order Orthogonal Iterations (L1-HOOI). The presented algorithms are accompanied by complexity and convergence analysis. Our numerical studies on tensor reconstruction and classification corroborate that L1-Tucker, implemented by means of the proposed methods, attains similar performance to standard Tucker when the processed data are corruption-free, while it exhibits sturdy resistance against heavily corrupted entries.

Index Terms:

L1-norm, tensor decomposition, Tucker, corrupted data

I Introduction

Tucker decomposition [2, 3] is a cornerstone method for the analysis and compression of tensor data, with a wide array of applications in data science [4], signal processing, machine learning [5], and communications [6], among other fields. Tucker decomposition is typically computed by means of the Higher-Order Singular-Value Decomposition (HOSVD) algorithm or the Higher-Order Orthogonal Iterations (HOOI) algorithm [7, 3]. Other popular variants include Truncated HOSVD (T-HOSVD) [7, 2, 8], Sequentially Truncated HOSVD (ST-HOSVD) [9], and Hierarchical HOSVD [10]. Parallel algorithms for HOSVD have also been developed [11], with the ability to process very large datasets. When an $N$-way tensor is processed as a collection of $(N-1)$-way tensor measurements, Tucker is reformulated as Tucker2 decomposition [12]. For the special case of $N=3$, Tucker2 has also been presented as Generalized Low-Rank Approximation of Matrices (GLRAM) [13, 14] or 2-Dimensional Principal Component Analysis (2DPCA) [15, 16]. For $N=2$, both Tucker and Tucker2 boil down to standard matrix Principal-Component Analysis (PCA), which is practically solved through Singular-Value Decomposition (SVD) [17]. In fact, both HOSVD and HOOI are high-order generalizations of SVD.

Due to its L2-norm formulation (minimization of the L2-norm of the residual error or, equivalently, maximization of the L2-norm of the multi-way projection), standard Tucker decomposition has been shown to be sensitive to faulty entries within the processed tensor (also known as outliers), whether implemented by means of HOSVD or HOOI [18, 19, 20]. The same sensitivity has also been documented in PCA, which is a special case of Tucker for $2$-way tensors (matrices). For matrix decomposition, L1-norm-based PCA (L1-PCA) [21], formulated by simple substitution of the L2-norm in PCA with the L1-norm, has exhibited solid robustness against heavily corrupted data in an array of applications [22, 23, 24]. Similar outlier resistance has recently been attained by algorithms for the L1-norm reformulation of Tucker2 decomposition of $3$-way tensors (L1-Tucker2) [25, 26, 27, 20].

In this work, we study L1-Tucker, an L1-norm reformulation of the general Tucker decomposition of $N$-way tensors, and propose two new algorithms for its solution, namely L1-HOSVD and L1-HOOI, accompanied by formal convergence and complexity analysis. Our numerical studies show that the proposed L1-Tucker methods perform similarly to standard Tucker methods (HOSVD and HOOI) when the processed data are nominal. However, L1-HOSVD and L1-HOOI are markedly less affected by corruptions among the processed data.

II Technical Background

II-A Definitions and Notation

An $N$-way tensor is an array of scalars, each entry of which is identified by $N$ indices. Vectors and matrices are $1$-way and $2$-way tensors, respectively. An $N$-way tensor ${\mathbfcal X}\in\mathbb{R}^{D_{1}\times D_{2}\times\ldots\times D_{N}}$ can also be viewed as an $M$-way tensor in $\mathbb{R}^{D_{1}\times D_{2}\times\ldots\times D_{M}}$, for any $M>N$, with $D_{m}=1$ for $m>N$. For any fixed set of indices ${\{i_{m}\}_{m\in[N]\setminus n}}$, vector ${\mathbfcal X}({i_{1},\ldots,i_{n-1},:,i_{n+1},\ldots,i_{N}})\in\mathbb{R}^{D_{n}}$ is called a mode-$n$ fiber of ${\mathbfcal X}$. Thus, ${\mathbfcal X}\in\mathbb{R}^{D_{1}\times D_{2}\times\ldots\times D_{N}}$ can also be viewed as a structured collection of its $P_{n}:={\prod_{m\in[N]\setminus n}D_{m}}$ mode-$n$ fibers, for any ${n\in[N]:=\{1,2,\ldots,N\}}$. Arranging all mode-$n$ fibers of tensor ${\mathbfcal X}$ as columns of a matrix leads to a mode-$n$ matrix unfolding (also known as flattening) of ${\mathbfcal X}$, ${[{\mathbfcal X}]_{(n)}\in\mathbb{R}^{D_{n}\times P_{n}}}$. The mode-$n$ fibers of ${\mathbfcal X}$ can, of course, be arranged in different orders, resulting in column permutations of $[{\mathbfcal X}]_{(n)}$. In this work, we consider the common unfolding order, by which tensor element ${\mathbfcal X}({i_{1},i_{2},\ldots,i_{N}})$ is mapped to the mode-$n$ unfolding element $[{\mathbfcal X}]_{(n)}({i_{n},j})$, for ${j=1+\sum_{m\in[N]\setminus n}(i_{m}-1)J_{m}}$ and ${J_{m}:=\prod_{k\in[m-1]\setminus n}D_{k}}$, for every $m\in[N]$ [3].
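The index mapping above can be checked numerically. The sketch below is our own illustration in Python with NumPy (`unfold` and `column_index` are hypothetical helper names, and indices are 0-based). It assumes that moving mode $n$ to the front and flattening the remaining modes in column-major order reproduces the stated column order; the loop confirms this entry by entry on a small tensor.

```python
import itertools

import numpy as np


def unfold(X, n):
    """Mode-n unfolding [X]_(n) for a 0-based mode index n."""
    # Mode n becomes the row index; the remaining modes are flattened in
    # column-major order, so the lowest-numbered remaining mode varies fastest.
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def column_index(idx, dims, n):
    """0-based form of j = 1 + sum_{m != n} (i_m - 1) J_m, J_m = prod_{k < m, k != n} D_k."""
    j = 0
    for m in range(len(dims)):
        if m == n:
            continue
        J_m = int(np.prod([dims[k] for k in range(m) if k != n]))  # empty product = 1
        j += idx[m] * J_m
    return j


X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
for n in range(X.ndim):
    Xn = unfold(X, n)
    for idx in itertools.product(*(range(D) for D in X.shape)):
        assert Xn[idx[n], column_index(idx, X.shape, n)] == X[idx]
print([unfold(X, n).shape for n in range(X.ndim)])  # [(2, 12), (3, 8), (4, 6)]
```

Any other fiber order would only permute the columns of $[{\mathbfcal X}]_{(n)}$; this one is chosen because it matches the formula in the text.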

II-B Tucker Decomposition

Tucker tensor decomposition factorizes ${\mathbfcal X}$ into $N$ orthonormal bases and a core tensor. Specifically, considering $\{d_{n}\}_{n\in[N]}$ that satisfy ${d_{n}\leq D_{n}}$ ${\forall n\in[N]}$ , Tucker decomposition is formulated as

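$$\max_{\{\mathbf{U}_{n}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}}\ \left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{F}^{2},\tag{1}$$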

where $\times_{n}$ denotes the mode-$n$ tensor-to-matrix product [3], $\times_{n\in[N]}\mathbf{U}_{n}^{\top}$ summarizes the multi-mode product $\times_{1}\mathbf{U}_{1}^{\top}\times_{2}\mathbf{U}_{2}^{\top}\times_{3}\ldots\times_{N}\mathbf{U}_{N}^{\top}$, $\mathbb{S}(D,d)=\{\mathbf{U}\in\mathbb{R}^{D\times d}:\ \mathbf{U}^{\top}\mathbf{U}=\mathbf{I}_{d}\}$ is the Stiefel manifold containing all rank-$d$ orthonormal bases in $\mathbb{R}^{D}$, and $\|\cdot\|_{F}$ denotes the L2 (or Frobenius) norm, so that $\|\cdot\|_{F}^{2}$ returns the sum of the squared entries of its tensor argument. If ${\mathbf{U}}^{\text{tckr}}_{n}$ is the mode-$n$ basis derived by solving (1), then

$${\mathbfcal G}={\mathbfcal X}\times_{n\in[N]}{\mathbf{U}_{n}^{\text{tckr}}}^{\top}\tag{2}$$

is the Tucker core of ${\mathbfcal X}$ , and ${\mathbfcal X}$ is Tucker-approximated by

$$\hat{{\mathbfcal X}}={\mathbfcal G}\times_{n\in[N]}\mathbf{U}_{n}^{\text{tckr}}.\tag{3}$$

Equivalently, $\hat{{\mathbfcal X}}={\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\text{tckr}}{\mathbf{U}_{n}^{\text{tckr}}}^{\top}$ . If $d_{n}=D_{n}~{}\forall n$ , it trivially holds that ${\mathbfcal X}=\hat{{\mathbfcal X}}$ . The minimum values of $\{d_{n}\}_{n\in[N]}$ for which $\hat{{\mathbfcal X}}={\mathbfcal X}$ are the respective mode ranks of ${\mathbfcal X}$ . A schematic illustration of Tucker decomposition for $N=3$ is offered in Figure 1.

The solution to (1) is commonly pursued by means of the Higher-Order Singular-Value Decomposition algorithm (HOSVD) [7] or the Higher-Order Orthogonal Iterations algorithm (HOOI) [3]. A brief review of HOSVD and HOOI follows.

II-B1 HOSVD Method

HOSVD approximates ${{\mathbf{U}}}^{\text{tckr}}_{n}$ , disjointly for each ${n\in[N]}$ , by the $d_{n}$ principal components of the mode- $n$ unfolding $[{\mathbfcal X}]_{(n)}$ ,

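$$\mathbf{U}_{n}^{\text{hosvd}}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}[{\mathbfcal X}]_{(n)}\right\|_{F}^{2},\tag{4}$$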

computed by means of standard SVD. Arguably, HOSVD draws motivation from the $N=2$ case, where ${\mathbfcal X}={\mathbf{X}}$ is a $D_{1}\times D_{2}$ matrix and Tucker in (1) simplifies to maximizing $\|\mathbf{U}_{1}^{\top}\mathbf{X}\mathbf{U}_{2}\|_{F}^{2}$ . Indeed, for this special case of matrix decomposition, the optimal orthonormal factors $\mathbf{U}_{n}^{\text{tckr}}$ can be found disjointly and coincide with the solution to (4), $\mathbf{U}_{n}^{\text{hosvd}}$ , where $[{\mathbfcal X}]_{(1)}=\mathbf{X}$ and $[{\mathbfcal X}]_{(2)}=\mathbf{X}^{\top}$ . For the general $N>2$ case, however, all $N$ bases are to be found jointly and the HOSVD bases constitute, in general, approximations of the solution to (1).
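A minimal sketch of HOSVD and of the core and approximation in (2)-(3), assuming Python with NumPy. The helpers `unfold`, `mode_dot`, `multi_mode`, and `hosvd` are our own hypothetical names, and the thin SVD from `numpy.linalg.svd` stands in for "standard SVD". The sketch also checks that $\hat{{\mathbfcal X}}$ coincides with ${\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}\mathbf{U}_{n}^{\top}$.

```python
import numpy as np


def unfold(X, n):
    """Mode-n unfolding (0-based n), columns in the order of Section II-A."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def mode_dot(X, M, n):
    """Mode-n product X x_n M for a matrix M of shape (J, D_n)."""
    # tensordot puts the new mode first; move it back to position n.
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)


def multi_mode(X, mats, transpose=False):
    """X x_1 M_1 x_2 ... x_N M_N, using M_n^T instead when transpose=True."""
    for n, M in enumerate(mats):
        X = mode_dot(X, M.T if transpose else M, n)
    return X


def hosvd(X, ranks):
    """U_n^hosvd of (4): the d_n dominant left singular vectors of [X]_(n)."""
    return [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :d]
            for n, d in enumerate(ranks)]


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
Us = hosvd(X, (3, 2, 2))
G = multi_mode(X, Us, transpose=True)          # Tucker core, as in (2)
X_hat = multi_mode(G, Us)                      # Tucker approximation, as in (3)
X_hat_alt = multi_mode(X, [U @ U.T for U in Us])
print(G.shape, np.allclose(X_hat, X_hat_alt))  # (3, 2, 2) True
```

With $d_n=D_n$ for every mode, the same functions recover ${\mathbfcal X}$ exactly, matching the remark after (3).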

II-B2 HOOI Method

HOOI is a converging iterative procedure which, when initialized at the HOSVD solution, provably attains a value of the metric in (1) at least as high as that of HOSVD, but still does not necessarily return the optimal solution [14, 28, 29, 30]. For each mode $n\in[N]$, HOOI is typically (but not necessarily) initialized to the solution of HOSVD as $\mathbf{U}_{n,0}^{\text{hooi}}=\mathbf{U}_{n}^{\text{hosvd}}$. Then, the HOOI algorithm conducts a sequence of provably converging iterations. That is, at the $t$-th iteration, HOOI sets

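$$\mathbf{A}_{n,t}:=\left[{\mathbfcal X}\times_{m<n}{\mathbf{U}_{m,t}^{\text{hooi}}}^{\top}\times_{k>n}{\mathbf{U}_{k,t-1}^{\text{hooi}}}^{\top}\right]_{(n)}\tag{5}$$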

and updates

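$$\mathbf{U}_{n,t}^{\text{hooi}}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}\mathbf{A}_{n,t}\right\|_{F}^{2},\tag{6}$$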

obtained again by standard SVD of $\mathbf{A}_{n,t}$ ( $d_{n}$ dominant left-singular vectors).
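The HOOI sweep in (5)-(6) can be sketched as follows, assuming Python with NumPy, hypothetical helper names, and a fixed number of sweeps `T` in place of a convergence test. Because each update maximizes $\|\mathbf{U}^{\top}\mathbf{A}_{n,t}\|_{F}^{2}$ exactly via SVD, the recorded metric should never decrease from its HOSVD starting value.

```python
import numpy as np


def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def mode_dot(X, M, n):
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)


def project_all_but(X, Us, skip):
    """X x_m U_m^T over every mode m except `skip` (skip=-1 projects all modes)."""
    for m, U in enumerate(Us):
        if m != skip:
            X = mode_dot(X, U.T, m)
    return X


def leading_left_sv(A, d):
    return np.linalg.svd(A, full_matrices=False)[0][:, :d]


def hooi(X, ranks, T=20):
    """HOOI for (1), initialized at HOSVD; returns bases and the metric after each sweep."""
    Us = [leading_left_sv(unfold(X, n), d) for n, d in enumerate(ranks)]  # U_{n,0} = U_n^hosvd

    def metric():
        return np.linalg.norm(project_all_but(X, Us, skip=-1)) ** 2  # ||X x_n U_n^T||_F^2

    history = [metric()]
    for _ in range(T):
        for n in range(X.ndim):
            # A_{n,t} of (5): modes m < n already hold their t-th update.
            A = unfold(project_all_but(X, Us, skip=n), n)
            Us[n] = leading_left_sv(A, ranks[n])   # update (6)
        history.append(metric())
    return Us, history


rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5, 4))
Us, history = hooi(X, (3, 2, 2), T=10)
print(history[0] <= history[-1])  # True
```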

II-C Data Corruption and L1-norm Reformulation of PCA

Large datasets often contain heavily corrupted, outlying entries due to various causes, such as sensor malfunctions, errors in data storage/transfer, heavy-tail noise, intermittent variations of the sensing environment, and even intentional dataset “poisoning” [31]. Unfortunately, such corruptions, which lie far from the sought-after subspaces, are known to significantly affect PCA and its multi-way generalization, Tucker, even when they appear as a small fraction of the processed data [21, 18, 20]. Accordingly, in such cases, the performance of any application that relies on PCA and Tucker can be compromised. This corruption sensitivity can be largely attributed to the L2-norm formulation of Tucker/PCA, which places squared emphasis on each entry of the core, thus benefiting corrupted fibers. To demonstrate this sensitivity of Tucker, we present the following simple example. We build ${\mathbfcal X}\in\mathbb{R}^{10\times 10\times 10}$ with entries independently drawn from $\mathcal{N}(0,1)$. Then, we additively corrupt the single entry $[{\mathbfcal X}]_{2,3,4}$ with a sample from $\mathcal{N}(0,\mu^{2})$. We apply HOSVD on ${\mathbfcal X}$ to obtain the single-dimensional bases $\mathbf{u}_{1}\in\mathbb{R}^{10\times 1}$, $\mathbf{u}_{2}\in\mathbb{R}^{10\times 1}$, and $\mathbf{u}_{3}\in\mathbb{R}^{10\times 1}$ and measure the aggregate normalized fit of the bases to the corrupted fibers as $f(\mu^{2})=\sum_{i=1}^{3}|\mathbf{u}_{i}^{\top}\mathbf{x}_{i}|^{2}\,\|\mathbf{x}_{i}\|_{2}^{-2}$, where $\mathbf{x}_{1}=[{\mathbfcal X}]_{:,3,4}$, $\mathbf{x}_{2}=[{\mathbfcal X}]_{2,:,4}$, and $\mathbf{x}_{3}=[{\mathbfcal X}]_{2,3,:}$. We repeat this study $3000$ times and plot in Figure 2 the average value of $f(\mu^{2})$ versus $\mu^{2}=0,10,\ldots,100$. The impact of the single corrupted entry on the HOSVD bases (which tend to fit the corrupted fibers) is clearly documented.
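A scaled-down re-run of this experiment is sketched below. It is our own illustration, not the authors' code: it assumes NumPy, uses 200 trials per value of $\mu^{2}$ instead of $3000$, and computes each single-dimensional HOSVD basis as the top left singular vector of the corresponding unfolding. Its printed averages will therefore not match Figure 2 exactly; the intent is only to show the qualitative trend that $f(\mu^{2})$ is expected to grow with the corruption variance.

```python
import numpy as np


def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def hosvd_rank1(X):
    """Single-dimensional HOSVD bases u_n (d_n = 1): top left singular vector of [X]_(n)."""
    return [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, 0] for n in range(X.ndim)]


def fiber_fit(mu2, trials=200, seed=0):
    """Average of f(mu^2) = sum_i |u_i^T x_i|^2 / ||x_i||_2^2 over independent trials."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((10, 10, 10))
        # Entry (2, 3, 4) in the 1-based notation of the text is X[1, 2, 3] here.
        X[1, 2, 3] += np.sqrt(mu2) * rng.standard_normal()
        bases = hosvd_rank1(X)
        fibers = (X[:, 2, 3], X[1, :, 3], X[1, 2, :])   # x_1, x_2, x_3
        total += sum((u @ x) ** 2 / (x @ x) for u, x in zip(bases, fibers))
    return total / trials


for mu2 in (0, 10, 100):
    print(mu2, round(fiber_fit(mu2), 3))
```

Each term of $f$ lies in $[0,1]$, so $f(\mu^{2})\in[0,3]$; values near $3$ mean that all three bases have essentially locked onto the corrupted fibers.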

To counteract the effect of data contaminations/corruptions, researchers have resorted to “robust” reformulations of PCA and Tucker. One popular approach seeks to approximate the processed data tensor as the summation of a sought-after low-rank component and a jointly optimized sparse component that models outlier corruption [32, 33, 19, 34]. This approach relies on ad hoc configured weights that regulate approximation rank, sparsity, and iteration step size. A more straightforward approach simply replaces the corruption-responsive L2-norm in PCA by the L1-norm. In matrix analysis, this modification resulted in the L1-PCA formulation [21], which constitutes a core component of the L1-Tucker framework proposed in this work. Therefore, for completeness, we briefly present below the theory behind L1-PCA, as well as a simple algorithm for its approximate computation.

Given a data matrix $\mathbf{X}\in\mathbb{R}^{D_{1}\times D_{2}}$ and $d_{1}\leq\text{rank}(\mathbf{X})$ , L1-PCA is defined as

$$\max_{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}\ \left\|\mathbf{U}^{\top}\mathbf{X}\right\|_{1},\tag{7}$$

where the L1-norm $\|\cdot\|_{1}$ returns the summation of the absolute entries of its matrix argument. L1-PCA in (7) was solved exactly in [21], where the authors presented and leveraged the following Theorem 1.

Theorem 1.

Let $\mathbf{B}_{\text{nuc}}$ be an optimal solution to

$$\max_{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}\ \left\|\mathbf{X}\mathbf{B}\right\|_{*}.\tag{8}$$

Then, $\mathbf{U}_{\text{L1}}=\Phi(\mathbf{X}\mathbf{B}_{\text{nuc}})$ is an optimal solution to L1-PCA in (7). Moreover, $\|\mathbf{X}^{\top}\mathbf{U}_{\text{L1}}\|_{1}=\mathrm{Tr}\left(\mathbf{U}_{\text{L1}}^{\top}\mathbf{X}\mathbf{B}_{\text{nuc}}\right)=\|\mathbf{X}\mathbf{B}_{\text{nuc}}\|_{*}$ [21].

Nuclear norm $\|\cdot\|_{*}$ in (8) returns the sum of the singular values of its matrix argument. For any tall matrix $\mathbf{A}\in\mathbb{R}^{D\times d}$ that admits SVD $\mathbf{A}=\mathbf{W}\mathbf{S}_{d\times d}\mathbf{Q}^{\top}$ , $\Phi(\cdot)$ in Theorem 1 is defined as $\Phi(\mathbf{A}):=\mathbf{W}\mathbf{Q}^{\top}$ . Moreover, by the Procrustes Theorem [35], it holds that

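$$\Phi(\mathbf{A})\in\underset{\mathbf{U}\in\mathbb{S}(D,d)}{\arg\max}\ \mathrm{Tr}(\mathbf{U}^{\top}\mathbf{A})=\underset{\mathbf{U}\in\mathbb{S}(D,d)}{\arg\min}\ \left\|\mathbf{U}-\mathbf{A}\right\|_{F}.\tag{9}$$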
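The map $\Phi(\cdot)$ and the Procrustes property (9) are easy to check numerically. In this sketch (Python with NumPy assumed; `phi` is a hypothetical name for $\Phi$), $\mathrm{Tr}(\Phi(\mathbf{A})^{\top}\mathbf{A})$ equals the nuclear norm of $\mathbf{A}$, and a randomly drawn orthonormal basis does no better.

```python
import numpy as np


def phi(A):
    """Phi(A) = W Q^T from the thin SVD A = W S Q^T of a tall D x d matrix A."""
    W, _, Qt = np.linalg.svd(A, full_matrices=False)
    return W @ Qt


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
U = phi(A)
V = np.linalg.qr(rng.standard_normal((6, 3)))[0]   # some other point of S(6, 3)
print(np.allclose(U.T @ U, np.eye(3)))                               # True
print(np.isclose(np.trace(U.T @ A), np.linalg.norm(A, ord="nuc")))  # True
print(np.trace(V.T @ A) <= np.trace(U.T @ A))                       # True
```

The second equality in (9) follows from $\|\mathbf{U}-\mathbf{A}\|_{F}^{2}=d-2\,\mathrm{Tr}(\mathbf{U}^{\top}\mathbf{A})+\|\mathbf{A}\|_{F}^{2}$ for any $\mathbf{U}\in\mathbb{S}(D,d)$, so maximizing the trace and minimizing the distance are the same problem.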

By means of the above Theorem 1, the solution to L1-PCA is obtained by the solution to (8), with an additional SVD step. Problem (8) can be solved by exhaustive search in its finite-size feasibility set, or by more intelligent algorithms of lower cost, as shown in [21]. Computationally efficient, approximate solvers for (8) and (7) were presented in [36, 37, 38, 39, 40]. Incremental solvers for L1-PCA were presented in [41, 42]. Algorithms for the complex-valued L1-PCA were recently presented in [43, 44]. To date, L1-PCA has found many applications in signal processing and machine learning, such as radar-based motion recognition and foreground-activity extraction in video sequences [23, 24]. Most recently, L1-PCA was extended to an L1-norm-based Tucker2 formulation, specifically for $3$-way tensors [26, 27].
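For very small matrices, the exhaustive search mentioned above can be written directly from Theorem 1. The sketch below (Python with NumPy assumed; `l1pca_exhaustive` and `phi` are hypothetical names) enumerates all $2^{D_{2}d_{1}}$ sign matrices, so it is only meant as a sanity check of $\|\mathbf{X}^{\top}\mathbf{U}_{\text{L1}}\|_{1}=\|\mathbf{X}\mathbf{B}_{\text{nuc}}\|_{*}$, not as a practical solver.

```python
import itertools

import numpy as np


def phi(A):
    W, _, Qt = np.linalg.svd(A, full_matrices=False)
    return W @ Qt


def l1pca_exhaustive(X, d):
    """Exact L1-PCA (7) via Theorem 1: search all B in {+-1}^(D2 x d) for (8)."""
    D2 = X.shape[1]
    best_val, best_B = -np.inf, None
    for signs in itertools.product((-1.0, 1.0), repeat=D2 * d):   # 2^(D2 d) candidates
        B = np.reshape(signs, (D2, d))
        val = np.linalg.norm(X @ B, ord="nuc")                      # ||X B||_*
        if val > best_val:
            best_val, best_B = val, B
    return phi(X @ best_B), best_val


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))
U_l1, nuc = l1pca_exhaustive(X, 2)
print(np.isclose(np.abs(U_l1.T @ X).sum(), nuc))   # True: ||X^T U_L1||_1 = ||X B_nuc||_*
```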

Next, we briefly present the low-cost L1-PCA calculator of [37], based on alternating optimization, which will be employed as a module of our subsequent L1-Tucker decomposition developments. A pseudocode for this L1-PCA solver is offered in Algorithm 1. According to [21],

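$$\max_{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}\left\|\mathbf{U}^{\top}\mathbf{X}\right\|_{1}\tag{10}$$
$$=\max_{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}\ \max_{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}\mathrm{Tr}\left(\mathbf{U}^{\top}\mathbf{X}\mathbf{B}\right)=\max_{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}\left\|\mathbf{X}\mathbf{B}\right\|_{*}.\tag{11}$$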

By (9), for fixed $\mathbf{B}$ , the middle part of (11) is maximized by $\mathbf{U}=\Phi({\mathbf{X}}\mathbf{B})$ . At the same time, for fixed $\mathbf{U}$ , the middle part of (11) is maximized by $\mathbf{B}=\text{sgn}({\mathbf{X}}^{\top}\mathbf{U})$ , where $\text{sgn}(\cdot)$ returns the $\pm 1$ signs of the entries of its argument ( $\text{sgn}(0)=1$ ). By the above observations, [37] pursued a solution to (7) in an alternating fashion, as

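$$\mathbf{B}_{t}=\text{sgn}(\mathbf{X}^{\top}\mathbf{U}_{t-1})\in\underset{\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}}{\arg\max}\ \mathrm{Tr}\left(\mathbf{U}_{t-1}^{\top}\mathbf{X}\mathbf{B}\right)\tag{12}$$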

and

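$$\mathbf{U}_{t}=\Phi(\mathbf{X}\mathbf{B}_{t})\in\underset{\mathbf{U}\in\mathbb{S}(D_{1},d_{1})}{\arg\max}\ \mathrm{Tr}\left(\mathbf{U}^{\top}\mathbf{X}\mathbf{B}_{t}\right),\tag{13}$$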

for $t=1,2,\ldots$, and arbitrary initialization $\mathbf{U}_{0}\in\mathbb{S}(D_{1},d_{1})$. Omitting the explicit computation of the auxiliary matrix $\mathbf{B}_{t}$, (12)-(13) can take the compact form

$$\mathbf{U}_{t}=\Phi\left(\mathbf{X}\,\text{sgn}(\mathbf{X}^{\top}\mathbf{U}_{t-1})\right).\tag{14}$$

For completeness, we present below a proof of the monotonic metric increase attained by the above iterations.

Lemma 1.

For every $t\geq 1$ ,

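$$\left\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\right\|_{1}\leq\mathrm{Tr}\left(\mathbf{U}_{t}^{\top}\mathbf{X}\mathbf{B}_{t}\right)\leq\left\|\mathbf{U}_{t}^{\top}\mathbf{X}\right\|_{1}.$$

The first inequality holds because $\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\|_{1}=\mathrm{Tr}(\mathbf{U}_{t-1}^{\top}\mathbf{X}\mathbf{B}_{t})$ for $\mathbf{B}_{t}=\text{sgn}(\mathbf{X}^{\top}\mathbf{U}_{t-1})$, and $\mathbf{U}_{t}$ maximizes $\mathrm{Tr}(\mathbf{U}^{\top}\mathbf{X}\mathbf{B}_{t})$ by (13); the second holds because $\mathrm{Tr}(\mathbf{U}^{\top}\mathbf{X}\mathbf{B})\leq\|\mathbf{U}^{\top}\mathbf{X}\|_{1}$ for every $\mathbf{B}\in\{\pm 1\}^{D_{2}\times d_{1}}$.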

At the same time, the metric of (7) is upper bounded by its value at the exact L1-PCA solution [21]. Thus, the metric sequence produced by the iteration in (14) is guaranteed to converge. In practice, iterations can be terminated when the metric-increase ratio

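$$\frac{\left\|\mathbf{U}_{t}^{\top}\mathbf{X}\right\|_{1}-\left\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\right\|_{1}}{\left\|\mathbf{U}_{t-1}^{\top}\mathbf{X}\right\|_{1}}$$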

drops below a predetermined threshold $\tau>0$, or when $t$ exceeds a maximum number of permitted iterations. As shown in [37], the computational cost of the presented procedure is $\mathcal{O}(D_{1}D_{2}d_{1}T)$, where $T$ is the number of iterations. In practice, $T$ can be set to be linear in $D_{1}$. Thus, the overall computational complexity of the above procedure is $\mathcal{O}(D_{1}^{2}D_{2}d_{1})$. In the sequel, we adopt the function notation $\texttt{L1PCA-AO}(\mathbf{X},\mathbf{U}_{0})$ to denote the basis returned upon termination/convergence of the alternating optimization of (14).
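The alternating iteration (14), together with the metric-increase stopping rule, can be sketched as below. This is our reading of the procedure, not the code of [37]: Python with NumPy is assumed, the names `l1pca_ao`, `phi`, and `sgn` are hypothetical, and the default cap of $10D_{1}$ iterations is an assumption that merely reflects the remark that $T$ can be linear in $D_{1}$. The returned `history` makes the monotonic increase of Lemma 1 observable.

```python
import numpy as np


def phi(A):
    W, _, Qt = np.linalg.svd(A, full_matrices=False)
    return W @ Qt


def sgn(A):
    """Entry-wise signs in {+-1}, with the convention sgn(0) = +1."""
    return np.where(A >= 0, 1.0, -1.0)


def l1pca_ao(X, U0, tau=1e-9, max_iter=None):
    """Sketch of L1PCA-AO(X, U0): iterate (14) until the metric-increase ratio drops below tau."""
    if max_iter is None:
        max_iter = 10 * X.shape[0]   # assumption: iteration cap linear in D_1
    U, metric = U0, np.abs(U0.T @ X).sum()
    history = [metric]
    for _ in range(max_iter):
        U_next = phi(X @ sgn(X.T @ U))            # B_t = sgn(X^T U_{t-1}), U_t = Phi(X B_t)
        next_metric = np.abs(U_next.T @ X).sum()  # ||U_t^T X||_1
        ratio = (next_metric - metric) / max(metric, np.finfo(float).tiny)
        U, metric = U_next, next_metric
        history.append(metric)
        if ratio < tau:
            break
    return U, history


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 40))
U0 = np.linalg.qr(rng.standard_normal((8, 3)))[0]
U, history = l1pca_ao(X, U0)
print(len(history), history[0] <= history[-1])
```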

III L1-Tucker Decomposition

III-A Formulation

Motivated by the corruption resistance of L1-PCA, in this work we study L1-Tucker decomposition. L1-Tucker is derived by simply replacing the L2-norm in (1) by the sturdier L1-norm, as

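$$\max_{\{\mathbf{U}_{n}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}}\ \left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{1}.\tag{19}$$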

That is, (19) strives to maximize the sum of the absolute entries of the Tucker core, while standard Tucker maximizes the sum of the squared entries of the core. We note that, for any $m\in[N]$,

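$$\left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{1}=\left\|\mathbf{U}_{m}^{\top}\mathbf{A}_{m}\right\|_{1},\tag{20}$$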

where $\mathbf{A}_{m}=[{\mathbfcal X}\times_{n<m}\mathbf{U}_{n}^{\top}\times_{k>m}\mathbf{U}_{k}^{\top}]_{(m)}$ . Thus, with respect to each individual basis, the metric of L1-Tucker resembles that of L1-PCA.
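Identity (20) can be confirmed numerically. The sketch below (Python with NumPy assumed, hypothetical helper names, random orthonormal bases drawn via QR) computes the L1-Tucker metric once on the full core and once as $\|\mathbf{U}_{m}^{\top}\mathbf{A}_{m}\|_{1}$ for every mode $m$.

```python
import numpy as np


def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def mode_dot(X, M, n):
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)


def project_all_but(X, Us, skip):
    """X x_n U_n^T over every mode n != skip (skip=-1 projects all modes)."""
    for n, U in enumerate(Us):
        if n != skip:
            X = mode_dot(X, U.T, n)
    return X


rng = np.random.default_rng(0)
dims, ranks = (5, 4, 3), (2, 2, 2)
X = rng.standard_normal(dims)
Us = [np.linalg.qr(rng.standard_normal((D, d)))[0] for D, d in zip(dims, ranks)]
l1_tucker = np.abs(project_all_but(X, Us, skip=-1)).sum()   # ||X x_n U_n^T||_1
for m in range(len(dims)):
    A_m = unfold(project_all_but(X, Us, skip=m), m)          # A_m of (20)
    print(m, np.isclose(np.abs(Us[m].T @ A_m).sum(), l1_tucker))   # True for every m
```

The identity holds because $\mathbf{U}_{m}^{\top}\mathbf{A}_{m}$ is exactly the mode-$m$ unfolding of the core, and unfolding only rearranges entries.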

For $N=3$, $d_{3}=D_{3}$, and fixed $\mathbf{U}_{3}=\mathbf{I}_{D_{3}}$, L1-Tucker in (19) simplifies to a special L1-Tucker2 decomposition, proposed and studied in [25, 27]. For the even more special case of $d_{1}=d_{2}=1$, L1-Tucker2 was recently solved exactly in [20] through combinatorial optimization. For $N=2$ and fixed $\mathbf{U}_{2}=\mathbf{I}_{D_{2}}$, L1-Tucker simplifies to L1-PCA, in the form of (7). The above works have shown that, even in its special cases, the exact solution to L1-Tucker is hard to find, which also holds true for standard Tucker decomposition.

Next, we present the first two approximate algorithms for solving the general L1-Tucker decomposition, in the form of (19).

III-B Proposed L1-HOSVD Method

We first present L1-HOSVD, an algorithm analogous to HOSVD for standard Tucker. Specifically, for every $n\in[N]$ , L1-HOSVD seeks to optimize the mode- $n$ basis $\mathbf{U}_{n}$ in (19) individually, by L1-PCA solution (exact or approximate) of the mode- $n$ matrix unfolding of ${\mathbfcal X}$ , $[{\mathbfcal X}]_{(n)}$ . That is, L1-HOSVD approximates the mode- $n$ basis in the solution of (19) by

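$$\mathbf{U}_{n}^{\text{l1-hosvd}}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}[{\mathbfcal X}]_{(n)}\right\|_{1}.\tag{21}$$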

Similar to HOSVD, L1-HOSVD decouples the basis optimization task across the modes. We observe that (21) is, in fact, L1-PCA of the mode- $n$ flattening of ${\mathbfcal X}$ . As mentioned above, there are multiple algorithms in the literature for solving L1-PCA in (21), both exactly and approximately [21, 36, 37]. As an L1-Tucker solution framework, L1-HOSVD allows for the employment of any solver for (21), thus making possible different performance/cost trade-offs. In this work, we demonstrate the employment of the simple alternating-optimization method of Algorithm 1, initialized, for example, at the solution of standard HOSVD. That is, L1-HOSVD returns

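$$\mathbf{U}_{n}^{\text{l1-hosvd}}=\texttt{L1PCA-AO}\left([{\mathbfcal X}]_{(n)},\mathbf{U}_{n}^{\text{hosvd}}\right),\tag{22}$$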

for every $n\in[N]$ . As presented in Section II-C, the computational cost of (22) is $\mathcal{O}(D_{n}d_{n}P)$ , where $P:=\prod_{m\in[N]}D_{m}$ , when the number of L1-PCA iterations is linear in $D_{n}$ . A pseudocode of L1-HOSVD is offered in Algorithm 2.
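A compact sketch of L1-HOSVD as in (22), assuming Python with NumPy and hypothetical helper names. Here `l1pca_ao` is a condensed copy of the alternating iteration (14) with a metric-increase stopping rule, and each mode is initialized at its standard HOSVD basis. Since that iteration never decreases its metric, every returned basis should score at least as high as the HOSVD basis in the per-mode L1-PCA metric of (21).

```python
import numpy as np


def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def phi(A):
    W, _, Qt = np.linalg.svd(A, full_matrices=False)
    return W @ Qt


def sgn(A):
    return np.where(A >= 0, 1.0, -1.0)   # sgn(0) = +1


def l1pca_ao(X, U0, tau=1e-9, max_iter=None):
    """Alternating iteration (14) started at U0; the cap 10 * D_1 is an assumption."""
    max_iter = 10 * X.shape[0] if max_iter is None else max_iter
    U, metric = U0, np.abs(U0.T @ X).sum()
    for _ in range(max_iter):
        U_next = phi(X @ sgn(X.T @ U))
        next_metric = np.abs(U_next.T @ X).sum()
        ratio = (next_metric - metric) / max(metric, np.finfo(float).tiny)
        U, metric = U_next, next_metric
        if ratio < tau:
            break
    return U


def l1_hosvd(X, ranks):
    """L1-HOSVD as in (22): L1PCA-AO of every unfolding, initialized at the HOSVD basis."""
    bases = []
    for n, d in enumerate(ranks):
        Xn = unfold(X, n)
        U_hosvd = np.linalg.svd(Xn, full_matrices=False)[0][:, :d]
        bases.append(l1pca_ao(Xn, U_hosvd))
    return bases


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
X[0, 0, 0] += 50.0   # one heavily corrupted entry (illustrative)
bases = l1_hosvd(X, (2, 2, 2))
```

Because the modes are treated independently, any other L1-PCA solver could be swapped in for `l1pca_ao` without changing `l1_hosvd`, which is the flexibility the text describes.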

III-C Proposed L1-HOOI Method

Similar to HOSVD, for general dense processed tensors, the disjoint mode optimization of L1-HOSVD may be very limiting. Next, we present L1-HOOI, an iterative algorithm for approximating the solution to (19). Specifically, L1-HOOI is arbitrarily initialized to $N$ feasible bases and conducts a sequence of iterations in which it updates all bases $\{\mathbf{U}_{n}\}_{n\in[N]}$ such that the objective value of L1-Tucker does not decrease. Thus, when initialized to the L1-HOSVD bases, L1-HOOI is guaranteed to perform at least as well as L1-HOSVD in the L1-Tucker metric. In this approach, L1-HOSVD can be viewed as an initialization for L1-HOOI, or, conversely, L1-HOOI can be viewed as a refinement of L1-HOSVD.

Formally, L1-HOOI first initializes bases $\{\mathbf{U}_{n}^{(0)}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}$, for example, $\mathbf{U}_{n}^{(0)}=\mathbf{U}_{n}^{\text{l1-hosvd}}$. Then, at the $q$-th iteration, $q=1,2,\ldots$, it successively optimizes all $N$ bases, in increasing mode order $n=1,2,\ldots,N$. Specifically, at a given iteration $q$ and mode index $n$, L1-HOOI fixes $\mathbf{U}_{m}^{(q)}$ for $m<n$ and $\mathbf{U}_{k}^{(q-1)}$ for $k>n$ and seeks the mode-$n$ basis $\mathbf{U}_{n}^{(q)}$ that maximizes the L1-Tucker metric. That is, for a given index pair $(q,n)$, L1-HOOI pursues

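$$\mathbf{U}_{n}^{(q)}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|{\mathbfcal X}\times_{m<n}{\mathbf{U}_{m}^{(q)}}^{\top}\times_{n}\mathbf{U}^{\top}\times_{k>n}{\mathbf{U}_{k}^{(q-1)}}^{\top}\right\|_{1}.\tag{23}$$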

Defining

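$$\mathbf{A}_{n}^{(q)}:=\left[{\mathbfcal X}\times_{m<n}{\mathbf{U}_{m}^{(q)}}^{\top}\times_{k>n}{\mathbf{U}_{k}^{(q-1)}}^{\top}\right]_{(n)},\tag{24}$$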

(23) is equivalently rewritten in the familiar L1-PCA form

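$$\mathbf{U}_{n}^{(q)}\in\underset{\mathbf{U}\in\mathbb{S}(D_{n},d_{n})}{\arg\max}\ \left\|\mathbf{U}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}.\tag{25}$$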

We notice that, in contrast to (21), the metric of (25) involves the jointly optimized bases of the other modes. As discussed above, there are multiple solvers for (25) that can attain different performance/cost trade-offs. The proposed L1-HOOI framework can be combined with any L1-PCA solver. In the sequel, we employ again the iterative solver of Algorithm 1. That is, for any ( $q,n$ ), we set

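$$\mathbf{U}_{n}^{(q)}=\texttt{L1PCA-AO}\left(\mathbf{A}_{n}^{(q)},\mathbf{U}_{n}^{(q-1)}\right).\tag{26}$$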

A pseudocode of the proposed L1-HOOI method is offered in Algorithm 3. A formal convergence analysis of the L1-HOOI iterations is presented below.
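Putting the pieces together, L1-HOOI with the update (26) can be sketched as follows. This is an illustration under our own assumptions (Python with NumPy; hypothetical names; defaults `tau=1e-9` and `max_sweeps=100` chosen arbitrarily; stopping on the relative increase of the full L1-Tucker metric after each sweep), not the authors' Algorithm 3. Per the convergence analysis, the recorded metric should be non-decreasing and bounded by $\sqrt{p}\|{\mathbfcal X}\|_{F}$.

```python
import numpy as np


def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")


def mode_dot(X, M, n):
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)


def project_all_but(X, Us, skip):
    """X x_m U_m^T for every mode m != skip (skip=-1 projects all modes)."""
    for m, U in enumerate(Us):
        if m != skip:
            X = mode_dot(X, U.T, m)
    return X


def phi(A):
    W, _, Qt = np.linalg.svd(A, full_matrices=False)
    return W @ Qt


def sgn(A):
    return np.where(A >= 0, 1.0, -1.0)   # sgn(0) = +1


def l1pca_ao(X, U0, tau=1e-9, max_iter=None):
    """Alternating iteration (14) started at U0; the cap 10 * D_1 is an assumption."""
    max_iter = 10 * X.shape[0] if max_iter is None else max_iter
    U, metric = U0, np.abs(U0.T @ X).sum()
    for _ in range(max_iter):
        U_next = phi(X @ sgn(X.T @ U))
        next_metric = np.abs(U_next.T @ X).sum()
        ratio = (next_metric - metric) / max(metric, np.finfo(float).tiny)
        U, metric = U_next, next_metric
        if ratio < tau:
            break
    return U


def l1_tucker_metric(X, Us):
    return np.abs(project_all_but(X, Us, skip=-1)).sum()


def l1_hooi(X, ranks, tau=1e-9, max_sweeps=100):
    """L1-HOOI sketch: start at L1-HOSVD, then apply update (26) mode by mode."""
    Us = []
    for n, d in enumerate(ranks):                     # L1-HOSVD initialization, (22)
        Xn = unfold(X, n)
        Us.append(l1pca_ao(Xn, np.linalg.svd(Xn, full_matrices=False)[0][:, :d]))
    history = [l1_tucker_metric(X, Us)]
    for _ in range(max_sweeps):
        for n in range(X.ndim):
            # A_n^(q) of (24): modes m < n already hold their q-th update.
            A = unfold(project_all_but(X, Us, skip=n), n)
            Us[n] = l1pca_ao(A, Us[n])                # warm start at U_n^(q-1)
        history.append(l1_tucker_metric(X, Us))
        if (history[-1] - history[-2]) / history[-2] < tau:
            break
    return Us, history


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 5, 4))
ranks = (2, 2, 2)
Us, history = l1_hooi(X, ranks)
bound = np.sqrt(np.prod(ranks)) * np.linalg.norm(X)   # sqrt(p) ||X||_F
print(history[0] <= history[-1] <= bound)   # True
```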

III-C1 Convergence

We start with introducing Lemma 2, which shows that the $q$ -th update of the mode- $n$ basis increases the L1-Tucker metric.

Lemma 2.

Lemma 1 implies that, for fixed $\{\mathbf{U}_{m}^{(q)}\}_{m<n}$ and $\{\mathbf{U}_{k}^{(q-1)}\}_{k>n}$ ,

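$$\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{n}^{(q-1)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}.$$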

We note that Lemma 2 would also hold if, instead of (26), we employed the bit-flipping iterative L1-PCA solver of [36], initialized to ${\mathbf{U}_{n}^{(q-1)}}$. Also, Lemma 2 certainly holds true if ${\mathbf{U}_{n}^{(q)}}$ is computed by the exact solution of (25), that is, by means of the exact algorithms of [21]. The following new Lemma 3 shows that, within the same iteration, the metric increases as we successively optimize the bases.

Lemma 3.

For any $q>0$ and every $n>m\in[N]$ , it holds that

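$$\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{m}^{(q)}}^{\top}\mathbf{A}_{m}^{(q)}\right\|_{1}.$$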

Proof.

It holds that

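$$\begin{aligned}\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}&\overset{\text{Lemma 2}}{\geq}\left\|{\mathbf{U}_{n}^{(q-1)}}^{\top}\left[{\mathbfcal X}\times_{k<n}{\mathbf{U}_{k}^{(q)}}^{\top}\times_{l>n}{\mathbf{U}_{l}^{(q-1)}}^{\top}\right]_{(n)}\right\|_{1}\\&=\left\|{\mathbf{U}_{n-1}^{(q)}}^{\top}\left[{\mathbfcal X}\times_{k<n-1}{\mathbf{U}_{k}^{(q)}}^{\top}\times_{n}{\mathbf{U}_{n}^{(q-1)}}^{\top}\times_{l>n}{\mathbf{U}_{l}^{(q-1)}}^{\top}\right]_{(n-1)}\right\|_{1}\\&=\left\|{\mathbf{U}_{n-1}^{(q)}}^{\top}\mathbf{A}_{n-1}^{(q)}\right\|_{1}.\end{aligned}$$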

By induction, for every $n>m$ , $\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{m}^{(q)}}^{\top}\mathbf{A}_{m}^{(q)}\right\|_{1}$ . ∎

The following new Lemma 4 and Proposition 1 conclude our analysis on the monotonic increase of the L1-Tucker metric across the iterations of L1-HOOI.

Lemma 4.

For every $q>0$ , it holds that

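$$\left\|{\mathbf{U}_{1}^{(q)}}^{\top}\mathbf{A}_{1}^{(q)}\right\|_{1}\geq\left\|{\mathbf{U}_{N}^{(q-1)}}^{\top}\mathbf{A}_{N}^{(q-1)}\right\|_{1}.$$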

Proof.

It holds that

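$$\begin{aligned}\left\|{\mathbf{U}_{1}^{(q)}}^{\top}\mathbf{A}_{1}^{(q)}\right\|_{1}&\overset{\text{Lemma 2}}{\geq}\left\|{\mathbf{U}_{1}^{(q-1)}}^{\top}\left[{\mathbfcal X}\times_{k>1}{\mathbf{U}_{k}^{(q-1)}}^{\top}\right]_{(1)}\right\|_{1}=\left\|{\mathbfcal X}\times_{n\in[N]}{\mathbf{U}_{n}^{(q-1)}}^{\top}\right\|_{1}\\&=\left\|{\mathbf{U}_{N}^{(q-1)}}^{\top}\left[{\mathbfcal X}\times_{m<N}{\mathbf{U}_{m}^{(q-1)}}^{\top}\right]_{(N)}\right\|_{1}=\left\|{\mathbf{U}_{N}^{(q-1)}}^{\top}\mathbf{A}_{N}^{(q-1)}\right\|_{1}.\end{aligned}$$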

∎

In view of Lemmas 2, 3, and 4, the following Proposition 1 holds true.

Proposition 1.

For any $n\in[N]$ and every $q^{\prime}>q$

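$$\left\|{\mathbf{U}_{n}^{(q^{\prime})}}^{\top}\mathbf{A}_{n}^{(q^{\prime})}\right\|_{1}\geq\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1}.$$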

Proof.

It holds that

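$$\begin{aligned}\left\|{\mathbf{U}_{n}^{(q^{\prime})}}^{\top}\mathbf{A}_{n}^{(q^{\prime})}\right\|_{1}&\overset{\text{Lemma 3}}{\geq}\left\|{\mathbf{U}_{1}^{(q^{\prime})}}^{\top}\mathbf{A}_{1}^{(q^{\prime})}\right\|_{1}\overset{\text{Lemma 4}}{\geq}\left\|{\mathbf{U}_{N}^{(q^{\prime}-1)}}^{\top}\mathbf{A}_{N}^{(q^{\prime}-1)}\right\|_{1}\\&\overset{\text{Lemma 3}}{\geq}\left\|{\mathbf{U}_{n}^{(q^{\prime}-1)}}^{\top}\mathbf{A}_{n}^{(q^{\prime}-1)}\right\|_{1}\geq\cdots\geq\left\|{\mathbf{U}_{n}^{(q)}}^{\top}\mathbf{A}_{n}^{(q)}\right\|_{1},\end{aligned}$$

where the steps based on Lemma 3 hold with equality when $n=1$ or $n=N$, respectively.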

∎

Denoting $p:=\prod_{n\in[N]}d_{n}$ , the following Lemma 5 provides an upper bound for the L1-Tucker metric.

Lemma 5.

For any $\{\mathbf{U}_{n}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}$ , it holds that

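$$\left\|{\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\right\|_{1}\leq\sqrt{p}\,\left\|{\mathbfcal X}\right\|_{F}.$$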

Proof.

Let $\mathbfcal Y={\mathbfcal X}\times_{n\in[N]}\mathbf{U}_{n}^{\top}\in\mathbb{R}^{d_{1}\times\ldots\times d_{N}}$ and define $\mathbf{y}=\text{vec}([\mathbfcal Y]_{(1)})$ and $\mathbf{x}=\text{vec}([\mathbfcal X]_{(1)})$ . It holds that $\|\mathbfcal Y\|_{1}=\|\mathbf{y}\|_{1}$ and $\|{\mathbfcal X}\|_{F}=\|\mathbf{x}\|_{2}$ . Also, define $\mathbf{Z}=\mathbf{U}_{N}\otimes\mathbf{U}_{N-1}\otimes\ldots\otimes\mathbf{U}_{1}\in\mathbb{S}(P,p)$ . We observe that

$$\mathbf{y}=\mathbf{Z}^{\top}\mathbf{x}.$$

Accordingly,

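$$\|\mathbfcal Y\|_{1}=\|\mathbf{y}\|_{1}\leq\sqrt{p}\,\|\mathbf{y}\|_{2}=\sqrt{p}\,\|\mathbf{Z}^{\top}\mathbf{x}\|_{2}\leq\sqrt{p}\,\|\mathbf{x}\|_{2}=\sqrt{p}\,\|{\mathbfcal X}\|_{F},$$

where the first inequality is the Cauchy-Schwarz inequality for the $p$-entry vector $\mathbf{y}$ and the second holds because $\mathbf{Z}$ has orthonormal columns.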

∎

Lemma 5 shows that the L1-Tucker metric is upper bounded by $\sqrt{p}\|{\mathbfcal X}\|_{F}$. This, in conjunction with Proposition 1, implies that, as $q$ increases, the L1-HOOI iterations converge in the L1-Tucker metric.
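The two facts used in the proof of Lemma 5, $\mathbf{y}=\mathbf{Z}^{\top}\mathbf{x}$ and $\mathbf{Z}\in\mathbb{S}(P,p)$, can be checked on a small example. The sketch assumes Python with NumPy and that column-major flattening (`order="F"`) realizes $\text{vec}([\cdot]_{(1)})$ under the unfolding order of Section II-A; `mode_dot` is a hypothetical helper.

```python
import numpy as np


def mode_dot(X, M, n):
    return np.moveaxis(np.tensordot(M, X, axes=(1, n)), 0, n)


rng = np.random.default_rng(0)
dims, ranks = (5, 4, 3), (2, 3, 2)
X = rng.standard_normal(dims)
Us = [np.linalg.qr(rng.standard_normal((D, d)))[0] for D, d in zip(dims, ranks)]

Y = X
for n, U in enumerate(Us):
    Y = mode_dot(Y, U.T, n)                    # Y = X x_n U_n^T

# Column-major vectorization matches vec([.]_(1)) under the unfolding of Section II-A.
x, y = X.flatten(order="F"), Y.flatten(order="F")
Z = np.kron(Us[2], np.kron(Us[1], Us[0]))      # Z = U_3 (x) U_2 (x) U_1, shape (P, p)
p = int(np.prod(ranks))

print(np.allclose(y, Z.T @ x))                 # True: y = Z^T x
print(np.allclose(Z.T @ Z, np.eye(p)))         # True: Z lies in S(P, p)
print(np.abs(Y).sum() <= np.sqrt(p) * np.linalg.norm(x))   # True: Lemma 5
```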

To visualize the convergence, we carry out the following study. We form a $5$-way tensor ${\mathbfcal X}\in\mathbb{R}^{10\times 10\times\ldots\times 10}$, drawing independent entries from $\mathcal{N}(0,1)$. Then, we apply L1-HOOI on ${\mathbfcal X}$, initialized to L1-HOSVD. In Fig. 3, we plot the evolution of the L1-Tucker metric $\|{\mathbfcal X}\times_{n\in[N]}{\mathbf{U}_{n}^{(q)}}^{\top}\|_{1}$ versus the L1-HOOI iteration index $q$. In accordance with our formal analysis, we observe the monotonic increase of the metric and convergence after $16$ iterations.

In practice, one can terminate the L1-HOOI iterations when the metric-increase ratio

[TABLE]

drops below a predetermined threshold $\tau>0$ , or when $q$ exceeds a maximum number of permitted iterations. Next, we discuss the computational cost of L1-HOOI.
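The iteration and the stopping rule can be illustrated with the following Python/NumPy sketch of an L1-HOOI-style loop. Several pieces are our assumptions, not the paper's: each L1-PCA subproblem (25) is solved approximately with a simple fixed-point update $\mathbf{U}\leftarrow\mathbf{W}\mathbf{V}^{\top}$, where $\mathbf{W}\boldsymbol{\Sigma}\mathbf{V}^{\top}$ is the thin SVD of $\mathbf{A}\,\mathrm{sgn}(\mathbf{A}^{\top}\mathbf{U})$, warm-started at the previous basis; L1-HOSVD is approximated by the same solver started from the SVD basis of each unfolding; and the metric-increase ratio is taken to be the relative increase $(m_{q}-m_{q-1})/m_{q-1}$. Because $\|\mathbf{U}^{\top}\mathbf{A}\|_{1}=\max_{\mathbf{B}\in\{\pm 1\}}\mathrm{tr}(\mathbf{U}^{\top}\mathbf{A}\mathbf{B})$, each accepted fixed-point step cannot decrease the metric, so the recorded sequence is nondecreasing, mirroring Proposition 1. All function names are hypothetical.

```python
import numpy as np

def unfold(X, n):
    """Mode-n unfolding. Column order differs from Kolda-Bader, which changes
    neither L1 norms nor the L1-PCA solution."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_product_T(X, U, n):
    """X x_n U^T for a D_n x d_n basis U."""
    return np.moveaxis(np.tensordot(X, U, axes=([n], [0])), -1, n)

def l1_tucker_metric(X, Us):
    """||X x_1 U_1^T ... x_N U_N^T||_1."""
    Y = X
    for n, U in enumerate(Us):
        Y = mode_product_T(Y, U, n)
    return np.abs(Y).sum()

def l1pca_fixed_point(A, U, max_iter=100, tol=1e-12):
    """Approximate L1-PCA (assumed solver, not the paper's): warm-started at U,
    it only accepts steps that increase ||U^T A||_1."""
    f = np.abs(U.T @ A).sum()
    for _ in range(max_iter):
        B = np.where(A.T @ U >= 0, 1.0, -1.0)          # +-1 sign matrix
        W, sv, Vt = np.linalg.svd(A @ B, full_matrices=False)
        U_new = W @ Vt                                  # maximizes tr(U^T A B) over S(D, d)
        f_new = np.abs(U_new.T @ A).sum()
        if f_new <= f * (1 + tol):                      # no meaningful gain: stop
            break
        U, f = U_new, f_new
    return U

def l1_hooi(X, ranks, tau=1e-8, max_iter=50):
    """L1-HOOI-style loop; returns the bases and the metric after each sweep."""
    N = X.ndim
    Us = []
    for n in range(N):  # assumed L1-HOSVD-style init: L1-PCA started at the SVD basis
        Xn = unfold(X, n)
        U0 = np.linalg.svd(Xn, full_matrices=False)[0][:, :ranks[n]]
        Us.append(l1pca_fixed_point(Xn, U0))
    metrics = [l1_tucker_metric(X, Us)]
    for _ in range(max_iter):
        for n in range(N):
            # A_n: project every mode k != n with the latest basis
            # (U_k^(q) for k < n, U_k^(q-1) for k > n), then unfold along mode n.
            Y = X
            for k in range(N):
                if k != n:
                    Y = mode_product_T(Y, Us[k], k)
            Us[n] = l1pca_fixed_point(unfold(Y, n), Us[n])
        metrics.append(l1_tucker_metric(X, Us))
        # Assumed form of the metric-increase ratio: relative increase.
        if (metrics[-1] - metrics[-2]) / metrics[-2] < tau:
            break
    return Us, metrics

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8, 8, 8))   # smaller than the 10^5-entry tensor above
Us, metrics = l1_hooi(X, ranks=(3, 3, 3, 3))
print(f"{len(metrics) - 1} sweeps, metric {metrics[0]:.3f} -> {metrics[-1]:.3f}")
```

Updating `Us[n]` in place is what makes each sweep use $\mathbf{U}_{k}^{(q)}$ for $k<n$ and $\mathbf{U}_{k}^{(q-1)}$ for $k>n$.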

III-C2 Complexity Analysis

As studied above, initialization of L1-HOOI by means of L1-HOSVD costs $\mathcal{O}(\max_{k\in[N]}D_{k}d_{k}P)$, where $P=\prod_{m\in[N]}D_{m}$. Then, at iteration $q$, L1-HOOI computes matrix $\mathbf{A}_{n}^{(q)}$ in (25) and its L1-PCA, for every $n$. Matrix $\mathbf{A}_{n}^{(q)}$ can be computed by a sequence of matrix-to-matrix products, as follows. First, we compute the mode-$k$ product of ${\mathbfcal X}$ with $\mathbf{U}_{k}^{(z_{k})}$, for some $k\neq n$ ($z_{k}=q$ if $k<n$ and $z_{k}=q-1$ if $k>n$), with cost $\mathcal{O}(d_{k}P)$. Next, we compute the mode-$l$ product of ${\mathbfcal X}\times_{k}\mathbf{U}_{k}^{(z_{k})}$ with $\mathbf{U}_{l}^{(z_{l})}$, for some $l\notin\{n,k\}$, with cost $\mathcal{O}(d_{l}d_{k}P_{k})$, where $P_{k}=\prod_{m\in[N]\setminus k}D_{m}$. We observe that the second product has lower cost than the first, for any selection of $k$ and $l$. Similarly, each subsequent mode product has further reduced cost. Thus, keeping the dominant term (the cost of the first product) and taking the products in a computationally favorable order, the computation of $\mathbf{A}_{n}^{(q)}$ costs $\mathcal{O}(\min_{k\in[N]\setminus n}d_{k}P)$. Importantly, this cost is the same for every iteration index $q$. After $\mathbf{A}_{n}^{(q)}$ is computed, (25) is solved with cost $\mathcal{O}(D_{n}^{2}p)$, as shown above, where $p=\prod_{m\in[N]}d_{m}$. Thus, for fixed $(q,n)$, the computation and L1-PCA of $\mathbf{A}_{n}^{(q)}$ cost $\mathcal{O}(\min_{k\in[N]\setminus n}\{d_{k}P+D_{n}^{2}p\})$. Moreover, there exists a mode index $n\in[N]$ such that the cost of computing $\mathbf{A}_{n}^{(q)}$ and its L1-PCA basis dominates the respective cost of mode $m$, for every $m\in[N]\setminus n$. Hence, the dominant cost at iteration $q$ of L1-HOOI is $\mathcal{O}(\max_{n\in[N]}\min_{k\in[N]\setminus n}\{d_{k}P+D_{n}^{2}p\})$.
Denoting by $T$ the maximum number of iterations permitted by L1-HOOI, and noting that $D_{k}d_{k}P_{k}=d_{k}P$, the overall computational cost of L1-HOOI is $\mathcal{O}(T\max_{n\in[N]}\min_{k\in[N]\setminus n}\{D_{k}d_{k}P_{k}+D_{n}^{2}p\})$. In Table I, we offer the computational costs of PCA, L1-PCA, HOSVD, L1-HOSVD, HOOI, and L1-HOOI. We observe that the costs of the proposed L1-HOSVD and L1-HOOI algorithms are comparable with those of standard HOSVD and HOOI, respectively. Next, we conduct numerical studies and compare the performance of standard Tucker solvers with that of the proposed L1-Tucker algorithms.
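The ordering argument for computing $\mathbf{A}_{n}^{(q)}$ can be checked by counting multiplications. The Python sketch below assumes the usual dense cost model, in which a mode-$k$ product that maps a tensor with $S$ entries (mode-$k$ size $D_{k}$) to mode-$k$ size $d_{k}$ costs $d_{k}S$ multiply-adds. It searches all orders of the $N-1$ products and compares the best total with the first-product term $\min_{k\in[N]\setminus n}d_{k}P$. The dimensions are those of the reconstruction study in Section IV-A; the function names are hypothetical.

```python
from itertools import permutations
from math import prod

def chain_cost(D, d, order):
    """Multiply-adds for the mode products X x_k U_k^T applied in `order`,
    assuming a mode-k product on a tensor with S entries costs d_k * S."""
    dims = list(D)
    size = prod(D)
    cost = 0
    for k in order:
        cost += d[k] * size
        size = size // dims[k] * d[k]   # mode k shrinks from D_k to d_k
        dims[k] = d[k]
    return cost

def best_cost_for_A(D, d, n):
    """Cheapest ordering of the N-1 products that form A_n (every mode except n)."""
    others = [k for k in range(len(D)) if k != n]
    return min((chain_cost(D, d, order), order) for order in permutations(others))

D = (10, 15, 10, 15, 10)   # D_1, ..., D_5 from Section IV-A
d = (6, 6, 4, 4, 4)        # d_1, ..., d_5 from Section IV-A
P = prod(D)
for n in range(len(D)):
    cost, order = best_cost_for_A(D, d, n)
    first_term = min(d[k] * P for k in range(len(D)) if k != n)
    print(f"mode {n + 1}: order {[k + 1 for k in order]}, "
          f"cost {cost}, min_k d_k P = {first_term}")
```

For mode 1, the search returns 1,293,600 multiply-adds with mode 4 (the $15\to 4$ reduction) taken first; that first product alone contributes $d_{4}P=900{,}000$, which is the $\min_{k}d_{k}P$ term kept in the analysis.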

IV Numerical Studies

IV-A Tensor Reconstruction

We set $N=5$, $D_{1}=D_{3}=D_{5}=10$, $D_{2}=D_{4}=15$, $d_{1}=d_{2}=6$, $d_{3}=d_{4}=d_{5}=4$, and generate the Tucker-structured tensor ${\mathbfcal X}={\mathbfcal G}\times_{n\in[5]}\mathbf{U}_{n}$. The core tensor ${\mathbfcal G}$ draws entries from $\mathcal{N}(0,9)$ and, for every $n$, $\mathbf{U}_{n}$ is an arbitrary basis. We corrupt all entries of ${\mathbfcal X}$ with zero-mean unit-variance additive white Gaussian noise (AWGN), disrupting its Tucker structure. Moreover, we corrupt $N_{o}$ out of the $P=\prod_{i=1}^{5}D_{i}=225{,}000$ entries of ${\mathbfcal X}$ (corruption ratio $\rho=N_{o}/P$) by adding high-magnitude outliers drawn from $\mathcal{N}(0,\sigma_{o}^{2})$. Thus, we form ${\mathbfcal X}^{\text{corr}}={\mathbfcal X}+\mathbfcal N+{\mathbfcal O}$, where $\mathbfcal N$ and ${\mathbfcal O}$ model AWGN and sparse outliers, respectively. Our objective is to reconstruct ${\mathbfcal X}$ from the available ${\mathbfcal X}^{\text{corr}}$. To this end, we Tucker-decompose ${\mathbfcal X}^{\text{corr}}$ by means of HOSVD, HOOI, L1-HOSVD, and L1-HOOI and obtain bases $\{\hat{\mathbf{U}}_{n}\}_{n\in[5]}$. Then, we reconstruct ${\mathbfcal X}$ as $\hat{{\mathbfcal X}}={\mathbfcal X}^{\text{corr}}\times_{n\in[5]}\hat{\mathbf{U}}_{n}\hat{\mathbf{U}}_{n}^{\top}$. The normalized squared error (NSE) is defined as $\|{\mathbfcal X}-\hat{{\mathbfcal X}}\|_{F}^{2}/\|{\mathbfcal X}\|_{F}^{2}$. In Figure 4(a), we set $N_{o}=300$ ($\rho\approx 1.33\times 10^{-3}$) and plot the mean NSE (MNSE), evaluated over 1000 independent noise/outlier realizations, versus the outlier standard deviation $\sigma_{o}=4,8,\ldots,28$. In the absence of outliers ($\sigma_{o}=0$), all methods under comparison exhibit similarly low MNSE. As $\sigma_{o}$ increases, the MNSE of all methods increases. We notice that the performance of HOSVD and HOOI markedly deteriorates for $\sigma_{o}\geq 12$ and $\sigma_{o}\geq 20$, respectively.
On the other hand, L1-HOSVD and L1-HOOI remain robust against corruption, across the board.
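The corruption model and the NSE of this study can be sketched in Python/NumPy as follows. To keep the example self-contained, it uses plain HOSVD (leading left singular vectors of each unfolding) for the basis-estimation step; L1-HOSVD or L1-HOOI would take its place in the robust variants. Random orthonormal bases stand in for the arbitrary $\mathbf{U}_{n}$, the outlier level $\sigma_{o}=20$ is just one illustrative value, and all helper names are hypothetical.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_product(X, M, n):
    """X x_n M: multiply every mode-n fiber of X by the matrix M."""
    return np.moveaxis(np.tensordot(X, M, axes=([n], [1])), -1, n)

def make_data(rng, D, ranks, n_outliers, sigma_o):
    """Tucker tensor X = G x_n U_n, plus unit-variance AWGN and
    n_outliers sparse N(0, sigma_o^2) outliers at random locations."""
    Us = [np.linalg.qr(rng.standard_normal((Dn, dn)))[0] for Dn, dn in zip(D, ranks)]
    X = 3.0 * rng.standard_normal(ranks)           # core entries ~ N(0, 9)
    for n, U in enumerate(Us):
        X = mode_product(X, U, n)
    noise = rng.standard_normal(D)
    outliers = np.zeros(X.size)
    idx = rng.choice(X.size, size=n_outliers, replace=False)
    outliers[idx] = sigma_o * rng.standard_normal(n_outliers)
    return X, X + noise + outliers.reshape(D)

def hosvd_bases(X, ranks):
    """Leading left singular vectors of each unfolding (standard HOSVD)."""
    return [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
            for n, r in enumerate(ranks)]

def reconstruct(X_corr, Us):
    """X_hat = X_corr x_n (U_n U_n^T) over all modes."""
    X_hat = X_corr
    for n, U in enumerate(Us):
        X_hat = mode_product(X_hat, U @ U.T, n)
    return X_hat

def nse(X, X_hat):
    return np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2

rng = np.random.default_rng(0)
D, ranks = (10, 15, 10, 15, 10), (6, 6, 4, 4, 4)     # sizes of this study
X, X_corr = make_data(rng, D, ranks, n_outliers=300, sigma_o=20.0)
X_hat = reconstruct(X_corr, hosvd_bases(X_corr, ranks))
print(f"HOSVD NSE: {nse(X, X_hat):.4f}")
```

Averaging `nse` over many independent calls to `make_data` gives the MNSE plotted in Figure 4.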

In Figure 4(b), we set $\sigma_{o}=26$ and plot the MNSE versus the number of outliers $N_{o}=0,40,\ldots,400$. As expected, in the absence of outliers ($N_{o}=0$), all methods exhibit low MNSE. As the number of outliers increases, HOSVD and HOOI start exhibiting high reconstruction error, while L1-HOSVD and L1-HOOI remain robust. For instance, the MNSE of L1-HOSVD for $N_{o}=400$ outliers is lower than the MNSE of standard HOSVD for $N_{o}=40$ (ten times fewer) outliers.

Finally, in Figure 4(c), we set $\sigma_{o}=28$ and $N_{o}=150$ ($\approx 0.07\%$ of all data entries are corrupted) and, for each $n\in[5]$, plot the MNSE versus $d_{n}$ while $d_{m}$ is fixed to its nominal value for every $m\in[5]\setminus n$. We observe that, even for a very small fraction of outlier-corrupted entries in ${\mathbfcal X}^{\text{corr}}$, standard Tucker methods are clearly misled across all $5$ modes. On the other hand, the proposed L1-Tucker counterparts exhibit sturdy outlier resistance and reconstruct ${\mathbfcal X}$ well, remaining almost unaffected by the outlying entries in ${\mathbfcal X}^{\text{corr}}$.

A robust tensor analysis algorithm specifically designed to counteract sparse outliers is Higher-Order Robust PCA (HORPCA) [19]. Formally, given ${\mathbfcal X}^{\text{corr}}$, HORPCA solves

[TABLE]

The authors of [19] presented the HoRPCA-S algorithm for the solution of (48), which relies on a sparsity penalty parameter $\lambda$ and a thresholding parameter $\mu$. The model in (48) was introduced under the assumption that, apart from the sparse outliers, ${\mathbfcal X}$ suffers no dense (full-rank) corruption (see [19], Section 2.6). In the case of additional dense corruption, HORPCA is typically followed by HOSVD [19, 34, 33]. In the sequel, we refer to this approach as HORPCA+HOSVD.

In our next study, we set $N=5$ , $D_{n}=5$ , and $d_{n}=2$ for every $n$ , and build the Tucker-structured data tensor ${\mathbfcal X}={\mathbfcal G}\times_{n\in[5]}\mathbf{U}_{n}$ , where the entries of core ${\mathbfcal G}$ are independently drawn from $\mathcal{N}(0,12^{2})$ . Then, we add both dense AWGN and sparse outliers, creating ${\mathbfcal X}^{\text{corr}}={\mathbfcal X}+\mathbfcal N+\mathbfcal O$ , where the entries of noise $\mathbfcal N$ are drawn independently from $\mathcal{N}(0,1)$ and the $15$ non-zero entries of $\mathbfcal O$ (in arbitrary locations) are drawn from $\mathcal{N}(0,20^{2})$ . Then, we attempt to reconstruct ${\mathbfcal X}$ from the available ${\mathbfcal X}^{\text{corr}}$ using HOOI, HORPCA (for $\lambda=0.2,0.6,\ldots,3$ and $\mu=300,500$ ), HORPCA+HOSVD (same $\lambda$ and $\mu$ combinations as HORPCA), and the proposed L1-HOOI.

In Figure 5, we plot the MNSE, computed over $50$ data/noise/corruption realizations, versus $\lambda$ for the four methods. We also plot the average noise-to-data benchmark $\|\mathbfcal N\|_{F}^{2}/\|{\mathbfcal X}\|_{F}^{2}$. In accordance with our previous studies, we observe that L1-HOOI offers markedly lower MNSE than standard HOOI. In addition, we notice that, for a specific selection of $\mu$ and $\lambda$ ($\mu=300$ and $\lambda=0.6$), HORPCA+HOSVD may attain MNSE even lower than HOOI. However, for any other selection of $\lambda$, HORPCA+HOSVD attains higher MNSE than HOOI. Finally, we plot the performance of HORPCA when it is not followed by HOSVD. As expected, for specific selections of $\mu$ and $\lambda$, the method is capable of removing the outliers but not the dense noise component; thus, its MNSE approaches the average noise-to-data benchmark. This study highlights the corruption resistance of L1-HOOI, which, similarly to HOSVD and HOOI, does not depend on any tunable parameters other than $\{d_{n}\}_{n\in[N]}$.

IV-B Classification

Tucker decomposition is commonly employed for classification of multi-way data samples. Below, we consider the Tucker-based classification framework originally presented in [45]. That is, we consider $C$ classes of order-$N$ tensor objects of size $D_{1}\times D_{2}\times\ldots\times D_{N}$ and $M_{c}$ labeled samples available from the $c$-th class, $c\in[C]$, that can be used for training a classifier. The training data from class $c$ are organized in tensor ${\mathbfcal X}_{c}\in\mathbb{R}^{D_{1}\times D_{2}\times\ldots\times D_{N}\times M_{c}}$, and all $M=\sum_{c=1}^{C}M_{c}$ training samples are organized in tensor ${\mathbfcal X}\in\mathbb{R}^{D_{1}\times\ldots\times D_{N}\times M}$, constructed by concatenating ${\mathbfcal X}_{1},\ldots,{\mathbfcal X}_{C}$ along mode $(N+1)$.

In the first processing step, ${\mathbfcal X}$ is Tucker decomposed, obtaining the feature bases $\{\mathbf{U}_{n}\in\mathbb{S}(D_{n},d_{n})\}_{n\in[N]}$ for the first $N$ modes (feature modes) and the sample basis $\mathbf{Q}\in\mathbb{S}(M,M)$ for the $(N+1)$ -th mode (sample mode). The obtained feature bases are then used to compress the training data, as

[TABLE]

for every $c\in[C]$ . Then, the $M_{c}$ compressed tensor objects from the $c$ -th class are vectorized (equivalent to mode- $(N+1)$ flattening) and stored in the data matrix

[TABLE]

where $p=\prod_{n\in[N]}d_{n}$. Finally, the labeled columns of $\{\mathbf{G}_{c}\}_{c\in[C]}$ are used to train any standard vector-based classifier, such as a support vector machine (SVM) or a $k$-nearest-neighbor ($k$-NN) classifier.

When an unlabeled testing point $\mathbfcal Y\in\mathbb{R}^{D_{1}\times\ldots\times D_{N}}$ is received, it is first compressed using the Tucker-trained bases as $\mathbfcal Z=\mathbfcal Y\times_{n\in[N]}\mathbf{U}_{n}^{\top}$ . Then, $\mathbfcal Z$ is vectorized as $\mathbf{z}=\text{vec}(\mathbfcal Z)=[\mathbfcal Z]_{(N+1)}^{\top}$ . Finally, vector $\mathbf{z}$ is classified based on the standard vector classifier trained above.
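The training and testing steps above can be sketched for order-$2$ samples. In this hypothetical Python/NumPy example, the feature bases come from plain HOSVD of the training tensor (the sample mode is not compressed, so $\mathbf{Q}$ is not needed for the features); any of the Tucker or L1-Tucker solvers could supply $\{\mathbf{U}_{n}\}$ instead. The classifier is $1$-NN, and the toy data and function names are illustrative assumptions.

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def tucker_feature_bases(X, ranks):
    """Feature bases for the first N modes of X; the last mode indexes samples."""
    return [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
            for n, r in enumerate(ranks)]

def compress(Y, Us):
    """Y x_n U_n^T over the feature modes. Works for one sample or for a stack
    whose extra trailing axis indexes samples."""
    for n, U in enumerate(Us):
        Y = np.moveaxis(np.tensordot(Y, U, axes=([n], [0])), -1, n)
    return Y

def train(X_list, ranks):
    """X_list[c] has shape (D_1, ..., D_N, M_c)."""
    X = np.concatenate(X_list, axis=-1)              # concatenate along the sample mode
    Us = tucker_feature_bases(X, ranks)
    feats, labels = [], []
    for c, Xc in enumerate(X_list):
        Gc = compress(Xc, Us)                        # shape (d_1, ..., d_N, M_c)
        feats.append(Gc.reshape(-1, Gc.shape[-1]).T) # one length-p row per sample
        labels += [c] * Gc.shape[-1]
    return Us, np.vstack(feats), np.array(labels)

def classify(Y, Us, feats, labels):
    """1-NN: label of the training feature vector closest to vec(compressed Y)."""
    z = compress(Y, Us).reshape(-1)                  # same flattening as in train()
    return labels[np.argmin(np.linalg.norm(feats - z, axis=1))]

rng = np.random.default_rng(0)
D, M_c, C = (12, 12), 10, 3
means = [rng.standard_normal(D) for _ in range(C)]   # toy class templates
X_list = [means[c][..., None] + 0.3 * rng.standard_normal(D + (M_c,)) for c in range(C)]
Us, feats, labels = train(X_list, ranks=(4, 4))
print(classify(means[1] + 0.3 * rng.standard_normal(D), Us, feats, labels))
```

Flattening the compressed training stack and the compressed test sample with the same memory order keeps the feature coordinates aligned.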

In this study, we focus on the classification of order-$2$ data ($N=2$) from the MNIST image dataset of handwritten digits [46]. Specifically, we consider $C=5$ digit classes (digits $0,1,\ldots,4$) and $M_{1}=\ldots=M_{5}=10$ labeled images of size $D\times D$, with $D=D_{1}=D_{2}=28$, available from each class. To make the classification task more challenging, we consider that each training image is corrupted with probability $\alpha$. Then, each pixel of a corrupted image is additively corrupted, with probability $\beta$, by heavy-tailed noise $n\sim\text{unif}(0,v)$. Denoting the average pixel energy by $w^{2}=\frac{1}{D^{2}M}\|{\mathbfcal X}\|_{F}^{2}$, we choose $v$ so that $\frac{w}{\sqrt{\mathbb{E}\{n^{2}\}}}=10$.
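Since $n\sim\text{unif}(0,v)$ has $\mathbb{E}\{n^{2}\}=v^{2}/3$, the condition $w/\sqrt{\mathbb{E}\{n^{2}\}}=10$ gives $v=\sqrt{3}\,w/10$. The Python/NumPy sketch below applies the two-level corruption model under that reading. The function name, its `ratio` argument, and the sample-last array layout are illustrative assumptions.

```python
import numpy as np

def corrupt_training_images(X, alpha, beta, ratio=10.0, rng=None):
    """X has shape (D1, D2, M), one image per slice along the last axis.
    Each image is corrupted with probability alpha; within a corrupted image,
    each pixel receives additive noise n ~ unif(0, v) with probability beta.
    v is set so that w / sqrt(E[n^2]) = ratio, using E[n^2] = v^2 / 3."""
    rng = np.random.default_rng() if rng is None else rng
    D1, D2, M = X.shape
    w = np.sqrt(np.sum(X ** 2) / (D1 * D2 * M))     # w^2: average pixel energy
    v = np.sqrt(3.0) * w / ratio
    image_hit = rng.random(M) < alpha                # which images are corrupted
    pixel_hit = rng.random(X.shape) < beta           # which pixels would be hit
    mask = pixel_hit & image_hit[None, None, :]
    noise = rng.uniform(0.0, v, size=X.shape)
    return X + mask * noise, v
```

Only the noise scale depends on `ratio`; the two Bernoulli draws implement the image-level probability $\alpha$ and the pixel-level probability $\beta$.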

We conduct Tucker-based classification as described above, for $d=d_{1}=d_{2}$, using a nearest-neighbor (NN) classifier (i.e., $1$-NN), by which testing sample $\mathbf{z}$ is assigned to class¹

[TABLE]

¹ We consider a simple classifier so that the study focuses on the impact of each compression method.

For a given training dataset, we classify $500$ testing points from each class. Then, we repeat the training/classification procedure on $300$ distinct realizations of training data, testing data, and corruptions. In Figure 6, we plot the average classification accuracy versus $d$ for $\alpha=0.2$ and $\beta=0.5$, for HOSVD, HOOI, L1-HOSVD, L1-HOOI, PCA, L1-PCA², and a plain NN classifier that returns the label of the column of $[{\mathbfcal X}]_{(N+1)}^{\top}\in\mathbb{R}^{P\times M}$ nearest to the vectorized testing sample $\text{vec}(\mathbfcal Y)$. We observe that, in general, the compression-based methods can outperform plain NN. Moreover, since $p=d^{2}$ and $M=50$, $d>7$ implies $p>M$; thus, the PCA/L1-PCA methods attain constant performance, equal to that of plain NN. L1-PCA outperforms PCA for every value of $d\leq 7$, and, for $4\leq d\leq 7$, PCA/L1-PCA outperform the standard Tucker methods. Finally, the proposed L1-Tucker methods outperform standard Tucker and PCA/L1-PCA for every $d$, and attain the highest classification accuracy, about $89\%$, for $d=6$ ($5\%$ higher than plain NN).

² Denoting by $\mathbf{U}$ the $\min\{p,M\}$ PCs/L1-PCs of $[{\mathbfcal X}]_{(N+1)}^{\top}\in\mathbb{R}^{P\times M}$, we train any classifier on the labeled columns of $\mathbf{U}^{\top}[{\mathbfcal X}]_{(N+1)}^{\top}$ and classify the vectorized and projected testing sample $\mathbf{U}^{\top}\text{vec}(\mathbfcal Y)$.

Next, we fix $d=5$ and $\beta=0.8$ and plot in Figure 7 the average classification accuracy, versus $\alpha$ . This figure reveals the sensitivity of standard HOSVD and HOOI as the training data corruption probability increases. At the same time, the proposed L1-Tucker methods exhibit robustness against the corruption, maintaining the highest average accuracy for every value of $\alpha$ . For instance, for image-corruption probability $\alpha=0.3$ , L1-HOSVD and L1-HOOI attain about $87\%$ accuracy, while HOSVD and HOOI attain accuracy $75\%$ and $71\%$ , respectively.

Last, in Figure 8, we plot the average classification accuracy versus the pixel corruption probability $\beta$, again fixing $\alpha=0.2$ and $d=5$. We observe that, for any value of $\beta$, the accuracy of L1-HOSVD and L1-HOOI does not drop below $86\%$ and $87.5\%$, respectively. On the other hand, as $\beta$ increases, NN and PCA-based methods perform close to $85\%$. The performance of standard Tucker methods decreases markedly, to as low as $76\%$ for intense corruption with $\beta=0.8$. The above studies highlight the benefit of L1-Tucker compared to its standard Tucker and PCA counterparts.

V Conclusions

We studied L1-Tucker, an L1-norm-based reformulation of standard Tucker tensor decomposition, and presented two algorithms for its solution, L1-HOSVD and L1-HOOI. Both algorithms were accompanied by formal complexity and convergence analysis. We carried out numerical studies on tensor reconstruction and classification, on both synthetic and real data, comparing the proposed L1-Tucker methods with their standard counterparts. In our numerical studies, L1-Tucker performed similarly to standard Tucker when the processed data were corruption-free, while, in contrast to standard Tucker, it exhibited sturdy resistance against heavy corruptions.

Bibliography

[1] P. P. Markopoulos, D. G. Chachlakis, and A. Prater-Bennette, L1-norm Higher-Order Singular-value Decomposition, in Proceedings of IEEE Global Conference on Signal and Information Processing, Anaheim, CA, 2018, pp. 1353–1357.
[2] L. R. Tucker, Some mathematical notes on three-mode factor analysis, Psychometrika, 31 (1966), pp. 279–311.
[3] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
[4] E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos, Tensors for data mining and data fusion: Models, applications, and scalable algorithms, ACM Transactions on Intelligent Systems and Technology, 8 (2017), pp. 16:1–16:44.
[5] N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, Tensor decomposition for signal processing and machine learning, IEEE Transactions on Signal Processing, 65 (2017), pp. 3551–3582.
[6] I. V. Cavalcante, A. L. F. de Almeida, and M. Haardt, Tensor-based approach to channel estimation in amplify-and-forward MIMO relaying systems, in Proceedings of IEEE Workshop on Sensor Array Multichannel Signal Processing, A Coruña, Spain, 2014, pp. 445–448.
[7] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM Journal on Matrix Analysis and Applications, 21 (2000), pp. 1253–1278.
[8] M. Haardt, F. Roemer, and G. Del Galdo, Higher-order SVD-based subspace estimation to improve the parameter estimation accuracy in multidimensional harmonic retrieval problems, IEEE Transactions on Signal Processing, 56 (2008), pp. 3198–3213.
