Rank Approximation of a Tensor with Applications in Color Image and Video Processing

Ramin Goudarzi Karim; Carmeliza Navasca; Da Yan

arXiv:1904.12375·math.NA·April 30, 2019


TL;DR

This paper introduces a block coordinate descent algorithm that estimates tensor rank and provides its canonical polyadic decomposition, with applications demonstrated on color images and videos.

Contribution

The paper presents a novel sparse optimization-based algorithm for tensor rank estimation and decomposition, applicable to image and video processing.

Findings

1. Effective tensor rank estimation on color images and videos
2. Successful application of the algorithm to real-world visual data
3. Demonstrated improvement over existing methods in tensor approximation

Abstract

We propose a block coordinate descent type algorithm for estimating the rank of a given tensor. In addition, the algorithm provides the canonical polyadic decomposition of the tensor. To estimate the tensor rank, we use a sparse optimization method based on the $\ell_{1}$ norm. The algorithm is applied to single moving object videos and color images to approximate their rank.

Tables

Table 1: Rank Approximation

| Size of tensor | $I, J, K = 5$ | $I, J, K = 7$ | $I, J, K = 10$ |
| --- | --- | --- | --- |
| Actual rank | 5 | 8 | 10 |
| Upper bound | 10 | 15 | 20 |
| Estimated rank | 5 | 8 | 12 |
| Residual error | 2.85e-1 | 1.34e-1 | 1.20e-1 |
| Relative error | 5.17e-2 | 1.05e-2 | 5.00e-3 |
| Time | 2.23 | 3.86 | 6.39 |

Equations

$$\min_{\alpha}\ \|\alpha\|_{0}\quad\text{s.t.}\quad\mathcal{X}=\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r})$$

$$\min_{A,B,C,\alpha}\ \frac{1}{2}\Big\|\mathcal{X}-\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r})\Big\|_{F}^{2}+\gamma\|\alpha\|_{1}$$

$$\min\ \frac{1}{2}\Big\|\mathcal{X}-\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r})\Big\|_{F}^{2}+\frac{\lambda}{2}\left(\|A\|_{F}^{2}+\|B\|_{F}^{2}+\|C\|_{F}^{2}\right)+\gamma\|\alpha\|_{1}.$$

$$a\otimes b=\left(a_{1}b^{T}\ \dots\ a_{I}b^{T}\right)^{T}.$$

$$A\odot B=\left(a_{1}\otimes b_{1}\ \dots\ a_{J}\otimes b_{J}\right).$$

$$x_{ijk}=a_{i}b_{j}c_{k}.$$

$$\text{vec}(W)_{l}=v(l)=w_{ij}$$

$$\text{vec}(\mathcal{X})_{\beta(i,j,k)}=x_{ijk}$$

$$\text{vec}(a\circ b\circ c)=c\otimes b\otimes a$$

$$j=1+\sum_{k=1,\,k\neq n}^{N}(i_{k}-1)J_{k}\quad\text{with}\quad J_{k}=\prod_{m=1,\,m\neq n}^{k-1}I_{m}.$$

$$X_{(1)}(i,l)=x_{ijk},\quad\text{where } l=j+(k-1)J \text{ and } X_{(1)}\in\mathbb{R}^{I\times JK}$$

$$X_{(2)}(j,l)=x_{ijk},\quad\text{where } l=i+(k-1)I \text{ and } X_{(2)}\in\mathbb{R}^{J\times KI}$$

$$X_{(3)}(k,l)=x_{ijk},\quad\text{where } l=i+(j-1)I \text{ and } X_{(3)}\in\mathbb{R}^{K\times IJ}$$

$$\mathcal{X}\approx\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r})$$

$$A=[a_{1}\ \dots\ a_{R}],\quad B=[b_{1}\ \dots\ b_{R}],\quad C=[c_{1}\ \dots\ c_{R}]$$

$$\min_{A,B,C,\alpha}\ \frac{1}{2}\left\|\mathcal{X}-[A,B,C,\alpha]_{R}\right\|_{F}^{2}$$

$$\frac{1}{2}\left\|X_{(1)}-A\,\text{diag}(\alpha)\,(C\odot B)^{T}\right\|_{F}^{2},$$

$$\frac{1}{2}\left\|X_{(2)}-B\,\text{diag}(\alpha)\,(C\odot A)^{T}\right\|_{F}^{2},$$

$$\frac{1}{2}\left\|X_{(3)}-C\,\text{diag}(\alpha)\,(B\odot A)^{T}\right\|_{F}^{2},$$

$$\frac{1}{2}\left\|\text{vec}(\mathcal{X})-\text{vec}([A,B,C,\alpha]_{R})\right\|_{2}^{2}$$

$$\min_{\alpha}\ \|\alpha\|_{0}\quad\text{s.t.}\quad\mathcal{X}=[A,B,C,\alpha]_{R}$$

$$\min_{\alpha}\ \|\alpha\|_{1}\quad\text{s.t.}\quad\mathcal{X}=[A,B,C,\alpha]_{R}$$

$$\min_{A,B,C,\alpha}\ \frac{1}{2}\left\|\mathcal{X}-[A,B,C,\alpha]\right\|_{F}^{2}+\gamma\|\alpha\|_{1}$$

$$[A,B,C,\alpha]_{R}=[cA,\,c^{-1}B,\,C,\,\alpha]_{R}$$

$$f(A,B,C,\alpha)=\frac{1}{2}\left\|\mathcal{X}-[A,B,C,\alpha]\right\|_{F}^{2}$$

$$g(\alpha)=\gamma\|\alpha\|_{1}$$

$$\min\ f(A,B,C,\alpha)+\frac{\lambda}{2}\left(\|A\|_{F}^{2}+\|B\|_{F}^{2}+\|C\|_{F}^{2}\right)+g(\alpha).$$

$$\|A\|_{F}=\|B\|_{F}=\|C\|_{F}.$$

$$\Psi:\mathbb{R}^{R(I+J+K+1)}\to\mathbb{R}^{+}$$

$$\Psi(A,B,C,\alpha)$$


Taxonomy

Topics: Tensor decomposition and applications · Sparse and Compressive Sensing Techniques · Advanced Neuroimaging Techniques and Applications

Full text


Ramin Goudarzi (Department of Mathematics, University of Alabama at Birmingham, Birmingham, AL 35294, USA, [email protected]), Carmeliza Navasca (Department of Mathematics, University of Alabama at Birmingham, 1300 University Boulevard, Birmingham, AL 35294, USA, [email protected]), Da Yan (Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL 35294, USA, [email protected])


1 Introduction

In 1927, Hitchcock [17, 18] proposed the idea of the polyadic form of a tensor, i.e., expressing a tensor (a multilinear array) as the sum of a finite number of rank-one tensors. This decomposition is called the canonical polyadic (CP) decomposition; it is also known as CANDECOMP or PARAFAC. It has been applied extensively to problems in engineering [30, 32, 1, 13] and science [38, 22]. Specifically, tensor methods have been applied to many multidimensional datasets in signal processing [7, 9, 11], color image processing [43, 19] and video processing [33, 4]. Most of these applications rely on decomposing tensor data into a low-rank form to enable efficient computation and to reduce memory requirements. In computer vision, detecting moving objects in video relies on foreground and background separation, i.e., separating the moving objects (the foreground) from the static information (the background), which requires a low-rank representation of the video tensor. In color image processing, the RGB channels of a color image require extending the matrix models of gray-scale images to low-rank tensor methods. There are several numerical techniques [8, 10, 23, 27, 30] for approximating a low-rank tensor by its CP decomposition, but they do not give an approximation of the minimum rank. In fact, most low-rank tensor algorithms require the tensor rank a priori to find the tensor decomposition. Several theoretical results [24, 25] on tensor rank can help, but they are limited to low-dimensional and low-order tensors.

In this work, the focus is on estimating the tensor rank and the rank-one (CP) decomposition of a given tensor. There are also algorithms [7, 5] that give the tensor rank, but they are specific to symmetric tensor decomposition over the complex field and use algebraic geometry tools. Our proposed algorithm addresses two difficult problems for the CP decomposition: (a) finding the rank of a tensor is an NP-hard problem [16], and (b) low-rank tensor approximation can be ill-posed [12], so a tensor may fail to have a best low-rank approximation.

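The problem of finding the rank of a tensor can be formulated as a constrained optimization problem:

$$\min_{\alpha}\ \|\alpha\|_{0}\quad\text{s.t.}\quad\mathcal{X}=\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r}),$$

where $\|\alpha\|_{0}$ represents the total number of non-zero elements of $\alpha$. The rank optimization problem is NP-hard, so to make it more tractable, the following formulation [42] is used:

$$\min_{A,B,C,\alpha}\ \frac{1}{2}\Big\|\mathcal{X}-\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r})\Big\|_{F}^{2}+\gamma\|\alpha\|_{1},$$

where $\gamma>0$ is the regularization parameter and the objective function is the sum of a smooth function and a non-smooth function. Our formulation includes a Tikhonov type regularization:

$$\min\ \frac{1}{2}\Big\|\mathcal{X}-\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r})\Big\|_{F}^{2}+\frac{\lambda}{2}\left(\|A\|_{F}^{2}+\|B\|_{F}^{2}+\|C\|_{F}^{2}\right)+\gamma\|\alpha\|_{1}.$$

The added Tikhonov regularization has the effect of forcing the factor matrices to have equal norms. Moreover, this formulation and the numerical methods described later improve on the accuracy, and thus the memory requirements, of the tensor model found in [42].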

1.1 Organization

Our paper is organized as follows. In Section 2, we provide the notation and terminology used throughout this paper. In Section 3, we formulate an $\ell_{1}$-regularized optimization problem for the low-rank approximation of tensors. In Section 4, we describe a numerical method for solving this problem, using a proximal alternating minimization technique for the rank and alternating least squares for the decomposition. In Section 5, we provide a convergence analysis of the numerical methods. The numerical experiments in Section 6 cover simulated low-rank tensors, color images and videos. Finally, our conclusion and future work are given in Section 7.

2 Preliminaries

We denote the scalars in $\mathbb{R}$ with lower-case letters $(a,b,\ldots)$ and the vectors with lower-case letters $({a},{b},\ldots)$. The matrices are written as upper-case letters $({A},{B},\ldots)$ and the symbols for tensors are calligraphic letters $(\mathcal{A},\mathcal{B},\ldots)$. The subscripts represent the following scalars: $(\mathcal{A})_{ijk}=a_{ijk}$, $({A})_{ij}=a_{ij}$, $({a})_{i}=a_{i}$, and the $r$-th column of a matrix ${A}$ is ${a_{r}}$. The matrix sequence is denoted $\{{A}^{k}\}$. An $N$th order tensor $\mathcal{X}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{N}}$ is a multidimensional array with entries $(\mathcal{X})_{i_{1}i_{2}\cdots i_{N}}=x_{i_{1}i_{2}\cdots i_{N}}$ for $i_{k}\in\{1,\ldots,I_{k}\}$ where $k\in\{1,\ldots,N\}$. In particular, a third order tensor $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$ is a multidimensional array with entries $x_{ijk}$ for $i\in\{1,\ldots,I\}$, $j\in\{1,\ldots,J\}$ and $k\in\{1,\ldots,K\}$.

Here we present some standard definitions and relations in tensor analysis. The Kronecker product of two vectors $a\in\mathbb{R}^{I}$ and $b\in\mathbb{R}^{J}$ is denoted by $a\otimes b\in\mathbb{R}^{IJ}$:

$$a\otimes b=\left(a_{1}b^{T}\ \dots\ a_{I}b^{T}\right)^{T}.$$

The Khatri-Rao (column-wise Kronecker) product (see [37]) of two matrices $A\in\mathbb{R}^{I\times J}$ and $B\in\mathbb{R}^{K\times J}$ is defined as

$$A\odot B=\left(a_{1}\otimes b_{1}\ \dots\ a_{J}\otimes b_{J}\right).$$

The outer product of three vectors $a\in\mathbb{R}^{I}$, $b\in\mathbb{R}^{J}$, $c\in\mathbb{R}^{K}$ is a third order tensor $\mathcal{X}=a\circ b\circ c$ with the entries defined as follows:

$$x_{ijk}=a_{i}b_{j}c_{k}.$$
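The three products defined above can be illustrated with a minimal pure-Python sketch. The function names `kron`, `khatri_rao` and `outer3` are hypothetical helpers introduced here for illustration; matrices are stored as lists of rows and indices are 0-based.

```python
def kron(a, b):
    """Kronecker product of two vectors: (a_1 b^T, ..., a_I b^T)^T."""
    return [ai * bj for ai in a for bj in b]


def khatri_rao(A, B):
    """Column-wise Kronecker product of A and B (same number of columns)."""
    n_cols = len(A[0])
    cols = [kron([row[r] for row in A], [row[r] for row in B])
            for r in range(n_cols)]
    # Transpose the list of columns back into a list of rows.
    return [list(row) for row in zip(*cols)]


def outer3(a, b, c):
    """Rank-one tensor with entries x[i][j][k] = a_i * b_j * c_k."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


a, b, c = [1, 2], [3, 4, 5], [6, 7]
print(kron(a, b))                  # [3, 4, 5, 6, 8, 10]

A = [[1, 2], [3, 4]]               # columns (1, 3) and (2, 4)
B = [[5, 6], [7, 8]]               # columns (5, 7) and (6, 8)
print(khatri_rao(A, B))            # [[5, 12], [7, 16], [15, 24], [21, 32]]

print(outer3(a, b, c)[1][2][0])    # a_2 * b_3 * c_1 = 2 * 5 * 6 = 60
```

Storing matrices as lists of rows keeps the sketch free of dependencies; an array library would be the natural choice for tensors of realistic size.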

Definition 2.1 (vec)

Given a matrix $W\in\mathbb{R}^{I\times J}$, the function $\text{vec}:\mathbb{R}^{I\times J}\rightarrow\mathbb{R}^{I\cdot J}$, where $\text{vec}(W)=v$, returns a vector of size $I\cdot J$ obtained by stacking the column vectors of $W$; i.e.

$$\text{vec}(W)_{l}=v(l)=w_{ij},$$

where $l=i+(j-1)I$.

The vectorization of a third order tensor $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$ is the process of transforming the tensor into a column vector. The map $\text{vec}:\mathbb{R}^{I\times J\times K}\to\mathbb{R}^{IJK}$ is defined as

$$\text{vec}(\mathcal{X})_{\beta(i,j,k)}=x_{ijk},$$

where $\beta(i,j,k)=i+(j-1)I+(k-1)IJ$. Using the definitions above, we get

$$\text{vec}(a\circ b\circ c)=c\otimes b\otimes a.$$
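The identity relating the vectorized outer product to Kronecker products can be checked numerically. The sketch below is an illustration only; `kron` and `vec3` are hypothetical helper names, and the index map $\beta$ is written in its 0-based form.

```python
def kron(a, b):
    """Kronecker product of two vectors."""
    return [ai * bj for ai in a for bj in b]


def vec3(X, I, J, K):
    """Vectorize X[i][j][k] using beta(i, j, k) = i + (j-1)I + (k-1)IJ."""
    v = [0] * (I * J * K)
    for i in range(I):
        for j in range(J):
            for k in range(K):
                # 0-based version of the index map beta.
                v[i + j * I + k * I * J] = X[i][j][k]
    return v


a, b, c = [1, 2], [3, 4, 5], [6, 7]
# Rank-one tensor a o b o c.
X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]

lhs = vec3(X, len(a), len(b), len(c))
rhs = kron(c, kron(b, a))          # c (x) b (x) a
print(lhs == rhs)                  # True
```

The order $c\otimes b\otimes a$ is reversed relative to $a\circ b\circ c$ because the first index runs fastest in $\beta$.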

Definition 2.2 (Mode- $n$ matricization)

Matricization is the process of reordering the elements of an $N$ th order tensor into a matrix. The mode- $n$ matricization of a tensor $\mathcal{X}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{N}}$ is denoted by $\mathcal{X}_{(n)}$ and arranges the mode- $n$ columns to be the columns of the resulting matrix. The mode- $n$ column, ${x_{i_{1}\cdots i_{n-1}:i_{n+1}\cdots i_{N}}}$ , is a vector obtained by fixing every index with the exception of the $n$ th index.

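If we use a map to express this matricization process for any $N$th order tensor $\mathcal{T}\in\mathbb{R}^{I_{1}\times I_{2}\times\cdots\times I_{N}}$, that is, the tensor element $(i_{1},i_{2},\dots,i_{N})$ maps to the matrix element $(i_{n},j)$, then there is a formula to calculate $j$:

$$j=1+\sum_{k=1,\,k\neq n}^{N}(i_{k}-1)J_{k}\quad\text{with}\quad J_{k}=\prod_{m=1,\,m\neq n}^{k-1}I_{m}.$$

For example, the tensor unfolding or matricization of a third order tensor $\mathcal{X}$ is the process of rearranging the elements of $\mathcal{X}$ into a matrix. The mode-$n$ ($n=1,2,3$) matricization is denoted by $\mathcal{X}_{(n)}$ and its elements can be expressed by the following relations:

$$X_{(1)}(i,l)=x_{ijk},\quad\text{where } l=j+(k-1)J \text{ and } X_{(1)}\in\mathbb{R}^{I\times JK},$$

$$X_{(2)}(j,l)=x_{ijk},\quad\text{where } l=i+(k-1)I \text{ and } X_{(2)}\in\mathbb{R}^{J\times KI},$$

$$X_{(3)}(k,l)=x_{ijk},\quad\text{where } l=i+(j-1)I \text{ and } X_{(3)}\in\mathbb{R}^{K\times IJ}.$$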
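The three unfoldings of a third order tensor can be sketched directly from these index relations. The function `unfold` below is a hypothetical helper written for illustration; the tensor is stored as nested lists `X[i][j][k]` with 0-based indices, and the test entries encode their own position so the column order is easy to read.

```python
def unfold(X, I, J, K, mode):
    """Mode-n matricization of a third order tensor X[i][j][k]."""
    if mode == 1:   # X_(1)(i, l), l = j + (k-1)J, size I x JK
        return [[X[i][j][k] for k in range(K) for j in range(J)]
                for i in range(I)]
    if mode == 2:   # X_(2)(j, l), l = i + (k-1)I, size J x IK
        return [[X[i][j][k] for k in range(K) for i in range(I)]
                for j in range(J)]
    if mode == 3:   # X_(3)(k, l), l = i + (j-1)I, size K x IJ
        return [[X[i][j][k] for j in range(J) for i in range(I)]
                for k in range(K)]
    raise ValueError("mode must be 1, 2 or 3")


I, J, K = 2, 3, 2
# Entry x_ijk is stored as the three-digit number "ijk" (1-based digits).
X = [[[100 * (i + 1) + 10 * (j + 1) + (k + 1) for k in range(K)]
      for j in range(J)] for i in range(I)]

print(unfold(X, I, J, K, 1)[0])   # [111, 121, 131, 112, 122, 132]
print(unfold(X, I, J, K, 3)[1])   # [112, 212, 122, 222, 132, 232]
```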

2.1 CP decomposition and the Alternating Least-Squares Method

In 1927, Hitchcock [17, 18] proposed the idea of the polyadic form of a tensor, i.e., expressing a tensor as the sum of a finite number of rank-one tensors. Today, this decomposition is called the canonical polyadic (CP) decomposition; it is also known as CANDECOMP or PARAFAC. It has been applied extensively to problems in engineering [30, 32, 1, 13] and science [38, 22]. The well-known iterative method for computing the sum of rank-one terms is the Alternating Least-Squares (ALS) technique. The ALS was introduced independently by Carroll and Chang [6] and Harshman [15] in 1970. Among the numerical algorithms for the CP decomposition, the ALS method is the most popular one since it is robust. However, the ALS has some drawbacks. For example, the convergence of ALS can be extremely slow.

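The CP decomposition of a given third order tensor $\mathcal{X}\in\mathbb{R}^{I\times J\times K}$ factorizes it into a sum of rank-one tensors:

$$\mathcal{X}\approx\sum_{r=1}^{R}\alpha_{r}\,(a_{r}\circ b_{r}\circ c_{r}).$$

For simplicity we use the notation $[A,B,C,\alpha]_{R}$ to represent the sum on the right hand side of the equation above, where $A\in\mathbb{R}^{I\times R}$, $B\in\mathbb{R}^{J\times R}$ and $C\in\mathbb{R}^{K\times R}$ are called factor matrices:

$$A=[a_{1}\ \dots\ a_{R}],\quad B=[b_{1}\ \dots\ b_{R}],\quad C=[c_{1}\ \dots\ c_{R}].$$

The CP decomposition problem can be formulated as an optimization problem. Given $R$, the goal is to find vectors $a_{r},b_{r},c_{r}$ such that the distance between the tensor $\mathcal{X}$ and the sum of the outer products of $a_{r},b_{r},c_{r}$ is minimized. The Frobenius norm (the square root of the sum of squares of the entries) is mainly used to measure the distance:

$$\min_{A,B,C,\alpha}\ \frac{1}{2}\left\|\mathcal{X}-[A,B,C,\alpha]_{R}\right\|_{F}^{2}.\tag{2.2}$$

Using the Khatri-Rao product, the objective function in (2.2) can be stated in the following four equivalent forms:

$$\frac{1}{2}\left\|X_{(1)}-A\,\text{diag}(\alpha)\,(C\odot B)^{T}\right\|_{F}^{2},\tag{2.3}$$

$$\frac{1}{2}\left\|X_{(2)}-B\,\text{diag}(\alpha)\,(C\odot A)^{T}\right\|_{F}^{2},\tag{2.4}$$

$$\frac{1}{2}\left\|X_{(3)}-C\,\text{diag}(\alpha)\,(B\odot A)^{T}\right\|_{F}^{2},\tag{2.5}$$

and

$$\frac{1}{2}\left\|\text{vec}(\mathcal{X})-\text{vec}([A,B,C,\alpha]_{R})\right\|_{2}^{2}.\tag{2.6}$$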

All the functions in (2.3), (2.4), (2.5) and (2.6) are linear least squares problems with respect to the matrices $A$, $B$, $C$ and the vector $\alpha$, respectively. To find approximations to $A$, $B$, $C$ and $\alpha$, these four optimization problems (2.3)-(2.6) are solved iteratively, and each minimizer is updated before the next problem is solved (a Gauss-Seidel sweep), until a stopping criterion is met. This technique is called the Alternating Least Squares (ALS) method. The ALS method is popular since it is robust and easy to implement. However, the ALS has some drawbacks. For example, the convergence of ALS can be extremely slow. Another drawback is that a tensor rank $R$ is required before a CP decomposition can be approximated. The next sections deal with tensor rank approximation.
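The equivalence between the tensor form and the mode-1 matrix form of the CP model can be checked on a small example. The sketch below is illustrative only: the helper names are hypothetical, factor matrices are lists of rows, and integer data is used so that the comparison is exact.

```python
def cp_tensor(A, B, C, alpha):
    """Sum over r of alpha_r * (a_r o b_r o c_r)."""
    I, J, K, R = len(A), len(B), len(C), len(alpha)
    return [[[sum(alpha[r] * A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(K)] for j in range(J)] for i in range(I)]


def unfold1(X):
    """Mode-1 unfolding with column index l = j + (k-1)J."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    return [[X[i][j][k] for k in range(K) for j in range(J)]
            for i in range(I)]


def khatri_rao(C, B):
    """C (kr) B: rows indexed by (k, j) with j running fastest."""
    R = len(C[0])
    return [[C[k][r] * B[j][r] for r in range(R)]
            for k in range(len(C)) for j in range(len(B))]


def mode1_model(A, B, C, alpha):
    """A diag(alpha) (C kr B)^T as a list of rows."""
    KR = khatri_rao(C, B)
    R = len(alpha)
    return [[sum(A[i][r] * alpha[r] * KR[l][r] for r in range(R))
             for l in range(len(KR))] for i in range(len(A))]


A = [[1, 2], [0, 1]]
B = [[1, 0], [2, 1], [1, 1]]
C = [[1, 1], [0, 2]]
alpha = [3, -1]

X = cp_tensor(A, B, C, alpha)
print(unfold1(X) == mode1_model(A, B, C, alpha))   # True
```

The same check works for modes 2 and 3 once the unfolding and the Khatri-Rao factors are permuted accordingly.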

3 Rank Approximation of a Tensor

The problem of finding the rank of a tensor can be formulated as a constrained optimization problem:

$$\min_{\alpha}\ \|\alpha\|_{0}\quad\text{s.t.}\quad\mathcal{X}=[A,B,C,\alpha]_{R},$$

where $\|\alpha\|_{0}$ represents the total number of non-zero elements of $\alpha$. Since the problem is NP-hard [16], we replace $\|\alpha\|_{0}$ by the $\ell_{1}$ norm of $\alpha$. The $\ell_{1}$ norm is defined as the sum of the absolute values of the elements of $\alpha$. So the rank approximation problem can be written as

$$\min_{\alpha}\ \|\alpha\|_{1}\quad\text{s.t.}\quad\mathcal{X}=[A,B,C,\alpha]_{R}.$$

In order to obtain a CP decomposition of the given tensor $\mathcal{X}$ as well as the rank approximation, we formulate the rank approximation problem as follows:

$$\min_{A,B,C,\alpha}\ \frac{1}{2}\left\|\mathcal{X}-[A,B,C,\alpha]\right\|_{F}^{2}+\gamma\|\alpha\|_{1},\tag{3.1}$$

where $\gamma>0$ is the regularization parameter. The objective function of problem (3.1) is non-convex and non-smooth. However, it is the sum of a smooth function and a non-smooth function.

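Moreover, it is known that the CP decomposition of a tensor is unique up to scaling and permutation of the factor matrices. Note that

$$[A,B,C,\alpha]_{R}=[cA,\,c^{-1}B,\,C,\,\alpha]_{R}$$

for a nonzero scalar $c\in\mathbb{R}$. In order to overcome the scaling indeterminacy, we add a Tikhonov type regularization term to our objective function [20]. Let $f$ and $g$ be the following:

$$f(A,B,C,\alpha)=\frac{1}{2}\left\|\mathcal{X}-[A,B,C,\alpha]\right\|_{F}^{2}$$

and

$$g(\alpha)=\gamma\|\alpha\|_{1},$$

which represent the fitting term and the $\ell_{1}$ regularization term in (3.1). Then the rank approximation problem can be formulated as

$$\min\ f(A,B,C,\alpha)+\frac{\lambda}{2}\left(\|A\|_{F}^{2}+\|B\|_{F}^{2}+\|C\|_{F}^{2}\right)+g(\alpha).\tag{3.4}$$

The added Tikhonov regularization has the effect of forcing the factor matrices to have equal norms, i.e.,

$$\|A\|_{F}=\|B\|_{F}=\|C\|_{F}$$

(see [29]). Now let $\Psi$ represent the objective function in (3.4) collectively; then

$$\Psi:\mathbb{R}^{R(I+J+K+1)}\to\mathbb{R}^{+},$$

where

$$\Psi(A,B,C,\alpha)=f(A,B,C,\alpha)+\frac{\lambda}{2}\left(\|A\|_{F}^{2}+\|B\|_{F}^{2}+\|C\|_{F}^{2}\right)+g(\alpha).\tag{3.5}$$

Let $\omega=(A,B,C,\alpha)$. When $B,C,\alpha$ are fixed, we write $f(A)$ for $f(\omega)$ and $\Psi(A)$ for $\Psi(\omega)$.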
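The norm-balancing effect of the Tikhonov term noted above can be seen from a short calculation. The following is a sketch for the pair $A,B$ only, assuming both are nonzero at a minimizer of (3.4); the scalar $c>0$ and the function $\phi$ are notation introduced here for illustration.

```latex
% Rescaling (A, B) -> (cA, c^{-1}B) with c > 0 leaves f and g unchanged,
% so only the Tikhonov term depends on c:
\phi(c) = \frac{\lambda}{2}\left(c^{2}\|A\|_{F}^{2} + c^{-2}\|B\|_{F}^{2} + \|C\|_{F}^{2}\right),
\qquad
\phi'(c) = \lambda\left(c\,\|A\|_{F}^{2} - c^{-3}\|B\|_{F}^{2}\right).
% Setting \phi'(c) = 0 gives c^{4} = \|B\|_{F}^{2} / \|A\|_{F}^{2}, hence
\|cA\|_{F}^{2} = c^{2}\|A\|_{F}^{2} = \|A\|_{F}\,\|B\|_{F}
               = c^{-2}\|B\|_{F}^{2} = \|c^{-1}B\|_{F}^{2}.
% At a minimizer, c = 1 must already be optimal, so \|A\|_{F} = \|B\|_{F}.
```

Applying the same argument to the pairs $(A,C)$ and $(B,C)$ gives equality of all three norms.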

4 Approximation of Tensor Decomposition with Tensor Rank

In this section we propose a block coordinate descent type algorithm for solving the problem (3.4). We consider four blocks of variables with respect to $A,B,C$ and $\alpha$ . In particular, at each inner iteration, we solve the following minimization problems

[TABLE]

and

[TABLE]

where $L_{f}^{\beta^{k}}(\alpha)$ represents the proximal linearization [3] of $f$ with respect to $\alpha$ , namely

[TABLE]

Note that each of the minimization problems in (4.1)-(4.4) is strictly convex, therefore $A,B,C,\alpha$ are uniquely determined at each iteration. In fact, the subproblems in (4.1)-(4.3) are standard linear least squares problems with an additional Tikhonov regularization term. One can see this by vectorizing the objective functions; for instance, the residual term in (2.3) can be written as follows

[TABLE]

Since the objective functions in (4.1)-(4.3) are strictly convex, the first order optimality condition is sufficient for a point to be a minimum. In other words, the exact solutions of (4.1)-(4.3) are given by the following normal equations

[TABLE]

and

[TABLE]

where $E^{k}=\text{diag}(\alpha^{k})(C^{k}\odot B^{k})^{T}$ , $F^{k}=\text{diag}(\alpha^{k})(C^{k}\odot A^{k+1})^{T}$ and $G^{k}=\text{diag}(\alpha^{k})(B^{k+1}\odot A^{k+1})^{T}$ .
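As a sketch of where such normal equations come from, assume (as the description above suggests) that subproblem (4.1) has the form $\min_{A}\frac{1}{2}\|X_{(1)}-AE^{k}\|_{F}^{2}+\frac{\lambda}{2}\|A\|_{F}^{2}$, with $E^{k}$ as defined above. The derivation below is an illustration under that assumption and may differ in presentation from the equations in the paper.

```latex
\nabla_{A}\left[\tfrac{1}{2}\|X_{(1)} - A E^{k}\|_{F}^{2} + \tfrac{\lambda}{2}\|A\|_{F}^{2}\right]
  = \left(A E^{k} - X_{(1)}\right)(E^{k})^{T} + \lambda A = 0
\;\Longrightarrow\;
A^{k+1}\left(E^{k}(E^{k})^{T} + \lambda I_{R}\right) = X_{(1)}(E^{k})^{T}.
% E^{k}(E^{k})^{T} + \lambda I_{R} is symmetric positive definite for \lambda > 0,
% so A^{k+1} = X_{(1)}(E^{k})^{T}\left(E^{k}(E^{k})^{T} + \lambda I_{R}\right)^{-1}
% is the unique solution.
```

The updates for $B$ and $C$ follow the same pattern with $X_{(2)},F^{k}$ and $X_{(3)},G^{k}$ in place of $X_{(1)},E^{k}$.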

To update $\alpha$ in (4.4), we discuss the proximal operator first.

Definition 4.1

(proximal operator) Let $g:\mathbb{R}^{n}\to\mathbb{R}$ be a lower semicontinuous convex function; then the proximal operator of $g$ with parameter $\beta>0$ is defined as follows

[TABLE]

Using the proximal operator notation, the equation (4.4) is equivalent to

[TABLE]

This is easy to verify because

[TABLE]
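For $g(\alpha)=\gamma\|\alpha\|_{1}$ the proximal operator has a well-known closed form, entrywise soft-thresholding. The sketch below assumes the common convention $\text{prox}(y)=\arg\min_{x}\{g(x)+\frac{1}{2\beta}\|x-y\|^{2}\}$, under which the threshold is $\tau=\beta\gamma$; the function name `soft_threshold` is a hypothetical helper introduced for illustration.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied entry by entry."""
    out = []
    for x in v:
        if x > tau:
            out.append(x - tau)      # shrink positive entries toward zero
        elif x < -tau:
            out.append(x + tau)      # shrink negative entries toward zero
        else:
            out.append(0.0)          # entries within [-tau, tau] become zero
    return out


beta, gamma = 0.5, 2.0               # illustrative values, tau = beta * gamma = 1
y = [3.0, -0.5, 0.2, -2.0]
alpha = soft_threshold(y, beta * gamma)
print(alpha)                                  # [2.0, 0.0, 0.0, -1.0]
print(sum(1 for a in alpha if a != 0.0))      # 2
```

Entries set exactly to zero are what make the number of nonzero entries of $\alpha$ usable as a rank estimate.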

Remark 4.2

The proximal operator in (4.5) is well-defined because the function $g(\alpha)$ is continuous and convex. Using the vec operator, we have

[TABLE]

where $M\in\mathbb{R}^{IJK\times R}$ is the matrix with columns $c_{r}\otimes b_{r}\otimes a_{r}$ . Therefore we can rewrite the objective function $f$ as

[TABLE]

It is easy to calculate the gradient of (4.9) with respect to $\alpha$ :

[TABLE]

This implies the Lipschitz continuity of the gradient of $f$ with respect to $\alpha$ . The Lipschitz constant is $Q_{\alpha}=\|M^{T}M\|$ so we must have

[TABLE]
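The $\alpha$-update described in this section can be sketched as one proximal-gradient step on $\frac{1}{2}\|\text{vec}(\mathcal{X})-M\alpha\|_{2}^{2}+\gamma\|\alpha\|_{1}$. This is a minimal illustration, not the authors' implementation: `alpha_step` and its helpers are hypothetical names, and the step size uses the bound $\|M^{T}M\|\le\|M\|_{F}^{2}$ (an assumption made here to avoid computing a spectral norm), which keeps $\beta$ below $1/Q_{\alpha}$.

```python
def kron(a, b):
    """Kronecker product of two vectors."""
    return [x * y for x in a for y in b]


def column(M, r):
    """r-th column of a matrix stored as a list of rows."""
    return [row[r] for row in M]


def alpha_step(x_vec, A, B, C, alpha, gamma):
    """One proximal-gradient update of alpha with the factors held fixed."""
    R, n = len(alpha), len(x_vec)
    # Columns of M are c_r (x) b_r (x) a_r, matching vec(a o b o c).
    cols = [kron(column(C, r), kron(column(B, r), column(A, r)))
            for r in range(R)]
    # Residual M alpha - x and gradient M^T (M alpha - x).
    resid = [sum(cols[r][i] * alpha[r] for r in range(R)) - x_vec[i]
             for i in range(n)]
    grad = [sum(cols[r][i] * resid[i] for i in range(n)) for r in range(R)]
    # ||M||_F^2 is an upper bound on ||M^T M|| (assumes M is nonzero).
    q_bound = sum(v * v for col in cols for v in col)
    beta = 0.9 / q_bound
    tau = beta * gamma
    # Gradient step followed by soft-thresholding at beta * gamma.
    new_alpha = []
    for r in range(R):
        y = alpha[r] - beta * grad[r]
        if y > tau:
            new_alpha.append(y - tau)
        elif y < -tau:
            new_alpha.append(y + tau)
        else:
            new_alpha.append(0.0)
    return new_alpha


# Rank-one 2 x 2 x 2 tensor 2 * (e1 o e1 o e1), over-parameterized with R = 2.
A = B = C = [[1, 0], [0, 1]]
x_vec = [2, 0, 0, 0, 0, 0, 0, 0]
alpha = [1.0, 1.0]
for _ in range(60):
    alpha = alpha_step(x_vec, A, B, C, alpha, gamma=0.5)
print(sum(1 for v in alpha if v != 0.0))   # 1
```

In this toy problem the redundant weight is driven exactly to zero after two steps, while the remaining weight converges to $2-\gamma=1.5$, the usual shrinkage bias of $\ell_{1}$ regularization.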

5 Analysis of Convergence

In this section, we study the global convergence of the proposed algorithm under mild assumptions. The Kurdyka-Lojasiewicz property [21, 26] plays a key role in our analysis. We begin this section by stating the descent lemma.

Lemma 5.1

(Descent Lemma) Let $h:\mathbb{R}^{n}\to\mathbb{R}$ be a continuously differentiable function whose gradient $\nabla h$ is Lipschitz continuous with constant $L$; then for any $x,y\in\mathbb{R}^{n}$ we have

[TABLE]

The next lemma provides a theoretical estimate for the decrease in the objective function after a single update of $\alpha$.

Lemma 5.2

Suppose that $\alpha^{k+1}$ is obtained by equation (4.4) and $0<\beta^{k}<1/Q_{\alpha}^{k}$, where the $Q_{\alpha}^{k}$ are defined in (4.11); then there is a constant $N^{k}>0$ such that

[TABLE]

Proof. Recall that

[TABLE]

and $\alpha^{k+1}$ is obtained by the equation

[TABLE]

therefore we must have

[TABLE]

Since $\nabla_{\alpha}f$ is Lipschitz continuous with constant $Q_{\alpha}^{k}$ , by the descent lemma we have

[TABLE]

with (5.2), the above inequality implies

[TABLE]

setting

[TABLE]

proves the lemma. $\square$

Remark 5.3

Suppose that $Q_{\alpha}^{k}$ ’s are bounded from above by the constant $Q_{\alpha}$ in the previous lemma, then for fixed step-size $\beta$ where

[TABLE]

we have

[TABLE]

for each $k=1,2,\ldots$.

Definition 5.4

[14] A differentiable function $h:\mathbb{R}^{n}\to\mathbb{R}$ is called strongly convex if there is a constant $\mu>0$ such that

[TABLE]

for any $x,y\in\mathbb{R}^{n}$ .

Lemma 5.5

Suppose that $A^{k+1}$ is obtained by equation (4.1), then we have

[TABLE]

Proof. Note that the objective function in (4.1) is strongly convex with parameter $\lambda$, and by the first-order optimality condition we must have

[TABLE]

now the strong convexity of $f+\|\cdot\|_{F}^{2}$ yields

[TABLE]

which implies

[TABLE]

This proves the lemma. $\square$

Remark 5.6

Similar results hold for the blocks $B$ and $C$ , if they are updated by equations (4.2) and (4.3). In particular, we have that

The next theorem guarantees that the value of $\Psi$ decreases monotonically at each iteration, i.e., that the sequence $\{\Psi(\omega^{k})\}$ generated by the scheme (4.1), (4.2), (4.3) and (4.4) is monotonically decreasing.

Theorem 5.7

(Sufficient decrease property) Let $\Psi$ represent the objective function in (3.5) and $\omega^{k}=(A^{k},B^{k},C^{k},\alpha^{k})$; then we have

[TABLE]

for some positive constant $\rho$ . In addition we have

[TABLE]

Proof. By lemmas 5.2 and 5.5 we have

[TABLE]

Setting $\rho=\min\{\lambda/2,N_{\alpha}\}$ gives the first result. This shows that the sequence $\{\Psi(\omega^{k})\}$ generated by our algorithm is decreasing. The monotonicity of $\{\Psi(\omega^{k})\}$, together with the fact that $\Psi$ is bounded from below, implies that $\Psi(\omega^{k})$ converges to a limit $\underline{\Psi}$ as $k\to\infty$. Next, let $n>2$ be a positive integer; then

[TABLE]

letting $n\to\infty$ proves the last statement. $\square$

Remark 5.8

The sequence $\{\omega^{k}\}$ generated by the scheme (4.1)-(4.4) is bounded. Indeed, $\{\omega^{k}\}$ can be unbounded only if at least one of the blocks $A,B,C$ or $\alpha$ becomes unbounded, and this cannot happen because of the regularization terms in the objective function $\Psi$ and the fact that $\Psi(\omega^{k})$ is non-increasing.

Theorem 5.9

Let $\{\omega^{k}\}_{k\in\mathbb{N}}$ be the sequence generated by our algorithm, then there exists a positive constant $\rho>0$ such that for any $k\in\mathbb{N}$ there is a vector $\eta^{k+1}\in\partial\Psi(\omega^{k+1})$ such that

[TABLE]

Proof. Let $k$ be a positive integer. By equations (4.1), (4.2), (4.3) and the first order optimality condition we have

[TABLE]

and

[TABLE]

define

[TABLE]

then $\eta_{1}^{k+1}=\nabla_{A}\Psi(\omega^{k+1})$. Similarly, we can define vectors $\eta_{2}^{k+1}$ and $\eta_{3}^{k+1}$. Next, by equation (4.4), we have that

[TABLE]

hence by the optimality condition, there exists $u\in\partial g(\alpha^{k+1})$ such that

[TABLE]

define

[TABLE]

so $\eta_{4}^{k+1}\in\partial\Psi_{\alpha}(\omega^{k+1})$ . From these facts we have that

[TABLE]

We now estimate the norm of $\eta^{k+1}$. First note that by Remark 5.8, $\{\omega^{k}\}$ is bounded, and the objective function (without the $\ell_{1}$ regularization term) is twice continuously differentiable; therefore, as a consequence of the mean value theorem, $\nabla f$ is Lipschitz continuous on bounded sets. Hence there must exist a constant $P_{1}$ such that

[TABLE]

similarly, constants $P_{2}$ and $P_{3}$ exist such that

[TABLE]

and

[TABLE]

setting $\nu=\max\{P_{1},P_{2},P_{3},P_{4}\}$ , gives us the result. $\square$

Let $f:\mathbb{R}^{n}\to\mathbb{R}$ be a continuous function. The function $f$ is said to have the Kurdyka-Lojasiewicz (KL) property at a point $\hat{x}\in\operatorname{dom}\partial f$ if there exists $\theta\in[0,1)$ such that

[TABLE]

is bounded around $\hat{x}$ [44]. A very rich class of functions satisfying the KL property is the class of semi-algebraic functions. These are functions whose graphs can be expressed as a semi-algebraic set, that is,

[TABLE]

where $P_{ij}$ ’s and $Q_{ij}$ ’s are polynomial functions and the graph of $f$ is defined by

[TABLE]

Note that the univariate function $g(x)=|x|$ is semialgebraic because

[TABLE]

The class of semi-algebraic functions is closed under addition and composition [2]. Hence the objective function in (3.5) is semi-algebraic, and therefore it satisfies the KL property.

Theorem 5.10

Suppose that $\{\omega^{k}\}_{k\in\mathbb{N}}$ is the sequence generated by our algorithm; then $\{\omega^{k}\}_{k\in\mathbb{N}}$ converges to a critical point of $\Psi$.

6 Numerical Experiment and Results

In this section we test our algorithm on tensors with different ranks and dimensions. We randomly generate tensors with specified ranks and compare the performance of our algorithm with other available algorithms such as LRAT [42]. Next, we apply our algorithm to single moving object videos in order to extract the background and the target object.

6.1 Tensor Rank Approximation

In this subsection we test the performance of our algorithm on randomly generated cubic tensors with various dimensions and ranks. The upper bound for the rank of each tensor is set equal to $\min\{IJ,JK,IK\}$. The results are shown in Table 1.

6.2 Comparison between LRAT and our algorithm

In this subsection, we compare the performance of our proposed algorithm to LRAT [42]. We generate a random cubic tensor $\mathcal{A}\in\mathbb{R}^{5\times 5\times 5}$ whose rank is equal to five. The comparison is based on the residual function as well as the sparsity of the vector $\alpha$. The upper bound for the rank of the tested tensor is set equal to ten for both algorithms.

6.3 Application in background extraction of single moving object videos

In this subsection we apply our algorithm to extract the background of videos. See Figure 1. The video example [4, 33] is a $48\times 48\times 51$ tensor with rank $23$. The relative residual error $\|\mathcal{X}-\sum_{r=1}^{R}\alpha_{r}\,a_{r}\circ b_{r}\circ c_{r}\|_{F}^{2}$ is $10^{-8}$.

7 Conclusion

We presented an iterative algorithm for approximating the tensor rank and CP decomposition based on a sparse optimization problem. Specifically, we apply a Tikhonov regularization method for finding the decomposition and a proximal algorithm for the tensor rank. We have also provided a convergence analysis and numerical experiments on color images and videos. Overall, this new sparse tensor model and its computational method improve the accuracy and memory requirements relative to the model in [42].

Bibliography

[1] E. Acar, C. A. Bingol, H. Bingol, R. Bro, and B. Yener, Multiway analysis of epilepsy tensors, Bioinformatics, 23 (13), pp. i10-i18, 2007.
[2] H. Attouch, J. Bolte, and B. F. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods, Math. Program. Ser. A, 137, pp. 91-129, 2013.
[3] J. Bolte, S. Sabach, and M. Teboulle, Proximal alternating linearized minimization for nonconvex and nonsmooth problems, Math. Program., 146, pp. 459-494, 2014.
[4] T. Bouwmans, A. Sobral, S. Javed, S.-K. Jung, and E.-H. Zahzah, Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset, Computer Science Review, 23, pp. 1-71, 2017.
[5] J. Brachat, P. Comon, B. Mourrain, and E. Tsigaridas, Symmetric Tensor Decomposition, Linear Algebra and Applications, 433 (11-12), pp. 1851-1872, 2010.
[6] J. Carroll and J. Chang, Analysis of Individual Differences in Multidimensional Scaling via an $N$-way Generalization of “Eckart-Young” Decomposition, Psychometrika, 9, pp. 267-283, 1970.
[7] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, Symmetric tensors and symmetric tensor rank, SIAM Journal on Matrix Analysis and Applications, 30 (3), pp. 1254-1279, 2008.
[8] P. Comon, Tensor decompositions, in Mathematics in Signal Processing V, J. G. McWhirter and I. K. Proudler, eds., Clarendon Press, Oxford, UK, 2002, pp. 1-24.
