The Multi-Dimensional Decomposition with Constraints

Ilgis Ibragimov; Elena Ibragimova

arXiv:1701.08544·math.SP·June 6, 2017


TL;DR

This paper introduces a novel constrained multi-dimensional matrix decomposition method that simplifies the optimization problem, enabling efficient gradient computation and effective convergence in three-way decomposition tasks.

Contribution

It presents a new approach transforming a complex matrix approximation problem into a simpler one with fewer unknowns, improving computational efficiency and convergence.

Findings

01. Gradient computation complexity is only four times the function evaluation.

02. The new algorithm requires minimal additional memory.

03. Successful application to three-way decomposition with good convergence results.

Abstract

We search for the best fit in the Frobenius norm of $A \in \mathbb{C}^{m \times n}$ by a matrix product $B C^{*}$, where $B \in \mathbb{C}^{m \times r}$ and $C \in \mathbb{C}^{n \times r}$, $r \leq m$, so that $B = \{b_{ij}\}$ ($i = 1, \dots, m$, $j = 1, \dots, r$) is defined by some unknown parameters $\sigma_{1}, \dots, \sigma_{k}$, $k \ll mr$, and all partial derivatives $\frac{\partial b_{ij}}{\partial \sigma_{l}}$ are well defined, bounded, and can be computed analytically. We show that this problem transforms to a new minimization problem with only $k$ unknowns, with analytical computation of the gradient of the minimized function with respect to all $\sigma$. The complexity of computing the gradient is only 4 times greater than the complexity of computing the function, and this new algorithm needs only $3 m r$ additional words of memory. We apply this approach to the solution of the three-way decomposition problem and…

Equations

$\displaystyle \min_{C,\sigma_{1},\dots,\sigma_{k}}\|A-B(\bar{\sigma})C^{*}\|_{F}^{2},$

$\displaystyle \min_{B,\sigma}\sum_{j=1}^{J}\sum_{k=1}^{K}\left\|a_{jk}-\sum_{l=1}^{L}b_{jl}e^{i\sigma_{kl}}\right\|_{2}^{2},$

$\displaystyle \min_{\sigma_{1},\dots,\sigma_{k}}\|A-B(B^{*}B)^{-1}B^{*}A\|_{F}^{2}=\min_{\sigma_{1},\dots,\sigma_{k}}\left(\|A\|_{F}^{2}-\|A^{*}Q(B)\|_{F}^{2}\right),$

$\displaystyle \begin{pmatrix}I_{mr\times mr}&0&0\\ F_{mr\times\frac{mr(r+1)}{2}}&L_{\frac{mr(r+1)}{2}\times\frac{mr(r+1)}{2}}&0\\ 0&h_{\frac{mr(r+1)}{2}}^{*}&1\end{pmatrix}\begin{pmatrix}\frac{\partial}{\partial b_{ij}}\\ \hat{g}^{*}\end{pmatrix}=\begin{pmatrix}I_{mr\times mr}\\ 0\end{pmatrix}\quad{\rm or}$

$\displaystyle \begin{pmatrix}I_{mr\times mr}&F_{mr\times\frac{mr(r+1)}{2}}^{*}&0\\ 0&L_{\frac{mr(r+1)}{2}\times\frac{mr(r+1)}{2}}^{*}&h_{\frac{mr(r+1)}{2}}\\ 0&0&1\end{pmatrix}\begin{pmatrix}\hat{g}\\ *\\ \vdots\\ *\end{pmatrix}=\begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix},$

Taxonomy

Topics: Tensor decomposition and applications · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research

Full text

THE MULTI-DIMENSIONAL DECOMPOSITION WITH CONSTRAINTS

Ilgis Ibragimov, Elena Ibragimova

Elegant Mathematics LLC, 82834 WY USA &

Elegant Mathematics Ltd, 66564 Germany

e-mail: [email protected]

ABSTRACT

We search for the best fit in the Frobenius norm of $A\in\mathbb{C}^{m\times n}$ by a matrix product $BC^{*}$, where $B\in\mathbb{C}^{m\times r}$ and $C\in\mathbb{C}^{n\times r}$, with $r\leq m$, so that $B=\{b_{ij}\}_{i=1,\ldots,m;\;j=1,\ldots,r}$ is defined by some unknown parameters $\sigma_{1},\dots,\sigma_{k}$, $k\ll mr$, and all partial derivatives $\displaystyle\frac{\partial b_{ij}}{\partial\sigma_{l}}$ are well defined, bounded, and can be computed analytically.

We show that this problem transforms to a new minimization problem with only $k$ unknowns, with analytical computation of the gradient of the minimized function with respect to all $\sigma$. The complexity of computing this gradient is only 4 times greater than the complexity of computing the function, and this new algorithm needs only $3mr$ additional words of memory.

We apply this approach to the solution of the three-way decomposition problem and obtain good convergence results with the Broyden algorithm.

INTRODUCTION

Suppose we have $A\in\mathbb{C}^{m\times n}$. The idea is to find $B\in\mathbb{C}^{m\times r}$ and $C\in\mathbb{C}^{n\times r}$, $r\leq m$, that solve

$\displaystyle \min_{C,\sigma_{1},\dots,\sigma_{k}}\|A-B(\bar{\sigma})C^{*}\|_{F}^{2},\qquad(1)$

where $B=\{b_{ij}\}_{i=1,\ldots,m;\;j=1,\ldots,r}$ is defined by unknown parameters $\sigma_{1},\dots,\sigma_{k}$, $k\ll mr$, and all partial derivatives $\displaystyle\frac{\partial b_{ij}}{\partial\sigma_{l}}$ are well defined, bounded, and can be computed analytically.

This problem occurs in statistics [1], nuclear magnetic resonance [2], spectroscopy, and multi-dimensional decomposition [3]. Consider one popular application [4]: a low-rank approximation of a two-dimensional or multidimensional data array in which one factor matrix contains vectors formed from complex exponentials:

[TABLE]

and

[TABLE]

Since the number of parameters $\sigma$ is usually several orders of magnitude smaller than the number of entries of $B$, it is highly desirable to minimize over $\sigma$ only, which reduces the computational complexity.

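If we freeze $B$, then problem (1) is a linear least-squares problem in $C$, with solution $C=A^{*}B(B^{*}B)^{-1}$. Substituting this $C$ turns problem (1) into a new nonlinear problem with only $k$ unknowns:

$\displaystyle \min_{\sigma_{1},\dots,\sigma_{k}}\|A-B(B^{*}B)^{-1}B^{*}A\|_{F}^{2}=\min_{\sigma_{1},\dots,\sigma_{k}}\left(\|A\|_{F}^{2}-\|A^{*}Q(B)\|_{F}^{2}\right),\qquad(4)$

where the columns of $Q(B)\in\mathbb{C}^{m\times r}$ form an orthonormal basis of the column space of $B$.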
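The identity behind (4) can be checked numerically. The following is a minimal sketch, assuming NumPy and random complex test matrices; the function names `residual_direct` and `residual_projected` are hypothetical, and `numpy.linalg.qr` is used here only as a convenient way to get an orthonormal basis of the column space of $B$.

```python
import numpy as np

def residual_direct(A, B):
    """||A - B C^*||_F^2 with the optimal C = A^* B (B^* B)^{-1} for fixed B."""
    C = A.conj().T @ B @ np.linalg.inv(B.conj().T @ B)
    return np.linalg.norm(A - B @ C.conj().T, "fro") ** 2

def residual_projected(A, B):
    """||A||_F^2 - ||A^* Q||_F^2, where Q is an orthonormal basis of range(B)."""
    Q, _ = np.linalg.qr(B)  # reduced QR: Q has the same shape as B
    return np.linalg.norm(A, "fro") ** 2 - np.linalg.norm(A.conj().T @ Q, "fro") ** 2

# Illustrative sizes only (assumption): m = 8, n = 6, r = 3.
rng = np.random.default_rng(0)
m, n, r = 8, 6, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))

print(residual_direct(A, B), residual_projected(A, B))
```

Both values agree up to rounding, so $C$ never has to be formed during the minimization.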

The main difficulty in applying minimization methods to (4) is computing the gradient of the function with respect to all $\sigma$. The finite difference method needs $k$ or $2k$ evaluations of this function for one evaluation of the gradient, and its result is not accurate. A good alternative is the Baur-Strassen (BS) method [6], which computes the gradient of a function in only $5n$ operations if the function itself can be computed in $n$ simple arithmetical operations with no more than 2 operands each (here $n$ counts operations, not the column dimension of $A$). The big disadvantage of the BS method is its memory requirement: it needs ${\cal O}(n)$ words of memory, which is too much for most applications.
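To make the cost of finite differences concrete, here is a minimal sketch that counts function evaluations for forward and central differences. The helper names `Counted` and `fd_gradient`, the step size `h`, and the toy function are illustrative assumptions, not part of the paper.

```python
import numpy as np

class Counted:
    """Wrap a function and count how many times it is evaluated."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.func(x)

def fd_gradient(func, sigma, h=1e-6, central=False):
    """Finite-difference gradient of a scalar function of k parameters."""
    sigma = np.asarray(sigma, dtype=float)
    grad = np.empty_like(sigma)
    f0 = None if central else func(sigma)  # base value, forward scheme only
    for l in range(sigma.size):
        e = np.zeros_like(sigma)
        e[l] = h
        if central:
            grad[l] = (func(sigma + e) - func(sigma - e)) / (2 * h)
        else:
            grad[l] = (func(sigma + e) - f0) / h
    return grad

# Toy function with a known gradient cos(sigma); k = 5 parameters.
sigma0 = np.linspace(0.0, 1.0, 5)
forward = Counted(lambda s: float(np.sum(np.sin(s))))
central = Counted(lambda s: float(np.sum(np.sin(s))))
g_fwd = fd_gradient(forward, sigma0)
g_ctr = fd_gradient(central, sigma0, central=True)
print(forward.calls, central.calls)  # k extra evaluations vs. 2k evaluations
```

The forward scheme needs $k$ evaluations on top of the base value and the central scheme needs $2k$, and both carry a truncation error that depends on the step size.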

We suggest a new approach for computing the gradient of a function. This approach is based on the BS method and applies it to Modified Gram-Schmidt (MGS) orthogonalization with low memory requirements.

ALGORITHM

To compute (4), we perform the following steps:

1) create $B$ from $\sigma_{1},\dots,\sigma_{k}$;

2) compute the orthonormal subspace $Q$ of $B$;

3) compute (4).

In this article, we discuss how to compute the gradient $\hat{g}\in\mathbb{C}^{mr}$ of (4) with respect to all entries of $B$. We use both $G\in\mathbb{C}^{m\times r}$ and $\hat{g}$ for the same data: $\hat{g}$ is $G$ stored as a vector. We assume that the dependence of $B$ on $\sigma_{1},\dots,\sigma_{k}$ is simple enough that the gradient of (4) with respect to $\sigma_{1},\dots,\sigma_{k}$ can be computed once $G$ is known.

Steps 2 and 3 require $mr$ additional words of memory and $2mr(r+n)$ arithmetical operations when the MGS algorithm is used for step $2$. The BS algorithm can compute the gradient with the same order of arithmetical complexity but needs $4mr(r+n)$ additional words of memory.

Let us consider the computation of (4) from $B$. Let $B=[b_{1},\dots,b_{r}]$ be the initial matrix and $Q=[q_{1},\dots,q_{r}]$ the orthonormal basis that we are going to compute. Then

$\displaystyle q_{1}=\frac{b_{1}}{||b_{1}||_{2}}$

$do$ $i=2$ , $r$

$u=b_{i},$

$do$ $j=1$ , $i-1$

$u=u-q_{j}q_{j}^{*}u$

$enddo$

$\displaystyle q_{i}=\frac{u}{||u||_{2}}$

$enddo$

$\displaystyle f=\sqrt{||A||_{F}^{2}-\sum_{i=1}^{r}||A^{*}q_{i}||_{2}^{2}}$
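The loop above can be sketched in NumPy as follows. This is a minimal illustration under the assumption that $B$ has full column rank; the function names `mgs` and `objective` are hypothetical.

```python
import numpy as np

def mgs(B):
    """Modified Gram-Schmidt: orthonormalize the columns of B one by one."""
    m, r = B.shape
    dtype = np.result_type(B, np.float64)
    Q = np.zeros((m, r), dtype=dtype)
    for i in range(r):
        u = B[:, i].astype(dtype)  # astype returns a copy, B is untouched
        for j in range(i):
            # u = u - q_j q_j^* u  (np.vdot conjugates its first argument)
            u -= Q[:, j] * np.vdot(Q[:, j], u)
        Q[:, i] = u / np.linalg.norm(u)
    return Q

def objective(A, B):
    """f = sqrt(||A||_F^2 - sum_i ||A^* q_i||_2^2)."""
    Q = mgs(B)
    val = np.linalg.norm(A) ** 2 - np.linalg.norm(A.conj().T @ Q) ** 2
    return np.sqrt(max(val, 0.0))  # guard against tiny negative rounding
```

The value returned by `objective` equals $\|A-QQ^{*}A\|_{F}$, the distance from $A$ to its projection onto the column space of $B$.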

Let us construct the gradient of $f$ with respect to $B$. We denote by ${\bf d}y_{i}\in\mathbb{C}^{m}$ the vector of derivatives: the $k$-th element of this vector contains the derivative of the $k$-th element of the vector $y_{i}$. The gradient then satisfies the following formulas:

$\displaystyle{\bf d}q_{1}=\frac{1}{||b_{1}||_{2}}(I-q_{1}q_{1}^{*}){\bf d}b_{1}$

$do$ $i=2$ , $r$

$u=b_{i}$

$do$ $j=1$ , $i-1$

${\bf d}u_{new}=(I-q_{j}q_{j}^{*}){\bf d}u_{old}-(q_{j}^{*}u_{old}I+u_{old}q_{j}^{*}){\bf d}q_{j}$

$enddo$

$\displaystyle{\bf d}q_{i}=\frac{1}{||u||_{2}}(I-q_{i}q_{i}^{*}){\bf d}u$

$enddo$

$\displaystyle{\bf d}f=-\frac{1}{f}\sum_{i=1}^{r}q_{i}^{*}AA^{*}{\bf d}q_{i}$

We can write all these equations in matrix notation:

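$\displaystyle \begin{pmatrix}I_{mr\times mr}&0&0\\ F_{mr\times\frac{mr(r+1)}{2}}&L_{\frac{mr(r+1)}{2}\times\frac{mr(r+1)}{2}}&0\\ 0&h_{\frac{mr(r+1)}{2}}^{*}&1\end{pmatrix}\begin{pmatrix}\frac{\partial}{\partial b_{ij}}\\ \hat{g}^{*}\end{pmatrix}=\begin{pmatrix}I_{mr\times mr}\\ 0\end{pmatrix}\quad{\rm or}$

$\displaystyle \begin{pmatrix}I_{mr\times mr}&F_{mr\times\frac{mr(r+1)}{2}}^{*}&0\\ 0&L_{\frac{mr(r+1)}{2}\times\frac{mr(r+1)}{2}}^{*}&h_{\frac{mr(r+1)}{2}}\\ 0&0&1\end{pmatrix}\begin{pmatrix}\hat{g}\\ *\\ \vdots\\ *\end{pmatrix}=\begin{pmatrix}1\\ 0\\ \vdots\\ 0\end{pmatrix},\qquad(5)$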

where $L_{\frac{mr(r+1)}{2}\times\frac{mr(r+1)}{2}}$ is a lower block triangular matrix with block size $m\times m$ . This matrix has $I_{m\times m}$ blocks on the diagonal. The block matrix $[F,L]$ contains no more than 3 nonzero blocks in each row (see Fig. 1 for one example with $4$ vectors). We marked with $*$ the elements that we are not interested in; $\hat{g}$ is the vector of the gradient of the original function over all $b_{ij}$ . To compute $\hat{g}$ , we solve the linear system (5). If we create $L$ and $F$ matrices, then we need at least $mr(r+3)$ words to solve $L$ .

[TABLE]

Figure 1. The matrix $[F,L]$ when $B$ has $4$ vectors. Here $S=\frac{I-uu^{*}}{||u||_{2}}$, $V=q^{*}uI+uq^{*}$, $W=qq^{*}-I$.

We suggest an improvement that needs only $4mr$ words to store some parts of $L$ and $F$ but still computes the solution. Note that in the loop over the variable $j$, the vector $u$ is updated $(i-1)$ times. If we store the matrices $B$ and $Q$, we can recompute all updates of $u$ from this loop for a particular $i$ with $2(i-1)m$ additional arithmetical operations and store them in one additional array $T\in\mathbb{C}^{m\times r}$. Then, during backward substitution, we recompute the updates of $u$ only when we need them. This occurs once for each $i=r,\dots,2$, that is, $r-1$ times. All multiplications by the matrices $S$, $V$, and $W$ need ${\cal O}(m)$ arithmetical operations. Thus, we need only the arrays $B$, $Q$, $T$, and $G$, each of size $m\times r$, for this computation. Here is the algorithm:

$G=-\frac{1}{f}AA^{*}Q$

$do$ $i=r$ , $1$ , $-1$

$t_{1}=b_{i}$

$do$ $j=1$ , $i-1$

$z_{j}=q_{j}^{*}t_{j}$

$t_{j+1}=t_{j}-z_{j}q_{j}$

$enddo$

$\displaystyle g_{i}=\frac{g_{i}-q_{i}q_{i}^{*}g_{i}}{||t_{i}||_{2}}$

$do$ $j=i-1$ , $1$ , $-1$

$\alpha=q_{j}^{*}g_{i}$

$g_{j}=g_{j}-z_{j}^{*}g_{i}-\alpha^{*}t_{j}$

$g_{i}=g_{i}-\alpha q_{j}$

$enddo$

$enddo$

Here we use the $Z=(z_{1},\dots,z_{r})\in\mathbb{C}^{r}$ array with only $r$ elements for better performance.
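The backward sweep can be sketched in NumPy and checked against central finite differences. This is a minimal sketch under a simplifying assumption: $A$ and $B$ are real, so every $^{*}$ becomes a plain transpose. The function name `f_and_grad` is hypothetical, and the arrays `T` and `z` play the roles of $T$ and $Z$ in the text.

```python
import numpy as np

def f_and_grad(A, B):
    """Return f and G = df/dB for real A (m x n) and real B (m x r)."""
    m, r = B.shape
    Q = np.zeros((m, r))
    for i in range(r):  # forward pass: MGS
        u = B[:, i].copy()
        for j in range(i):
            u -= Q[:, j] * (Q[:, j] @ u)
        Q[:, i] = u / np.linalg.norm(u)
    AtQ = A.T @ Q
    f = np.sqrt(np.sum(A * A) - np.sum(AtQ * AtQ))
    G = -(A @ AtQ) / f  # G = -(1/f) A A^T Q
    T = np.zeros((m, r))  # recomputed intermediate vectors t_1, ..., t_i
    z = np.zeros(r)       # projection coefficients z_j = q_j^T t_j
    for i in range(r - 1, -1, -1):  # backward pass, last column first
        T[:, 0] = B[:, i]
        for j in range(i):
            z[j] = Q[:, j] @ T[:, j]
            T[:, j + 1] = T[:, j] - z[j] * Q[:, j]
        gi = G[:, i]
        # undo the normalization q_i = t_i / ||t_i||
        gi = (gi - Q[:, i] * (Q[:, i] @ gi)) / np.linalg.norm(T[:, i])
        for j in range(i - 1, -1, -1):
            alpha = Q[:, j] @ gi
            G[:, j] -= z[j] * gi + alpha * T[:, j]
            gi = gi - alpha * Q[:, j]
        G[:, i] = gi
    return f, G

# Illustrative sizes only (assumption): m = 7, n = 5, r = 3.
rng = np.random.default_rng(0)
A = rng.standard_normal((7, 5))
B = rng.standard_normal((7, 3))
f, G = f_and_grad(A, B)
```

Only the four $m\times r$ arrays `B`, `Q`, `T`, and `G` are kept, in line with the $4mr$ words quoted above. The complex case needs a convention for derivatives with respect to complex entries, which this sketch does not attempt.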

The total arithmetical complexity of the computation of the gradient is $4mr(2r+n)$ operations. If we compare this with MGS ( $2mr(r+n)$ ), it is less than 4 times greater.
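As a quick check of this bound, using only the two operation counts quoted above, the ratio of the gradient cost to the MGS cost can be written out explicitly:

$\displaystyle \frac{4mr(2r+n)}{2mr(r+n)}=\frac{2(2r+n)}{r+n}=4-\frac{2n}{r+n}<4.$

For example, $r=n$ gives a ratio of $3$, and the ratio approaches $4$ only when $r\gg n$.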

We obtain similar results for the classical Gram-Schmidt (not MGS) orthogonalization: it needs $\displaystyle 3mr+\frac{r(r+1)}{2}$ words of memory and $2mr(3r+2n)$ operations, but because of stability issues we do not recommend using it.

NUMERICAL EXPERIMENTS

First we compare the general characteristics of our new approach with those of well-known approaches. We create the complex matrices $A$ and $B$ with random numbers, compute derivatives for different problem sizes by our new methods based on the Gram-Schmidt (AGS) and Modified Gram-Schmidt (AMGS) algorithms, and compare our methods with the finite difference (FD) and Baur-Strassen (BS) methods (Tables 1, 2).

Furthermore, we show how these algorithms behave in practice. We perform a set of experiments and record the number of iterations needed for convergence of the Broyden method [7]. In this set of experiments, the matrix $B$ is a real matrix with $b_{i}=p_{i}\otimes q_{i}\in\mathbb{R}^{n^{2}}$, $i=1,\dots,r$, where $\otimes$ is the Kronecker product of vectors and $p_{i},q_{i}\in\mathbb{R}^{n}$ are unknown vectors. We vary $n\in[2,20]$ and $r\in[2,20]$ (Table 3). This problem occurs in the three-way decomposition [3, 5].
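For this Kronecker structure, the step from the gradient $G$ over the entries of $B$ to the gradient over the unknown vectors is a short chain rule. The following is a minimal sketch, assuming NumPy; the names `build_B` and `chain_rule` are hypothetical, and the second factor is stored in an array `W` to avoid a clash with the orthonormal vectors $q_{i}$ used earlier.

```python
import numpy as np

def build_B(P, W):
    """Columns b_i = p_i (Kronecker product) w_i, so B has shape (n*n, r)."""
    r = P.shape[1]
    return np.stack([np.kron(P[:, i], W[:, i]) for i in range(r)], axis=1)

def chain_rule(G, P, W):
    """Map G = df/dB to the gradients with respect to P and W."""
    n, r = P.shape
    dP = np.empty_like(P)
    dW = np.empty_like(W)
    for i in range(r):
        # np.kron(p, w)[a * n + c] = p[a] * w[c], so column i of G
        # reshapes to an n x n matrix M with M[a, c] = df/d(p[a] * w[c]).
        M = G[:, i].reshape(n, n)
        dP[:, i] = M @ W[:, i]    # df/dp[a] = sum_c M[a, c] * w[c]
        dW[:, i] = M.T @ P[:, i]  # df/dw[c] = sum_a M[a, c] * p[a]
    return dP, dW
```

Here the number of unknowns is $2nr$, while $B$ has $n^{2}r$ entries, which is the situation $k\ll mr$ assumed in the introduction.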

Hence, our new method (AMGS) is stable enough (like the BS method), yet also up to a thousand times faster than BS and FD methods and does not require much additional memory (only 4 times more than FD).

Table 1. Memory requirements (in words) for FD, BS, AGS, and AMGS methods.

[TABLE]

Table 2. Computational time of FD, BS, AGS, and AMGS methods.

[TABLE]

Table 3. The dependence of the total number of iterations in the Broyden method on the method of gradient computation and problem size for the first series of experiments ( $N_{u}$ is the total number of unknowns).

[TABLE]

Bibliography

[1] Harshman R., Ladefoged P. and Goldstein L., Factor analysis of tongue shapes. J. Acoust. Soc. Am., 1977, 62:693.
[2] Jaravine V., Ibraghimov I., Orekhov V., Removal of a time barrier for high-resolution multidimensional NMR spectroscopy. Nature, 2006, 3:605–607.
[3] Ibraghimov I., A new approach to solution of SVD-like approximation problem. ENUMATH 99, 2000, 548–555.
[4] Ibragimova, Ibragimov, The ELEGANT NMR Spectrometer. arXiv:1706.00237, 2017.
[5] Ibraghimov I., Application of the three-way decomposition for matrix compression. Numer. Lin. Alg. Appl., 2002, 9:551–565.
[6] Baur W., Strassen V., The complexity of partial derivatives. Theor. Comput. Sci., 1983, 22:317–330.
[7] Dennis J.E., Schnabel R.B., Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, 1983.