The Multi-Dimensional Decomposition with Constraints
Ilgis Ibragimov, Elena Ibragimova

TL;DR
This paper introduces a novel constrained multi-dimensional matrix decomposition method that simplifies the optimization problem, enabling efficient gradient computation and effective convergence in three-way decomposition tasks.
Contribution
It presents a new approach transforming a complex matrix approximation problem into a simpler one with fewer unknowns, improving computational efficiency and convergence.
Findings
Computing the gradient costs only about four times as much as evaluating the function.
The new algorithm requires minimal additional memory.
Successful application to three-way decomposition with good convergence results.
THE MULTI-DIMENSIONAL DECOMPOSITION WITH CONSTRAINTS
Ilgis Ibragimov, Elena Ibragimova
Elegant Mathematics LLC, 82834 WY USA &
Elegant Mathematics Ltd, 66564 Germany
e-mail: [email protected]
ABSTRACT
We search for the best fit in the Frobenius norm of by a matrix product , where and , with so that $B=\{b_{ij}\}$, $i=1,\ldots,m$, $j=1,\ldots,r$, is defined by some unknown parameters , , and all partial derivatives of are defined, bounded, and can be computed analytically.
We show that this problem transforms into a new minimization problem with only unknowns, for which the gradient of the minimized function over all can be computed analytically. Computing this gradient costs only 4 times as much as computing the function, and the new algorithm needs only additional words in memory.
We apply this approach to the three-way decomposition problem and obtain good convergence results for the Broyden algorithm.
INTRODUCTION
Suppose we have . The idea is to find and , so
[TABLE]
that $B=\{b_{ij}\}$, $i=1,\ldots,m$, $j=1,\ldots,r$, is defined by unknown parameters , , and all partial derivatives of are defined, bounded, and can be computed analytically.
This problem occurs in statistics [1], nuclear magnetic resonance [2], spectroscopy, and multi-dimensional decomposition [3]. Consider one popular application [4]: a low-rank approximation of two- and multidimensional data arrays with one factor matrix containing vectors formed as complex exponentials:
[TABLE]
and
[TABLE]
Since the total number of minimization parameters is usually several orders of magnitude smaller than the total number of minimization parameters in , it is highly desirable to perform the minimization over only to save computation.
If we freeze , then this function is linear in , and . The problem (1) then turns into a new nonlinear problem with only unknowns:
[TABLE]
where contains the orthonormal subspace from .
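The reduction from (1) to (4) can be sketched as a short derivation. This is a reconstruction under stated assumptions, because the displayed equations are not reproduced in this text: we assume (1) has the form of a Frobenius-norm fit of $A$ by $BC^{*}$, that $B$ has full column rank, and that $B = QR$ is its thin QR factorization. The symbols $A$, $C$, $Q$, and $R$ are our notation and may differ from the authors'.

```latex
\min_{C}\,\|A - BC^{*}\|_F^2
  \;=\; \|A - QQ^{*}A\|_F^2
  \;=\; \|A\|_F^2 - \|Q^{*}A\|_F^2,
\qquad\text{attained at}\qquad
C^{*} = R^{-1}Q^{*}A .
```

Under these assumptions the inner problem is an ordinary linear least-squares problem, so only the parameters that define $B$ remain as unknowns of the outer nonlinear minimization.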
The main difficulty in applying minimization methods to (4) is the computation of the gradient of the function over all . The finite difference method needs or computations of this function for one evaluation of the gradient and cannot be considered accurate. A good alternative is the Baur-Strassen (BS) method [6], which computes the gradient of a function using only operations, provided the original function can be computed by simple arithmetical operations with no more than 2 operands each. The big disadvantage of the BS method is its memory requirement: it needs words in memory, which is too much for most applications.
We suggest a new approach for computing the gradient of a function. This approach is based on the BS method and uses Modified Gram-Schmidt (MGS) orthogonalization with low memory requirements.
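The trade-off described above can be illustrated with a toy example. The sketch below is not the authors' code: it assumes a hypothetical test function `objective` and a minimal tape-based reverse sweep in the spirit of the BS method. It shows that forward differences need one extra function evaluation per unknown, while the reverse sweep needs a single pass but keeps every intermediate result in memory.

```python
class Var:
    """Scalar whose arithmetic is recorded on a shared tape (reverse mode)."""

    def __init__(self, value, tape, parents=()):
        self.value = value
        self.parents = parents  # pairs (input Var, local partial derivative)
        self.grad = 0.0
        self.tape = tape
        tape.append(self)       # every intermediate result stays in memory

    def __add__(self, other):
        return Var(self.value + other.value, self.tape,
                   ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, self.tape,
                   ((self, other.value), (other, self.value)))


def objective(x):
    """Hypothetical test function built from binary + and * only."""
    total = x[0] * x[0]
    for i in range(1, len(x)):
        total = total + x[i] * x[i] + x[i - 1] * x[i]
    return total


def reverse_gradient(values):
    """One forward and one backward sweep; returns (gradient, tape length)."""
    tape = []
    xs = [Var(v, tape) for v in values]
    out = objective(xs)
    out.grad = 1.0
    # Nodes were appended in evaluation order, so the reversed tape
    # visits each node after all nodes that depend on it.
    for node in reversed(tape):
        for parent, partial in node.parents:
            parent.grad += partial * node.grad
    return [x.grad for x in xs], len(tape)


def forward_difference_gradient(f, values, h=1e-6):
    """One extra evaluation per unknown; returns (gradient, evaluations)."""
    f0 = f(values)
    grad = []
    for i in range(len(values)):
        shifted = list(values)
        shifted[i] += h
        grad.append((f(shifted) - f0) / h)
    return grad, len(values) + 1
```

Calling `reverse_gradient([0.5, -1.0, 2.0])` and `forward_difference_gradient(objective, [0.5, -1.0, 2.0])` returns two gradients that agree up to the truncation error of the difference quotient. The tape length grows with the number of arithmetic operations, which is the memory cost the text attributes to the BS method.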
ALGORITHM
To compute (4), we perform the following steps:
- 1) create from ;
- 2) compute orthonormal subspace in ;
- 3) compute (4).
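The three steps above can be sketched in plain Python. Several things here are assumptions rather than the authors' definitions: the model `b_ij = exp(sigma_j * i)` stands in for the complex-exponential factor matrix mentioned in the introduction, the objective is taken to be the squared Frobenius norm of the part of the data matrix outside the span of the orthonormal basis, and the names `build_factor`, `mgs`, `residual`, and `cost` are hypothetical.

```python
import cmath
import math


def build_factor(sigmas, m):
    """Step 1 (assumed model): b_ij = exp(sigma_j * i), as a list of m rows."""
    return [[cmath.exp(s * i) for s in sigmas] for i in range(m)]


def mgs(B):
    """Step 2: Modified Gram-Schmidt; returns the orthonormal columns."""
    m, r = len(B), len(B[0])
    columns = [[B[i][j] for i in range(m)] for j in range(r)]
    basis = []
    for v in columns:
        for q in basis:
            # The coefficient is taken from the already updated vector v;
            # this is what distinguishes MGS from classical Gram-Schmidt.
            coeff = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
            v = [vi - coeff * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def residual(A, basis):
    """Step 3 (assumed form): ||A||_F^2 - ||Q^H A||_F^2."""
    m, n = len(A), len(A[0])
    total = sum(abs(A[i][k]) ** 2 for i in range(m) for k in range(n))
    for q in basis:
        for k in range(n):
            proj = sum(q[i].conjugate() * A[i][k] for i in range(m))
            total -= abs(proj) ** 2
    return total


def cost(sigmas, A):
    """Objective as a function of the parameters only."""
    return residual(A, mgs(build_factor(sigmas, len(A))))
```

If the data matrix is generated exactly from the assumed model, `cost` is zero up to rounding at the true parameters and positive elsewhere, so an outer minimizer only ever sees the few parameters in `sigmas`.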
In this article, we discuss how to compute a gradient of (4) over all entries of . We will use both and for the same data. Let the dependence of on be so simple that one can compute the gradient of (4) by if is known.
Steps 2 and 3 need additional words in memory and take arithmetical operations if the MGS algorithm is used for step 2. The BS algorithm can compute the gradient with the same order of arithmetical complexity but needs additional words in memory.
Let us consider a computation of (4) from . Let be the initial matrix and the orthonormal subspace, which we are going to compute. Then
,
,
Let’s construct a gradient of by . We will call the vector of derivatives — each -th element of this vector contains the derivative of the -th element of vector . Then there are the following formulas for the gradient:
,
,
We can write all these equations in matrix notation:
[TABLE]
[TABLE]
where is a lower block triangular matrix with block size . This matrix has blocks on the diagonal. The block matrix contains no more than 3 nonzero blocks in each row (see Fig. 1 for one example with vectors). We marked with the elements that we are not interested in; is the vector of the gradient of the original function over all . To compute , we solve the linear system (5). If we create and matrices, then we need at least words to solve .
[TABLE]
Figure 1. The matrix when has vectors; here , , .
We suggest an improvement that needs only words to store some parts of and but still computes the solution. Note that in the loop for the variable , we update times the vector . If we store the matrices and , we can recompute all updates of from this loop for a particular with additional arithmetical operations and store them in one additional array . Then, during backward substitution, we recompute the updates of only when we need them. This occurs times for all . All multiplications by the matrices , , and need arithmetical operations. Thus, we need only the , , , and arrays of size for this computation. The algorithm is as follows:
, ,
,
, ,
Here we use the array with only elements for better performance.
The total arithmetical complexity of the computation of the gradient is operations. If we compare this with MGS (), it is less than 4 times greater.
We obtain similar results for the classical Gram-Schmidt (not MGS) orthogonalization: it needs words in memory and works with operations, but because of stability issues we do not recommend using it.
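The stability issue mentioned above is a well-known property of classical Gram-Schmidt and can be reproduced on a small, nearly rank-deficient example. The sketch below is ours, not the authors' experiment: it uses a Läuchli-type test matrix (an assumption chosen for illustration) and measures the largest inner product between distinct computed basis vectors.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(columns, modified):
    """Orthonormalize real columns with classical or modified Gram-Schmidt."""
    basis = []
    for a in columns:
        v = list(a)
        for q in basis:
            # Classical: project the original column a.
            # Modified: project the partially orthogonalized vector v.
            coeff = dot(q, v if modified else a)
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        basis.append([vi / norm for vi in v])
    return basis


def orthogonality_loss(basis):
    """Largest |q_i . q_j| over distinct basis vectors (0 if exactly orthogonal)."""
    return max(abs(dot(basis[i], basis[j]))
               for i in range(len(basis)) for j in range(i))


def lauchli_columns(eps):
    """Three nearly parallel columns; eps**2 is lost to rounding for small eps."""
    return [[1.0, eps, 0.0, 0.0],
            [1.0, 0.0, eps, 0.0],
            [1.0, 0.0, 0.0, eps]]
```

With `eps = 1e-8`, comparing `orthogonality_loss(gram_schmidt(cols, modified=False))` against the same call with `modified=True` shows the classical variant losing orthogonality almost completely, while the modified variant stays close to orthogonal.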
NUMERICAL EXPERIMENTS
First we compare the general characteristics of our new approach with those of well-known approaches. We create the complex matrices and with random numbers, compute derivatives for different sizes of the problems by our new methods based on the Gram-Schmidt (AGS) and Modified Gram-Schmidt (AMGS) algorithms, and compare our methods with the finite difference (FD) and Baur-Strassen (BS) methods (Tables 1, 2).
Furthermore, we show how those algorithms work. We perform a set of experiments and check the number of iterations for convergence of the Broyden method [7]. In this set of experiments, the matrix is a real matrix with , , where is the Kronecker product of vectors and are unknown vectors. We change and (Table 3). This problem occurs in the three-way decomposition [3, 5].
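A factor matrix whose columns are Kronecker products of shorter unknown vectors can be sketched as follows. This is only an illustration of the structure described above: the exact sizes and symbols of the experiment are not reproduced in this text, so the function names `kron` and `kron_factor` and the input layout are assumptions.

```python
def kron(u, v):
    """Kronecker product of two vectors as a flat list of len(u) * len(v) entries."""
    return [ui * vj for ui in u for vj in v]


def kron_factor(us, vs):
    """Matrix (list of rows) whose j-th column is kron(us[j], vs[j])."""
    columns = [kron(u, v) for u, v in zip(us, vs)]
    return [list(row) for row in zip(*columns)]
```

For vectors of lengths p and q, each column has p * q entries but is determined by only p + q unknowns, which is why minimizing over the generating vectors involves far fewer parameters than minimizing over the matrix entries.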
Hence, our new method (AMGS) is stable enough (like the BS method), yet also up to a thousand times faster than BS and FD methods and does not require much additional memory (only 4 times more than FD).
Table 1. Memory requirements (in words) for FD, BS, AGS, and AMGS methods.
[TABLE]
Table 2. Computational time of FD, BS, AGS, and AMGS methods.
[TABLE]
Table 3. The dependence of the total number of iterations in the Broyden method on the method of gradient computation and problem size for the first series of experiments ( is the total number of unknowns).
[TABLE]
REFERENCES
- [1] Harshman R., Ladefoged P., Goldstein L. Factor analysis of tongue shapes. J. Acoust. Soc. Am., 1977, 62:693.
- [2] Jaravine V., Ibraghimov I., Orekhov V. Removal of a time barrier for high-resolution multidimensional NMR spectroscopy. Nature Methods, 2006, 3:605–607.
- [3] Ibraghimov I. A new approach to solution of SVD-like approximation problem. ENUMATH 99, 2000, 548–555.
- [4] Ibragimova E., Ibragimov I. The ELEGANT NMR Spectrometer. arXiv:1706.00237, 2017.
- [5] Ibraghimov I. Application of the three-way decomposition for matrix compression. Numer. Lin. Alg. Appl., 2002, 9:551–565.
- [6] Baur W., Strassen V. The complexity of partial derivatives. Theor. Comput. Sci., 1983, 22:317–330.
- [7] Dennis J.E., Schnabel R.B. Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, 1983.
