On the compressibility of tensors
Tianyi Shi, Alex Townsend

TL;DR
This paper develops three methodologies to bound tensor compressibility based on algebraic structure, smoothness, and displacement structure, explaining why many tensors in applied mathematics are compressible.
Contribution
It introduces three new bounds on tensor compressibility that help explain the prevalence of compressible tensors in applied mathematics.
Findings
Solution tensor for Poisson equation can be approximated with O(n (log n)^2 (log(1/ε))^2) degrees of freedom.
Constructive bounds enable spectral solution of Poisson equation with O(n (log n)^3 (log(1/ε))^3) complexity.
Bound methods relate tensor structure to compressibility, aiding efficient tensor approximations.
Abstract
Tensors are often compressed by expressing them in low rank tensor formats. In this paper, we develop three methodologies that bound the compressibility of a tensor: (1) Algebraic structure, (2) Smoothness, and (3) Displacement structure. For each methodology, we derive bounds on storage costs that partially explain the abundance of compressible tensors in applied mathematics. For example, we show that the solution tensor of a discretized Poisson equation on with zero Dirichlet conditions can be approximated to a relative accuracy of in the Frobenius norm by a tensor in tensor-train format with degrees of freedom. As this bound is constructive, we are also able to solve this equation spectrally with $\mathcal{O}(n (\log n)^3…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTensor decomposition and applications · Algorithms and Data Compression
\headers
Numerical tensor ranksTianyi Shi and Alex Townsend
On the compressibility of tensors††thanks: Submitted to the editors .
\fundingThis work is supported by National Science Foundation grant no. 1818757.
Tianyi Shi Center for Applied Mathematics, Cornell University, Ithaca, NY 14853. () [email protected]
Alex Townsend Department of Mathematics, Cornell University, Ithaca, NY 14853. () [email protected]
Abstract
Tensors are often compressed by expressing them in low rank tensor formats. In this paper, we develop three methodologies that bound the compressibility of a tensor: (1) Algebraic structure, (2) Smoothness, and (3) Displacement structure. For each methodology, we derive bounds on storage costs that partially explain the abundance of compressible tensors in applied mathematics. For example, we show that the solution tensor of a discretized Poisson equation on with zero Dirichlet conditions can be approximated to a relative accuracy of in the Frobenius norm by a tensor in tensor-train format with degrees of freedom. As this bound is constructive, we are also able to solve this equation spectrally with complexity.
keywords:
numerical low rank, Tensor-Train, Multilinear, Canonical Polyadic, displacement
{AMS}
15A69, 65F99
1 Introduction
A wide variety of applications, such as approximation theory [30], continuum mechanics [10], differential equations [29, 33], and data analysis [36], lead to problems involving data or solutions that can be represented by tensors [32]. A general -order tensor has entries, which prevents it from being stored explicitly except for modest . It is often essential to represent or approximate tensors using sparse data formats, such as low rank tensor decompositions [15, 32]. However, the need for data sparse formats does not imply that such approximations are always mathematically possible. In this paper, we derive bounds on numerical tensor ranks for certain families of tensors, and in doing so, we partially justify the use of low rank tensor decompositions. Analogous theoretical results have already been derived that explicitly bound the numerical rank of matrices [4, 38, 45, 50].
The situation for tensors is more complicated than for matrices, and this is reflected in several distinct low rank tensor decompositions [32, 41]. Here, we consider three such decompositions: (a) Tensor-train decomposition (see Section 2.1), (b) Orthogonal Tucker decomposition (see Section 2.2), and (c) Canonical Polyadic (CP) decomposition (see Section 2.3). These three tensor decompositions supply three different definitions of tensor rank, and therefore each one requires separate attention.
For a given tensor , we are interested in developing a variety of tools to theoretically explain whether there exists a low rank tensor , in one or more of the tensor formats, such that
[TABLE]
where is an accuracy tolerance. If can be well-approximated by , then dramatic storage and computational benefits can be achieved by replacing by [15, 17]. We say that a tensor is compressible if it can be approximated by a low rank tensor, in the sense of (1) that can be represented in a relative small number of degrees of freedom. In order to compare different low rank tensor formats, we examine the number of degrees of freedom required to store an approximate tensor.
In this paper, we explore three methodologies to bound the compressibility of a tensor:
- •
Algebraic structures: If a tensor is constructed by sampling a multivariable function that can be expressed as a sum of products of single-variable functions, then that tensor is often compressible. Occasionally, one may have to perform algebraic manipulations to a function to explicitly reveal its desired structure, for example, by using trigonometric identities (see Section 3.1).
- •
Smoothness: If a tensor can be constructed by sampling a smooth function on a tensor-product grid, then that tensor is often compressible. This observation can be made rigorous by using the fact that smooth functions on compact domains can be well-approximated by polynomials [51, 17].
- •
Displacement structure: If a tensor satisfies a multidimensional Sylvester equation of the form:
[TABLE]
where ‘’ denotes the -mode matrix product of a tensor (see Eq. 3), then this—under additional assumptions—can ensure that the tensor is well-approximated by a low rank tensor. Multidimensional Sylvester equations such as Eq. 2 appear when discretizing certain partial differential equations with finite differences [35] and are satisfied by several classes of structured tensors [16]. For example, we show that the solution tensor to on can be represented up to a relative accuracy of in the Frobenius norm with just degrees of freedom in tensor-train format, despite the solution having weak corner singularities.
The first two methodologies are considered in [17], and the third methodology is related to the literature on exponential sums [14, 25]. In this manuscript, we formally provide bounds on the compressibility of such tensors and illustrate the methodologies with worked examples.
After some experience, one can successfully identify which methodology is likely to result in the best theoretical bounds on the compressibility of a tensor. We emphasize that these three methodologies provide upper bounds using numerical tensor ranks, and do not provide a complete characterization on the compressibility of tensors. Another approach that partially explains the abundance of tensors with small storage is artificial coordinate alignment [52], though we do not know how to use this observation to derive explicit bounds on tensor ranks.
1.1 Tensor notation
We follow as closely as possible the notation for tensors found in [32], which we briefly review now for the reader’s convenience.
The -mode product.
The -fold (or -mode) product of a tensor with a matrix is denoted by , and defined elementwise as
[TABLE]
It corresponds to each mode- fiber of being multiplied by the matrix .
Double bracket.
In the tensor literature, the double bracket denotes a mapping from the parametric space to the space of tensors. Specifically, it can be considered as a weighted sums of rank-1 tensors, i.e.,
[TABLE]
where is often referred to as the core tensor and is the -way outer-product of vectors [32].
Flattening by reshaping.
One can always reorganize the entries of a tensor into a matrix and this idea is fundamental to the tensor-train decomposition [41]. Conventionally, one reorganizes the entries so that the mode-1 fibers are stacked. This is equivalent to .111In MATLAB, the reshape command reorganizes the entries of a tensor. For example, if , then returns a matrix of size formed by stacking entries according to their multi-index. We call the th unfolding of .
Flattening via matricization.
Another way to flatten a tensor is called mode- matricization (or th matricization), which arranges the mode- fibers to be the columns of a matrix [31]. We denote the mode- matricization of a tensor by . It is easy to see that the first unfolding and the mode-1 matricization of a tensor are identical, i.e., . In this paper, for a tensor , matricizations are constructed so that there exists another tensor satisfying [8]
[TABLE]
1.2 Summary of paper
In the next section, we review three tensor decompositions, and in Section 3.1 we study the ranks of tensors that are constructed by sampling multivariate functions that have some algebraic structure. In Section 3.2, we consider the storage cost of tensors constructed by sampling smooth multivariate functions. Finally, in Section 4 we consider tensors that satisfy a multidimensional Sylvester equation, including a fast tensor Sylvester equation solver that exploits the compressibility of these tensors Section 4.6.
2 Three tensor decompositions
In this section, we review three tensor decompositions: (a) Tensor-train decomposition, (b) Orthogonal Tucker decomposition, and (c) CP decomposition. For each decomposition, and a given , we say a tensor is -compressible if there exists a tensor that can be represented with degrees of freedom and .
2.1 Tensor-train decomposition
The tensor-train decomposition is a generalization of the singular value decomposition (SVD) that can be computed by a sequence of matrix SVDs [41, 43]. A tensor has a tensor-train rank of at most , if there exists matrix-valued functions for such that
[TABLE]
This decomposition writes each entry of as a product of matrices, where the th matrix in the “train” of length is determined by . Since the product of the matrices must always be a scalar, we have . Each can be represented by an tensor so a tensor-train decomposition of rank at most requires degrees of freedom to store the format in memory. Figure 1 illustrates a tensor-train decomposition of rank at most .
Normally a tensor-train decomposition is constructed by separating out one dimension at a time, and compressing each dimension in turn [41]. For simplicity, the decomposition considered in this paper is performed in the order of dimension 1, dimension 2, and so on. In this way, the entries of the tensor-train rank are bounded from above by the ranks of matrices formed by flattening [41]. That is, for we have
[TABLE]
where is the rank of the matrix . Therefore, if the ranks of all the matrices for are small, then the tensor can be exactly represented in a data-sparse format as a tensor-train decomposition.
2.2 Orthogonal Tucker decomposition
The orthogonal Tucker decomposition is a factorization of a tensor into a set of matrices and a core tensor, where the matrices have orthonormal columns [21, 32, 8]. A tensor has a multilinear rank222The closely related Tucker rank of a tensor is also associated to the Tucker decomposition, except the matrices in Eq. 8 are not constrained to have orthonormal columns. Since multilinear ranks are more commonly used in applications, we do not consider Tucker ranks in this paper. of at most , if there are matrices with orthonormal columns and a core tensor such that
[TABLE]
Such a decomposition contains {\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}p^{{\rm ML}}\leq}\sum_{k=1}^{d}n_{k}t_{k}+\prod_{k=1}^{d}t_{k} degrees of freedom, and can be computed by the so-called higher-order singular value decomposition [8].
2.3 Canonical Polyadic decomposition
The CP decomposition expresses a tensor as a sum of rank-1 tensors. A tensor is of rank at most , if there are matrices and a diagonal tensor that
[TABLE]
where the only nonzero entries of are for . If is omitted in this bracket notation, then by convention all the nonzero entries of are 1. This tensor decomposition can be stored using {\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}p^{{\rm CP}}\leq r+r\sum_{k=1}^{d}n_{k}} degrees of freedom, but the decomposition is NP-hard to compute for worst case examples [19]. The CP decomposition in Eq. 9 is similar to the orthogonal Tucker decomposition with two important differences: (1) The matrices in Eq. 9 do not need to have orthogonal columns and (2) The core tensor must be diagonal. This means that Eq. 9 is equivalent to expressing a tensor as a sum of rank-1 tensors.
Since we are aiming for upper bounds on the rank of a tensor to bound compressibility of a tensor in CP format, we can take any decomposition of the form in Eq. 9 with a potentially large , and see if its factor matrices are themselves low rank. For example, we find that [34, Lem. 1]:333Lemma 1 of [34] shows that the dimension of the vector space that spans the slices in the th index is equal to the rank of . The inequality in Eq. 10 follows from the extra assumption that the slices are themselves low rank tensors.
[TABLE]
where for . The bound in Eq. 10 is useful because it allows one to derive upper bounds on the rank of a tensor via bounds on the rank of factor matrices.
3 Tensors derived by sampling smooth functions
One often finds that tensors derived from sampling smooth functions are compressible, and we make this observation precise. Tensors derived from sampling multivariate functions have been considered in [26, 22] and analogous results for matrices are available in the literature [45, 49].
3.1 Tensors constructed via sampling algebraically structured functions
In practice, one often encounter tensors that are sampled from multivariate functions. For example, one can take a continuous function of three variables, , and sample on a tensor grid to obtain a tensor:
[TABLE]
where , , and are sets of points.
3.1.1 Polynomials and algebraic structure
One common scenario where it is easy to spot compressible tensors is when the tensor is sampled from a polynomial. To be specific, if a tensor is derived by sampling a multivariate polynomial of degree at most in the variable from a tensor-product grid, then one finds that is highly compressible.
Lemma 3.1**.**
Let be a polynomial of degree at most in the variable for , and let be the tensor constructed by sampling , i.e.,
[TABLE]
where are sets of nodes, respectively. Then,
- •
, where ,
- •
, and
- •
, where** .
*Here, the tensor-train decomposition is constructed in the order of .
Proof 3.2**.**
According to the degree assumptions on , we can write as
[TABLE]
where is a polynomial in the variables and for , has degree at most . After sampling, this means that and the bound on follows.
Another way to write is
[TABLE]
where is a polynomial in of degree at most in . After sampling, this shows that and the bound on follows.
Finally, separating out , we can also write as
[TABLE]
*where each term in Eq. 11 is a rank 1 tensor after sampling. We can do this to each variable and thus . The bound on follows. *
A special case of Lemma 3.1 is when the polynomial has maximal degree of at most so that .444We say that a polynomial has maximal degree if is a polynomial of degree at most in all the variables . We find that
- •
, where ,
- •
, and
- •
.
The important observation is that tensors constructed by sampling polynomials on a grid are highly compressible.
3.1.2 Other special cases of algebraic structure
Similar to multivariate polynomials, it is easy to spot—after some experience—the mathematical tensor ranks of tensors constructed by sampling functions that have explicit algebraic structure since each variable in the function can be thought as a fiber of the tensor. The easiest ones to spot are those tensors derived from functions that are the sums of products of single-variable functions, such as
[TABLE]
If and are tensors constructed by sampling and on a and tensor-product grid, respectively, then the storage costs in different formats are bounded by
[TABLE]
where the tensor-train decompositions are performed in the order . Other examples are functions that can be expressed with exponentials and powers, and similar examples have also been considered [40, 27].
Some special functions require reorganizations to reveal their algebraic structures. If the function is expressed with trigonometric functions, then the sampled tensor can often be low rank due to trigonometric identities. For example, consider the function that is a special case of the examples in [43, 6]. Since it can be written as
[TABLE]
any tensor constructed by sampling on a tensor-product grid satisfies
[TABLE]
These examples can often be combined to build more complicated functions that result in compressible tensors. This is an process and requires human ingenuity to express the sampled function in a revealing form. Again, tensors constructed by sampling such algebraically structured functions on a sufficiently large tensor-product grid can be represented using a small number of degrees of freedom.
3.2 Tensors derived by sampling smooth functions
Although most functions do not have the algebraic structure specified in Section 3.1, tensors that are constructed by sampling smooth functions are often well approximated by compressible tensors. In light of Lemma 3.1, our idea to understand the compressibility of a tensor derived from sampling a function is first to approximate that function by a multivariate polynomial, which is already a routine procedure for computing with low rank approximations to multivariate functions [18].
Without loss of generality, suppose that is formed by sampling a smooth function on a tensor-product grid in , i.e.,
[TABLE]
where are sets of nodes in . Our idea is to find a multivariate polynomial of degree in the variable that approximates in and then set . By Lemma 3.1, can be represented with a small number of degrees of freedom and is an approximation to . In particular, we have
[TABLE]
where denotes the supremum norm on and is the absolute maximum entry norm. Therefore, if is a good approximation to , then is a good approximation to too. Although the error bound is good for small , this approximation still suffers from the curse of dimensionality for large .
One can now propose any linear or nonlinear approximation scheme to find a polynomial approximation of on . Clearly, excellent bounds on are obtained by finding a so that
[TABLE]
where is the space of -variate polynomials of maximal degree in for . This best multivariable polynomial problem is often, but not always, tricky to solve directly. In those cases, near-optimal polynomial approximations are used instead. One common choice is to use as the multivariate Chebyshev projection of . That is,
[TABLE]
where the primes indicate that the first term in the sum is halved and is the Chebyshev polynomial of degree . Importantly, is a near-best polynomial approximation to [51], and the error can be bounded. Thus, this choice of leads to bounds on the compressibility of .
3.3 Worked examples
Here, we give two examples that illustrate how to understand the compressibility of tensors constructed by sampling smooth functions. We consider two functions: (1) A Fourier-like function, where we use best polynomial approximation, and (2) A sum of Gaussian bumps, where we use Chebyshev approximation.
3.3.1 Fourier-like function
Consider a tensor constructed by sampling the following Fourier-like function on a tensor-product grid [49]:
[TABLE]
where is a real parameter. While representing exactly requires degrees of freedom, it can be approximated by tensors that require fewer degrees of freedom (in the tensor-train and Tucker formats). To see this, let and be the best minimax polynomial approximations of degree to and on , respectively, and define . Note that has maximal degree at most so that
[TABLE]
where is the space of trivariate polynomials of maximal degree and the equality follows since if . Furthermore, we have and so
[TABLE]
By the equioscillation theorem [44, Thm. 7.4], for since equioscillates times in . Similarly, equioscillates times in and hence, for . However, for , decays super-geometrically to zero as . This also indicates that the error between the tensors sampled from and rapidly goes to 0 as . Hence, the numerical maximal degree, , of satisfies for some constant as . Lemma 3.1 shows that an approximant to only requires degrees of freedom. In particular, if is the second element of the tensor-train rank of an approximant tensor to the one sampled by , then as .
Figure 2 (left) plots the ratio of the second element of the tensor-train rank, , of a tensor sampled from the Fourier-like function and . We observe that as .
3.3.2 A sum of Gaussian bumps
Consider a tensor constructed by sampling a sum of Gaussian bumps, centered at arbitrary locations in , i.e.,
[TABLE]
Each Gaussian bump is a separable function of three variables so, mathematically, the tensor ranks of depend linearly on . However, since the sum is a smooth function, the ranks are related to the polynomial degree required to approximate in to an accuracy of . Hence, the tensor ranks of depend on and have very mild growth in in the storage costs.
Due to the symmetry in , , and as well as separability of each term in Eq. 14, we find that the Chebyshev approximation to can be bounded by
[TABLE]
where , , , and are Chebyshev approximations of degree to , , , and , respectively. An explicit Chebyshev expansion for is known and given by [37, p. 32]
[TABLE]
where the prime on the summation indicates that the first term is halved, and is the modified Bessel function of the first kind with parameter [39, (10.25.2)]. This means that one can show that [12, Lem. 5]:555Unfortunately, there is a typo in [12, Lem. 5] and should be replaced by .
[TABLE]
By Lemma 3.1 and Eq. 13, we can understand the compressibility of . In particular, we can find an approximant tensor whose tensor-train ranks are bounded by the smallest integer such that . We find it straightforward to visualize compressibility via elements of the tensor ranks and their bounds, due to the way storage costs are calculated. Figure 2 (right) shows the second element of the tensor-train rank, of the approximant tensor, along with the bound that we derived. The bounds are relatively tight when is small.
4 Tensors with displacement structure
We say that has an -displacement structure of if satisfies the multidimensional Sylvester equation
[TABLE]
where ‘’ is the -mode matrix product of a tensor. In this section, we show that when are normal matrices with “separated” spectra and is a low rank tensor, then is compressible. Several classes of structured tensors (e.g., the Hilbert tensor) and the solution tensors of certain discretized partial differential equations (e.g., the discretized solution to Poisson’s equation) have a displacement structure, which leads to an understanding of their compressibility.
4.1 Zolotarev numbers
The bounds that we derive on compressibility of tensors involve so-called Zolotarev numbers [1, 13, 53]. A Zolotarev number is a positive number between [math] and defined via an infimum problem involving rational functions [53]. Namely,
[TABLE]
where and are disjoint complex sets and is the set of irreducible rational functions of the form with polynomials and of degree at most . If and are well-separated, then one finds that decays rapidly with . This is because one can construct a low degree rational function that is small on and large on . If and are close to each other, then typically decreases much more slowly with .
Zolotarev numbers can be used to bound the singular values of matrices with displacement structure [4, Thm. 2.1]. In particular, if with satisfies the displacement structure
[TABLE]
where and are normal matrices with spectra and , then the singular values of satisfy [4, Thm. 2.1]
[TABLE]
Roughly speaking, if and are well-separated and is small, then the singular values decrease rapidly to [math].
When working with tensors, we translate the inequalities in Eq. 18 into Frobenius norm error bounds so that matrix results can then be utilized.
Lemma 4.1**.**
If is a matrix satisfying Eq. 17 and is the best rank approximation to , then
[TABLE]
*where denotes the matrix Frobenius norm.
Proof 4.2**.**
To simplify notation let , , for , and for . If , then and , so and the statement follows automatically. Now consider , note that for any we have
[TABLE]
where the inequalities come from the repeated application of the bound in Eq. 18. Therefore, we can bound by partitioning the singular values into groups of . That is,
[TABLE]
where the last equality is obtained by summing up the geometric series. Since , we find that
[TABLE]
*The result follows by rearranging. *
For , the numerical rank of measured in the Frobenius norm is the smallest integer, , such that
[TABLE]
We denote this integer by . From Lemma 4.1, we find that for matrices that satisfy Eq. 17, we have
[TABLE]
where is the smallest integer so that . Therefore, Zolotarev numbers are very useful when trying to bound the numerical rank of matrices with displacement structure. For example, for an Pick matrix constructed with real numbers from an inverval with , one can find that {\rm rank}_{\epsilon}(P_{n})\leq{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}2}\Bigl{\lceil}\log(4b/a)\log(4/\epsilon)/\pi^{2}\Bigr{\rceil} [4].
4.2 The compressibility of tensors with displacement structure in the tensor-train format
Zolotarev numbers can also be used to understand the compressibility of tensors satisfying (15). From the bounds in Eq. 7, one finds that the numerical ranks of each unfolding provides an upper bound on all entries of the tensor-train ranks of approximant tensors. More precisely, if is a tensor and , then there exists a tensor such that [41, Thm. 2.2]
[TABLE]
where \delta=\epsilon/\sqrt{d{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}-1}} and is the th unfolding of . In order to easily relate tensor-train ranks with multilinear ranks in the next subsection, we choose to use .
If satisfies Eq. 15, then by rearranging Eq. 15 one can show that each unfolding matrix, , has a displacement structure. This is precisely , where is the th unfolding of and
[TABLE]
From properties of the Kronecker product [46, Thm 2.5], we know that and are normal matrices with and .666By we mean the Minkowski sum, formed by adding each element in to each element in , i.e.,
From Lemma 4.1 we see that for any integer such that , then
[TABLE]
Therefore, a necessary condition to bound the numerical tensor-train ranks of using this approach is that the spectra of are Minkowski sum separated.
Definition 4.3**.**
We say that normal matrices are Minkowski sum separated if there are disjoint sets and so that
[TABLE]
*where the set additions are Minkowski sums and denotes the spectrum of . *
Figure 3 illustrates three Minkowski sum separated matrices , and along with possible choices for the sets and for . We summarize our findings as a theorem.
Theorem 4.4**.**
Suppose satisfies Eq. 15, where are Minkowski sum separated with disjoint sets and for . Then, for a fixed , we have
[TABLE]
*where is the th unfolding of and is an integer so that . *
For special choices of and , explicit bounds on are known and therefore the bounds in Theorem 4.4 are also explicit. Here we mention two special cases:
Intervals
If for , then one can take and in Theorem 4.4. From [4, Cor. 4.2], we find that
[TABLE]
In particular, the following bound holds:
[TABLE]
where .
Disks
If for and , then one finds that and . From [47, p. 123], we find that
[TABLE]
where . In particular,
[TABLE]
where .
In Section 4.5, we use Eq. 21 to bound the numerical storage cost in tensor-train format of the Hilbert tensor and the solution tensor of a discretized Poisson equation.
4.3 The compressibility of tensors with displacement structure in the Tucker format
The matrix SVD can be used to calculate the numerical multilinear rank [32, 8]. Indeed, if is a tensor and , then there exists a tensor such that [8]:
[TABLE]
where and is the th matricization of .
Since the first unfolding of coincides with the first matricization of , the bound on the second element of the tensor-train rank is also a bound on the first element of the multilinear rank of . One can use a similar idea to bound all entries of the multilinear ranks by considering the various matricizations. However, one finds that the spectra of need to be separated in a slightly different sense.
Definition 4.5**.**
We say that normal matrices are Minkowski singly separated if there are disjoint sets and so that
[TABLE]
*where the set additions are Minkowski sums and denotes the spectrum of . *
Figure 4 illustrates the spectra of Minkowski singly separated matrices , , and along with their enclosed sets and Minkowski sums of the sets. Under this separation condition, we have the following theorem:
Theorem 4.6**.**
Suppose satisfies Eq. 15, where are Minkowski singly separated with disjoint sets and for . Then, for a fixed , we have
[TABLE]
*where is an integer so that . *
Proof 4.7**.**
One can bound all the entries of the multilinear rank vector of by the second entry of the tensor-train rank vector of the tensors (see Eq. 5). Due to the way is constructed, it can be shown that satisfies
[TABLE]
*where and is constructed from in the same way that is constructed from . The result follows from Theorem 4.4 as the th element of the multilinear rank of is bounded above by the bound of the second entry of the tensor-train rank of . *
As before, explicit bounds on the compressibility in Tucker format can be obtained from Theorem 4.6 by special choices of and such as when they are intervals or disks.
Intervals
If for , then one can take and . Therefore, we find that [4, Cor. 4.2]
[TABLE]
where and .
Disks
If for and , then one can take and . From [47, p. 123], we find that
[TABLE]
where , , and .
4.4 The compressibility of tensors with displacement structure in the CP format
While deriving the bounds in this paper, we also considered including bounds on the compressibility of tensors with displacement structure in CP format. We were unable to come up with nontrivial bounds unless we introduced several additional and arbitrary assumptions in the statements of our theorem.
4.5 Worked examples
Here, we give two examples that illustrate how to use the displacement structure of a tensor to understand its compressibility. Since the bounds in tensor-train format and Tucker format are related through ranks, we only show results for the tensor-train format. As in the previous examples, we use the second element of the tensor-train rank and its bound to visualize the compressibility. We consider two tensors: (1) The 3D Hilbert tensor and (2) The solution tensor of a Poisson equation.
4.5.1 The 3D Hilbert tensor
Consider the Hilbert tensor defined by
[TABLE]
This tensor is analogous to the notoriously ill-conditioned Hilbert matrix [20, 9]. It is easy to verify that the tensor possesses the following displacement structure:
[TABLE]
where is the tensor of all ones and is a diagonal matrix with . Thus, and the ranks of the unfoldings of are all 1.
Since the spectrum of is contained in , Eq. 21 tells us that for any we have
[TABLE]
That is, and means that the Hilbert tensor can be stored, up to an accuracy of in the Frobenius norm, in just degrees of freedom. Figure 5 (left) shows the compressibility of with by computing the ratio of the storage costs using tensor-train format and explicit storage. Our theoretical results bound the savings well. Figure 5 (right) shows the compressibility of by plotting and its bound in (23) for different values of . The actual tensor-train ranks of are computed with the TT-SVD algorithm [41].
4.5.2 Tensor solution of a discretized Poisson equation
Tensor decompositions can be incorporated into efficient solvers of partial differential equations [7, 42, 2, 48, 24, 28]. Displacement structure arises for the solution tensor when one discretizes a Laplace operator, or any Laplace-like operator. Here, consider the 3D Poisson equation on with zero Dirichlet conditions, i.e.,
[TABLE]
If one writes down a second-order finite difference discretization of Eq. 24 on an equispaced grid, then one obtains the following multidimensional Sylvester equation
[TABLE]
where and for . The solution tensor is unknown and for large , one hope that for is a reasonably good approximation. The eigenvalues of are given by for with [35, (2.23)]. Since for and , the eigenvalues of are contained in the interval .
We are interested in understanding the compressibility of in tensor-train format when . Since and , Eq. 21 gives
[TABLE]
Figure 6 (left) shows the second element of the tensor-train rank, , and the bound of the approximate solution tensor to the Poisson equation via finite difference discretization.
One wonders if there is also a fast Poisson solver for spectral discretizations. This turns out to be feasible with a carefully constructed ultraspherical spectral discretization. The Poisson equation can be discretized to a tensor equation as [11]:
[TABLE]
where
[TABLE]
[TABLE]
is the degree orthonormalized ultraspherical polynomial with parameter [39, Table 18.3.1], , , is a diagonal matrix, and are both symmetric pentadiagonal matrices, and the spectrum of satisfies . If , Eq. 21 gives
[TABLE]
Figure 6 (right) shows the second element of the tensor-train rank, , and the bound of the approximate solution tensor to the Poisson equation via ultraspherical spectral discretization. This spectral discretization indicates that the tensor discretization of the solution can be approximated with only degrees of freedom. This is a significant reduction in the cost of storing the solution, with a relatively straightforward decomposition. Comparatively, one can achieve with quantics tensor formats [27, 23], but those require more complicated representations.
Some special functions can be well-approximated by exponential sums of the form
[TABLE]
and these approximant can be used to represent the solution to PDEs with Laplace-like operators [14, 25]. In [25], the author uses exponential sums to show that the solution tensor to several 3D elliptic PDEs can be degrees of freedom. However, the constants in these compressibility statements are left implicit. In general, both exponential sum approximation and Zolotarev numbers can be used to bound the th singular value of matrices with displacement structure and capture the geometric decay, but the Zolotarev bound tends to be tighter and does not involve an algebraic factor related to [50].
4.6 Solving for tensors in compressed formats
Since the proof of Theorem 4.4 and Theorem 4.6 are constructive, we can use their implicit algorithms to solve 3D tensor Sylvester equations of the form:
[TABLE]
where , , , and . In particular, we can compute approximate solutions to Eq. 28 in tensor-train or Tucker format when is a low rank tensor and the spectra of , , and are well-separated. If , , and are Minkowski sum separated, and the unfoldings and of have low rank decompositions , and with rank and , respectively, then we can solve for in tensor-train format.
The tensor-train factors of obtained by the TT-SVD algorithm are orthogonal matrices for the column and row spaces of unfoldings of . For example, the first tensor-train factor of can be found as a matrix with orthonormal columns spanning the column space of the first unfolding . Since satisfies Eq. 28, we find that satisfies the Sylvester equation
[TABLE]
We can use the factored alternating direction implicit (fADI) method to solve Eq. 29 for a matrix such that [5]. One can then use the QR decomposition of , i.e., , to calculate the first tensor-train core .
Second and third tensor-train factors can be computed by finding matrices with orthonormal columns for the column and row spaces associated to , where . It can be shown that satisfies the Sylvester equation
[TABLE]
One can, again, use fADI to solve for a low rank decomposition of , i.e., . This low rank decomposition can be compressed by performing a QR factorization of and and then doing a SVD to obtain , where and are matrices with orthonormal columns and is a diagonal matrix. In this way, the second tensor-train factor is and the third factor . Although the fADI method requires the solution of shifted linear systems with , the Kronecker product structure allows one to reshape these linear systems into Sylvester equations, which can themselves be solved with the alternating direction implicit (ADI) method [5]. That is, one can completely avoid solving a huge linear system. As a result, if , and , , and have structures so that shifted linear systems can be solved in , then the solver has a complexity that is less than . In summary, the ADI-based tensor Sylvester equation solver is the following:
Similarly, if all matricizations of are low rank, and , , and are Minkowski singly separated, then we can solve for the solution in orthogonal Tucker format via the higher order singular value decomposition (HOSVD) method [8]. Each factor matrix of is a matrix with orthonormal columns that span the column space of the matricization of , which satisfies the Sylvester equation:
[TABLE]
where
[TABLE]
If solving shifted linear systems with , , and is fast, then we can use fADI to solve for the orthogonal column space of , and use a direct method, such as a 3D Bartels–Stewart algorithm to solve for the core tensor [3].
4.6.1 Poisson equation solver
Consider the example of Poisson equation in Section 4.5.2 with ultraspherical discretization Eq. 26. Since is a penta-diagonal matrix, we can solve shifted linear systems with in time. Therefore, we can obtain a fast Poisson equation solver that computes the solution in tensor-train or orthogonal Tucker format. The complexity for the tensor-train format solver is , where is the accuracy.
Figure 7 shows the running time of different discretized Poisson solvers. The red line represents the direct solver that converts (27) into a huge linear system via Kronecker product. The green line represents an eigen decomposition solver, which computes the eigen decomposition of to diagonalize the equation, and solves each element of directly by scaling. The blue line represents our fADI-based tensor-train solver. We can see as gets large, our algorithm is the winner.777The fADI solver is implemented in C++, while the direct and the eigen solvers are implemented in MATLAB. However, both backslash linear system solver and eigen decomposition are carried out in LAPACK, so our comparison of the three solvers is still fair. All timings are performed in MATLAB R2019a on the super computer of Cornell’s Math department.
Acknowledgements
We have had discussions with David Bindel, Dan Fortunato, Leslie Greengard, and Madeleine Udell about the results in this paper and appreciate their thoughts and comments. We are grateful to Nicolas Boulle and Heather Wilber for carefully reading an earlier draft of this manuscript.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] N. I. Akhiezer , Elements of the theory of elliptic functions , vol. 79, Amer. Math. Soc., 1990.
- 2[2] J. Ballani and L. Grasedyck , A projection method to solve linear systems in tensor format , Numer. Lin. Alg. Appl., 20 (2013), pp. 27–43.
- 3[3] R. H. Bartels and G. W. Stewart , Solution of the matrix equation ax+ xb= c [f 4] , Communications of the ACM, 15 (1972), pp. 820–826.
- 4[4] B. Beckermann and A. Townsend , On the singular values of matrices with displacement structure , SIAM J. Matrix Anal. Appl., 38 (2017), pp. 1227–1248.
- 5[5] P. Benner, R.-C. Li, and N. Truhar , On the adi method for sylvester equations , J. Comput. Appl. Math., 233 (2009), pp. 1035–1045.
- 6[6] G. Beylkin and M. J. Mohlenkamp , Numerical operator calculus in higher dimensions , Proceedings of the National Academy of Sciences, 99 (2002), pp. 10246–10251.
- 7[7] , Algorithms for numerical analysis in high dimensions , SIAM J. Sci. Comput., 26 (2005), pp. 2133–2159.
- 8[8] L. De Lathauwer, B. De Moor, and J. Vandewalle , A multilinear singular value decomposition , SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
