On the compressibility of tensors

Tianyi Shi; Alex Townsend

arXiv:1812.09576 · math.NA · February 4, 2020 · SIAM J. Matrix Anal. Appl.


TL;DR

This paper develops three methodologies to bound tensor compressibility based on algebraic structure, smoothness, and displacement structure, explaining why many tensors in applied mathematics are compressible.

Contribution

It introduces three new bounds on tensor compressibility that help explain the prevalence of compressible tensors in applied mathematics.

Findings

1. The solution tensor of a discretized Poisson equation can be approximated with O(n (log n)^2 (log(1/ε))^2) degrees of freedom.

2. Because the bounds are constructive, the Poisson equation can be solved spectrally with O(n (log n)^3 (log(1/ε))^3) complexity.

3. The three bounding methodologies relate the structure of a tensor to its compressibility, which aids the construction of efficient tensor approximations.





Funding: This work is supported by National Science Foundation grant no. 1818757.

Tianyi Shi, Center for Applied Mathematics, Cornell University, Ithaca, NY 14853.

Alex Townsend, Department of Mathematics, Cornell University, Ithaca, NY 14853.

Abstract

Tensors are often compressed by expressing them in low rank tensor formats. In this paper, we develop three methodologies that bound the compressibility of a tensor: (1) Algebraic structure, (2) Smoothness, and (3) Displacement structure. For each methodology, we derive bounds on storage costs that partially explain the abundance of compressible tensors in applied mathematics. For example, we show that the solution tensor $\mathcal{X}\in\mathbb{C}^{n\times n\times n}$ of a discretized Poisson equation $-\nabla^{2}u=1$ on $[-1,1]^{3}$ with zero Dirichlet conditions can be approximated to a relative accuracy of $0<\epsilon<1$ in the Frobenius norm by a tensor in tensor-train format with $\mathcal{O}(n(\log n)^{2}(\log(1/\epsilon))^{2})$ degrees of freedom. As this bound is constructive, we are also able to solve this equation spectrally with $\mathcal{O}(n(\log n)^{3}(\log(1/\epsilon))^{3})$ complexity.

Keywords: numerical low rank, Tensor-Train, Multilinear, Canonical Polyadic, displacement

AMS subject classifications: 15A69, 65F99

1 Introduction

A wide variety of applications, such as approximation theory [30], continuum mechanics [10], differential equations [29, 33], and data analysis [36], lead to problems involving data or solutions that can be represented by tensors [32]. A general order-$d$ tensor $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ has $\prod_{k=1}^{d}n_{k}$ entries, which prevents it from being stored explicitly except for modest $d$. It is often essential to represent or approximate tensors using data-sparse formats, such as low rank tensor decompositions [15, 32]. However, the need for data-sparse formats does not imply that such approximations are always mathematically possible. In this paper, we derive bounds on numerical tensor ranks for certain families of tensors, and in doing so, we partially justify the use of low rank tensor decompositions. Analogous theoretical results that explicitly bound the numerical rank of matrices have already been derived [4, 38, 45, 50].

The situation for tensors is more complicated than for matrices, and this is reflected in several distinct low rank tensor decompositions [32, 41]. Here, we consider three such decompositions: (a) Tensor-train decomposition (see Section 2.1), (b) Orthogonal Tucker decomposition (see Section 2.2), and (c) Canonical Polyadic (CP) decomposition (see Section 2.3). These three tensor decompositions supply three different definitions of tensor rank, and therefore each one requires separate attention.

For a given tensor $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ , we are interested in developing a variety of tools to theoretically explain whether there exists a low rank tensor $\tilde{\mathcal{X}}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ , in one or more of the tensor formats, such that

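$$\|\mathcal{X}-\tilde{\mathcal{X}}\|_{F}\leq\epsilon\|\mathcal{X}\|_{F},\qquad\|\mathcal{X}\|_{F}^{2}=\sum_{i_{1}=1}^{n_{1}}\cdots\sum_{i_{d}=1}^{n_{d}}|\mathcal{X}_{i_{1},\ldots,i_{d}}|^{2}, \tag{1}$$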

where $0\leq\epsilon<1$ is an accuracy tolerance. If $\mathcal{X}$ can be well-approximated by $\tilde{\mathcal{X}}$, then dramatic storage and computational benefits can be achieved by replacing $\mathcal{X}$ by $\tilde{\mathcal{X}}$ [15, 17]. We say that a tensor is compressible if it can be approximated, in the sense of (1), by a low rank tensor that can be represented with a relatively small number of degrees of freedom. To compare different low rank tensor formats, we examine the number of degrees of freedom required to store an approximate tensor.

In this paper, we explore three methodologies to bound the compressibility of a tensor:

•

Algebraic structures: If a tensor is constructed by sampling a multivariable function that can be expressed as a sum of products of single-variable functions, then that tensor is often compressible. Occasionally, one may have to perform algebraic manipulations on a function to explicitly reveal its desired structure, for example, by using trigonometric identities (see Section 3.1).

•

Smoothness: If a tensor can be constructed by sampling a smooth function on a tensor-product grid, then that tensor is often compressible. This observation can be made rigorous by using the fact that smooth functions on compact domains can be well-approximated by polynomials [51, 17].

•

Displacement structure: If a tensor $\mathcal{X}$ satisfies a multidimensional Sylvester equation of the form:

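$$\mathcal{X}\times_{1}A^{(1)}+\cdots+\mathcal{X}\times_{d}A^{(d)}=\mathcal{G},\qquad A^{(k)}\in\mathbb{C}^{n_{k}\times n_{k}},\quad\mathcal{G}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}, \tag{2}$$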

where $\times_{k}$ denotes the $k$-mode matrix product of a tensor (see Eq. 3), then, under additional assumptions, one can ensure that the tensor $\mathcal{X}$ is well-approximated by a low rank tensor. Multidimensional Sylvester equations such as Eq. 2 appear when discretizing certain partial differential equations with finite differences [35] and are satisfied by several classes of structured tensors [16]. For example, we show that the solution tensor $\mathcal{X}\in\mathbb{C}^{n\times n\times n}$ to $-\nabla^{2}u=1$ on $[-1,1]^{3}$ can be represented up to a relative accuracy of $0<\epsilon<1$ in the Frobenius norm with just $\mathcal{O}(n(\log n)^{2}(\log(1/\epsilon))^{2})$ degrees of freedom in tensor-train format, despite the solution having weak corner singularities.

The first two methodologies are considered in [17], and the third methodology is related to the literature on exponential sums [14, 25]. In this manuscript, we formally provide bounds on the compressibility of such tensors and illustrate the methodologies with worked examples.

After some experience, one can successfully identify which methodology is likely to result in the best theoretical bounds on the compressibility of a tensor. We emphasize that these three methodologies provide upper bounds using numerical tensor ranks and do not provide a complete characterization of the compressibility of tensors. Another approach that partially explains the abundance of tensors with small storage costs is artificial coordinate alignment [52], though we do not know how to use this observation to derive explicit bounds on tensor ranks.

1.1 Tensor notation

We follow as closely as possible the notation for tensors found in [32], which we briefly review now for the reader’s convenience.

The $k$ -mode product.

The $k$ -fold (or $k$ -mode) product of a tensor $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ with a matrix $A\in\mathbb{C}^{n_{k}\times n_{k}}$ is denoted by $\mathcal{X}\times_{k}A$ , and defined elementwise as

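$$(\mathcal{X}\times_{k}A)_{i_{1},\ldots,i_{k-1},j,i_{k+1},\ldots,i_{d}}=\sum_{i_{k}=1}^{n_{k}}\mathcal{X}_{i_{1},\ldots,i_{d}}A_{j,i_{k}}. \tag{3}$$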

It corresponds to multiplying each mode-$k$ fiber of $\mathcal{X}$ by the matrix $A$.
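To make Eq. 3 concrete, here is a minimal NumPy sketch of the $k$-mode product for a dense array. It is an illustration under our own conventions, not code from the paper: the helper name `mode_k_product` is hypothetical, mode indices are 0-based (the text uses 1-based ones), and the function also accepts a rectangular $A$ with $n_k$ columns.

```python
import numpy as np


def mode_k_product(X, A, k):
    """Return X x_k A for a dense NumPy array X (k is 0-based).

    Entry (i_1, ..., j, ..., i_d) of the result is sum_{i_k} X[..., i_k, ...] * A[j, i_k].
    """
    # Contract the column index of A with axis k of X; the new index j
    # becomes axis 0 of the result.
    Y = np.tensordot(A, X, axes=([1], [k]))
    # Move j back to position k so the remaining modes keep their order.
    return np.moveaxis(Y, 0, k)


rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
A = rng.standard_normal((4, 4))
# Cross-check against an explicit index contraction for the second mode (k = 1).
Y = mode_k_product(X, A, 1)
print(np.allclose(Y, np.einsum("abc,jb->ajc", X, A)))  # True
```

Applying this helper once per mode and summing the results gives the left-hand side of the Sylvester equation in Eq. 2.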

Double bracket.

In the tensor literature, the double bracket denotes a mapping from the parameter space to the space of tensors. Specifically, it can be viewed as a weighted sum of rank-1 tensors, i.e.,

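$$[\![\mathcal{G};A^{(1)},\ldots,A^{(d)}]\!]=\sum_{i_{1}=1}^{r_{1}}\cdots\sum_{i_{d}=1}^{r_{d}}\mathcal{G}_{i_{1},\ldots,i_{d}}\,A^{(1)}_{i_{1}}\circ\cdots\circ A^{(d)}_{i_{d}},\qquad A^{(k)}\in\mathbb{C}^{n_{k}\times r_{k}}, \tag{4}$$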

where $\mathcal{G}\in\mathbb{C}^{r_{1}\times\cdots\times r_{d}}$ is often referred to as the core tensor, $A^{(k)}_{i}$ denotes the $i$th column of $A^{(k)}$, and $v_{1}\circ\cdots\circ v_{d}$ is the $d$-way outer product of vectors [32].
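For a dense core, the double bracket can be evaluated as a chain of $k$-mode products, $\mathcal{G}\times_1 A^{(1)}\times_2\cdots\times_d A^{(d)}$. The NumPy sketch below (0-based modes; the helper names are hypothetical) checks this against the explicit weighted sum of rank-1 outer products that defines the double bracket.

```python
import numpy as np


def mode_k_product(X, A, k):
    """X x_k A with a 0-based mode index k."""
    return np.moveaxis(np.tensordot(A, X, axes=([1], [k])), 0, k)


def double_bracket(G, factors):
    """[[G; A^(1), ..., A^(d)]] computed as G x_1 A^(1) x_2 A^(2) ... x_d A^(d)."""
    X = G
    for k, A in enumerate(factors):
        X = mode_k_product(X, A, k)
    return X


rng = np.random.default_rng(1)
G = rng.standard_normal((2, 3, 4))
As = [rng.standard_normal((5, 2)), rng.standard_normal((6, 3)), rng.standard_normal((7, 4))]
X = double_bracket(G, As)

# The same tensor as a weighted sum of rank-1 tensors built from the factor columns.
X_sum = np.zeros((5, 6, 7))
for i1 in range(2):
    for i2 in range(3):
        for i3 in range(4):
            outer = np.einsum("a,b,c->abc", As[0][:, i1], As[1][:, i2], As[2][:, i3])
            X_sum += G[i1, i2, i3] * outer
print(np.allclose(X, X_sum))  # True
```

With a diagonal core, the same helper evaluates a CP decomposition.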

Flattening by reshaping.

One can always reorganize the entries of a tensor into a matrix, and this idea is fundamental to the tensor-train decomposition [41]. Conventionally, one reorganizes the entries so that the mode-1 fibers are stacked. This is equivalent to $X_{k}={\rm reshape}(\mathcal{X},\prod_{s=1}^{k}n_{s},\prod_{s=k+1}^{d}n_{s})$. (In MATLAB, the reshape command reorganizes the entries of a tensor. For example, if $\mathcal{X}\in\mathbb{C}^{n_{1}\times\dots\times n_{d}}$, then ${\rm reshape}(\mathcal{X},\prod_{s=1}^{k}n_{s},\prod_{s=k+1}^{d}n_{s})$ returns a matrix of size $(\prod_{s=1}^{k}n_{s})\times(\prod_{s=k+1}^{d}n_{s})$ formed by stacking entries according to their multi-index.) We call $X_{k}$ the $k$th unfolding of $\mathcal{X}$.

Flattening via matricization.

Another way to flatten a tensor is called mode- $n$ matricization (or $n$ th matricization), which arranges the mode- $n$ fibers to be the columns of a matrix [31]. We denote the mode- $n$ matricization of a tensor $\mathcal{X}$ by $X_{(n)}$ . It is easy to see that the first unfolding and the mode-1 matricization of a tensor are identical, i.e., $X_{(1)}=X_{1}$ . In this paper, for a tensor $\mathcal{X}$ , matricizations are constructed so that there exists another tensor $\mathcal{Y}^{j}$ satisfying [8]

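$$Y^{j}_{(1)}=X_{(j)},\ \ldots,\ Y^{j}_{(d-j+1)}=X_{(d)},\quad Y^{j}_{(d-j+2)}=X_{(1)},\ \ldots,\ Y^{j}_{(d)}=X_{(j-1)}. \tag{5}$$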
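Both flattenings are one-liners in NumPy if one mimics MATLAB's column-major `reshape` with `order="F"`. The sketch below is ours, with hypothetical helper names; for the matricization it orders the remaining modes increasingly, which differs from the cyclic ordering described above only by a column permutation and therefore does not change any rank.

```python
import numpy as np


def unfolding(X, k):
    """k-th unfolding X_k = reshape(X, n_1...n_k, n_{k+1}...n_d), 1 <= k <= d-1.

    order="F" mimics MATLAB's column-major reshape.
    """
    rows = int(np.prod(X.shape[:k]))
    return X.reshape(rows, -1, order="F")


def matricization(X, n):
    """Mode-n matricization X_(n) (n is 1-based): the mode-n fibers become columns.

    Remaining modes are taken in increasing order (column-major); any other column
    ordering only permutes columns and leaves the rank unchanged.
    """
    return np.moveaxis(X, n - 1, 0).reshape(X.shape[n - 1], -1, order="F")


X = np.arange(24.0).reshape(2, 3, 4)
print(unfolding(X, 2).shape, matricization(X, 2).shape)  # (6, 4) (3, 8)
print(np.array_equal(unfolding(X, 1), matricization(X, 1)))  # True
```

The last line checks the identity $X_{(1)}=X_{1}$ stated in the text.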

1.2 Summary of paper

In the next section, we review three tensor decompositions, and in Section 3.1 we study the ranks of tensors that are constructed by sampling multivariate functions that have some algebraic structure. In Section 3.2, we consider the storage cost of tensors constructed by sampling smooth multivariate functions. Finally, in Section 4 we consider tensors that satisfy a multidimensional Sylvester equation, including a fast tensor Sylvester equation solver that exploits the compressibility of these tensors (see Section 4.6).

2 Three tensor decompositions

In this section, we review three tensor decompositions: (a) Tensor-train decomposition, (b) Orthogonal Tucker decomposition, and (c) CP decomposition. For each decomposition, and a given $0<\epsilon<1$ , we say a tensor $\mathcal{X}$ is $p$ -compressible if there exists a tensor $\mathcal{\tilde{X}}$ that can be represented with $p$ degrees of freedom and $||\mathcal{X}-\mathcal{\tilde{X}}||_{F}\leq\epsilon||\mathcal{X}||_{F}$ .

2.1 Tensor-train decomposition

The tensor-train decomposition is a generalization of the singular value decomposition (SVD) that can be computed by a sequence of matrix SVDs [41, 43]. A tensor $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ has a tensor-train rank of at most $\boldsymbol{s}=(s_{0},\ldots,s_{d})$ if there exist matrix-valued functions $G_{k}:\{1,\ldots,n_{k}\}\to\mathbb{C}^{s_{k-1}\times s_{k}}$ for $1\leq k\leq d$ such that

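$$\mathcal{X}_{i_{1},\ldots,i_{d}}=G_{1}(i_{1})G_{2}(i_{2})\cdots G_{d}(i_{d}),\qquad 1\leq i_{k}\leq n_{k}. \tag{6}$$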

This decomposition writes each entry of $\mathcal{X}$ as a product of matrices, where the $k$ th matrix in the “train” of length $d$ is determined by $i_{k}$ . Since the product of the matrices must always be a scalar, we have $s_{0}=s_{d}=1$ . Each $G_{k}$ can be represented by an $s_{k-1}\times n_{k}\times s_{k}$ tensor so a tensor-train decomposition of rank at most $\boldsymbol{s}$ requires $p^{{\rm TT}}\leq\sum_{k=1}^{d}s_{k-1}s_{k}n_{k}$ degrees of freedom to store the format in memory. Figure 1 illustrates a tensor-train decomposition of rank at most $\boldsymbol{s}=(s_{0},\ldots,s_{d})$ .

Normally a tensor-train decomposition is constructed by separating out one dimension at a time, and compressing each dimension in turn [41]. For simplicity, the decomposition considered in this paper is performed in the order of dimension 1, dimension 2, and so on. In this way, the entries of the tensor-train rank are bounded from above by the ranks of matrices formed by flattening [41]. That is, for $1\leq k\leq d-1$ we have

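$$s_{k}\leq{\rm rank}(X_{k}),\qquad X_{k}={\rm reshape}\Big(\mathcal{X},\prod_{s=1}^{k}n_{s},\prod_{s=k+1}^{d}n_{s}\Big), \tag{7}$$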

where ${\rm rank}(X_{k})$ is the rank of the matrix $X_{k}$. Therefore, if the ranks of all the matrices $X_{k}$ for $1\leq k\leq d-1$ are small, then the tensor $\mathcal{X}$ can be represented exactly in a data-sparse format as a tensor-train decomposition.
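The dimension-by-dimension construction described above can be sketched in NumPy as the standard TT-SVD: reshape, take a truncated SVD, keep the left factor as a core, and pass the remainder on. This is a minimal sketch under our own assumptions (hypothetical helper names, row-major reshapes instead of MATLAB's column-major ones, which permute rows and columns but not ranks, and an even split of the error budget over the $d-1$ truncations). It is applied to $\cos(x+y+z)$, whose unfoldings have rank 2.

```python
import numpy as np


def tt_svd(X, eps):
    """TT-SVD sketch: returns cores G_k of shape (s_{k-1}, n_k, s_k).

    Each of the d-1 truncations discards singular values of norm at most
    eps * ||X||_F / sqrt(d-1), so the total relative error is at most eps.
    """
    d, n = X.ndim, X.shape
    delta = eps * np.linalg.norm(X) / np.sqrt(max(d - 1, 1))
    cores, s_prev, C = [], 1, X
    for k in range(d - 1):
        C = C.reshape(s_prev * n[k], -1)
        U, sig, Vh = np.linalg.svd(C, full_matrices=False)
        # tail[i] = ||sig[i:]||_2; keep the smallest rank whose discarded tail is <= delta.
        tail = np.sqrt(np.cumsum(sig[::-1] ** 2))[::-1]
        s_k = max(1, int(np.sum(tail > delta)))
        cores.append(U[:, :s_k].reshape(s_prev, n[k], s_k))
        C = sig[:s_k, None] * Vh[:s_k]
        s_prev = s_k
    cores.append(C.reshape(s_prev, n[-1], 1))
    return cores


def tt_full(cores):
    """Contract the train back into a dense tensor."""
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=([-1], [0]))
    return T.reshape(T.shape[1:-1])


n = 30
x = np.linspace(-1, 1, n)
X = np.cos(x[:, None, None] + x[None, :, None] + x[None, None, :])
cores = tt_svd(X, 1e-10)
ranks = [G.shape[2] for G in cores[:-1]]
storage = sum(G.size for G in cores)  # p^TT = sum_k s_{k-1} s_k n_k
print(ranks, storage)  # [2, 2] 240
```

The storage count $240=8n$ matches the bound $p^{\rm TT}\leq 8n$ quoted for this function in Section 3.1.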

2.2 Orthogonal Tucker decomposition

The orthogonal Tucker decomposition is a factorization of a tensor into a set of matrices and a core tensor, where the matrices have orthonormal columns [21, 32, 8]. A tensor $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ has a multilinear rank of at most $\boldsymbol{t}=(t_{1},\ldots,t_{d})$ if there are matrices $A^{(1)},\ldots,A^{(d)}$ with orthonormal columns and a core tensor $\mathcal{G}\in\mathbb{C}^{t_{1}\times\cdots\times t_{d}}$ such that

$$\mathcal{X}=[\![\mathcal{G};A^{(1)},\ldots,A^{(d)}]\!],\qquad A^{(k)}\in\mathbb{C}^{n_{k}\times t_{k}}. \tag{8}$$

Such a decomposition contains $p^{{\rm ML}}\leq\sum_{k=1}^{d}n_{k}t_{k}+\prod_{k=1}^{d}t_{k}$ degrees of freedom and can be computed by the so-called higher-order singular value decomposition [8]. The closely related Tucker rank of a tensor is also associated with the Tucker decomposition, except that the matrices $A^{(k)}$ in Eq. 8 are not constrained to have orthonormal columns. Since multilinear ranks are more commonly used in applications, we do not consider Tucker ranks in this paper.
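A minimal NumPy sketch of a truncated higher-order SVD follows, assuming the usual recipe: take the leading left singular vectors of each matricization as $A^{(k)}$ and project onto them to obtain the core. The helper names and the even per-mode split of the error budget are our assumptions, not the paper's. For $\cos(x+y+z)$ it recovers multilinear rank $(2,2,2)$.

```python
import numpy as np


def mode_k_product(X, A, k):
    """X x_k A with a 0-based mode index k."""
    return np.moveaxis(np.tensordot(A, X, axes=([1], [k])), 0, k)


def truncated_hosvd(X, eps):
    """Truncated HOSVD sketch: X ~ [[G; A^(1), ..., A^(d)]] with orthonormal A^(k).

    Each mode discards singular values of norm at most eps*||X||_F/sqrt(d),
    which keeps the relative Frobenius error at most eps.
    """
    d = X.ndim
    delta = eps * np.linalg.norm(X) / np.sqrt(d)
    factors = []
    for k in range(d):
        # Mode-k matricization (up to a column permutation, which does not matter here).
        Xk = np.moveaxis(X, k, 0).reshape(X.shape[k], -1)
        U, sig, _ = np.linalg.svd(Xk, full_matrices=False)
        tail = np.sqrt(np.cumsum(sig[::-1] ** 2))[::-1]
        t_k = max(1, int(np.sum(tail > delta)))
        factors.append(U[:, :t_k])
    G = X
    for k, A in enumerate(factors):
        G = mode_k_product(G, A.conj().T, k)  # core: X x_1 A1^* x_2 ... x_d Ad^*
    return G, factors


n = 30
x = np.linspace(-1, 1, n)
X = np.cos(x[:, None, None] + x[None, :, None] + x[None, None, :])
G, factors = truncated_hosvd(X, 1e-10)
p_ml = sum(A.size for A in factors) + G.size  # sum_k n_k t_k + prod_k t_k
print(G.shape, p_ml)  # (2, 2, 2) 188
```

Here $188=6n+8$, in line with the multilinear storage bound for this function given in Section 3.1.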

2.3 Canonical Polyadic decomposition

The CP decomposition expresses a tensor as a sum of rank-1 tensors. A tensor $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ is of rank at most $r$ if there are matrices $A^{(1)},\ldots,A^{(d)}$ and a diagonal tensor $\mathcal{D}$ such that

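$$\mathcal{X}=[\![\mathcal{D};A^{(1)},\ldots,A^{(d)}]\!],\qquad A^{(k)}\in\mathbb{C}^{n_{k}\times r},\quad\mathcal{D}\in\mathbb{C}^{r\times\cdots\times r}, \tag{9}$$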

where the only nonzero entries of $\mathcal{D}$ are $\mathcal{D}_{i,\ldots,i}$ for $1\leq i\leq r$. If $\mathcal{D}$ is omitted in this bracket notation, then by convention all the nonzero entries of $\mathcal{D}$ are 1. This tensor decomposition can be stored using $p^{{\rm CP}}\leq r+r\sum_{k=1}^{d}n_{k}$ degrees of freedom, but the decomposition is NP-hard to compute for worst-case examples [19]. The CP decomposition in Eq. 9 is similar to the orthogonal Tucker decomposition with two important differences: (1) the matrices $A^{(1)},\ldots,A^{(d)}$ in Eq. 9 do not need to have orthonormal columns, and (2) the core tensor $\mathcal{D}$ must be diagonal. This means that Eq. 9 is equivalent to expressing a tensor as a sum of rank-1 tensors.

Since we are aiming for upper bounds on the rank of a tensor to bound its compressibility in CP format, we can take any decomposition of the form in Eq. 9 with a potentially large $r$ and check whether its factor matrices $A^{(1)},\dots,A^{(d)}$ are themselves low rank. For example, Lemma 1 of [34] shows that the dimension of the vector space spanned by the slices in the $\nu$th index equals the rank of $\mathcal{X}$; with the extra assumption that these slices are themselves low rank tensors, this yields the bound [34, Lem. 1]

$${\rm rank}(\mathcal{X})\leq\min_{1\leq j\leq d}\frac{1}{r_{j}}\prod_{i=1}^{d}r_{i}, \tag{10}$$

where $r_{i}={\rm rank}(A^{(i)})$ for $1\leq i\leq d$ . The bound in Eq. 10 is useful because it allows one to derive upper bounds on the rank of a tensor via bounds on the rank of factor matrices.

3 Tensors derived by sampling smooth functions

One often finds that tensors derived from sampling smooth functions are compressible, and we make this observation precise. Tensors derived from sampling multivariate functions have been considered in [26, 22] and analogous results for matrices are available in the literature [45, 49].

3.1 Tensors constructed via sampling algebraically structured functions

In practice, one often encounters tensors that are sampled from multivariate functions. For example, one can take a continuous function of three variables, $f(x,y,z)$, and sample $f$ on a tensor-product grid to obtain a tensor:

$$\mathcal{X}_{ijk}=f(x_{i},y_{j},z_{k}),\qquad 1\leq i,j,k\leq n,$$

where $\{x_{1},\ldots,x_{n}\}$ , $\{y_{1},\ldots,y_{n}\}$ , and $\{z_{1},\ldots,z_{n}\}$ are sets of points.

3.1.1 Polynomials and algebraic structure

One common scenario where it is easy to spot compressible tensors is when the tensor is sampled from a polynomial. To be specific, if a tensor $\mathcal{X}$ is derived by sampling a multivariate polynomial $p(x_{1},\dots,x_{d})$ of degree at most $N_{j}-1$ in the variable $x_{j}$ from a tensor-product grid, then one finds that $\mathcal{X}$ is highly compressible.

Lemma 3.1.

Let $p(x_{1},\ldots,x_{d})$ be a polynomial of degree at most $N_{j}-1$ in the variable $x_{j}$ for $1\leq j\leq d$ , and let $\mathcal{X}\in\mathbb{C}^{n_{1}\times\dots\times n_{d}}$ be the tensor constructed by sampling $p$ , i.e.,

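$$\mathcal{X}_{i_{1},\ldots,i_{d}}=p(x^{(1)}_{i_{1}},\ldots,x^{(d)}_{i_{d}}),\qquad 1\leq i_{j}\leq n_{j},\quad 1\leq j\leq d,$$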

where $x^{(1)},\dots,x^{(d)}$ are sets of $n_{1},\dots,n_{d}$ nodes, respectively. Then,

•

$p^{{\rm TT}}(\mathcal{X})\leq\sum_{k=1}^{d}t_{k-1}t_{k}n_{k}$ , where $t_{k}=\min\{\prod_{j=1}^{k}N_{j},\prod_{j=k+1}^{d}N_{j}\}$ ,

•

$p^{{\rm ML}}(\mathcal{X})\leq\sum_{k=1}^{d}n_{k}N_{k}+\prod_{k=1}^{d}N_{k}$ , and

•

$p^{{\rm CP}}(\mathcal{X})\leq r+r\sum_{k=1}^{d}n_{k}$, where $r=\min_{1\leq k\leq d}\frac{1}{N_{k}}\prod_{j=1}^{d}N_{j}$.

Here, the tensor-train decomposition is constructed in the order $x_{1},\dots,x_{d}$.

Proof 3.2.

According to the degree assumptions on $p$ , we can write $p$ as

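$$p(x_{1},\ldots,x_{d})=\sum_{q_{1}=0}^{N_{1}-1}\cdots\sum_{q_{k}=0}^{N_{k}-1}a_{q_{1},\ldots,q_{k}}(x_{k+1},\ldots,x_{d})\,x_{1}^{q_{1}}\cdots x_{k}^{q_{k}},\qquad 1\leq k\leq d,$$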

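where $a_{q_{1},\dots,q_{k}}(x_{k+1},\dots,x_{d})$ is a polynomial in the variables $x_{k+1},\dots,x_{d}$ of degree at most $N_{j}-1$ in $x_{j}$ for $k+1\leq j\leq d$. After sampling, $X_{k}$ is therefore a sum of $\prod_{j=1}^{k}N_{j}$ rank-1 matrices, and expanding instead in the monomials of $x_{k+1},\dots,x_{d}$ writes it as a sum of $\prod_{j=k+1}^{d}N_{j}$ rank-1 matrices. Hence ${\rm rank}(X_{k})\leq\min\{\prod_{j=1}^{k}N_{j},\prod_{j=k+1}^{d}N_{j}\}$, and the bound on $p^{{\rm TT}}(\mathcal{X})$ follows.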

Another way to write $p$ is

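$$p(x_{1},\ldots,x_{d})=\sum_{j=0}^{N_{k}-1}b_{j}(x_{1},\ldots,x_{k-1},x_{k+1},\ldots,x_{d})\,x_{k}^{j},\qquad 1\leq k\leq d,$$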

where each $b_{j}$ is a polynomial in $x_{1},\dots,x_{k-1},x_{k+1},\dots,x_{d}$ of degree at most $N_{s}-1$ in $x_{s}$ for $s\neq k$. After sampling, this shows that $X_{(k)}$ is a sum of $N_{k}$ rank-1 matrices, so ${\rm rank}(X_{(k)})\leq N_{k}$, and the bound on $p^{{\rm ML}}(\mathcal{X})$ follows.

Finally, separating out $x_{d}$ , we can also write $p$ as

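$$p(x_{1},\ldots,x_{d})=\sum_{q_{1}=0}^{N_{1}-1}\cdots\sum_{q_{d-1}=0}^{N_{d-1}-1}c_{q_{1},\ldots,q_{d-1}}(x_{d})\,x_{1}^{q_{1}}\cdots x_{d-1}^{q_{d-1}}, \tag{11}$$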

where each of the $\prod_{j=1}^{d-1}N_{j}$ terms in Eq. 11 is a rank-1 tensor after sampling. We can do the same for any variable $x_{k}$, and thus ${\rm rank}(\mathcal{X})\leq\min_{1\leq k\leq d}\frac{1}{N_{k}}\prod_{j=1}^{d}N_{j}$. The bound on $p^{{\rm CP}}(\mathcal{X})$ follows.
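A quick numerical sanity check of the rank bounds in the proof: sample a polynomial with random coefficients (our assumption) on uniform grids and compare the unfolding and matricization ranks against $\min\{\prod_{j\leq k}N_j,\prod_{j>k}N_j\}$ and $N_k$. The reshapes below are row-major, which permutes rows and columns relative to MATLAB's convention but leaves ranks unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N = (2, 3, 4)        # degree in x_j is at most N_j - 1
n = (20, 15, 10)     # number of grid points per variable
grids = [np.linspace(-1, 1, nk) for nk in n]

# p(x1, x2, x3) = sum_q Cq[q1, q2, q3] x1^q1 x2^q2 x3^q3 with random coefficients.
Cq = rng.standard_normal(N)
V = [np.vander(g, Nk, increasing=True) for g, Nk in zip(grids, N)]  # V[k][i, q] = x_i^q
X = np.einsum("abc,ia,jb,kc->ijk", Cq, *V)

# Ranks of the unfoldings X_1, X_2 (tensor-train) and of the matricizations (multilinear).
tt_ranks = [np.linalg.matrix_rank(X.reshape(int(np.prod(n[:k])), -1)) for k in (1, 2)]
ml_ranks = [np.linalg.matrix_rank(np.moveaxis(X, k, 0).reshape(n[k], -1)) for k in range(3)]
t_bounds = [min(np.prod(N[:k]), np.prod(N[k:])) for k in (1, 2)]
print(tt_ranks, t_bounds)
print(ml_ranks, list(N))
```

For generic coefficients the bounds are attained, which is why the lemma's estimates are sharp for polynomials.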

A special case of Lemma 3.1 is when the polynomial $p$ has maximal degree at most $N-1$, so that $N_{1}=\cdots=N_{d}=N$. (We say that a polynomial $p_{N}(x_{1},\ldots,x_{d})$ has maximal degree $\leq N$ if $p_{N}$ is a polynomial of degree at most $N$ in each variable $x_{i}$.) We find that

•

$p^{{\rm TT}}(\mathcal{X})\leq\sum_{k=1}^{d}N^{2t-1}n_{k}$, where $t=\min\{k,d-k+1\}$,

•

$p^{{\rm ML}}(\mathcal{X})\leq N\sum_{k=1}^{d}n_{k}+N^{d}$ , and

•

$p^{{\rm CP}}(\mathcal{X})\leq N^{d-1}\sum_{k=1}^{d}n_{k}+N^{d-1}$ .

The important observation is that tensors constructed by sampling polynomials on a grid are highly compressible.
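To make the bounds of Lemma 3.1 concrete, the helper below (a hypothetical name; it is only an arithmetic transcription of the lemma's three formulas) evaluates the storage bounds for given degrees $N_j$ and grid sizes $n_k$.

```python
from math import prod


def lemma31_storage_bounds(N, n):
    """Storage bounds of Lemma 3.1 for degree <= N_j - 1 in x_j, sampled on n_1 x ... x n_d points."""
    d = len(N)
    # t_k = min{N_1...N_k, N_{k+1}...N_d} for k = 0, ..., d (so t_0 = t_d = 1).
    t = [min(prod(N[:k]), prod(N[k:])) for k in range(d + 1)]
    p_tt = sum(t[k - 1] * t[k] * n[k - 1] for k in range(1, d + 1))
    p_ml = sum(nk * Nk for nk, Nk in zip(n, N)) + prod(N)
    r = min(prod(N) // Nk for Nk in N)
    p_cp = r + r * sum(n)
    return p_tt, p_ml, p_cp


# Maximal degree N - 1 in each of d = 3 variables, with n points per mode.
N, n, d = 5, 100, 3
print(lemma31_storage_bounds((N,) * d, (n,) * d))  # (3500, 1625, 7525)
```

In this equal-degree case the multilinear and CP values coincide with $N\sum_k n_k+N^d$ and $N^{d-1}\sum_k n_k+N^{d-1}$, while all three are far below the $n^3=10^6$ entries of the full tensor.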

3.1.2 Other special cases of algebraic structure

Similar to multivariate polynomials, it is easy to spot, after some experience, the mathematical tensor ranks of tensors constructed by sampling functions that have explicit algebraic structure, since each variable of the function corresponds to one mode of the tensor. The easiest ones to spot are tensors derived from functions that are sums of products of single-variable functions, such as

$$f(x,y,z)=1+\tan(x)\,y+y^{2}z^{3},\qquad g(x,y,z,w)=\cos(x)\sin(y)+e^{10z}e^{100w}.$$

If $\mathcal{F}$ and $\mathcal{G}$ are tensors constructed by sampling $f$ and $g$ on an $n\times n\times n$ and an $n\times n\times n\times n$ tensor-product grid, respectively, then the storage costs in different formats are bounded by

$$p^{\rm TT}(\mathcal{F})\leq 8n,\qquad p^{\rm TT}(\mathcal{G})\leq 12n,$$

where the tensor-train decompositions are performed in the order $x,y,z,w$ . Other examples are functions that can be expressed with exponentials and powers, and similar examples have also been considered [40, 27].

Some special functions require reorganizations to reveal their algebraic structures. If the function is expressed with trigonometric functions, then the sampled tensor can often be low rank due to trigonometric identities. For example, consider the function $f(x,y,z)=\cos(x+y+z)$ that is a special case of the examples in [43, 6]. Since it can be written as

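$$f(x,y,z)=\big(\cos(x)\cos(y)-\sin(x)\sin(y)\big)\cos(z)-\big(\sin(x)\cos(y)+\cos(x)\sin(y)\big)\sin(z),$$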

any tensor $\mathcal{F}$ constructed by sampling $f$ on an $n\times n\times n$ tensor-product grid satisfies

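$$p^{\rm TT}(\mathcal{F})\leq 8n,\qquad p^{\rm ML}(\mathcal{F})\leq 6n+8,\qquad p^{\rm CP}(\mathcal{F})\leq 12n+4.$$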
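The ranks behind these bounds can be checked numerically by sampling $1+\tan(x)y+y^2z^3$ and $\cos(x+y+z)$ on a uniform grid and counting singular values above a relative tolerance. This is a sketch with assumed choices ($n=40$, uniform points, tolerance $10^{-12}$, hypothetical helper names). It only checks tensor-train and multilinear ranks; a CP rank cannot be read off from SVDs, and the CP bound for $\cos(x+y+z)$ comes from the four-term identity instead.

```python
import numpy as np


def numerical_rank(A, tol=1e-12):
    """Number of singular values above tol times the largest one."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))


def unfolding_ranks(T):
    """Numerical ranks of the unfoldings X_1, ..., X_{d-1} (row-major reshape)."""
    return [numerical_rank(T.reshape(int(np.prod(T.shape[:k])), -1)) for k in range(1, T.ndim)]


def mode_ranks(T):
    """Numerical ranks of the mode-k matricizations."""
    return [numerical_rank(np.moveaxis(T, k, 0).reshape(T.shape[k], -1)) for k in range(T.ndim)]


def tt_storage(ranks, n):
    """sum_k s_{k-1} s_k n for a train with s_0 = s_d = 1 and n points per mode."""
    s = [1] + list(ranks) + [1]
    return sum(s[k - 1] * s[k] * n for k in range(1, len(s)))


n = 40
x = np.linspace(-1, 1, n)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
F1 = 1 + np.tan(X) * Y + Y**2 * Z**3  # f(x, y, z) = 1 + tan(x) y + y^2 z^3
F2 = np.cos(X + Y + Z)                # f(x, y, z) = cos(x + y + z)

for F in (F1, F2):
    r = unfolding_ranks(F)
    print(r, tt_storage(r, n), mode_ranks(F))
```

Both tensors have unfolding ranks $(2,2)$, hence tensor-train storage $8n$.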

These examples can often be combined to build more complicated functions that result in compressible tensors. This is an ad hoc process and requires human ingenuity to express the sampled function in a revealing form. Again, tensors constructed by sampling such algebraically structured functions on a sufficiently large tensor-product grid can be represented using a small number of degrees of freedom.

3.2 Tensors derived by sampling smooth functions

Although most functions do not have the algebraic structure specified in Section 3.1, tensors that are constructed by sampling smooth functions are often well approximated by compressible tensors. In light of Lemma 3.1, our idea to understand the compressibility of a tensor derived from sampling a function is first to approximate that function by a multivariate polynomial, which is already a routine procedure for computing with low rank approximations to multivariate functions [18].

Without loss of generality, suppose that $\mathcal{X}$ is formed by sampling a smooth function $f$ on a tensor-product grid in $[-1,1]^{d}$ , i.e.,

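$$\mathcal{X}_{i_{1},\ldots,i_{d}}=f(x^{(1)}_{i_{1}},\ldots,x^{(d)}_{i_{d}}),\qquad 1\leq i_{k}\leq n_{k},\quad 1\leq k\leq d,$$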

where $x^{(1)},\dots,x^{(d)}$ are sets of $n_{1},\dots,n_{d}$ nodes in $[-1,1]$ . Our idea is to find a multivariate polynomial $p$ of degree $\leq N_{j}-1$ in the variable $x_{j}$ that approximates $f$ in $[-1,1]^{d}$ and then set $\mathcal{Y}_{i_{1},\ldots,i_{d}}=p(x_{i_{1}}^{(1)},\dots,x_{i_{d}}^{(d)})$ . By Lemma 3.1, $\mathcal{Y}$ can be represented with a small number of degrees of freedom and $\mathcal{Y}$ is an approximation to $\mathcal{X}$ . In particular, we have

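$$\|\mathcal{X}-\mathcal{Y}\|_{F}\leq\Big(\prod_{i=1}^{d}n_{i}\Big)^{1/2}\|\mathcal{X}-\mathcal{Y}\|_{\max}\leq\Big(\prod_{i=1}^{d}n_{i}\Big)^{1/2}\|f-p\|_{\infty},$$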

where $\|\cdot\|_{\infty}$ denotes the supremum norm on $[-1,1]^{d}$ and $\|\cdot\|_{\rm max}$ is the absolute maximum entry norm. Therefore, if $p$ is a good approximation to $f$ , then $\mathcal{Y}$ is a good approximation to $\mathcal{X}$ too. Although the error bound is good for small $d$ , this approximation still suffers from the curse of dimensionality for large $d$ .

One can now propose any linear or nonlinear approximation scheme to find a polynomial approximation $p$ of $f$ on $[-1,1]^{d}$ . Clearly, excellent bounds on $\|\mathcal{X}-\mathcal{Y}\|_{F}$ are obtained by finding a $p$ so that

$$\|f-p\|_{\infty}\approx\inf_{q\in\mathcal{P}_{N_{1},\ldots,N_{d}}}\|f-q\|_{\infty},$$

where $\mathcal{P}_{N_{1},\dots,N_{d}}$ is the space of $d$-variate polynomials of maximal degree $\leq N_{i}-1$ in $x_{i}$ for $1\leq i\leq d$. This best multivariate polynomial approximation problem is often, but not always, difficult to solve directly. In those cases, near-optimal polynomial approximations are used instead. One common choice is to take $p$ to be the multivariate Chebyshev projection of $f$. That is,

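$$p^{\rm cheb}_{N_{1},\ldots,N_{d}}(x_{1},\ldots,x_{d})=\sideset{}{'}\sum_{i_{1}=0}^{N_{1}-1}\cdots\sideset{}{'}\sum_{i_{d}=0}^{N_{d}-1}c_{i_{1},\ldots,i_{d}}T_{i_{1}}(x_{1})\cdots T_{i_{d}}(x_{d}),\qquad c_{i_{1},\ldots,i_{d}}=\Big(\frac{2}{\pi}\Big)^{d}\int_{[-1,1]^{d}}\frac{f(x_{1},\ldots,x_{d})\,T_{i_{1}}(x_{1})\cdots T_{i_{d}}(x_{d})}{\sqrt{1-x_{1}^{2}}\cdots\sqrt{1-x_{d}^{2}}}\,dx_{1}\cdots dx_{d},$$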

where the primes indicate that the first term in the sum is halved and $T_{k}(x)$ is the Chebyshev polynomial of degree $k$ . Importantly, $p_{N_{1},\dots,N_{d}}^{\rm cheb}$ is a near-best polynomial approximation to $f$ [51], and the error $\smash{\|f-p_{N_{1},\dots,N_{d}}^{\rm cheb}\|_{\infty}}$ can be bounded. Thus, this choice of $p$ leads to bounds on the compressibility of $\mathcal{X}$ .
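The following sketch illustrates the tensor-product Chebyshev approach in three variables. Two choices here are ours and should be read as assumptions: it uses the Chebyshev interpolant at first-kind Chebyshev points (computed with NumPy's `numpy.polynomial.chebyshev` module) rather than the projection, since both are near-best, and it uses the non-separable test function $1/(1+x^2+y^2+z^2)$, which does not appear in the paper. The coefficient tensor is obtained by applying the inverse Chebyshev-Vandermonde matrix along each mode.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def f(x, y, z):
    # A smooth, non-separable test function (our choice, not from the paper).
    return 1.0 / (1.0 + x**2 + y**2 + z**2)


def cheb_coeffs_3d(f, N):
    """Coefficients c[i, j, k] of the tensor-product Chebyshev interpolant of degree N-1 per variable."""
    xc = C.chebpts1(N)                           # Chebyshev points of the first kind
    W = np.linalg.inv(C.chebvander(xc, N - 1))   # maps values at xc to Chebyshev coefficients
    F = f(xc[:, None, None], xc[None, :, None], xc[None, None, :])
    # Apply W along each mode: c = F x_1 W x_2 W x_3 W.
    return np.einsum("ai,bj,ck,ijk->abc", W, W, W, F, optimize=True)


rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(3, 500))  # random test points in [-1, 1]^3
for N in (10, 20, 30):
    c = cheb_coeffs_3d(f, N)
    err = np.max(np.abs(f(*pts) - C.chebval3d(*pts, c)))
    print(N, err)
```

The printed errors decay geometrically in $N$, which is the mechanism that turns smoothness into a bound on the polynomial degree and hence, via Lemma 3.1, on storage.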

3.3 Worked examples

Here, we give two examples that illustrate how to understand the compressibility of tensors constructed by sampling smooth functions. We consider two functions: (1) A Fourier-like function, where we use best polynomial approximation, and (2) A sum of Gaussian bumps, where we use Chebyshev approximation.

3.3.1 Fourier-like function

Consider a tensor $\mathcal{X}\in\mathbb{C}^{n\times n\times n}$ constructed by sampling the following Fourier-like function on a tensor-product grid [49]:

$$f(x,y,z)=e^{iM\pi xyz},\qquad x,y,z\in[-1,1],$$

where $M\geq 1$ is a real parameter. While representing $\mathcal{X}$ exactly requires $n^{3}$ degrees of freedom, it can be approximated by tensors that require fewer degrees of freedom (in the tensor-train and Tucker formats). To see this, let $p^{{\rm best}}_{k-1}$ and $q^{{\rm best}}_{k-1}$ be the best minimax polynomial approximations of degree $\leq k-1$ to $\cos(M\pi t)$ and $\sin(M\pi t)$ on $[-1,1]$ , respectively, and define $h_{k-1}=p^{{\rm best}}_{k-1}+iq^{{\rm best}}_{k-1}$ . Note that $h_{k-1}(xyz)$ has maximal degree at most $k-1$ so that

[TABLE]

where $\mathcal{P}_{k-1}$ is the space of trivariate polynomials of maximal degree $\leq k-1$ and the equality follows since $t=xyz\in[-1,1]$ if $x,y,z\in[-1,1]$ . Furthermore, we have $e^{iM\pi t}=\cos(M\pi t)+i\sin(M\pi t)$ and so

[TABLE]

By the equioscillation theorem [44, Thm. 7.4], $p^{{\rm best}}_{k-1}=0$ for $k-1\leq 2\lfloor M\rfloor-1$, since $\cos(M\pi t)$ equioscillates $2\lfloor M\rfloor+1$ times in $[-1,1]$. Similarly, $\sin(M\pi t)$ equioscillates $2\lfloor M\rfloor$ times in $[-1,1]$, and hence $q^{{\rm best}}_{k-1}=0$ for $k-1\leq 2\lfloor M\rfloor-2$. However, for $k>2\lfloor M\rfloor$, $\sup_{t\in[-1,1]}\left|e^{iM\pi t}-h_{k-1}(t)\right|$ decays super-geometrically to zero as $k\rightarrow\infty$. Consequently, the error between the tensors sampled from $e^{iM\pi xyz}$ and $h_{k-1}(xyz)$ rapidly goes to $0$ as $k\rightarrow\infty$. Hence, the numerical maximal degree, $N_{\epsilon}$, of $e^{iM\pi xyz}$ satisfies $N_{\epsilon}/(2M)\rightarrow c$ for some constant $c\geq 1$ as $M\rightarrow\infty$, and Lemma 3.1 shows that an approximant to $\mathcal{X}$ only requires $\mathcal{O}(M)$ degrees of freedom. In particular, if $s_{1}$ is the second element of the tensor-train rank of an approximant to the tensor sampled from $e^{iM\pi xyz}$, then $s_{1}/(2M)\rightarrow 1$ as $M\rightarrow\infty$.
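A small experiment in the spirit of Figure 2 (left) can be sketched as follows. It samples $e^{iM\pi xyz}$ on an $n\times n\times n$ grid. The grid used for the figure is not stated here, so equispaced points are an assumption. It then computes $s_1$ as the Frobenius-norm numerical rank of the first unfolding with $\delta=\epsilon/\sqrt{2}$, which is the rank a TT-SVD would select at its first step for $d=3$. The helper `numerical_rank_fro` and the default parameters are our own.

```python
import numpy as np

def numerical_rank_fro(A, eps):
    """Smallest r with ||A - A_r||_F <= eps * ||A||_F, where A_r is the best rank-r approximation."""
    s = np.linalg.svd(A, compute_uv=False)
    tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))   # tail[r] = ||A - A_r||_F
    return int(np.argmax(tail <= eps * np.linalg.norm(s)))

def fourier_like_s1(M, n=60, eps=1e-8):
    x = np.linspace(-1, 1, n)                                       # assumed equispaced grid
    X = np.exp(1j * M * np.pi * x[:, None, None] * x[None, :, None] * x[None, None, :])
    X1 = X.reshape(n, n * n)                                        # first unfolding
    return numerical_rank_fro(X1, eps / np.sqrt(2))                 # delta = eps / sqrt(d - 1)

for M in [1, 2, 4, 8]:
    s1 = fourier_like_s1(M)
    print(f"M = {M}: s1 = {s1}, s1/(2M) = {s1 / (2 * M):.2f}")
```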

Figure 2 (left) plots the ratio $s_{1}/(2M)$, where $s_{1}$ is the second element of the tensor-train rank of a tensor sampled from the Fourier-like function. We observe that $s_{1}/(2M)\rightarrow 1$ as $M\rightarrow\infty$.

3.3.2 A sum of Gaussian bumps

Consider a tensor $\mathcal{X}\in\mathbb{C}^{n\times n\times n}$ constructed by sampling a sum of $M$ Gaussian bumps, centered at arbitrary locations $(x_{1},y_{1},z_{1}),\dots,(x_{M},y_{M},z_{M})$ in $[-1,1]^{3}$ , i.e.,

[TABLE]

Since each Gaussian bump is a separable function of three variables, the exact tensor ranks of $\mathcal{X}$ are at most $M$. However, since the sum is a smooth function, the numerical ranks are governed by the polynomial degree required to approximate $f(x,y,z)$ on $[-1,1]^{3}$ to an accuracy of $0<\epsilon<1$. Hence, the numerical tensor ranks of $\mathcal{X}$ depend mainly on $\gamma$, and the storage costs grow only mildly with $M$.

Due to the symmetry in $x$, $y$, and $z$ as well as the separability of each term in Eq. 14, we find that the error of the Chebyshev approximation to $f(x,y,z)$ can be bounded by

[TABLE]

where $p_{\ell}^{j}$ , $q_{\ell}^{j}$ , $r_{\ell}^{j}$ , and $h_{\ell}$ are Chebyshev approximations of degree $\leq\ell$ to $\smash{e^{-\gamma(x-x_{j})^{2}}}$ , $\smash{e^{-\gamma(y-y_{j})^{2}}}$ , $\smash{e^{-\gamma(z-z_{j})^{2}}}$ , and $\smash{e^{-\gamma x^{2}}}$ , respectively. An explicit Chebyshev expansion for $\smash{e^{-\gamma x^{2}}}$ is known and given by [37, p. 32]

[TABLE]

where the prime on the summation indicates that the first term is halved, and $I_{j}(z)$ is the modified Bessel function of the first kind with parameter $j$ [39, (10.25.2)]. From this expansion, one can show that [12, Lem. 5] (there is a typo in [12, Lem. 5]: $I_{\ell+1}(\gamma/4)$ should be replaced by $I_{\lfloor\ell/2\rfloor+1}(\gamma/4)$):

[TABLE]

By Lemma 3.1 and Eq. 13, we can now understand the compressibility of $\mathcal{X}$. In particular, there is an approximant tensor whose tensor-train ranks are bounded by the smallest integer $\ell$ such that $6Mn^{3/2}e^{-\gamma/4}I_{\lfloor\ell/2\rfloor+1}(\gamma/4)\leq\epsilon$. Since storage costs are determined by the tensor ranks, we visualize compressibility by plotting elements of the tensor ranks together with their bounds. Figure 2 (right) shows the second element of the tensor-train rank, $s_{1}$, of the approximant tensor, along with the bound that we derived. The bound is relatively tight when $\epsilon$ is small.
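The quantities in this bound are easy to evaluate. The following pure-Python sketch computes $I_j(z)$ from its power series and checks the Chebyshev expansion of $e^{-\gamma x^{2}}$. For that check it uses one standard form of the expansion, $e^{-\gamma x^{2}}=e^{-\gamma/2}\bigl(I_{0}(\gamma/2)+2\sum_{j\geq 1}(-1)^{j}I_{j}(\gamma/2)T_{2j}(x)\bigr)$, which follows from the generating function $e^{z\cos\theta}=I_{0}(z)+2\sum_{j\geq1}I_{j}(z)\cos(j\theta)$. Finally, it finds the smallest $\ell$ that satisfies the inequality above. The function names and parameter values are illustrative assumptions.

```python
import math

def bessel_i(j, z, rtol=1e-17):
    """Modified Bessel function I_j(z) for integer j >= 0 and z > 0 (power series)."""
    term = math.exp(j * math.log(z / 2) - math.lgamma(j + 1))   # (z/2)^j / j!
    total, m = term, 0
    while True:
        m += 1
        term *= (z / 2) ** 2 / (m * (m + j))    # ratio of consecutive series terms
        total += term
        if m > z and term <= rtol * total:      # past the peak and negligible
            return total

def gauss_cheb_series(x, gamma, J=60):
    """Truncated Chebyshev series of exp(-gamma x^2), using T_{2j}(x) = cos(2j arccos x)."""
    theta = math.acos(x)
    s = bessel_i(0, gamma / 2)
    for j in range(1, J + 1):
        s += 2 * (-1) ** j * bessel_i(j, gamma / 2) * math.cos(2 * j * theta)
    return math.exp(-gamma / 2) * s

def gaussian_tt_rank_bound(M, n, gamma, eps):
    """Smallest integer l with 6 M n^{3/2} e^{-gamma/4} I_{floor(l/2)+1}(gamma/4) <= eps."""
    l = 0
    while 6 * M * n**1.5 * math.exp(-gamma / 4) * bessel_i(l // 2 + 1, gamma / 4) > eps:
        l += 1
    return l

xs = [-1 + 2 * k / 200 for k in range(201)]
series_err = max(abs(gauss_cheb_series(x, 10.0) - math.exp(-10.0 * x * x)) for x in xs)
print(f"expansion error for gamma = 10: {series_err:.1e}")
for eps in (1e-4, 1e-8, 1e-12):
    print(eps, gaussian_tt_rank_bound(M=10, n=100, gamma=50.0, eps=eps))
```

Because $I_{j}(z)$ decreases in $j$ for $z>0$, the loop terminates and the bound grows as $\epsilon$ shrinks.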

4 Tensors with displacement structure

We say that $\mathcal{X}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ has an $(A^{(1)},\ldots,A^{(d)})$-displacement structure with right-hand side $\mathcal{G}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ if $\mathcal{X}$ satisfies the multidimensional Sylvester equation

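$$\mathcal{X}\times_{1}A^{(1)}+\mathcal{X}\times_{2}A^{(2)}+\cdots+\mathcal{X}\times_{d}A^{(d)}=\mathcal{G},\tag{15}$$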

where ‘$\times_{k}$’ denotes the $k$-mode matrix product of a tensor. In this section, we show that when $A^{(1)},\ldots,A^{(d)}$ are normal matrices with “separated” spectra and $\mathcal{G}$ is a low rank tensor, $\mathcal{X}$ is compressible. Several classes of structured tensors (e.g., the Hilbert tensor) and the solution tensors of certain discretized partial differential equations (e.g., the discretized solution to Poisson’s equation) have displacement structure, which leads to an understanding of their compressibility.

4.1 Zolotarev numbers

The bounds that we derive on the compressibility of tensors involve so-called Zolotarev numbers [1, 13, 53]. A Zolotarev number is a positive number between $0$ and $1$ defined via an infimum problem involving rational functions [53]. Namely,

$$Z_{k}(E,F)=\inf_{r\in\mathcal{R}_{k,k}}\frac{\sup_{z\in E}|r(z)|}{\inf_{z\in F}|r(z)|},\qquad k\geq 0,$$

where $E$ and $F$ are disjoint complex sets and $\mathcal{R}_{k,k}$ is the set of irreducible rational functions of the form $p(x)/q(x)$ with polynomials $p$ and $q$ of degree at most $k$ . If $E$ and $F$ are well-separated, then one finds that $Z_{k}(E,F)$ decays rapidly with $k$ . This is because one can construct a low degree rational function that is small on $E$ and large on $F$ . If $E$ and $F$ are close to each other, then typically $Z_{k}(E,F)$ decreases much more slowly with $k$ .

Zolotarev numbers can be used to bound the singular values of matrices with displacement structure [4, Thm. 2.1]. In particular, if $X\in\mathbb{C}^{m\times n}$ with $m\geq n$ satisfies the displacement structure

$$AX-XB=G,\qquad {\rm rank}(G)=\nu,\tag{17}$$

where $A\in\mathbb{C}^{m\times m}$ and $B\in\mathbb{C}^{n\times n}$ are normal matrices with spectra $\Lambda(A)\subseteq E$ and $\Lambda(B)\subseteq F$ , then the singular values of $X$ satisfy [4, Thm. 2.1]

$$\sigma_{j+\nu k}(X)\leq Z_{k}(E,F)\,\sigma_{j}(X),\qquad 1\leq j+\nu k\leq n.\tag{18}$$

Roughly speaking, if $\Lambda(A)$ and $\Lambda(B)$ are well-separated and $\nu$ is small, then the singular values $\sigma_{j}(X)$ decrease rapidly to $0$.
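The effect of well-separated spectra is easy to observe numerically. In the NumPy sketch below, the Cauchy matrix $X_{ij}=1/(x_i-y_j)$ with $x_i\in[1,2]$ and $y_j\in[-2,-1]$ satisfies Eq. 17 with $A={\rm diag}(x)$, $B={\rm diag}(y)$, and $G$ the all-ones matrix, so $\nu=1$. The point sets and the size are arbitrary choices for illustration.

```python
import numpy as np

n = 200
x = np.linspace(1.0, 2.0, n)           # spectrum of A = diag(x), inside E = [1, 2]
y = np.linspace(-2.0, -1.0, n)         # spectrum of B = diag(y), inside F = [-2, -1]
X = 1.0 / (x[:, None] - y[None, :])    # Cauchy matrix X_ij = 1/(x_i - y_j)

# Displacement structure A X - X B = G, where G is the all-ones matrix (nu = 1)
A, B = np.diag(x), np.diag(y)
G = A @ X - X @ B
print("G is all ones:", np.allclose(G, 1.0), "rank(G) =", np.linalg.matrix_rank(G))

s = np.linalg.svd(X, compute_uv=False)
print(s[:12] / s[0])                   # rapid decay of sigma_j(X) / sigma_1(X)
```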

When working with tensors, we translate the inequalities in Eq. 18 into Frobenius norm error bounds so that matrix results can then be utilized.

Lemma 4.1.

If $X\in\mathbb{C}^{m\times n}$ is a matrix satisfying Eq. 17 and $X_{\nu k}$ is the best rank $\nu k$ approximation to $X$ , then

$$\|X-X_{\nu k}\|_{F}\leq Z_{k}(E,F)\,\|X\|_{F},$$

where $\|\cdot\|_{F}$ denotes the matrix Frobenius norm.

Proof 4.2.

To simplify notation, let $Z_{k}=Z_{k}(E,F)$, $r=\nu k$, $\sigma_{j}=\sigma_{j}(X)$ for $1\leq j\leq n$, and $\sigma_{j}=0$ for $j>n$. If $k=0$, then $r=0$ and $Z_{k}=1$, so $X_{r}=0$ and the statement holds trivially. Now consider $k>0$. For any $s\geq 1$ we have

[TABLE]

where the inequalities come from the repeated application of the bound in Eq. 18. Therefore, we can bound $\|X-X_{r}\|_{F}^{2}$ by partitioning the singular values into groups of $r$ . That is,

[TABLE]

where the last equality is obtained by summing up the geometric series. Since $\|X\|_{F}^{2}=\sum_{j=1}^{n}\sigma_{j}^{2}$ , we find that

[TABLE]

The result follows by rearranging, using $\sum_{j=1}^{r}\sigma_{j}^{2}=\|X\|_{F}^{2}-\|X-X_{r}\|_{F}^{2}$.

For $0<\epsilon<1$ , the numerical rank of $X$ measured in the Frobenius norm is the smallest integer, $r_{\epsilon}$ , such that

$$\|X-X_{r_{\epsilon}}\|_{F}\leq\epsilon\|X\|_{F}.$$

We denote this integer by ${\rm rank}_{\epsilon}(X)$ . From Lemma 4.1, we find that for matrices that satisfy Eq. 17, we have

$${\rm rank}_{\epsilon}(X)\leq\nu k,$$

where $k$ is the smallest integer such that $Z_{k}(E,F)\leq\epsilon$. Therefore, Zolotarev numbers are very useful for bounding the numerical rank of matrices with displacement structure. For example, for an $n\times n$ Pick matrix $P_{n}$ constructed with real numbers from an interval $[a,b]$ with $0<a<b<\infty$, one finds that ${\rm rank}_{\epsilon}(P_{n})\leq 2\bigl\lceil\log(4b/a)\log(4/\epsilon)/\pi^{2}\bigr\rceil$ [4].
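The Pick matrix bound can be checked directly. The NumPy sketch below builds $P_{jk}=(s_j+s_k)/(x_j+x_k)$ for equispaced real points $x_j\in[a,b]$ and random real data $s_j$; both choices are assumptions. This matrix satisfies Eq. 17 with $A={\rm diag}(x)$, $B=-{\rm diag}(x)$, and $\nu=2$. The sketch computes the Frobenius-norm numerical rank of $P$ and compares it with $2\lceil\log(4b/a)\log(4/\epsilon)/\pi^{2}\rceil$.

```python
import math
import numpy as np

def numerical_rank_fro(A, eps):
    """Smallest r with ||A - A_r||_F <= eps * ||A||_F."""
    s = np.linalg.svd(A, compute_uv=False)
    tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))   # tail[r] = ||A - A_r||_F
    return int(np.argmax(tail <= eps * np.linalg.norm(s)))

a, b, n, eps = 1.0, 100.0, 200, 1e-6
rng = np.random.default_rng(0)
x = np.linspace(a, b, n)                                       # real points in [a, b]
s = rng.standard_normal(n)                                     # arbitrary real data
P = (s[:, None] + s[None, :]) / (x[:, None] + x[None, :])      # Pick matrix

r = numerical_rank_fro(P, eps)
bound = 2 * math.ceil(math.log(4 * b / a) * math.log(4 / eps) / math.pi**2)
print(f"rank_eps(P) = {r}, bound = {bound}")
```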

4.2 The compressibility of tensors with displacement structure in the tensor-train format

Zolotarev numbers can also be used to understand the compressibility of tensors satisfying Eq. 15. From the bounds in Eq. 7, the numerical ranks of the unfoldings provide upper bounds on the entries of the tensor-train rank of an approximant tensor. More precisely, if $\mathcal{X}\in\mathbb{C}^{n_{1}\times\dots\times n_{d}}$ is a tensor and $0<\epsilon<1$, then there exists a tensor $\tilde{\mathcal{X}}$ such that [41, Thm. 2.2]

[TABLE]

where $\delta=\epsilon/\sqrt{d-1}$ and $X_{k}$ is the $k$th unfolding of $\mathcal{X}$. To relate tensor-train ranks to multilinear ranks more easily in the next subsection, we use the slightly smaller value $\delta=\epsilon/\sqrt{d}$ instead. The resulting bound is still valid, since ${\rm rank}_{\epsilon/\sqrt{d}}(X_{k})\geq{\rm rank}_{\epsilon/\sqrt{d-1}}(X_{k})$.
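The existence statement above is realized by the TT-SVD algorithm [41]. A minimal NumPy sketch for a $d$-dimensional array follows. It truncates each of the $d-1$ sequential SVDs at the absolute Frobenius tolerance $\delta\|\mathcal{X}\|_{F}$ with $\delta=\epsilon/\sqrt{d-1}$. The test tensor, sampled from $1/(4+x+y+z)$ on an equispaced grid, is our own choice.

```python
import numpy as np

def tt_svd(T, eps):
    """TT cores G_k of shape (r_{k-1}, n_k, r_k) with ||T - T_tt||_F <= eps * ||T||_F."""
    d, shape = T.ndim, T.shape
    delta = eps / np.sqrt(d - 1) * np.linalg.norm(T)     # absolute tolerance per SVD
    cores, r_prev, C = [], 1, T
    for k in range(d - 1):
        C = C.reshape(r_prev * shape[k], -1)
        U, s, Vh = np.linalg.svd(C, full_matrices=False)
        tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))
        r = max(1, int(np.argmax(tail <= delta)))         # smallest admissible rank
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = s[:r, None] * Vh[:r]                          # carry Sigma V^* to the next step
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract TT cores back into a full array."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full.reshape([G.shape[1] for G in cores])

n = 40
x = np.linspace(-1, 1, n)
T = 1.0 / (4.0 + x[:, None, None] + x[None, :, None] + x[None, None, :])
cores = tt_svd(T, 1e-8)
ranks = [1] + [G.shape[2] for G in cores]
err = np.linalg.norm(T - tt_full(cores)) / np.linalg.norm(T)
storage = sum(G.size for G in cores)
print(f"TT ranks {ranks}, relative error {err:.1e}, storage {storage} vs {T.size}")
```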

If $\mathcal{X}$ satisfies Eq. 15, then by rearranging Eq. 15 one can show that each unfolding matrix $X_{j}$ has displacement structure. Precisely, $B_{j}X_{j}-X_{j}C_{j}^{T}=G_{j}$, where $G_{j}$ is the $j$th unfolding of $\mathcal{G}$ and

[TABLE]

From properties of the Kronecker product [46, Thm 2.5], we know that $B_{j}$ and $C_{j}$ are normal matrices with $\Lambda(B_{j})=\Lambda(A^{(1)})+\dots+\Lambda(A^{(j)})\subseteq E_{j}$ and $\Lambda(C_{j})=-(\Lambda(A^{(j+1)})+\dots+\Lambda(A^{(d)}))\subseteq F_{j}$. Here, by $\Lambda(A)+\Lambda(B)$ we mean the Minkowski sum, formed by adding each element in $\Lambda(A)$ to each element in $\Lambda(B)$, i.e.,

$\Lambda(A)+\Lambda(B)=\{a+b\ |\ a\in\Lambda(A),\ b\in\Lambda(B)\}.$
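The spectral fact used here can be checked for $d=2$ with a short NumPy sketch: the eigenvalues of a Kronecker sum of normal matrices form the Minkowski sum of their spectra. The random symmetric matrices are illustrative, and the sketch does not reproduce the exact ordering of Kronecker factors in $B_j$ and $C_j$.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_symmetric(n, lo, hi):
    """Random real symmetric (hence normal) matrix with eigenvalues drawn from [lo, hi]."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    lam = rng.uniform(lo, hi, n)
    return Q @ np.diag(lam) @ Q.T, lam

A1, lam1 = random_symmetric(4, 1.0, 2.0)
A2, lam2 = random_symmetric(5, 1.0, 2.0)
K = np.kron(A1, np.eye(5)) + np.kron(np.eye(4), A2)        # Kronecker sum, 20 x 20
eig_K = np.sort(np.linalg.eigvalsh(K))
minkowski = np.sort((lam1[:, None] + lam2[None, :]).ravel())
max_diff = np.max(np.abs(eig_K - minkowski))
print(f"max |eig(K) - Minkowski sum| = {max_diff:.1e}")
```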

From Lemma 4.1, for any integer $k_{j}$ such that $Z_{k_{j}}(E_{j},F_{j})\leq\delta$, we have

[TABLE]

Therefore, a necessary condition to bound the numerical tensor-train ranks of $\mathcal{X}$ using this approach is that the spectra of $A^{(1)},\ldots,A^{(d)}$ are Minkowski sum separated.

Definition 4.3.

We say that normal matrices $A^{(1)},\ldots,A^{(d)}$ are Minkowski sum separated if there are disjoint sets $E_{j}$ and $F_{j}$ so that

[TABLE]

where the set additions are Minkowski sums and $\Lambda(A^{(j)})$ denotes the spectrum of $A^{(j)}$.

Figure 3 illustrates three Minkowski sum separated matrices $A^{(1)},A^{(2)}$ , and $A^{(3)}$ along with possible choices for the sets $E_{j}$ and $F_{j}$ for $j=1,2$ . We summarize our findings as a theorem.

Theorem 4.4.

Suppose $\mathcal{X}\in\mathbb{C}^{n_{1}\times\dots\times n_{d}}$ satisfies Eq. 15, where $A^{(1)},\dots,A^{(d)}$ are Minkowski sum separated with disjoint sets $E_{j}$ and $F_{j}$ for $1\leq j\leq d-1$ . Then, for a fixed $0<\epsilon<1$ , we have

[TABLE]

where $G_{j}$ is the $j$th unfolding of $\mathcal{G}$ and $k_{j}$ is an integer such that $Z_{k_{j}}(E_{j},F_{j})\leq\epsilon/\sqrt{d}$.

For special choices of $E_{j}$ and $F_{j}$ , explicit bounds on $Z_{k_{j}}(E_{j},F_{j})$ are known and therefore the bounds in Theorem 4.4 are also explicit. Here we mention two special cases:

Intervals

If $\Lambda(A^{(j)})\subseteq[a,b]$ for $0<a<b<\infty$ , then one can take $E_{j}=[ja,jb]$ and $F_{j}=[-(d-j)b,-(d-j)a]$ in Theorem 4.4. From [4, Cor. 4.2], we find that

[TABLE]

In particular, the following bound holds:

[TABLE]

where $\nu_{j}={\rm rank}(G_{j})$ .

Disks

If $\Lambda(A^{(j)})\subseteq\{z\in\mathbb{C}\ :\ |z-z_{0}|\leq\eta\}$ for $0<\eta<z_{0}$ and $z_{0},\eta\in\mathbb{R}$ , then one finds that $\Lambda(A^{(1)})+\dots+\Lambda(A^{(j)})\subseteq\{z\in\mathbb{C}\ :\ |z-jz_{0}|\leq j\eta\}$ and $-(\Lambda(A^{(j+1)})+\dots+\Lambda(A^{(d)}))\subseteq\{z\in\mathbb{C}\ :\ |z+(d-j)z_{0}|\leq(d-j)\eta\}$ . From [47, p. 123], we find that

[TABLE]

where $\xi_{j}=\left(d^{2}z_{0}^{2}-((d-j)^{2}+j^{2})\eta^{2}\right)^{2}-4j^{2}(d-j)^{2}\eta^{4}$ . In particular,

[TABLE]

where $\nu_{j}={\rm rank}(G_{j})$ .

In Section 4.5, we use Eq. 21 to bound the numerical storage cost in tensor-train format of the Hilbert tensor and the solution tensor of a discretized Poisson equation.

4.3 The compressibility of tensors with displacement structure in the Tucker format

The matrix SVD can be used to calculate the numerical multilinear rank [32, 8]. Indeed, if $\mathcal{X}\in\mathbb{C}^{n_{1}\times\dots\times n_{d}}$ is a tensor and $0<\epsilon<1$ , then there exists a tensor $\tilde{\mathcal{X}}$ such that [8]:

[TABLE]

where $\delta=\epsilon/\sqrt{d}$ and $X_{(j)}$ is the $j$ th matricization of $\mathcal{X}$ .
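A truncated HOSVD that realizes this bound can be sketched in NumPy as follows. Each factor keeps the leading left singular vectors of $X_{(j)}$ up to the absolute tolerance $\delta\|\mathcal{X}\|_{F}$ with $\delta=\epsilon/\sqrt{d}$, and the core is obtained by projecting onto the factors. The test tensor, samples of $\sin(x+y+z)$, is our own choice. Since $\sin(x+y+z)=\sin x\cos(y+z)+\cos x\sin(y+z)$, its multilinear rank is $(2,2,2)$.

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization: mode-k fibers become the columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_mult(T, U, k):
    """k-mode product T x_k U."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, k, 0), axes=1), 0, k)

def truncated_hosvd(T, eps):
    tol = eps / np.sqrt(T.ndim) * np.linalg.norm(T)     # delta * ||T||_F for each mode
    factors = []
    for k in range(T.ndim):
        U, s, _ = np.linalg.svd(unfold(T, k), full_matrices=False)
        tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))
        r = max(1, int(np.argmax(tail <= tol)))         # numerical rank of X_(k)
        factors.append(U[:, :r])
    core = T
    for k, U in enumerate(factors):
        core = mode_mult(core, U.conj().T, k)           # project mode k onto span(U)
    return core, factors

n = 50
x = np.linspace(-1, 1, n)
T = np.sin(x[:, None, None] + x[None, :, None] + x[None, None, :])
core, factors = truncated_hosvd(T, 1e-8)
approx = core
for k, U in enumerate(factors):
    approx = mode_mult(approx, U, k)
ml_rank = tuple(U.shape[1] for U in factors)
err = np.linalg.norm(T - approx) / np.linalg.norm(T)
print(ml_rank, f"{err:.1e}")
```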

Since the first unfolding of $\mathcal{X}$ coincides with the first matricization of $\mathcal{X}$ , the bound on the second element of the tensor-train rank is also a bound on the first element of the multilinear rank of $\mathcal{X}$ . One can use a similar idea to bound all entries of the multilinear ranks by considering the various matricizations. However, one finds that the spectra of $A^{(1)},\dots,A^{(d)}$ need to be separated in a slightly different sense.

Definition 4.5.

We say that normal matrices $A^{(1)},\dots,A^{(d)}$ are Minkowski singly separated if there are disjoint sets $E_{j}$ and $F_{j}$ so that

[TABLE]

where the set additions are Minkowski sums and $\Lambda(A^{(j)})$ denotes the spectrum of $A^{(j)}$.

Figure 4 illustrates the spectra of Minkowski singly separated matrices $A^{(1)}$ , $A^{(2)}$ , and $A^{(3)}$ along with their enclosed sets and Minkowski sums of the sets. Under this separation condition, we have the following theorem:

Theorem 4.6.

Suppose $\mathcal{X}\in\mathbb{C}^{n_{1}\times\dots\times n_{d}}$ satisfies Eq. 15, where $A^{(1)},\dots,A^{(d)}$ are Minkowski singly separated with disjoint sets $E_{j}$ and $F_{j}$ for $1\leq j\leq d$ . Then, for a fixed $0<\epsilon<1$ , we have

[TABLE]

where $k_{j}$ is an integer such that $Z_{k_{j}}(E_{j},F_{j})\leq\epsilon/\sqrt{d}$.

Proof 4.7.

One can bound all the entries of the multilinear rank vector of $\mathcal{X}$ by the second entry of the tensor-train rank vector of the tensors $\mathcal{Y}^{1},\dots,\mathcal{Y}^{d}$ (see Eq. 5). Due to the way $\mathcal{Y}^{j}$ is constructed, it can be shown that $\mathcal{Y}^{j}$ satisfies

[TABLE]

where $H^{j}_{(1)}=G_{(j)},\dots,H^{j}_{(d-j+1)}=G_{(d)},H^{j}_{(d-j+2)}=G_{(1)},\dots,H^{j}_{(d)}=G_{(j-1)}$ and $\mathcal{H}^{j}$ is constructed from $\mathcal{G}$ in the same way that $\mathcal{Y}^{j}$ is constructed from $\mathcal{X}$. The result follows from Theorem 4.4, since the $j$th element of the multilinear rank of $\mathcal{X}$ is bounded above by the bound on the second entry of the tensor-train rank of $\mathcal{Y}^{j}$.

As before, explicit bounds on the compressibility in Tucker format can be obtained from Theorem 4.6 by special choices of $E_{j}$ and $F_{j}$ such as when they are intervals or disks.

Intervals

If $\Lambda(A^{(j)})\subseteq[a,b]$ for $0<a<b<\infty$ , then one can take $E_{j}=[a,b]$ and $F_{j}=[-(d-1)b,-(d-1)a]$ . Therefore, we find that [4, Cor. 4.2]

[TABLE]

where $\gamma=(da+(b-a))(db-(b-a))/(abd^{2})$ and ${\rm rank}^{\rm ML}(\mathcal{G})=(\mu_{1},\dots,\mu_{d})$ .

Disks

If $\Lambda(A^{(j)})\subseteq\{z\in\mathbb{C}\ :\ |z-z_{0}|\leq\eta\}$ for $0<\eta<z_{0}$ and $z_{0},\eta\in\mathbb{R}$ , then one can take $E_{j}=\{z\in\mathbb{C}\ :\ |z-z_{0}|\leq\eta\}$ and $F_{j}=\{z\in\mathbb{C}\ :\ |z+(d-1)z_{0}|\leq(d-1)\eta\}$ . From [47, p. 123], we find that

[TABLE]

where $\rho=(2(d-1)\eta^{2})/(d^{2}z_{0}^{2}-((d-1)^{2}+1)\eta^{2}-\sqrt{\xi})$ , $\xi=(d^{2}z_{0}^{2}-((d-1)^{2}+1)\eta^{2})^{2}-4(d-1)^{2}\eta^{4}$ , and ${\rm rank}^{\rm ML}(\mathcal{G})=(\mu_{1},\dots,\mu_{d})$ .

4.4 The compressibility of tensors with displacement structure in the CP format

While deriving the bounds in this paper, we also tried to bound the compressibility of tensors with displacement structure in CP format. We were unable to derive nontrivial bounds without introducing several additional, arbitrary assumptions into the statements of our theorems.

4.5 Worked examples

Here, we give two examples that illustrate how to use the displacement structure of a tensor to understand its compressibility. Since the bounds in tensor-train format and Tucker format are related through ranks, we only show results for the tensor-train format. As in the previous examples, we use the second element of the tensor-train rank and its bound to visualize the compressibility. We consider two tensors: (1) The 3D Hilbert tensor and (2) The solution tensor of a Poisson equation.

4.5.1 The 3D Hilbert tensor

Consider the Hilbert tensor $\mathcal{H}\in\mathbb{C}^{n\times n\times n}$ defined by

$$\mathcal{H}_{ijk}=\frac{1}{i+j+k-2},\qquad 1\leq i,j,k\leq n.$$

This tensor is analogous to the notoriously ill-conditioned Hilbert matrix [20, 9]. It is easy to verify that the tensor possesses the following displacement structure:

$$\mathcal{H}\times_{1}D+\mathcal{H}\times_{2}D+\mathcal{H}\times_{3}D=\mathcal{S},$$

where $\mathcal{S}$ is the tensor of all ones and $D$ is the diagonal matrix with $D_{ii}=i-\frac{2}{3}$. Since $\mathcal{S}$ is the all-ones tensor, ${\rm rank}(\mathcal{S})=1$ and all unfoldings of $\mathcal{S}$ have rank 1.

Since the spectrum of $D$ is contained in $[\frac{1}{3},\frac{3n-2}{3}]$ , Eq. 21 tells us that for any $0<\epsilon<1$ we have

[TABLE]

That is, $s_{1}=\mathcal{O}(\log n\log(1/\epsilon))$, which means that the $n\times n\times n$ Hilbert tensor can be stored, up to a relative accuracy of $\epsilon$ in the Frobenius norm, in just $\mathcal{O}(n(\log n)^{2}(\log(1/\epsilon))^{2})$ degrees of freedom. Figure 5 (left) shows the compressibility of $\mathcal{H}$ with $n=100$ by plotting the ratio of the storage cost in tensor-train format to that of explicit storage. Our theoretical results bound the savings well. Figure 5 (right) shows the compressibility of $\mathcal{H}$ by plotting $s_{1}$ and its bound in (23) for different values of $n$. The actual tensor-train ranks of $\mathcal{H}$ are computed with the TT-SVD algorithm [41].
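The displacement structure and the small size of $s_1$ can be checked numerically. In this NumPy sketch, we take the Frobenius-norm numerical ranks of the first two unfoldings at $\delta=\epsilon/\sqrt{3}$ as the tensor-train ranks $s_1$ and $s_2$, as in Theorem 4.4. The tolerance and size are illustrative, and we do not reproduce the explicit bound (23).

```python
import numpy as np

def numerical_rank_fro(A, eps):
    """Smallest r with ||A - A_r||_F <= eps * ||A||_F."""
    s = np.linalg.svd(A, compute_uv=False)
    tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))
    return int(np.argmax(tail <= eps * np.linalg.norm(s)))

n, eps = 100, 1e-8
i = np.arange(1, n + 1)
H = 1.0 / (i[:, None, None] + i[None, :, None] + i[None, None, :] - 2)   # H_ijk = 1/(i+j+k-2)

# D is diagonal, so H x_1 D + H x_2 D + H x_3 D scales H_ijk by D_ii + D_jj + D_kk.
d = i - 2.0 / 3.0
S = (d[:, None, None] + d[None, :, None] + d[None, None, :]) * H
print("displacement structure holds:", np.allclose(S, 1.0))

delta = eps / np.sqrt(3)
s1 = numerical_rank_fro(H.reshape(n, n * n), delta)     # first unfolding
s2 = numerical_rank_fro(H.reshape(n * n, n), delta)     # second unfolding
storage = n * s1 + s1 * n * s2 + s2 * n                  # TT storage with ranks (1, s1, s2, 1)
print(f"s1 = {s1}, s2 = {s2}, storage ratio = {storage / n**3:.4f}")
```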

4.5.2 Tensor solution of a discretized Poisson equation

Tensor decompositions can be incorporated into efficient solvers of partial differential equations [7, 42, 2, 48, 24, 28]. Displacement structure arises for the solution tensor when one discretizes a Laplace operator, or any Laplace-like operator. Here, we consider the 3D Poisson equation on $[-1,1]^{3}$ with zero Dirichlet conditions, i.e.,

$$-\nabla^{2}u=f\ \text{ on }[-1,1]^{3},\qquad u=0\ \text{ on }\partial[-1,1]^{3}.\tag{24}$$

If one writes down a second-order finite difference discretization of Eq. 24 on an $n\times n\times n$ equispaced grid, then one obtains the following multidimensional Sylvester equation

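$$\mathcal{X}\times_{1}K+\mathcal{X}\times_{2}K+\mathcal{X}\times_{3}K=\mathcal{F},\qquad K=\frac{1}{h^{2}}\,{\rm tridiag}(-1,2,-1)\in\mathbb{R}^{(n-1)\times(n-1)},$$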

where $h=2/n$ and $\mathcal{F}_{ijk}=f(ih-1,jh-1,kh-1)$ for $1\leq i,j,k\leq n-1$. The solution tensor $\mathcal{X}$ is unknown, and for large $n$ one hopes that $\mathcal{X}_{ijk}\approx u(ih-1,jh-1,kh-1)$ for $1\leq i,j,k\leq n-1$ is a reasonably good approximation. The eigenvalues of $K$ are given by $(4/h^{2})\sin^{2}(\pi k/(2n))$ for $1\leq k\leq n-1$ [35, (2.23)]. Since $(2/\pi)x\leq\sin x\leq 1$ for $x\in[0,\pi/2]$ and $h=2/n$, the eigenvalues of $K$ are contained in the interval $[1,n^{2}]$.

We are interested in understanding the compressibility of $\mathcal{X}$ in tensor-train format when $f=1$ . Since $\Lambda(K)\subseteq[1,n^{2}]$ and ${\rm rank}^{\rm TT}(\mathcal{F})=(1,1,1,1)$ , Eq. 21 gives

[TABLE]

Figure 6 (left) shows the second element of the tensor-train rank, $s_{1}$, of the approximate solution tensor obtained from the finite difference discretization, together with its bound.
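A direct eigendecomposition-based solve of the finite difference system for $f=1$ lets one check both the eigenvalue interval $[1,n^{2}]$ and the small size of $s_1$. This is similar in spirit to the eigendecomposition solver compared in Figure 7, here written in NumPy. The matrix $K=h^{-2}\,{\rm tridiag}(-1,2,-1)$ of size $n-1$ is our assumption for the second-order discretization, and the values of $n$ and $\epsilon$ are illustrative.

```python
import numpy as np

def numerical_rank_fro(A, eps):
    """Smallest r with ||A - A_r||_F <= eps * ||A||_F."""
    s = np.linalg.svd(A, compute_uv=False)
    tail = np.sqrt(np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0))
    return int(np.argmax(tail <= eps * np.linalg.norm(s)))

def mode_mult(T, U, k):
    """k-mode product T x_k U."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(T, k, 0), axes=1), 0, k)

n = 64
h = 2.0 / n
N = n - 1                                                   # interior grid points per dimension
K = (2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2
lam, Q = np.linalg.eigh(K)                                  # K = Q diag(lam) Q^T, ascending
exact = 4 / h**2 * np.sin(np.pi * np.arange(1, n) / (2 * n)) ** 2
print(np.allclose(lam, exact), lam.min() >= 1, lam.max() <= n**2)

F = np.ones((N, N, N))                                      # right-hand side f = 1
Fh = F
for k in range(3):
    Fh = mode_mult(Fh, Q.T, k)                              # transform to the eigenbasis
Xh = Fh / (lam[:, None, None] + lam[None, :, None] + lam[None, None, :])
X = Xh
for k in range(3):
    X = mode_mult(X, Q, k)                                  # back to grid values

residual = sum(mode_mult(X, K, k) for k in range(3)) - F
rel_res = np.linalg.norm(residual) / np.linalg.norm(F)
s1 = numerical_rank_fro(X.reshape(N, N * N), 1e-6 / np.sqrt(3))
print(f"relative residual {rel_res:.1e}, s1 = {s1}")
```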

One wonders if there is also a fast Poisson solver for spectral discretizations. This turns out to be feasible with a carefully constructed ultraspherical spectral discretization. The Poisson equation can be discretized to a tensor equation as [11]:

[TABLE]

where

[TABLE]

$\tilde{C}_{k}^{(3/2)}$ is the degree $k$ orthonormalized ultraspherical polynomial with parameter $\frac{3}{2}$ [39, Table 18.3.1], $\mathcal{G}=\mathcal{F}\times_{1}M^{-1}\times_{2}M^{-1}\times_{3}M^{-1}$, $A=D^{-1}M$, $D$ is a diagonal matrix, $M$ and $A$ are both symmetric pentadiagonal matrices, and the spectrum of $A$ satisfies $\Lambda(A)\subseteq[-1,\ -1/(30n^{4})]$. If $f=1$, Eq. 21 gives

[TABLE]

Figure 6 (right) shows the second element of the tensor-train rank, $s_{1}$ , and the bound of the approximate solution tensor to the Poisson equation via ultraspherical spectral discretization. This spectral discretization indicates that the $n\times n\times n$ tensor discretization of the solution can be approximated with only $\mathcal{O}(dn(\log n)^{2}(\log(1/\epsilon))^{2})$ degrees of freedom. This is a significant reduction in the cost of storing the solution, with a relatively straightforward decomposition. Comparatively, one can achieve $\mathcal{O}(d\log n\log(1/\epsilon))$ with quantics tensor formats [27, 23], but those require more complicated representations.

Some special functions can be well-approximated by exponential sums of the form

[TABLE]

and these approximants can be used to represent the solution to PDEs with Laplace-like operators [14, 25]. In [25], the author uses exponential sums to show that the solution tensor to several 3D elliptic PDEs can be approximated with $\mathcal{O}(dn(\log n)^{2}(\log(1/\epsilon))^{2})$ degrees of freedom. However, the constants in these compressibility statements are left implicit. In general, both exponential sum approximation and Zolotarev numbers can be used to bound the $k$th singular value of matrices with displacement structure and capture the geometric decay, but the Zolotarev bound tends to be tighter and does not involve an algebraic factor related to $k$ [50].

4.6 Solving for tensors in compressed formats

Since the proofs of Theorem 4.4 and Theorem 4.6 are constructive, we can use the algorithms implicit in them to solve 3D tensor Sylvester equations of the form:

[TABLE]

where $A^{(1)}\in\mathbb{C}^{n_{1}\times n_{1}}$ , $A^{(2)}\in\mathbb{C}^{n_{2}\times n_{2}}$ , $A^{(3)}\in\mathbb{C}^{n_{3}\times n_{3}}$ , and $\mathcal{F}\in\mathbb{C}^{n_{1}\times n_{2}\times n_{3}}$ . In particular, we can compute approximate solutions to Eq. 28 in tensor-train or Tucker format when $\mathcal{F}$ is a low rank tensor and the spectra of $A^{(1)}$ , $A^{(2)}$ , and $A^{(3)}$ are well-separated. If $A^{(1)}$ , $A^{(2)}$ , and $A^{(3)}$ are Minkowski sum separated, and the unfoldings $F_{1}$ and $F_{2}$ of $\mathcal{F}$ have low rank decompositions $F_{1}=W_{1}Z_{1}^{*}$ , and $F_{2}=W_{2}Z_{2}^{*}$ with rank $r_{1}$ and $r_{2}$ , respectively, then we can solve for $\mathcal{X}$ in tensor-train format.

The tensor-train factors of $\mathcal{X}$ obtained by the TT-SVD algorithm have orthonormal columns (or rows) spanning the column (or row) spaces of the unfoldings of $\mathcal{X}$. For example, the first tensor-train factor $U_{1}$ of $\mathcal{X}$ can be found as a matrix with orthonormal columns spanning the column space of the first unfolding $X_{1}$. Since $\mathcal{X}$ satisfies Eq. 28, we find that $X_{1}$ satisfies the Sylvester equation

[TABLE]

We can use the factored alternating direction implicit (fADI) method to solve Eq. 29 for a matrix $V_{1}$ such that $X_{1}=V_{1}D_{1}Y_{1}^{*}$ [5]. One can then use the QR decomposition of $V_{1}$ , i.e., $V_{1}=U_{1}R_{1}$ , to calculate the first tensor-train core $U_{1}$ .
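A minimal NumPy sketch of fADI for a matrix Sylvester equation $AX-XB=MN^{*}$ follows. It uses the factored recurrences of Benner, Li, and Truhar [5] as we understand them, written out by us. It uses dense solves for clarity, whereas a fast solver would exploit the structure of the shifted systems. The shift choice is a simple heuristic rather than the optimal Zolotarev shifts: geometrically spaced $p_j$ in the interval containing $\Lambda(A)$, with $q_j=-p_j$. The returned factors give $X\approx Z\,{\rm diag}(d)\,Y^{*}$, whose rank is at most the number of shifts times ${\rm rank}(MN^{*})$.

```python
import numpy as np

def fadi(A, B, M, N, p, q):
    """Factored ADI for A X - X B = M N^*, returning Z, d, Y with X ~ Z diag(d) Y^*.

    The shifts p[j] should lie near the spectrum of A and q[j] near that of B.
    Dense solves are used only for illustration.
    """
    IA, IB = np.eye(A.shape[0]), np.eye(B.shape[0])
    Bh = B.conj().T
    Zj = np.linalg.solve(A - q[0] * IA, M)
    Yj = np.linalg.solve(Bh - np.conj(p[0]) * IB, N)
    Zs, Ys, ds = [Zj], [Yj], [np.full(M.shape[1], q[0] - p[0])]
    for j in range(1, len(p)):
        Zj = Zj + (q[j] - p[j - 1]) * np.linalg.solve(A - q[j] * IA, Zj)
        Yj = Yj + np.conj(p[j] - q[j - 1]) * np.linalg.solve(Bh - np.conj(p[j]) * IB, Yj)
        Zs.append(Zj)
        Ys.append(Yj)
        ds.append(np.full(M.shape[1], q[j] - p[j]))
    return np.hstack(Zs), np.concatenate(ds), np.hstack(Ys)

rng = np.random.default_rng(0)
n, r = 60, 2
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(np.linspace(1, 10, n)) @ Q1.T        # Lambda(A) in E = [1, 10]
B = Q2 @ np.diag(-np.linspace(1, 10, n)) @ Q2.T       # Lambda(B) in F = [-10, -1]
M = rng.standard_normal((n, r))
N = rng.standard_normal((n, r))

p = np.geomspace(1, 10, 20)                             # 20 shifts (heuristic choice)
q = -p
Z, d, Y = fadi(A, B, M, N, p, q)
X = (Z * d) @ Y.conj().T
rel_res = np.linalg.norm(A @ X - X @ B - M @ N.T) / np.linalg.norm(M @ N.T)
print(f"rank <= {Z.shape[1]}, relative residual {rel_res:.1e}")
```

In the tensor setting, $A$ and $B$ would be the matrices in Eq. 29, and $Z$ plays the role of $V_1$.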

The second and third tensor-train factors can be computed by finding matrices with orthonormal columns for the column and row spaces of $C_{2}$, where $C_{2}={\rm reshape}(R_{1}D_{1}Y_{1}^{*},s_{1}n_{2},n_{3})$. It can be shown that $C_{2}$ satisfies the Sylvester equation

[TABLE]

One can again use fADI to compute a low rank decomposition $C_{2}=V_{2}D_{2}Y_{2}^{*}$. This decomposition can be compressed by computing QR factorizations of $V_{2}$ and $Y_{2}$ followed by an SVD, which gives $C_{2}\approx U_{2}\Sigma T_{2}^{*}$, where $U_{2}$ and $T_{2}$ are matrices with $s_{2}$ orthonormal columns and $\Sigma$ is a diagonal matrix. In this way, the second tensor-train factor is ${\rm reshape}(U_{2},[s_{1},n_{2},s_{2}])$ and the third factor is $U_{3}=\Sigma T_{2}^{*}$. Although the fADI method requires the solution of shifted linear systems with $I\otimes(U_{1}^{*}A^{(1)}U_{1})+A^{(2)}\otimes I$, the Kronecker product structure allows one to reshape these linear systems into Sylvester equations, which can themselves be solved with the alternating direction implicit (ADI) method [5]. That is, one never needs to solve a huge linear system. As a result, if $n_{1}=n_{2}=n_{3}=n$ and $A^{(1)}$, $A^{(2)}$, and $A^{(3)}$ have structures such that shifted linear systems can be solved in $\mathcal{O}(n)$ operations, then the solver has a complexity that is less than $\mathcal{O}(n^{3})$. In summary, the ADI-based tensor Sylvester equation solver (1) uses fADI to solve Eq. 29 for $X_{1}=V_{1}D_{1}Y_{1}^{*}$, (2) computes the QR factorization $V_{1}=U_{1}R_{1}$ to obtain the first factor $U_{1}$, (3) forms $C_{2}={\rm reshape}(R_{1}D_{1}Y_{1}^{*},s_{1}n_{2},n_{3})$ and uses fADI to solve its Sylvester equation for $C_{2}=V_{2}D_{2}Y_{2}^{*}$, and (4) compresses this factorization with QR and SVD to obtain the second and third factors.

Similarly, if all matricizations of $\mathcal{F}$ are low rank, and $A^{(1)}$, $A^{(2)}$, and $A^{(3)}$ are Minkowski singly separated, then we can solve for the solution in orthogonal Tucker format via the higher order singular value decomposition (HOSVD) [8]. Each factor matrix of $\mathcal{X}$ has orthonormal columns that span the column space of the corresponding matricization $X_{(j)}$, which satisfies the Sylvester equation:

[TABLE]

where

[TABLE]

If shifted linear systems with $A^{(1)}$, $A^{(2)}$, and $A^{(3)}$ can be solved quickly, then we can use fADI to compute an orthonormal basis for the column space of each $X_{(j)}$, and use a direct method, such as a 3D Bartels–Stewart algorithm, to solve for the core tensor [3].

4.6.1 Poisson equation solver

Consider the Poisson equation from Section 4.5.2 with the ultraspherical discretization in Eq. 26. Since $A$ is a pentadiagonal matrix, we can solve shifted linear systems with $A^{-1}$ in $\mathcal{O}(n)$ operations. Therefore, we obtain a fast Poisson solver that computes the solution in tensor-train or orthogonal Tucker format. The complexity of the tensor-train solver is $\mathcal{O}(n(\log n)^{3}(\log(1/\epsilon))^{3})$, where $0<\epsilon<1$ is the accuracy.

Figure 7 shows the running time of different discretized Poisson solvers. The red line represents a direct solver that converts (27) into a huge linear system via the Kronecker product. The green line represents an eigendecomposition solver, which computes the eigendecomposition of $A$ to diagonalize the equation and then solves for each entry of $\mathcal{X}$ directly by scaling. The blue line represents our fADI-based tensor-train solver. As $n$ grows, our solver becomes the fastest of the three. (The fADI solver is implemented in C++, while the direct and eigendecomposition solvers are implemented in MATLAB. However, both the backslash linear system solver and the eigendecomposition are carried out in LAPACK, so the comparison of the three solvers is still fair. All timings were performed in MATLAB R2019a on the supercomputer of Cornell’s Math department.)

Acknowledgements

We have had discussions with David Bindel, Dan Fortunato, Leslie Greengard, and Madeleine Udell about the results in this paper and appreciate their thoughts and comments. We are grateful to Nicolas Boulle and Heather Wilber for carefully reading an earlier draft of this manuscript.

Bibliography


[1] N. I. Akhiezer, Elements of the theory of elliptic functions, vol. 79, Amer. Math. Soc., 1990.
[2] J. Ballani and L. Grasedyck, A projection method to solve linear systems in tensor format, Numer. Lin. Alg. Appl., 20 (2013), pp. 27–43.
[3] R. H. Bartels and G. W. Stewart, Solution of the matrix equation AX + XB = C [F4], Communications of the ACM, 15 (1972), pp. 820–826.
[4] B. Beckermann and A. Townsend, On the singular values of matrices with displacement structure, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 1227–1248.
[5] P. Benner, R.-C. Li, and N. Truhar, On the ADI method for Sylvester equations, J. Comput. Appl. Math., 233 (2009), pp. 1035–1045.
[6] G. Beylkin and M. J. Mohlenkamp, Numerical operator calculus in higher dimensions, Proceedings of the National Academy of Sciences, 99 (2002), pp. 10246–10251.
[7] G. Beylkin and M. J. Mohlenkamp, Algorithms for numerical analysis in high dimensions, SIAM J. Sci. Comput., 26 (2005), pp. 2133–2159.
[8] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
