Analysis of tensor approximation schemes for continuous functions
Michael Griebel, Helmut Harbrecht

TL;DR
This paper investigates tensor approximation methods for continuous functions within Sobolev spaces, demonstrating that certain tensor formats can achieve dimension-robust approximation costs under specific conditions.
Contribution
It provides a detailed analysis of tensor approximation schemes for continuous functions, highlighting conditions for dimension-robustness in Tucker and tensor train formats.
Findings
Cost of tensor approximations is dimension-robust with proper weights.
Analysis applies to functions in isotropic Sobolev spaces.
Both Tucker and tensor train formats are effective under these conditions.
Abstract
In this article, we analyze tensor approximation schemes for continuous functions. We assume that the function to be approximated lies in an isotropic Sobolev space and discuss the cost when approximating this function in the continuous analogue of the Tucker tensor format or of the tensor train format. We especially show that the cost of both approximations is dimension-robust when the Sobolev space under consideration provides appropriate dimension weights.
Michael Griebel, Institut für Numerische Simulation, Universität Bonn, Endenicher Allee 19b, 53115 Bonn, and Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Helmut Harbrecht, Departement Mathematik und Informatik, Universität Basel, Spiegelgasse 1, 4051 Basel, Switzerland
Key words and phrases:
Tensor format, approximation error, rank complexity, Sobolev space with dimension weights
1. Introduction
The efficient approximate representation of multivariate functions is an important task in numerical analysis and scientific computing. In this article, we hence consider the approximation of functions which live on the product of bounded domains , each of which satisfies . Besides a sparse grid approximation of the function under consideration, as discussed in, e.g., [8, 17, 18, 50], one can also apply a low-rank approximation by means of a tensor approximation scheme, see, e.g., [15, 21, 22, 34, 35] and the references therein.
The low-rank approximation in the situation of the product of two domains is well understood. It is related to the singular value decomposition and has been studied for arbitrary product domains in, e.g., [19, 20], see also [46, 47, 48] for the periodic case. However, the situation is not that clear for the product of more than two domains, where one ends up with tensor decompositions. Such tensor decompositions are generalizations of the well-known singular value decomposition and the corresponding low-rank matrix approximation methods of two dimensions to the higher-dimensional setting. There, besides the curse of dimension, we encounter the difficulty that, due to the non-existence of an Eckart-Young-Mirsky theorem, the concepts of singular value decomposition and low-rank approximation can be generalized to higher dimensions in more than one way. Consequently, there exist many generalizations of the singular value decomposition of a function and of low-rank approximations to tensors. To this end, various schemes have been developed over the years in different areas of the sciences and have successfully been applied to various high-dimensional problems ranging from quantum mechanics and physics via biology and econometrics, computer graphics and signal processing to numerical analysis. Recently, tensor methods have even been recognized as special deep neural networks in machine learning and big data analysis [11, 28]. As tensor approximation schemes, we have, for example, matrix product states, DMRG, MERA, PEPS, CP, CANDECOMP, PARAFAC, Tucker, tensor train, tree tensor networks and hierarchical Tucker, to name a few. A mathematical introduction to tensor methods is given in the seminal book [21], while a survey on existing methods and their literature can be found in [16]. Various software packages have also been developed for an algebra of operators dealing with tensors.
Tensor methods are usually analyzed as low-rank approximations to a full discrete tensor of data with respect to the -norm or Frobenius-norm. In this respect, they can be seen as compression methods which may avoid the curse of dimensionality, which is inherent in the full tensor representation. The main tool for studying tensor compression schemes is the so-called tensor-rank, compare [12, 13, 21]. Thus, instead of storage, as little as or even only storage is needed, where denotes the number of data points in one coordinate direction, denotes the dimension of the tensor under consideration and denotes the respective tensor rank of the data. The cost complexity of the various algorithms working with sparse tensor representations is correspondingly reduced, and working in a sparse tensor format makes it possible to alleviate or to completely break the curse of dimension for suitable tensor data classes, i.e., for sufficiently small .
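A small sketch can make this storage comparison concrete. It counts stored entries for a full tensor and for two low-rank formats, using the standard textbook counts rather than the estimates of this article. The names `n`, `d`, `r` and all function names are assumptions introduced here for illustration, and a single uniform rank `r` is assumed in every direction.

```python
def full_storage(n: int, d: int) -> int:
    """Entries of a full tensor with n points in each of d directions."""
    return n ** d


def tucker_storage(n: int, d: int, r: int) -> int:
    """Core tensor with r**d entries plus d factor matrices of size n x r."""
    return r ** d + d * n * r


def tensor_train_storage(n: int, d: int, r: int) -> int:
    """Two boundary cores of size n x r and d - 2 interior cores of size r x n x r."""
    if d == 1:
        return n
    return 2 * n * r + (d - 2) * n * r * r


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the orders of magnitude.
    for d in (3, 6, 12):
        print(d, full_storage(10, d), tucker_storage(10, d, 3),
              tensor_train_storage(10, d, 3))
```

The Tucker count still grows exponentially in `d` through the core tensor, whereas the tensor train count grows linearly in `d` for fixed `r`; whether a small `r` suffices is exactly the question the rank estimates address.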
However, the question where the tensor data stem from and the issue of the accuracy of the full tensor approximation, i.e., the discretization error of the full tensor itself and its relation to the error of a subsequent low-rank tensor approximation, is usually not adequately addressed (we are only aware of [2, 3, 5, 39], where this question has been considered so far). Instead, only the approximation property of a low-rank tensor scheme with respect to the full tensor data is considered. But the former question is important since it clearly makes no sense to derive a tensor approximation with an error that is substantially smaller than the error which is already inherent in the full tensor data due to some discretization process for a continuous high-dimensional function which stems from a certain function class.
The approximation rates to continuous functions can be determined by a recursive use of the singular value decomposition, which is successively applied to convert the function into a specific continuous tensor format. We studied the singular value decomposition for arbitrary domains in [19, 20] and we now can apply these results to discuss approximation rates of continuous tensor formats. In the present article, given a function , we study the continuous analogues of the Tucker tensor decomposition and of the tensor train decomposition. We give bounds on the ranks required to ensure that the tensor decomposition admits a prescribed target accuracy. Especially, our analysis takes into account the influence of errors induced by truncating infinite expansions to finite ones. We therefore study an algorithm that computes the desired tensor expansion which is in contrast to the question of the smallest tensor-rank. We finally show that (isotropic) Sobolev spaces with dimension weights help to beat the curse of dimension when the number of product domains tends to infinity.
Besides the simple situation of , which is usually considered in case of tensor decompositions, there are many more applications of our general setting. For example, non-Newtonian flow can be modeled by a coupled system which consists of the Navier-Stokes equations for the flow in a three-dimensional geometry described by and of the Fokker-Planck equation in a -dimensional configuration space , consisting of spheres. Here, denotes the number of atoms in a chain-like molecule which constitutes the non-Newtonian behavior of the flow, for details see [4, 29, 31, 37]. Another example is homogenization. After unfolding [10], a two-scale homogenization problem gives rise to the product of the macroscopic physical domain and the periodic microscopic domain of the cell problem, see [32]. For multiple scales, several periodic microscopic domains appear which reflect the different separable scales, see e.g. [27]. Also the -th moment of linear elliptic boundary value problems with random source terms, i.e. in , is known to satisfy a deterministic partial differential equation on the -fold product domain . There, the solution’s -th moment is given by the equation
[TABLE]
see [40, 41]. This approach extends to boundary value problems with random diffusion and to random domains as well [9, 25]. Moreover, we find the product of several domains in quantum mechanics, e.g. for the Schrödinger equation or the Langevin equation, where each domain is three-dimensional and corresponds to a single particle. Finally, we encounter it in uncertainty quantification, where one has the product of the physical domain and of in general infinitely many intervals for the random input parameter, which reflects its series expansion by the Karhunen-Loève decomposition or the Lévy-Ciesielski decomposition.
The remainder of this article is organized as follows: In Section 2, we give a short introduction to our results on the singular value decomposition, which are needed to derive the estimates for the continuous tensor decompositions. Then, in Section 3, we study the continuous Tucker tensor format, computed by means of the higher-order singular value decomposition. Next, we study the continuous tensor train decomposition in Section 4, computed by means of a repeated use of a vector-valued singular value decomposition. Finally, Section 5 concludes with some final remarks.
Throughout this article, to avoid the repeated use of generic but unspecified constants, we denote by that is bounded by a multiple of independently of parameters which and may depend on. Obviously, is defined as , and as and . Moreover, given a Lipschitz-smooth domain , means the space of square integrable functions on . For real numbers , the associated Sobolev space is denoted by , where its norm is defined in the standard way, compare [33, 45]. As usual, we have . The seminorm in is denoted by . Although not explicitly written, our subsequent analysis covers also the situation of being not a domain but a (smooth) manifold.
2. Singular value decomposition
2.1. Definition and calculation
Let and be Lipschitz-smooth domains. To represent functions on the tensor product domain in an efficient way, we will consider low-rank approximations which separate the variables and in accordance with
[TABLE]
It is well known (see e.g. [30, 38, 42]) that the best possible representation (2.1) in the -sense is given by the singular value decomposition, also called Karhunen-Loève expansion (we refer the reader to [44] for a comprehensive historical overview on the singular value decomposition). Then, the coefficients are the singular values and the and are the left and right (-normalized) eigenfunctions of the integral operator
[TABLE]
This means that
[TABLE]
where
[TABLE]
is the adjoint of . Especially, the left and right eigenfunctions and form orthonormal bases in and , respectively.
In order to compute the singular value decomposition, we need to solve the eigenvalue problem
[TABLE]
for the integral operator
[TABLE]
Since , the kernel
[TABLE]
is a symmetric Hilbert-Schmidt kernel. Hence, there exist countably many eigenvalues
[TABLE]
and the associated eigenfunctions constitute an orthonormal basis in .
Likewise, to obtain an orthonormal basis of , we can solve the eigenvalue problem
[TABLE]
for the integral operator
[TABLE]
with symmetric Hilbert-Schmidt kernel
[TABLE]
It holds and the sequences and are related by (2.2).
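The eigenvalue problem above can be explored numerically. The following is a minimal sketch, assuming the unit square as product domain, a midpoint quadrature rule and plain power iteration; none of these choices is prescribed by the text, and the function name is hypothetical. It assembles the discretized symmetric kernel and returns an approximation of the largest singular value together with samples of the corresponding left eigenfunction.

```python
import math


def dominant_singular_pair(f, num_points=50, iterations=100):
    """Approximate the largest singular value of f on (0,1)^2 and the
    L2-normalized left eigenfunction sampled on a midpoint grid."""
    h = 1.0 / num_points
    grid = [(i + 0.5) * h for i in range(num_points)]
    # Samples F[i][l] = f(x_i, y_l).
    F = [[f(x, y) for y in grid] for x in grid]
    # Discretized kernel k(x_i, x_j) = int f(x_i, y) f(x_j, y) dy.
    K = [[h * sum(F[i][l] * F[j][l] for l in range(num_points))
          for j in range(num_points)] for i in range(num_points)]
    # Power iteration for phi -> int k(., x') phi(x') dx'.
    phi = [1.0] * num_points  # unit norm in the discrete L2 norm
    eigenvalue = 0.0
    for _ in range(iterations):
        psi = [h * sum(K[i][j] * phi[j] for j in range(num_points))
               for i in range(num_points)]
        norm = math.sqrt(h * sum(v * v for v in psi))
        if norm == 0.0:
            return 0.0, phi
        eigenvalue = norm  # phi has unit norm, so this tends to the top eigenvalue
        phi = [v / norm for v in psi]
    # The singular value is the square root of the kernel's eigenvalue.
    return math.sqrt(eigenvalue), phi


if __name__ == "__main__":
    sigma, _ = dominant_singular_pair(lambda x, y: math.exp(-x * y))
    print(sigma)
```

For the rank-one function f(x, y) = x y the largest singular value equals 1/3, which the sketch reproduces up to quadrature error. Further singular pairs would require deflation or a full eigenvalue solver.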
2.2. Regularity of the eigenfunctions
Now, we consider functions . In the following, we collect results from [19, 20] concerning the singular value decomposition of such functions. We repeat the proof whenever needed for having explicit constants. To this end, we define the mixed Sobolev space as a tensor product of Hilbert spaces
[TABLE]
which we equip with the usual cross norm
[TABLE]
Note that
[TABLE]
Lemma 2.1.
Assume that for some fixed . Then, the operators
[TABLE]
are continuous with
[TABLE]
Proof.
From it follows for that . Therefore, the operator is continuous since
[TABLE]
Note that we have used here . Proceeding likewise for completes the proof. ∎
Lemma 2.2.
Assume that for some fixed . Then, it holds and for all with
[TABLE]
Proof.
According to (2.2) and Lemma 2.1, we have
[TABLE]
This proves the first assertion. The second assertion follows by duality. ∎
As an immediate consequence of Lemma 2.2, we obtain
[TABLE]
and
[TABLE]
We will show later in Lemma 2.6 how to improve this estimate by sacrificing some regularity.
2.3. Truncation error
We next give estimates on the decay rate of the eigenvalues of the integral operator with kernel (2.4). To this end, we exploit the smoothness in the function’s first variable and assume hence . We introduce finite element spaces , which consist of discontinuous, piecewise polynomial functions of total degree on a quasi-uniform triangulation of with mesh width . Then, given a function , the -orthogonal projection satisfies
[TABLE]
uniformly in due to the Bramble-Hilbert lemma, see e.g., [6, 7].
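The approximation property behind this estimate can be observed in the simplest setting. The sketch below assumes the interval (0,1), piecewise constant functions on equal cells (so that the orthogonal projection is the cell average) and midpoint quadrature; the function name and these choices are illustrative assumptions and not the setting of the text. For a smooth function, the error should roughly halve whenever the number of cells is doubled.

```python
import math


def piecewise_constant_projection_error(w, num_cells, samples_per_cell=32):
    """L2(0,1) error of the L2-orthogonal projection of w onto functions that
    are constant on each of num_cells equal cells (midpoint quadrature)."""
    h = 1.0 / num_cells
    sub = h / samples_per_cell
    squared_error = 0.0
    for cell in range(num_cells):
        points = [cell * h + (s + 0.5) * sub for s in range(samples_per_cell)]
        values = [w(x) for x in points]
        # The orthogonal projection onto constants is the cell average.
        mean = sum(values) / samples_per_cell
        squared_error += sub * sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_error)


if __name__ == "__main__":
    for cells in (10, 20, 40, 80):
        print(cells, piecewise_constant_projection_error(math.sin, cells))
```

Higher polynomial degrees and higher-dimensional domains change the rate, but the mechanism (more degrees of freedom, smaller projection error for smooth functions) is the same.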
For the approximation of in the first variable, i.e. $\big((P_r\otimes I)f\big)(\boldsymbol{x},\boldsymbol{y})$, we obtain the following approximation result for the present choice of , see [20] for the proof.
Lemma 2.3.
Assume that for some fixed . Let be the eigenvalues of the operator and those of . Then, it holds
[TABLE]
By combining this lemma with the approximation estimate (2.6) and in view of for all according to the min-max theorem of Courant-Fischer, see [1] for example, we conclude that the truncation error of the singular value decomposition can be bounded by
[TABLE]
Since the eigenvalues of the integral operator and its adjoint are the same, we can also exploit the smoothness of in the second coordinate by interchanging the roles of and in the above considerations. We thus obtain the following theorem:
Theorem 2.4.
Let for some fixed and let
[TABLE]
Then, it holds
[TABLE]
Remark 2.5.
Theorem 2.4 implies that the eigenvalues in case of a function decay like
[TABLE]
Having the decay rate of the eigenvalues at hand, we are able to improve the result of Lemma 2.2 by sacrificing some regularity. Note that the proof of this result is based upon an argument from [43].
Lemma 2.6.
Assume that for some fixed . Then, it holds
[TABLE]
and
[TABLE]
Proof.
Without loss of generality, we assume . Then, since , we conclude from (2.8) that
[TABLE]
where we used that . Moreover, by interpolating between and , compare [33, 45] for example, we find
[TABLE]
that is
[TABLE]
As a consequence, we infer that
[TABLE]
with . Therefore, it holds
[TABLE]
for any . Hence, the series
[TABLE]
converges for almost all , provided that . Likewise, the series
[TABLE]
converges for almost all . Thus, the series
[TABLE]
converges for almost all and , provided that . Because of Egorov’s theorem, the pointwise absolute convergence almost everywhere implies uniform convergence. Hence, we can switch differentiation and summation to get
[TABLE]
Finally, we exploit the product structure of and the orthonormality of to derive the first assertion, i.e.,
[TABLE]
The second assertion follows in complete analogy. ∎
2.4. Vector-valued functions
In addition to the aforementioned results, we will also need the following result which concerns the approximation of vector-valued functions. Here and in the sequel, the vector-valued function is an element of and for some domain , respectively, if the norms
[TABLE]
are finite. Likewise, the seminorm is defined in .
Consider now a vector-valued function of dimension . Then, instead of (2.6), we find
[TABLE]
since consists of components and we thus need -times as many ansatz functions for our approximation argument. Hence, in case of a vector-valued function , we conclude by exploiting the smoothness in the first variable (note that the kernel function of is matrix-valued while the kernel function of is scalar-valued) that the truncation error of the singular value decomposition can be estimated by
[TABLE]
Hence, the decay rate of the singular values is considerably reduced. Finally, we would like to remark that Lemma 2.6 also holds in the vector case, i.e.,
[TABLE]
and
[TABLE]
provided that has extra regularity in terms of . Here, analogously to above, .
After these preparations, we now introduce and analyze two types of continuous analogues of tensor formats, namely of the Tucker format [26, 49] and of the tensor train format [36, 34], and discuss their approximation properties for functions .
3. Tucker tensor format
3.1. Tucker decomposition
We shall consider from now on a product domain which consists of different domains , . For given and , we apply the singular value decomposition to separate the variables and . We hence get
[TABLE]
where the left eigenfunctions form an orthonormal basis in . Consequently, if we iterate over all , this yields an orthonormal basis of , and we arrive at the representation
[TABLE]
Herein, the tensor $\big[\omega(\boldsymbol{\alpha})\big]_{\boldsymbol{\alpha}\in\mathbb{N}^{m}}$ is the core tensor, where a single coefficient is given by
[TABLE]
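The structure of such a representation, a core tensor contracted with one set of univariate functions per domain, can be sketched in a few lines. The example is hand-built and purely illustrative: the basis {1, x} is not orthonormal and is not obtained from a singular value decomposition, and the names `evaluate_tucker`, `core_size`, `factors` and `core` are hypothetical. It represents f(x1, x2, x3) = x1 + x2 + x3 with two functions per domain.

```python
from itertools import product


def evaluate_tucker(core, factors, point):
    """Sum over all multi-indices alpha of core[alpha] times the product of
    the alpha_j-th function of domain j evaluated at x_j."""
    ranks = [len(basis) for basis in factors]
    # values[j][a] = a-th function of domain j evaluated at the j-th coordinate.
    values = [[phi(x) for phi in basis] for basis, x in zip(factors, point)]
    total = 0.0
    for alpha in product(*(range(r) for r in ranks)):
        weight = core.get(alpha, 0.0)
        if weight != 0.0:
            term = weight
            for j, a in enumerate(alpha):
                term *= values[j][a]
            total += term
    return total


def core_size(factors):
    """Number of entries of the (dense) core tensor: the product of the ranks."""
    size = 1
    for basis in factors:
        size *= len(basis)
    return size


# Index 0 selects the constant function, index 1 the identity.
constant_and_linear = [lambda x: 1.0, lambda x: x]
factors = [constant_and_linear] * 3
core = {(1, 0, 0): 1.0, (0, 1, 0): 1.0, (0, 0, 1): 1.0}

if __name__ == "__main__":
    print(evaluate_tucker(core, factors, [0.5, 0.25, 2.0]), core_size(factors))
```

Here the core happens to be sparse, but a dense core has as many entries as the product of all ranks, which is the quantity that matters for the storage cost of the Tucker format.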
3.2. Truncation error
If we intend to truncate the singular value decomposition (3.11) after terms such that the truncation error is bounded by , we have to choose
[TABLE]
according to Theorem 2.4. Doing so for all , we obtain the approximation
[TABLE]
We have the following result on the Tucker decomposition:
Theorem 3.1.
Let for some fixed and . If the ranks are chosen according to for all , then the truncation error of the truncated Tucker decomposition is
[TABLE]
while the storage cost for the core tensor of is .
Proof.
For the approximation of the core tensor, the sets of the univariate eigenfunctions are used for all , cf. (3.12). Due to orthonormality, we find
[TABLE]
where we obtain in case of . Since
[TABLE]
for all , we arrive with (3.13) and the summation over at the desired error estimate. This completes the proof, since the estimate on the number of coefficients in the core tensor is obvious. ∎
3.3. Sobolev spaces with dimension weights
The cost of the core tensor of the Tucker decomposition exhibits the curse of dimension as the number of subdomains increases. This can be seen most simply for the example . Then, the cost is , which expresses the curse of dimension as long as is not proportional to . Nonetheless, in case of Sobolev spaces with dimension weights, the curse of dimension can be beaten.
For , we shall discuss the situation in more detail. To this end, we assume that all subdomains are identical to a single domain of dimension and note that the limit only makes sense when weights are included in the underlying Sobolev spaces which ensure that higher dimensions become less important. For our proofs we choose as usual arbitrary but fixed and show the existence of -independent constants in the convergence and cost estimates.
The Sobolev spaces with dimension weights we consider are given by all functions such that
[TABLE]
The definition in (3.14) means that, given a function with norm , the partial derivatives with respect to become less important as the dimension increases. Such functions appear for example in uncertainty quantification. Let be given a Karhunen-Loève expansion
[TABLE]
and insert it into a function of finite smoothness . Then, the function $b\big(u(\mathbf{x},\mathbf{y})\big)$ satisfies (3.14) with respect to the -variable, where . Hence, the solution of a given partial differential equation would satisfy a decay estimate similar to (3.14) whenever the stochastic field enters the partial differential equation through a non-smooth coefficient function , compare [14, 23, 24] for example.
It turns out that algebraically decaying weights (3.15) are sufficient to beat the curse of dimension in case of the Tucker tensor decomposition. (In Theorem 3.2, no truncation of the dimension is applied, as it would be required in practice if the number of domains tends to infinity. Note that the dimension truncation here is indeed the same as for the tensor train decomposition later on, see also Theorem 4.3.)
Theorem 3.2.
Given , let for some fixed with weights (3.14) that decay like
[TABLE]
Then, for all , the error of the continuous Tucker decomposition with ranks
[TABLE]
is of order while the storage cost for the core tensor of is bounded by independently of the dimension .
Proof.
In view of Theorem 2.4 and (3.14), we deduce by choosing the ranks as in (3.13) that
[TABLE]
Therefore, we reach the desired over-all truncation error
[TABLE]
When the weights decay as in (3.16), then the cost of the core tensor is
[TABLE]
with . Hence, the cost of the core tensor stays bounded independently of since
[TABLE]
∎
4. Tensor train format
4.1. Tensor train decomposition
For the discussion of the continuous tensor train decomposition, we should assume that the domains , , are arranged in such a way that it holds . (The considerations in this section are based upon [5]. Nonetheless, the results derived there are not correct. The authors did not consider the impact of the vector-valued singular value decomposition in a proper way, which indeed does result in the curse of dimension.)
Now, consider and separate the variables and by the singular value decomposition
[TABLE]
Since
[TABLE]
we can separate from by means of a second singular value decomposition and arrive at
[TABLE]
By repeating the last step and successively separating from for we finally arrive at the representation
[TABLE]
where
[TABLE]
In contrast to the Tucker format, we do not obtain a huge core tensor since each of the singular value decompositions of the tensor train decomposition removes the current first spatial domain from the approximant. We just obtain a product of matrix-valued functions (except for the first and last factor which are vector-valued functions), each of which is associated with a specific domain . This especially results in only sums in contrast to the sums for the Tucker format.
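The product structure described here can be made tangible with a hand-built example. The sketch below is an illustration under stated assumptions: the factors are chosen by hand rather than computed by singular value decompositions, and the names `evaluate_tensor_train` and `sum_cores` are hypothetical. It represents f(x1, x2, x3) = x1 + x2 + x3 as a row vector times a 2 x 2 matrix times a column vector, each depending on one variable only.

```python
def evaluate_tensor_train(cores, point):
    """Evaluate a product of matrix-valued functions at a point.

    cores[j] maps the j-th coordinate to a matrix given as a list of rows;
    the first core returns a 1 x r matrix and the last an r x 1 matrix.
    """
    row = cores[0](point[0])[0]  # row vector belonging to the first domain
    for core, x in zip(cores[1:], point[1:]):
        matrix = core(x)
        # Row vector times matrix: one sum per step.
        row = [sum(row[a] * matrix[a][b] for a in range(len(row)))
               for b in range(len(matrix[0]))]
    return row[0]


# Hand-built factors with both ranks equal to 2.
sum_cores = [
    lambda x: [[x, 1.0]],
    lambda x: [[1.0, 0.0], [x, 1.0]],
    lambda x: [[1.0], [x]],
]

if __name__ == "__main__":
    print(evaluate_tensor_train(sum_cores, [0.5, 0.25, 2.0]))
```

Each step contracts a single index, so the number of nested sums grows with the number of domains minus one, and no object with one index per domain ever has to be stored.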
4.2. Truncation error
In practice, we truncate the singular value decomposition in step after terms, thus arriving at the representation
[TABLE]
One readily infers by using again Pythagoras’ theorem that the truncation error is bounded by
[TABLE]
see also [35]. Note that, for , the singular values in this estimate do not coincide with the singular values from the original continuous tensor train decomposition due to the truncation.
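A minimal sketch of how such a bound is evaluated, assuming it takes the usual form for truncated tensor train decompositions, namely the square root of the sum of all discarded squared singular values over all steps. This form, the function name and the input layout are assumptions made for illustration; the singular values passed in are meant to be those of the successively truncated problems, in line with the note above.

```python
import math


def tensor_train_truncation_bound(singular_values, ranks):
    """Square root of the sum of all discarded squared singular values.

    singular_values[j] lists the singular values of step j in decreasing
    order; ranks[j] is the number of terms kept in that step.
    """
    discarded = 0.0
    for values, rank in zip(singular_values, ranks):
        discarded += sum(s * s for s in values[rank:])
    return math.sqrt(discarded)


if __name__ == "__main__":
    # Hypothetical singular values for two separation steps.
    steps = [[3.0, 2.0, 1.0], [2.0, 1.0]]
    print(tensor_train_truncation_bound(steps, [1, 1]))
    print(tensor_train_truncation_bound(steps, [2, 2]))
```

Raising the ranks only removes terms from the sum, so the bound is monotonically non-increasing in every rank.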
We next shall give bounds on the truncation error. In the -th step of the algorithm, , one needs to approximate the vector-valued function
[TABLE]
by a vector-valued singular value decomposition. This means that we consider the singular value decomposition (2.9) for a vector-valued function in case of the domains and .
For , it holds and
[TABLE]
according to Lemma 2.6, precisely in its vectorized version (2.10). It follows and, again by (2.10),
[TABLE]
We hence conclude recursively and
[TABLE]
Estimate (4.19) shows that the -seminorm of the vector-valued functions stays bounded by . But according to (2.9), we have in the -th step only the truncation error estimate
[TABLE]
Hence, in view of (4.19), to achieve the target accuracy per truncation, the truncation ranks need to be increased in accordance with
[TABLE]
We summarize our findings in the following theorem, which holds in this form also if the subdomains are not ordered in such a way that .
Theorem 4.1.
Let for some fixed and . Then, the over-all truncation error of the tensor train decomposition with truncation ranks (4.20) is
[TABLE]
The storage cost for is given by
[TABLE]
and hence is bounded by .
Remark 4.2.
If , then the cost of the tensor train decomposition is . Thus, the cost is quadratic compared to the cost of the Tucker decomposition. However, in practice, one performs forward steps and backward steps. This means one computes steps as described above to successively separate from the other variables. Then, one performs the algorithm in the opposite direction, i.e., one successively separates from the other variables. This way, the over-all cost is reduced to the order . (If the spatial dimensions , , of the subdomains are different, one can balance the number of forward and backward steps in a better way to reduce the cost further.)
4.3. Sobolev spaces with dimension weights
Like for the Tucker decomposition, the cost of the tensor train decomposition suffers from the curse of dimension as the number of subdomains increases. We therefore again discuss appropriate Sobolev spaces with dimension weights, where we assume for reasons of simplicity that all subdomains are identical to a single domain of dimension .
Theorem 4.3.
Given , let for some fixed with weights (3.14) that decay like (3.15). For , choose the ranks successively in accordance with
[TABLE]
if and if . Here, is given by
[TABLE]
Then, the error of the continuous tensor train decomposition is of order while the storage cost of stays bounded by independently of the dimension .
Proof.
The combination of Theorem 2.4, (3.14) and (4.22) implies
[TABLE]
and
[TABLE]
Hence, as in the proof of Theorem 3.2, the approximation error of the continuous tensor train decomposition is bounded by a multiple of independent of .
Next, we observe for all that
[TABLE]
This recursively yields
[TABLE]
Hence, by using that , we obtain
[TABLE]
Therefore, the cost (4.21) is
[TABLE]
and, hence, is bounded independently of in view of (4.23). ∎
5. Discussion and conclusion
In the present article, we considered the continuous versions of the Tucker tensor format and of the tensor train format for the approximation of functions which live on an -fold product of arbitrary subdomains. By considering (isotropic) Sobolev smoothness, we derived estimates on the ranks to be chosen in order to realize a prescribed target accuracy. These estimates exhibit the curse of dimension.
Both tensor formats have in common that always only the variable with respect to a single domain is separated from the other variables by means of the singular value decomposition. This enables cheaper storage schemes, while the influence of the over-all dimension of the product domain is reduced to a minimum.
We also examined the situation of Sobolev spaces with dimension weights. Having sufficiently fast decaying weights helps to beat the curse of dimension as the number of subdomains tends to infinity. It turned out that algebraically decaying weights are appropriate for both the Tucker tensor format and the tensor train format.
We finally remark that we considered here only the ranks of the tensor decomposition in the continuous case, i.e., for functions and not for tensors of discrete data. Of course, an additional projection step onto suitable finite dimensional trial spaces on the individual domains would be necessary to arrive at a fully discrete approximation scheme that can really be used in computer simulations. This would impose a further error of discretization type which needs to be balanced with the truncation error of the particular continuous tensor format.
Acknowledgement
Michael Griebel was partially supported by the Sonderforschungsbereich 1060 The Mathematics of Emergent Effects funded by the Deutsche Forschungsgemeinschaft. Both authors would like to thank Reinhold Schneider (Technische Universität Berlin) very much for fruitful discussions about tensor approximation.
References
- [1] Babuška, I. & Osborn, J. (1991) Eigenvalue Problems. In Handbook of Numerical Analysis, vol. II, North-Holland, Amsterdam, pages 641–784.
- [2] Bachmayr, M. & Dahmen, W. (2015) Adaptive near-optimal rank tensor approximation for high-dimensional operator equations. Found. Comput. Math., 15 (4), 839–898.
- [3] Bachmayr, M. & Dahmen, W. (2016) Adaptive low-rank methods for problems on Sobolev spaces with error control in $L_2$. ESAIM Math. Model. Numer. Anal., 50 (4), 1107–1136.
- [4] Barrett, J. & Knezevic, D. & Süli, E. (2009) Kinetic models of dilute polymers. Analysis, approximation and computation. 11th School on Mathematical Theory in Fluid Mechanics 22–29 May 2009, Kacov, Czech Republic, Necas Center for Mathematical Modeling, Prague.
- [5] Bigoni, D. & Engsig-Karup, A. & Marzouk, Y. (2016) Spectral tensor-train decomposition. SIAM J. Sci. Comput., 48 (4), A2405–A2439.
- [6] Braess, D. (2001) Finite Elements. Theory, Fast Solvers, and Applications in Solid Mechanics. Cambridge University Press, Cambridge.
- [7] Brenner, S. & Scott, L. (2008) The Mathematical Theory of Finite Element Methods. Springer, Berlin.
- [8] Bungartz, H.-J. & Griebel, M. (2004) Sparse grids. Acta Numerica, 13, 147–269.
