An adaptive stochastic Galerkin tensor train discretization for randomly perturbed domains

Martin Eigel; Manuel Marschall; Michael Multerer

arXiv:1902.07753·math.NA·February 22, 2019·SIAM/ASA J. Uncertain. Quantification

TL;DR

This paper introduces an adaptive stochastic Galerkin method using tensor train formats to efficiently solve PDEs on randomly perturbed domains, with error estimation and refinement capabilities.

Contribution

It develops a novel tensor train-based adaptive Galerkin framework for high-dimensional PDEs on random domains, including an a posteriori error estimator for refinement.

Findings

01

Efficient handling of high-dimensional randomness via tensor train compression.

02

Successful numerical benchmarks demonstrating accuracy and adaptivity.

03

Effective error estimation enabling iterative refinement.

Abstract

A linear PDE problem for randomly perturbed domains is considered in an adaptive Galerkin framework. The perturbation of the domain's boundary is described by a vector valued random field depending on a countable number of random variables in an affine way. The corresponding Karhunen-Loève expansion is approximated by the pivoted Cholesky decomposition based on a prescribed covariance function. The examined high-dimensional Galerkin system follows from the domain mapping approach, transferring the randomness from the domain to the diffusion coefficient and the forcing. In order to make this computationally feasible, the representation makes use of the modern tensor train format for the implicit compression of the problem. Moreover, an a posteriori error estimator is presented, which allows for the problem-dependent iterative refinement of all discretization parameters and the…

Equations (192)

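\|v\|_{L^{p}(\Omega,\Sigma,\mathbb{P};\mathcal{X})}:=\begin{cases}\Big(\int_{\Omega}\|v(\cdot,\omega)\|_{\mathcal{X}}^{p}\,\mathrm{d}\mathbb{P}(\omega)\Big)^{1/p}, & p<\infty,\\ \operatorname*{ess\,sup}_{\omega\in\Omega}\|v(\cdot,\omega)\|_{\mathcal{X}}, & p=\infty.\end{cases}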

\displaystyle-\operatorname{div}\big{(}\nabla u(\omega)\big{)}

u(\omega)

\mathcal{D}:=\bigcup_{\omega\in\Omega}D(\omega).

\boldsymbol{V}\colon D_{\operatorname{ref}}\times\Omega\to\mathbb{R}^{d}

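\|\boldsymbol{V}(\omega)\|_{C^{1}(\overline{D_{\operatorname{ref}}};\mathbb{R}^{d})},\ \|\boldsymbol{V}^{-1}(\omega)\|_{C^{1}(\overline{D_{\operatorname{ref}}};\mathbb{R}^{d})}\leq C_{\operatorname{uni}}\quad\text{for }\mathbb{P}\text{-a.e. }\omega\in\Omega.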

\boldsymbol{V}(\hat{x},\omega)=\mathbb{E}[\boldsymbol{V}](\hat{x})+\sum_{k=1}^{\infty}\boldsymbol{V}_{k}(\hat{x})Y_{k}(\omega).

\mathbb{E}[\boldsymbol{V}](\hat{x}):=\int_{\Omega}\boldsymbol{V}(\hat{x},\omega)\,\mathrm{d}\mathbb{P}(\omega).

\gamma_{k}:=\|\boldsymbol{V}_{k}\|_{W^{1,\infty}(D_{\operatorname{ref}};\mathbb{R}^{d})}.

\operatorname{Cov}[\boldsymbol{V}](\hat{x},\hat{x}'):=\int_{\Omega}\overline{\boldsymbol{V}}(\hat{x},\omega)\,\overline{\boldsymbol{V}}^{\intercal}(\hat{x}',\omega)\,\mathrm{d}\mathbb{P}(\omega)

\overline{\boldsymbol{V}}(\hat{x},\omega):=\boldsymbol{V}(\hat{x},\omega)-\mathbb{E}[\boldsymbol{V}](\hat{x})

\mathbb{E}[\boldsymbol{V}](\hat{x})=\hat{x}.

\boldsymbol{V}(\hat{x},\boldsymbol{y})=\hat{x}+\sum_{k=1}^{\infty}\boldsymbol{V}_{k}(\hat{x})y_{k}.

\boldsymbol{J}(\hat{x},\boldsymbol{y})=\boldsymbol{I}+\sum_{k=1}^{\infty}\boldsymbol{V}_{k}'(\hat{x})y_{k}.

x=\boldsymbol{V}(\hat{x},\boldsymbol{y}),

\displaystyle-\operatorname{div}_{{\hat{x}}}\big{(}{{\boldsymbol{A}}}({{\boldsymbol{y}}})\nabla_{{\hat{x}}}\hat{u}({{\boldsymbol{y}}})\big{)}

\hat{u}(\boldsymbol{y})

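\boldsymbol{A}(\hat{x},\boldsymbol{y}):=(\boldsymbol{J}^{\intercal}\boldsymbol{J})^{-1}(\hat{x},\boldsymbol{y})\det\boldsymbol{J}(\hat{x},\boldsymbol{y}),\qquad\hat{f}(\hat{x},\boldsymbol{y}):=(f\circ\boldsymbol{V})(\hat{x},\boldsymbol{y})\det\boldsymbol{J}(\hat{x},\boldsymbol{y})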

\hat{u}(\hat{x},\boldsymbol{y}):=(u\circ\boldsymbol{V})(\hat{x},\boldsymbol{y}).

0<\frac{1}{C_{\operatorname{uni}}^{d}}\leq\det\boldsymbol{J}(\hat{x},\boldsymbol{y})\leq C_{\operatorname{uni}}^{d}<\infty

0<\check{c}:=\frac{1}{C_{\operatorname{uni}}^{d+2}}\leq\|\boldsymbol{A}(\hat{x},\boldsymbol{y})\|_{2}\leq C_{\operatorname{uni}}^{d+2}=:\hat{c}<\infty

\mathcal{F}:=\{\mu\in\mathbb{N}_{0}^{\infty}:|\operatorname{supp}\mu|<\infty\}\quad\text{where}\quad\operatorname{supp}\mu:=\{m\in\mathbb{N}:\mu_{m}\neq 0\}.

\|\partial_{x}^{\alpha}f\|_{L^{\infty}(\mathcal{D})}\leq c_{f}\,\alpha!\,\rho^{-|\alpha|}

\displaystyle\big{\|}\partial^{{\boldsymbol{\alpha}}}_{{\boldsymbol{y}}}{{\boldsymbol{A}}}({{\boldsymbol{y}}})\big{\|}_{L^{\infty}(D_{\operatorname{ref}};\mathbb{R}^{d\times d})}

\displaystyle\big{\|}\partial^{{\boldsymbol{\alpha}}}_{{\boldsymbol{y}}}\hat{f}({{\boldsymbol{y}}})\big{\|}_{L^{\infty}(D_{\operatorname{ref}})}

\displaystyle\big{\|}\partial^{{\boldsymbol{\alpha}}}_{{\boldsymbol{y}}}\hat{u}({{\boldsymbol{y}}})\big{\|}_{H^{1}(D_{\operatorname{ref}})}

\Lambda:=\{(\mu_{1},\dots,\mu_{M},0,\dots)\in\mathcal{F}:\mu_{m}=0,\dots,d_{m}-1,\ m=1,\dots,M\}

P_{\mu}(\boldsymbol{y})=\prod_{m=1}^{\infty}P_{\mu_{m}}(y_{m})=\prod_{m\in\operatorname{supp}\mu}P_{\mu_{m}}(y_{m}).

[\![\chi]\!]_{S}:=\chi|_{T_{1}}\cdot n_{S}-\chi|_{T_{2}}\cdot n_{S}.

\|(\mathrm{id}-I)v\|_{L^{2}(T)}\leq c_{\mathcal{T}}h_{T}|v|_{\mathcal{X},\omega_{T}},\quad\|(\mathrm{id}-I)v\|_{L^{2}(S)}

{\mathcal{V}}_{N}\mathrel{\mathrel{\mathop{:}}=}{\mathcal{V}}_{N}(\Lambda,{\mathcal{T}},p)\mathrel{\mathrel{\mathop{:}}=}\bigg{\{}v_{N}(x,y)=\sum_{\mu\in\Lambda}v_{N,\mu}(x)P_{\mu}(y):\ v_{N,\mu}\in{\mathcal{X}}_{p}({\mathcal{T}})\bigg{\}},

\mathcal{I}v:=\sum_{\mu\in\Lambda}(Iv_{\mu})P_{\mu}.

Full text

Abstract.

A linear PDE problem for randomly perturbed domains is considered in an adaptive Galerkin framework. The perturbation of the domain’s boundary is described by a vector valued random field depending on a countable number of random variables in an affine way. The corresponding Karhunen-Loève expansion is approximated by the pivoted Cholesky decomposition based on a prescribed covariance function. The examined high-dimensional Galerkin system follows from the domain mapping approach, transferring the randomness from the domain to the diffusion coefficient and the forcing. In order to make this computationally feasible, the representation makes use of the modern tensor train format for the implicit compression of the problem. Moreover, an a posteriori error estimator is presented, which allows for the problem-dependent iterative refinement of all discretization parameters and the assessment of the achieved error reduction. The proposed approach is demonstrated in numerical benchmark problems.

Key words and phrases:

Partial differential equations with random coefficients, random domain, tensor train, uncertainty quantification, stochastic finite element methods, adaptive methods, ALS, low-rank, reduced basis methods

AMS subject classifications:

35R60, 47B80, 60H35, 65C20, 65N12, 65N22, 65J10

1. Introduction

Uncertainties in the data for mathematical models are found naturally when dealing with real-world applications in science and engineering. Being able to quantify such uncertainties can greatly improve the relevance and reliability of computer simulations and moreover provide valuable insights into statistical properties of quantities of interest (QoI). This is one of the main motivations for the thriving field of Uncertainty Quantification (UQ).

In the application considered in this work, the computational domain is assumed to be randomly perturbed. This can, for example, be an appropriate model to incorporate production tolerances into simulations and to extract statistical information about how such uncertainties propagate through the assumed model. Random domain problems have been examined before, see for instance [31, 2, 20]. Often, sampling approaches are used to evaluate QoI, as has been investigated, e.g., with a multilevel quadrature for the domain mapping method in [20]. As an alternative, we propose to employ a stochastic Galerkin FEM (SGFEM) to obtain a functional representation of the stochastic solution on the reference domain, which can then be used to evaluate statistical quantities. For the discretization, a Legendre polynomial chaos basis and first order FE are chosen. The expansion of the perturbation vector field in a (finite) countable sequence of random variables gives rise to a high-dimensional coupled algebraic system, which quickly becomes intractable for numerical methods or leads to very slow convergence. A way to overcome this problem is to utilize model order reduction techniques. In this work, we make use of the modern tensor train (TT) format [26], which provides an efficient hierarchical tensor representation and is able to exploit low-rank properties of the problem at hand. Another important technique to reduce computational cost is the use of an adaptive discretization. In our case, this is based on a reliable a posteriori error estimator, afforded by the quasi-orthogonal approximation obtained by the SGFEM. With this error estimator, an iterative adaptive selection of optimal discretization parameters (steering mesh refinement, anisotropic polynomial chaos and tensor ranks) is possible.

For the Karhunen-Loève expansion of the random vector field, we employ the pivoted Cholesky decomposition derived in [18, 19]. The random coefficient and right-hand side which arise due to the integral transformation are tackled with a tensor reconstruction method. All evaluations are carried out in the TT format, which in particular allows for the efficient computation of the error estimator as part of the adaptive algorithm.

The paper is structured as follows: The next section introduces the setting and the required assumptions of the random linear model problem. In particular, a description of the perturbation vector field and the variable transformation is given, converting the random domain problem to a problem with random coefficient and forcing. Section 3 defines the Galerkin finite element discretization of the random coefficient problem in Legendre chaos polynomials. Moreover, the framework for residual based a posteriori error estimation is described. The tensor train format used for the efficient computation of the problem is introduced in Section 4. Section 5 lays out the refinement strategy for the Galerkin method, which is based on the evaluation of a reliable a posteriori error estimate in the tensor representation and an appropriate adaptive algorithm. Numerical examples are discussed in Section 6.

2. Diffusion problems on random domains

In this section, we formulate the stationary diffusion problem on random domains as introduced in [20]. Let $(\Omega,\Sigma,\mathbb{P})$ denote a complete and separable probability space with $\sigma$ -algebra $\Sigma$ and probability measure $\mathbb{P}$ . Here, complete means that $\Sigma$ contains all $\mathbb{P}$ -null sets. Moreover, for a given Banach space ${\mathcal{X}}$ , we introduce the Lebesgue-Bochner space $L^{p}(\Omega,\Sigma,\mathbb{P};{\mathcal{X}})$ , $1\leq p\leq\infty$ , which consists of all equivalence classes of strongly measurable functions $v\colon\Omega\to{\mathcal{X}}$ with bounded norm

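$$\|v\|_{L^{p}(\Omega,\Sigma,\mathbb{P};\mathcal{X})}:=\begin{cases}\Big(\int_{\Omega}\|v(\cdot,\omega)\|_{\mathcal{X}}^{p}\,\mathrm{d}\mathbb{P}(\omega)\Big)^{1/p}, & p<\infty,\\ \operatorname*{ess\,sup}_{\omega\in\Omega}\|v(\cdot,\omega)\|_{\mathcal{X}}, & p=\infty.\end{cases}$$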

Note that for $p=2$ and ${\mathcal{X}}$ a separable Hilbert space, $L^{p}(\Omega,\Sigma,\mathbb{P};{\mathcal{X}})$ is isomorphic to the tensor product space ${\mathcal{X}}\otimes L^{2}(\Omega,\Sigma,\mathbb{P})$ . We henceforth neglect the dependence on the $\sigma$ -algebra to simplify the notation. For an exposition of Lebesgue-Bochner spaces we refer to [21].

In this article, we are interested in computing quantities of interest of the solution to the elliptic diffusion problem

[TABLE]

for $\mathbb{P}$ -almost every $\omega\in\Omega$ . Note that the randomness is carried by the open and bounded Lipschitz domain $D\colon\Omega\to\mathbb{R}^{d}$ . It is also possible to consider non-trivial diffusion coefficients or boundary data, see e.g. [13] for the treatment of non-homogeneous Dirichlet data and [25] for random diffusion coefficients. However, we emphasize that, in order to derive regularity results that allow for the data-sparse approximation of quantities of interest, the data have to be analytic functions, cf. [20].

In order to guarantee the well-posedness of (1), we assume that all data, i.e. the loading $f$ and a possible non-trivial diffusion coefficient, are defined with respect to the hold-all domain

$$\mathcal{D}:=\bigcup_{\omega\in\Omega}D(\omega).$$

For the modelling of random domains, we employ the concept of random vector fields. To that end, we assume that there exists a reference domain $D_{\operatorname{ref}}\subset\mathbb{R}^{d}$ for $d=2,3$ with Lipschitz continuous boundary $\partial D_{\operatorname{ref}}$ and a random vector field

$$\boldsymbol{V}\colon D_{\operatorname{ref}}\times\Omega\to\mathbb{R}^{d}$$

such that $D(\omega)={{\boldsymbol{V}}}(D_{\operatorname{ref}},\omega)$ . In addition, we require that ${{\boldsymbol{V}}}$ is a uniform $C^{1}$ -diffeomorphism, i.e. there exists a constant $C_{\operatorname{uni}}>1$ such that

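$$\|\boldsymbol{V}(\omega)\|_{C^{1}(\overline{D_{\operatorname{ref}}};\mathbb{R}^{d})},\ \|\boldsymbol{V}^{-1}(\omega)\|_{C^{1}(\overline{D_{\operatorname{ref}}};\mathbb{R}^{d})}\leq C_{\operatorname{uni}}\quad\text{for }\mathbb{P}\text{-a.e. }\omega\in\Omega.$$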

In particular, since ${{\boldsymbol{V}}}\in L^{\infty}\big{(}\Omega;C^{1}(\overline{D_{\operatorname{ref}}})\big{)}\subset L^{2}\big{(}\Omega;C^{1}(\overline{D_{\operatorname{ref}}})\big{)}$ , the random vector field ${{\boldsymbol{V}}}$ exhibits a Karhunen-Loève expansion of the form

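$$\boldsymbol{V}(\hat{x},\omega)=\mathbb{E}[\boldsymbol{V}](\hat{x})+\sum_{k=1}^{\infty}\boldsymbol{V}_{k}(\hat{x})Y_{k}(\omega).$$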

Herein, the expectation is given in terms of the Bochner integral

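$$\mathbb{E}[\boldsymbol{V}](\hat{x}):=\int_{\Omega}\boldsymbol{V}(\hat{x},\omega)\,\mathrm{d}\mathbb{P}(\omega).$$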

Note that here and henceforth, we denote ${\hat{x}}\in D_{\operatorname{ref}}$ as material coordinates, in contrast to spatial coordinates $x\in D(\omega)$ . In particular, there holds $x={{\boldsymbol{V}}}({\hat{x}},\omega)$ for some ${\hat{x}}\in D_{\operatorname{ref}}$ . The anisotropy, which is induced by the spatial contributions $\{{{\boldsymbol{V}}}_{k}\}_{k}$ , describing the fluctuations around the nominal value $\mathbb{E}[{{\boldsymbol{V}}}]({\hat{x}})$ , is encoded by

$$\gamma_{k}:=\|\boldsymbol{V}_{k}\|_{W^{1,\infty}(D_{\operatorname{ref}};\mathbb{R}^{d})}.$$

In our model, we shall also make the following common assumptions.

Assumption.

(i)

The random variables $Y=\{Y_{k}\}_{k}$ take values in $\Gamma_{1}\mathrel{\mathrel{\mathop{:}}=}[-1,1]$ .

(ii)

The random variables $\{Y_{k}\}_{k}$ are independent and identically distributed.

(iii)

The sequence ${\boldsymbol{\gamma}}\mathrel{\mathrel{\mathop{:}}=}\{{\gamma}_{k}\}_{k}$ is at least in $\ell^{1}(\mathbb{N})$ .

In view of this assumption, the Karhunen-Loève expansion (3) can always be computed if the expectation $\mathbb{E}[{{\boldsymbol{V}}}]$ and the matrix-valued covariance function

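$$\operatorname{Cov}[\boldsymbol{V}](\hat{x},\hat{x}'):=\int_{\Omega}\overline{\boldsymbol{V}}(\hat{x},\omega)\,\overline{\boldsymbol{V}}^{\intercal}(\hat{x}',\omega)\,\mathrm{d}\mathbb{P}(\omega)$$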

are known. Herein,

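$$\overline{\boldsymbol{V}}(\hat{x},\omega):=\boldsymbol{V}(\hat{x},\omega)-\mathbb{E}[\boldsymbol{V}](\hat{x})$$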

denotes the centered random vector field. The Karhunen-Loève expansion is based on the spectral decomposition of the integral operator associated to the covariance function, which can be computed efficiently by means of the pivoted Cholesky decomposition, if the covariance function is sufficiently smooth, cf. [18, 19].
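As a minimal sketch of the pivoted Cholesky idea (not the implementation of [18, 19]), the code below builds a low-rank factor of a discretized covariance matrix, $C\approx\sum_k l_k l_k^{\intercal}$, by greedily pivoting on the largest remaining diagonal entry and stopping once the trace of the remainder $C-LL^{\intercal}$ (its trace-norm error, since the remainder is positive semidefinite) drops below a tolerance. For simplicity it assumes a scalar Gaussian covariance on a 1D grid, whereas the paper treats the matrix-valued covariance of the vector field; the kernel, its correlation length and the tolerance are illustrative choices.

```python
import math

def pivoted_cholesky(diag, column, tol):
    """Return columns l_1, ..., l_M with C ~ sum_k l_k l_k^T.

    diag:   diagonal of the symmetric positive semidefinite matrix C
    column: function j -> j-th column of C, so C is never assembled in full
    tol:    stop once the trace of the remainder C - L L^T is below tol
    """
    n = len(diag)
    d = list(diag)                                 # diagonal of the current remainder
    cols = []
    while sum(d) > tol and len(cols) < n:
        j = max(range(n), key=lambda i: d[i])      # pivot: largest remaining diagonal entry
        cj = column(j)
        pivot = math.sqrt(d[j])
        # new column: j-th column of the remainder, scaled by 1/sqrt(pivot)
        l = [(cj[i] - sum(c[i] * c[j] for c in cols)) / pivot for i in range(n)]
        cols.append(l)
        d = [d[i] - l[i] ** 2 for i in range(n)]
    return cols

def cov(s, t, ell=0.5):
    """Illustrative Gaussian covariance with correlation length ell."""
    return math.exp(-((s - t) / ell) ** 2)

xs = [i / 49 for i in range(50)]
L = pivoted_cholesky([cov(x, x) for x in xs], lambda j: [cov(x, xs[j]) for x in xs], tol=1e-8)
print("rank needed:", len(L))
```

Only the diagonal and the pivot columns of $C$ are ever evaluated, which is what makes the approach attractive for smooth covariance functions: the number of columns plays the role of the truncation length $M$.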

By an appropriate reparametrization, we can always guarantee that

$$\mathbb{E}[\boldsymbol{V}](\hat{x})=\hat{x}.$$

Moreover, if we identify the random variables $\{Y_{k}\}_{k}$ by their image ${{\boldsymbol{y}}}\in\Gamma_{\infty}\mathrel{\mathrel{\mathop{:}}=}\bigtimes_{m\in\mathbb{N}}\Gamma_{m}=[-1,1]^{\mathbb{N}}$ , we end up with the representation

$$\boldsymbol{V}(\hat{x},\boldsymbol{y})=\hat{x}+\sum_{k=1}^{\infty}\boldsymbol{V}_{k}(\hat{x})y_{k}.$$

For later reference, we also introduce the push-forward measure $\pi_{\infty}=\mathbb{P}_{\#}Y$ on $\Gamma_{\infty}$ , which will be assumed to be a tensor product measure $\pi_{\infty}=\bigotimes_{m\in\mathbb{N}}\pi_{m}$ , where $\pi_{m}$ is a probability measure on $\Gamma_{m}=[-1,1]$ .

The Jacobian of ${{\boldsymbol{V}}}$ with respect to the material coordinate $\hat{x}$ is given by

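$$\boldsymbol{J}(\hat{x},\boldsymbol{y})=\boldsymbol{I}+\sum_{k=1}^{\infty}\boldsymbol{V}_{k}'(\hat{x})y_{k}.$$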

Introducing the parametric domains $D({{\boldsymbol{y}}})\mathrel{\mathrel{\mathop{:}}=}{\bf V}(D_{\operatorname{ref}},{{\boldsymbol{y}}})$ , i.e.

[TABLE]

we may now introduce the model problem transported to the reference domain which reads for every ${\boldsymbol{y}}\in\Gamma_{\infty}$ :

[TABLE]

Herein, we have

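$$\boldsymbol{A}(\hat{x},\boldsymbol{y}):=(\boldsymbol{J}^{\intercal}\boldsymbol{J})^{-1}(\hat{x},\boldsymbol{y})\det\boldsymbol{J}(\hat{x},\boldsymbol{y}),\qquad\hat{f}(\hat{x},\boldsymbol{y}):=(f\circ\boldsymbol{V})(\hat{x},\boldsymbol{y})\det\boldsymbol{J}(\hat{x},\boldsymbol{y})$$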

and

$$\hat{u}(\hat{x},\boldsymbol{y}):=(u\circ\boldsymbol{V})(\hat{x},\boldsymbol{y}).$$

Remark.

The uniformity condition in (2) implies that the functional determinant $\det{{\boldsymbol{J}}}({\hat{x}},{{\boldsymbol{y}}})$ in (7) is either uniformly positive or negative, see [20] for the details. We shall assume without loss of generality $\det{{\boldsymbol{J}}}({\hat{x}},{{\boldsymbol{y}}})>0$ and hence $|\det{{\boldsymbol{J}}}({\hat{x}},{{\boldsymbol{y}}})|=\det{{\boldsymbol{J}}}({\hat{x}},{{\boldsymbol{y}}})$ , i.e. we may just drop the modulus. More precisely, due to (2), we can bound the determinant according to

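$$0<\frac{1}{C_{\operatorname{uni}}^{d}}\leq\det\boldsymbol{J}(\hat{x},\boldsymbol{y})\leq C_{\operatorname{uni}}^{d}<\infty$$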

for every ${\hat{x}}\in D_{\operatorname{ref}}$ and almost every ${{\boldsymbol{y}}}\in\Gamma_{\infty}$ . In addition, all singular values of ${{\boldsymbol{J}}}^{-1}({\hat{x}},{{\boldsymbol{y}}})$ are bounded from below by $C_{\operatorname{uni}}^{-1}$ and from above by $C_{\operatorname{uni}}$ . From this, we obtain the bound

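$$0<\check{c}:=\frac{1}{C_{\operatorname{uni}}^{d+2}}\leq\|\boldsymbol{A}(\hat{x},\boldsymbol{y})\|_{2}\leq C_{\operatorname{uni}}^{d+2}=:\hat{c}<\infty$$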

for every ${\hat{x}}\in D_{\operatorname{ref}}$ and almost every ${{\boldsymbol{y}}}\in\Gamma_{\infty}$ . Hence, the transported model problem is uniformly elliptic.
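The bounds above can be checked numerically at a single point. The sketch below (with $d=2$) assembles $\boldsymbol{J}=\boldsymbol{I}+\sum_k\boldsymbol{V}_k'y_k$ from made-up matrices $\boldsymbol{V}_k'$, forms $\boldsymbol{A}=(\boldsymbol{J}^{\intercal}\boldsymbol{J})^{-1}\det\boldsymbol{J}$ and $\hat f=(f\circ\boldsymbol{V})\det\boldsymbol{J}$, and verifies $\det\boldsymbol{J}\in[C^{-d},C^{d}]$ and that the eigenvalues of $\boldsymbol{A}$ lie in $[C^{-(d+2)},C^{d+2}]$. Here $C=\max\{\sigma_{\max}(\boldsymbol{J}),1/\sigma_{\min}(\boldsymbol{J})\}$ is a pointwise stand-in for $C_{\operatorname{uni}}$; the matrices, the parameter and the value $f(\boldsymbol{V}(\hat x,\boldsymbol{y}))=1$ are illustrative assumptions.

```python
import math

def jacobian(dV, y):
    """J = I + sum_k V_k' y_k for d = 2; dV holds the 2x2 matrices V_k'."""
    J = [[1.0, 0.0], [0.0, 1.0]]
    for Vk, yk in zip(dV, y):
        for i in range(2):
            for j in range(2):
                J[i][j] += Vk[i][j] * yk
    return J

def gram(J):
    """J^T J as a nested list."""
    return [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transported_data(J, f_at_x):
    """A = (J^T J)^{-1} det J and f_hat = f(V(x_hat, y)) det J at one point."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    G = gram(J)
    detG = G[0][0] * G[1][1] - G[0][1] ** 2          # equals det**2
    A = [[G[1][1] * det / detG, -G[0][1] * det / detG],
         [-G[0][1] * det / detG, G[0][0] * det / detG]]
    return A, f_at_x * det, det

def sym_eigs(S):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix."""
    m = 0.5 * (S[0][0] + S[1][1])
    r = math.hypot(0.5 * (S[0][0] - S[1][1]), S[0][1])
    return m - r, m + r

dV = [[[0.2, 0.05], [0.0, -0.1]], [[0.0, 0.1], [0.1, 0.05]]]   # illustrative V_k'(x_hat)
y = [0.7, -0.4]                                               # a parameter in [-1, 1]^2
J = jacobian(dV, y)
A, f_hat, detJ = transported_data(J, f_at_x=1.0)

s2_min, s2_max = sym_eigs(gram(J))          # squared singular values of J
C = max(math.sqrt(s2_max), 1.0 / math.sqrt(s2_min))
lam_min, lam_max = sym_eigs(A)
d = 2
print(C ** -d <= detJ <= C ** d, C ** -(d + 2) <= lam_min <= lam_max <= C ** (d + 2))
```

The eigenvalues of $\boldsymbol{A}$ are $\det\boldsymbol{J}/\sigma_i^2$, which is exactly where the exponent $d+2$ in the remark comes from.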

We conclude this section by summarizing the regularity results for ${{\boldsymbol{A}}},\hat{f},\hat{u}$ , cf. (6), with respect to the parameter ${{\boldsymbol{y}}}\in\Gamma_{\infty}$ from [20]. For this, denote by ${\mathcal{F}}$ the set of finitely supported multi-indices

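$$\mathcal{F}:=\{\mu\in\mathbb{N}_{0}^{\infty}:|\operatorname{supp}\mu|<\infty\}\quad\text{where}\quad\operatorname{supp}\mu:=\{m\in\mathbb{N}:\mu_{m}\neq 0\}.$$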

Theorem.

Let the right-hand side $f$ from (1) satisfy

$$\|\partial_{x}^{\alpha}f\|_{L^{\infty}(\mathcal{D})}\leq c_{f}\,\alpha!\,\rho^{-|\alpha|}$$

for some constants $c_{f},\rho>0$ and $\alpha\in{\mathcal{F}}$ . Then, for every ${{\boldsymbol{\alpha}}}\in\mathcal{F}$ it holds

[TABLE]

for some constants $C>0$ , which depend on $c_{f},\rho,C_{\operatorname{uni}},d,D_{\operatorname{ref}},\|{\boldsymbol{\gamma}}\|_{\ell^{1}}$ but are independent of the multi-index ${\boldsymbol{\alpha}}$ .

3. Adaptive Galerkin discretisation

In this section we describe the Galerkin discretization of the considered random PDE (6) in a finite dimensional subspace $\mathcal{V}_{N}\subset\mathcal{V}=\mathcal{X}\otimes\mathcal{Y}$ . Since the problem is elliptic with homogeneous Dirichlet boundary conditions, we set ${\mathcal{X}}=H^{1}_{0}(D_{\operatorname{ref}})$ and discretize it by a first order Lagrange FE basis on a mesh representing $D_{\operatorname{ref}}$ . Moreover, the randomness is modelled in a truncated version of ${\mathcal{Y}}=L^{2}(\Gamma_{\infty},\pi_{\infty})$ and represented by Legendre chaos polynomials orthonormal with respect to the joint probability measure $\pi_{\infty}$ associated with the parameter ${\boldsymbol{y}}$ . Consequently, ${\mathcal{V}}_{N}$ with norm $\lVert v\rVert^{2}_{\mathcal{V}}=\mathbb{E}_{\pi_{M}}[\lVert v\rVert^{2}_{\mathcal{X}}]$ is spanned by the respective tensor basis. Finally, the residual based a posteriori error estimator of [5, 6] is recalled for the problem at hand.
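Since the chaos basis is orthonormal and $P_0\equiv 1$, the norm and simple statistics can be read off the coefficients: $\mathbb{E}[v]=v_0$ and $\lVert v\rVert_{\mathcal{V}}^2=\sum_{\mu}\lVert v_\mu\rVert_{\mathcal{X}}^2$ (Parseval), so the variance is the sum over $\mu\neq 0$. A minimal check for a scalar $v$ depending on one uniformly distributed parameter could look as follows; the coefficients are illustrative, the orthonormal polynomials are taken as $\sqrt{2n+1}\,L_n$ with standard Legendre $L_n$, and expectations are computed exactly from the moments $\mathbb{E}[y^j]$ of the uniform distribution on $[-1,1]$.

```python
import math
from fractions import Fraction

def legendre_coeffs(n):
    """Monomial coefficients of the standard Legendre polynomials L_0, ..., L_n."""
    P = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        # Bonnet recursion: (k+1) L_{k+1} = (2k+1) x L_k - k L_{k-1}
        x_Lk = [Fraction(0)] + P[k]
        Lkm1 = P[k - 1] + [Fraction(0), Fraction(0)]
        P.append([((2 * k + 1) * a - k * b) / (k + 1) for a, b in zip(x_Lk, Lkm1)])
    return P[: n + 1]

def uniform_moment(j):
    """E[y^j] for y uniformly distributed on [-1, 1]."""
    return 1.0 / (j + 1) if j % 2 == 0 else 0.0

def chaos_to_monomial(c):
    """Monomial coefficients of v(y) = sum_n c[n] sqrt(2n+1) L_n(y)."""
    L = legendre_coeffs(len(c) - 1)
    out = [0.0] * len(c)
    for n, cn in enumerate(c):
        for j, a in enumerate(L[n]):
            out[j] += cn * math.sqrt(2 * n + 1) * float(a)
    return out

def mean_and_second_moment(p):
    """E[v] and E[v^2] for a polynomial v given by monomial coefficients p."""
    mean = sum(a * uniform_moment(j) for j, a in enumerate(p))
    second = sum(a * b * uniform_moment(i + j) for i, a in enumerate(p) for j, b in enumerate(p))
    return mean, second

c = [0.8, -0.3, 0.1, 0.05]            # illustrative coefficients v_0, ..., v_3
mean, second = mean_and_second_moment(chaos_to_monomial(c))
print(mean, second - mean ** 2, sum(x * x for x in c[1:]))   # E[v], Var[v] computed twice
```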

For efficient computations of the Galerkin projection and the error estimator, the resulting system with inhomogeneous coefficient and right-hand side (7) is represented in the tensor train format as presented in Section 4.

3.1. Parametric and deterministic discretization

To determine a multivariate polynomial basis of $\mathcal{Y}$ , we first define the full tensor index set of order $M\in\mathbb{N}$ and maximal degree $(d_{1},\ldots,d_{M})\in\mathbb{N}^{M}$ by

$$\Lambda:=\{(\mu_{1},\dots,\mu_{M},0,\dots)\in\mathcal{F}:\mu_{m}=0,\dots,d_{m}-1,\ m=1,\dots,M\}$$

For any subset $\Lambda\subset{\mathcal{F}}$ , we define $\operatorname{supp}\Lambda\mathrel{\mathrel{\mathop{:}}=}\bigcup_{\mu\in\Lambda}\operatorname{supp}\mu\subset\mathbb{N}$ . Let $(P_{n})_{n=0}^{\infty}$ denote a basis of $L^{2}([-1,1])$ , orthogonal with respect to the Lebesgue measure, consisting of Legendre polynomials $P_{n}$ of degree $n\in\mathbb{N}_{0}$ on $\mathbb{R}$ . Moreover, to obtain a finite dimensional setting, we define the truncated parameter domain $\Gamma_{M}\mathrel{\mathrel{\mathop{:}}=}[-1,1]^{M}$ and probability measure $\pi_{M}\mathrel{\mathrel{\mathop{:}}=}\bigotimes_{m=1}^{M}\pi_{m}$ . By tensorization of the univariate polynomials, an orthogonal basis of $L^{2}(\Gamma_{M})=\bigotimes_{m=1}^{M}L^{2}([-1,1])$ is obtained. Then, for any multi-index $\mu\in{\mathcal{F}}$ , the tensor product polynomial $P_{\mu}\mathrel{\mathrel{\mathop{:}}=}\bigotimes_{m=1}^{\infty}P_{\mu_{m}}$ in ${\boldsymbol{y}}\in\Gamma_{M}$ is expressed as the finite product

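$$P_{\mu}(\boldsymbol{y})=\prod_{m=1}^{\infty}P_{\mu_{m}}(y_{m})=\prod_{m\in\operatorname{supp}\mu}P_{\mu_{m}}(y_{m}).$$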

Assuming that $\pi_{M}=\mathcal{U}(\Gamma_{M})$ , after suitable rescaling we can consider $(P_{\mu})_{\mu\in{\mathcal{F}}}$ as an orthonormal basis of $L^{2}(\Gamma_{M},\pi_{M})=\bigotimes_{m=1}^{M}L^{2}([-1,1],\frac{1}{2}\lambda)$ , where $\lambda$ denotes the Lebesgue measure and hence $\frac{1}{2}\lambda$ is the uniform measure on $[-1,1]$ , see [28].
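To illustrate the construction, the sketch below enumerates the full tensor set $\Lambda$ for given $(d_1,\ldots,d_M)$, computes $\operatorname{supp}\Lambda$, and checks that the rescaled products $P_\mu(\boldsymbol{y})=\prod_m\sqrt{2\mu_m+1}\,L_{\mu_m}(y_m)$ of standard Legendre polynomials $L_n$ are orthonormal with respect to the uniform measure on $\Gamma_M$. The factor $\sqrt{2n+1}$ is the usual normalization for the measure $\frac12\lambda$; the function names and example degrees are our own, and the expectations are evaluated exactly via $\mathbb{E}[y^j]=1/(j+1)$ for even $j$ (zero for odd $j$).

```python
import itertools
import math
from fractions import Fraction

def full_tensor_set(degrees):
    """Lambda = {mu : mu_m in {0, ..., d_m - 1}} for degrees = (d_1, ..., d_M)."""
    return list(itertools.product(*(range(d) for d in degrees)))

def support(mu):
    """supp mu = {m : mu_m != 0}, with 1-based dimension indices as in the text."""
    return {m + 1 for m, k in enumerate(mu) if k != 0}

def legendre_coeffs(n):
    """Monomial coefficients of the standard Legendre polynomials L_0, ..., L_n."""
    P = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        # Bonnet recursion: (k+1) L_{k+1} = (2k+1) x L_k - k L_{k-1}
        x_Lk = [Fraction(0)] + P[k]
        Lkm1 = P[k - 1] + [Fraction(0), Fraction(0)]
        P.append([((2 * k + 1) * a - k * b) / (k + 1) for a, b in zip(x_Lk, Lkm1)])
    return P[: n + 1]

def poly_mul(p, q):
    """Product of two polynomials given by monomial coefficients."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def uniform_mean(p):
    """E[p(y)] for y ~ U[-1, 1], using E[y^j] = 1/(j+1) for even j and 0 for odd j."""
    return sum((c / (j + 1) for j, c in enumerate(p) if j % 2 == 0), Fraction(0))

def gram_1d(n):
    """E[P_a P_b] for the orthonormal P_a = sqrt(2a+1) L_a, a, b = 0, ..., n."""
    L = legendre_coeffs(n)
    return [[math.sqrt((2 * a + 1) * (2 * b + 1)) * float(uniform_mean(poly_mul(L[a], L[b])))
             for b in range(n + 1)] for a in range(n + 1)]

def tensor_gram(Lam, degrees):
    """E[P_mu P_nu] on Gamma_M: a product of univariate inner products."""
    G1 = [gram_1d(d - 1) for d in degrees]
    return [[math.prod(G1[m][mu[m]][nu[m]] for m in range(len(degrees))) for nu in Lam]
            for mu in Lam]

degrees = (3, 2, 1)                       # d_1, d_2, d_3 (illustrative)
Lam = full_tensor_set(degrees)
supp_Lam = set().union(*(support(mu) for mu in Lam))
G = tensor_gram(Lam, degrees)
print(len(Lam), sorted(supp_Lam))         # 6 [1, 2]
```

With $d_3=1$ the third dimension is inactive, which is why it does not appear in $\operatorname{supp}\Lambda$.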

A discrete subspace of ${\mathcal{X}}$ is given by the conforming finite element space ${\mathcal{X}}_{p}({\mathcal{T}})\mathrel{\mathrel{\mathop{:}}=}\mathrm{span}\{\varphi_{i}\}_{i=1}^{N}\subset{\mathcal{X}}$ of degree $p\geq 0$ on some simplicial regular mesh ${\mathcal{T}}$ of the domain $D_{\operatorname{ref}}$ with the set of faces ${\mathcal{S}}$ (i.e. edges for $d=2$ ) and basis functions $\varphi_{i}$ . For a convenient presentation, we denote the piecewise constant basis functions of ${\mathcal{X}}_{0}({\mathcal{T}})$ by $\{\psi_{\ell}\}_{\ell=1}^{N_{0}}$ , where $N_{0}=\dim{\mathcal{X}}_{0}$ is the number of elements in ${\mathcal{T}}$ . In order to circumvent complications due to an inexact approximation of boundary values, we assume that $D_{\operatorname{ref}}$ is a polytope. By denoting $P_{p}({\mathcal{T}})$ the space of piecewise polynomials of degree $p\geq 0$ on ${\mathcal{T}}$ , the assumed FE discretization with Lagrange elements then satisfies ${\mathcal{X}}_{p}({\mathcal{T}})\subset P_{p}({\mathcal{T}})\cap C(\overline{{\mathcal{T}}})$ . For any element $T\in{\mathcal{T}}$ and face $S\in{\mathcal{S}}$ , we set the entity sizes $h_{T}\mathrel{\mathrel{\mathop{:}}=}\operatorname{diam}T$ and $h_{S}\mathrel{\mathrel{\mathop{:}}=}\operatorname{diam}S$ . Let $n_{S}$ denote the exterior unit normal on any face $S$ . The jump of some $\chi\in H^{1}(D_{\operatorname{ref}};\mathbb{R}^{d})$ on $S=\overline{T_{1}}\cap\overline{T_{2}}$ in normal direction is then defined by

$$[\![\chi]\!]_{S}:=\chi|_{T_{1}}\cdot n_{S}-\chi|_{T_{2}}\cdot n_{S}.$$
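For first order elements, $\nabla w$ is constant on every element, so the normal jump of $\chi=\nabla w$ across a face reduces to a difference of two constant vectors. The sketch below computes P1 gradients on two triangles sharing the face $S=\operatorname{conv}\{(0,0),(1,1)\}$ and evaluates $[\![\chi]\!]_S$ with the sign convention of the definition above ($T_1$ minus $T_2$, fixed unit normal $n_S$). Mesh, normal and nodal values are illustrative.

```python
import math

def p1_gradient(P, w):
    """Constant gradient of the linear interpolant with values w at the triangle vertices P."""
    (x0, y0), (x1, y1), (x2, y2) = P
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)   # twice the signed area
    # solve [x1-x0, y1-y0; x2-x0, y2-y0] g = [w1-w0; w2-w0] by Cramer's rule
    b1, b2 = w[1] - w[0], w[2] - w[0]
    gx = (b1 * (y2 - y0) - b2 * (y1 - y0)) / det
    gy = ((x1 - x0) * b2 - (x2 - x0) * b1) / det
    return gx, gy

def normal_jump(g1, g2, n):
    """[[chi]]_S = chi|T1 . n_S - chi|T2 . n_S for piecewise constant chi."""
    return (g1[0] - g2[0]) * n[0] + (g1[1] - g2[1]) * n[1]

T1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
T2 = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
n_S = (1 / math.sqrt(2), -1 / math.sqrt(2))   # fixed unit normal on S, pointing into T1

w_T1 = [0.0, 0.0, 1.0]     # nodal values of w(x, y) = x * y at the vertices of T1
w_T2 = [0.0, 1.0, 0.0]     # ... and at the vertices of T2
g1, g2 = p1_gradient(T1, w_T1), p1_gradient(T2, w_T2)
print(g1, g2, normal_jump(g1, g2, n_S))
```

A globally linear function is reproduced exactly on both elements, so its jump vanishes; the facet contributions of the residual estimator therefore only see the non-smooth part of the discrete flux.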

By $\omega_{T}$ and $\omega_{S}$ we denote the element and facet patches defined by the union of all elements which share at least a vertex with $T$ or $S$ , respectively. The Clément interpolation operator $I\colon{\mathcal{X}}\to{\mathcal{X}}_{p}({\mathcal{T}})$ then satisfies, for $T\in{\mathcal{T}}$ and $S\in{\mathcal{S}}$ , respectively,

[TABLE]

where the seminorms $\lvert\;\cdot\;\rvert_{{\mathcal{X}},\omega_{T}}$ and $\lvert\;\cdot\;\rvert_{{\mathcal{X}},\omega_{S}}$ are the restrictions of $\lVert\;\cdot\;\rVert_{{\mathcal{X}}}$ to $\omega_{T}$ and $\omega_{S}$ .

The fully discrete approximation space subject to some mesh $\mathcal{T}$ with FE order $p\geq 0$ and active set $\Lambda$ with $\lvert\Lambda\rvert<\infty$ is given by

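$$\mathcal{V}_{N}:=\mathcal{V}_{N}(\Lambda,\mathcal{T},p):=\Big\{v_{N}(x,\boldsymbol{y})=\sum_{\mu\in\Lambda}v_{N,\mu}(x)P_{\mu}(\boldsymbol{y}):\ v_{N,\mu}\in\mathcal{X}_{p}(\mathcal{T})\Big\},$$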

and it holds ${\mathcal{V}}_{N}\subset{\mathcal{V}}$ . We define a tensor product interpolation operator ${\mathcal{I}}\colon L^{2}(\Gamma_{\infty},\pi_{\infty};{\mathcal{X}})\to{\mathcal{V}}_{N}$ for $v=\sum_{\mu\in{\mathcal{F}}}v_{\mu}P_{\mu}\in{\mathcal{V}}=L^{2}(\Gamma_{\infty},\pi_{\infty};{\mathcal{X}})$ by setting

$$\mathcal{I}v:=\sum_{\mu\in\Lambda}(Iv_{\mu})P_{\mu}.$$

For $v\in{\mathcal{V}}$ and all $T\in{\mathcal{T}},\ S\in{\mathcal{S}}$ , it holds

[TABLE]

3.2. Random field discretisation

In this paragraph, we highlight the special structure of the random field discretization. We aim at an efficient way to discretize the transformed and parametrized random fields (7) in terms of the piecewise constant finite element functions $\{\psi_{i}\}_{i=1}^{N_{0}}\subset{\mathcal{X}}_{0}\subset{\mathcal{X}}$ , pointwise for every ${\boldsymbol{y}}\in\Gamma_{M}$ .

In [25], it has been shown how the random vector field (5) can efficiently be approximated by means of finite elements. This results in a truncated representation with $M\in\mathbb{N}$ terms of the form

[TABLE]

where ${{\boldsymbol{e}}}_{1},\ldots,{{\boldsymbol{e}}}_{d}$ denotes the canonical basis of $\mathbb{R}^{d}$ , $\varphi_{1},\ldots,\varphi_{N}$ is a basis for ${\mathcal{X}}_{1}({\mathcal{T}})$ and $c_{i,k,m}\in\mathbb{R}$ are the coefficients in the basis representation of ${{\boldsymbol{V}}}_{m,h}$ . The length $M$ of this expansion depends on the desired approximation error of the random field, which can be rigorously controlled in terms of operator traces, see [19, 29].

For the corresponding Jacobian, we obtain

[TABLE]

More explicitly, the Jacobians ${{\boldsymbol{V}}}_{m,h}^{\prime}(\hat{x})$ are given by

[TABLE]

Since $\partial_{i}\varphi_{k}(\hat{x})$ , $i=1,\ldots,d$ , $k=1,\ldots,N$ are piecewise constant functions, we can represent ${{\boldsymbol{V}}}_{m,h}^{\prime}$ in an element based fashion according to

[TABLE]

where $\psi_{\ell}$ denotes the characteristic function on the element $T_{\ell}\in\mathcal{T}$ and $\tilde{c}_{\ell,m,i,j}\in\mathbb{R}$ are the corresponding coefficients. Hence, we end up with a piecewise constant representation of ${{\boldsymbol{V}}}_{h}^{\prime}$ , which reads

[TABLE]

From this representation, it is straightforward to calculate $\det{{\boldsymbol{J}}}_{h}({\hat{x}},{{\boldsymbol{y}}})$ for a given ${{\boldsymbol{y}}}\in\Gamma_{M}$ , also in an element based fashion. Having ${{\boldsymbol{V}}}_{h}({\hat{x}},{{\boldsymbol{y}}})$ , ${{\boldsymbol{J}}}_{h}({\hat{x}},{{\boldsymbol{y}}})$ , $\det{{\boldsymbol{J}}}_{h}({\hat{x}},{{\boldsymbol{y}}})$ at our disposal, it is then easy to evaluate ${{\boldsymbol{A}}}({\hat{x}},{{\boldsymbol{y}}})$ and $\hat{f}({\hat{x}},{{\boldsymbol{y}}})$ , as well.

This procedure can be extended to the general case of order $p>0$ ansatz functions for the random vector field (5), resulting in an order $p-1$ approximation of the desired quantities in (7).
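The element-based evaluation described above can be sketched for a single triangle. Each mode $\boldsymbol{V}_{m,h}$ is given by its nodal vectors (the coefficients $c_{i,k,m}$ grouped per vertex), its Jacobian $\boldsymbol{V}_{m,h}'=\sum_i c_i\,\nabla\varphi_i^{\intercal}$ is constant on the element, and $\boldsymbol{J}_h$, $\det\boldsymbol{J}_h$ and $\boldsymbol{A}$ then follow for any given $\boldsymbol{y}$. The per-vertex data layout, the function names and the two example modes are assumptions for illustration; a full implementation would repeat this for every element $T_\ell$, i.e. for every $\psi_\ell$.

```python
def hat_gradients(P):
    """Gradients of the three P1 hat functions on the triangle with vertices P."""
    (x0, y0), (x1, y1), (x2, y2) = P
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)   # twice the signed area
    return [((y1 - y2) / det, (x2 - x1) / det),
            ((y2 - y0) / det, (x0 - x2) / det),
            ((y0 - y1) / det, (x1 - x0) / det)]

def mode_jacobian(grads, nodal):
    """V'_{m,h} on the element: sum_i nodal[i] grad(phi_i)^T, a constant 2x2 matrix."""
    return [[sum(nodal[i][r] * grads[i][c] for i in range(3)) for c in range(2)]
            for r in range(2)]

def element_data(P, modes, y):
    """J_h = I + sum_m V'_{m,h} y_m, det J_h and A = (J^T J)^{-1} det J on one element."""
    grads = hat_gradients(P)
    J = [[1.0, 0.0], [0.0, 1.0]]
    for nodal, ym in zip(modes, y):
        Dm = mode_jacobian(grads, nodal)
        for r in range(2):
            for c in range(2):
                J[r][c] += Dm[r][c] * ym
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    # (J^T J)^{-1} det J = J^{-1} J^{-T} det J = adj(J) adj(J)^T / det J
    adj = [[J[1][1], -J[0][1]], [-J[1][0], J[0][0]]]
    A = [[sum(adj[r][k] * adj[c][k] for k in range(2)) / det for c in range(2)]
         for r in range(2)]
    return J, det, A

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
modes = [[(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)],    # V_1(x) = (x_1, 0): stretch in x_1
         [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]]    # V_2(x) = (x_2, 0): shear
J, detJ, A = element_data(P, modes, y=(0.1, 0.2))
print(J, detJ, A)
```

Because all quantities are constant per element for P1 modes, the transformed coefficient is naturally represented in $\mathcal{X}_0(\mathcal{T})$, as stated in the text.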

3.3. Variational formulation

Using the transformation in (7), the weak formulation of the model problem (6) reads: find $u\in{\mathcal{V}}$ , such that for all $v\in{\mathcal{V}}$ there holds

[TABLE]

This characterizes the operator ${\mathcal{A}}:L^{2}(\Gamma_{\infty},\pi_{\infty};{\mathcal{X}})\to L^{2}(\Gamma_{\infty},\pi_{\infty};{\mathcal{X}}^{\ast})$ , which gives rise to the energy norm $\lVert v\rVert^{2}_{\mathcal{A}}\mathrel{\mathrel{\mathop{:}}=}\langle{\mathcal{A}}(v),v\rangle$ . Employing the finite dimensional spaces of the previous section leads to the discrete weak problem: find $u=\sum_{\mu\in\Lambda}\sum_{i=1}^{N}U(i,\mu)\varphi_{i}P_{\mu}\in{\mathcal{V}}_{N}$ , such that for all $i^{\prime}=1,\ldots,N$ and $\alpha^{\prime}\in\Lambda$

[TABLE]

Here, we define the discrete linear operator

[TABLE]

and the discrete right-hand side

[TABLE]
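The discrete operator inherits a Kronecker structure when the coefficient is expanded as $\boldsymbol{A}=\sum_\mu\boldsymbol{A}_\mu P_\mu$. Under that assumption, the standard SGFEM derivation (written out here, not copied from the display above) gives $\boldsymbol{L}=\sum_\mu T_\mu\otimes K_\mu$ with spatial matrices $K_\mu(i',i)=\int\boldsymbol{A}_\mu\nabla\varphi_i\cdot\nabla\varphi_{i'}$ and triple products $T_\mu(\alpha',\alpha)=\mathbb{E}[P_\mu P_\alpha P_{\alpha'}]$. The sketch uses one parameter, orthonormal Legendre polynomials and made-up $2\times 2$ matrices $K_\mu$; for the transformed coefficient (7) the expansion is not finite, and the paper instead relies on a tensor reconstruction.

```python
import math
from fractions import Fraction

def legendre_coeffs(n):
    """Monomial coefficients of the standard Legendre polynomials L_0, ..., L_n."""
    P = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for k in range(1, n):
        x_Lk = [Fraction(0)] + P[k]
        Lkm1 = P[k - 1] + [Fraction(0), Fraction(0)]
        P.append([((2 * k + 1) * a - k * b) / (k + 1) for a, b in zip(x_Lk, Lkm1)])
    return P[: n + 1]

def poly_mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def uniform_mean(p):
    """E[p(y)] for y ~ U[-1, 1]."""
    return sum((c / (j + 1) for j, c in enumerate(p) if j % 2 == 0), Fraction(0))

def triple(a, b, c):
    """E[P_a P_b P_c] for orthonormal Legendre P_n = sqrt(2n+1) L_n and y ~ U[-1, 1]."""
    L = legendre_coeffs(max(a, b, c))
    e = uniform_mean(poly_mul(poly_mul(L[a], L[b]), L[c]))
    return math.sqrt((2 * a + 1) * (2 * b + 1) * (2 * c + 1)) * float(e)

def kron(X, Y):
    """Kronecker product of dense matrices stored as nested lists."""
    return [[X[i][j] * Y[k][l] for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]

def galerkin_operator(K, n_basis):
    """L = sum_mu T_mu (x) K_mu with T_mu(a, b) = E[P_mu P_a P_b]; stochastic index outer."""
    n_x = len(K[0])
    L = [[0.0] * (n_basis * n_x) for _ in range(n_basis * n_x)]
    for mu, K_mu in enumerate(K):
        T_mu = [[triple(mu, a, b) for b in range(n_basis)] for a in range(n_basis)]
        for i, row in enumerate(kron(T_mu, K_mu)):
            for j, v in enumerate(row):
                L[i][j] += v
    return L

K = [[[2.0, -1.0], [-1.0, 2.0]],          # K_0: illustrative matrix of the mean coefficient
     [[0.5, -0.25], [-0.25, 0.5]]]        # K_1: illustrative matrix of the first-order mode
L = galerkin_operator(K, n_basis=3)       # Legendre degrees 0, 1, 2 in one parameter
for row in L:
    print(" ".join(f"{v:6.3f}" for v in row))
```

The full matrix is formed here only for inspection; the point of the tensor format is that $\boldsymbol{L}$ is applied through its factors without ever being assembled.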

3.4. Residual based a posteriori error estimates

In the following, we recall the residual based a posteriori error estimator derived in [5, 6], adapted to the problem at hand. An efficient reformulation in the tensor train format is postponed to Section 4. The basis for the estimator is the residual ${\mathcal{R}}(w_{N})\in L^{2}(\Gamma_{\infty},\pi_{\infty};{\mathcal{X}}^{\ast})={\mathcal{V}}^{\ast}$ with respect to some $w_{N}\in{\mathcal{V}}_{N}$ and the solution $u\in{\mathcal{V}}$ of (6) given by

[TABLE]

It has an $L^{2}(\Gamma_{\infty},\pi_{\infty})$ -convergent expansion in $(P_{\nu})_{\nu\in{\mathcal{F}}}$ given by

[TABLE]

with coefficients $r_{\nu}\in{\mathcal{X}}^{\ast}$ characterized by

[TABLE]

Here, $\hat{f}_{\nu}$ , ${\boldsymbol{A}}_{\mu}$ and $w_{N,\kappa}$ denote the coefficients in the Legendre chaos expansion of $\hat{f}=\sum_{\nu\in{\mathcal{F}}}\hat{f}_{\nu}P_{\nu}$ , ${\boldsymbol{A}}=\sum_{\mu\in{\mathcal{F}}}{\boldsymbol{A}}_{\mu}P_{\mu}$ and $w_{N}=\sum_{\kappa\in{\mathcal{F}}}w_{N,\kappa}P_{\kappa}$ and

[TABLE]

is the $\nu$ -relevant triple product tuple set.
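Because univariate Legendre triple products $\mathbb{E}[P_aP_bP_c]$ vanish unless $a+b+c$ is even and $|a-b|\leq c\leq a+b$, the coupling in the residual coefficients $r_\nu$ is sparse. The sketch below collects, for a given $\nu$, all pairs $(\mu,\kappa)$ from a coefficient index set and a solution index set whose tensorized triple product with $P_\nu$ is nonzero. This is our reading of the $\nu$-relevant set (its defining display is not reproduced here), and the index sets are illustrative.

```python
import itertools

def triple_nonzero_1d(a, b, c):
    """Selection rule: E[L_a L_b L_c] != 0 for Legendre polynomials on [-1, 1]."""
    return (a + b + c) % 2 == 0 and abs(a - b) <= c <= a + b

def relevant_pairs(nu, Delta, Lam):
    """Pairs (mu, kappa) in Delta x Lam with E[P_mu P_kappa P_nu] != 0.

    Multi-indices are tuples of equal length; the tensorized triple product is a
    product of univariate ones, so it is nonzero iff every factor is nonzero.
    """
    return [(mu, ka) for mu, ka in itertools.product(Delta, Lam)
            if all(triple_nonzero_1d(m, k, n) for m, k, n in zip(mu, ka, nu))]

Lam = list(itertools.product(range(2), range(2)))      # solution index set (illustrative)
Delta = list(itertools.product(range(3), range(2)))    # coefficient index set (illustrative)
nu = (2, 0)                                            # a multi-index outside Lam
pairs = relevant_pairs(nu, Delta, Lam)
print(len(pairs), pairs)
```

Only 4 of the 24 candidate pairs contribute here, which indicates why the residual can be evaluated far more cheaply than a naive double sum suggests.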

We recall a central theorem from [5], which enables the derivation of an error bound based on an approximation $w_{N}$ of the Galerkin projection $u_{N}$ of the solution $u$ in the energy norm.

Theorem.

Let ${\mathcal{V}}_{N}\subset{\mathcal{V}}$ be a closed subspace and $w_{N}\in{\mathcal{V}}_{N}$ , and let $u_{N}\in{\mathcal{V}}_{N}$ denote the ${\mathcal{A}}$ -Galerkin projection of $u\in{\mathcal{V}}$ onto ${\mathcal{V}}_{N}$ . Then, for some $c_{\mathcal{A}},c_{\mathcal{I}}>0$ , it holds

[TABLE]

Remark.

The constant $c_{\mathcal{I}}$ is related to the Clément interpolation operator in ${\mathcal{X}}$ and $\check{c}$ stems from the spectral equivalence $\lVert v\rVert_{\mathcal{A}}\geq\check{c}\lVert v\rVert_{\mathcal{V}}$ . We refer to [5] for further details.

Remark.

We henceforth assume that the data $\hat{f}$ and ${\boldsymbol{A}}$ are exactly expanded in a finite set $\Delta$ with $\Lambda\subset\Delta\subset{\mathcal{F}}$ , i.e. the neglected modes ${\mathcal{F}}\setminus\Delta$ do not contribute significantly to the approximation. The residual can then be split into approximation and truncation contributions

[TABLE]

where ${\mathcal{R}}_{\Xi}$ denotes the restriction of the expansion to the set $\Xi\subset{\mathcal{F}}$ . Computable upper bounds for the two residual terms and the algebraic error $\lVert u_{N}-w_{N}\rVert_{\mathcal{A}}$ are recalled in the following.

For any discrete $w_{N}\in{\mathcal{V}}_{N}$ , we define the following error estimators in analogy to the presentation in [5, 6] and [8]:

•

A deterministic residual estimator for ${\mathcal{R}}_{\Lambda}$ steering the adaptivity of the mesh ${\mathcal{T}}$ is given by

[TABLE]

with volume contribution for any $T\in{\mathcal{T}}$

[TABLE]

and facet contribution for any $S\in{\mathcal{S}}$

[TABLE]

•

The stochastic truncation error estimator stems from splitting the residual in (18), while considering the inactive part over $\Delta\setminus\Lambda$ . It is possible to construct the estimator, as in the deterministic case, for every element of the triangulation and consider different mesh discretisations for every stochastic multi-index. Since we want to focus on a closed formulation and avoid technical details, the stochastic estimator is formulated on the whole domain $D_{\operatorname{ref}}$ . Nevertheless, for more insight we introduce a collection of suitable tensor sets, which indicate the error portion of every active stochastic dimension $m=1,\ldots,M$ (in fact, we could even consider $m>M$ ),

[TABLE]

Then, for every $w_{N}\in\mathcal{V}_{N}$ , the stochastic tail estimator on $\Delta_{m}$ is given by

[TABLE]

where we define for every multi-index $\nu\in{\mathcal{F}}$ the residual portion

[TABLE]

The collection of sets $\{\Delta_{m}\}_{m=1}^{M}$ is useful for the adaptive refinement procedure, but it does not capture all stochastic contributions to the residual. To this end, we compute the global stochastic tail estimator over ${\mathcal{F}}\setminus\Lambda$

[TABLE]

which involves an infinite sum that becomes finite due to Remark 3.4.

• The algebraic error measures the distance of $w_{N}$ to the best approximation $u_{N}$ in ${\mathcal{V}}_{N}$. Such a distance arises, e.g., from an early termination of an iterative solver or from a restriction to a solution manifold $\mathcal{M}\subset{\mathcal{V}}_{N}$. This error can be bounded by

[TABLE]

where

[TABLE]

Here, $W\in\mathbb{R}^{N,d_{1},\ldots,d_{M}}$ denotes the coefficient tensor of $w_{N}\in\mathcal{V}_{N}$, ${\boldsymbol{L}}$ is the discrete operator from (16), and $\lVert\cdot\rVert_{F}$ is the Frobenius norm. Note that the rank-1 operator ${\boldsymbol{H}}$ is a change-of-basis operator that orthonormalizes the physical basis functions, i.e.

[TABLE]

Combining these estimators in the context of Theorem 3.4 yields an overall bound $\Theta$ for the energy error, similar to [5, 6, 10] and [8].

Corollary.

For any $w_{N}\in\mathcal{V}_{N}$, the solution $u\in\mathcal{V}$ of the model problem (1), and the Galerkin approximation $u_{N}\in\mathcal{V}_{N}$ in (15), there exist constants $c_{\eta},c_{\zeta},c_{\iota}>0$ such that

[TABLE]

Remark.

From the residual in (17), it becomes clear that the derived error estimators suffer from the “curse of dimensionality” and are hence not computable for larger problems. However, the hierarchical low-rank tensor representation introduced in the next section alleviates this obstacle and makes the adaptive algorithm described in Section 5 feasible.

4. Tensor trains

The inherent tensor structure of the Bochner function space $\mathcal{V}=\mathcal{X}\otimes\bigotimes_{m=1}^{M}L^{2}([-1,1],\pi_{m})$ and of its finite-dimensional analogue $\mathcal{V}_{N}$ motivates the use of hierarchical tensor formats, which aim at an implicit model order reduction and effectively break the curse of dimensionality if operator and solution admit low-rank approximations.

A representative $v\in\mathcal{V}_{N}$ can be written as

[TABLE]

where $V\in\mathbb{R}^{N,d_{1},\ldots,d_{M}}$ is a high dimensional tensor containing for example the projection coefficients

[TABLE]

Setting $d=\max\{d_{1},\ldots,d_{M}\}$ , the storage cost of $V$ is $\mathcal{O}(Nd^{M})$ , which grows exponentially with the number of dimensions $M\in\mathbb{N}$ in the stochastic parameter space. To alleviate this major problem for numerical methods, we impose a low-rank assumption on the involved objects and introduce a popular tensor format as follows.

A tensor $V\in\mathbb{R}^{N,d_{1},\ldots,d_{M}}$ is said to be in tensor train (TT) format if every entry can be represented as a product of matrices of the form

[TABLE]

To simplify notation, set $r_{M}=1$. If the vector $\mathbf{r}=(r_{0},\ldots,r_{M})$ is minimal in some sense, we call $\mathbf{r}$ the TT-rank and (30) the TT-decomposition of $V$. The storage complexity of $V$ now depends only linearly on the number of dimensions, namely $\mathcal{O}(Nr_{0}+dM\max\{\mathbf{r}\}^{2})$, where the term $Nr_{0}$ accounts for the physical component. In [27, 24] it was shown that many functions in numerical applications admit a low-rank representation.
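To make the gap between $\mathcal{O}(Nd^{M})$ and the TT storage concrete, the following sketch simply counts array entries for both representations. The sizes ($N=10^4$, $M=20$, $d=2$, uniform rank $10$) are hypothetical and chosen only for illustration; they are not taken from the experiments in Section 6.

```python
import math


def full_storage(N, dims):
    """Entries of the full coefficient tensor V in R^{N x d_1 x ... x d_M}."""
    return N * math.prod(dims)


def tt_storage(N, dims, ranks):
    """Entries of a TT representation with physical component V_0 in R^{N x r_0}
    and stochastic cores V_m in R^{r_{m-1} x d_m x r_m}; ranks = (r_0, ..., r_M), r_M = 1."""
    M = len(dims)
    assert len(ranks) == M + 1 and ranks[-1] == 1
    stochastic = sum(ranks[m - 1] * dims[m - 1] * ranks[m] for m in range(1, M + 1))
    return N * ranks[0] + stochastic


# hypothetical sizes, for illustration only
N, M, d, r = 10**4, 20, 2, 10
print(full_storage(N, [d] * M))               # 10485760000
print(tt_storage(N, [d] * M, [r] * M + [1]))  # 103820
```

Even for this moderate number of dimensions the full tensor needs about $10^{10}$ entries, while the TT representation is dominated by the physical component $Nr_0$.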

Given the full tensor $V$, one could compute its tensor train representation by a hierarchical singular value decomposition (HSVD) as described in [14]. However, this is usually infeasible, either due to the high dimensionality of $V$ or because $V$ is known only implicitly. In that case, high-dimensional interpolation or regression algorithms are advisable, see e.g. [26, 15].
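For small tensors that fit into memory, a TT decomposition can be computed by successive truncated SVDs of the unfoldings. The sketch below is the basic sequential TT-SVD; we do not claim that it coincides in detail with the HSVD of [14]. It treats every mode uniformly (cores of shape $r_{m-1}\times n_m\times r_m$ with boundary ranks $1$), so the physical index $N$ is simply the first mode, and the relative truncation threshold `eps` is an assumption of this sketch.

```python
import numpy as np


def tt_to_full(cores):
    """Contract TT cores of shape (r_{m-1}, n_m, r_m), with boundary ranks 1, into a full tensor."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full[0, ..., 0]


def tt_svd(V, eps=1e-10):
    """Sequential TT-SVD: split off one mode at a time by a truncated SVD.

    Singular values below eps * (largest singular value) are discarded."""
    shape = V.shape
    cores, r_prev = [], 1
    C = V.reshape(1, -1)
    for n in shape[:-1]:
        C = C.reshape(r_prev * n, -1)                  # unfolding (r_prev * n_m) x rest
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))        # truncated rank
        cores.append(U[:, :r].reshape(r_prev, n, r))
        C = s[:r, None] * Vt[:r, :]                    # remainder carried to the next mode
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores
```

For a tensor that is exactly of low TT-rank, the recovered ranks do not exceed the generating ranks and the reconstruction is exact up to round-off; the cost, however, is dominated by SVDs of unfoldings of the full tensor, which is precisely what becomes infeasible in high dimensions.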

In this work, we rely on a TT reconstruction approach and employ it to obtain the representations of the transformed coefficient function and the right-hand side (7). In contrast to an explicit (intrusive) discretisation of the linear system in tensor format, as carried out e.g. in [8, 23], the reconstruction method relies on a set of random samples of the function to be represented. The non-intrusive algorithm used in the numerical experiments is described in [9]. Similar ideas were presented in [26, 3, 4], where a tensor cross approximation was used to construct the algebraic system. Unlike tensor reconstruction, cross approximation requires that the tensor can be sampled at selectively chosen fibers. We refer to [16] for a survey of low-rank approximation methods.

To sketch the reconstruction approach, we assume a set $\{y^{(k)}\}_{k=1}^{K}$ of $K$ parameter realisations and the corresponding measurements of a function $\{v(\cdot,y^{(k)})\in{\mathcal{X}}\}_{k=1}^{K}$

[TABLE]

Recall that $N$ is the dimension of the finite element space. We define a linear measurement operator $\hat{\mathcal{A}}:\mathbb{R}^{N\times d_{1}\times\cdots\times d_{M}}\to\mathbb{R}^{NK}$ acting on a tensor $W\in\mathbb{R}^{N\times d_{1}\times\cdots\times d_{M}}$ by

[TABLE]

with a contraction $\circ_{M}$ over the $M$ stochastic modes and

[TABLE]

The reconstruction problem is to find a tensor $W$ of minimal TT-rank such that $\hat{\mathcal{A}}(W)={\bf b}$. Details, in particular on the numerical solution of this optimisation problem by Alternating Steepest Descent (ASD), can be found in [9].
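The measurement operator only requires evaluating the TT at each sample $y^{(k)}$, which amounts to contracting every stochastic core with the polynomial values $P_{\mu_m}(y_m)$. The following sketch assumes that $\pi_m$ is the uniform probability measure on $[-1,1]$, so that $P_n=\sqrt{2n+1}\,L_n$ with Legendre polynomials $L_n$. The function names are hypothetical, and this is not the xerus/ASD implementation of [9]; it only shows how $\hat{\mathcal{A}}(W)$ can be applied without forming the full tensor.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre(d, x):
    """P_0..P_{d-1} at points x, orthonormal w.r.t. the uniform probability measure
    on [-1, 1] (assumed choice of pi_m); returns an array of shape (len(x), d)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return legendre.legvander(x, d - 1) * np.sqrt(2 * np.arange(d) + 1)


def tt_eval(W0, cores, y):
    """FE coefficient vector W(., y) = sum_mu W(., mu) prod_m P_{mu_m}(y_m) in R^N.

    W0 has shape (N, r_0); cores[m-1] has shape (r_{m-1}, d_m, r_m) with r_M = 1.
    Contracting from the right costs O(sum_m r_{m-1} d_m r_m + N r_0)."""
    v = np.ones(1)
    for m in range(len(cores) - 1, -1, -1):
        G = cores[m]
        P = orthonormal_legendre(G.shape[1], y[m])[0]  # values P_0(y_m), ..., P_{d_m-1}(y_m)
        v = np.einsum('adb,d,b->a', G, P, v)           # vector of length r_{m-1}
    return W0 @ v


def measurement_operator(W0, cores, samples):
    """Hypothetical sketch of hat{A}: stack W(., y^(k)), k = 1..K, into a vector of length N*K."""
    return np.concatenate([tt_eval(W0, cores, y) for y in samples])
```

In a reconstruction algorithm, ${\bf b}$ would be the stacked FE coefficient vectors of the sampled data, and the misfit $\hat{\mathcal{A}}(W)-{\bf b}$ would be minimised over the cores.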

4.1. Galerkin discretization in tensor train format

In the following, we assume that tensor representations of the right-hand side $\hat{f}$ and the coefficient function ${\boldsymbol{A}}$ are available. More precisely, we denote the low-rank approximation of, e.g., $\hat{f}$ in (7) by

[TABLE]

where $F$ admits a TT representation of rank $\mathbf{r}^{f}$ and $\Lambda_{f}$ is a tensor multi-index set with local dimension cap $\mathbf{d}^{f}=(d_{1}^{f},\ldots,d_{M}^{f})$ . Analogously, every component of the symmetric matrix coefficient

[TABLE]

is approximated by $a^{\mathrm{TT}}_{i,j}$ , $i,j\in\{1,\ldots,d\}$ as in (31) with TT-ranks $\mathbf{r}^{i,j}$ . Here, the order three component tensors in the TT-representation of the approximated matrix entry ${\boldsymbol{A}}^{\mathrm{TT}}|_{i,j}=a_{i,j}^{\mathrm{TT}}$ are denoted by $\{a_{i,j}^{m}\}_{m=0}^{M}$ .

Remark.

Since, for the coefficient, the TT reconstruction is carried out for every matrix entry in (32), the local dimensions $d^{i,j}=(d_{1}^{i,j},\ldots,d_{M}^{i,j})$ and tensor ranks can vary among those $d^{2}$ tensor trains. Here, we assume that all approximations share the same local dimensions, and we denote the tensor multi-index set covering those indices by $\Xi\subset{\mathcal{F}}$, which may differ from (but is larger than) the solution active set $\Lambda$. As stated in [8], it is beneficial (and in fact necessary) to choose $\Xi$ such that $2\mu=(2\mu_{1},\ldots,2\mu_{M},0,\ldots)\in\Xi$ for all $\mu\in\Lambda$. Due to the orthogonality of the polynomial basis $\{P_{\nu}\}$, the triple products $\int P_{\nu_m}P_{\alpha_m}P_{\alpha'_m}\,\mathrm{d}\pi_m$ vanish whenever $\nu_m>\alpha_m+\alpha'_m$, so coefficient modes beyond $2\mu$ do not couple active indices. This property guarantees a well-posed discrete problem, since no additional approximations are introduced, and enables quasi-optimal convergence rates of the Galerkin method.

On $\mathcal{V}_{N}$ , the Galerkin operator resulting from the transformed weak problem in TT format is given as the sum of $d^{2}$ TT operators such that for all $i,i^{\prime}=1,\ldots,N$ , and $\alpha,\alpha^{\prime}\in\Lambda$ ,

[TABLE]

each corresponding to one addend of the resulting matrix-vector product in (16).

In the following, we illustrate the explicit construction of the TT operator for the term $\mathbf{L}_{1}$. Denoting by $\nabla^{i}g$ the $i$-th component of the gradient of a function $g$, one obtains for the first low-rank approximated addend of the bilinear form

[TABLE]

Using the multi-linear structure of $a^{\mathrm{TT}}_{1,1}$ , one can write $\mathbf{L}_{1}$ as

[TABLE]

where the first component tensor $\mathbf{L}^{1}_{0}$ depends on the physical discretization in piecewise constant FE functions $\{\psi_{i}\}_{i=1}^{N_{0}}$ only, i.e.,

[TABLE]

The remaining tensor operator parts decompose into one-dimensional integrals over triple products of orthogonal polynomials of the form

[TABLE]

These integrals are known explicitly thanks to the recursion formula for orthogonal polynomials, cf. [1, 11].
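As a numerical cross-check of these one-dimensional triple products, the sketch below evaluates $\tau[k,a,b]=\int P_kP_aP_b\,\mathrm{d}\pi$ by Gauss-Legendre quadrature instead of the closed-form expressions from the recursion [1, 11]. Since the integrand is a polynomial, the quadrature is exact once enough nodes are used. It assumes, as before, that $\pi$ is the uniform probability measure on $[-1,1]$; the coefficient dimension `d_coef = 2*d - 1` mirrors the choice $2\mu\in\Xi$ from the remark above.

```python
import numpy as np
from numpy.polynomial import legendre


def orthonormal_legendre(d, x):
    """P_0..P_{d-1} at points x, orthonormal w.r.t. the uniform probability measure on [-1, 1]."""
    return legendre.legvander(x, d - 1) * np.sqrt(2 * np.arange(d) + 1)


def triple_products(d, d_coef):
    """tau[k, a, b] = E[P_k P_a P_b] for k < d_coef and a, b < d."""
    degree = (d_coef - 1) + 2 * (d - 1)                  # degree of the integrand
    nodes, weights = legendre.leggauss(degree // 2 + 1)  # n nodes are exact up to degree 2n - 1
    weights = weights / 2.0                              # normalise to a probability measure
    Pk = orthonormal_legendre(d_coef, nodes)             # shape (n_nodes, d_coef)
    Pa = orthonormal_legendre(d, nodes)                  # shape (n_nodes, d)
    return np.einsum('q,qk,qa,qb->kab', weights, Pk, Pa, Pa)


d = 3
tau = triple_products(d, 2 * d - 1)
print(tau.shape)  # (5, 3, 3)
```

The tensor confirms the sparsity used in the remark: `tau[0]` is the identity (orthonormality), and `tau[k, a, b]` vanishes whenever `k > a + b`, so coefficient modes beyond $2(d-1)$ contribute nothing to the Galerkin operator on the active set.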

Remark.

Due to the sum of TT operators in (33), the resulting operator can be represented in TT format with rank at most $d^{2}\max\{\mathbf{r}^{i,j}\;|\;i,j\in\{1,\ldots,d\}\}$, since the TT-ranks of a sum are bounded by the sum of the individual TT-ranks.
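The rank bound follows from the standard construction of a TT sum: the first cores are concatenated, the last cores are stacked, and all interior cores are placed block-diagonally, so the ranks add up. The sketch below shows this for TT tensors with cores of shape $r_{m-1}\times n_m\times r_m$ (boundary ranks $1$); TT operators are handled analogously, with a pair of indices in place of $n_m$. The function names are our own.

```python
import numpy as np


def tt_add(a, b):
    """Sum of two TT tensors (lists of cores (r_{m-1}, n_m, r_m), boundary ranks 1).

    Interior cores are placed block-diagonally, so the interior ranks add up."""
    M = len(a)
    assert M == len(b) >= 2
    out = []
    for m, (x, y) in enumerate(zip(a, b)):
        if m == 0:
            out.append(np.concatenate([x, y], axis=2))      # row block [x, y]
        elif m == M - 1:
            out.append(np.concatenate([x, y], axis=0))      # column block [x; y]
        else:
            z = np.zeros((x.shape[0] + y.shape[0], x.shape[1], x.shape[2] + y.shape[2]))
            z[:x.shape[0], :, :x.shape[2]] = x
            z[x.shape[0]:, :, x.shape[2]:] = y
            out.append(z)
    return out


def tt_to_full(cores):
    """Contract the cores into the full tensor (only for small checks)."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full[0, ..., 0]


def tt_ranks(cores):
    return [G.shape[2] for G in cores[:-1]]
```

Summing $d^2$ such terms therefore yields ranks of at most $d^{2}\max\{\mathbf{r}^{i,j}\}$, and a subsequent rounding step may reduce them again. The same construction, with one summand a random rank-1 TT, is one way to realise the rank increase $\mathbf{r}\leftarrow\mathbf{r}+\boldsymbol{1}$ described in Section 5.2.3.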

With the TT approximations $f^{\mathrm{TT}}\approx\hat{f}$ and $a_{i,j}^{\mathrm{TT}}\approx{\boldsymbol{a}}_{i,j}$ of the data, we replace the original system of equations, which has to be solved for $U\in\mathbb{R}^{N\times d_{1}\times\ldots\times d_{M}}$, namely

[TABLE]

with a constrained minimization problem on the low-rank manifold $\mathcal{M}_{{\boldsymbol{r}}}$ of all tensor trains with local dimensions given by $\Lambda$ and fixed rank ${\boldsymbol{r}}$,

[TABLE]

Here, ${\boldsymbol{L}}^{\mathrm{TT}}$ and ${\boldsymbol{F}}^{\mathrm{TT}}$ denote the TT approximations of ${\boldsymbol{L}}$ and ${\boldsymbol{F}}$, respectively, and $\lVert\cdot\rVert_{F}$ is the Frobenius norm.

To solve (39), we choose a preconditioned alternating least squares (ALS) algorithm as described in [22, 10]. This yields an approximation of the Galerkin solution of (15)

[TABLE]

where $\tau$ is a placeholder for the parameters of the numerical algorithm and ${\boldsymbol{r}}$ is the desired, predefined TT-rank of $W^{\mathrm{TT}}$.
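The alternating principle behind ALS can be illustrated on the simplest instance of (39), where ${\boldsymbol{L}}^{\mathrm{TT}}$ is the identity and the task is to fit a given tensor $F$ in the Frobenius norm. This is a deliberately simplified sketch and not the preconditioned ALS of [22, 10]: all cores but one are frozen, and the free core is obtained from a linear least-squares problem, $X=L^{+}F_mR^{+}$, where $L$ and $R$ are the interface matrices built from the frozen cores. The full tensor is formed only for the small check.

```python
import numpy as np


def tt_to_full(cores):
    """Contract TT cores of shape (r_{m-1}, n_m, r_m), boundary ranks 1, into a full tensor."""
    full = cores[0]
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full[0, ..., 0]


def left_interface(cores, m):
    """Matrix (n_1 ... n_{m-1}) x r_{m-1} built from the cores left of position m."""
    L = np.ones((1, 1))
    for G in cores[:m]:
        L = np.tensordot(L, G, axes=([-1], [0])).reshape(-1, G.shape[2])
    return L


def right_interface(cores, m):
    """Matrix r_m x (n_{m+1} ... n_M) built from the cores right of position m."""
    R = np.ones((1, 1))
    for G in reversed(cores[m + 1:]):
        R = np.tensordot(G, R, axes=([-1], [0])).reshape(G.shape[0], -1)
    return R


def als_fit(F, cores, sweeps=5):
    """Fixed-rank ALS for min ||W - F||_F over TT tensors W (identity operator only).

    Each micro-step solves the least-squares problem for one core exactly,
    so the recorded errors never increase."""
    cores = [G.copy() for G in cores]
    errors = [np.linalg.norm(tt_to_full(cores) - F)]
    for _ in range(sweeps):
        for m in range(len(cores)):
            L, R = left_interface(cores, m), right_interface(cores, m)
            Fm = F.reshape(L.shape[0], cores[m].shape[1], R.shape[1])
            # slice-wise minimiser of ||L X R - Fm||_F is X = pinv(L) Fm pinv(R)
            cores[m] = np.einsum('ap,pib,bq->aiq', np.linalg.pinv(L), Fm, np.linalg.pinv(R))
            errors.append(np.linalg.norm(tt_to_full(cores) - F))
    return cores, errors
```

For the Galerkin problem, the micro-step instead involves the projected operator, which is where preconditioning becomes important; the monotone decrease of the objective carries over because each micro-step is still an exact minimisation.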

5. Adaptive algorithm

The error estimator of Section 3.4 is formulated in a computable TT representation in Section 5.1. It gives rise to an adaptive algorithm that iteratively refines the spatial discretization, the anisotropic stochastic polynomial set and the representation format based on local error estimators and indicators. This makes it possible to track the evolution of the actual (unknown) error $\lVert u-w_{N}\rVert_{\mathcal{A}}$. The inherently high computational cost of the error estimators can be overcome by means of the tensor train formalism. In what follows, we examine the efficient computation of the individual error estimator components in the TT format and describe the marking and refinement procedure. For more details and a more general framework, we refer to [5, 6, 10, 8].

5.1. Efficient computation of error estimators

We illustrate the efficient computation of the deterministic error estimator $\eta_{T}$. For each element $T\in\mathcal{T}$ of the triangulation, the error estimator is given by (20). Due to the sum over $\Lambda$, it suffers from the curse of dimensionality. However, employing the low-rank approximations ${\boldsymbol{A}}^{\operatorname{TT}}\approx{\boldsymbol{A}}$ and $f^{\operatorname{TT}}\approx\hat{f}$ together with the TT representation of $w_{N}$ renders the computation feasible. To make this more explicit, recall that

[TABLE]

This is evaluated by expansion of the inner product,

[TABLE]

The first term is an inner product of a functional tensor train, which, due to the orthonormality of the polynomial basis, reduces to a summation over the tensor components, i.e.,

[TABLE]

whereas the high-dimensional sum can be evaluated for every tensor dimension in parallel using, for all $i,i^{\prime}=1,\ldots,N_{0}$ , that

[TABLE]

Note that the iterated sum over the tensor ranks is to be read as a sequence of matrix-vector multiplications, so the formula above can be evaluated efficiently, at a cost that grows only linearly in $M$. If, in addition, the TT format employs component orthogonalization and $f^{\operatorname{TT}}$ is left-orthogonal, the product reduces to the identity and one only has to sum over $k_{0}$ and $k_{0}^{\prime}$.
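The rank-by-rank contraction can be sketched as follows for a TT with physical component $F_0\in\mathbb{R}^{N_0\times r_0}$ and stochastic cores $F_m\in\mathbb{R}^{r_{m-1}\times d_m\times r_m}$. The stochastic part collapses into an $r_0\times r_0$ Gram matrix through small matrix products from right to left, after which only a weighted sum over $k_0,k_0'$ remains. The element mass matrix `mass` of the piecewise constant basis is a hypothetical input, and the code is a generic illustration of the contraction rather than a transcription of the formula above.

```python
import numpy as np


def stochastic_gram(cores):
    """G[k0, k0'] = sum over all multi-indices mu of the product of stochastic cores.

    cores[m-1] has shape (r_{m-1}, d_m, r_m) with r_M = 1; every step is a small
    contraction, so the cost is linear in the number M of stochastic dimensions."""
    G = np.ones((1, 1))
    for C in reversed(cores):
        G = np.einsum('aub,bc,duc->ad', C, G, C)
    return G


def weighted_squared_norm(F0, cores, mass):
    """sum_{i,i'} mass[i, i'] sum_mu F(i, mu) F(i', mu) for the TT F = F0 x cores."""
    return float(np.trace(mass @ F0 @ stochastic_gram(cores) @ F0.T))
```

If the stochastic cores are orthogonalized, `stochastic_gram` returns the identity and the expression reduces to `np.trace(mass @ F0 @ F0.T)`, which is the simplification mentioned in the text.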

For the remaining terms in (42), one has to find a suitable representation of ${\boldsymbol{A}}^{\operatorname{TT}}\nabla_{\hat{x}}w_{N}$. Since the gradient is a linear operator, a tensor representation of this product can be computed explicitly, with multiplied ranks and doubled polynomial degrees. For a detailed construction, we refer to [8, Section 5]. The matrix-vector multiplication arising from the entry-wise TT representation of ${\boldsymbol{A}}^{\operatorname{TT}}$ poses no further difficulties apart from a slight increase in complexity, since a sum of individual parts has to be handled. Finally, the mixed and operator terms are computed analogously, using the same arguments as for (45).

5.2. Fully adaptive algorithm

Given an initial configuration consisting of a regular mesh $\mathcal{T}$, a finite active tensor multi-index set $\Lambda\subset\mathcal{F}$, a (possibly random) start tensor $W^{\mathrm{TT}}$ with TT-rank $\mathbf{r}$, and solver parameters $\tau$ (e.g. a termination threshold, rounding parameter, iteration limit or precision arguments), we now present the adaptive refinement procedure summarized in Algorithm 11.

On every level, we generate the data approximations $f^{\mathrm{TT}}$ and ${\boldsymbol{A}}^{\mathrm{TT}}$ by tensor reconstruction. The procedure is described, e.g., in [9] and denoted by

[TABLE]

where the multi-index set $\Xi$ can be chosen arbitrarily, although it is advisable to take Remark 4.1 into account. The $N_{s}$ samples can be, e.g., Monte Carlo samples or the points of more structured quadrature techniques such as quasi-Monte Carlo or sparse grids. In what follows, we assume that the obtained approximations are sufficiently accurate.

The procedure for obtaining a numerical approximation $w_{N}\in\mathcal{V}_{N}$ is denoted by

[TABLE]

The preconditioned ALS algorithm is only one possible choice to obtain $w_{N}$; other alternating methods or Riemannian algorithms can be used as well.

To obtain the overall estimator $\Theta(\eta,\zeta,\iota)$ , one has to evaluate the individual components by the following methods

[TABLE]

A weighted balancing of the global estimator values $\eta,\zeta$ and $\iota$ results in the marking and refinement decision.

5.2.1. Deterministic refinement

If the deterministic error estimator $\eta(w_{N})$ dominates, we employ a Dörfler marking strategy on the mesh $\mathcal{T}$ with ratio constant $0<\theta_{\eta}<1$. By abuse of notation, we use $(\eta_{T})_{T\in\mathcal{T}}$ as the local error estimator on every triangle, where the jump contributions $(\eta_{S})_{S\in\mathcal{S}}$ are distributed among the adjacent elements. The method, consisting of the marking process and the conforming refinement of the marked triangles, is denoted by

[TABLE]
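The marking step of $\operatorname{Refine}_x$ selects a small set of elements that carries a prescribed fraction of the total estimated error. The sketch below implements the common greedy variant of Dörfler marking on squared local indicators, $\sum_{T\in\mathcal{M}}\eta_T^2\geq\theta_\eta\sum_{T\in\mathcal{T}}\eta_T^2$; whether the paper uses squared indicators is an assumption here, and the conforming mesh refinement itself (done with FEniCS in Section 6) is not shown.

```python
import numpy as np


def doerfler_mark(indicators, theta):
    """Indices of a minimal set S with sum_{T in S} eta_T^2 >= theta * sum_T eta_T^2.

    Greedy: sort the local indicators in decreasing order and take the shortest prefix
    that reaches the prescribed fraction theta of the total."""
    eta2 = np.asarray(indicators, dtype=float) ** 2
    order = np.argsort(eta2)[::-1]
    cumulative = np.cumsum(eta2[order])
    n_marked = int(np.searchsorted(cumulative, theta * cumulative[-1])) + 1
    return np.sort(order[:n_marked])


# hypothetical local indicators on four elements
print(doerfler_mark([0.1, 0.5, 0.2, 0.4], theta=0.5))  # [1]
print(doerfler_mark([0.1, 0.5, 0.2, 0.4], theta=0.9))  # [1 2 3]
```

Larger values of $\theta_\eta$ mark more elements per step, trading fewer adaptive iterations for less localized refinement.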

5.2.2. Stochastic refinement

If the stochastic error estimator $\zeta(w_{N})$ dominates, we apply Dörfler marking to the set of local estimators $(\zeta_{m})_{m\in\mathbb{N}}$ until the prescribed ratio $0<\theta_{\zeta}<1$ is reached. For every marked dimension, the local dimension of $\Lambda$ is increased, $d_{m}\leftarrow d_{m}+1$, by the method

[TABLE]

Remark.

As stated in Section 3.4, the global estimator $\zeta$ is not simply the sum of the individual estimators $(\zeta_{m})_{m\in\mathbb{N}}$, since the coupling structure is more involved. In the marking procedure, we therefore use $\zeta_{\mathrm{sum}}:=\sum_{m\in\mathbb{N}}\zeta_{m}$. Due to the high regularity of the solution (Theorem 2), one has $\zeta_{\mathrm{sum}}\approx\zeta$ for $\Lambda$ large enough. Note that in the case of finite-dimensional noise, $\zeta_{m}=0$ for $m>M$.
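A hypothetical sketch of the stochastic refinement step: Dörfler marking on the dimension-wise estimators $(\zeta_m)$, measured against $\zeta_{\mathrm{sum}}$ as in the remark, followed by $d_m\leftarrow d_m+1$ for every marked dimension. The function name and the restriction to the currently active dimensions $m\leq M$ are assumptions of this sketch; activating new dimensions $m>M$ is not covered.

```python
import numpy as np


def refine_stochastic(dims, zeta, theta):
    """Hypothetical Refine_y sketch: mark dimensions until theta * zeta_sum is reached,
    where zeta_sum = sum_m zeta_m, then increase d_m by one for each marked dimension."""
    zeta = np.asarray(zeta, dtype=float)
    order = np.argsort(zeta)[::-1]                   # largest contributions first
    cumulative = np.cumsum(zeta[order])              # partial sums of zeta_sum
    n_marked = int(np.searchsorted(cumulative, theta * cumulative[-1])) + 1
    new_dims = list(dims)
    for m in order[:n_marked]:
        new_dims[m] += 1
    return new_dims


# hypothetical estimator values for M = 3 active dimensions with d_m = 2
print(refine_stochastic([2, 2, 2], [0.3, 0.05, 0.1], theta=0.5))  # [3, 2, 2]
print(refine_stochastic([2, 2, 2], [0.3, 0.05, 0.1], theta=0.8))  # [3, 2, 3]
```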

5.2.3. Representation refinement

Finally, if $\iota$ has the largest contribution to the error estimator, we improve the accuracy of the iterative solver. For simplicity, we fix most solver parameters, such as the number of alternating iterations or the termination threshold, to conservative values. In the low-rank tensor framework, however, the model class is restricted by the TT-rank $\mathbf{r}$. Hence, we increase $\mathbf{r}\leftarrow\mathbf{r}+{\boldsymbol{1}}$ by adding a random rank-1 tensor to the solution tensor $W^{\mathrm{TT}}$. We summarize this approach in

[TABLE]

5.2.4. Adaptive algorithm

One global iteration of this algorithm refines either the deterministic mesh $\mathcal{T}$, the active stochastic polynomial index set $\Lambda$, or the tensor rank $\mathbf{r}$. Iterating until the estimator $\Theta(\eta,\zeta,\iota)$ of Corollary 3.4 falls below a desired accuracy $\epsilon>0$ yields the adaptively constructed low-rank approximation $w_{N}\in\mathcal{V}_{N}$.

Algorithm (Reconstruction based adaptive stochastic Galerkin method)

Input: Initial guess $w_{N}$ with solution coefficient $W^{\mathrm{TT}}$; solver accuracy $\tau$; mesh $\mathcal{T}$ with degrees $p$; active index set $\Lambda$; sample size $N_{s}$ for reconstruction; Dörfler marking parameters $\theta_{\eta}$ and $\theta_{\zeta}$; desired accuracy $\epsilon$ of the estimator $\Theta$.

Output: New solution $w_{N}$ with new solution coefficient $W^{+}$; new mesh $\mathcal{T}^{+}$, or new index set $\Lambda^{+}$, or new tolerance $\tau^{+}$.

repeat

[TABLE]

if $\max\{\eta,\zeta,\iota\}=\eta$ then

$\mathcal{T}\leftarrow\operatorname{Refine}_{x}[(\eta_{T})_{T\in\mathcal{T}},\eta,\mathcal{T},\theta_{\eta}]$

else if $\max\{\eta,\zeta,\iota\}=\zeta$ then

$\Lambda\leftarrow\operatorname{Refine}_{{\boldsymbol{y}}}[(\zeta_{m})_{m\in\mathbb{N}},\zeta,\Lambda,\theta_{\zeta}]$

else

$W^{\mathrm{TT}},\tau\leftarrow\operatorname{Refine}_{\mathrm{LS}}[W^{\mathrm{TT}},\tau]$

end if

until $\Theta(\eta,\zeta,\iota)<\epsilon$

return $w_{N}^{+}=w_{N};\;\mathcal{T}^{+}=\mathcal{T};\;\Lambda^{+}=\Lambda;\;\tau^{+}=\tau$
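The control flow of the algorithm can be sketched as a small driver in which solving, estimating and the three refinement routines are supplied as callables. All names are hypothetical placeholders for the ALS solver, the TT error estimators and $\operatorname{Refine}_x$, $\operatorname{Refine}_{\boldsymbol{y}}$, $\operatorname{Refine}_{\mathrm{LS}}$. One deliberate difference to the listing: the sketch checks $\Theta<\epsilon$ before refining, so it stops without a final refinement step.

```python
def adaptive_loop(state, solve, estimate, refine, total_estimate, eps, max_iter=100):
    """Solve -> estimate -> refine the dominant contribution, until Theta < eps.

    solve(state) -> w_N; estimate(state, w_N) -> {'eta': ..., 'zeta': ..., 'iota': ...};
    refine maps each key to a function state -> refined state; total_estimate plays
    the role of Theta. All callables are hypothetical placeholders."""
    for _ in range(max_iter):
        w = solve(state)
        est = estimate(state, w)
        if total_estimate(est) < eps:          # stop as soon as the estimator is small enough
            return w, state
        dominant = max(est, key=est.get)       # 'eta' -> mesh, 'zeta' -> Lambda, 'iota' -> rank
        state = refine[dominant](state)
    raise RuntimeError("accuracy eps not reached within max_iter iterations")


# toy usage: each refinement halves the corresponding (made-up) estimator contribution
def toy_estimate(s, w):
    return {'eta': 1.0 / 2 ** s['mesh'], 'zeta': 0.6 / 2 ** s['stoch'], 'iota': 0.3 / 2 ** s['rank']}


toy_refine = {'eta': lambda s: {**s, 'mesh': s['mesh'] + 1},
              'zeta': lambda s: {**s, 'stoch': s['stoch'] + 1},
              'iota': lambda s: {**s, 'rank': s['rank'] + 1}}
w, final = adaptive_loop({'mesh': 0, 'stoch': 0, 'rank': 0}, lambda s: None,
                         toy_estimate, toy_refine, lambda e: sum(e.values()), eps=0.1)
print(final)
```

Because only the dominant contribution is refined, the three error sources are reduced in a balanced way, which is the intention behind the comparison of $\eta$, $\zeta$ and $\iota$ in the algorithm.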

6. Numerical examples

This section demonstrates the performance of the Galerkin tensor discretisation and the adaptive algorithm described in the preceding sections. We consider the linear second-order model problem with a constant right-hand side and homogeneous Dirichlet boundary conditions

[TABLE]

on two different reference domains in $\mathbb{R}^{2}$, namely the unit disc and the L-shaped domain. The Karhunen-Loève expansion of the random vector field is based on a Gaussian covariance kernel of the form

[TABLE]

The random variables in the Karhunen-Loève expansion are assumed to be independent and uniformly distributed on $[-\sqrt{3},\sqrt{3}]$ , i.e. they have normalized variance. Moreover, the mean is given by the identity, i.e. $\mathbb{E}[{{\boldsymbol{V}}}](\hat{x})=\hat{x}$ .

The computed spectral decomposition is truncated at a given threshold $\hat{\epsilon}$ , which takes different values in the computational examples. Table 1 summarizes how the choice of the truncation parameter affects the number of involved stochastic dimensions.
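How a truncation threshold controls the number of stochastic dimensions can be illustrated with a pivoted Cholesky factorisation $K\approx LL^{\top}$ of a discretised covariance matrix, stopped once the trace of the remainder falls below a tolerance. The sketch assumes a scalar Gaussian kernel $\exp(-|x-x'|^2/\ell^2)$ on a 1D grid with a hypothetical $\ell=0.2$ and a trace-based stopping rule; the actual kernel (a vector-valued field) and the precise meaning of $\hat{\epsilon}$ in Table 1 are not reproduced here.

```python
import numpy as np


def pivoted_cholesky(K, tol):
    """Greedy pivoted Cholesky factorisation K ~ L @ L.T of a symmetric PSD matrix.

    Stops once the trace of the remainder K - L @ L.T drops below tol
    (trace-based stopping rule: an assumption made for this sketch)."""
    n = K.shape[0]
    diag = np.diag(K).astype(float).copy()
    columns = []
    while diag.sum() > tol and len(columns) < n:
        i = int(np.argmax(diag))                 # pivot: largest remaining variance
        col = K[:, i].astype(float).copy()
        for l in columns:
            col -= l[i] * l                      # remove the part already captured
        col /= np.sqrt(diag[i])
        columns.append(col)
        diag = np.maximum(diag - col ** 2, 0.0)  # clip tiny negative round-off
    return np.column_stack(columns) if columns else np.zeros((n, 0))


# hypothetical setup: scalar Gaussian kernel on a 1D grid
x = np.linspace(0.0, 1.0, 200)
ell = 0.2
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / ell ** 2)
for rel_tol in (1e-1, 1e-3, 1e-6):
    L = pivoted_cholesky(K, rel_tol * np.trace(K))
    print(rel_tol, L.shape[1])  # number of retained terms grows as the tolerance shrinks
```

Since the remainder $K-LL^{\top}$ stays positive semi-definite, its trace also bounds the entry-wise error, which makes the trace a convenient a posteriori stopping criterion.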

We are interested in the correct approximation of the solution mean

[TABLE]

and solution variance

[TABLE]

by means of the adaptive low-rank Galerkin approximation. To verify this, all experiments involve the computation of a reference mean and variance based on a sampling approach. To that end, we employ the anisotropic sparse grid quadrature with Gauss-Legendre points described in [17] (the implementation is available online at https://github.com/muchip/SPQR). The corresponding moments are then calculated on a fine reference mesh with at least $10^{5}$ degrees of freedom, obtained by uniform refinement of the last adaptively computed mesh $\mathcal{T}$. All experiments use linear finite element spaces, i.e. $\mathcal{X}_{p}(\mathcal{T})$ with $p=1$. The number of quadrature points is chosen differently for each problem. For $\hat{\epsilon}=0.7$, we take $53$ adaptively chosen nodes; comparing the resulting sparse quadrature mean with an approximation using additional nodes shows no significant improvement in approximation quality (data not shown). The same holds for $\hat{\epsilon}=0.5$ with $301$ nodes and for $\hat{\epsilon}=0.1$ with $4217$ nodes. We denote the reference mean by $\mathbb{E}_{\operatorname{ref}}[u]$ and the reference variance by $\mathbb{V}_{\operatorname{ref}}(u)$.

Remark.

In the low-rank tensor train format, the mean of a function expanded in orthonormal polynomials is computed very efficiently: the basis contains the constant polynomial $P_{0}=1$, and all other basis polynomials are orthogonal to it, i.e., have zero mean. Hence, the mean is given by the coefficient of $P_{0}$, which is already part of the representation, and computing it amounts to a single tensor evaluation. More precisely, given $u\in\mathcal{V}_{N}$ we compute

[TABLE]

Here, the evaluation of the tensor train $U^{\mathrm{TT}}$ at the multi-index ${\boldsymbol{0}}=(0,\ldots,0)$ consists of $M$ matrix-vector multiplications.

Similarly for the variance, we can compute the second moment as

[TABLE]

This computation reduces even further, since in the tensor train setting of $U(i,\mu)=U_{0}(i)\otimes\bigotimes_{m=1}^{M}U_{m}(\mu_{m})$ , with $U_{0}\in\mathbb{R}^{N,r_{0}}$ and $U_{m}\in\mathbb{R}^{r_{m-1},d_{m},r_{m}}$ for $m=1,\ldots,M$ it is common to impose left-orthogonality, i.e. $\sum_{\mu_{m}=1}^{d_{m}}U_{m}(\mu_{m})U_{m}(\mu_{m})^{T}=I_{r_{m-1}}$ . Hence, the second moment reads

[TABLE]

where it is advisable not to form the matrix-matrix product over $k_{0}=1,\ldots,r_{0}$ explicitly, since the resulting $N\times N$ matrix is usually dense and exceeds the available memory resources.
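Both moments can be read off the TT cores as sketched below: the mean is the evaluation at the multi-index $\boldsymbol{0}$, and after orthogonalizing the stochastic cores so that $\sum_{\mu_m}U_m(\mu_m)U_m(\mu_m)^{T}=I_{r_{m-1}}$, the second moment is the row-wise sum of squares of $U_0$, i.e. only the diagonal of the product over $k_0$ is needed. The QR-based right-to-left orthogonalization and the function names are our own choices, and the moments are computed for the FE coefficients (nodal values for a Lagrange basis).

```python
import numpy as np


def orthogonalize_stochastic_cores(U0, cores):
    """Return (U0', cores') representing the same TT, with every stochastic core satisfying
    sum_mu U_m(mu) U_m(mu)^T = I; all weight is moved into U0' via QR from right to left."""
    U0 = U0.copy()
    cores = [C.copy() for C in cores]
    for m in range(len(cores) - 1, -1, -1):
        r0, d, r1 = cores[m].shape
        Q, R = np.linalg.qr(cores[m].reshape(r0, d * r1).T)  # core^T = Q R
        cores[m] = Q.T.reshape(Q.shape[1], d, r1)            # orthonormal rows
        if m > 0:
            cores[m - 1] = np.einsum('adb,bk->adk', cores[m - 1], R.T)
        else:
            U0 = U0 @ R.T
    return U0, cores


def tt_mean(U0, cores):
    """Coefficient of P_0 = 1, i.e. the TT evaluated at the multi-index 0 (M matrix-vector products)."""
    v = np.ones(1)
    for C in reversed(cores):
        v = C[:, 0, :] @ v
    return U0 @ v


def tt_variance(U0, cores):
    """Second moment from the orthogonalized U0 (diagonal only, no N x N matrix) minus mean^2."""
    U0_orth, _ = orthogonalize_stochastic_cores(U0, cores)
    return np.sum(U0_orth ** 2, axis=1) - tt_mean(U0, cores) ** 2
```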

The considered quantity of interest is the error of the mean and variance with respect to the corresponding reference. Therefore, for any TT approximation $w_{N}\in\mathcal{V}_{N}$, using Remark 6, we compute the relative error of the mean in the $H_{0}^{1}(D_{\operatorname{ref}})$ norm

[TABLE]

and the relative error of the variance in $W^{1,1}(D_{\operatorname{ref}})$ norm

[TABLE]

Furthermore, we evaluate the convergence rate of $e_{E}$ for adaptive and uniform refinement, as well as the rate of the estimator $\Theta$. This is done by fitting the function $x\mapsto c_{1}x^{-\alpha}$ to the individual values, where $x$ denotes the number of degrees of freedom of the physical mesh. We denote the corresponding rates by $\alpha_{E}^{a}$, $\alpha_{E}^{u}$ and $\alpha_{\Theta}$, respectively.
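A convergence rate of the algebraic form $c_1x^{-\alpha}$ in the number of degrees of freedom can be estimated by a linear least-squares fit in log-log scale. The sketch below assumes this logarithmic fit; the paper does not state whether the fit was performed in log scale or by nonlinear least squares, and the data in the usage line are synthetic.

```python
import numpy as np


def fit_rate(dofs, errors):
    """Fit errors ~ c1 * dofs**(-alpha) via log(e) = log(c1) - alpha * log(dofs); returns (c1, alpha)."""
    slope, intercept = np.polyfit(np.log(dofs), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)


# synthetic data following an exact rate of 1/2
dofs = np.array([100.0, 400.0, 1600.0, 6400.0])
c1, alpha = fit_rate(dofs, 3.0 * dofs ** -0.5)
print(round(c1, 6), round(alpha, 6))  # 3.0 0.5
```

For linear finite elements in two dimensions, an optimal energy-norm rate corresponds to $\alpha=1/2$ with respect to the number of degrees of freedom.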

The employed tensor reconstruction algorithm is implemented in the open-source library xerus [30]. Each approximated tensor is constructed from a set of $N_{s}$ samples $\{y^{(i)}\in\Gamma\}_{i=1}^{N_{s}}$, as in the computation of the reference mean, with polynomial degrees determined by the solution approximation as described in Remark 4.1. In the considered examples, the tensor train solution employs only constant and linear polynomials in all dimensions. This behaviour is dictated by the stochastic estimator and is explained by the complexity of the mesh approximation, whose error contribution dominates. For the assembly of the physical part of the bilinear and linear forms and the evaluation of sample solutions, we use the PDE library FEniCS [12]. The entire adaptive stochastic Galerkin method is implemented in the open-source framework ALEA [7].

6.1. Example 1

The first example is the random domain problem on the unit disc. We use this problem as a reference, since adaptive refinement is expected to yield results similar to uniform mesh refinement. Starting with an initial configuration of $16$ cells, fixed polynomial degree in the stochastic space of $d_{1}=\ldots=d_{M}=2$ and tensor rank $\mathbf{r}={\boldsymbol{2}}$, the described adaptive Galerkin FE algorithm yields the adaptively refined mesh depicted in Figure 2.

For illustration purposes, we show the mean and variance of the solution on the unit disc together with realisations of the transformed reference domain for the adapted discretisation in Figures 4 and 5. In Table 3, we show the corresponding rates of convergence and the smallest errors $e_{E}$ and $e_{V}$ reached. The maximal tensor ranks involved are displayed in column $r_{\operatorname{max}}$. The degrees of freedom are shown for the physical mesh in column m-dofs and for the tensor train itself in column tt-dofs. Note that the tensor train degrees of freedom refer to the dimension of the corresponding low-rank manifold. As expected for the unit disc, adaptive refinement does not improve the already optimal convergence rate.

6.2. Example 2

For the second example, we choose the L-shaped domain $[-1,1]^{2}\setminus([0,1]\times[-1,0])$. The corner singularity is a typical example where adaptive refinement yields better approximation rates with respect to the degrees of freedom than uniform refinement.

Starting with an initial configuration of $24$ cells, fixed polynomial degree in the stochastic space of $d_{1}=\ldots=d_{M}=2$ and tensor rank $\mathbf{r}={\boldsymbol{2}}$ , the described adaptive Galerkin FE algorithm yields the adaptively refined mesh displayed in Figure 7. Again, we show the approximated mean and variance together with some random realizations of the transformed reference domain in Figures 6 and 8.

The obtained rates of convergence and error quantities are shown in Table 9. As desired, the estimator rate $\alpha_{\Theta}$ follows the error rate $\alpha_{E}^{a}$ of the adaptive scheme, whereas uniform refinement achieves a slightly slower convergence.

Bibliography


[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series. Dover Publications, N. Chemsford, MA, 1964.
[2] J. E. Castrillon-Candas, F. Nobile, and R. F. Tempone. Analytic regularity and collocation approximation for elliptic PDEs with random domain deformations. Computers & Mathematics with Applications, 71(6):1173–1197, 2016.
[3] Sergey Dolgov, Boris N. Khoromskij, Alexander Litvinenko, and Hermann G. Matthies. Polynomial chaos expansion of random coefficients and the solution of stochastic partial differential equations in the tensor train format. SIAM/ASA Journal on Uncertainty Quantification, 3(1):1109–1135, 2015.
[4] Sergey Dolgov and Robert Scheichl. A hybrid alternating least squares–TT cross algorithm for parametric PDEs. arXiv preprint arXiv:1707.04562, 2017.
[5] Martin Eigel, Claude Jeffrey Gittelson, Christoph Schwab, and Elmar Zander. Adaptive stochastic Galerkin FEM. Comput. Methods Appl. Mech. Engrg., 270:247–269, 2014.
[6] Martin Eigel, Claude Jeffrey Gittelson, Christoph Schwab, and Elmar Zander. A convergent adaptive stochastic Galerkin finite element method with quasi-optimal spatial meshes. ESAIM: Mathematical Modelling and Numerical Analysis, 49(5):1367–1398, 2015.
[7] Martin Eigel, Robert Gruhlke, Manuel Marschall, Philipp Trunschke, and Elmar Zander. ALEA - a Python framework for spectral methods and low-rank approximations in uncertainty quantification.
[8] Martin Eigel, Manuel Marschall, Max Pfeffer, and Reinhold Schneider. Adaptive stochastic Galerkin FEM for lognormal coefficients in hierarchical tensor representations. arXiv preprint arXiv:1811.00319, 2018.