A Low-rank Solver for the Stochastic Unsteady Navier-Stokes Problem

Howard C. Elman; Tengfei Su

arXiv:1906.06785·math.NA·April 22, 2020


TL;DR

This paper introduces a low-rank iterative solver for stochastic unsteady Navier-Stokes equations, significantly reducing computational costs by using tensor representations and effective preconditioning.

Contribution

It develops a novel low-rank Krylov subspace method with mean-based preconditioning for efficient all-at-once stochastic Navier-Stokes simulations.

Findings

1. Achieves significant reductions in storage and computational costs.
2. Requires only a small number of linear iterations per Picard step.
3. Demonstrates efficiency on a 2D flow model.



Tables

Table 6.1: Parameter values for numerical experiments.

$\nu_{0}$	$\sigma$	$b$	$m$	$d_{\psi}$	$t_{f}$	$\tau$	$h$
$1/50$	$0.01$	$4.0$	$3$	$3$	$1.0$	$2^{-6}$	$2^{-2}$

Table 6.2: Stopping and truncation tolerances.

GMRES stopping tolerance	$tol_{\text{gmres}} = 10^{-1}$
Picard stopping tolerance	$tol_{\text{picard}} = 10^{-5}$
GMRES truncation tolerance	$\epsilon_{\text{gmres}} = 10^{-3}$
Truncation tolerance for solutions	$\epsilon_{\text{soln}} = 10^{-7}$
Truncation tolerance for convection matrix	$\epsilon_{\text{conv}} = 10^{-3}$

Table 6.3: Performance of Picard’s method with different values of the GMRES stopping tolerance $tol_{\text{gmres}}$. Truncation tolerance $\epsilon_{\text{gmres}} = 10^{-2}\cdot tol_{\text{gmres}}$; $tol_{\text{picard}} = 10^{-5}$. The LSC preconditioner is used.

$tol_{\text{gmres}}$	$10^{-1}$	$10^{-3}$	$10^{-5}$
$\epsilon_{\text{gmres}}$	$10^{-3}$	$10^{-5}$	$10^{-7}$
Number of Picard steps	5	5	5
Total number of GMRES iterations	18	39	58
Computational time (s)	205.9	555.7	1168.2

Equations

\frac{\partial\vec{u}}{\partial t} - \nabla\cdot(\nu\nabla\vec{u}) + \vec{u}\cdot\nabla\vec{u} + \nabla p

\nabla\cdot\vec{u}

\vec{u}

\nu\nabla\vec{u}\cdot\vec{n} - p\vec{n}

\nu(x,\xi) = \nu_{0}(x) + \sum_{l=1}^{m}\nu_{l}(x)\xi_{l},

\frac{\vec{u}^{k}-\vec{u}^{k-1}}{\tau} - \nabla\cdot(\nu\nabla\vec{u}^{k}) + \vec{u}^{k}\cdot\nabla\vec{u}^{k} + \nabla p^{k}

\nabla\cdot\vec{u}^{k}

(H^{1}(\mathcal{D}))^{2}\otimes L^{2}(\Gamma)

L^{2}(\mathcal{D})\otimes L^{2}(\Gamma)

\tau^{-1}\langle(\vec{u}_{h}^{k},\vec{v}_{h})\rangle - \tau^{-1}\langle(\vec{u}_{h}^{k-1},\vec{v}_{h})\rangle + \langle(\nu\nabla\vec{u}_{h}^{k},\nabla\vec{v}_{h})\rangle

+ \langle(\vec{u}_{h}^{k}\cdot\nabla\vec{u}_{h}^{k},\vec{v}_{h})\rangle - \langle(p_{h}^{k},\nabla\cdot\vec{v}_{h})\rangle

\langle(\nabla\cdot\vec{u}_{h}^{k},q_{h})\rangle

\vec{u}_{h}^{k}(x,\xi)

p_{h}^{k}(x,\xi)

\begin{pmatrix}\mathbb{F}^{k}(\bm{u}) & I_{n_{\xi}}\otimes B^{T}\\ I_{n_{\xi}}\otimes B & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{k}\\ \bm{p}^{k}\end{pmatrix} + \begin{pmatrix}-\tau^{-1}(I_{n_{\xi}}\otimes\bm{M}) & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{k-1}\\ \bm{p}^{k-1}\end{pmatrix} = \begin{pmatrix}\bm{f}^{u,k}\\ \bm{f}^{p,k}\end{pmatrix}

\mathbb{F}^{k}(\bm{u}) = \tau^{-1}(I_{n_{\xi}}\otimes\bm{M}) + \sum_{l=0}^{m}(G_{l}\otimes\bm{A}_{l}) + \sum_{l=1}^{n_{\xi}}\big(H_{l}\otimes\bm{N}(\vec{u}_{h,l}^{k})\big).

[M]_{ij} = (\phi_{j},\phi_{i}),\quad [A_{l}]_{ij} = (\nu_{l}\nabla\phi_{j},\nabla\phi_{i}),\quad [N(\vec{u}_{h,l}^{k})]_{ij} = (\vec{u}_{h,l}^{k}\cdot\nabla\phi_{j},\phi_{i}),

[B_{x_{1}}]_{ij} = -\Big(\varphi_{i},\frac{\partial\phi_{j}}{\partial x_{1}}\Big),\quad [B_{x_{2}}]_{ij} = -\Big(\varphi_{i},\frac{\partial\phi_{j}}{\partial x_{2}}\Big),

[G_{l}]_{rs} = \langle\xi_{l}\psi_{r}\psi_{s}\rangle,\quad [H_{l}]_{rs} = \langle\psi_{l}\psi_{r}\psi_{s}\rangle,

\bm{u} = \begin{pmatrix}\bm{u}^{1}\\ \bm{u}^{2}\\ \vdots\\ \bm{u}^{n_{t}}\end{pmatrix}\in\mathbb{R}^{n_{t}n_{\xi}n_{u}}

\begin{pmatrix}\mathbb{F}(\bm{u})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\bm{u}\\ \bm{p}\end{pmatrix} = \begin{pmatrix}\bm{f}^{u}\\ \bm{f}^{p}\end{pmatrix},

\mathbb{F}(\bm{u}) = \tau^{-1}I_{n_{t}}\otimes I_{n_{\xi}}\otimes\bm{M} + \sum_{l=0}^{m}(I_{n_{t}}\otimes G_{l}\otimes\bm{A}_{l}) + \mathbb{N}(\bm{u}).

\begin{pmatrix}\mathbb{F}(\bm{u}^{(i-1)})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{(i)}\\ \bm{p}^{(i)}\end{pmatrix} = \begin{pmatrix}\bm{f}^{u}\\ \bm{f}^{p}\end{pmatrix}.

\begin{pmatrix}\mathbb{F}(\bm{u}^{(i-1)})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\delta\bm{u}^{(i)}\\ \delta\bm{p}^{(i)}\end{pmatrix} = \begin{pmatrix}\bm{r}^{u,(i-1)}\\ \bm{r}^{p,(i-1)}\end{pmatrix},

\bm{r}^{(i)} = \begin{pmatrix}\bm{r}^{u,(i)}\\ \bm{r}^{p,(i)}\end{pmatrix} = \begin{pmatrix}\bm{f}^{u}\\ \bm{f}^{p}\end{pmatrix} - \begin{pmatrix}\mathbb{F}(\bm{u}^{(i)})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{(i)}\\ \bm{p}^{(i)}\end{pmatrix}.

\bm{u} = \text{vec}(\underline{u})\;\Leftrightarrow\;\bm{u}(\overline{i_{1}i_{2}i_{3}}) = \underline{u}(i_{1},i_{2},i_{3})

\underline{z}(i_{1},i_{2},i_{3}) \approx \sum_{\alpha_{1},\alpha_{2}}\underline{z}^{(1)}(i_{1},\alpha_{1})\,\underline{z}^{(2)}(\alpha_{1},i_{2},\alpha_{2})\,\underline{z}^{(3)}(\alpha_{2},i_{3}),

\bm{z} = \text{vec}(\underline{z}) = \sum_{\alpha_{1},\alpha_{2}}z^{(1)}_{\alpha_{1}}\otimes z^{(2)}_{\alpha_{1},\alpha_{2}}\otimes z^{(3)}_{\alpha_{2}},

\mathbb{X}\bm{z} = \sum_{\alpha_{1},\alpha_{2}}\big(X^{(1)}z^{(1)}_{\alpha_{1}}\big)\otimes\big(X^{(2)}z^{(2)}_{\alpha_{1},\alpha_{2}}\big)\otimes\big(X^{(3)}z^{(3)}_{\alpha_{2}}\big).

\tilde{\underline{z}} = \mathcal{T}_{\epsilon}(\underline{z}),

\|\tilde{\underline{z}}-\underline{z}\|_{F}/\|\underline{z}\|_{F} \leq \epsilon.

\|\underline{z}-\tilde{\underline{z}}\|_{F} \leq \Big(\sum_{k=1}^{d-1}\delta_{k}^{2}\Big)^{1/2}.

\bm{u}^{(i)} = \mathcal{T}_{\epsilon_{\text{soln}}}\big(\bm{u}^{(i-1)}+\delta\bm{u}^{(i)}\big),\quad \bm{p}^{(i)} = \mathcal{T}_{\epsilon_{\text{soln}}}\big(\bm{p}^{(i-1)}+\delta\bm{p}^{(i)}\big).

u^{k}_{jl} = \underline{u}(k,l,j) = \sum_{\alpha_{1},\alpha_{2}}\underline{u}^{(1)}(k,\alpha_{1})\,\underline{u}^{(2)}(\alpha_{1},l,\alpha_{2})\,\underline{u}^{(3)}(\alpha_{2},j).


Full text


A Low-rank Solver for the Stochastic Unsteady Navier–Stokes Problem

This work was supported by the U.S. Department of Energy Office of Advanced Scientific Computing Research, Applied Mathematics program under award DE-SC0009301 and by the U.S. National Science Foundation under grant DMS1819115.

Howard C. Elman, Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.

Tengfei Su, Applied Mathematics & Statistics, and Scientific Computation Program, University of Maryland, College Park, MD 20742.

Abstract

We study a low-rank iterative solver for the unsteady Navier–Stokes equations for incompressible flows with a stochastic viscosity. The equations are discretized using the stochastic Galerkin method, and we consider an all-at-once formulation where the algebraic systems at all the time steps are collected and solved simultaneously. The problem is linearized with Picard’s method. To efficiently solve the linear systems at each step, we use low-rank tensor representations within the Krylov subspace method, which leads to significant reductions in storage requirements and computational costs. Combined with effective mean-based preconditioners and the idea of inexact solve, we show that only a small number of linear iterations are needed at each Picard step. The proposed algorithm is tested with a model of flow in a two-dimensional symmetric step domain with different settings to demonstrate the computational efficiency.

Keywords: time-dependent Navier–Stokes, stochastic Galerkin method, all-at-once system, low-rank tensor approximation

AMS subject classifications: 35R60, 60H35, 65F08, 65F10, 65N22

1 Introduction

Stochastic partial differential equations (PDEs) are widely used to model physical problems with uncertainty [16]. In this paper, we develop new computational methods for solving the stochastic unsteady Navier–Stokes equations, using stochastic Galerkin methods [11] to address the stochastic nature of the problem and a so-called all-at-once treatment of time integration.

For a time-dependent problem, the solutions at different time steps are usually computed sequentially via time stepping. For example, a fully implicit scheme with adaptive time step sizes was studied in [7, 14]. Alternatively, an all-at-once system can be formed by collecting the algebraic systems at all the discrete time steps into a single system, whose solutions are then computed simultaneously. Such a formulation avoids the serial nature of time stepping and allows parallelization in the time direction to accelerate the solution procedure [10, 18, 19]. A drawback, however, is that for large problems the all-at-once system may require excessive storage. In this study, we address this issue by using a low-rank tensor representation of data within the solution methods.

We develop a low-rank iterative algorithm for solving the unsteady Navier–Stokes equations with an uncertain viscosity. The equations are linearized with Picard’s method. At each step of the nonlinear iteration, the stochastic Galerkin discretization gives rise to a large linear system, which is solved by a Krylov subspace method. Similar approaches have been used to study the steady-state problem [23, 27], where the authors also proposed effective preconditioners by taking advantage of the special structures of the linear systems. To reduce memory and computational costs, we compute low-rank approximations to the discrete solutions, which are represented as three-dimensional tensors in the all-at-once formulation. We refer to [12] for a review of low-rank tensor approximation techniques, and we will use the tensor train decomposition [21] in this work. The tensor train decomposition allows efficient basic operations on tensors. A truncation procedure is also available to compress low-rank tensors in the tensor train format to ones with smaller ranks.

Our goal is to use low-rank tensors within Krylov subspace methods in order to efficiently solve the large linear systems arising at each nonlinear step. The basic idea is to represent all the vector quantities that arise during the course of a Krylov subspace computation as low-rank tensors. With this strategy, much less memory is needed to store the data produced during the iteration. Moreover, the associated computations, such as matrix-vector products and vector additions, become much cheaper. The tensors are compressed in each iteration to maintain low ranks. This idea has been used for the conjugate gradient (CG) method and the generalized minimal residual (GMRES) method, with different low-rank tensor formats [1, 2, 5, 15]. In addition, the convergence of Krylov subspace methods can be greatly improved by an effective preconditioner. In conjunction with the savings achieved through low-rank tensor computations, we derive preconditioners for the stochastic all-at-once formulation based on state-of-the-art techniques used for deterministic problems, and we demonstrate their performance in numerical experiments. We also explore the idea of inexact Picard methods, where the linear systems are solved inexactly at each Picard step to further save computational work, and we show that with this strategy very few Krylov iterations are needed.

We note that a different type of approach, namely alternating iterative methods [6, 13, 25] including the density matrix renormalization group (DMRG) algorithm and its variants, can be used for solving linear systems in the tensor train format. In these methods, each component of the low-rank solution tensor is optimized directly by projecting onto a small local problem. This approach avoids the rank growth in intermediate iterates typically encountered in a low-rank Krylov subspace method. However, these methods were developed for solving symmetric positive definite systems and require nontrivial effort to be adapted to the nonsymmetric Navier–Stokes problem.

The rest of the paper is organized as follows. In Section 2 we give a formal presentation of the problem. Discretization techniques that result in an all-at-once linear system at each Picard step are discussed in Section 3. In Section 4 we introduce the low-rank tensor approximation and propose a low-rank Krylov subspace iterative solver for the all-at-once systems. The preconditioners are derived in Section 5 and numerical results are given in Section 6.

2 Problem setting

Consider the unsteady Navier–Stokes equations for incompressible flows on a space-time domain $\mathcal{D}\times(0,t_{f}]$ ,

[TABLE]

where $\vec{u}$ and $p$ stand for the velocity and pressure, respectively, $\nu$ is the viscosity, and $\mathcal{D}$ is a two-dimensional spatial domain with boundary $\partial\mathcal{D}=\partial\mathcal{D}_{\text{D}}\cup\partial\mathcal{D}_{\text{N}}$ . The Dirichlet boundary $\partial\mathcal{D}_{\text{D}}$ consists of an inflow boundary and fixed walls, and Neumann boundary conditions are set for the outflow,

[TABLE]

We assume the Neumann boundary $\partial\mathcal{D}_{\text{N}}$ is not empty so that the pressure $p$ is uniquely determined. The function $\vec{u}_{\text{D}}(x,t)$ denotes a time-dependent inflow, typically growing from zero to a steady state, and it is set to zero at fixed walls. The initial conditions are zero everywhere for both $\vec{u}$ and $p$ .

The uncertainty in the problem is introduced by a stochastic viscosity $\nu$, which is modeled as a random field depending on a finite collection of random variables $\{\xi_{l}\}_{l=1}^{m}$ (collected in the vector $\xi$). Specifically, we consider a representation as a truncated Karhunen–Loève (KL, [17]) expansion,

$$\nu(x,\xi)=\nu_{0}(x)+\sum_{l=1}^{m}\nu_{l}(x)\xi_{l},$$

where $\nu_{0}$ is the mean viscosity, and $\{\nu_{l}\}_{l=1}^{m}$ are determined by the covariance function of $\nu$. We assume that the random parameters $\{\xi_{l}\}_{l=1}^{m}$ are independent and that the viscosity satisfies $\nu(x,\xi)\geq\nu_{\text{min}}>0$ almost surely for any $x\in\mathcal{D}$. We refer to [23, 27] for different forms of the stochastic viscosity. The solutions $\vec{u}$ and $p$ in Eq. 2.1 will also be random fields, which depend on the spatial variable $x$, time $t$, and the random variables $\xi$.

3 Discrete problem

In this section, we derive a fully discrete problem for the stochastic unsteady Navier–Stokes equations Eq. 2.1. This involves a time discretization scheme and a stochastic Galerkin discretization for the physical and parameter spaces at each time step. The discretizations give rise to a nonlinear algebraic system. Instead of solving such a system at each time step, we collect the systems from all time steps to form an all-at-once system, in which the discrete solutions at all the time steps are computed simultaneously. The discrete problem is then linearized with Picard’s method, and a large linear system is solved at each step of the nonlinear iteration.

3.1 Time discretization

For simplicity we use the backward Euler method for time discretization, which is first-order accurate but unconditionally stable and dissipative. The all-at-once formulation discussed later in section 3.3 requires predetermined time steps. Divide the interval $(0,t_{f}]$ into $n_{t}$ uniform steps $\{t_{k}\}_{k=1}^{n_{t}}$ with step size $\tau=t_{f}/n_{t}$ and initial time $t_{0}=0$ . Given the solution at time $t_{k-1}$ , we need to solve the following equations for $\vec{u}^{k}$ and $p^{k}$ :

[TABLE]

After discretization (in physical space and parameter space) the implicit method requires solving an algebraic system at each time step. In the following we discuss how the system is assembled from the stochastic Galerkin discretization of Eq. 3.1.

3.2 Stochastic Galerkin method

At time step $k$ , the stochastic Galerkin method finds parametrized approximate velocity solutions $\vec{u}_{h}^{k}$ and pressure solutions $p_{h}^{k}$ in finite-dimensional subspaces of $(H^{1}(\mathcal{D}))^{2}\otimes L^{2}(\Gamma)$ and $L^{2}(\mathcal{D})\otimes L^{2}(\Gamma)$ , where $\Gamma$ is the joint image of the random variables $\{\xi_{l}\}$ . The functional spaces are defined as follows,

[TABLE]

The expectations are taken with respect to the joint distribution of the random variables $\{\xi_{l}\}$ . In the following we use $\langle\cdot\rangle$ to denote the expected value. Let the finite-dimensional subspaces be $\mathcal{X}=\text{span}\{\vec{\phi}_{i}(x)\}\subset(H^{1}(\mathcal{D}))^{2}$ , $\mathcal{Y}=\text{span}\{\varphi_{i}(x)\}\subset L^{2}(\mathcal{D})$ , and $\mathcal{Z}=\text{span}\{\psi_{r}(\xi)\}\subset L^{2}(\Gamma)$ . Let $\mathcal{X}_{\text{D}}^{k}$ and $\mathcal{X}_{0}$ be the spaces of functions in $\mathcal{X}$ with Dirichlet boundary conditions $\vec{u}_{\text{D}}(x,t_{k})$ and $\vec{0}$ imposed for the velocity field, respectively. Then for Eq. 3.1 the stochastic Galerkin formulation entails the computation of $\vec{u}_{h}^{k}\in\mathcal{X}_{\text{D}}^{k}\otimes\mathcal{Z}$ and $p_{h}^{k}\in\mathcal{Y}\otimes\mathcal{Z}$ , satisfying the weak form

[TABLE]

for any $\vec{v}_{h}\in\mathcal{X}_{0}\otimes\mathcal{Z}$ and $q_{h}\in\mathcal{Y}\otimes\mathcal{Z}$ . Here, $(\cdot,\cdot)$ denotes the inner product in ${L}^{2}(\mathcal{D})$ . For the physical spaces, we use a div-stable Taylor–Hood discretization [8] on quadrilateral elements, with biquadratic basis functions $\{\vec{\phi}_{i}\}_{i=1}^{n_{u}}=\left\{\left(\begin{smallmatrix}\phi_{i}\\ 0\end{smallmatrix}\right),\left(\begin{smallmatrix}0\\ \phi_{i}\end{smallmatrix}\right)\right\}_{i=1}^{n_{u}/2}$ for velocity, and bilinear basis functions $\{\varphi_{i}\}_{i=1}^{n_{p}}$ for pressure. The stochastic basis functions $\{\psi_{r}\}_{r=1}^{n_{\xi}}$ are $m$ -dimensional orthonormal polynomials constructed from generalized polynomial chaos (gPC, [28]) satisfying $\langle\psi_{r}\psi_{s}\rangle=\delta_{rs}$ . The stochastic Galerkin solutions are expressed as linear combinations of the basis functions,

[TABLE]

The coefficient vectors $\bm{u}^{k}=[u_{11}^{k},u_{21}^{k},\ldots,u_{n_{u}1}^{k},\ldots,u_{1n_{\xi}}^{k},u_{2n_{\xi}}^{k},\ldots,u_{n_{u}n_{\xi}}^{k}]$ and similarly defined $\bm{p}^{k}$ are computed from the nonlinear algebraic system

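$$\begin{pmatrix}\mathbb{F}^{k}(\bm{u}) & I_{n_{\xi}}\otimes B^{T}\\ I_{n_{\xi}}\otimes B & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{k}\\ \bm{p}^{k}\end{pmatrix}+\begin{pmatrix}-\tau^{-1}(I_{n_{\xi}}\otimes\bm{M}) & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{k-1}\\ \bm{p}^{k-1}\end{pmatrix}=\begin{pmatrix}\bm{f}^{u,k}\\ \bm{f}^{p,k}\end{pmatrix},$$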

where

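$$\mathbb{F}^{k}(\bm{u})=\tau^{-1}(I_{n_{\xi}}\otimes\bm{M})+\sum_{l=0}^{m}(G_{l}\otimes\bm{A}_{l})+\sum_{l=1}^{n_{\xi}}\big(H_{l}\otimes\bm{N}(\vec{u}_{h,l}^{k})\big).$$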

Here $I_{n_{\xi}}$ is the $n_{\xi}\times n_{\xi}$ identity matrix, and $\otimes$ denotes the Kronecker product of two matrices. The boldface matrices $\bm{M}$ , $\bm{A}_{l}$ , and $\bm{N}(\vec{u}_{h,l}^{k})$ are $2\times 2$ block-diagonal, with the scalar mass matrix $M$ , weighted stiffness matrix $A_{l}$ , and discrete convection operator $N(\vec{u}_{h,l}^{k})$ as diagonal components, where

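$$[M]_{ij}=(\phi_{j},\phi_{i}),\quad[A_{l}]_{ij}=(\nu_{l}\nabla\phi_{j},\nabla\phi_{i}),\quad[N(\vec{u}_{h,l}^{k})]_{ij}=(\vec{u}_{h,l}^{k}\cdot\nabla\phi_{j},\phi_{i}),$$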

for $i,j=1,\ldots,n_{u}/2$. Note that the dependence on $\bm{u}^{k}$ comes from the nonlinear convection term $\bm{N}$, with convection velocity $\vec{u}_{h,l}^{k}=\sum_{j}u_{jl}^{k}\vec{\phi}_{j}(x)$. Let $x=(x_{1},x_{2})$. The discrete divergence operator is $B=[B_{x_{1}},B_{x_{2}}]$, with

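$$[B_{x_{1}}]_{ij}=-\Big(\varphi_{i},\frac{\partial\phi_{j}}{\partial x_{1}}\Big),\quad[B_{x_{2}}]_{ij}=-\Big(\varphi_{i},\frac{\partial\phi_{j}}{\partial x_{2}}\Big),$$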

for $i=1,\ldots,n_{p}$ and $j=1,\ldots,n_{u}/2$ . The matrices $\{G_{l}\}_{l=0}^{m}$ and $\{H_{l}\}_{l=1}^{n_{\xi}}$ of Eq. 3.6 come from the stochastic basis functions and have entries

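$$[G_{l}]_{rs}=\langle\xi_{l}\psi_{r}\psi_{s}\rangle,\quad[H_{l}]_{rs}=\langle\psi_{l}\psi_{r}\psi_{s}\rangle,$$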

for $r,s=1,\ldots,n_{\xi}$ , where $\xi_{0}\equiv 1$ . These matrices are also sparse due to orthogonality of the basis functions [9]. The Dirichlet boundary conditions are incorporated in the right-hand side of Eq. 3.5.
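To make the stochastic matrices concrete, the following Python sketch evaluates $G_l$ and $H_l$ by Gauss quadrature. It assumes, for illustration only, a single ($m=1$) random variable uniformly distributed on $[-1,1]$ with normalized Legendre polynomials as the gPC basis; the distribution, the degree `p`, and the helper `galerkin_matrix` are our assumptions, not the setting used in the paper. Indices are 0-based, so `H[0]` corresponds to $H_1$ (the constant basis function $\psi_1\equiv 1$).

```python
import numpy as np
from numpy.polynomial import legendre

p = 3                                   # polynomial degree (illustrative); n_xi = p + 1 when m = 1
xq, wq = legendre.leggauss(2 * p + 2)   # Gauss-Legendre rule, exact up to degree 4p + 3 >= 3p
wq = wq / 2.0                           # weights of the uniform density 1/2 on [-1, 1]
# psi[q, r] = sqrt(2r + 1) * P_r(xq[q]), so that <psi_r psi_s> = delta_rs
psi = legendre.legvander(xq, p) * np.sqrt(2 * np.arange(p + 1) + 1)


def galerkin_matrix(weight_vals):
    """Matrix with entries <w psi_r psi_s>, given w evaluated at the quadrature nodes."""
    return psi.T @ ((wq * weight_vals)[:, None] * psi)


G = [galerkin_matrix(np.ones_like(xq)),  # G_0 (xi_0 = 1): the identity
     galerkin_matrix(xq)]                # G_1: weight xi
H = [galerkin_matrix(psi[:, l]) for l in range(p + 1)]  # H[l] uses weight psi_l (0-based)
```

The resulting pattern mirrors the sparsity noted above: $G_0$ is the identity, and $G_1$ is tridiagonal because $\xi\psi_r$ is a combination of $\psi_{r-1}$ and $\psi_{r+1}$ only.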

3.3 All-at-once system

As discussed in the beginning of the section, we consider an all-at-once system where the discrete solutions at all the time steps are computed together. Let

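$$\bm{u}=\begin{pmatrix}\bm{u}^{1}\\ \bm{u}^{2}\\ \vdots\\ \bm{u}^{n_{t}}\end{pmatrix}\in\mathbb{R}^{n_{t}n_{\xi}n_{u}},$$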

and let $\bm{p}$ , $\bm{f}^{u}$ , and $\bm{f}^{p}$ be similarly defined. By collecting the algebraic systems Eq. 3.5 corresponding to all the time steps $\{t_{k}\}_{k=1}^{n_{t}}$ , we get the single system

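$$\begin{pmatrix}\mathbb{F}(\bm{u})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\bm{u}\\ \bm{p}\end{pmatrix}=\begin{pmatrix}\bm{f}^{u}\\ \bm{f}^{p}\end{pmatrix},$$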

where $\mathbb{F}(\bm{u})$ is block diagonal with $\mathbb{F}^{k}(\bm{u})$ as the $k$ th diagonal block, $\mathbb{B}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes B$ , and $\mathbb{C}=-\tau^{-1}C_{n_{t}}\otimes I_{n_{\xi}}\otimes\bm{M}$ with $C_{n_{t}}=\left(\begin{smallmatrix}0&&&\\ 1&0&&\\ &\ddots&\ddots&\\ &&1&0\end{smallmatrix}\right)\in\mathbb{R}^{n_{t}\times n_{t}}$ . Note that the zero initial conditions are incorporated in Eq. 3.5 for $k=1$ . The all-at-once system Eq. 3.11 is nonsymmetric and blockwise sparse. Each part of the system contains sums of Kronecker products of three matrices, i.e., in the form $\sum_{l}X_{l}^{(1)}\otimes X_{l}^{(2)}\otimes X_{l}^{(3)}$ . In fact, from Eq. 3.6,

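$$\mathbb{F}(\bm{u})=\tau^{-1}I_{n_{t}}\otimes I_{n_{\xi}}\otimes\bm{M}+\sum_{l=0}^{m}(I_{n_{t}}\otimes G_{l}\otimes\bm{A}_{l})+\mathbb{N}(\bm{u}).$$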

We discuss later (see section 4.3) how the convection matrix $\mathbb{N}$ can also be put in the Kronecker product form. It will be seen that this structure is useful for efficient matrix-vector product computations.
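The role of $\mathbb{C}$ can be checked on a deterministic toy problem: the all-at-once matrix is block lower bidiagonal, and a single solve with it reproduces the backward Euler recursion from a zero initial condition. The Python sketch below assumes $n_\xi=1$, omits the pressure, and uses made-up tridiagonal matrices `M` and `K` in place of $\bm{M}$ and the remaining velocity terms of $\mathbb{F}$; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, n = 8, 5                         # illustrative: number of time steps, spatial unknowns
tau = 1.0 / nt                       # uniform step size for t_f = 1
off = np.ones(n - 1)
M = np.eye(n) + 0.1 * (np.diag(off, 1) + np.diag(off, -1))   # toy SPD "mass" matrix
K = 2 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)       # toy "stiffness" matrix
f = rng.standard_normal((nt, n))     # right-hand side at t_1, ..., t_nt

# Sequential backward Euler from zero initial data: (M/tau + K) u^k = f^k + (M/tau) u^{k-1}
u_seq = np.zeros((nt, n))
u_prev = np.zeros(n)
for k in range(nt):
    u_prev = np.linalg.solve(M / tau + K, f[k] + M @ u_prev / tau)
    u_seq[k] = u_prev

# All-at-once: I_nt (x) (M/tau + K) plus the coupling term -tau^{-1} C_nt (x) M
C = np.diag(np.ones(nt - 1), -1)     # C_nt: ones on the first subdiagonal
L = np.kron(np.eye(nt), M / tau + K) - np.kron(C, M) / tau
u_all = np.linalg.solve(L, f.reshape(-1)).reshape(nt, n)
```

The dense matrix `L` is formed here only for the check; at realistic sizes one would instead keep it in the Kronecker form shown above.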

3.4 Picard’s method

We use Picard’s method to solve the nonlinear equation Eq. 3.11. Picard’s method is a fixed-point iteration. Let $\bm{u}^{(i)}$ , $\bm{p}^{(i)}$ be the approximate solutions at the $i$ th step. Each Picard step entails solving a large linear system

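$$\begin{pmatrix}\mathbb{F}(\bm{u}^{(i-1)})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{(i)}\\ \bm{p}^{(i)}\end{pmatrix}=\begin{pmatrix}\bm{f}^{u}\\ \bm{f}^{p}\end{pmatrix}.$$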

Instead of Eq. 3.13, one can equivalently solve the corresponding residual equation for a correction of the solution. Let $\bm{u}^{(i)}=\bm{u}^{(i-1)}+\delta\bm{u}^{(i)}$ , $\bm{p}^{(i)}=\bm{p}^{(i-1)}+\delta\bm{p}^{(i)}$ . Then $\delta\bm{u}^{(i)}$ and $\delta\bm{p}^{(i)}$ satisfy

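$$\begin{pmatrix}\mathbb{F}(\bm{u}^{(i-1)})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\delta\bm{u}^{(i)}\\ \delta\bm{p}^{(i)}\end{pmatrix}=\begin{pmatrix}\bm{r}^{u,(i-1)}\\ \bm{r}^{p,(i-1)}\end{pmatrix},$$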

where the nonlinear residual is

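$$\bm{r}^{(i)}=\begin{pmatrix}\bm{r}^{u,(i)}\\ \bm{r}^{p,(i)}\end{pmatrix}=\begin{pmatrix}\bm{f}^{u}\\ \bm{f}^{p}\end{pmatrix}-\begin{pmatrix}\mathbb{F}(\bm{u}^{(i)})+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{pmatrix}\begin{pmatrix}\bm{u}^{(i)}\\ \bm{p}^{(i)}\end{pmatrix}.$$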

Let $\bm{f}$ denote the right-hand side of Eq. 3.11. The complete algorithm is summarized in Algorithm 3.1. The initial iterates $\bm{u}^{(0)}$ , $\bm{p}^{(0)}$ are obtained as the solution of a Stokes problem, for which in Eq. 3.13 the convection matrix $\mathbb{N}$ is set to zero.
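The correction form of Picard’s method can be sketched on a toy problem. Below, `F(w) = K + gamma * diag(w)` is a hypothetical stand-in for $\mathbb{F}(\bm{u})$ (no pressure and no saddle-point structure); the initial iterate drops the nonlinear term, by analogy with the Stokes start, and each step solves the lagged linear system for the correction and then updates the nonlinear residual. All names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

n, gamma = 30, 0.5                           # illustrative size and nonlinearity strength
K = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)


def F(w):
    """Toy linearized operator: K plus a lagged 'convection-like' term gamma * diag(w)."""
    return K + gamma * np.diag(w)


def picard(f, tol=1e-10, maxit=50):
    """Picard iteration in correction form: F(u^(i-1)) du = r^(i-1), u^(i) = u^(i-1) + du."""
    u = np.linalg.solve(K, f)                # initial iterate with the nonlinear term dropped
    r = f - F(u) @ u                         # nonlinear residual r^(0)
    for it in range(1, maxit + 1):
        du = np.linalg.solve(F(u), r)        # linear solve with the lagged operator
        u = u + du
        r = f - F(u) @ u                     # r^(i) = f - F(u^(i)) u^(i)
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            break
    return u, it


u, steps = picard(f)
```

Solving for the correction rather than for $\bm{u}^{(i)}$ itself is what makes an inexact linear solve natural: a loose tolerance only perturbs the update, while the nonlinear residual is recomputed exactly at every step.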

4 Low-rank approximation

In this section we discuss low-rank approximation techniques and how they can be used with iterative solvers. The computational cost of solving Eq. 3.14 at each Picard step is high due to the large problem size $n_{t}n_{\xi}(n_{u}+n_{p})$, especially when large numbers of spatial grid points or time steps are used to achieve high-resolution solutions. We will address this using low-rank tensor approximations to the solution vectors $\bm{u}$ and $\bm{p}$. We will develop efficient iterative solvers and preconditioners in which the solution is approximated using a compressed data representation, in order to greatly reduce memory requirements and computational effort. The idea is to represent the iterates of an approximate Krylov subspace method in a low-rank tensor format. The basic operations associated with the low-rank format are much cheaper, and as the Krylov subspace method converges, it constructs a sequence of low-rank approximations to the solution of the system.

4.1 Tensor train decomposition

A tensor $\underline{z}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ is a multidimensional array with entries $\underline{z}(i_{1},\ldots,i_{d})$, where $i_{l}=1,\ldots,n_{l}$, $l=1,\ldots,d$. The solution coefficients in Eq. 3.4 can be represented in the form of three-dimensional $n_{t}\times n_{\xi}\times n_{x}$ tensors $\underline{u}$ (where $n_{x}=n_{u}$) and $\underline{p}$ ($n_{x}=n_{p}$), such that $\underline{u}(k,s,j)=u^{k}_{js}$ and $\underline{p}(k,s,j)=p^{k}_{js}$. Equivalently, such tensors can be represented as vectors, where the vectorized forms $\bm{u}$ and $\bm{p}$ are specified using the vectorization operation

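$$\bm{u}=\text{vec}(\underline{u})\;\Leftrightarrow\;\bm{u}(\overline{i_{1}i_{2}i_{3}})=\underline{u}(i_{1},i_{2},i_{3}),$$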

where $\overline{i_{1}i_{2}i_{3}}=i_{3}+(i_{2}-1)n_{x}+(i_{1}-1)n_{\xi}n_{x}$, and $\bm{p}=\text{vec}(\underline{p})$ is defined in a similar manner. In an iterative solver for the system Eq. 3.14, any iterate $\bm{z}$ can be equivalently represented as a three-dimensional tensor $\underline{z}\in\mathbb{R}^{n_{t}\times n_{\xi}\times n_{x}}$. In the sequel we use the vector $\bm{z}$ and the tensor $\underline{z}$ interchangeably. The tensor train decomposition [21] is a compressed low-rank representation used to approximate a given tensor and to perform tensor operations efficiently. Specifically, the tensor train format of $\underline{z}$ is defined as

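$$\underline{z}(i_{1},i_{2},i_{3})\approx\sum_{\alpha_{1},\alpha_{2}}\underline{z}^{(1)}(i_{1},\alpha_{1})\,\underline{z}^{(2)}(\alpha_{1},i_{2},\alpha_{2})\,\underline{z}^{(3)}(\alpha_{2},i_{3}),$$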

where $\underline{z}^{(1)}\in\mathbb{R}^{n_{t}\times\kappa_{1}}$ , $\underline{z}^{(2)}\in\mathbb{R}^{\kappa_{1}\times n_{\xi}\times\kappa_{2}}$ , $\underline{z}^{(3)}\in\mathbb{R}^{\kappa_{2}\times n_{x}}$ are the tensor train cores, and $\kappa_{1}$ and $\kappa_{2}$ are called the tensor train ranks. It is easy to see that if $\kappa_{1},\kappa_{2}\approx\kappa$ and $\kappa$ is small, the memory cost to store $\underline{z}$ is reduced from $O(n_{t}n_{\xi}n_{x})$ to $O((n_{t}+n_{\xi}\kappa+n_{x})\kappa)$ .
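A short NumPy sketch of two facts from this subsection: with the index map $\overline{i_1i_2i_3}$ above, $\text{vec}$ coincides with NumPy’s row-major (C-order) flattening of an $n_t\times n_\xi\times n_x$ array, and a tensor train with ranks $(\kappa_1,\kappa_2)$ stores $n_t\kappa_1+\kappa_1 n_\xi\kappa_2+\kappa_2 n_x$ numbers. The sizes and the helper names `flat_index`, `tt_full`, and `storage` are illustrative assumptions.

```python
import numpy as np

nt, nxi, nx = 4, 3, 5                          # small sizes for the index check
z = np.arange(nt * nxi * nx, dtype=float).reshape(nt, nxi, nx)
z_vec = z.reshape(-1)                          # row-major (C) order: i3 varies fastest


def flat_index(i1, i2, i3):
    """1-based position of z(i1, i2, i3) in vec(z), following the index map in the text."""
    return i3 + (i2 - 1) * nx + (i1 - 1) * nxi * nx


def tt_full(Z1, Z2, Z3):
    """Expand TT cores: z(i1,i2,i3) = sum_{a1,a2} Z1[i1,a1] Z2[a1,i2,a2] Z3[a2,i3]."""
    return np.einsum('ia,ajb,bk->ijk', Z1, Z2, Z3)


def storage(shape, k1, k2):
    """Entries stored for the full tensor and for TT cores with ranks (k1, k2)."""
    n1, n2, n3 = shape
    return n1 * n2 * n3, n1 * k1 + k1 * n2 * k2 + k2 * n3
```

For instance, `storage((64, 10, 500), 4, 4)` returns `(320000, 2416)`, a reduction by more than two orders of magnitude for these illustrative sizes.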

The tensor train decomposition allows efficient basic operations on tensors. Most importantly, matrix-vector products can be computed much less expensively if the vector $\bm{z}$ is in the tensor train format. For $\underline{z}$ as in Eq. 4.2, the vector $\bm{z}$ has an equivalent Kronecker product form [6]

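$$\bm{z}=\text{vec}(\underline{z})=\sum_{\alpha_{1},\alpha_{2}}z^{(1)}_{\alpha_{1}}\otimes z^{(2)}_{\alpha_{1},\alpha_{2}}\otimes z^{(3)}_{\alpha_{2}},$$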

where in the right-hand side $z^{(1)}_{\alpha_{1}}$ , $z^{(2)}_{\alpha_{1},\alpha_{2}}$ , and $z^{(3)}_{\alpha_{2}}$ are vectors of length $n_{t}$ , $n_{\xi}$ , and $n_{x}$ , respectively, obtained by fixing the indices $\alpha_{1}$ and $\alpha_{2}$ in $\underline{z}^{(1)}$ , $\underline{z}^{(2)}$ , and $\underline{z}^{(3)}$ . Then for any matrix $\mathbb{X}=X^{(1)}\otimes X^{(2)}\otimes X^{(3)}$ , such as the blocks in Eq. 3.14,

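$$\mathbb{X}\bm{z}=\sum_{\alpha_{1},\alpha_{2}}\big(X^{(1)}z^{(1)}_{\alpha_{1}}\big)\otimes\big(X^{(2)}z^{(2)}_{\alpha_{1},\alpha_{2}}\big)\otimes\big(X^{(3)}z^{(3)}_{\alpha_{2}}\big).$$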

The product is also in tensor train format with the same ranks as in $\bm{z}$ (of the right-hand side of Eq. 4.2), and it only requires matrix-vector products for each component of $\mathbb{X}$ . From left to right in the Kronecker products, the component matrices from Eq. 3.12 are sparse with numbers of nonzeros proportional to $n_{t}$ , $n_{\xi}$ , and $n_{x}$ , respectively, and the computational cost is thus reduced from $O(n_{t}n_{\xi}n_{x})$ to $O((n_{t}+n_{\xi}\kappa+n_{x})\kappa)$ .
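The core-wise matrix-vector product can be verified against an explicit Kronecker product on small random data. In this sketch the factors `X1`, `X2`, `X3` are dense random stand-ins for factors such as $I_{n_t}$, $G_l$, and $\bm{A}_l$, and `tt_kron_matvec` is a hypothetical helper name; the dense product is formed only as a reference.

```python
import numpy as np

rng = np.random.default_rng(2)
nt, nxi, nx, k1, k2 = 6, 4, 7, 2, 3          # small illustrative sizes and TT ranks
Z1 = rng.standard_normal((nt, k1))
Z2 = rng.standard_normal((k1, nxi, k2))
Z3 = rng.standard_normal((k2, nx))
X1 = rng.standard_normal((nt, nt))           # dense stand-ins for factors such as I_nt, G_l, A_l
X2 = rng.standard_normal((nxi, nxi))
X3 = rng.standard_normal((nx, nx))


def tt_full(Z1, Z2, Z3):
    return np.einsum('ia,ajb,bk->ijk', Z1, Z2, Z3)


def tt_kron_matvec(X1, X2, X3, Z1, Z2, Z3):
    """Apply X1 (x) X2 (x) X3 to a TT vector core by core; the TT ranks do not change."""
    Y1 = X1 @ Z1                             # columns z^(1)_{a1} -> X1 z^(1)_{a1}
    Y2 = np.einsum('ij,ajb->aib', X2, Z2)    # fibers z^(2)_{a1,a2} -> X2 z^(2)_{a1,a2}
    Y3 = Z3 @ X3.T                           # rows z^(3)_{a2} -> X3 z^(3)_{a2}
    return Y1, Y2, Y3


Y = tt_kron_matvec(X1, X2, X3, Z1, Z2, Z3)
dense = np.kron(X1, np.kron(X2, X3)) @ tt_full(Z1, Z2, Z3).reshape(-1)   # reference only
```

With sparse factors, each of the three products costs work proportional to the number of nonzeros of that factor times the ranks involved, which is where the cost estimate stated above comes from.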

Other vector computations, including additions and inner products, are also inexpensive in the tensor train format. One thing to note is that adding two vectors in tensor train format tends to increase the ranks. This can be easily seen from Eq. 4.2, since the sum of two low-rank tensors has more terms in the summation on the right-hand side (with the standard concatenation of cores, the ranks of the sum are the sums of the ranks of the two summands). An important operation for the tensor train format is a truncation (or rounding) operation, used to reduce the ranks of tensors that are already in the tensor train format but have unnecessarily high ranks. For a given tensor $\underline{z}$ as in Eq. 4.2, the truncation operation $\mathcal{T}$ with tolerance $\epsilon$ computes

$$\tilde{\underline{z}}=\mathcal{T}_{\epsilon}(\underline{z}),$$

such that $\tilde{\underline{z}}$ has smaller ranks than $\underline{z}$ and satisfies the relative error

$$\|\tilde{\underline{z}}-\underline{z}\|_{F}/\|\underline{z}\|_{F}\leq\epsilon.$$

(Note that $\|\underline{z}\|_{F}=\|\bm{z}\|_{2}$ .) The truncation operator is based on the TT-SVD algorithm [21], given in Algorithm 4.1, which is used to compute a low-rank tensor train approximation for a full tensor $\underline{z}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ . In the algorithm, a sequence of singular value decompositions (SVDs) is computed for the so-called unfolding matrix $Z$ , obtained by reshaping the entries of a tensor into a two-dimensional array. Terms corresponding to small singular values are dropped such that an error $E$ in the truncated SVD satisfies $\|E\|_{F}\leq\delta_{j}$ , $j=1,\ldots,d-1$ (see line 4 of Algorithm 4.1). It was shown in [21] that the algorithm produces a tensor train $\tilde{\underline{z}}$ that satisfies

$$\|\underline{z}-\tilde{\underline{z}}\|_{F}\leq\Big(\sum_{k=1}^{d-1}\delta_{k}^{2}\Big)^{1/2}.$$

Thus, one can choose $\delta_{1}=\cdots=\delta_{d-1}=\epsilon\|\underline{z}\|_{F}/\sqrt{d-1}$ to make the relative error $\|\tilde{\underline{z}}-\underline{z}\|_{F}/\|\underline{z}\|_{F}\leq\epsilon$. Note that the algorithm is costly, since it requires SVDs of matrices $Z\in\mathbb{R}^{\kappa_{j-1}n_{j}\times n_{j+1}\cdots n_{d}}$. However, when the tensor $\underline{z}$ is already in the tensor train format, the computation can be greatly simplified, and only SVDs of the much smaller tensor train cores are needed. In this case, the cost of the truncation operation is $O(dn\kappa^{3})$ if $n_{1},\ldots,n_{d}\approx n$ and $\kappa_{1},\ldots,\kappa_{d-1}\approx\kappa$. We refer to [21] for more details. In the numerical experiments, we use TT-Toolbox [22] for tensor train computations.
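The rank growth under addition and its repair by truncation can be seen in a small example. The sketch below adds two tensor trains by concatenating cores, so the ranks add up, and then recompresses with a TT-SVD of the expanded tensor using $\delta_1=\delta_2=\epsilon\|\underline{z}\|_F/\sqrt{2}$ for $d=3$. It is a stand-in only: expanding to a full tensor is exactly what the core-based rounding of [21] avoids, and the names `tt_add`, `truncated_svd`, and `tt_svd3` are ours, not TT-Toolbox functions.

```python
import numpy as np

rng = np.random.default_rng(4)


def tt_full(Z1, Z2, Z3):
    return np.einsum('ia,ajb,bk->ijk', Z1, Z2, Z3)


def tt_add(A, B):
    """Add two 3D tensor trains by concatenating cores; the ranks add up."""
    (A1, A2, A3), (B1, B2, B3) = A, B
    a1, n2, a2 = A2.shape
    b1, _, b2 = B2.shape
    C2 = np.zeros((a1 + b1, n2, a2 + b2))
    C2[:a1, :, :a2] = A2              # block-diagonal middle core, so no cross terms appear
    C2[a1:, :, a2:] = B2
    return np.hstack([A1, B1]), C2, np.vstack([A3, B3])


def truncated_svd(Z, delta):
    """Truncated SVD of Z with the smallest rank r >= 1 whose dropped part has norm <= delta."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    tail = np.append(np.sqrt(np.cumsum(s[::-1] ** 2))[::-1], 0.0)  # tail[r] = ||s[r:]||
    r = max(1, int(np.argmax(tail <= delta)))
    return U[:, :r], s[:r], Vt[:r]


def tt_svd3(z, eps):
    """TT-SVD of a full 3D array with delta_1 = delta_2 = eps * ||z||_F / sqrt(d - 1), d = 3."""
    n1, n2, n3 = z.shape
    delta = eps * np.linalg.norm(z) / np.sqrt(2)
    U, s, Vt = truncated_svd(z.reshape(n1, n2 * n3), delta)                     # first unfolding
    k1 = U.shape[1]
    U2, s2, Vt2 = truncated_svd((s[:, None] * Vt).reshape(k1 * n2, n3), delta)  # second unfolding
    k2 = U2.shape[1]
    return U, U2.reshape(k1, n2, k2), s2[:, None] * Vt2


A = (rng.standard_normal((6, 2)), rng.standard_normal((2, 5, 2)), rng.standard_normal((2, 7)))
S = tt_add(A, A)                      # represents 2 * A, but with ranks (4, 4)
R = tt_svd3(tt_full(*S), 1e-8)        # recompression brings the ranks back to at most (2, 2)
```

Because the two truncation errors lie in orthogonal subspaces, their squared norms add, which gives the bound with $\delta_1^2+\delta_2^2$ stated above.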

4.2 Low-rank solver

The tensor train decomposition offers efficient tensor operations, and we use it in iterative solvers to reduce the computational costs. The all-at-once system Eq. 3.14 to be solved at each step of Picard’s method is nonsymmetric. We use a right-preconditioned GMRES method to solve the system. The complete algorithm for solving $\mathscr{L}\bm{z}=\bm{b}$ is summarized in Algorithm 4.2. The preconditioner $\mathscr{P}^{-1}$ entails an inner iterative process and therefore changes from one GMRES iteration to the next, so a variant of the flexible GMRES method (see, e.g., [24]) is used. As discussed above, all the iterates in the algorithm are represented in the tensor train format for efficient computations, and a truncation operation with tolerance $\epsilon_{\text{gmres}}$ is used to compress the tensor train ranks so that they stay small relative to the problem size. It should be noted that since the quantities are truncated, the Arnoldi vectors $\{\bm{v}_{i}\}$ do not form an orthogonal basis for the Krylov subspace, and thus this is not a true GMRES computation. When the algorithm is used for solving Eq. 3.14, the truncation operator is applied separately to the quantities associated with the two tensor trains $\delta\bm{u}^{(i)}$ and $\delta\bm{p}^{(i)}$. In Section 5, we construct effective preconditioners for the system Eq. 3.14.
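The effect of truncating every Krylov vector can be illustrated with a deliberately simplified sketch: unpreconditioned, non-flexible GMRES without restarts, applied to a small matrix equation $AX+XA^{T}=B$ whose right-hand side has rank one. A truncated SVD of the $n\times n$ matrix form of each vector plays the role of the tensor train truncation with tolerance $\epsilon_{\text{gmres}}$. This is not Algorithm 4.2; the operator, sizes, tolerances, and the names `truncate`, `lowrank_gmres`, and `apply_L` are assumptions chosen for illustration.

```python
import numpy as np


def truncate(x, shape, eps):
    """Compress x by SVD truncation of its matrix form (a 2D stand-in for TT rounding)."""
    U, s, Vt = np.linalg.svd(x.reshape(shape), full_matrices=False)
    tail = np.append(np.sqrt(np.cumsum(s[::-1] ** 2))[::-1], 0.0)  # tail[r] = ||s[r:]||
    r = int(np.argmax(tail <= eps * np.linalg.norm(s)))
    return ((U[:, :r] * s[:r]) @ Vt[:r]).reshape(-1)


def lowrank_gmres(apply_L, b, shape, eps=1e-10, tol=1e-8, maxit=60):
    """Unpreconditioned GMRES (no restarts) in which every new vector is truncated."""
    beta = np.linalg.norm(b)
    V = [truncate(b / beta, shape, eps)]
    H = np.zeros((maxit + 1, maxit))
    for j in range(maxit):
        w = truncate(apply_L(V[j]), shape, eps)
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i, j] = V[i] @ w
            w = w - H[i, j] * V[i]
        w = truncate(w, shape, eps)             # ranks grow during orthogonalization
        H[j + 1, j] = np.linalg.norm(w)
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], rhs, rcond=None)[0]
        if np.linalg.norm(H[:j + 2, :j + 1] @ y - rhs) <= tol * beta or H[j + 1, j] == 0:
            break
        V.append(w / H[j + 1, j])
    x = sum(yi * vi for yi, vi in zip(y, V))
    return truncate(x, shape, eps), j + 1


n = 20
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)           # toy 1D operator


def apply_L(x):
    """vec(A X + X A^T) for X = x reshaped to n x n."""
    X = x.reshape(n, n)
    return (A @ X + X @ A.T).reshape(-1)


b = np.outer(np.ones(n), np.linspace(1.0, 2.0, n)).reshape(-1)  # rank-one right-hand side
x, iters = lowrank_gmres(apply_L, b, (n, n))
```

Since the truncated Arnoldi vectors are only approximately orthogonal, the least-squares residual computed from the Hessenberg matrix is an estimate; it is worth checking the true residual $\|\bm{b}-\mathscr{L}\bm{x}\|$ after the iteration.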

We also use the tensor train decomposition to construct a more efficient variant of Algorithm 3.1. In particular, the updated solutions $\bm{u}^{(i)}$ and $\bm{p}^{(i)}$ in line 5 are truncated, with a tolerance ${\epsilon_{\text{soln}}}$ , so that

[TABLE]

Another truncation operation with $\epsilon_{\text{gmres}}$ is applied to compress the ranks of the nonlinear residual $\bm{r}^{(i)}$ in line 7. We will use this truncated version of Algorithm 3.1 in numerical experiments; choices of the truncation tolerances will be specified in Section 6.

4.3 Convection matrix

We now show that if the velocity $\bm{u}$ in Eq. 3.12 is in the tensor train format, then the convection matrix $\mathbb{N}(\bm{u})$ can be represented as a sum of Kronecker products of matrices [3], which allows efficient matrix-vector product computations as in Eq. 4.4. Assume the coefficient tensor in Eq. 3.4 is approximated by a tensor train decomposition,

[TABLE]

Note that the entries of $\bm{N}(\vec{u}_{h,l}^{k})$ are linear in $\vec{u}_{h,l}^{k}$ and

[TABLE]

Let $\vec{u}_{\alpha_{2}}^{(3)}=\sum_{j}\underline{u}^{(3)}(\alpha_{2},j)\vec{\phi}_{j}(x)$ . Then the $k$ th diagonal block of $\mathbb{N}(\bm{u})$ is

[TABLE]

The convection matrix $\mathbb{N}(\bm{u})$ can be expressed as

[TABLE]

Here $u^{(1)}_{\alpha_{1}}$ is a vector obtained by fixing the index $\alpha_{1}$ in $\underline{u}^{(1)}$ , and $\text{diag}(u^{(1)}_{\alpha_{1}})$ is a diagonal matrix with $u^{(1)}_{\alpha_{1}}$ on the diagonal. The result is a sum of Kronecker products of three smaller matrices. Such a representation can be constructed for any iterate $\bm{u}^{(i)}$ in the tensor train format.
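The reason a Kronecker-product representation makes matrix-vector products cheap can be seen in a few lines: a single term $M_{1}\otimes M_{2}\otimes M_{3}$ acts on a tensor train core by core, without forming the full vector or matrix, and leaves the tensor train ranks unchanged. The sketch assumes the mode order (time, stochastic, spatial) suggested by the factor $\text{diag}(u^{(1)}_{\alpha_{1}})$ and uses random stand-in matrices, not the actual factors of Eq. 4.12; `apply_kron_term` is a hypothetical helper name.

```python
import numpy as np


def tt_full(cores):
    """Contract TT cores (r_{j-1}, n_j, r_j) into a full tensor (for checking only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])


def apply_kron_term(mats, cores):
    """Apply M_1 kron M_2 kron ... kron M_d to a TT tensor; M_j acts only on core j."""
    return [np.einsum('ai,pir->par', M, G) for M, G in zip(mats, cores)]


rng = np.random.default_rng(0)
n_t, n_xi, n_u = 4, 3, 5
x_cores = [rng.standard_normal((1, n_t, 2)),
           rng.standard_normal((2, n_xi, 3)),
           rng.standard_normal((3, n_u, 1))]
mats = [np.diag(rng.standard_normal(n_t)),   # time factor, diagonal like diag(u^(1)_alpha1)
        rng.standard_normal((n_xi, n_xi)),   # stand-in for a stochastic Galerkin factor
        rng.standard_normal((n_u, n_u))]     # stand-in for a spatial convection factor
y_cores = apply_kron_term(mats, x_cores)

# Dense reference, feasible only for tiny sizes (row-major ordering matches np.kron).
dense = np.kron(mats[0], np.kron(mats[1], mats[2]))
```

A sum of such terms is applied by adding the resulting tensor trains, and that addition is where the ranks grow.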

Because of the number of terms in the summation on the right-hand side of Eq. 4.12, a matrix-vector product with $\mathbb{N}$ increases the tensor train ranks dramatically, from $\kappa$ to $\kappa^{2}$. Unless $\kappa$ is very small, a tensor train with rank $\kappa^{2}$ will require too much memory and also be expensive to work with. To overcome this difficulty, when solving the all-at-once system Eq. 3.14, we use a low-rank approximation of $\bm{u}^{(i)}$ to construct $\mathbb{N}(\bm{u}^{(i)})$. Specifically, let

[TABLE]

with some truncation tolerance $\epsilon_{\text{conv}}$ . Since $\tilde{\bm{u}}^{(i)}$ has smaller ranks than $\bm{u}^{(i)}$ , the approximate convection matrix $\mathbb{N}(\tilde{\bm{u}}^{(i)})$ contains a smaller number of terms in Eq. 4.12, and thus the rank increase becomes less significant when computing matrix-vector products with it. In other words, the linear system solved at each Picard step becomes

[TABLE]

Note that the original $\bm{u}^{(i)}$ is still used for computing the nonlinear residual $\bm{r}^{(i)}$ in Picard’s method.
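The $\kappa\to\kappa^{2}$ rank growth can be reproduced on a toy example. As a hypothetical stand-in for Eq. 4.12, assume each term has the form $\text{diag}(u^{(1)}_{\alpha_{1}})\otimes\big(\sum_{s}\underline{u}^{(2)}(\alpha_{1},s,\alpha_{2})H_{s}\big)\otimes\big(\sum_{j}\underline{u}^{(3)}(\alpha_{2},j)N_{j}\big)$, with random placeholder matrices $H_{s}$ and $N_{j}$ (not the paper's matrices). Then the operator is a TT-matrix whose ranks are those of $\bm{u}$, and an exact TT-matrix times TT-vector product multiplies the ranks.

```python
import numpy as np


def tt_full(cores):
    """Contract TT cores of shape (r_{j-1}, n_j, r_j) into a full tensor (checking only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])


def convection_tt_matrix(u_cores, H, Nj):
    """TT-matrix cores W_j of shape (R_{j-1}, n_j, n_j, R_j) for an operator linear in u.

    Term (a1, a2) is diag(u1[0, :, a1]) kron (sum_s u2[a1, s, a2] H[s])
    kron (sum_j u3[a2, j, 0] Nj[j]); H and Nj are hypothetical placeholders.
    """
    u1, u2, u3 = u_cores
    W1 = np.einsum('pir,ik->pikr', u1, np.eye(u1.shape[1]))
    W2 = np.einsum('psr,sab->pabr', u2, H)
    W3 = np.einsum('pjr,jab->pabr', u3, Nj)
    return [W1, W2, W3]


def tt_matvec(W_cores, x_cores):
    """Exact TT-matrix times TT-vector product: ranks R_j (of W) and r_j (of x) give R_j * r_j."""
    out = []
    for W, G in zip(W_cores, x_cores):
        P, n, _, Q = W.shape
        p, _, q = G.shape
        out.append(np.einsum('PabQ,pbq->PpaQq', W, G).reshape(P * p, n, Q * q))
    return out


rng = np.random.default_rng(1)
n = (4, 3, 5)              # tiny (n_t, n_xi, n_u)
k1, k2 = 2, 3              # TT ranks of both u and x in this example


def random_tt():
    return [rng.standard_normal((1, n[0], k1)),
            rng.standard_normal((k1, n[1], k2)),
            rng.standard_normal((k2, n[2], 1))]


u, x = random_tt(), random_tt()
H = rng.standard_normal((n[1], n[1], n[1]))
Nj = rng.standard_normal((n[2], n[2], n[2]))
y = tt_matvec(convection_tt_matrix(u, H, Nj), x)
ranks = [c.shape[2] for c in y[:-1]]

# Dense reference: the explicit double sum of Kronecker products.
dense = sum(np.kron(np.diag(u[0][0, :, a1]),
                    np.kron(np.einsum('s,sab->ab', u[1][a1, :, a2], H),
                            np.einsum('j,jab->ab', u[2][a2, :, 0], Nj)))
            for a1 in range(k1) for a2 in range(k2))
```

If the convection matrix is built from an approximation $\tilde{\bm{u}}^{(i)}$ with ranks $\tilde{\kappa}<\kappa$, the same product has ranks $\tilde{\kappa}\kappa$ instead of $\kappa^{2}$, which is the saving exploited by Eq. 4.13.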

5 Preconditioning

In this section we discuss preconditioning techniques for the all-at-once system Eq. 3.14 so that the Krylov subspace methods converge in a small number of iterations. To simplify the notation, we use $\bm{w}$ instead of $\bm{u}^{(i-1)}$ , and the associated approximate solution at the $k$ th time step is

[TABLE]

with $\vec{w}_{h,l}^{k}(x)=\sum_{j}w_{jl}^{k}\vec{\phi}_{j}(x)$ . In the following the dependence on $\bm{w}$ in $\mathbb{F}(\bm{w})$ is omitted in most cases. We derive a preconditioner by extending ideas for more standard problems [8], starting with an “idealized” block triangular preconditioner

[TABLE]

With this choice of preconditioner, the Schur complement is $\mathbb{S}=\mathbb{B}(\mathbb{F}+\mathbb{C})^{-1}\mathbb{B}^{T}$ , and the idealized preconditioned system derived from a block factorization

[TABLE]

has eigenvalues equal to 1 and Jordan blocks of order 2. (Here $\mathbb{I}$ is an identity block.) Thus a right-preconditioned true GMRES method will converge in two iterations. However, the application of $\mathscr{P}^{-1}$ involves solving linear systems associated with $\mathbb{S}$ and $\mathbb{F}+\mathbb{C}$. These solves are too expensive for practical computation, so to develop preconditioners we construct inexpensive approximations to them. Specifically, we derive mean-based preconditioners that use results from the mean deterministic problem. Such preconditioners for the stochastic steady-state Navier–Stokes equations have been studied in [23]. We generalize the techniques for the all-at-once formulation of the unsteady equations.
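The two-iteration claim can be checked numerically on a small dense example. The sketch assumes the coefficient matrix has the saddle-point form $\begin{bmatrix}\mathbb{F}+\mathbb{C} & \mathbb{B}^{T}\\ \mathbb{B} & 0\end{bmatrix}$ and the idealized preconditioner is $\begin{bmatrix}\mathbb{F}+\mathbb{C} & \mathbb{B}^{T}\\ 0 & -\mathbb{S}\end{bmatrix}$, the standard choice in [8]; random matrices stand in for the actual blocks. The preconditioned operator $T$ satisfies $(T-I)^{2}=0$, so the solution of $Ty=b$ lies in the two-dimensional Krylov space $\text{span}\{b,Tb\}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 12, 5
F = rng.standard_normal((n_u, n_u)) + n_u * np.eye(n_u)   # stand-in for the (1,1) block F + C
B = rng.standard_normal((n_p, n_u))                        # stand-in for the divergence matrix

K = np.block([[F, B.T], [B, np.zeros((n_p, n_p))]])        # assumed saddle-point form
S = B @ np.linalg.solve(F, B.T)                            # Schur complement B F^{-1} B^T
P = np.block([[F, B.T], [np.zeros((n_p, n_u)), -S]])       # idealized block triangular preconditioner

T = K @ np.linalg.inv(P)                                   # right-preconditioned operator K P^{-1}
E = T - np.eye(n_u + n_p)                                  # equals [[0, 0], [B F^{-1}, 0]]

# Since T^2 = 2T - I, the vector y = 2b - Tb solves T y = b.
b = rng.standard_normal(n_u + n_p)
y = 2.0 * b - T @ b
```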

5.1 Deterministic operator

We review the techniques used for approximating the Schur complement in the deterministic case [8]. The approximations are based on the fact that a commutator of the convection-diffusion operator with the divergence operator

[TABLE]

is small under certain assumptions about smoothness and boundary conditions. The subscript $p$ means the operators are defined on the pressure space. For a discrete convection-diffusion operator $\bm{F}=\bm{A}_{0}+\bm{N}(\vec{w}_{h,1}^{k})$ (which is part of the mean problem we discuss later), as defined in Eq. 3.7, an approximation to the Schur complement $S=B\bm{F}^{-1}B^{T}$ is identified from a discrete analogue of Eq. 5.4,

[TABLE]

where the subscript $p$ means the corresponding matrices constructed on the discrete pressure space. Equation 5.5 leads to an approximation to the Schur complement matrix,

[TABLE]

The pressure convection-diffusion (PCD) preconditioner is constructed by replacing the mass matrices with approximations containing only their diagonal entries (denoted by a subscript $*$ ) in Eq. 5.6,

[TABLE]

The least-squares commutator (LSC) preconditioner avoids the construction of matrices on the pressure space, with the approximation to $F_{p}$ ,

[TABLE]

(see [8, section 9.2] for a derivation). The LSC preconditioner is obtained by substituting $F_{p}$ in Eq. 5.6 and replacing the mass matrices with their diagonals,

[TABLE]

For both preconditioners, the only use of the matrices $\bm{F}$ and $F_{p}$ is through matrix-vector products with them.
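For reference, the standard LSC approximation of [8] can be written as $S^{-1}\approx(B\bm{M}_{*}^{-1}B^{T})^{-1}(B\bm{M}_{*}^{-1}\bm{F}\bm{M}_{*}^{-1}B^{T})(B\bm{M}_{*}^{-1}B^{T})^{-1}$, which is what one obtains by inverting the LSC form above. The dense NumPy sketch below applies this formula; it is an illustration only (a practical code would factor the pressure Laplacian once with a sparse solver), and `lsc_schur_inverse` is a hypothetical name. The check uses a special case in which LSC is exact: $\bm{F}$ equal to a multiple of $\bm{M}_{*}$.

```python
import numpy as np


def lsc_schur_inverse(F, B, m_diag):
    """Return r -> S_LSC^{-1} r, where M_* = diag(m_diag) is the lumped velocity mass matrix."""
    BMinv = B / m_diag                   # B M_*^{-1}: divide column j by m_diag[j]
    Ap = BMinv @ B.T                     # B M_*^{-1} B^T, a discrete pressure Laplacian
    Fp_like = BMinv @ F @ BMinv.T        # B M_*^{-1} F M_*^{-1} B^T

    def apply(r):
        t = np.linalg.solve(Ap, r)
        return np.linalg.solve(Ap, Fp_like @ t)

    return apply


rng = np.random.default_rng(0)
n_u, n_p = 20, 6
B = rng.standard_normal((n_p, n_u))
m_diag = rng.uniform(0.5, 2.0, n_u)
F = 3.0 * np.diag(m_diag)                # special case F = c M_*, where LSC is exact
S = B @ np.linalg.solve(F, B.T)          # true Schur complement B F^{-1} B^T
r = rng.standard_normal(n_p)
approx = lsc_schur_inverse(F, B, m_diag)(r)
exact = np.linalg.solve(S, r)
```

For a genuine convection-diffusion $\bm{F}$ the two results differ; the point of the sketch is that only products with $\bm{F}$ and solves with $B\bm{M}_{*}^{-1}B^{T}$ are needed.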

5.2 Approximations to $\mathbb{S}^{-1}$

The Schur complement $\mathbb{S}$ involves $(\mathbb{F}+\mathbb{C})^{-1}$ and is impractical to work with. For our stochastic unsteady problem, we consider mean-based preconditioners that use approximations to the Schur complement matrix

[TABLE]

where the “mean” matrix $\mathbb{F}_{0}$ is block-diagonal with $\mathbb{F}_{0}^{k}$ as the $k$ th diagonal block, and

[TABLE]

This corresponds to taking only the first term in the two summations on the right-hand side of Eq. 3.6. Since the gPC basis functions are orthonormal with $\langle\psi_{r}\psi_{s}\rangle=\delta_{rs}$ and $\psi_{1}\equiv 1$, it follows that $\langle\psi_{s}\rangle=\delta_{1s}$ and $G_{0}=H_{1}=I_{n_{\xi}}$. The matrices $\bm{A}_{0}$ and $\bm{N}(\vec{w}_{h,1}^{k})$ are constructed from the mean of $\nu$ and $\vec{w}_{h}^{k}$,

[TABLE]
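The identities $\langle\psi_{r}\psi_{s}\rangle=\delta_{rs}$ and $\langle\psi_{s}\rangle=\delta_{1s}$ can be checked numerically. The sketch assumes the setting of Section 6.1: independent $\xi_{l}$ uniform on $[-\sqrt{3},\sqrt{3}]$ and products of normalized Legendre polynomials with total degree at most $d_{\psi}$. Expectations are computed with tensor Gauss–Legendre quadrature, which is exact here because each one-dimensional integrand has degree at most $2d_{\psi}$.

```python
import itertools
import math

import numpy as np
from numpy.polynomial import legendre


def total_degree_indices(m, d):
    """Multi-indices of m-variate polynomials with total degree <= d, constant term first."""
    idx = [a for a in itertools.product(range(d + 1), repeat=m) if sum(a) <= d]
    return sorted(idx, key=sum)


def psi_1d(k, xi):
    """Orthonormal Legendre polynomial of degree k for xi ~ U[-sqrt(3), sqrt(3)]."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(xi / np.sqrt(3.0), c)


m, d_psi = 3, 3
n_xi = math.comb(m + d_psi, m)
indices = total_degree_indices(m, d_psi)

t, w = legendre.leggauss(d_psi + 1)          # exact for 1D polynomials of degree <= 2 d_psi + 1
xi_1d, w_1d = np.sqrt(3.0) * t, w / 2.0      # nodes and expectation weights for U[-sqrt3, sqrt3]
grid = np.array(list(itertools.product(range(t.size), repeat=m)))
weights = np.prod(w_1d[grid], axis=1)        # tensor-product quadrature weights

Psi = np.array([np.prod([psi_1d(a[l], xi_1d[grid[:, l]]) for l in range(m)], axis=0)
                for a in indices])            # Psi[r, q] = psi_r at quadrature node q
G = (Psi * weights) @ Psi.T                   # Gram matrix <psi_r psi_s>
mean = Psi @ weights                          # <psi_s>
```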

The matrix $\mathbb{F}_{0}^{k}$ can be expressed as $I_{n_{\xi}}\otimes(\tau^{-1}\bm{M}+\bm{A}_{0}+\bm{N}(\vec{w}_{h,1}^{k}))$ and this enables use of approximations associated with a deterministic problem. Now, similarly define $\mathbb{F}_{p,0}$ on the pressure space, with

[TABLE]

Let $\mathbb{M}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes\bm{M}$ and $\mathbb{M}_{p}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes M_{p}$. Assuming the validity of Eq. 5.5, it is easy to check that

[TABLE]

On the other hand, let $\mathbb{C}_{p}=-\tau^{-1}C_{n_{t}}\otimes I_{n_{\xi}}\otimes M_{p}$ , so that $\mathbb{C}$ satisfies

[TABLE]

Combining Eq. 5.14 and Eq. 5.15 gives an approximation to $\mathbb{S}_{0}$ ,

[TABLE]

Then the mean-based PCD preconditioner is given as

[TABLE]

where $\mathbb{M}_{*}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes\bm{M}_{*}$ and $\mathbb{M}_{p*}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes{M}_{p*}$ . Similarly from Eq. 5.8, it holds that

[TABLE]

Substituting this approximation of $\mathbb{F}_{p,0}+\mathbb{C}_{p}$ into Eq. 5.17 and replacing the mass matrices with their diagonals gives the mean-based LSC preconditioner

[TABLE]

The two mean-based preconditioners in Eqs. 5.17 and 5.19 have the same form as for the deterministic problem, except for an extra term $\mathbb{C}$ or $\mathbb{C}_{p}$ arising from the all-at-once formulation. Computations associated with the two approximations to the Schur complement are also inexpensive. For example, $(\mathbb{B}\mathbb{M}_{*}^{-1}\mathbb{B}^{T})^{-1}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes(B\bm{M}_{*}^{-1}B^{T})^{-1}$, so applying it only requires solving systems with $B\bm{M}_{*}^{-1}B^{T}$, which is a discrete Laplacian. Multiplications with the mean matrix $\mathbb{F}_{0}+\mathbb{C}$ reduce to multiplications with its components (see Eq. 4.4),

[TABLE]

The matrix $\mathbb{N}_{0}$ is block-diagonal with $\mathbb{N}_{0}^{k}=I_{n_{\xi}}\otimes\bm{N}(\vec{w}_{h,1}^{k})$ and can be expressed as a sum of Kronecker products of matrices as discussed in Section 4.3,

[TABLE]
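The identity $(\mathbb{B}\mathbb{M}_{*}^{-1}\mathbb{B}^{T})^{-1}=I_{n_{t}}\otimes I_{n_{\xi}}\otimes(B\bm{M}_{*}^{-1}B^{T})^{-1}$ means that one small factorization serves all $n_{t}n_{\xi}$ pressure blocks. A minimal dense sketch, assuming the unknowns are ordered (time step, gPC index, pressure degree of freedom) in row-major fashion and using random stand-ins for $B$ and the lumped mass matrix; `make_block_laplacian_solver` is a hypothetical name.

```python
import numpy as np
import scipy.linalg as sla


def make_block_laplacian_solver(B, m_diag, n_t, n_xi):
    """Return r -> (I_{n_t} kron I_{n_xi} kron (B M_*^{-1} B^T))^{-1} r.

    The n_p x n_p matrix is factored once; each contiguous length-n_p block of r
    is one pressure field (one time step and one gPC index).
    """
    Ap = (B / m_diag) @ B.T
    chol = sla.cho_factor(Ap)          # SPD when B has full row rank

    def solve(r):
        R = r.reshape(n_t * n_xi, -1).T          # one column per (time step, gPC index)
        return sla.cho_solve(chol, R).T.reshape(-1)

    return solve


rng = np.random.default_rng(0)
n_t, n_xi, n_u, n_p = 4, 3, 12, 5
B = rng.standard_normal((n_p, n_u))
m_diag = rng.uniform(0.5, 2.0, n_u)
r = rng.standard_normal(n_t * n_xi * n_p)
x = make_block_laplacian_solver(B, m_diag, n_t, n_xi)(r)

# Dense reference (tiny sizes only).
big = np.kron(np.eye(n_t), np.kron(np.eye(n_xi), (B / m_diag) @ B.T))
```

In tensor train format the same identity applies the small inverse to the spatial core only, so the ranks are unchanged.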

5.3 System solve with $\mathbb{F}+\mathbb{C}$

The application of the preconditioner $\mathscr{P}^{-1}$ in Eq. 5.2 also involves solving a linear system associated with the (1,1) block $\mathbb{F}+\mathbb{C}$ . For approximation, we replace it with the mean matrix $\mathbb{F}_{0}+\mathbb{C}$ , and solve a system of the form

[TABLE]

For such a system it is easy to compute matrix-vector products and we again use a low-rank GMRES method for solving the system. This inner GMRES solver is preconditioned with

[TABLE]

where $\vec{w}_{h,1}^{\text{avg}}$ is the average of $\vec{w}_{h,1}^{k}$ over all time steps. For small time step $\tau$ , the contribution from the mass matrix, $\tau^{-1}\bm{M}$ , becomes dominant and $\mathscr{M}$ forms a good approximation to the coefficient matrix $\mathbb{F}_{0}+\mathbb{C}$ . The application of $\mathscr{M}^{-1}$ is also conveniently reduced to computations associated with smaller matrices. We note that Eq. 5.22 need not be solved accurately. In particular, with a stopping criterion $\|\bm{y}-(\mathbb{F}_{0}+\mathbb{C})\bm{v}\|_{2}\leq tol\|\bm{y}\|_{2}$ , a relatively large stopping tolerance, e.g., $tol=10^{-1}$ , will suffice for the mean-based preconditioner $\mathscr{P}$ to be effective.
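Equation 5.23 is not reproduced here, so the following is only a hypothetical illustration of why a Kronecker-product preconditioner of this kind is cheap and why it improves as $\tau$ decreases. We assume (our assumption, not the paper's stated form) that $\mathbb{C}=-\tau^{-1}C_{n_{t}}\otimes I_{n_{\xi}}\otimes\bm{M}$ with $C_{n_{t}}$ the subdiagonal shift, and take $\mathscr{M}=(I_{n_{t}}-C_{n_{t}})\otimes I_{n_{\xi}}\otimes K$ with $K=\tau^{-1}\bm{M}+\bm{A}_{0}+\bm{N}(\vec{w}_{h,1}^{\text{avg}})$. Its inverse is a cumulative sum over time followed by solves with one small matrix, and for constant $\vec{w}$ the gap to $\mathbb{F}_{0}+\mathbb{C}$ is $C_{n_{t}}\otimes I_{n_{\xi}}\otimes(\bm{A}_{0}+\bm{N})$, which is independent of $\tau$ while $\|\mathbb{F}_{0}+\mathbb{C}\|$ grows like $\tau^{-1}$.

```python
import numpy as np
import scipy.linalg as sla


def apply_kron_prec_inverse(K, n_t, n_xi, r):
    """Apply ((I - C_{n_t}) kron I_{n_xi} kron K)^{-1} to r, ordered (time, gPC, space)."""
    n = K.shape[0]
    R = np.cumsum(r.reshape(n_t, n_xi * n), axis=0)   # (I - C)^{-1} is lower triangular, all ones
    lu = sla.lu_factor(K)                              # one factorization reused for every block
    X = sla.lu_solve(lu, R.reshape(n_t * n_xi, n).T).T
    return X.reshape(-1)


def rel_gap(tau, M, A, n_t, n_xi):
    """||P - (F_0 + C)||_F / ||F_0 + C||_F under the assumed forms, with constant w."""
    C = np.eye(n_t, k=-1)
    K = M / tau + A
    I_xi = np.eye(n_xi)
    F0C = np.kron(np.eye(n_t), np.kron(I_xi, K)) - np.kron(C, np.kron(I_xi, M)) / tau
    P = np.kron(np.eye(n_t) - C, np.kron(I_xi, K))
    return np.linalg.norm(P - F0C) / np.linalg.norm(F0C)


rng = np.random.default_rng(0)
n_t, n_xi, n = 5, 2, 6
M = np.diag(rng.uniform(1.0, 2.0, n))                    # stand-in mass matrix
A = 2.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))  # stand-in for A_0 + N(w_avg)
K = M / 1e-2 + A
r = rng.standard_normal(n_t * n_xi * n)
x = apply_kron_prec_inverse(K, n_t, n_xi, r)
P = np.kron(np.eye(n_t) - np.eye(n_t, k=-1), np.kron(np.eye(n_xi), K))
gaps = [rel_gap(tau, M, A, n_t, n_xi) for tau in (1e-1, 1e-2, 1e-3)]
```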

Remark 5.1.

*For systems like Eq. 5.22, a block diagonal preconditioner ($\mathscr{M}=\mathbb{F}_{0}$) was studied in [20], where it was shown that preconditioned GMRES converges very slowly before a sharp drop in the residual occurs when the number of iterations reaches $n_{t}$, which is equal to the number of diagonal blocks. In numerical experiments, we found that the preconditioner in Eq. 5.23 is more effective than a block diagonal one, for which performance deteriorates as $\tau$ becomes smaller.*

6 Numerical experiments

6.1 Benchmark problem

Consider a flow around a symmetric step where the spatial domain $\mathcal{D}$ is a two-dimensional rectangular duct with a symmetric expansion (see Fig. 6.1). The Dirichlet inflow boundary conditions at $(-1,x_{2})$ , $|x_{2}|\leq 0.5$ are deterministic and time-dependent, growing from zero to a steady parabolic profile,

[TABLE]

Neumann boundary conditions $\nu{\partial u_{x_{1}}}/{\partial x_{1}}=p$ , ${\partial u_{x_{2}}}/{\partial x_{1}}=0$ are imposed at the outflow boundary $(12,x_{2})$ , $|x_{2}|\leq 1$ , and no-flow conditions $\vec{u}=\vec{0}$ at the fixed walls $(x_{1},\pm 1)$ , $0\leq x_{1}\leq 12$ ; $(x_{1},\pm 0.5)$ , $-1\leq x_{1}\leq 0$ ; $(0,x_{2})$ , $0.5\leq|x_{2}|\leq 1$ . The initial conditions are zero everywhere for both $\vec{u}$ and $p$ . The Taylor–Hood spatial discretization with biquadratic basis functions for the velocity space and bilinear basis functions for the pressure space is defined on a uniform grid of square elements with mesh size $h$ , and it is constructed using the IFISS software package [26].

The stochastic viscosity $\nu(x,\xi)$ is represented as a truncated KL expansion

[TABLE]

The constants $\nu_{0}$ and $\nu_{0}\sigma$ represent the mean and the standard deviation of the stochastic field. We use an exponential covariance function $c(x,y)=\exp(-\|x-y\|_{1}/b)$ , where $b$ is the correlation length. The pair $(\beta_{l},a_{l}(x))$ is the $l$ th largest eigenvalue and the corresponding eigenfunction of $c(x,y)$ , satisfying

[TABLE]

This can be computed with a standard finite element method. The random variables $\{\xi_{l}\}_{l=1}^{m}$ are assumed to be independent and each of them uniformly distributed on the interval $[-\sqrt{3},\sqrt{3}]$ , so they have zero means and unit variances. For the stochastic Galerkin method, the basis functions $\{\psi_{r}\}_{r=1}^{n_{\xi}}$ are $m$ -dimensional Legendre polynomials, with total degrees bounded by $d_{\psi}$ . Then the number of stochastic basis functions is $n_{\xi}=(m+d_{\psi})!/(m!d_{\psi}!)$ . In the numerical experiments, unless otherwise stated, the parameter values associated with the discrete problem are chosen as in Table 6.1. This gives a problem with dimensions $n_{t}=64$ , $n_{\xi}=20$ , $n_{u}=2992$ , $n_{p}=461$ , and $n_{t}n_{\xi}(n_{u}+n_{p})=4419840$ . All computations are done in MATLAB 9.4.0 (R2018a) on a desktop with 64 GB memory.
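The problem dimensions quoted above follow from Table 6.1 and can be checked with a few lines of Python. Here $n_{u}$ and $n_{p}$ are taken from the text, since they depend on the IFISS grid, and $n_{t}=t_{f}/\tau$ is assumed.

```python
from math import comb, sqrt

t_f, tau = 1.0, 2.0 ** -6         # final time and time step (Table 6.1)
m, d_psi = 3, 3                   # number of random variables and total-degree bound
n_t = round(t_f / tau)            # assumes n_t = t_f / tau time steps
n_xi = comb(m + d_psi, m)         # (m + d_psi)! / (m! d_psi!)
n_u, n_p = 2992, 461              # spatial sizes reported in the text for h = 2^-2
total = n_t * n_xi * (n_u + n_p)

a, b = -sqrt(3.0), sqrt(3.0)      # support of each xi_l
mean_xi = (a + b) / 2             # mean of U[a, b]
var_xi = (b - a) ** 2 / 12        # variance of U[a, b]
```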

6.2 Inexact Picard method

The main computational cost associated with Picard’s method is to solve an all-at-once system Eq. 3.14 at each step. In Section 4 we discussed how to construct low-rank approximate solutions in tensor train format with much cheaper computations. To further reduce the cost, we adopt the idea of the inexact Picard method [4], in which the linear systems are solved inexactly to avoid unnecessary computational work. Let Eq. 3.14 be denoted as $\mathscr{L}\bm{z}^{(i)}=\bm{r}^{(i-1)}$, and define the residual norm $\|\bm{s}_{k}\|_{2}=\|\bm{r}^{(i-1)}-\mathscr{L}\bm{z}^{(i)}_{k}\|_{2}$ for an approximate solution $\bm{z}^{(i)}_{k}$. It was shown in [4] that if the stopping criterion for the linear solve (line 2 of Algorithm 4.2) is given as

[TABLE]

then Picard’s method converges as long as $tol_{\text{gmres}}<1$. This is especially helpful for our low-rank GMRES method. The best accuracy that the low-rank GMRES method can achieve is related to the truncation tolerance $\epsilon_{\text{gmres}}$ used in the algorithm (see Fig. 6.2a). A relaxed stopping tolerance not only reduces the number of GMRES iterations, but it also allows use of larger truncation tolerances for tensor rank compressions, resulting in smaller ranks for the iterates and more efficient computations in the iterative solver. In the numerical tests, we set $tol_{\text{gmres}}=10^{-1}$ and $\epsilon_{\text{gmres}}=10^{-2}\,tol_{\text{gmres}}=10^{-3}$. The same tolerances are used for solving the linear system Eq. 5.22 required for the preconditioning operation. For the initial $\bm{u}^{(0)}$, $\bm{p}^{(0)}$, the Stokes problem is solved to satisfy $\|\bm{s}_{k}\|_{2}\leq tol_{\text{gmres}}\|\bm{f}\|_{2}$, where $\bm{f}$ is the right-hand side of Eq. 3.13.
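The effect of a loose inner tolerance can be tried on a toy problem. The sketch below is not the stochastic all-at-once system: it is a one-dimensional Burgers-type equation $-\nu u''+uu'=1$ discretized with central differences, solved by Picard's method in correction form with a plain dense GMRES written out for self-containment. We assume the stopping rule is the relative criterion $\|\bm{s}_{k}\|_{2}\leq tol_{\text{gmres}}\|\bm{r}^{(i-1)}\|_{2}$; the function names are hypothetical.

```python
import numpy as np


def gmres_rel(A, b, rtol, maxit=200):
    """Unrestarted GMRES from x0 = 0; stops when ||b - A x||_2 <= rtol * ||b||_2."""
    beta = np.linalg.norm(b)
    V = np.zeros((b.size, maxit + 1))
    H = np.zeros((maxit + 1, maxit))
    V[:, 0] = b / beta
    for j in range(maxit):
        w = A @ V[:, j]
        for i in range(j + 1):                      # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        if np.linalg.norm(H[:j + 2, :j + 1] @ y - e1) <= rtol * beta or H[j + 1, j] == 0.0:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :j + 1] @ y, j + 1


def inexact_picard(A, D, f, tol_gmres, tol_picard=1e-10, max_steps=100):
    """Picard iteration for (A + diag(u) D) u = f in correction form, with inexact solves."""
    u = np.zeros_like(f)
    r = f.copy()                                    # nonlinear residual at u = 0
    inner = 0
    for step in range(1, max_steps + 1):
        L = A + np.diag(u) @ D                      # convection frozen at the previous iterate
        du, its = gmres_rel(L, r, tol_gmres)        # ||r - L du|| <= tol_gmres * ||r||
        inner += its
        u = u + du
        r = f - (A + np.diag(u) @ D) @ u
        if np.linalg.norm(r) <= tol_picard * np.linalg.norm(f):
            return u, step, inner
    raise RuntimeError("Picard iteration did not converge")


# Toy problem: -nu u'' + u u' = 1 on (0, 1), u(0) = u(1) = 0.
n, nu = 50, 0.5
h = 1.0 / (n + 1)
A = nu / h**2 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
D = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2 * h)
f = np.ones(n)
u, steps, inner = inexact_picard(A, D, f, tol_gmres=1e-1)
u_ref, steps_ref, inner_ref = inexact_picard(A, D, f, tol_gmres=1e-8)
```

Comparing `steps`/`inner` with `steps_ref`/`inner_ref` shows the trade-off between outer and inner work for this toy problem; both runs reach the same nonlinear solution.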

6.3 Numerical results

In the following, we examine the performance of the proposed low-rank algorithm in different settings. The choices of stopping and truncation tolerances are summarized in Table 6.2. In Algorithm 3.1, the stopping criterion for Picard’s method is

[TABLE]

We set $tol_{\text{picard}}=10^{-5}$ . A small truncation tolerance $\epsilon_{\text{soln}}=10^{-7}$ is used to produce low-rank approximate solutions $\bm{u}^{(i)}$ and $\bm{p}^{(i)}$ in Eq. 4.8. It is shown in Fig. 6.2b that, like the exact method, the inexact Picard method still exhibits a linear convergence rate. It takes 5 Picard steps to reach the required accuracy. Figure 6.3 shows the tensor train ranks $\kappa_{1}$ and $\kappa_{2}$ of the iterates at each Picard step. As the Picard iteration converges, the right-hand side of Eq. 6.4 becomes smaller, and the corrections $\delta\bm{u}^{(i)}$ and $\delta\bm{p}^{(i)}$ computed from the low-rank GMRES method have increasing ranks. On the other hand, for the approximate solutions $\bm{u}^{(i)}$ and $\bm{p}^{(i)}$ , their ranks drop to smaller values in the latter steps of the iteration. With a more stringent $tol_{\text{picard}}$ , a smaller $\epsilon_{\text{soln}}$ is required and the approximate solutions have slightly higher ranks than those shown in Fig. 6.3b. Also shown in Fig. 6.3b are the tensor train ranks of $\tilde{\bm{u}}^{(i)}$ for constructing the approximate convection matrices using Eq. 4.13. They have much smaller values than the ranks of $\bm{u}^{(i)}$ .

We demonstrate the savings obtained from the inexact solves. Table 6.3 shows the performance of Picard’s method if different stopping tolerances are used in Eq. 6.4. With a larger $tol_{\text{gmres}}$ , the number of Picard steps does not increase, while the total number of GMRES iterations and the associated computational costs are greatly reduced.

We compare the two mean-based preconditioners discussed in Section 5. Figure 6.4 shows the number of GMRES iterations required at each Picard step, and the associated computational costs when the two preconditioners are used. For two different mesh sizes, the PCD preconditioner results in larger numbers of GMRES iterations, and thus higher computational times, than the LSC preconditioner. It should also be noted that for both preconditioners, only a small number of GMRES iterations is needed for solving the linear system at each Picard step. This is partially due to the large stopping tolerance used in Eq. 6.4. The LSC preconditioner will be used for the numerical tests below.

In the following, we test the algorithm with several variants of the benchmark problem determined by various values of parameters associated with it. Figure 6.5a shows the solution ranks and computational times for three different values of $\sigma$. When $\sigma$ is smaller, the standard deviation $\nu_{0}\sigma$ of the viscosity is smaller, the discrete solution can be approximated by a tensor train with smaller ranks, and solving the nonlinear problem is also less expensive. On the other hand, even for $\sigma=0.1$, the low-rank solution takes much less storage than a full tensor. For example, the ranks of the approximate solution $\bm{u}^{(i)}$ are $\kappa_{1}=13$, $\kappa_{2}=83$. The ratio of storage requirements between such a tensor train and a full tensor is

[TABLE]
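The storage ratio can be estimated directly from the quoted ranks. The sketch assumes the velocity tensor train has cores of sizes $n_{t}\times\kappa_{1}$, $\kappa_{1}\times n_{\xi}\times\kappa_{2}$ and $\kappa_{2}\times n_{u}$, in the (time, stochastic, spatial) order used in Section 4.3, and uses the default sizes from Section 6.1.

```python
n_t, n_xi, n_u = 64, 20, 2992        # default problem sizes (Section 6.1)
k1, k2 = 13, 83                      # TT ranks of u^(i) quoted for sigma = 0.1

tt_storage = n_t * k1 + k1 * n_xi * k2 + k2 * n_u   # entries in the three TT cores
full_storage = n_t * n_xi * n_u                     # entries in the full coefficient tensor
ratio = tt_storage / full_storage
```

Under these assumptions the tensor train needs 270748 entries instead of 3829760, a ratio of about 7%.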

The same quantities are plotted in Fig. 6.5b for different values of the mean viscosity $\nu_{0}$ . The ranks and computational times are not significantly affected by $\nu_{0}$ .

Finally, the algorithm is applied to solve discrete problems with various mesh sizes $h$ or time step sizes $\tau$. It can be seen from Fig. 6.6a that there is only a slight increase in the solution ranks as the spatial mesh is refined. It is also shown in Fig. 6.6a that the computational time increases with an asymptotic rate $O(h^{-2})$ (note that a logarithmic scale is used in the figure). Since the numbers of spatial unknowns in two dimensions also grow like $O(h^{-2})$, this means that, as the spatial mesh is refined, no extra computational burden is introduced beyond the increased problem size. For different time step sizes $\tau$, the computational time increases much more slowly than $O(\tau^{-1})$ (see Fig. 6.6b). This is because in $\mathbb{F}+\mathbb{C}$ the matrices obtained from time discretization are very simple (e.g., $I_{n_{t}}$ and $C_{n_{t}}$), and thus an increase in $n_{t}$ does not make a significant impact on the computational costs.

7 Conclusions

In this paper, we developed and studied efficient low-rank iterative methods for solving the time-dependent Navier–Stokes equations with a random viscosity. We considered an all-at-once formulation where the discrete solutions at all the time steps are solved together in a single system. To address the high storage and computational costs of this strategy, we used low-rank tensor approximations in a Newton–Krylov type algorithm. For the all-at-once system, we proposed two mean-based preconditioners using results from the deterministic problem. The computational costs were further reduced with the inexact Picard method and approximate convection matrices. The numerical experiments showed that the low-rank method is able to solve the nonlinear problem efficiently and that the discrete solutions have small tensor ranks.

Bibliography


[1] R. Andreev and C. Tobler, Multilevel preconditioning and low-rank tensor iteration for space-time simultaneous discretizations of parabolic PDEs, Numerical Linear Algebra with Applications, 22 (2015), pp. 317–337.
[2] J. Ballani and L. Grasedyck, A projection method to solve linear systems in tensor format, Numerical Linear Algebra with Applications, 20 (2013), pp. 27–43.
[3] P. Benner, S. Dolgov, A. Onwunta, and M. Stoll, Solving optimal control problems governed by random Navier–Stokes equations using low-rank methods, Mar. 2017, https://arxiv.org/abs/1703.06097.
[4] P. Birken, Termination criteria for inexact fixed-point schemes, Numerical Linear Algebra with Applications, 22 (2015), pp. 702–716.
[5] S. V. Dolgov, TT-GMRES: solution to a linear system in the structured tensor format, Russian Journal of Numerical Analysis and Mathematical Modelling, 28 (2013), pp. 149–172.
[6] S. V. Dolgov and D. V. Savostyanov, Alternating minimal energy methods for linear systems in higher dimensions, SIAM Journal on Scientific Computing, 36 (2014), pp. A2248–A2271.
[7] H. Elman, M. Mihajlović, and D. Silvester, Fast iterative solvers for buoyancy driven flow problems, Journal of Computational Physics, 230 (2011), pp. 3900–3914.
[8] H. C. Elman, D. J. Silvester, and A. J. Wathen, Finite Elements and Fast Iterative Solvers: With Applications in Incompressible Fluid Dynamics, Oxford University Press, UK, second ed., 2014.
