A stable parareal-like method for the second order wave equation

Hieu Nguyen; Richard Tsai

arXiv:1905.00473·math.NA·January 29, 2020·J. Comput. Phys.


TL;DR

This paper introduces a parallel-in-time iterative method for the second-order wave equation that combines coarse and fine propagators, utilizing a data-driven stabilization strategy to improve efficiency and accuracy.

Contribution

It develops a stable parareal-like method with a data-driven coupling strategy for the wave equation, enhancing parallel efficiency and solution stability.

Findings

01. Effective in stabilizing the solution process

02. Demonstrated improved performance on Marmousi model

03. Allows larger time steps with maintained accuracy

Abstract

A new parallel-in-time iterative method is proposed for solving the homogeneous second-order wave equation. The new method involves a coarse scale propagator, allowing for larger time steps, and a fine scale propagator which fully resolves the medium using a finer spatial grid and shorter time steps. The fine scale propagator is run in parallel for short time intervals. The two propagators are coupled in an iterative way that resembles the standard parareal method developed by Lions, Maday and Turinici. We present a data-driven strategy in which the computed data gathered from each iteration are re-used to stabilize the coupling by minimizing the energy residual of the fine and coarse propagated solutions. An example using the Marmousi model is provided to demonstrate the performance of the proposed method.


Table 1. Computing time (in seconds) of each part of our algorithm. The number of parareal iterations is 4. The projected speed-up is calculated as if the number of CPUs were equal to the number of time slices. In the projected speed-up calculation, we assume that the time to create the phase corrector does not change when the number of CPUs increases.

| Cores | N = T/Δt_com | Parallel computation (k = 1, 2, 3, 4) | Creating corrector (k = 1, 2, 3, 4) | Serial update (k = 1, 2, 3, 4) | Serial fine computation | Projected speed-up (after k = 1, 2, 3, 4 iterations) |
|---|---|---|---|---|---|---|
| 4 | 20 | 186.81, 180.84, 180.31, 180.35 | 0.26, 0.11, 0.13, 0.15 | 1.99, 1.52, 1.49, 1.52 | 334.49 | 8.44, 4.32, 2.91, 2.18 |
| 4 | 100 | 908.74, 922.19, 911.85, 921.26 | 0.48, 0.49, 0.64, 0.74 | 8.64, 8.46, 8.72, 8.71 | 1724.30 | 37.92, 18.88, 12.57, 9.40 |
| 4 | 200 | 1807.23, 1837.55, 1819.07, 1854.84 | 0.81, 1.06, 3.14, 1.77 | 17.13, 17.90, 17.82, 18.98 | 3479.63 | 64.34, 31.69, 20.82, 15.47 |
| 8 | 20 | 119.61, 119.06, 129.25, 119.51 | 0.95, 0.1, 0.11, 0.14 | 1.78, 1.41, 1.43, 1.47 | 333.78 | 7.83, 3.98, 2.60, 1.96 |
| 8 | 100 | 525.09, 547.06, 543.50, 527.34 | 0.46, 0.49, 0.57, 0.73 | 8.27, 7.78, 8.31, 8.33 | 1725.16 | 35.12, 17.34, 11.49, 8.63 |
| 8 | 200 | 1023.37, 1050.60, 1032.15, 1029.58 | 0.99, 1.08, 1.39, 1.94 | 16.26, 16.53, 17.80, 20.47 | 3491.95 | 60.02, 29.64, 19.58, 14.43 |
| 20 | 20 | 73.40, 65.93, 63.73, 66.45 | 0.26, 0.11, 0.11, 0.13 | 2.41, 1.51, 1.53, 1.58 | 332.83 | 4.37, 2.32, 1.59, 1.46 |
| 20 | 100 | 337.38, 331.03, 335.75, 345.86 | 0.46, 0.46, 0.58, 0.72 | 8.00, 8.18, 8.34, 8.76 | 1734.02 | 22.84, 11.50, 7.64, 5.67 |
| 20 | 200 | 656.80, 644.41, 655.77, 653.80 | 1.06, 1.03, 1.35, 1.73 | 16.66, 17.73, 17.39, 19.35 | 3492.86 | 41.88, 20.96, 13.92, 10.35 |
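The projected speed-up column can be reproduced, to within about 0.01 because the tabulated times are rounded, by the short Python sketch below. It is our reconstruction, not the authors' script. We assume that the measured parallel time is divided by the number of rounds `ceil(N / cores)` of fine solves per iteration, that the corrector and serial-update times are used as measured (as the caption states for the corrector), and that the speed-up after $k$ iterations is the serial fine time divided by the accumulated time of iterations $1,\dots,k$. The function name `projected_speedup` is hypothetical.

```python
import math


def projected_speedup(parallel, corrector, serial_update, serial_fine, n_slices, n_cores):
    """Cumulative projected speed-up after each parareal iteration.

    Assumption (inferred from Table 1, not stated in the paper): the measured
    parallel time on n_cores is rescaled as if every time slice had its own CPU,
    i.e. divided by ceil(n_slices / n_cores) rounds of fine solves.
    """
    rounds = math.ceil(n_slices / n_cores)
    elapsed = 0.0
    speedups = []
    for p, c, s in zip(parallel, corrector, serial_update):
        elapsed += p / rounds + c + s          # time of this iteration, projected
        speedups.append(serial_fine / elapsed)  # serial fine time / time so far
    return speedups


# Row "Cores = 8, N = 20" of Table 1.
row = projected_speedup([119.61, 119.06, 129.25, 119.51], [0.95, 0.1, 0.11, 0.14],
                        [1.78, 1.41, 1.43, 1.47], 333.78, n_slices=20, n_cores=8)
print([round(s, 2) for s in row])
```

The division by `ceil(N / cores)` is what makes the 8-core rows consistent, since 20 and 100 are not multiples of 8.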

Equations

u_{tt} = c^{2}(x)\,\Delta u, \quad x \in [0,1)^{d},\ 0 \leq t < T,

u(x,0) = u_{0}(x),

u_{t}(x,0) = p_{0}(x).

v_{n+1}^{k+1}

v_{0}^{1}

u_{n}^{k} = P\big[\mathcal{C}u_{n-1}^{k} + \big(\mathcal{F}u_{n-1}^{k-1} - \mathcal{C}u_{n-1}^{k-1}\big)\big],

u_{n}^{k} = \big(\mathcal{C}(I-P^{k})u_{n-1}^{k} + \mathcal{F}P^{k}u_{n-1}^{k}\big) + \mathcal{F}u_{n-1}^{k-1} - \big(\mathcal{C}(I-P^{k})u_{n-1}^{k-1} + \mathcal{F}P^{k}u_{n-1}^{k-1}\big),

\mathcal{C}(I-P^{k}) + \mathcal{F}P^{k}.

u_{n+1}^{k+1} = \mathcal{C}u_{n}^{k+1} + \mathcal{F}u_{n}^{k} - \mathcal{C}u_{n}^{k}.

e_{n}^{k+1} \leq \|\mathcal{F}-\mathcal{C}\|_{\infty}\sum_{i=1}^{n-k-1}\|\mathcal{C}\|_{\infty}^{i}\,e_{n}^{k}.

C_{parareal} = K\Big(\frac{T}{\Delta t} + \frac{T}{n_{CPU}\,\delta t}\Big).

u_{n+1}^{k+1} = \theta_{n+1}^{k}\big[\tilde{\mathcal{C}}u_{n}^{k+1}\big] + \tilde{\mathcal{F}}u_{n}^{k} - \theta_{n+1}^{k}\big[\tilde{\mathcal{C}}u_{n}^{k}\big].

\theta_{n+1}^{k} \approx \tilde{\mathcal{F}}\tilde{\mathcal{C}}^{-1}:\ \tilde{\mathcal{C}}u \mapsto \tilde{\mathcal{F}}u.

[u_{n},\dot{u}_{n}] := \mathcal{F}[u_{n-1}^{k},\dot{u}_{n-1}^{k}]

[U_{n},\dot{U}_{n}] := \mathcal{C}[\mathcal{R}u_{n-1}^{k},\mathcal{R}\dot{u}_{n-1}^{k}].

\left[\begin{array}{c}v\\ w\end{array}\right]

\mathsf{F}:=\left[\begin{array}{cccc}\nabla_{h}\mathcal{R}u_{1}&\nabla_{h}\mathcal{R}u_{2}&\cdots&\nabla_{h}\mathcal{R}u_{N}\\ c^{-1}\mathcal{R}\dot{u}_{1}&c^{-1}\mathcal{R}\dot{u}_{2}&\cdots&c^{-1}\mathcal{R}\dot{u}_{N}\end{array}\right],

\mathsf{G}:=\left[\begin{array}{cccc}\nabla_{h}U_{1}&\nabla_{h}U_{2}&\cdots&\nabla_{h}U_{N}\\ c^{-1}\dot{U}_{1}&c^{-1}\dot{U}_{2}&\cdots&c^{-1}\dot{U}_{N}\end{array}\right].

E([U_{n},\dot{U}_{n}]) := \frac{1}{2}\sum_{j=1}^{N_{\Delta x}}|\nabla_{h}U_{n}(x_{j})|^{2}\Delta x^{d} + \frac{1}{2}\sum_{j=1}^{N_{\Delta x}}c_{j}^{-2}|\dot{U}_{n}(x_{j})|^{2}\Delta x^{d}.

\|\mathsf{G}\|_{F}^{2} = \sum_{n=1}^{N}\Big[\sum_{j=1}^{N_{\Delta x}}|\nabla_{h}U_{n}(x_{j})|^{2} + \sum_{j=1}^{N_{\Delta x}}c_{j}^{-2}|\dot{U}_{n}(x_{j})|^{2}\Big] = \frac{2}{\Delta x^{d}}\sum_{n=1}^{N}E([U_{n},\dot{U}_{n}]).

\min_{\Omega\in\mathbb{R}^{(d+1)N_{\Delta x}\times(d+1)N_{\Delta x}}}\ \sum_{j=1}^{N}\|f_{j}-\Omega g_{j}\|_{2}^{2}, \quad \text{s.t. } \Omega\Omega^{T}=\Omega^{T}\Omega=I.

\|f_{j}-\Omega g_{j}\|_{2}^{2} = \|f_{j}\|_{2}^{2} + \|g_{j}\|_{2}^{2} - 2\,(f_{j},\Omega g_{j}),

\min_{\Omega\in\mathbb{R}^{(d+1)N_{\Delta x}\times(d+1)N_{\Delta x}}}\ \|\mathsf{F}-\Omega\mathsf{G}\|_{F}^{2}, \quad \text{s.t. } \Omega\Omega^{T}=\Omega^{T}\Omega=I,

\mathsf{M} := \mathsf{F}\mathsf{G}^{T} = \sum_{j=1}^{N}\Big(\nabla_{h}\mathcal{R}u_{j}\otimes\nabla_{h}U_{j} + c^{-1}\mathcal{R}\dot{u}_{j}\otimes c^{-1}\dot{U}_{j}\Big).

\Omega_{*} = \mathsf{X}\mathsf{Y}^{T},

r_{min}^{2} = \|\mathsf{F}\|_{F}^{2} + \|\mathsf{G}\|_{F}^{2} - 2\,\mathrm{trace}(\Sigma).

\mathsf{Q}_{F},\mathsf{Q}_{G}\in\mathbb{R}^{(d+1)N_{\Delta x}\times N}, \quad \mathsf{R}_{F},\mathsf{R}_{G}\in\mathbb{R}^{N\times N}.

\mathsf{M} = \mathsf{Q}_{F}\mathsf{X}_{F}\Sigma\mathsf{Y}_{G}^{T}\mathsf{Q}_{G}^{T}.

\mathrm{rank}(\mathsf{M}) = \mathrm{rank}(\mathsf{R}_{F}\mathsf{R}_{G}^{T}) \leq \min\big(\mathrm{rank}(\mathsf{F}),\mathrm{rank}(\mathsf{G})\big).

\Omega_{*} = (\mathsf{Q}_{F}\mathsf{X}_{F})(\mathsf{Q}_{G}\mathsf{Y}_{G})^{T}.

\Omega_{*} := \Big(\mathsf{Q}_{F}(:,1{:}s)\,\mathsf{X}_{F}(1{:}s,1{:}s)\Big)\Big(\mathsf{Q}_{G}(:,1{:}s)\,\mathsf{Y}_{G}(1{:}s,1{:}s)\Big)^{T}.

\mathsf{M}^{k+1} = \mathsf{M}^{k} + \mathsf{F}^{k}(\mathsf{G}^{k})^{T}.


Full text


Abstract.

A new parallel-in-time iterative method is proposed for solving the homogeneous second-order wave equation. The new method involves a coarse scale propagator, allowing for larger time steps, and a fine scale propagator which fully resolves the medium using a finer spatial grid and shorter time steps. The fine scale propagator is run in parallel for short time intervals. The two propagators are coupled in an iterative way that resembles the standard parareal method [24]. We present a data-driven strategy in which the computed data gathered from each iteration are re-used to stabilize the coupling by minimizing the wave energy residual of the fine and coarse propagated solutions. Several examples, including a wave speed with discontinuities, are provided to demonstrate the effectiveness of the proposed method.

Key words and phrases: parallel-in-time, wave equation, Procrustes problem

Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, TX 78712, USA.

Department of Mathematics and Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, TX 78712, USA and KTH Royal Institute of Technology, Sweden.

1. Introduction

In this paper, we will focus on the initial value problem of the standard second order wave equation:

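$$
\begin{aligned}
u_{tt} &= c^{2}(x)\,\Delta u, && x\in[0,1)^{d},\ 0\leq t<T,\\
u(x,0) &= u_{0}(x),\\
u_{t}(x,0) &= p_{0}(x).
\end{aligned}
\tag{1}
$$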

For boundary conditions, we consider periodic or absorbing boundary conditions, or a perfectly matched layer placed around $[0,1)^{d}$. The wave speed $c(x)$ is given explicitly and is independent of the solution. Our objective is to develop a stable parallel-in-time algorithm for (1).

The wave equation is a physical model for seismic and electromagnetic waves in certain simplified setups. It is also used as a test case for developing algorithms that are later generalized to the more complicated elastic and electromagnetic wave equations.

Time domain decomposition methods for evolution problems have been of increasing interest in the partial differential equation community due to the increasing number of cores available in modern supercomputers. Despite rapid advances in parallel computer architecture, efficiently parallelizing the time evolution of the second order wave equation is still a challenging problem. One of the time domain decomposition paradigms is the parallel-in-time method, in which the whole time domain $[0,T)$ is partitioned into subintervals for parallel processing. The algorithm most relevant to this paper is the parareal method introduced by Lions, Maday and Turinici [24]. The parareal method iteratively combines two propagators, the fine propagator $\mathcal{F}$ and the coarse propagator $\mathcal{C}$, both of which approximate the solution $v(t_{n+1})$ propagated from $v(t_{n})$. The approximate solution at parareal iteration $k$, denoted by $v^{k}_{n}$, can be described by

$$
v_{n+1}^{k+1}=\mathcal{C}v_{n}^{k+1}+\mathcal{F}v_{n}^{k}-\mathcal{C}v_{n}^{k}.\tag{2}
$$

Note that for each $k$, the fine propagations $\mathcal{F}v_{n}^{k}$ can be computed in parallel over all $n$. For the second order wave equation under consideration, $v(x,t)$ is the vector $[u(x,t),u_{t}(x,t)]$.

Typically, the coarse propagator runs on a coarse grid and is cheap to compute, while the fine propagator runs on a finer grid and is assumed to fully resolve the small scales in the problem; the fine propagator is thus more costly to compute.

In [4], it is shown that the parareal method is stable and converges linearly to the serial fine solution if the coarse propagator is smooth and has sufficient dissipation. When certain conditions are met, the parareal method can achieve a high-fidelity solution within a few iterations. Applications of the parareal method include plasma turbulence in Tokamak reactors [38, 37, 36], the Navier-Stokes equations [14, 40, 11], acoustic waves [28], shallow water [20], chemical kinetics [6], molecular dynamics [9], reaction waves [13], neutron diffusion [5, 26], and the lattice Boltzmann equation for laminar flow [29, 23, 30].

Speed-up in wall-clock time is attained when the coarse propagator is chosen as a spatial coarsening of the fine propagator [35, 33], which allows a larger coarse time step. Indeed, given appropriate grid restriction and interpolation operators, this coarsening technique provides additional speed-up in some applications [25, 3, 23] because the coarse propagator has fewer grid points to compute. However, as shown in [33], sufficient coarse grid resolution and accurate interpolation are required for the parareal iteration (2) to converge.

The parareal method tends to suffer from slow convergence or instability when applied to hyperbolic problems. Using an oscillatory dynamical system as an example, [1, 2, 22] pointed out that the phase error between the coarse and fine propagators is the reason for the slow convergence. Analogously, for advection problems, the authors of [34] observed that numerical dispersion between the solvers makes the parareal method converge from above and hence causes instability. Intuitively, the constructive or destructive interference of two overlapping plane waves depends on their relative phase, which is sensitive to the frequency, yet the parareal iterative coupling (2) is point-wise in space and time.

There have been some attempts to modify the classical parareal method in order to address the slow convergence issue. In [12], the fact that solutions to the wave equation live on a submanifold of constant energy is exploited: the solutions are projected onto this submanifold to stabilize the parareal iterations. More precisely, the algorithm can be written as

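$$
u_{n}^{k}=P\Big[\mathcal{C}u_{n-1}^{k}+\big(\mathcal{F}u_{n-1}^{k-1}-\mathcal{C}u_{n-1}^{k-1}\big)\Big],
$$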

where $P$ denotes the projection onto the constant energy submanifold. However, the projection is obtained by solving nonlinear equations, whose solution can be sensitive to the initial guess.

In the so-called Krylov-subspace enhanced parareal methods [15, 35], computed solution data are used to construct projection operators, which in turn modify the coarse propagator. To obtain the projection operator $P^{k}$, an orthonormal basis is constructed for the subspace spanned by $\{u^{j}_{i}: i=1,2,\dots,n,\ j=1,2,\dots,k-1\}$. Let $S^{k}$ be the matrix whose columns are these orthonormal vectors $s_{\ell}$; then $P^{k}=S^{k}(S^{k})^{T}$. The enhanced parareal algorithm takes the following form:

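$$
u_{n}^{k}=\Big(\mathcal{C}(I-P^{k})u_{n-1}^{k}+\mathcal{F}P^{k}u_{n-1}^{k}\Big)+\mathcal{F}u_{n-1}^{k-1}-\Big(\mathcal{C}(I-P^{k})u_{n-1}^{k-1}+\mathcal{F}P^{k}u_{n-1}^{k-1}\Big),
$$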

where $I$ is the identity. The enhanced coarse propagator corresponds to

$$
\mathcal{C}(I-P^{k})+\mathcal{F}P^{k}.
$$

The fine propagations $\mathcal{F}s_{\ell}$ of the basis vectors $s_{\ell}$ that define $P^{k}$ are precomputed and stored. This precomputation incurs an additional computing cost on top of the orthogonalization of the data matrices.
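To make the enhanced coarse propagator concrete, here is a small NumPy sketch of $\mathcal{C}(I-P^{k})+\mathcal{F}P^{k}$ on a toy linear problem. The random matrices standing in for $\mathcal{F}$ and $\mathcal{C}$ and the helper `enhanced_coarse` are hypothetical. The point is that, once the products $\mathcal{F}s_{\ell}$ are stored, the enhanced propagator reproduces $\mathcal{F}$ exactly on the span of the data and reduces to $\mathcal{C}$ on its orthogonal complement, without any new fine solves.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_data = 8, 3

# Hypothetical linear propagators on one grid: F "accurate", C a cheap perturbation of F.
F = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
C = F + 0.1 * rng.standard_normal((dim, dim))

# Previously computed solutions u_i^j (columns) and an orthonormal basis S^k of their span.
data = rng.standard_normal((dim, n_data))
S, _ = np.linalg.qr(data)
P = S @ S.T            # projector P^k = S^k (S^k)^T
FS = F @ S             # precomputed fine propagations F s_l, stored once


def enhanced_coarse(u):
    """Apply C (I - P^k) + F P^k using the stored F s_l instead of new fine solves."""
    coeffs = S.T @ u                     # coordinates of P^k u in the basis {s_l}
    return C @ (u - S @ coeffs) + FS @ coeffs
```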

The reduced basis parareal method [10] develops more efficient ways to construct the basis vectors and extends the approach to nonlinear equations.

The convergence and stability of these methods are analyzed and demonstrated by numerical examples of constant wave speed media in one and two dimensions. However, in these works, the fine and coarse solvers are assumed to operate on the same spatial grid, and no examples with variable wave speed are presented. In this paper, we consider solvers on different spatial grids and present examples with variable wave speed. We also use the computed data, but in a very different way from [15, 35, 10]; see Section 3.2.

On the other hand, it is known that the slow convergence and instability of the parareal method for hyperbolic problems can be attributed to notions of phase error [2, 1] and to numerical dispersion [34]. In [1], effective multiscale parareal schemes relying on elaborate phase corrections are proposed for a class of highly oscillatory dynamical systems. In [2], we derived a convergence theory for a modified parareal scheme applied to linear systems of ordinary differential equations (ODEs). In that work, we also systematically investigated a few simple phase correction strategies and showed that appropriate phase correction can give the resulting scheme superior performance.

In this paper, we propose a new method based on the idea of the $\theta$-parareal scheme [2]. Instead of decomposing the input data as in [15, 35], we use the computed data to build an operator, formally denoted by $\theta$, that directly brings the coarse solutions, $\mathcal{C}u$, closer to the fine solutions, $\mathcal{F}u$. The $\theta$ operators are constructed by minimizing the residual between the fine and coarse solutions in a semi-norm related to the discrete wave energy.

2. Preliminary background

We briefly review the plain parareal method and its properties. In the context of a linear evolution problem $\dot{u}(t)=Au(t)$ for $t\in\{0,\Delta t,\dots,N\Delta t=T\}$, with $A:\mathbb{R}\to\mathbb{R}$ a linear function, let us denote the fine propagator (solver) by $\mathcal{F}:u_{n}\mapsto u_{n+1}$ and the coarse propagator (solver) by $\mathcal{C}:u_{n}\mapsto u_{n+1}$. Then iteration $k+1$ of the plain parareal method can be written as the recurrence relation

$$
u_{n+1}^{k+1}=\mathcal{C}u_{n}^{k+1}+\mathcal{F}u_{n}^{k}-\mathcal{C}u_{n}^{k}.\tag{3}
$$

The starting iterate ($k=1$) is the serial coarse solution $u^{1}_{n}=\mathcal{C}^{n}u_{0}$. In addition, by rewriting the recurrence relation (3) in matrix form and exploiting the inverse of the resulting Toeplitz structure, the following estimate for the error $e^{k}_{n}:=|u^{k}_{n}-u(t_{n})|$ is derived in [2]:

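$$
e_{n}^{k+1}\leq\|\mathcal{F}-\mathcal{C}\|_{\infty}\sum_{i=1}^{n-k-1}\|\mathcal{C}\|_{\infty}^{i}\,e_{n}^{k}.
$$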

The first factor on the right-hand side, $\|\mathcal{F}-\mathcal{C}\|_{\infty}$, is equivalent to the local truncation error of the coarse propagator, assuming the fine solver is exact. The summation term is bounded above by $N$ for stable schemes, e.g. when $\|\mathcal{C}\|_{\infty}\leq 1$, since it has fewer than $N$ terms, each at most one. The above error estimate is equivalent to the linear convergence analysis of the parareal method derived in [4, 16].
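The recurrence (3) can be tried out on a single harmonic oscillator $u''=-\omega^{2}u$, whose state $[u,u_{t}]$ mimics one Fourier mode of the wave equation. In this minimal sketch (an illustration, not the paper's code), both propagators are velocity Verlet: the fine one takes 50 substeps per slice and the coarse one a single step, so the coarse solution carries a phase error. The helper names `verlet_matrix` and `parareal` are hypothetical. Printing the maximum error of each iterate illustrates the well-known property that iterate $k$ is exact at $t_{0},\dots,t_{k-1}$, independently of how good $\mathcal{C}$ is.

```python
import numpy as np


def verlet_matrix(omega, dt, steps):
    """`steps` velocity-Verlet steps for u'' = -omega**2 u, acting on the state [u, u_t]."""
    a = -omega**2
    one_step = np.array([[1 + 0.5 * a * dt**2, dt],
                         [a * dt * (1 + 0.25 * a * dt**2), 1 + 0.5 * a * dt**2]])
    return np.linalg.matrix_power(one_step, steps)


def parareal(F, C, u0, n_slices, n_iter):
    """Plain parareal (3) for linear propagators F, C; returns iterates k = 1..n_iter."""
    u = [u0]
    for _ in range(n_slices):                 # k = 1: serial coarse solution C^n u0
        u.append(C @ u[-1])
    u = np.array(u)
    history = [u]
    for _ in range(n_iter - 1):
        fine = u[:-1] @ F.T                   # F u_n^k for all n: the parallel step
        coarse = u[:-1] @ C.T                 # C u_n^k (in practice kept from the last sweep)
        new = [u0]
        for n in range(n_slices):             # serial coarse sweep with the correction
            new.append(C @ new[-1] + fine[n] - coarse[n])
        u = np.array(new)
        history.append(u)
    return history


omega, N, Dt = 5.0, 8, 0.25
F = verlet_matrix(omega, Dt / 50, 50)         # fine: 50 substeps per slice
C = verlet_matrix(omega, Dt, 1)               # coarse: one step per slice
u0 = np.array([1.0, 0.0])
ref = np.array([np.linalg.matrix_power(F, n) @ u0 for n in range(N + 1)])
hist = parareal(F, C, u0, N, N + 1)
print([float(np.abs(h - ref).max()) for h in hist])
```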

The wall-clock complexity of the parareal algorithm is estimated by

$$
C_{parareal}=K\Big(\frac{T}{\Delta t}+\frac{T}{n_{CPU}\,\delta t}\Big).
$$

Compared with the complexity of the serial fine solver, $C_{fine}=T/\delta t$, the parareal algorithm is more efficient in terms of total wall-clock computing time if (i) a large number of computing cores, $n_{CPU}$, are used; (ii) the coarse/fine time step ratio is sufficiently large, $\Delta t/\delta t\gg 1$; and (iii) the number of iterations $K$ needed for the desired accuracy is small.
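The cost model above is simple enough to evaluate directly. The sketch below uses hypothetical helper names and counts time steps as unit cost, exactly as the formula does, so the cheaper coarse grid is not credited. It also writes the ratio as $C_{fine}/C_{parareal}=1/\big(K(\delta t/\Delta t+1/n_{CPU})\big)$, which makes conditions (i) to (iii) explicit: the speed-up can never exceed $n_{CPU}/K$ or $\Delta t/(K\delta t)$.

```python
def parareal_cost(T, Dt, dt, n_cpu, K):
    """Wall-clock model C_parareal = K (T/Dt + T/(n_cpu dt)), counted in time steps."""
    return K * (T / Dt + T / (n_cpu * dt))


def fine_cost(T, dt):
    """Serial fine solver cost C_fine = T / dt."""
    return T / dt


def model_speedup(T, Dt, dt, n_cpu, K):
    """C_fine / C_parareal = 1 / (K (dt/Dt + 1/n_cpu)); bounded by n_cpu/K and Dt/(K dt)."""
    return fine_cost(T, dt) / parareal_cost(T, Dt, dt, n_cpu, K)


# Hypothetical numbers: coarse step 10x the fine step, 100 cores, 4 iterations.
print(model_speedup(T=1.0, Dt=0.01, dt=0.001, n_cpu=100, K=4))
```

With these numbers the coarse/fine step ratio, not the core count, is the binding limit.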

The key objective of this paper is to introduce a data-driven strategy to stabilize and improve the efficiency of the parareal iteration.

3. The proposed method

We propose a scheme that takes the general form:

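$$
\mathbf{u}_{n+1}^{k+1}=\theta_{n+1}^{k}\big[\tilde{\mathcal{C}}\mathbf{u}_{n}^{k+1}\big]+\tilde{\mathcal{F}}\mathbf{u}_{n}^{k}-\theta_{n+1}^{k}\big[\tilde{\mathcal{C}}\mathbf{u}_{n}^{k}\big].
$$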

Here, $\mathbf{u}^{k}_{n}$ denotes the solution computed on the grid; it has two component blocks, one corresponding to the wave solution $u$ and the other to its time derivative $u_{t}$. For readability, we shall also use $\dot{u}$ to denote the time derivative of $u$, i.e. $u_{t}\equiv\dot{u}$. The coarse and fine propagators, $\mathcal{C}$ and $\mathcal{F}$, operate on different grids, so additional interpolation and restriction operators are needed to couple them. We use $\tilde{\mathcal{C}}$ and $\tilde{\mathcal{F}}$ to denote the appropriately defined operations, which are described in detail in this section.

A family of operators $\theta^{k}_{n}[\cdot]$ is constructed such that

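$$
\theta_{n+1}^{k}\approx\tilde{\mathcal{F}}\tilde{\mathcal{C}}^{-1}:\ \tilde{\mathcal{C}}\mathbf{u}\mapsto\tilde{\mathcal{F}}\mathbf{u}.
$$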

Clearly, direct calculation of $\tilde{\mathcal{F}}\tilde{\mathcal{C}}^{-1}$ is not practical because it undermines the time parallelization of the $\theta$-parareal method. Instead, we seek an effective mapping that has properties similar to those of $\tilde{\mathcal{F}}\tilde{\mathcal{C}}^{-1}$ and is constructed from data computed during the parareal iterations.

3.1. Discretizations and data preparation

In this paper, we use uniform Cartesian grids for the spatial domain and uniform stepping in time. Both the coarse and the fine propagators are defined by the standard second order central difference scheme for the spatial derivatives and the velocity Verlet scheme for time marching. The coarse propagator operates on the coarse grid $\Delta x\cdot\mathbb{Z}^{d}\times\Delta t\cdot\mathbb{Z}^{+}$, and the fine propagator operates on the fine grid $\delta x\cdot\mathbb{Z}^{d}\times\delta t\cdot\mathbb{Z}^{+}$, for $d=1$ or $2$.

Let $u_{n}\in\mathbb{R}^{N_{\delta x}}$ and $U_{n}\in\mathbb{R}^{N_{\Delta x}}$ denote the solutions computed at time $t_{n}=n\Delta t_{com}$, $n=1,2,\dots,N$, on the fine and coarse grids, respectively, where $N_{\delta x}$ and $N_{\Delta x}$ are the numbers of grid points of the fine and coarse grids.

These fine grid functions $u\in\mathbb{R}^{N_{\delta x}}$ and coarse grid functions $U\in\mathbb{R}^{N_{\Delta x}}$ are coupled by an interpolation $\mathcal{I}:U\mapsto u$ and a restriction $\mathcal{R}:u\mapsto U$. The accuracy of the interpolation method influences the stability of the parareal iteration, as discussed in Section 6.4. The coarse propagator uses the point-wise values of the wave speed, $c(j\Delta x)$, and involves neither averaging of the wave speed nor homogenization of the wave equation. The fine and coarse propagators communicate at the times $n\Delta t_{com}$. The fine propagator uses the step size $\delta t=\Delta t_{com}/m_{\mathcal{F}}$ and the coarse propagator uses $\Delta t=\Delta t_{com}/m_{\mathcal{C}}$, with $m_{\mathcal{F}},m_{\mathcal{C}}\in\mathbb{N}$ selected according to $\delta x$ and $\Delta x$ for the stability of the respective time stepping. See Figure 1.
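For readers who want to experiment, here is a one dimensional sketch of a propagator of the kind described above: second order central differences in space, velocity Verlet in time, periodic boundaries, and $m$ substeps per coupling interval $\Delta t_{com}$. The rule for choosing $m$ is an assumption on our part (the standard CFL bound $c\,\Delta t/\Delta x\leq 1/\sqrt{d}$ with a hypothetical safety factor `cfl`); the text only says that $m_{\mathcal{F}},m_{\mathcal{C}}$ are chosen for stability. The wave speed profile and all function names are hypothetical.

```python
import math

import numpy as np


def n_substeps(dt_com, dx, c_max, d=1, cfl=0.9):
    """Smallest m with c_max * (dt_com / m) / dx <= cfl / sqrt(d).

    Assumed stability rule (CFL bound for second order central differences with
    Verlet stepping); `cfl` < 1 is a hypothetical safety factor.
    """
    return math.ceil(dt_com * c_max * math.sqrt(d) / (cfl * dx))


def laplacian_1d(u, dx):
    """Second order central difference Laplacian on a periodic 1D grid."""
    return (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2


def propagate(u, ut, c, dx, dt_com, m):
    """Advance [u, u_t] of u_tt = c(x)^2 u_xx over dt_com with m velocity Verlet steps."""
    dt = dt_com / m
    acc = c**2 * laplacian_1d(u, dx)
    for _ in range(m):
        u = u + dt * ut + 0.5 * dt**2 * acc      # displacement update
        new_acc = c**2 * laplacian_1d(u, dx)     # acceleration at the new displacement
        ut = ut + 0.5 * dt * (acc + new_acc)     # velocity update with averaged acceleration
        acc = new_acc
    return u, ut


def wave_speed(x):
    """Hypothetical smooth wave speed; each grid samples it pointwise."""
    return 1.0 + 0.5 * np.sin(2 * np.pi * x)


dt_com = 0.1
for n_points in (256, 64):                       # fine grid, then coarse grid
    dx = 1.0 / n_points
    x = np.arange(n_points) * dx
    m = n_substeps(dt_com, dx, wave_speed(x).max())
    u1, ut1 = propagate(np.exp(-100 * (x - 0.5) ** 2), np.zeros(n_points),
                        wave_speed(x), dx, dt_com, m)
    print(n_points, m)
```

The coarse grid needs fewer substeps per coupling interval, which is where the larger coarse time step comes from.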

Given $[u^{k}_{n-1},\dot{u}^{k}_{n-1}]$ at $t_{n-1}$, the fine and coarse propagators are applied to define

$$
[u_{n},\dot{u}_{n}]:=\mathcal{F}[u_{n-1}^{k},\dot{u}_{n-1}^{k}]
$$

and

$$
[U_{n},\dot{U}_{n}]:=\mathcal{C}[\mathcal{R}u_{n-1}^{k},\mathcal{R}\dot{u}_{n-1}^{k}].
$$

For readability, we will use the in-line vector notation $[v,w]$ and the column vector

$$
\begin{bmatrix}v\\ w\end{bmatrix}
$$

interchangeably. These solutions are propagated over the coupling time interval $[t_{n-1},t_{n})$, and both propagators are expected to approximately preserve the wave energy.

Finally, we briefly describe the data matrices that will be used to construct the operators $\theta^{k}_{n}$. We use the computed solution data, particularly the gradient of the wavefield $u$ and a weighted momentum $c^{-1}\dot{u}$. Each column of a data matrix is formed by the block(s) of the gradient of a coarse grid function at the $n$-th coupling time, followed by a block of its weighted momentum; for the coarse solution these blocks are $\nabla U_{n}$ and $c^{-1}\dot{U}_{n}$. In practice, the gradient operator $\nabla$ is replaced by a numerical approximation $\nabla_{h}$. We then define the data matrices

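$$
\mathsf{F}:=\begin{bmatrix}\nabla_{h}\mathcal{R}u_{1}&\nabla_{h}\mathcal{R}u_{2}&\cdots&\nabla_{h}\mathcal{R}u_{N}\\ c^{-1}\mathcal{R}\dot{u}_{1}&c^{-1}\mathcal{R}\dot{u}_{2}&\cdots&c^{-1}\mathcal{R}\dot{u}_{N}\end{bmatrix},\tag{7}
$$

$$
\mathsf{G}:=\begin{bmatrix}\nabla_{h}U_{1}&\nabla_{h}U_{2}&\cdots&\nabla_{h}U_{N}\\ c^{-1}\dot{U}_{1}&c^{-1}\dot{U}_{2}&\cdots&c^{-1}\dot{U}_{N}\end{bmatrix}.\tag{8}
$$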

Here and for the rest of the paper, $c^{-1}\dot{U}_{n}$ denotes the component-wise product of $c^{-1}(x_{j})$ and $\dot{U}_{n}(x_{j})$. The same convention is used for $c^{-1}\mathcal{R}\dot{u}_{j}$.

Now, define the discrete wave energy function as

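$$
E([U_{n},\dot{U}_{n}]):=\frac{1}{2}\sum_{j=1}^{N_{\Delta x}}|\nabla_{h}U_{n}(x_{j})|^{2}\Delta x^{d}+\frac{1}{2}\sum_{j=1}^{N_{\Delta x}}c_{j}^{-2}|\dot{U}_{n}(x_{j})|^{2}\Delta x^{d}.\tag{9}
$$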

It is equivalent, up to a constant factor, to the squared Frobenius norm of $\mathsf{G}$:

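$$
\|\mathsf{G}\|_{F}^{2}=\sum_{n=1}^{N}\Big[\sum_{j=1}^{N_{\Delta x}}|\nabla_{h}U_{n}(x_{j})|^{2}+\sum_{j=1}^{N_{\Delta x}}c_{j}^{-2}|\dot{U}_{n}(x_{j})|^{2}\Big]=\frac{2}{\Delta x^{d}}\sum_{n=1}^{N}E([U_{n},\dot{U}_{n}]).
$$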
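The identity between the discrete energy (9) and the Frobenius norm of $\mathsf{G}$ is easy to check numerically. The sketch below assumes $d=1$, periodic boundaries and a forward difference for $\nabla_{h}$ (the text leaves the choice of $\nabla_{h}$ open); the snapshots are random placeholders rather than actual coarse solutions, and the helper names are hypothetical.

```python
import numpy as np


def grad_h(U, dx):
    """Forward-difference gradient on a periodic 1D grid (one possible choice of nabla_h)."""
    return (np.roll(U, -1) - U) / dx


def energy_column(U, Udot, c, dx):
    """One column of a data matrix: [nabla_h U; c^{-1} Udot] for d = 1."""
    return np.concatenate([grad_h(U, dx), Udot / c])


def discrete_energy(U, Udot, c, dx, d=1):
    """Discrete wave energy (9)."""
    return 0.5 * np.sum(grad_h(U, dx) ** 2) * dx**d + 0.5 * np.sum(Udot**2 / c**2) * dx**d


# Hypothetical coarse-grid snapshots [U_n, Udot_n], n = 1..N.
rng = np.random.default_rng(0)
n_points, N = 64, 5
dx = 1.0 / n_points
c = 1.0 + 0.5 * rng.random(n_points)
snapshots = [(rng.standard_normal(n_points), rng.standard_normal(n_points)) for _ in range(N)]

G = np.column_stack([energy_column(U, Ud, c, dx) for U, Ud in snapshots])
lhs = np.linalg.norm(G, "fro") ** 2
rhs = 2.0 / dx * sum(discrete_energy(U, Ud, c, dx) for U, Ud in snapshots)
print(np.isclose(lhs, rhs))
```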

3.2. Minimization of coarse-fine solution gaps

For simple plane waves, it is well known that the phase error between coarse and fine solutions, not their amplitude difference, causes the parareal iteration to converge slowly or diverge [35, 22]. If two plane waves are in phase, parareal-style updates can effectively correct the amplitude error. For general wave solutions, it is inconvenient to work with the notion of phase defined by plane wave solutions. Instead, we consider the discrete wave energy semi-norm (9), which is induced by the $\ell^{2}$ inner product of the energy component vectors, i.e. the columns of $\mathsf{F},\mathsf{G}$ in (7) and (8). Such an inner product gives us a notion of angle between two wave solutions. The proposed strategy to stabilize the parareal iteration is to maximize the inner product between the coarse and fine energy component vectors, i.e. to minimize the angle between them, without changing their $\ell^{2}$ norms. Similar strategies of using wave energy to compare wavefields have been used successfully, for example in seismic imaging [31] and in wavefield approximation by Gaussian beams [39].

Denote the $j$ -th column of $\mathsf{F}$ and $\mathsf{G}$ by $f_{j}$ and $g_{j}$ respectively. We consider the following optimization problem:

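$$
\min_{\Omega\in\mathbb{R}^{(d+1)N_{\Delta x}\times(d+1)N_{\Delta x}}}\ \sum_{j=1}^{N}\|f_{j}-\Omega g_{j}\|_{2}^{2},\quad\text{s.t. }\Omega\Omega^{T}=\Omega^{T}\Omega=I.\tag{11}
$$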

Recall that the columns of $\mathsf{F}$ and $\mathsf{G}$ consist of the spatial gradients and weighted time derivatives of the solutions on the respective fine and coarse grids, and that the $\ell^{2}$ norm corresponds to the discrete wave energy (9). Therefore, we look for an orthogonal matrix so that the discrete wave energy of the corrected coarse solutions is the same as before the correction. Intuitively, the correction operator aligns the phase (in the above sense) of the coarse solution with that of the fine solution at each $t_{n}$. It is similar to the local phase-alignment procedure in [1], as depicted in Figure 2. Indeed, from each term in the summation

$$
\|f_{j}-\Omega g_{j}\|_{2}^{2}=\|f_{j}\|_{2}^{2}+\|g_{j}\|_{2}^{2}-2\,(f_{j},\Omega g_{j}),
$$

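the minimization can be interpreted as minimizing the sum of the angles between the corresponding columns of the data matrices: since $\|f_{j}\|_{2}$ and $\|\mathsf{\Omega}g_{j}\|_{2}=\|g_{j}\|_{2}$ are fixed, only the inner products $(f_{j},\mathsf{\Omega}g_{j})$ vary, and they are maximized. Thus, we shall refer to $\mathsf{\Omega}$ as the phase corrector.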

The minimization problem (11) is equivalent to the "Procrustes problem" [17]:

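$$
\min_{\Omega\in\mathbb{R}^{(d+1)N_{\Delta x}\times(d+1)N_{\Delta x}}}\ \|\mathsf{F}-\Omega\mathsf{G}\|_{F}^{2},\quad\text{s.t. }\Omega\Omega^{T}=\Omega^{T}\Omega=I,
$$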

where $\|\cdot\|_{F}$ denotes the Frobenius norm of a matrix. An in-depth review of the Procrustes problem can be found in [18]. Its variants have been instrumental in multidimensional statistical analysis, rigid body motion simulation, satellite tracking and machine learning [42, 32, 19].

3.3. Solution to the optimization problem

The optimization problem (11) can be solved in several ways. One of them is to use the singular value decomposition (SVD) of the correlation matrix

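$$
\mathsf{M}:=\mathsf{F}\mathsf{G}^{T}=\sum_{j=1}^{N}\Big(\nabla_{h}\mathcal{R}u_{j}\otimes\nabla_{h}U_{j}+c^{-1}\mathcal{R}\dot{u}_{j}\otimes c^{-1}\dot{U}_{j}\Big).\tag{14}
$$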

If the matrix $\mathsf{M}$ has full rank, the unique minimizer of (11) is

$$
\Omega_{*}=\mathsf{X}\mathsf{Y}^{T},
$$

where the columns of $\mathsf{X}$ and $\mathsf{Y}$ are the left and right singular vectors of $\mathsf{M}=\mathsf{X}\mathsf{\Sigma}\mathsf{Y}^{T}$. Correspondingly, the minimum residual is

$$
r_{min}^{2}=\|\mathsf{F}\|_{F}^{2}+\|\mathsf{G}\|_{F}^{2}-2\,\mathrm{trace}(\Sigma).
$$

Figure 2 illustrates the Procrustes problem and its solution in a simple setup in $\mathbb{R}^{2}$ .
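A direct NumPy sketch of this SVD solution on random placeholder data: it builds $\mathsf{M}=\mathsf{F}\mathsf{G}^{T}$, forms $\mathsf{\Omega}_{*}=\mathsf{X}\mathsf{Y}^{T}$, and checks the residual formula. With more columns than rows, $\mathsf{M}$ generically has full rank, which is the case covered by the uniqueness statement above. The function name `procrustes` is hypothetical.

```python
import numpy as np


def procrustes(F, G):
    """Orthogonal Omega minimizing ||F - Omega G||_F: Omega = X Y^T, where M = F G^T = X Sigma Y^T."""
    M = F @ G.T
    X, sigma, Yt = np.linalg.svd(M)
    return X @ Yt, sigma


rng = np.random.default_rng(0)
F = rng.standard_normal((6, 20))   # hypothetical "fine" energy-component columns f_j
G = rng.standard_normal((6, 20))   # hypothetical "coarse" columns g_j
Omega, sigma = procrustes(F, G)

residual = np.linalg.norm(F - Omega @ G) ** 2
predicted = np.linalg.norm(F) ** 2 + np.linalg.norm(G) ** 2 - 2 * sigma.sum()
print(np.allclose(Omega @ Omega.T, np.eye(6)), np.isclose(residual, predicted))
```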

3.3.1. Low rank approximation of $\mathsf{\Omega}_{*}$

We now consider a low rank approximation of $\mathsf{\Omega}_{*}$ for computational efficiency. Since the number of time slices is usually much smaller than the number of (coarse) spatial grid nodes, i.e. $N\ll(d+1)N_{\Delta x}$ , we can factorize the data matrices using the reduced QR factorization. Denote the factorizations by $\mathsf{F}=\mathsf{Q}_{F}\mathsf{R}_{F}$ and $\mathsf{G}=\mathsf{Q}_{G}\mathsf{R}_{G}$ , where

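$$
\mathsf{Q}_{F},\mathsf{Q}_{G}\in\mathbb{R}^{(d+1)N_{\Delta x}\times N},\qquad\mathsf{R}_{F},\mathsf{R}_{G}\in\mathbb{R}^{N\times N}.
$$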

With the singular value decomposition of the smaller matrix $\mathsf{R}_{F}\mathsf{R}_{G}^{T}=\mathsf{X}_{F}\Sigma\mathsf{Y}_{G}^{T}$, the correlation matrix can be factored as

$$
\mathsf{M}=\mathsf{Q}_{F}\mathsf{X}_{F}\Sigma\mathsf{Y}_{G}^{T}\mathsf{Q}_{G}^{T}.
$$

The last relation shows that

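$$
\mathrm{rank}(\mathsf{M})=\mathrm{rank}(\mathsf{R}_{F}\mathsf{R}_{G}^{T})\leq\min\big(\mathrm{rank}(\mathsf{F}),\mathrm{rank}(\mathsf{G})\big).
$$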

Hence we can use the factorization of the smaller $N\times N$ matrix $\mathsf{R}_{F}\mathsf{R}_{G}^{T}$ to obtain

$$
\Omega_{*}=(\mathsf{Q}_{F}\mathsf{X}_{F})(\mathsf{Q}_{G}\mathsf{Y}_{G})^{T}.
$$
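The low-rank construction only requires a reduced QR factorization of each data matrix and an SVD of an $N\times N$ matrix. In the sketch below (hypothetical names, random placeholder data), $\mathsf{\Omega}_{*}$ is kept in factored form and never assembled as a $(d+1)N_{\Delta x}\times(d+1)N_{\Delta x}$ matrix; its singular values agree with those of the full correlation matrix, and the residual formula still holds. Truncation to $s$ singular values is omitted.

```python
import numpy as np


def procrustes_lowrank(F, G):
    """Factored phase corrector Omega_* = (Q_F X_F)(Q_G Y_G)^T via reduced QR of F and G.

    Only the N x N matrix R_F R_G^T is decomposed by SVD; Omega_* is returned as the
    pair (Q_F X_F, Q_G Y_G) so it never has to be formed explicitly.
    """
    QF, RF = np.linalg.qr(F)                 # reduced QR: QF has N orthonormal columns
    QG, RG = np.linalg.qr(G)
    XF, sigma, YGt = np.linalg.svd(RF @ RG.T)
    return QF @ XF, QG @ YGt.T, sigma


def apply_corrector(left, right, g):
    """Apply Omega_* = left @ right.T to vector(s) g without forming Omega_*."""
    return left @ (right.T @ g)


rng = np.random.default_rng(1)
F = rng.standard_normal((200, 5))            # (d+1) N_dx = 200 rows, N = 5 time slices
G = rng.standard_normal((200, 5))
left, right, sigma = procrustes_lowrank(F, G)
residual = np.linalg.norm(F - apply_corrector(left, right, G)) ** 2
print(residual)
```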

Given a tolerance $tol$ on the singular values in $\Sigma$, we retain the $s$ singular values satisfying $\sigma_{i}\geq tol$. As a result, we only need to store $s$ columns of $\mathsf{Q}_{F}$ and $\mathsf{Q}_{G}$, and the truncated phase corrector becomes

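$$
\Omega_{*}:=\Big(\mathsf{Q}_{F}(:,1{:}s)\,\mathsf{X}_{F}(1{:}s,1{:}s)\Big)\Big(\mathsf{Q}_{G}(:,1{:}s)\,\mathsf{Y}_{G}(1{:}s,1{:}s)\Big)^{T}.
$$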

3.3.2. Enriching the phase corrector $\mathsf{\Omega}_{*}$

After every parareal iteration, more data becomes available. We can use this data to enrich the phase corrector. Define

$$
\mathsf{M}^{k+1}=\mathsf{M}^{k}+\mathsf{F}^{k}(\mathsf{G}^{k})^{T}.
$$

The singular value decomposition of $M^{k+1}=\tilde{\mathsf{U}}\tilde{\mathsf{S}}\tilde{\mathsf{V}}^{T}$ can be updated from that of $M^{k}=\mathsf{U}\mathsf{S}\mathsf{V}^{T}$; see [7]. We summarize the update procedure in Algorithm 1.
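Algorithm 1 itself is not reproduced here. As an illustration of how such an update can work, the sketch below implements a standard Brand-style additive update of a thin SVD; whether it matches the details of [7] and Algorithm 1 is an assumption. The new data $\mathsf{F}^{k}(\mathsf{G}^{k})^{T}$ enter through a small core matrix of size $(r+c)\times(r+c)$, where $r$ is the current rank and $c$ the number of new columns. Function and variable names are hypothetical.

```python
import numpy as np


def svd_add(U, S, V, A, B):
    """Thin SVD of U diag(S) V^T + A B^T, reusing the existing factors.

    Brand-style additive update (sketch): only a small (r + c) x (r + c) core
    matrix is decomposed, where r = len(S) and c = number of columns of A and B.
    """
    r, c = S.size, A.shape[1]
    UA = U.T @ A                               # component of A inside span(U)
    P, RA = np.linalg.qr(A - U @ UA)           # orthonormal basis of the remainder
    VB = V.T @ B
    Q, RB = np.linalg.qr(B - V @ VB)
    K = np.zeros((r + c, r + c))
    K[:r, :r] = np.diag(S)
    K += np.vstack([UA, RA]) @ np.vstack([VB, RB]).T
    Uk, Sk, Vkt = np.linalg.svd(K)
    return np.hstack([U, P]) @ Uk, Sk, np.hstack([V, Q]) @ Vkt.T


rng = np.random.default_rng(2)
F0, G0 = rng.standard_normal((50, 4)), rng.standard_normal((50, 4))   # data so far
F1, G1 = rng.standard_normal((50, 3)), rng.standard_normal((50, 3))   # new data
U, S, Vt = np.linalg.svd(F0 @ G0.T)
U, S, V = U[:, :4], S[:4], Vt[:4].T            # thin SVD of M^k (rank 4)
U1, S1, V1 = svd_add(U, S, V, F1, G1)          # SVD of M^{k+1} = M^k + F1 G1^T
```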

3.4. Reconstruction of wavefield from the gradient

After correcting the energy components of the coarse solutions, i.e. the gradients and the weighted time derivatives, we need to reconstruct the wavefield pair from the corrected energy components. More precisely, let $[q,p]$ denote the corrected energy components of a wavefield pair $[w,\dot{w}]$

[TABLE]

where the mapping $\Lambda:[w,\dot{w}]\mapsto[\nabla_{h}w,c^{-1}\dot{w}]$ takes a wavefield pair to its wave energy components. We then want to find the corrected wavefield pair $[v,\dot{v}]$ such that

$$
\Lambda[v,\dot{v}]=[\nabla_{h}v,\,c^{-1}\dot{v}]=[q,p].
$$

The second component is simply $\dot{v}=cp$. For the displacement component $v$, we use the spectral property of differentiation, $\texttt{fft}\{\nabla v\}=i\boldsymbol{\xi}\,\texttt{fft}\{v\}$, to recover its Fourier modes as follows

[TABLE]

We denote this mapping from energy components to wavefield components by $\Lambda^{\dagger}:[\nabla v,c^{-1}\dot{v}]\mapsto[v,\dot{v}]$. In particular, when the gradient is approximated by the Fourier method, $\Lambda^{\dagger}\Lambda$ is the identity.

Proposition 3.1.

Suppose the gradient of a function $v(x)$ is approximated by the spectral method, $\nabla_{h}v\equiv\texttt{ifft}\{i\boldsymbol{\xi}\,\texttt{fft}\{v\}\}$. Then

[TABLE]

Proof.

Let

[TABLE]

Since $\Lambda$ maps functions to energy components, we have

[TABLE]

By construction of $\Lambda^{\dagger}$ , for nonzero wavenumber $|\boldsymbol{\xi}|\neq 0$

[TABLE]

Since the gradient is approximated by the spectral method,

[TABLE]

For the zero wavenumber $|\boldsymbol{\xi}|=0$, $\texttt{fft}\{w\}=\sum_{j}v(x_{j})=\texttt{fft}\{v\}$. Thus $w=v$, while the second component satisfies $\dot{w}=cc^{-1}\dot{v}=\dot{v}$. Hence the mapping $\Lambda^{\dagger}\Lambda$ is the identity. ∎

If, instead of the spectral method, the gradient is approximated by a central finite difference of order $2m$, then in the one dimensional setting equation (20) in the proof above becomes

[TABLE]

where $\beta_{j}$ are the coefficients of the difference stencil. When the grid spacing is small enough that $\xi\Delta x\ll 1$, the above expression is approximately

[TABLE]

In particular, for the second order central difference ($m=1$), we have $\beta_{1}=1/2$, and then

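$$
\texttt{fft}\{w\}=\frac{\sin(\xi\Delta x)}{\xi\Delta x}\,\texttt{fft}\{v\}=\text{sinc}(\xi\Delta x)\,\texttt{fft}\{v\},
$$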

which says that $|\texttt{fft}\{w\}|\leq|\texttt{fft}\{v\}|$ because $|\text{sinc}(\xi\Delta x)|\leq 1$.

In practice, we observe that the algorithm does not require a spectral approximation of the gradient; instead, $\|\Lambda^{\dagger}\Lambda\|_{2}\leq 1$ is necessary for the stability of the method. When a central finite difference is used, it is well known that the modified wavenumber is at most $|\boldsymbol{\xi}|$ in magnitude; hence central differences satisfy the requirement $\|\Lambda^{\dagger}\Lambda\|_{2}\leq 1$. Algorithm 2 summarizes the above procedure.
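The maps $\Lambda$ and $\Lambda^{\dagger}$ and the two gradient choices discussed above can be compared in a few lines of NumPy. This 1D periodic sketch assumes an odd number of grid points (so there is no Nyquist mode to lose) and passes the mean of $v$ to $\Lambda^{\dagger}$ explicitly, since the zero mode cannot be recovered from the gradient; both choices, and all function names, are ours. With the spectral gradient, $\Lambda^{\dagger}\Lambda$ returns the input; with the second order central difference, each mode is damped by $\text{sinc}(\xi\Delta x)$, so the norm of the reconstruction does not grow.

```python
import numpy as np


def wavenumbers(n, length=1.0):
    """Angular wavenumbers xi of a periodic grid with n points on [0, length)."""
    return 2 * np.pi * np.fft.fftfreq(n, d=length / n)


def spectral_grad(v, length=1.0):
    """Spectral gradient ifft(i xi fft(v)) for a real periodic v."""
    xi = wavenumbers(v.size, length)
    return np.fft.ifft(1j * xi * np.fft.fft(v)).real


def central_grad(v, length=1.0):
    """Second order central difference (v_{j+1} - v_{j-1}) / (2 dx)."""
    dx = length / v.size
    return (np.roll(v, -1) - np.roll(v, 1)) / (2 * dx)


def lam(v, vdot, c, grad):
    """Lambda: [v, vdot] -> [grad_h v, c^{-1} vdot]."""
    return grad(v), vdot / c


def lam_dagger(q, p, c, mean, length=1.0):
    """Lambda^dagger: [q, p] -> [v, vdot], dividing the Fourier modes of q by i*xi.

    The zero mode (mean of v) is not determined by q, so it is passed in explicitly.
    """
    xi = wavenumbers(q.size, length)
    q_hat = np.fft.fft(q)
    v_hat = np.zeros_like(q_hat)
    nonzero = xi != 0
    v_hat[nonzero] = q_hat[nonzero] / (1j * xi[nonzero])
    v_hat[0] = mean * q.size                   # fft of the constant `mean` at xi = 0
    return np.fft.ifft(v_hat).real, c * p


rng = np.random.default_rng(0)
n = 65                                         # odd: no Nyquist mode
v, vdot = rng.standard_normal(n), rng.standard_normal(n)
c = 1.0 + 0.5 * rng.random(n)
w, wdot = lam_dagger(*lam(v, vdot, c, spectral_grad), c, v.mean())
w_cd, _ = lam_dagger(*lam(v, vdot, c, central_grad), c, v.mean())
print(np.allclose(w, v), np.linalg.norm(w_cd) <= np.linalg.norm(v))
```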

3.5. The proposed algorithm

The proposed algorithm couples the fine and coarse propagators at times $n\Delta t_{com}$, $n=1,2,\dots,N$, on the fine grid (the spatial grid on which the fine solutions are defined). It is important to note, however, that the phase corrections are applied on the coarse grid; if the two grids are not identical, interpolation is needed. We denote the interpolation operator by $\mathcal{I}$. Furthermore, we denote the mappings between the wavefield $[v,\dot{v}]$ and its energy components $[\nabla v,c^{-1}\dot{v}]$ by $\Lambda:[v,\dot{v}]\mapsto[\nabla v,c^{-1}\dot{v}]$ and $\Lambda^{\dagger}:[\nabla v,c^{-1}\dot{v}]\mapsto[v,\dot{v}]$. With this notation, the $\theta$ operator after $k$ iterations can be written as

[TABLE]

Here we use $\mathsf{\Omega}^{k}_{*}$ to denote the phase corrector derived from the data matrix $\mathsf{M}^{k}$ .

Finally, our new algorithm can be written compactly in $\theta$-parareal form:

[TABLE]

Algorithm 3 describes the new scheme in a pseudo-code form with more details.
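To make the structure of the $\theta$-parareal update concrete, here is a toy sketch for the complex scalar ODE $u'=i\omega u$; it is not the paper's wave solver. Assumptions: the "fine" propagator is exact, the "coarse" one carries a 10% phase error, and the phase corrector is a single unit-modulus scalar fitted by the scalar analogue of the Procrustes problem, $\min_{|\theta|=1}\sum_n|F_n-\theta G_n|^2$, whose minimizer is $\theta=s/|s|$ with $s=\sum_n F_n\overline{G_n}$. All function names are hypothetical.

```python
import cmath


def phase_corrector(fine_data, coarse_data):
    """Unit-modulus theta minimizing sum_n |F_n - theta * G_n|^2 (scalar Procrustes)."""
    s = sum(f * g.conjugate() for f, g in zip(fine_data, coarse_data))
    return s / abs(s) if abs(s) > 0 else 1.0


def theta_parareal(u0, fine, coarse, n_slices, n_iter, corrected=True):
    """Toy theta-parareal iteration for a complex scalar ODE.

    Iteration 1 is a serial coarse sweep; each later iteration uses
    U^k_{n+1} = theta C(U^k_n) + F(U^{k-1}_n) - theta C(U^{k-1}_n),
    with theta fitted to (F(U^{k-1}_n), C(U^{k-1}_n)). corrected=False gives plain parareal.
    """
    u = [u0]
    for n in range(n_slices):
        u.append(coarse(u[n]))
    for _ in range(n_iter - 1):
        f_data = [fine(u[n]) for n in range(n_slices)]    # parallel in a real code
        g_data = [coarse(u[n]) for n in range(n_slices)]  # parallel in a real code
        theta = phase_corrector(f_data, g_data) if corrected else 1.0
        new = [u0]
        for n in range(n_slices):                          # serial coarse update
            new.append(theta * coarse(new[n]) + f_data[n] - theta * g_data[n])
        u = new
    return u


# Toy problem u' = i*omega*u over N coupling intervals of length H.
omega, H, N = 1.0, 0.5, 20


def fine(z):
    return z * cmath.exp(1j * omega * H)


def coarse(z):
    return z * cmath.exp(1j * 1.1 * omega * H)  # 10% phase error


reference = [cmath.exp(1j * omega * H * n) for n in range(N + 1)]


def max_error(u):
    return max(abs(a - b) for a, b in zip(u, reference))
```

In this linear scalar toy a single phase factor removes the whole coarse/fine mismatch, so the corrected iteration reproduces the serial fine solution after one correction, while plain parareal needs $N$ corrections. The wave-equation setting is far less forgiving; the example only illustrates the form of the update.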

Like the Krylov subspace method [15, 10], our method requires orthogonalization of data matrices, but the matrices are formed differently. In this paper, the data matrices are products of the wave energy components of the fine data and the coarse data, as in equation (14), and the phase correctors are constructed from the singular value decomposition of these data matrices. In [15, 10], the data matrices consist of computed solutions and are orthogonalized to form projection operators. In contrast, our phase correctors are not projections; instead, they effectively translate the coarse solutions along constant-energy submanifolds.
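The orthogonal Procrustes problem $\min_{\mathsf{\Omega}^{T}\mathsf{\Omega}=1}\|\mathsf{F}-\mathsf{\Omega}\mathsf{G}\|_F$ has the classical solution $\mathsf{\Omega}_*=UV^{T}$, where $\mathsf{M}=\mathsf{F}\mathsf{G}^{T}=U\Sigma V^{T}$. Since $UV^{T}$ is also the orthogonal factor of the polar decomposition of $\mathsf{M}$, the plain-Python sketch below computes it with Newton's polar iteration $X\leftarrow(X+X^{-T})/2$ rather than an SVD routine. This is a swapped-in technique for illustration only: it assumes $\mathsf{M}$ is square and nonsingular, does not implement the low-rank truncation of Section 3.3.1, and all helper names are ours.

```python
import math
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-14:
            raise ValueError("matrix is (numerically) singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def procrustes_orthogonal(F, G, tol=1e-13, max_iter=100):
    """Orthogonal Omega minimizing ||F - Omega G||_F via the polar factor of M = F G^T."""
    X = matmul(F, transpose(G))  # data matrix M
    for _ in range(max_iter):
        X_inv_T = transpose(inverse(X))
        X_new = [[0.5 * (a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(X, X_inv_T)]
        change = max(abs(a - b) for r1, r2 in zip(X, X_new) for a, b in zip(r1, r2))
        X = X_new
        if change < tol:
            break
    return X


def relative_residual(F, Omega, G):
    """||F - Omega G||_F / ||F||_F."""
    OG = matmul(Omega, G)
    num = math.sqrt(sum((f - g) ** 2 for rf, rg in zip(F, OG) for f, g in zip(rf, rg)))
    den = math.sqrt(sum(f * f for rf in F for f in rf))
    return num / den


# Synthetic check: F = Q G for a known rotation Q (3 components, 6 snapshots).
a, b = 0.3, -1.1
Rz = [[math.cos(a), -math.sin(a), 0.0], [math.sin(a), math.cos(a), 0.0], [0.0, 0.0, 1.0]]
Rx = [[1.0, 0.0, 0.0], [0.0, math.cos(b), -math.sin(b)], [0.0, math.sin(b), math.cos(b)]]
Q = matmul(Rz, Rx)
rng = random.Random(0)
G = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(3)]
F = matmul(Q, G)
Omega = procrustes_orthogonal(F, G)
```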

4. Complexity Analysis

There are three parts to our implementation: the parallel fine propagator computation, the construction of $\mathsf{\Omega}_{*}$, and the serial coarse updates. We assume (i) no spatial domain decomposition, i.e., the whole domain is handled by a single core; (ii) standard QR complexity, i.e., no multithreading; and (iii) negligible cost for communication between nodes and for other tasks.

In each iteration, the wall-clock complexity of the parallel fine and coarse computations is of the order of

[TABLE]

where $n_{CPU}$ is the number of cores and $N_{\delta x},N_{\Delta x}$ are, respectively, the total numbers of fine and coarse grid points. The complexity of the serial coarse update in one iteration is

[TABLE]

The complexity of the standard QR factorization used to construct $\mathsf{\Omega}$ is

[TABLE]

Therefore, the total complexity is

[TABLE]

where $K$ is the number of iterations. In this setup, the speedup over a serial fine computation is

[TABLE]

Additionally, denote the coarse/fine time-step ratio by $\Delta t/\delta t=m_{t}$, which implies $\Delta t_{com}/\delta t\geq m_{t}$ since $\Delta t_{com}\geq\Delta t$, and the corresponding mesh ratio by $\Delta x/\delta x=m_{s}$. Hence the theoretical speedup is

[TABLE]
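Because the expression for (32) is not reproduced in this text, the following is only a hypothetical operation-count model built from assumptions (i) to (iii); it is not a transcription of (32). It counts "grid points times time steps" for the fine and coarse sweeps and charges the QR step $O(\text{rows}\cdot N^{2})$, assuming about $(d+1)N_{\Delta x}$ rows (the $d$ gradient components plus the velocity) and one column per time slice. The function and parameter names are ours.

```python
import math


def model_speedup(N, n_cpu, K, d, m_s, m_t, s, n_coarse, qr_weight=1.0):
    """Hypothetical cost model of the speedup over a serial fine solve.

    N: time slices (T / dt_com); n_cpu: cores; K: iterations; d: spatial dimension;
    m_s, m_t: coarse/fine mesh and time-step ratios; s: coarse steps per slice
    (dt_com / Dt); n_coarse: coarse grid points N_Dx.
    """
    n_fine = m_s ** d * n_coarse                  # N_delta_x
    fine_slice = s * m_t * n_fine                 # dt_com / delta_t fine steps per slice
    coarse_slice = s * n_coarse                   # dt_com / Dt coarse steps per slice
    serial_fine = N * fine_slice
    parallel = math.ceil(N / n_cpu) * (fine_slice + coarse_slice)
    serial_coarse = N * coarse_slice
    qr = qr_weight * (d + 1) * n_coarse * N ** 2  # classical O(rows * cols^2) QR
    return serial_fine / (K * (parallel + serial_coarse + qr))
```

Under this model the speedup is bounded by both $n_{CPU}/K$ and $m_t m_s^{d}/K$, and for large $N$ the quadratic QR term can dominate, which corresponds to the worst case discussed next.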

We note that the third term in the above speedup is derived from the classical $N^{2}$ complexity of QR factorization (for matrices with a fixed number of rows). The speedup analysis (32) thus presents the worst-case asymptotics as $N$ approaches infinity. In practice, we observe that QR factorization scales sub-quadratically when multithreading, ubiquitous in modern computers, is enabled. However, to our knowledge, a speedup analysis of QR in a multithreading environment is not straightforward. To illustrate the effectiveness of multithreading in computing QR factorizations, consider random matrices with a fixed number of rows ($100,000$) and a number of columns varied over a range relevant to this paper. The computing time is presented in Figure 3. It grows roughly as the $1.5$ power of the number of columns, rather than quadratically as the classical QR complexity predicts. We also see that using 68 threads speeds up the computation by more than a factor of 10. Finally, in our numerical experiments the QR step takes a relatively small amount of time compared with the other components; see Section 7.2.4.
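An exponent such as the $1.5$ above can be estimated from one's own timings with a least-squares line in log-log coordinates. The helper below is a generic sketch with a name of our own, and the timings in it are synthetic, generated from $t\propto n^{1.5}$; they are not the measurements behind Figure 3.

```python
import math


def fit_power_law(ncols, times):
    """Least-squares slope p of log(time) versus log(ncols), i.e. time ~ a * ncols**p."""
    xs = [math.log(n) for n in ncols]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free timings (NOT the data of Figure 3).
ncols = [100, 200, 400, 800, 1600]
times = [2e-4 * n ** 1.5 for n in ncols]
```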

5. Stability and convergence

In this section, we derive estimates that show the stability and convergence of Algorithm 3 under certain assumptions. We measure the difference, in the discrete energy semi-norm on the coarse grid, between the serially computed fine solution and the iterated solution.

Consider the energy components of the parareal iterated solution restricted to the coarse grid

[TABLE]

Its parareal iterative coupling is given by equation (29):

[TABLE]

Recall that ${\theta}[v,\dot{v}]:=\mathcal{I}\Lambda^{\dagger}\mathsf{\Omega}\Lambda[v,\dot{v}]$ , so

[TABLE]

Since the restriction operator takes pointwise values, it cancels the action of the interpolation, $\mathcal{R}\mathcal{I}=1$. So equation (33) becomes

[TABLE]

Let us denote the square root of wave energy as $\mathcal{E}([U,\dot{U}]):=\sqrt{E([U,\dot{U}])}$ , where $E$ is defined in (9). Thus,

[TABLE]

Theorem 5.1.

Suppose that

(1) the coarse propagator $\mathcal{C}$ satisfies, for some $\epsilon>0$,

[TABLE]

(2) the residual of the energy minimization problem is bounded uniformly for $k=1,2,\dots$:

[TABLE]

where $\mathsf{F}^{k},\mathsf{G}^{k}$ are the data matrices in (7), (8) gathered in the first $k$ iterations;

(3) $\|\Lambda^{\dagger}\Lambda\|_{2}\leq 1$ and $\|1-\Lambda^{\dagger}\Lambda\|_{2}<\lambda\ll 1/N$.

Then

[TABLE]

where $C$ is a norm-equivalence constant between the $\ell_{2,1}$ norm (the sum of the $\ell_{2}$ norms of the columns) and the Frobenius norm.

Proof.

Consider the square root of wave energy of (34)

[TABLE]

We apply triangle inequality to obtain

[TABLE]

By construction, $\|\mathsf{\Omega}\|_{2}=1$, and by the hypothesis $\|\Lambda\Lambda^{\dagger}\|_{2}\leq 1$ and the energy bound of the coarse propagator,

[TABLE]

we have

[TABLE]

Viewing the third term as part of the energy minimization problem in (10), we obtain

[TABLE]

Since the above relation also holds for $\max_{j\leq N}\mathcal{E}([U^{k}_{j},\dot{U}^{k}_{j}])$, we have

[TABLE]

Applying the discrete Grönwall inequality [21] with respect to the index $k$, we get

[TABLE]

By the assumption $\lambda N\ll 1$ ,

[TABLE]

∎
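The last step invokes the discrete Grönwall inequality [21]. Because the exact recursion sits in the omitted displays, we only illustrate a standard form numerically: if $a_k\le(1+\alpha)a_{k-1}+\beta$ with $\alpha>0$, then $a_k\le(1+\alpha)^k a_0+\beta\frac{(1+\alpha)^k-1}{\alpha}\le e^{k\alpha}(a_0+k\beta)$. The constants and function names below are arbitrary choices of ours, not those of Theorem 5.1.

```python
import math
import random


def gronwall_bound(a0, alpha, beta, k):
    """Closed-form bound for a_k when a_j <= (1 + alpha) * a_{j-1} + beta, alpha > 0."""
    growth = (1 + alpha) ** k
    return growth * a0 + beta * (growth - 1) / alpha


def exp_bound(a0, alpha, beta, k):
    """Looser exponential form: a_k <= exp(k * alpha) * (a0 + k * beta)."""
    return math.exp(k * alpha) * (a0 + k * beta)


# A sequence satisfying the recursive inequality (random slack keeps it an inequality).
rng = random.Random(1)
alpha, beta, a0 = 0.01, 1e-3, 0.5
seq = [a0]
for _ in range(200):
    seq.append((1 + alpha) * seq[-1] + beta - rng.uniform(0.0, beta))
```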

Next, we show that, under some hypotheses, the proposed method converges to the solutions computed by applying the fine propagator serially. The hypotheses involve Lipschitz continuity of the phase corrector, which implies that the minimization problem (11) is solved with sufficient accuracy. We use the following notation for these reference solutions:

[TABLE]

We measure the overall error on the fine grid as the square root of the discrete wave energy of the difference:

[TABLE]

Hypothesis 5.1.

(i) The phase corrected coarse solution is Lipschitz continuous in energy

[TABLE]

Let $\epsilon_{\theta}$ denote the overall perturbation

[TABLE]

(ii) The energy error between fine and corrected coarse operators is Lipschitz continuous

[TABLE]

Theorem 5.2.

Suppose that the fine and corrected coarse operators satisfy Hypothesis 5.1, i.e., (44) and (51). Then,

[TABLE]

Proof.

In the following expansion of the parareal iteration, the superscript $k$ in $\theta^{k}$ is dropped for brevity:

[TABLE]

It can be verified that the serial fine solution $[u(t_{n+1}),\dot{u}(t_{n+1})]$ also satisfies the above expression when the superscripts $k,k-1$ are dropped from the solution vector $[u^{k}_{\cdot},\dot{u}^{k}_{\cdot}]$. Subtracting the two, we obtain an expression for the difference of the solutions

[TABLE]

Recall the square root of energy error is defined as

[TABLE]

Applying the triangle inequality to $\mathcal{E}^{k}_{n+1}$ and using equation (57), we obtain

[TABLE]

Applying equation (44) of Hypothesis 5.1 (i), we bound each term

[TABLE]

Finally we use equation (51) in Hypothesis 5.1 (ii) to obtain

[TABLE]

Thus

[TABLE]

By assumption $\kappa N<1$ and $\epsilon_{\theta}N\ll 1$ , the error goes to zero as $k$ approaches infinity. ∎

We see that the convergence depends on the Lipschitz constant $\kappa$ in Hypothesis 5.1 (ii), which reflects the gap between the corrected coarse propagator and the fine propagator. This gap between the propagators is quantified by the energy residual of the minimization (13).
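The role of $\kappa N<1$ can be seen on a model recursion. Assuming, for illustration only, that the errors obey the generic parareal-type relation $E^k_{n+1}=E^k_n+\kappa E^{k-1}_n$ with $E^k_0=0$ and $E^0_n=1$, and ignoring $\epsilon_\theta$ (these are not the constants of Theorem 5.2), one finds $E^k_n=\kappa^k\binom{n-1}{k}\le(\kappa N)^k$, which tends to zero as $k\to\infty$ when $\kappa N<1$. The function name is hypothetical.

```python
def model_error_sweeps(kappa, N, K):
    """Iterate E^k_{n+1} = E^k_n + kappa * E^{k-1}_n, with E^k_0 = 0 and E^0_n = 1 (n >= 1).

    Returns [max_n E^0_n, max_n E^1_n, ..., max_n E^K_n].
    """
    E = [0.0] + [1.0] * N
    maxima = [max(E)]
    for _ in range(K):
        new = [0.0]
        for n in range(N):
            new.append(new[n] + kappa * E[n])
        E = new
        maxima.append(max(E))
    return maxima


# kappa * N = 0.8 < 1, so the bound (kappa * N)**k decays geometrically.
maxima = model_error_sweeps(0.04, 20, 6)
```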

6. Numerical Study of the New Algorithm

In this section, we study the influence of the different components of the proposed algorithm on its overall stability and accuracy. In Sections 6.1 to 6.4, we consider the influence of (i) the low-rank approximation of the optimal phase correctors $\mathsf{\Omega}_{*}$, (ii) the phase corrector and the parareal update, and (iii) the order of approximation of the gradient $\nabla_{h}$ and of the interpolation operator $\mathcal{I}$. For the last item, we use the following interpolation methods in this section, referred to by their MATLAB names:

• interpft: Fourier interpolation

• akima: Akima cubic Hermite interpolation

• pchip: shape-preserving piecewise cubic Hermite interpolation

• linear: piecewise linear interpolation

In Sections 6.1 to 6.4, we consider the simplest one-dimensional setting, with $c\equiv 1$ for both the coarse and fine propagators, and the initial data:

[TABLE]

In Section 6.5, we consider random subsampling of the data matrices to exploit their observed low-rank property; there, we consider a two-dimensional problem with variable wave speed.

We will assume that the coarse grid nodes overlap with the fine grid nodes, and that the restriction operator $\mathcal{R}$ is just a point-wise evaluation on the coarse grid nodes.

The error at the final time $T_{N}=T$ is defined as the square root of the energy of the difference on the fine grid

[TABLE]

Similarly, the error can be measured by the $\ell^{2}$ norm of the difference in the displacement component

[TABLE]

The reference solution $[u(t_{N}),\dot{u}(t_{N})]$ is computed serially using the fine propagator.

6.1. Rank tolerance of the phase corrector

In this example, we study the sensitivity of the algorithm to rank truncation of the optimal phase corrector $\mathsf{\Omega_{*}}$. We use the same spatial grid for both the coarse and the fine propagators in order to avoid errors from interpolation/restriction. The fine propagator has a CFL number 20 times smaller than that of the coarse propagator, and the coupling takes place every 10 coarse steps. We sample the tolerance in Algorithm 1 at $10^{-15},10^{-12},10^{-9},10^{-6},10^{-3}$. The parameters are tabulated below:

[TABLE]

Figure 4 shows the relative energy error over the iterations as the tolerance in the truncation of $\mathsf{\Omega}_{*}$ is varied. The errors decrease in the first few iterations, at a rate that seems independent of the chosen tolerance. As the iterations progress, the errors eventually stagnate at values that correlate strongly with the chosen tolerance. In particular, the stagnated error values scale as the square root of the tolerance, as shown in the right plot of Figure 4. This scaling can be explained by the fact that the tolerance governs the truncation of $\mathsf{\Omega}_{*}$, which modifies the wave energy components, while we measure the square root of the wave energy of the difference. Hence, in general, the convergence of our method is expected to slow down once the error drops below about $10^{-8}$, because the tolerance can be no smaller than machine epsilon, about $10^{-16}$. Figure 5 shows the number of retained singular values for different values of the tolerance.
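The sketch below illustrates the two quantities just discussed on synthetic, exponentially decaying singular values (not data from the paper): the number of singular values retained for each tolerance, under an assumed relative rule $\sigma_i>\text{tol}\cdot\sigma_{\max}$ (Algorithm 1's exact criterion is not reproduced here), and the empirical error floor $\sqrt{\text{tol}}$ reported above. Function names are hypothetical.

```python
import math


def retained_rank(singular_values, tol):
    """Count singular values kept by the assumed rule sigma_i > tol * sigma_max."""
    smax = max(singular_values)
    return sum(1 for s in singular_values if s > tol * smax)


def expected_error_floor(tol):
    """Empirical scaling from the text: stagnated energy error ~ sqrt(tol)."""
    return math.sqrt(tol)


# Synthetic, exponentially decaying singular values.
sigmas = [math.exp(-0.5 * i) for i in range(80)]
tols = [1e-15, 1e-12, 1e-9, 1e-6, 1e-3]
ranks = [retained_rank(sigmas, t) for t in tols]
floors = [expected_error_floor(t) for t in tols]
```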

6.2. The effect of phase correction ( $\mathsf{\Omega}\equiv 1$ )

Assume again that the coarse and fine propagators are on the same grid. Without the phase correction, i.e. $\mathsf{\Omega}\equiv 1$, the proposed iteration takes the form

[TABLE]

The above expression reduces to the plain parareal method if $\Lambda^{\dagger}\Lambda=1$. However, when the first wave component $\nabla_{h}u$ is approximated by a finite difference, $\Lambda^{\dagger}\Lambda\neq 1$ in general. In particular, when $\nabla_{h}$ is the standard second-order central difference, i.e. $\nabla_{h}=D_{\Delta x}^{0}$, $\Lambda^{\dagger}\Lambda$ corresponds to multiplying the Fourier modes of the solution by

[TABLE]

Since $|\mathrm{sinc}(\xi\Delta x)|\leq 1$, $\Lambda^{\dagger}\Lambda$ damps high-frequency modes and thus stabilizes parareal-like iterations.

Nevertheless, for long-time simulations, such high-frequency damping may be insufficient to stabilize the parareal-like iterations. To illustrate this, we take the same discretization as above but now consider four terminal times $T=2.5,5,10,50$:

[TABLE]

In the table above, $*$ takes the values $\{2.5,5,10,50\}$. Figure 6 compares the errors computed with $\mathsf{\Omega}\equiv 1$ and with $\mathsf{\Omega}=\mathsf{\Omega}^{k}_{*}$ for different terminal times. For shorter time intervals, such as $T=2.5$, the two choices of $\mathsf{\Omega}$ yield similar convergence rates for the first several iterations, after which the errors computed with $\mathsf{\Omega}_{*}$ plateau at a much larger value. For larger terminal times, $T=5,10,50$, the instability that comes with using $\mathsf{\Omega}\equiv 1$ becomes more and more apparent, while the computations with $\mathsf{\Omega}=\mathsf{\Omega}^{k}_{*}$ remain stable.

6.3. The effect of parareal-like corrections

If the parareal-style additive correction is omitted, the solution is propagated with only the phase-corrected coarse propagator:

[TABLE]

The simulation parameters are as follows:

[TABLE]

We first point out that if $\mathcal{C}$ preserves the discrete wave energy, then the above scheme also preserves it, by construction of $\theta^{k}$. Figure 7 shows the errors relative to the serial fine solution. At iteration $k=1$, the solution is computed serially with the coarse propagator $\mathcal{C}$. After iteration $k=1$, a phase corrector $\theta^{2}$ is constructed from the data computed at $k=1$, and the solution at $k=2$ is computed serially with $\theta^{2}\mathcal{C}$. In the right subplot of Figure 7, we see that the coarse solution now has the same phase as the fine solution, but a slightly different amplitude. For iterations after $k=3$, however, the error does not decrease further, since the parareal-style additive correction has been omitted. Comparing with the examples with similar simulation parameters in the previous subsection, we see that the parareal-style correction

[TABLE]

is important, as it adds the missing amplitudes back to improve accuracy (when the solutions are properly aligned).

6.4. Influence of interpolation and gradient approximation

So far in this section, we have only considered examples in which the coarse and fine propagators operate on the same spatial grid. When they are on two different grids, interpolation is needed to couple the solutions. In this subsection, we study the effect of interpolation. We take the coarse/fine grid ratio to be 2 and keep the rest of the discretization as before:

[TABLE]

In the table above, $*$ ranges over the interpolation methods $\{\texttt{interpft},\texttt{akima},\texttt{pchip},\texttt{linear}\}$. The wave speed for the coarse propagator is also $c=1$. Figure 8 shows the error convergence for the different grid interpolation methods. In this example, the spectral interpolation interpft performs better than the lower-order methods because it resolves the initial waveform much better.
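MATLAB's `interpft` interpolates periodic data by working in the Fourier domain (transform, pad with zeros, transform back). The plain-Python sketch below reimplements that idea; it is not MATLAB's code, `fourier_interp` and the naive DFT helpers are names of our own, and splitting the Nyquist coefficient for an even number of samples is our choice. For a band-limited signal it reproduces the function exactly at the new points, which is consistent with the good resolution of smooth initial waveforms observed above.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n for j in range(n)]


def fourier_interp(v, m):
    """Trigonometric interpolation of n periodic samples onto m >= n equispaced points."""
    n = len(v)
    if m < n:
        raise ValueError("m must be at least len(v)")
    V = dft(v)
    W = [0j] * m
    half = n // 2
    if n % 2 == 1:
        for k in range(half + 1):       # nonnegative frequencies
            W[k] = V[k]
        for k in range(1, half + 1):    # negative frequencies go to the end
            W[m - k] = V[n - k]
    else:
        for k in range(half):
            W[k] = V[k]
        for k in range(1, half):
            W[m - k] = V[n - k]
        W[half] += V[half] / 2          # split the Nyquist coefficient symmetrically
        W[m - half] += V[half] / 2
    return [(m / n) * z.real for z in idft(W)]  # rescale for the longer inverse transform


# Example: refine 8 samples of a band-limited signal by a factor of 2.
w = fourier_interp([math.cos(2 * math.pi * j / 8) for j in range(8)], 16)
```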

We also study the influence of the accuracy of the gradient approximation used to form the data matrices. We observe from the following examples that higher-order approximations of the gradient accelerate the convergence of the proposed method. The parameters used in the simulations are tabulated below:

[TABLE]

In the table above, $*$ ranges over the gradient approximations $\{2,4,6,8,\text{spectral}\}$. To isolate other factors that can also influence the convergence rate, the table below shows the relative residual of Section 3.3 averaged over all iterations, denoted $\big\langle\|\mathsf{F}-\mathsf{\Omega}^{k}_{*}\mathsf{G}\|_{F}/\|\mathsf{F}\|_{F}\big\rangle_{k}$. The residual does not change as the order of the finite difference increases. The last column gives the error in reconstructing $U$ from its approximated gradient; specifically, we denote the operation in equation (18) by $Y:\nabla_{h}v\mapsto v$.

[TABLE]

Figure 9 shows the convergence of the errors for different central difference and Fourier approximations of $\nabla_{h}$. The second-order approximation yields the slowest convergence, while sixth-order or higher approximations converge faster.

6.5. Random subsample of the data matrices

We point out here that the cost of the stabilization can be further reduced by certain randomized algorithms [41, 27], which exploit the observed low-rank nature of our data matrices. Another approach is to subsample the data matrices directly. To illustrate the low-rank property, consider a plane wave in a 2D waveguide

[TABLE]

with the following discretization

[TABLE]

After each parallel computation of coarse and fine data $\mathsf{G},\mathsf{F}$ , we plot the normalized strength of singular values of the correlation matrix $\mathsf{M}=\mathsf{F}\mathsf{G}^{T}$ for a few iterations in Figure 10.

The normalized strength of the singular values drops exponentially in this particular example. A quick and simple strategy to exploit this low-rank property is to randomly sample time slices in the matrices $\mathsf{F}$ and $\mathsf{G}$. Reducing the sample size makes the data matrices thinner, so that the QR factorization is faster. We compare the convergence for different sample sizes in Figure 10.
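A minimal sketch of the subsampling step follows. It assumes that $\mathsf{F}$ and $\mathsf{G}$ are stored with one column per time slice and that slices are drawn uniformly without replacement; both are our assumptions, and the function names are hypothetical. The same column indices are used for both matrices so that the fine and coarse data stay paired before $\mathsf{M}=\mathsf{F}\mathsf{G}^{T}$ is formed.

```python
import random


def subsample_time_slices(F, G, n_samples, seed=None):
    """Keep the same random subset of columns (time slices) of F and G.

    F, G are lists of rows: rows are energy components, columns are time slices.
    """
    n_cols = len(F[0])
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(n_cols), n_samples))
    F_s = [[row[j] for j in cols] for row in F]
    G_s = [[row[j] for j in cols] for row in G]
    return F_s, G_s, cols


def correlation(F, G):
    """M = F G^T, formed from the (possibly subsampled) data."""
    return [[sum(a * b for a, b in zip(rf, rg)) for rg in G] for rf in F]


# Example: keep 5 of 12 time slices.
F = [[float(10 * i + j) for j in range(12)] for i in range(4)]
G = [[float(i - j) for j in range(12)] for i in range(4)]
F_s, G_s, cols = subsample_time_slices(F, G, 5, seed=3)
M = correlation(F_s, G_s)
```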

7. Numerical Examples

In this section, we consider one- and two-dimensional examples, including one that involves a large-scale wave speed model commonly used in the seismic migration community. When the coarse and fine spatial grids differ, the wave speed on the coarse grid is obtained by pointwise evaluation of the given wave speed.

7.1. One dimensional examples

Consider a medium with the wave speed

[TABLE]

and the initial wavefield in $[-0.5,0.5]$

[TABLE]

We present a numerical simulation using the parameters listed below:

[TABLE]

The fine propagator operates on a spatial grid 10 times finer than the coarse grid and uses a CFL number 10 times smaller. Figure 11 compares the convergence of the proposed method with that of the plain parareal method. Because the fine and coarse solutions in a variable medium may differ considerably, the plain parareal method becomes even more unstable.

7.2. Two dimensional cases

We apply the proposed method to three types of media: one with a smoothly varying wave speed (wave guide), one containing a piece-wise constant wave speed (inclusion), and a more complicated wave speed profile which is often used in exploration seismology as a standard case study (Marmousi).

7.2.1. Waveguide

We consider a waveguide in the $xy$-plane $[-1,1]\times[-0.5,0.5]$ with the wave speed

[TABLE]

The initial data is a plane wave traveling left to right along the $x$ -axis:

[TABLE]

The parameters used in the simulation are as follows:

[TABLE]

Figure 12 shows the error of the solution for different coarse/fine grid ratios.

7.2.2. Inclusion

In this example, we consider the two dimensional domain in the $xy$ -plane $[-1,1]\times[-0.5,0.5]$ where a plane wave encounters an inclusion of radius $\sqrt{0.002}$ centered at $[0.5,0.1]$ , modeled by the wave speed

[TABLE]

We use initial data traveling from left to right

[TABLE]

and discretization parameters

[TABLE]

When the coarse grid is the same as the fine grid, the iterations converge to the serial fine solution over the whole time interval (left subplot of Figure 13). On the other hand, when the coarse/fine grid ratio is $5$, the right subplot of Figure 13 shows that the error escalates quickly at $n=50$ (or $t=1$), when the initial plane wave hits the inclusion for the first time, and again at $n=150$ (or $t=3$), when parts of the initial plane wave wrap around the domain and interact with the inclusion again. The error does not decrease in later iterations.

Figure 14 shows the relative density error in the Fourier modes of the computed solution at different times. At the early time $n=15$ (before the wave energy is scattered by the inclusion), most of the error is concentrated at low frequencies, which the coarse grid is able to resolve. Once the wave touches the inclusion at $n=40$, and thereafter at $n=140$ and $n=200$, the errors in the higher frequencies become significant. These scattered higher-frequency waves are not resolved by the coarse grid and cannot be corrected by the proposed method.

7.2.3. Marmousi experiment

We test our method with the Marmousi wave speed model [8], shown in Figure 15. The fine-scale domain has $2422\times 7367$ grid points, while the coarse scale has $49\times 147$ grid points, i.e., about 50 times fewer in each dimension. The initial data is a pulse waveform centered at $x_{0}=(400\,\mathrm{m},3880\,\mathrm{m})$:

[TABLE]

The discretization parameters are given in the following table; the coarse and fine computations communicate every $500$ coarse time steps.

[TABLE]

The computation is executed on one node with $20$ cores of the Stampede2 system at the Texas Advanced Computing Center (TACC). With our non-optimized MATLAB code, it took 26 hours to run 6 parareal iterations and 12 hours to compute the serial fine solution. Hence each iteration takes about 4 hours of wall-clock time, almost 3 times less than the serial fine computation. For a more detailed timing experiment, see Section 7.2.4.
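The per-iteration figures follow from the reported totals (rounded arithmetic):

```latex
\frac{26\ \text{h}}{6\ \text{iterations}}\approx 4.3\ \text{h per iteration},
\qquad
\frac{12\ \text{h (serial fine)}}{4.3\ \text{h (one iteration)}}\approx 2.8 .
```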

Figure 16 shows the solutions computed by the proposed method. Some finer details are added back to the computed solution over the iterations. However, Figure 17 reveals that the errors decrease rather slowly after the first few iterations. Indeed, this experiment is a challenging example of strong scattering due to discontinuities in the wave speed (compared with the previous example).

It is natural to ask whether the proposed method computes solutions that converge to the serially computed fine solutions when the coarse and fine propagators run on the same spatial grid. For this purpose, it suffices to consider a smaller version of the Marmousi velocity model, defined on $485\times 1474$ grid points. A different set of discretization parameters is given in the following table

[TABLE]

Figure 18 shows the absolute error $|u^{k}_{n}-u(t_{n})|$ and the energy error fields at iterations $k=1$ and $k=7$. In the left column, we see that the solution at $k=1$ has larger pointwise absolute error and energy error in regions of high wave speed contrast (e.g., the lower left region of the image domain) than in regions of low wave speed contrast (e.g., the upper left region). In the right column, however, the solution at $k=7$ has large patches of pointwise absolute error in regions of low wave speed contrast. These errors contribute to the increase of the overall $\ell^{2}$ error in the first few iterations, shown in the right subplot of Figure 19.

We observe a discrepancy between the two error curves: the energy error decreases while the $\ell^{2}$ error increases, particularly in the regions of low wave speed contrast. This discrepancy is likely due to the construction of the phase corrector. In regions of high contrast, where locally scattered waves emerge, the phase corrector is constructed to decrease the error there; but because it is a global operator, it also perturbs the solution everywhere else, which in effect increases the overall error.

7.2.4. Timing

To see how the wall-clock computing time changes with the number of cores, we use the Marmousi model again, with the following discretization parameters:

[TABLE]

We used an Intel Skylake node and varied the number of cores used for the computation. Table 1 records the computing time in seconds for the different parts of the algorithm: the parallel computation, the creation of the phase corrector (requiring QR), and the serial coarse update. The computing time of the stabilization process is small relative to the other parts of the algorithm. As a benchmark, we also timed the serial fine computation. The projected speedup is calculated as if the number of cores were equal to the number of time slices, $n_{CPU}=N=T/\Delta t_{com}$.

8. Summary and conclusion

We present here a new stable parareal-like method for the second order wave equation. The method uses the solutions computed along the iterations to construct linear operators which bridge the energy difference between the coarse and fine propagators. Such operators are referred to as the phase correctors in this paper. We presented an extensive set of numerical studies which aim at revealing the properties of the proposed method. From the experiments, we see that the proposed method works well for constant and smooth wave speeds.

For piecewise smooth wave speeds, the algorithm is stable but does not seem to produce numerical solutions that converge to the solutions computed by the fine propagator (as the number of iterations increases) when the fine and coarse propagators run at different spatial resolutions. This is expected, because the higher Fourier modes of the solutions computed by the fine propagator on a finer spatial grid cannot be resolved by coarser grids, even when the initial wavefield is resolved by the coarse grid. As our simulations reveal, the stagnation of the errors may additionally be caused by a couple of different approximations used in the algorithm; this paper outlines these factors for future improvement. In the last two examples, involving piecewise smooth wave speeds with high contrast, we observe that the relative errors are in general much larger than in the previous cases. Most likely, this is due to strong local scattering of waves caused by the discontinuities in the wave speed. Such scattering cannot be corrected efficiently by the proposed Procrustean approach.

Finally, if domain decomposition in space is applied, we expect, owing to the finite speed of wave propagation, that different phase correctors can be constructed in the subdomains in the same way and that the resulting algorithm would be stable. This important topic should be investigated more carefully in a separate paper.

Acknowledgment

The authors are supported partially by NSF grants DMS-1620396 and DMS-1720171. Nguyen is supported by an ICES NIMS fellowship. Part of this research was performed while the second author was visiting the Institute for Pure and Applied Mathematics (IPAM), which is supported by the National Science Foundation (Grant No. DMS-1440415). This work was partially supported by a grant from the Simons Foundation. The authors thank TACC for providing computing resources.

Bibliography

[1] G. Ariel, S. J. Kim, and R. Tsai. Parareal multiscale methods for highly oscillatory dynamical systems. SIAM Journal on Scientific Computing, 38(6):A3540–A3564, 2016.
[2] G. Ariel, H. Nguyen, and R. Tsai. theta-parareal scheme. arXiv:1704.06882 [math.NA], 2017.
[3] A. Arteaga, D. Ruprecht, and R. Krause. A stencil-based implementation of parareal in the C++ domain specific embedded language STELLA. Applied Mathematics and Computation, 267:727–741, 2015.
[4] G. Bal. On the Convergence and the Stability of the Parareal Algorithm to Solve Partial Differential Equations, pages 425–432. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005.
[5] A.-M. Baudron, J.-J. Lautard, Y. Maday, M. K. Riahi, and J. Salomon. Parareal in time 3D numerical solver for the LWR benchmark neutron diffusion transient model. Journal of Computational Physics, 279:67–79, 2014.
[6] A. Blouza, L. Boudin, and S. M. Kaber. Parallel in time algorithms with reduction methods for solving chemical kinetics. Communications in Applied Mathematics and Computational Science, 5(2):241–263, 2011.
[7] M. Brand. Fast low-rank modifications of the thin singular value decomposition. Linear Algebra and its Applications, 415(1):20–30, 2006.
[8] A. Brougois, M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg. Marmousi, model and data. In EAEG workshop-practical aspects of seismic data inversion, 1990.
