Computation of Volume Potentials on Structured Grids Using the Method of Local Corrections

Chris Kavouklis; Phillip Colella

arXiv:1702.08111·math.NA·July 24, 2019

TL;DR

This paper introduces an improved Method of Local Corrections for solving 3D Poisson's equation on structured grids, enhancing accuracy while maintaining low communication costs and demonstrating convergence through numerical examples.

Contribution

The paper develops a new version of MLC that achieves higher accuracy by decomposing local convolutions and incorporating Legendre expansions, reducing low-order errors of the original method.

Findings

1. Achieves asymptotic error bounds of O(h^P) + O(h^Q) + O(εh^2) + O(ε).

2. Maintains computational cost per patch similar to the original method.

3. Numerical examples confirm convergence and improved accuracy.

Abstract

We present a new version of the Method of Local Corrections (MLC) [20], a multilevel, low communications, non-iterative, domain decomposition algorithm for the numerical solution of the free space Poisson's equation in 3D on locally-structured grids. In this method, the field is computed as a linear superposition of local fields induced by charges on rectangular patches of size $O(1)$ mesh points, with the global coupling represented by a coarse grid solution using a right-hand side computed from the local solutions. In the present method, the local convolutions are further decomposed into a short-range contribution computed by convolution with the discrete Green's function for a $Q^{th}$-order accurate finite difference approximation to the Laplacian with the full right-hand side on the patch, combined with a longer-range component that is the field induced by the terms up to…

Equations

\Delta\phi\equiv\frac{\partial^{2}\phi}{\partial x^{2}}+\frac{\partial^{2}\phi}{\partial y^{2}}+\frac{\partial^{2}\phi}{\partial z^{2}}=f,\hbox{ in }\mathbb{R}^{3},

\phi(\boldsymbol{x})=-\frac{1}{4\pi\|\boldsymbol{x}\|}\int_{\mathbb{R}^{3}}f(\boldsymbol{y})\,d\boldsymbol{y}+o\Big(\frac{1}{\|\boldsymbol{x}\|}\Big),\ \|\boldsymbol{x}\|\rightarrow\infty,

\phi(\boldsymbol{x})=\int_{\Omega}G(\boldsymbol{x}-\boldsymbol{y})f(\boldsymbol{y})\,d\boldsymbol{y}\equiv(G*f)(\boldsymbol{x}),

G(\boldsymbol{z})=-\frac{1}{4\pi\|\boldsymbol{z}\|}.

(\nabla^{\boldsymbol{p}}\phi)(\boldsymbol{x})=O\Big(\Big(\frac{1}{\|\boldsymbol{x}-\boldsymbol{x}_{0}\|}\Big)^{\|\boldsymbol{p}\|_{1}+1}R^{3}\|f\|_{\infty}\Big).

\mathcal{G}(D,r)=[\boldsymbol{l}-(r,r,r),\boldsymbol{u}+(r,r,r)],\ r\in\mathbb{Z}

\displaystyle\mathcal{C}(D)=\Big{[}\Big{\lfloor}\frac{\boldsymbol{l}}{{{N}_{ref}}}\Big{\rfloor},\Big{\lceil}\frac{\boldsymbol{u}}{{{N}_{ref}}}\Big{\rceil}\Big{]}

(\Delta^{h}\phi^{h})_{\boldsymbol{g}}=\sum_{\boldsymbol{s}\in[-s,s]^{3}}a_{\boldsymbol{s}}\phi^{h}_{\boldsymbol{g}+\boldsymbol{s}},\ a_{\boldsymbol{s}}\in\mathbb{R}.

\tau^{h}(\phi)=C_{2}h^{2}\Delta(\Delta\phi)+\sum_{q^{\prime}=2}^{\frac{q}{2}-1}h^{2q^{\prime}}\mathcal{L}^{2q^{\prime}}(\Delta\phi)+h^{q}L^{q+2}(\phi)+O(h^{q+2}),

\tau^{h}(\phi)(\boldsymbol{x})=\Delta^{h}(\phi)(\boldsymbol{x})=h^{q}L^{q+2}(\phi)(\boldsymbol{x})+O(h^{q+2}).

\frac{\partial^{2r}\phi}{\partial x_{d}^{2r}}=\frac{\partial^{2r-2}}{\partial x_{d}^{2r-2}}(\Delta\phi)-\sum_{d^{\prime}\neq d}\frac{\partial^{2r}}{\partial x_{d}^{2r-2}\partial x_{d^{\prime}}^{2}}(\phi).

(G^{h}*f^{h})=(\Delta^{h})^{-1}(f^{h}),\quad(G^{h}*f^{h})[\boldsymbol{g}]\equiv\sum_{\boldsymbol{g}^{\prime}\in\mathbb{Z}^{3}}h^{3}G^{h}[\boldsymbol{g}-\boldsymbol{g}^{\prime}]f^{h}[\boldsymbol{g}^{\prime}]

(\Delta^{h=1}G^{h=1})[{\boldsymbol{g}}]=\left\{\begin{array}[]{l}1,\hbox{ if }{\boldsymbol{g}}=\boldsymbol{0}\\ 0,\hbox{ otherwise}\end{array}\right.

\displaystyle G^{h=1}[{\boldsymbol{g}}]=-\frac{1}{4\pi||{\boldsymbol{g}}||}+o\Big{(}\frac{1}{||{\boldsymbol{g}}||}\Big{)}{\hbox{ , }}\|{\boldsymbol{g}}\|\rightarrow\infty.

\sum_{\boldsymbol{g}\in D}h^{3}|G^{h}[\boldsymbol{g}]|\leq C,\ C=C(nh),\ D\subseteq[-n,\dots,n]^{3},

\|G^{h}*f^{h}\|_{\infty}\leq C^{\prime}\|f^{h}\|_{\infty},\ C^{\prime}\hbox{ independent of }f,h,

\displaystyle supp(f^{h})\subseteq\Big{[}-\Big{\lfloor}\frac{A}{h}\Big{\rfloor},\dots,\Big{\lceil}\frac{A}{h}\Big{\rceil}\Big{]}^{3}

\Delta^{h}(\phi)=\tilde{f}^{h}+O(h^{q})

\displaystyle\tilde{f}^{h}=f^{h}+\Big{(}C_{2}h^{2}(\Delta(f))^{h}+\sum_{q^{\prime}=2}^{\frac{q}{2}-1}h^{2q^{\prime}}\mathcal{L}^{2q^{\prime}}(f)^{h}\Big{)},

\phi=G^{h}*f^{h}+C_{2}h^{2}f^{h}+O(h^{4}).

[(L^{q}G)*f](\boldsymbol{x})=O\Big(\Big(\frac{1}{R}\Big)^{q-2}\frac{1}{\big\|\frac{\boldsymbol{x}}{R}-\frac{\boldsymbol{c}}{R}\big\|_{\infty}^{q+1}}\Big)\|f\|_{\infty}.

\tau^{h}(f)=\Delta^{h}(G*f)(\boldsymbol{x})=O\Big(\Big(\frac{h}{R}\Big)^{q}\frac{1}{\big\|\frac{\boldsymbol{x}}{R}-\frac{\boldsymbol{c}}{R}\big\|_{\infty}^{q+3}}\Big)\|f\|_{\infty}.

F^{H}=\left\{\begin{array}[]{l}\Delta^{H}(\phi)\text{ , on }D_{\beta}^{H,s}{\hbox{ , }}D_{\beta}^{H,s}={\mathcal{G}}({\mathcal{C}}(D_{\beta}^{H}),-s)\\ 0\text{ , on }\Omega^{H}\setminus D_{\beta}^{H,s}.\end{array}\right.

\Delta^{H}(\phi^{H}-G*f)=

\phi^{H}-G*f=

supp(f)\subset\Omega=\bigcup_{\boldsymbol{i}}\Omega_{R,\boldsymbol{i}},\ \Omega_{R,\boldsymbol{i}}=\boldsymbol{c}^{\boldsymbol{i}}+[-R,R]^{3},\ \boldsymbol{i}\in\mathbb{Z}^{3},\ \boldsymbol{c}^{\boldsymbol{i}}=(2\boldsymbol{i}+(1,1,1))R.

\phi(\boldsymbol{x})=(G*f)(\boldsymbol{x})=\sum_{\boldsymbol{i}}(G*f^{\boldsymbol{i}})(\boldsymbol{x}).

\mathbb{P}(f^{\boldsymbol{i}})=\sum_{\boldsymbol{p}\in\mathbb{N}^{3}:\|\boldsymbol{p}\|_{1}<P}\langle Q^{\boldsymbol{p}},f^{\boldsymbol{i}}\rangle Q^{\boldsymbol{p}},

\displaystyle Q^{\boldsymbol{p}}(\boldsymbol{x})=R^{-\frac{3}{2}}\prod\limits_{d=1}^{3}Q^{p_{d}}\Big{(}\frac{x_{d}-c_{d}^{\boldsymbol{i}}}{R}\Big{)}{\hbox{ , }}\boldsymbol{x}\in{\Omega_{R,{\boldsymbol{i}}}}{\hbox{ , }}\boldsymbol{p}\in\mathbb{N}^{3},

F^{{\boldsymbol{i}},H}[\boldsymbol{g}]=\left\{\begin{array}[]{l}\Delta^{H}(G*f^{{\boldsymbol{i}}})[\boldsymbol{g}]\text{ , }\boldsymbol{g}\in{\Omega_{R,{\boldsymbol{i}},\alpha}^{H}}\\ \\ \Delta^{H}(G*\mathbb{P}(f^{{\boldsymbol{i}}}))[\boldsymbol{g}]\text{ , }\boldsymbol{g}\in{\Omega_{R,{\boldsymbol{i}},\beta}^{H}}\setminus{\Omega_{R,{\boldsymbol{i}},\alpha}^{H}}\\ \\ 0\text{ , otherwise}\end{array}\right.

F^{H}[\boldsymbol{g}]=\sum_{\boldsymbol{i}}F^{\boldsymbol{i},H}[\boldsymbol{g}],

\phi^{H}=G^{H}*F^{H}.

Full text

Computation of Volume Potentials on Structured Grids via the Method of Local Corrections

Chris Kavouklis, Phillip Colella

Computational Research Division

Lawrence Berkeley National Laboratory

1 Cyclotron Road, Berkeley, CA 94720, United States

Abstract

We present a new version of the Method of Local Corrections (MLC) [20], a multilevel, low communications, non-iterative, domain decomposition algorithm for the numerical solution of the free space Poisson’s equation in 3D on locally-structured grids. In this method, the field is computed as a linear superposition of local fields induced by charges on rectangular patches of size $O(1)$ mesh points, with the global coupling represented by a coarse grid solution using a right-hand side computed from the local solutions. In the present method, the local convolutions are further decomposed into a short-range contribution computed by convolution with the discrete Green’s function for a $Q^{th}$-order accurate finite difference approximation to the Laplacian with the full right-hand side on the patch, combined with a longer-range component that is the field induced by the terms up to order $P-1$ of the Legendre expansion of the charge over the patch. This leads to a method with a solution error that has an asymptotic bound of $O(h^{P})+O(h^{Q})+O(\epsilon h^{2})+O(\epsilon)$, where $h$ is the mesh spacing and $\epsilon$ is the max norm of the charge times a rapidly-decaying function of the radius of the support of the local solutions scaled by $h$. Thus we have eliminated the low-order accuracy of the original method (which corresponds to $P=1$ in the present method) for smooth solutions, while keeping the computational cost per patch nearly the same as that of the original method. Specifically, in addition to the local solves of the original method, we only have to compute and communicate the expansion coefficients of local expansions (for instance, 20 scalars per patch for $P=4$). Several numerical examples are presented to illustrate the new method and demonstrate its convergence properties.

Keywords: Poisson solver, method of local corrections, Mehrstellen stencils, domain decomposition

1 Introduction

We are interested in solving Poisson’s equation with infinite domain boundary conditions in three dimensions, that is

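$$\Delta\phi\equiv\frac{\partial^{2}\phi}{\partial x^{2}}+\frac{\partial^{2}\phi}{\partial y^{2}}+\frac{\partial^{2}\phi}{\partial z^{2}}=f\ \text{ in }\ \mathbb{R}^{3},\qquad\phi(\boldsymbol{x})=-\frac{1}{4\pi\|\boldsymbol{x}\|}\int_{\mathbb{R}^{3}}f(\boldsymbol{y})\,d\boldsymbol{y}+o\Big(\frac{1}{\|\boldsymbol{x}\|}\Big),\ \|\boldsymbol{x}\|\rightarrow\infty,\tag{1}$$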

where $f$ is a function with bounded support and by $\|\cdot\|$ we denote the Euclidean norm. It is well known that problem (1) has a solution if $f$ is Hölder continuous and has compact support $\Omega$ [12]. Furthermore, the solution of (1) is unique by means of a maximum principle argument for harmonic functions and is given as a convolution of the data with the three dimensional infinite domain Green’s function [10]

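$$\phi(\boldsymbol{x})=\int_{\Omega}G(\boldsymbol{x}-\boldsymbol{y})f(\boldsymbol{y})\,d\boldsymbol{y}\equiv(G*f)(\boldsymbol{x}),\qquad G(\boldsymbol{z})=-\frac{1}{4\pi\|\boldsymbol{z}\|}.\tag{2}$$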

In addition, if $\Omega\subset B(\boldsymbol{x}_{0},R)$, where $B(\boldsymbol{x}_{0},R)$ is the closed ball of radius $R$ centered at point $\boldsymbol{x}_{0}$, then $\phi$ is harmonic in $\mathbb{R}^{3}\backslash B(\boldsymbol{x}_{0},R)$ and hence real analytic. By differentiating (2), we find that the derivatives of the potential are rapidly-decaying functions of the form

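$$(\nabla^{\boldsymbol{p}}\phi)(\boldsymbol{x})=O\Big(\Big(\frac{1}{\|\boldsymbol{x}-\boldsymbol{x}_{0}\|}\Big)^{\|\boldsymbol{p}\|_{1}+1}R^{3}\|f\|_{\infty}\Big).\tag{3}$$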

This suggests a domain-decomposition strategy, in which the contribution to the fields on each local domain is computed independently and the non-local coupling is computed using a reduced number of computational degrees of freedom. This approach has been exploited for particle methods with the right hand side in (1) given by $f(\boldsymbol{x})=\sum_{i}q_{i}\delta(\boldsymbol{x}-\boldsymbol{x}_{i})$ . For instance, we mention the Barnes-Hut algorithm [6], the Fast-Multipole Method (FMM) [13, 7, 14], and the Method of Local Corrections (MLC) [3, 1, 2]. The aforementioned particle algorithms have been modified to handle gridded data; for a more comprehensive review that includes benchmark studies of the FFT, FMM and multigrid methods, see [11].

The present work is based on the extension of the Method of Local Corrections to structured-grid data described in [4, 5, 20]. In this approach, the support of the right-hand side is discretized with a rectangular grid, which is decomposed into a set of cubic patches. For two levels the method proceeds in three steps: (i) a loop over the fine disjoint patches and the computation of local potentials induced by the charge restricted to those patches on sufficiently large extensions of their support (downward pass); (ii) a global coarse-grid Poisson solve with a right hand side computed by applying the coarse-grid Laplacian to the local potentials of step (i); and (iii) a correction of the local solutions computed in step (i) on the boundaries of the fine disjoint patches based on interpolating the global coarse solution from which the contributions from the local solutions have been subtracted (upward pass). These boundary conditions are propagated into the interior of the patches by performing Dirichlet solves on each patch. This can be generalized by replacing the global coarse solution in (ii) by a recursive call to MLC, or by replacing uniform grids at each level covering the entire domain by nested block-structured locally-refined grids. The local volume potentials are computed using a high-order finite-difference approximation to the Laplacian, combined with an extension to three dimensions of the James-Lackner algorithm [16, 17] for representing infinite-domain boundary conditions. Furthermore, in order to make the nested refinement version of this algorithm practical, we require that $R=O(H)=O(h)$ , where $R$ is the radius (in max norm) of local patches, $H$ the coarse mesh spacing, and $h$ the fine mesh spacing (i.e., a fixed number of points per patch and a fixed refinement ratio). 

In [20], the local field calculation in (i) was split into two contributions: one that represented the field induced by the complete charge distribution on a patch, and a second corresponding to the monopole component of the charge. By using such a splitting, it is possible to obtain a convergent method by using a relatively large region for computing the monopole component only while keeping the overall computation and communications cost low. However, the convergence properties of the resulting method were erratic, and exhibited a large $O(h)$ solution error for smooth charge distributions that were well-resolved on the fine grid.

In the present work, we generalize the method in [20] in a way that preserves the reduced-communication properties of that method, and leads to an error analysis that explains the observed convergence behavior. In particular, we replace the separate treatment of the monopole component of the charge on each patch by a similar treatment of a truncated expansion in Legendre polynomials of the charge distribution on each patch. Our error analysis predicts an $O(h^{P})+O(h^{Q})+O(\epsilon h^{2})+O(\epsilon)$ solution error, where $P-1$ is the maximum degree of the polynomials in the Legendre expansions, and $Q$ is the order of accuracy of the finite-difference discretization used to compute the local potentials. This is consistent with the earlier results in [20] corresponding to $P=1$. The $O(\epsilon)$ term is a localization error, proportional to the max norm of the charge divided by a localization distance (measured in multiples of the patch size) raised to the order of accuracy of the discretized Laplacian on harmonic functions. We also change the detailed approach to computing the local potentials, replacing the James-Lackner representation of the infinite-domain boundary conditions in the calculation of the local potentials in step (i) with local discrete convolutions computed using FFTs via a variation on Hockney’s domain-doubling method [15]. This leads to a conceptually simpler algorithm, and provides a compact numerical kernel on which to focus the effort of optimization.

In this paper, we focus on the design of the algorithm, including an error analysis of the method and calculations that demonstrate the error properties derived from that analysis. In a second paper [21], we will present performance and parallel scaling results on high-performance computing platforms.

2 Mehrstellen Discretization and Finite Difference Localization

Notation. We denote by $D^{h},{\Omega}^{h}\dots\subset{\mathbb{Z}}^{3}$ grids with grid spacing $h$ of discrete points in physical space: $\{{\boldsymbol{g}}h:{\boldsymbol{g}}\in D^{h}\}$ . Arrays of values defined over such sets will approximate functions on subsets of ${\mathbb{R}}^{3}$ , i.e. if $\psi=\psi({\boldsymbol{x}})$ is a function on $D\subset{\mathbb{R}}^{3}$ , then $\psi^{h}[{\boldsymbol{g}}]\approx\psi({\boldsymbol{g}}h)$ . We denote operators on arrays over grids of mesh spacing $h$ by $L^{h},\Delta^{h},\dots$ ; $L^{h}(\phi^{h}):D^{h}\rightarrow{\mathbb{R}}$ . Such operators are also defined on functions of ${\boldsymbol{x}}\in{\mathbb{R}}^{3}$ , and on arrays defined on finer grids $\phi^{h^{\prime}}$ , $h=Nh^{\prime},N\in\mathbb{N}_{+}$ , by sampling: $L^{h}(\phi)\equiv L^{h}(\mathcal{S}^{h}(\phi))$ , $\mathcal{S}^{h}(\phi)[{\boldsymbol{g}}]\equiv\phi({\boldsymbol{g}}h)$ ; $L^{h}(\phi^{h^{\prime}})\equiv L^{h}(\mathcal{S}^{h}(\phi^{h^{\prime}}))$ , $\mathcal{S}^{h}(\phi^{h^{\prime}})[{\boldsymbol{g}}]\equiv\phi^{h^{\prime}}[N{\boldsymbol{g}}]$ .

For a rectangle $D=[\boldsymbol{l},\boldsymbol{u}]$ , defined by its low and upper corners $\boldsymbol{l},\boldsymbol{u}\in\mathbb{Z}^{3}$ , we define the operators

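$$\mathcal{G}(D,r)=[\boldsymbol{l}-(r,r,r),\boldsymbol{u}+(r,r,r)],\ r\in\mathbb{Z},\qquad\mathcal{C}(D)=\Big[\Big\lfloor\frac{\boldsymbol{l}}{N_{ref}}\Big\rfloor,\Big\lceil\frac{\boldsymbol{u}}{N_{ref}}\Big\rceil\Big].$$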

Throughout this paper, we will use $N_{ref}=4$ for the refinement ratio between levels.
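As a small illustration of the two box operators, the following Python sketch implements the grow operator $\mathcal{G}$ and the coarsening operator $\mathcal{C}$ on index boxes stored as pairs of integer tuples. The function names `grow` and `coarsen` and the box representation are illustrative assumptions, not taken from the paper.

```python
from math import ceil, floor

def grow(box, r):
    """G(D, r): extend the box [l, u] by r points in each direction (r < 0 shrinks it)."""
    lo, hi = box
    return tuple(l - r for l in lo), tuple(u + r for u in hi)

def coarsen(box, n_ref=4):
    """C(D): coarse-index box covering [l, u], rounding l down and u up."""
    lo, hi = box
    return tuple(floor(l / n_ref) for l in lo), tuple(ceil(u / n_ref) for u in hi)

fine_box = ((0, 0, 0), (16, 16, 16))   # hypothetical fine patch
grown = grow(fine_box, 3)              # extended by 3 fine points
coarse = coarsen(grown)                # covering box on the coarse grid, N_ref = 4
interior = grow(coarse, -1)            # shrink by a stencil radius s = 1
```

Negative values of `r` shrink the box, which is how the text later forms $D^{h,s}=\mathcal{G}(D^{h},-s)$.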

We begin our discussion presenting the finite difference discretizations of (1) that we will be using throughout this work and some of their properties that pertain to the Method of Local Corrections. Specifically, we are employing Mehrstellen discretizations [8] (also referred to as compact finite difference discretizations) of the 3D Laplace operator

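$$(\Delta^{h}\phi^{h})_{\boldsymbol{g}}=\sum_{\boldsymbol{s}\in[-s,s]^{3}}a_{\boldsymbol{s}}\phi^{h}_{\boldsymbol{g}+\boldsymbol{s}},\ a_{\boldsymbol{s}}\in\mathbb{R}.$$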

If $\phi^{h}$ is defined on $D^{h}$ , then $\Delta^{h}\phi^{h}$ is defined on $D^{h,s}\equiv{\mathcal{G}}(D^{h},-s)$ . The associated truncation error $\tau^{h}\equiv(\Delta^{h}-\Delta)(\phi)=-\Delta^{h}(\phi^{h}-\phi)$ for the Mehrstellen discrete Laplace operator is of the form

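$$\tau^{h}(\phi)=C_{2}h^{2}\Delta(\Delta\phi)+\sum_{q^{\prime}=2}^{\frac{q}{2}-1}h^{2q^{\prime}}\mathcal{L}^{2q^{\prime}}(\Delta\phi)+h^{q}L^{q+2}(\phi)+O(h^{q+2}),\tag{5}$$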

where $q$ is even and $\mathcal{L}^{2q^{\prime}}$ and $L^{q+2}$ are constant-coefficient differential operators that are homogeneous, i.e. for which all terms are derivatives of order $2q^{\prime}$ and $q+2$ , respectively. For the two operators we will consider here, $C_{2}=\frac{1}{12}$ . In general, the truncation error is $O(h^{2})$ . However, if $\phi$ is harmonic in a neighborhood of $\boldsymbol{x}$ ,

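$$\tau^{h}(\phi)(\boldsymbol{x})=\Delta^{h}(\phi)(\boldsymbol{x})=h^{q}L^{q+2}(\phi)(\boldsymbol{x})+O(h^{q+2}).$$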

In our numerical test cases we make use of the 19-point ( $L_{19}^{h}$ ) and 27-point ( $L_{27}^{h}$ ) Mehrstellen stencils [22] that are described in the Appendix (Section A.1), for which $q=4$ and $q=6$ , respectively. In general, it is possible to define operators for which $s=\lfloor\frac{q}{4}\rfloor$ for any even $q$ , using higher-order Taylor expansions and repeated applications of the identity

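$$\frac{\partial^{2r}\phi}{\partial x_{d}^{2r}}=\frac{\partial^{2r-2}}{\partial x_{d}^{2r-2}}(\Delta\phi)-\sum_{d^{\prime}\neq d}\frac{\partial^{2r}}{\partial x_{d}^{2r-2}\partial x_{d^{\prime}}^{2}}(\phi).$$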
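The structure of the truncation error can be checked directly on polynomial data. The sketch below assumes the commonly quoted 19-point Mehrstellen weights (center $-4$, face neighbors $1/3$, edge neighbors $1/6$, all scaled by $1/h^{2}$); the paper's own definition of $L_{19}^{h}$ is in its Appendix, which is not reproduced here, so these weights are an assumption. For $\phi=x^{4}$ the stencil should return $\Delta\phi+C_{2}h^{2}\Delta(\Delta\phi)=12x^{2}+2h^{2}$ with $C_{2}=\frac{1}{12}$, and for a harmonic cubic it should return zero up to rounding.

```python
from itertools import product

def lap19(phi, x, h):
    """Apply the (assumed) 19-point Mehrstellen stencil to a callable phi at point x."""
    total = 0.0
    for s in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for c in s if c != 0)
        if nonzero == 3:
            continue  # the eight corner points are not in the 19-point stencil
        weight = (-4.0, 1.0 / 3.0, 1.0 / 6.0)[nonzero]  # center, face, edge
        total += weight * phi(x[0] + s[0] * h, x[1] + s[1] * h, x[2] + s[2] * h)
    return total / h**2

x0, h = (0.3, 0.2, 0.1), 0.1
# Non-harmonic test function: the O(h^2) Mehrstellen term is visible.
quartic = lap19(lambda x, y, z: x**4, x0, h)
# Harmonic cubic: Laplacian and all higher truncation terms vanish.
cubic_harmonic = lap19(lambda x, y, z: x**3 - 3.0 * x * y**2, x0, h)
```

The first result differs from the exact Laplacian $12x^{2}$ by the $C_{2}h^{2}\Delta(\Delta\phi)$ term, which is the term that the right-hand-side modification discussed later in this section is designed to remove.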

Since we are primarily concerned with solving the free-space problem, the corresponding discrete problem can be expressed formally as a discrete convolution.

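$$(G^{h}*f^{h})=(\Delta^{h})^{-1}(f^{h}),\qquad(G^{h}*f^{h})[\boldsymbol{g}]\equiv\sum_{\boldsymbol{g}^{\prime}\in\mathbb{Z}^{3}}h^{3}G^{h}[\boldsymbol{g}-\boldsymbol{g}^{\prime}]f^{h}[\boldsymbol{g}^{\prime}],$$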

where the discrete Green’s function $G^{h}[{\boldsymbol{g}}]=h^{-1}G^{h=1}[{\boldsymbol{g}}]$ satisfies

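$$(\Delta^{h=1}G^{h=1})[\boldsymbol{g}]=\begin{cases}1, & \text{if }\boldsymbol{g}=\boldsymbol{0}\\ 0, & \text{otherwise}\end{cases}$$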

and

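$$G^{h=1}[\boldsymbol{g}]=-\frac{1}{4\pi\|\boldsymbol{g}\|}+o\Big(\frac{1}{\|\boldsymbol{g}\|}\Big),\ \|\boldsymbol{g}\|\rightarrow\infty.$$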

We use these conditions to construct approximations to $G^{h}$ numerically, see the Appendix. For any $n$ , we have

$$\sum_{\boldsymbol{g}\in D}h^{3}|G^{h}[\boldsymbol{g}]|\leq C,\ C=C(nh),\ D\subseteq[-n,\dots,n]^{3},$$

from which it follows that convolution with $G^{h}$ is max-norm stable on bounded domains, i.e.,

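$$\|G^{h}*f^{h}\|_{\infty}\leq C^{\prime}\|f^{h}\|_{\infty},\ C^{\prime}\text{ independent of }f,h,\qquad supp(f^{h})\subseteq\Big[-\Big\lfloor\frac{A}{h}\Big\rfloor,\dots,\Big\lceil\frac{A}{h}\Big\rceil\Big]^{3}$$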

for any fixed $A>0$ .
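For data supported on a finite patch, a discrete convolution of this kind can be evaluated with FFTs by zero-padding to twice the patch size, which is the idea behind the domain-doubling approach of Hockney mentioned in the Introduction. The following one-dimensional Python sketch illustrates only the padding argument; the `kernel` function is a hypothetical stand-in and not the discrete Green's function $G^{h}$, and the pure-Python `fft` helper is written here for self-containment.

```python
import cmath
import math

def fft(a, inverse=False):
    """Recursive radix-2 FFT without normalization; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def convolve_doubled(kernel, f):
    """Evaluate sum_j kernel(i - j) * f[j] for i = 0..n-1 on a doubled periodic domain."""
    n = len(f)
    m = 2 * n
    f_pad = [complex(v) for v in f] + [0j] * n      # zero-pad the charge
    k_pad = [0j] * m
    for d in range(n):
        k_pad[d] = complex(kernel(d))               # offsets 0 .. n-1
        if d > 0:
            k_pad[m - d] = complex(kernel(-d))      # offsets -1 .. -(n-1), wrapped
    spectrum = [x * y for x, y in zip(fft(f_pad), fft(k_pad))]
    return [(v / m).real for v in fft(spectrum, inverse=True)[:n]]

def kernel(d):
    """Hypothetical symmetric kernel, regularized at d = 0 (illustration only)."""
    return -1.0 / (4.0 * math.pi * max(abs(d), 1))

charge = [1.0, 0.5, 0.0, -2.0, 0.0, 0.0, 3.0, 1.5]
fast = convolve_doubled(kernel, charge)
direct = [sum(kernel(i - j) * charge[j] for j in range(8)) for i in range(8)]
```

Because every offset $i-j$ with $0\leq i,j<n$ lies in $[-(n-1),n-1]$, the wrapped kernel on the doubled domain never aliases, so the periodic convolution agrees with the free-space sum on the original patch.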

The form of the truncation error (5) allows us to compute $q^{th}$ -order accurate solutions to (1) by modifying the right-hand side, i.e.

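$$\Delta^{h}(\phi)=\tilde{f}^{h}+O(h^{q}),$$

$$\tilde{f}^{h}=f^{h}+\Big(C_{2}h^{2}(\Delta(f))^{h}+\sum_{q^{\prime}=2}^{\frac{q}{2}-1}h^{2q^{\prime}}\mathcal{L}^{2q^{\prime}}(f)^{h}\Big),\tag{11}$$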

and replacing the differential operators on the right-hand side with finite difference approximations. If only a fourth-order accurate solution is required, it suffices to use the first term, leading to a correction of a particularly simple form:

$$\phi=G^{h}*f^{h}+C_{2}h^{2}f^{h}+O(h^{4}).\tag{12}$$

In particular, the solution error $\epsilon^{h}=G^{h}*f^{h}-\phi=O(h^{4})$ away from the support of $f$ without any modification of $f^{h}$ .
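The effect of the fourth-order correction can be seen in a one-dimensional analogue, which is an assumption made here for brevity and not the paper's 3D setting. The three-point Laplacian has truncation error $\frac{1}{12}h^{2}u''''+O(h^{4})$, so adding $\frac{1}{12}h^{2}f$ to the discrete solution of $u''=f$ plays the role of the $C_{2}h^{2}f^{h}$ term. The test problem $u=\sin(\pi x)$ on $[0,1]$ is chosen so that $f$ vanishes at the endpoints and homogeneous Dirichlet data remain consistent with the correction.

```python
import math

def solve_dirichlet_1d(f, h):
    """Solve (u[i-1] - 2 u[i] + u[i+1]) / h^2 = f[i] with zero boundary values (Thomas algorithm)."""
    m = len(f)                       # number of interior unknowns
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = h * h * f[0] / -2.0
    for i in range(1, m):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (h * h * f[i] - d_prime[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u

n = 16
h = 1.0 / n
xs = [i * h for i in range(1, n)]
exact = [math.sin(math.pi * x) for x in xs]
f = [-math.pi**2 * math.sin(math.pi * x) for x in xs]

u_h = solve_dirichlet_1d(f, h)                                  # second-order accurate
corrected = [u + h * h * fi / 12.0 for u, fi in zip(u_h, f)]     # add C_2 h^2 f, C_2 = 1/12

err_plain = max(abs(a - b) for a, b in zip(u_h, exact))
err_corrected = max(abs(a - b) for a, b in zip(corrected, exact))
```

In this analogue the uncorrected error behaves like $O(h^{2})$, while the corrected values are accurate to $O(h^{4})$ without changing the linear solve.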

Suppose that $supp(f)\subset P_{\boldsymbol{c}}$, where $P_{\boldsymbol{c}}=\boldsymbol{c}+[-R,R]^{3}$ is a cube of radius $R$ centered at point $\boldsymbol{c}$, and that the differential operator $L^{q}$ is a linear combination of derivatives of order $q$. By differentiating (2), we have

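$$[(L^{q}G)*f](\boldsymbol{x})=O\Big(\Big(\frac{1}{R}\Big)^{q-2}\frac{1}{\big\|\frac{\boldsymbol{x}}{R}-\frac{\boldsymbol{c}}{R}\big\|_{\infty}^{q+1}}\Big)\|f\|_{\infty}.\tag{13}$$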

In particular, away from the support of $f$ , (5) becomes

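$$\tau^{h}(f)=\Delta^{h}(G*f)(\boldsymbol{x})=O\Big(\Big(\frac{h}{R}\Big)^{q}\frac{1}{\big\|\frac{\boldsymbol{x}}{R}-\frac{\boldsymbol{c}}{R}\big\|_{\infty}^{q+3}}\Big)\|f\|_{\infty}.\tag{14}$$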

It is precisely this rapid decay of the truncation error, a consequence of the fact that the local potentials are harmonic away from the supports of the associated charges, that allows us to use a coarse mesh for the global coupling computation. In Figure 1, scatter plots of the truncation error for the case of a point charge located at the origin using the 19-point and 27-point Laplacians are depicted. The rapid decay of the truncation error in the far-field and the faster decay with increasing $q$ are evident.
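The far-field decay can be reproduced with a few lines of Python. The sketch below applies a 19-point stencil to the free-space Green's function of a unit point charge at the origin and compares the truncation error at two distances along the $x$-axis. The stencil weights (center $-4$, faces $1/3$, edges $1/6$) are the commonly quoted 19-point Mehrstellen weights and are assumed here, since the paper's Appendix is not reproduced; the sample points and mesh spacing are arbitrary illustrative choices.

```python
import math
from itertools import product

def lap19(phi, x, h):
    """Apply the (assumed) 19-point Mehrstellen stencil to a callable phi at point x."""
    total = 0.0
    for s in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for c in s if c != 0)
        if nonzero == 3:
            continue  # corners are excluded from the 19-point stencil
        weight = (-4.0, 1.0 / 3.0, 1.0 / 6.0)[nonzero]
        total += weight * phi(x[0] + s[0] * h, x[1] + s[1] * h, x[2] + s[2] * h)
    return total / h**2

def green(x, y, z):
    """Free-space Green's function G(z) = -1 / (4 pi |z|)."""
    return -1.0 / (4.0 * math.pi * math.sqrt(x * x + y * y + z * z))

h = 0.25
# G is harmonic away from the origin, so the stencil output is pure truncation error.
tau_near = abs(lap19(green, (4.0, 0.0, 0.0), h))
tau_far = abs(lap19(green, (8.0, 0.0, 0.0), h))
ratio = tau_near / tau_far
```

With $q=4$, the estimate above predicts decay like distance to the power $-(q+3)=-7$, so doubling the distance should reduce the truncation error by a factor of roughly $2^{7}$.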

Using this localization property of the Mehrstellen operators, we can reduce the cost of computing the potential (2) induced by a localized charge distribution to the cost of computing the potential near the support of the charge, using the finite difference localization approach originally introduced in [19]. We assume that the support of $f$ is contained in a cube $D$ of radius $R$ centered at $\boldsymbol{c}$. First, we compute $\phi=G*f$ in the extended cube $D_{\beta}$ of radius $\beta R$, $\beta>1$. Then we compute $\phi^{H}=G^{H}*F^{H}$ on $\Omega^{H}$. The coarse right-hand side is defined by:

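$$F^{H}=\begin{cases}\Delta^{H}(\phi), & \text{on }D_{\beta}^{H,s},\ D_{\beta}^{H,s}=\mathcal{G}(\mathcal{C}(D_{\beta}^{H}),-s)\\ 0, & \text{on }\Omega^{H}\setminus D_{\beta}^{H,s}.\end{cases}\tag{15}$$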

Using (14), we have

[TABLE]

where $k\in\mathbb{N}$. One can decompose the annular region $\{{\boldsymbol{g}}:((k+\beta)+1)R\geq\|{\boldsymbol{g}}H\|_{\infty}\geq(k+\beta)R\}$ into $O((k+\beta)^{2})$ rectangles, each of radius $\leq R$, leading to an analogous decomposition of the right-hand side of (16) into a sum of terms, each of which is supported on one such rectangle. Applying convolution with $G^{H}$ to both sides of (16) represented in terms of such sums leads to a solution error given by

[TABLE]

Thus the accuracy of the potential away from the support of the charge can be improved by decreasing the ratio $H/R$ ; or, for fixed values of that ratio, by adjusting $\beta$ or $q$ . In any case, the error is only weakly dependent on $f$ . In this context, we will refer to $\beta$ as a localization radius. In addition, (18) is true independent of whether or not the right-hand side is modified using the Mehrstellen correction (11). The MLC algorithm combines finite difference localization with domain decomposition into a collection of rectangular patches of size $R$ to obtain a low-communication method for computing volume potentials. In that case, we want to keep the number of mesh points per patch fixed, which leads to (17) being an $O(1)$ error relative to the mesh spacing. Ultimately, that error is controlled by increasing $\beta$ , combined with choosing a discretization with a larger $q$ . However, the cost of computing the local convolution $G*f$ on $D^{H,s}_{\beta}$ scales like $\beta^{3}$ . To reduce that cost, we introduce a second localization radius $\alpha$ , $\alpha<\beta$ . On $D^{H,s}_{\alpha}$ , we use the full convolution to compute $F^{H}$ . In the remaining annular region, we use a reduced representation based on the field induced by the first few moments of the Legendre expansion of $f$ , which is much less expensive to compute.

3 Method of Local Corrections - Semi-Discrete Case

To clarify ideas, we discuss in this section a theoretical proxy for the fully discrete algorithm. We construct a function $\phi^{MLC}:\Omega\rightarrow{\mathbb{R}}$ that approximates the potential $\phi$ by a linear superposition of local potentials, combined with data interpolated from a discrete global solution. The computational domain is a cube $\Omega$ that contains the support of $f$ and is decomposed into a finite union of disjoint cubic subdomains of equal volume that are translations of $[-R,R]^{3}$, $R>0$.

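$$supp(f)\subset\Omega=\bigcup_{\boldsymbol{i}}\Omega_{R,\boldsymbol{i}},\quad\Omega_{R,\boldsymbol{i}}=\boldsymbol{c}^{\boldsymbol{i}}+[-R,R]^{3},\quad\boldsymbol{i}\in\mathbb{Z}^{3},\quad\boldsymbol{c}^{\boldsymbol{i}}=(2\boldsymbol{i}+(1,1,1))R.$$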

Then $f=\sum_{{\boldsymbol{i}}}f^{\boldsymbol{i}}$, where $f^{\boldsymbol{i}}=f\chi^{\boldsymbol{i}}$ and $\chi^{\boldsymbol{i}}$ is the characteristic function of ${\Omega_{R,{\boldsymbol{i}}}}$. As a consequence, the global potential may be written as

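$$\phi(\boldsymbol{x})=(G*f)(\boldsymbol{x})=\sum_{\boldsymbol{i}}(G*f^{\boldsymbol{i}})(\boldsymbol{x}).\tag{20}$$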

In other words, it is the linear superposition of the potentials induced by the local charges $f^{\boldsymbol{i}}$, which can be computed independently in parallel. The MLC algorithm replaces each of the summands in (20) with a solution truncated to zero outside of a localization radius $\beta R$, with the contribution to the solution outside the localization radius represented by interpolation from a single coarse grid solution $\phi^{H}$ obtained by summing contributions of the form (15) over all the patches. At each point in space, the coarse grid values used to interpolate the global contribution are corrected by subtracting off the contributions of the patches within the localization radius. Finally, to reduce the cost of computing the localized potentials, while keeping $\beta$ large enough to make the $O(1)$ contribution to the error coming from localization acceptably small, we introduce an inner radius $\alpha<\beta$ (see Figure 2). Within that inner radius, we compute the full convolution $G*f^{\boldsymbol{i}}$; in the annular region ${\Omega_{R,{\boldsymbol{i}},\beta}}\setminus{\Omega_{R,{\boldsymbol{i}},\alpha}}$, the local solution is approximated by $G*\mathbb{P}(f^{\boldsymbol{i}})$, where $\mathbb{P}(f^{\boldsymbol{i}})$ is the orthogonal projection onto the Legendre polynomials on ${\Omega_{R,{\boldsymbol{i}}}}$ of some degree $P-1$:

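$$\mathbb{P}(f^{\boldsymbol{i}})=\sum_{\boldsymbol{p}\in\mathbb{N}^{3}:\|\boldsymbol{p}\|_{1}<P}\langle Q^{\boldsymbol{p}},f^{\boldsymbol{i}}\rangle Q^{\boldsymbol{p}},\qquad Q^{\boldsymbol{p}}(\boldsymbol{x})=R^{-\frac{3}{2}}\prod_{d=1}^{3}Q^{p_{d}}\Big(\frac{x_{d}-c_{d}^{\boldsymbol{i}}}{R}\Big),\ \boldsymbol{x}\in\Omega_{R,\boldsymbol{i}},\ \boldsymbol{p}\in\mathbb{N}^{3},$$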

where ${\langle\cdot,\cdot\rangle}$ is the inner product on ${\Omega_{R,{\boldsymbol{i}}}}$ , and $Q^{p}:[-1,1]\rightarrow{\mathbb{R}}$ is the classical Legendre polynomial of degree $p$ .
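A minimal Python sketch of this projection follows. It enumerates the multi-indices with $\|\boldsymbol{p}\|_{1}<P$ (there are 20 of them for $P=4$, matching the count quoted in the abstract) and computes the expansion coefficients with a tensor-product 4-point Gauss-Legendre rule. Two assumptions are made for illustration: the one-dimensional factors are normalized so that the basis is orthonormal, and the quadrature is applied to an analytically given $f$; the helper names are hypothetical.

```python
import math
from itertools import product

# 4-point Gauss-Legendre rule on [-1, 1] from its closed form (exact up to degree 7).
_a = math.sqrt(3.0 / 7.0 - (2.0 / 7.0) * math.sqrt(6.0 / 5.0))
_b = math.sqrt(3.0 / 7.0 + (2.0 / 7.0) * math.sqrt(6.0 / 5.0))
NODES = (-_b, -_a, _a, _b)
_wa, _wb = (18.0 + math.sqrt(30.0)) / 36.0, (18.0 - math.sqrt(30.0)) / 36.0
WEIGHTS = (_wb, _wa, _wa, _wb)

def legendre(p, t):
    """Legendre polynomial of degree p, scaled to unit L2 norm on [-1, 1]."""
    prev, cur = 1.0, (t if p > 0 else 1.0)
    for k in range(1, p):
        prev, cur = cur, ((2 * k + 1) * t * cur - k * prev) / (k + 1)
    return math.sqrt((2 * p + 1) / 2.0) * cur

def multi_indices(P):
    """All p in N^3 with p_1 + p_2 + p_3 < P."""
    return [p for p in product(range(P), repeat=3) if sum(p) < P]

def basis(p, xi):
    return math.prod(legendre(pd, t) for pd, t in zip(p, xi))

def project(f, center, R, P):
    """Return a callable evaluating the projection of f on the cube center + [-R, R]^3."""
    coeffs = {}
    for p in multi_indices(P):
        total = 0.0
        for i, j, k in product(range(4), repeat=3):
            xi = (NODES[i], NODES[j], NODES[k])
            x = tuple(c + R * t for c, t in zip(center, xi))
            total += WEIGHTS[i] * WEIGHTS[j] * WEIGHTS[k] * basis(p, xi) * f(*x)
        coeffs[p] = total  # reference-cube coefficient; the powers of R cancel on evaluation

    def evaluate(x, y, z):
        xi = tuple((v - c) / R for v, c in zip((x, y, z), center))
        return sum(a * basis(p, xi) for p, a in coeffs.items())

    return evaluate

f = lambda x, y, z: 1.0 + 2.0 * x - x * y + z**3      # total degree 3 < P
proj = project(f, (1.0, 1.0, 1.0), 1.0, 4)
n_coeffs = len(multi_indices(4))
value = proj(0.5, 1.25, 1.75)
```

Since the test function is a polynomial of total degree less than $P$, the projection reproduces it, which is the property $\pi=\mathbb{P}(\pi)$ used in the error analysis. For a general $f$ the quadrature would only approximate the coefficients.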

3.1 The Semi-Discrete MLC Algorithm

The semi-discrete MLC algorithm consists of three steps.

Step 1 - Local Convolutions.

We perform local convolutions in regions around each subdomain ${\Omega_{R,{\boldsymbol{i}}}}$ that are used to compute local charges at points on the grid.

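$$F^{\boldsymbol{i},H}[\boldsymbol{g}]=\begin{cases}\Delta^{H}(G*f^{\boldsymbol{i}})[\boldsymbol{g}], & \boldsymbol{g}\in\Omega_{R,\boldsymbol{i},\alpha}^{H}\\ \Delta^{H}(G*\mathbb{P}(f^{\boldsymbol{i}}))[\boldsymbol{g}], & \boldsymbol{g}\in\Omega_{R,\boldsymbol{i},\beta}^{H}\setminus\Omega_{R,\boldsymbol{i},\alpha}^{H}\\ 0, & \text{otherwise}\end{cases}$$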

Step 2 - Global Coarse Solve.

The global charge at coarse mesh points is constructed by assembling local contributions

$$F^{H}[\boldsymbol{g}]=\sum_{\boldsymbol{i}}F^{\boldsymbol{i},H}[\boldsymbol{g}],$$

and we obtain a global approximation $\phi^{H}$ of the potential, represented on the coarse mesh, by computing the discrete convolution over $\Omega^{H}$ .

$$\phi^{H}=G^{H}*F^{H}.\tag{22}$$

Step 3 - Local Interactions / Local Corrections.

In the final step, we represent the solution on the boundary of each ${\Omega_{R,{\boldsymbol{i}}}}$ as the sum of local convolutions induced by charges on nearby patches and values interpolated from the grid calculation, from which the local convolution values have been subtracted.

[TABLE]

Here ${\mathcal{I}}^{H}(\psi^{H})({\boldsymbol{x}})$ is an interpolation operator that takes as input values of $\psi^{H}:\mathcal{N}({\boldsymbol{x}})\rightarrow{\mathbb{R}}$ , where $\mathcal{N}({\boldsymbol{x}})\subset\{{\boldsymbol{g}}H:{\boldsymbol{g}}\in\mathbb{Z}^{3}\}$ and returns a $q_{I}^{th}$ -order accurate polynomial interpolant. In all of the algorithms described here, ${\boldsymbol{x}}$ and all of the points in $\mathcal{N}({\boldsymbol{x}})$ are coplanar, so the interpolant is particularly easy to construct. $\phi^{loc,{\boldsymbol{x}}}({\boldsymbol{x}})$ is the sum of all local convolutions the support of whose charges is sufficiently close to ${\boldsymbol{x}}$ so that they contributed to the right-hand side for the grid solution near that point.

[TABLE]

Equation (23) can be interpreted as the decomposition of the potential at a point $\boldsymbol{x}$ , into the sum of local contributions to the potential given by $\phi^{loc,{\boldsymbol{x}}}$ and corrections to include the global coupling by interpolating a corrected form of the coarse mesh global solution $\phi^{H}$ . Specifically, the correction term in (23) is computed by evaluating $\phi^{loc,{\boldsymbol{x}}}$ at the points of the interpolation stencil $\mathcal{N}(\boldsymbol{x})$ , subtracting these values from $\phi^{H}$ and interpolating the result to $\boldsymbol{x}$ . The MLC solution $\phi^{MLC}$ is specified in terms of solutions to Dirichlet problems on each ${\Omega_{R,{\boldsymbol{i}}}}$ .

[TABLE]

3.2 Error Analysis

The error of the local corrections step for $\boldsymbol{x}\in\partial{\Omega_{R,{\boldsymbol{i}}}}$ is given by:

[TABLE]

where $\epsilon_{I}^{H}(\psi)({\boldsymbol{x}})$ is the error in applying the interpolation operator ${\mathcal{I}}^{H}$ to a smooth function $\psi$ evaluated on the grid and evaluating it at ${\boldsymbol{x}}$ . There are two sources of error for the semi-discrete algorithm: one from the calculation of $\phi^{H}$ in (22), and the other due to interpolation at the local corrections step (23). To estimate the former, i.e. the second term of (26), it suffices to bound the coarse mesh error $\phi^{H}-\phi$ . To do so, we estimate the truncation error of the coarse solve (22) at points $\boldsymbol{g}$ :

[TABLE]

To bound the first term of (27), we use (14) to find that

[TABLE]

The second term of (27) is bounded in a similar fashion.

[TABLE]

where we have used

[TABLE]

which follows directly from Taylor’s theorem for $f^{{\boldsymbol{i}}}$ and the fact that $\pi=\mathbb{P}(\pi)$ for polynomials $\pi$ of degree less than $P$ . As a result, the following estimate for the coarse mesh error holds

[TABLE]

uniformly on coarse mesh points. Since convolution with $G^{H}$ and the interpolation operator ${\mathcal{I}}^{H}$ are max-norm bounded, $\epsilon_{C}^{H}\equiv\phi^{H}-\phi$ is also bounded by an expression of the form of the right-hand side of (31).

To bound the first term in (26), it follows from the fact that the interpolation method is $q_{I}^{th}$ -order accurate that

[TABLE]

where $\boldsymbol{\xi}$ is in an $O(H)$ neighborhood of $\mathcal{N}(\boldsymbol{x})$ and $L^{q_{I}}_{I}$ is a linear differential operator with terms that are derivatives of order $q_{I}$ . Using (13), a similar argument to that given in the proof of (31) leads to:

[TABLE]

so that (26) is estimated as

[TABLE]

4 Method of Local Corrections - Fully-Discrete Case

In this section, we describe the two-level algorithm as it is actually implemented. $\Omega^{h}$ is a fine-grid discretization of a bounded domain $\Omega$ , the latter containing the support of $f$ . $\Omega^{h}$ is assumed to be a finite union of rectangles of the form ${\Omega_{R,{\boldsymbol{i}}}^{h}}=n{\boldsymbol{i}}+[0,n]^{3}$ , $R=nh/2$ . We also define discrete forms of ${\Omega_{R,{\boldsymbol{i}},\alpha}^{h}}$ , ${\Omega_{R,{\boldsymbol{i}},\beta}^{h}}$ : ${\Omega_{R,{\boldsymbol{i}},\alpha}^{h}}=\mathcal{G}({\Omega_{R,{\boldsymbol{i}}}^{h}},\lceil\frac{(\alpha-1)n}{2}\rceil)$ and ${\Omega_{R,{\boldsymbol{i}},\beta}^{h}}=\mathcal{G}({\Omega_{R,{\boldsymbol{i}}}^{h}},\lceil\frac{(\beta-1)n}{2}\rceil)$ . The coarse grid $\Omega^{H}$ is assumed to cover all of the fine patch data required for the algorithm described below: ${\mathcal{G}}({\mathcal{C}}({\Omega_{R,{\boldsymbol{i}},\beta}^{h}}),b)\subset\Omega^{H}$ where $b$ is the radius of the stencil for the interpolation function ${\mathcal{I}}^{H}$ . We also define a discretized form of the characteristic function of a rectangular patch $D\subset{\mathbb{Z}}^{3}$

[TABLE]

In the fully-discrete algorithm, we replace the local convolutions with local discrete convolutions, e.g. $G*f^{\boldsymbol{i}}\rightarrow G^{h}*f^{{\boldsymbol{i}},h}$ , $f^{{\boldsymbol{i}},h}=\chi_{\Omega^{h}_{R,{\boldsymbol{i}}}}f$ , and we take $H=N_{ref}h$ .
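As a rough illustration of the index bookkeeping above, the following sketch computes how far a patch is grown to form the discrete inner and outer localization regions. It assumes that $\mathcal{G}(\cdot,g)$ extends a box by $g$ grid points in every direction (its definition lies outside this section); the `Box` class and the function names are hypothetical and are not taken from the authors' code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box of integer grid points with inclusive corners."""
    lo: tuple
    hi: tuple

    def grow(self, g):
        # Assumed meaning of G(box, g): extend by g points on every side.
        return Box(tuple(c - g for c in self.lo), tuple(c + g for c in self.hi))


def patch(i, n):
    """Patch n*i + [0, n]^3 for an integer multi-index i."""
    return Box(tuple(n * c for c in i), tuple(n * c + n for c in i))


def grow_width(factor, n):
    """ceil((factor - 1) * n / 2), the growth used for the alpha and beta regions."""
    return math.ceil((factor - 1) * n / 2)


n = 32
omega = patch((1, 0, 2), n)
omega_alpha = omega.grow(grow_width(1.5, n))  # inner region, alpha = 1.5
omega_beta = omega.grow(grow_width(3.0, n))   # outer region, beta = 3
print(omega_alpha)
print(omega_beta)
```

With $n=32$ the growth widths are 8 and 32 points, so the grown regions are $48=\alpha n$ and $96=\beta n$ points wide, which is the sense in which $\alpha$ and $\beta$ are nondimensionalized radii.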

4.1 The Fully-Discrete Two-Level Algorithm

**1. Local Convolutions.**

For each ${\Omega_{R,{\boldsymbol{i}}}^{h}}$ , we compute the potential induced by $f^{{\boldsymbol{i}},h}=\chi_{\Omega_{R,{\boldsymbol{i}}}^{h}}f^{h}$ .

[TABLE]

The Legendre expansion coefficients of $f^{{\boldsymbol{i}},h}$ required to compute $\mathbb{P}(f^{\boldsymbol{i}})$ are computed with composite numerical integration. We employ Boole’s rule if $f$ is given only at points of $\Omega^{h}$ or Gauss integration if $f$ is specified analytically. For each ${\Omega_{R,{\boldsymbol{i}}}^{h}}$ we also compute the associated local charges

[TABLE]

The values of $\Delta^{H}(G^{h}*Q^{{\boldsymbol{p}}})$ can be computed once and stored, reducing the calculation of $\Delta^{H}(G^{h}*\mathbb{P}^{h}(f^{{\boldsymbol{i}},h}))$ to computing linear combinations of the appropriate subset of those precomputed values.
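The composite quadrature mentioned above for gridded data can be sketched in one dimension as follows. This is a minimal illustration of composite Boole's rule, not the authors' implementation; the function name and the test integrand are assumptions made for the example.

```python
import math


def composite_boole(f, a, b, panels):
    """Integrate f over [a, b] using `panels` Boole panels of 4 subintervals each."""
    h = (b - a) / (4 * panels)
    weights = (7, 32, 12, 32, 7)
    total = 0.0
    for p in range(panels):
        x0 = a + 4 * p * h
        total += sum(w * f(x0 + k * h) for k, w in enumerate(weights))
    return 2.0 * h / 45.0 * total


exact = 2.0  # integral of sin over [0, pi]
err_coarse = abs(composite_boole(math.sin, 0.0, math.pi, 2) - exact)
err_fine = abs(composite_boole(math.sin, 0.0, math.pi, 4) - exact)
# Halving h should reduce the error by roughly 2**6 for a sixth-order rule.
print(err_coarse / err_fine)
```

A single panel integrates polynomials of degree up to five exactly, which is why the composite rule is sixth-order accurate in $h$.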

**2. Global Coarse Solve.**

[TABLE]

**3. Local Interactions - Local Corrections.**

We define the local potentials at fine boundary points $\boldsymbol{g}\in\partial{\Omega_{R,{\boldsymbol{i}}}^{h}}$ as combinations of short-range and intermediate-range components

[TABLE]

and we correct them by adding the far-field effects as in (23)

[TABLE]

The interpolation operator on coplanar points ${\mathcal{I}}^{H}$ that we are employing is the same as in [20]. Using these boundary conditions, we solve the following local Dirichlet problems on $\Omega_{{\boldsymbol{i}}}^{h}$ patches

[TABLE]

Finally, the fourth-order Mehrstellen correction (12) is applied to obtain the values of $\phi^{MLC,h}$

[TABLE]

If we want to go to higher than fourth order accuracy in $h$ , the algorithm is more complicated – the Mehrstellen correction must be applied earlier in the process. We will not discuss the details in this paper.

4.2 Error Analysis

We proceed in this section with estimating the error for the fully-discrete MLC algorithm. We want to get some idea of the impact of replacing the analytic continuous convolutions by the discretized convolutions. To do this, we use a modified equation approach, in which we assume that we can approximate the solution error by the action of the operator on the truncation error. In the present setting, this amounts to making the substitution

[TABLE]

As in the semi-discrete case, we want to estimate the error in the boundary conditions

[TABLE]

where

[TABLE]

An estimate of the contribution from (42) is obtained by bounding $\Delta^{H}(\phi^{H}-\tilde{\phi})$ , since ${\mathcal{I}}^{H}$ and convolution with $G^{H}$ are both stable in max norm. We have, by (40),

[TABLE]

The first two terms are identical to the ones that appear in the semi-discrete case, while (40), and the estimate $||(\mathbb{P}-\mathbb{P}^{h})(f^{{\boldsymbol{i}}^{\prime}})||_{\infty}=O(h^{6})$ (which holds since our quadrature rules for computing the Legendre coefficients are at least sixth-order accurate) guarantee that the remaining terms are $O(h^{4})$ or smaller. Using similar arguments to those in (44), we have

[TABLE]

and therefore, following (32), we have

[TABLE]

Thus we have

[TABLE]

The stability of the discretized boundary-value problem implies $\|\phi^{MLC,h}-\phi\|_{\infty}=O(\|\phi^{B,h}-\phi\|_{\infty})+O(h^{4})$ , so we finally have the following estimate

[TABLE]

at all fine grid points. This error can be written in the form

[TABLE]

Thus MLC differs from classical finite-difference methods in that there is a contribution to the error that does not vanish as $h\rightarrow 0$ , i.e. the right-most summand in (45). We refer to this contribution to the error as the barrier error. Note that if we take $q_{I}=q+2$ , we obtain the form of the error given in the Introduction. We have specialized this algorithm to the case of fourth-order accuracy, primarily because it allows us the simplification of applying the Mehrstellen correction (39) at the end of the calculation. However, this analysis suggests that, even with this simplification, there might be an advantage to using discretizations of the Laplacian with larger $q$ , i.e. ones that are higher order accurate when applied to harmonic functions, since the barrier error is proportional to $\beta^{-q}$ . We observe this to be the case in the results in Section 7.

5 Multilevel Method of Local Corrections

Following [20], we generalize the method in Section 4 to the case of an arbitrary number of levels $l=0,\dots,l_{max}$ , where $l_{max}$ is the finest level on which the solution is sought. We denote the discrete Laplacian with mesh size $h_{l}$ by $\Delta^{h_{l}}$ , with ${h_{l}}={{N}_{ref}}{h_{l+1}}$ . At each level we discretize the solution on a collection of node-centered cubic patches ${{\Omega_{{R_{l},{\boldsymbol{i}}}}}}$ , $R_{l}={{N}_{ref}}R_{l+1}$ , and the corresponding discretized grids ${\Omega_{R_{l},{\boldsymbol{i}}}^{h_{l}}}$ ; the combined level $l$ grid is given by $\Omega^{l,h_{l}}\equiv\bigcup_{\boldsymbol{i}}{\Omega_{R_{l},{\boldsymbol{i}}}^{h_{l}}}$ . We also define, for each ${\boldsymbol{i}}$ , localization regions ${\Omega_{R_{l},{\boldsymbol{i}},\alpha}},{\Omega_{R_{l},{\boldsymbol{i}},\beta}}$ , and their discretizations ${\Omega_{R_{l},{\boldsymbol{i}},\alpha}^{h_{l}}},{\Omega_{R_{l},{\boldsymbol{i}},\beta}^{h_{l}}}$ , with $1<\alpha<\beta$ . At level $l=0$ there is only one patch $\Omega^{0,h_{0}}$ , on which the coarse solve of the method is performed, just as in the two-level algorithm. We also impose a proper nesting condition: for $l=1,\dots,l_{max}$ ,

[TABLE]

The multilevel MLC comprises the following steps.

**1. Downward Pass - Initial Local Convolutions.**

Local convolutions are computed at levels $l=l_{max},\dots,1$ .

[TABLE]

where the local right hand sides are defined as

[TABLE]

**2. Global Coarse Solve.**

[TABLE]

**3. Upward Pass - Local Interactions / Local Corrections for $l=1,\dots,l_{max}$ .**

Starting from level 1, the following local Dirichlet problems are solved at levels $l=1,...,l_{max}$ :

[TABLE]

The Dirichlet boundary conditions are given by

[TABLE]

Here the local potentials $\phi^{loc,{\boldsymbol{g}},l}$ are given by:

[TABLE]

Finally, the Mehrstellen correction at all levels is applied as follows:

[TABLE]

We do not have a complete error analysis for the above algorithm corresponding to that given in the two-level case. However, we can look at the error analysis of the two-level algorithm and determine the change in the error introduced there by replacing the coarse-grid convolution with $G^{H}$ by an MLC calculation. We denote by:

• $G^{MLC,S}(r)$ , the two-level semi-discrete method of local corrections approximation to $G*r$ , with patch radius $S$ ;

• $N_{1}^{S}(r)({\boldsymbol{x}})\equiv\sum\limits_{{\boldsymbol{i}}:{\boldsymbol{x}}\notin\Omega_{S,{\boldsymbol{i}},\beta}}h^{q}L^{q+2}(G*r^{\boldsymbol{i}})({\boldsymbol{x}})$ ;

• $N_{2}^{S}(r)({\boldsymbol{x}})\equiv\sum\limits_{{\boldsymbol{i}}:{\boldsymbol{x}}\in\Omega_{S,{\boldsymbol{i}},\beta}\setminus\Omega_{S,{\boldsymbol{i}},\alpha}}h^{q}L^{q+2}(G*((\mathbb{I}-\mathbb{P})r^{\boldsymbol{i}}))({\boldsymbol{x}})$ ; and

• $N^{S}(r)=N_{1}^{S}(r)+N_{2}^{S}(r)$ .

By (26), (27), $G^{H}*(N^{R}(f))^{H}=(G*f)^{H}-\phi^{H}$ is the only quantity in the error in which convolution with $G^{H}$ appears. Given that, it is straightforward to assess the impact of replacing the convolution with $G^{H}$ in this expression with applying the MLC algorithm for a patch size $N_{ref}R$ . To estimate this effect, we use a modified equation approach, in which the difference is approximated by $G*(N^{R}(f))-G^{MLC,N_{ref}R}(N^{R}(f))$ . Applying the error estimate (27), we obtain

[TABLE]

For this substitution to have an appropriately small impact, it is sufficient for the error to be comparable to or less than the error in the two-level algorithm. The sum of the first three terms meets this criterion: the sum of the first two terms is bounded by the max norm of the two-level error multiplied by $O(\beta^{-q})$ , and the third term is bounded by $O(\alpha^{-q})$ times the max norm of the barrier error of the two-level algorithm. The final term, however, is problematic. In particular, the impact on the error of multiple applications of $\mathbb{I}-\mathbb{P}$ at increasing mesh spacings is far from clear. We will see evidence of this in the numerical results in Section 7.2, and will suggest a remedy that allows the error to be controlled.

6 Computational Issues

The analysis and demonstration of the performance of this algorithm will be the subject of a separate paper [21], so we will just make a few high-level observations to justify the pursuit of this line of research. The largest contribution to the floating-point operation count in this method comes from the initial local discrete convolutions (34). To compute these convolutions, we use a generalization of Hockney’s domain-doubling algorithm [15], which we describe in the Appendix. The floating-point work per unknown for this step is $O(\alpha^{3}\log(n))$ , $\alpha>1$ , where $n^{3}$ is the number of points per patch. The next-largest computation is that of the final Dirichlet solutions (38), performed using sine transforms, which is $O(\log(n))$ per unknown. The floating-point work associated with computing the Legendre expansions is small, with the convolutions of Legendre polynomials with the discrete Green’s functions precomputed and stored. The memory overhead for storing these quantities scales like $O(\beta^{3}n^{3})$ . However, there is one copy of these per processor, shared across multiple patches / multiple cores. Furthermore, they are only stored either on a sampled grid coarsened by $N_{ref}$ , or on planar subsets corresponding to boundaries of patches, which reduces the memory overhead further.

The parallel implementation of this algorithm is via domain decomposition, with patches distributed to processors. For the choices of $\alpha$ and $\beta$ used in the results described below, this corresponds to a floating-point operation count about three times that of a corresponding multigrid algorithm for comparable accuracy. Roughly speaking, the communications costs, in terms of number of messages and overall volume of data moved, correspond to those of a single multigrid V-cycle, plus the negligible costs of communicating a small number of Legendre expansion coefficients (20 per patch for the case $P=4$ ). This is to be compared to the eight multigrid V-cycles required to obtain a comparable level of accuracy. Current trends in the design of HPC processors based on low-power processor technologies indicate a rapid growth in the number of cores capable of performing floating-point operations on a processor, while the communications bandwidth between processors, or between the processor and main memory, is growing much more slowly. In addition, most of the floating-point work is performed using FFTs on small patches on a single node, for which there are multiple opportunities for performance optimization. Thus the present algorithm is well-positioned to take advantage of these trends.
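The figure of 20 coefficients per patch for $P=4$ can be checked by counting the three-dimensional polynomial terms of total degree less than $P$. The sketch below assumes the expansion is truncated by total degree, which is consistent with the quoted count; the function name is hypothetical.

```python
from itertools import product


def legendre_multi_indices(P):
    """Multi-indices p = (p1, p2, p3) with total degree p1 + p2 + p3 < P."""
    return [p for p in product(range(P), repeat=3) if sum(p) < P]


for P in (1, 2, 3, 4):
    print(P, len(legendre_multi_indices(P)))
```

The counts are 1, 4, 10 and 20, so the $P=1$ case communicates a single coefficient per patch and the $P=4$ case communicates 20.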

7 Numerical Test Cases

We present in this section several examples that demonstrate the convergence properties of the MLC method described above. In all cases, we use as a measure of the solution error the max norm error of the potential, divided by max norm of the potential

[TABLE]
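A minimal sketch of this error measure, together with the usual estimate of the observed convergence rate from two resolutions, is given below. The function names are hypothetical and the inputs are flat sequences of grid values.

```python
import math


def relative_max_error(computed, exact):
    """Max-norm error of the potential divided by the max norm of the potential."""
    err = max(abs(c - e) for c, e in zip(computed, exact))
    return err / max(abs(e) for e in exact)


def observed_order(err_h, err_half_h):
    """Convergence rate estimated from the errors at mesh sizes h and h/2."""
    return math.log2(err_h / err_half_h)


print(relative_max_error([1.0, 2.0, 4.4], [1.0, 2.0, 4.0]))
print(observed_order(1.6e-5, 1.0e-6))
```

Once the barrier error is reached, the observed order drops toward zero, since further refinement in $h$ no longer reduces the error.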

For all cases, we set $n=32$ , so that $H/R=1/4$ . We refer to the special case $\beta=\alpha$ (i.e. if the long-range potentials induced by the truncated Legendre expansions of local charges are ignored) as the MLC-0 method and to the general case $\alpha<\beta$ as the MLC method. It is not difficult to see that for MLC-0, the estimate (45) reduces to

[TABLE]

Increasing $\beta$ to reduce the barrier error in (54) substantially increases the per patch computational cost of the discrete convolution in the downward pass of the method. This is, in fact, the reason we replaced the local long-range potential values with the convolutions of the local Legendre expansions in Section 3.1.

7.1 A Smooth Charge Distribution

The first test case we are considering involves computing the potential induced by a smooth charge. The computational domain is the unit cube $\Omega=[0,1]^{3}$ . The charge density is given by:

[TABLE]

and the support of the charge is a sphere of radius $R_{o}=\frac{1}{4}$ , centered at point $\boldsymbol{x}_{o}=\frac{1}{2}\boldsymbol{1}$ . The exact solution for this problem is given by:

[TABLE]

and reduces to a pure monopole field for $r\geq 1$ .

7.1.1 Two-Level Results

In Table 1 we present the fine mesh errors for the MLC-0 algorithm, with two levels, for mesh sizes $h=\frac{1}{256},\frac{1}{512},\frac{1}{1024}$ using the $L_{19}^{h}$ Mehrstellen Laplacian ( $q=4$ ). We set $b=2\rightarrow q_{I}=6$ so that the dependence of the interpolation error on $\alpha,\beta$ matches that of the other error terms. For this problem, the $h$ -dependent error contributions in all three cases are so small that the observed errors are the barrier errors; each time we double $\beta$ , the error goes down by roughly a factor of 16, as predicted by (54).

In Tables 2 and 3 we present fine mesh errors for the MLC algorithm, with $\alpha=1.5$ , for $\beta=3$ and $\beta=6$ , respectively, when refining both $h$ and $P$ . As $h\rightarrow 0$ , the error in this case approaches a barrier error for both the $P=1$ and $P=4$ cases at a rate between $O(h^{2})$ and $O(h^{4})$ , and those barrier errors correspond to the errors for the MLC-0 calculations with the same values of $\beta$ . For comparison, we also include the values of the error for the MLC-0 calculations with comparable computational costs, i.e. for $\beta=1.5$ . It is clear that for the negligible cost of adding the Legendre expansion, we obtain a decrease in the error of one to three orders of magnitude.

Next, we present the errors obtained by performing similar runs using the $L_{27}^{h}$ Mehrstellen Laplacian, for which $q=6$ . We set $b=3\rightarrow q_{I}=8$ so that the dependence of the interpolation error on $\alpha,\beta$ matches that of the other error terms. In this case, the barrier error is $O(\beta^{-6})$ ; hence we expect that smaller values of the correction radius $\beta$ are required to obtain errors similar to those obtained with the $L_{19}^{h}$ difference operator. Since ${3^{4}}\approx{2^{6}},{6^{4}}\approx{3.25^{6}}$ , we set $\beta=2,3.25$ . First, in order to estimate the barrier values, we present the fine mesh errors for the MLC-0 method in Table 4 with $\beta=2,3.25$ using the $L_{27}^{h}$ operator. With those values of $\beta$ , we expect comparable or smaller errors than those of the MLC-0 method with $\beta=3,6$ using the $L_{19}^{h}$ operator. This is the case, as is evident from a comparison with the error values of Table 1. Furthermore, the barrier error as a function of $\beta$ decreases by more than the factor of $18.4=(3.25/2)^{6}$ predicted by the analysis.
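The arithmetic behind these choices of $\beta$ can be reproduced with a short calculation. The sketch below assumes that the constants multiplying $\beta^{-q}$ are comparable for the two operators, which the text does not state; the function names are hypothetical.

```python
def equivalent_beta(beta_ref, q_ref, q_new):
    """beta for which beta**(-q_new) equals beta_ref**(-q_ref)."""
    return beta_ref ** (q_ref / q_new)


def barrier_ratio(beta_small, beta_large, q):
    """Predicted reduction factor of an O(beta**-q) barrier error."""
    return (beta_large / beta_small) ** q


print(equivalent_beta(3.0, 4, 6))   # about 2.08, rounded to beta = 2 in the text
print(equivalent_beta(6.0, 4, 6))   # about 3.30, close to beta = 3.25
print(barrier_ratio(2.0, 3.25, 6))  # about 18.4
print(barrier_ratio(3.0, 6.0, 4))   # 16.0
```

Under this assumption, a radius of about 2 with $q=6$ matches $\beta=3$ with $q=4$ , and a radius of about 3.3 matches $\beta=6$ .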

In Tables 5 and 6, we present the errors for the MLC algorithm, for the cases $\beta=2,3.25$ ; $\alpha=1.5$ for both cases. The $\beta=2$ calculations reach the same barrier errors as $h$ decreases. That is not the case for the $\beta=3.25$ results in Table 6, but that is not surprising: the reduction of the barrier error by nearly an order of magnitude provides more headroom for $h$ -convergence. However, we see in Table 7 that a slight increase of the inner correction radius to $\alpha=1.75$ allows us to reach the barrier error more rapidly. This is consistent with the error analysis, in that increasing $\alpha$ reduces the coefficient in front of the $O(h^{P})$ error from truncating the Legendre expansion, from which we infer that the error from that source, rather than the error from the inner local convolution, is the dominant $h$ -dependent error for this smooth example.

7.1.2 Three-Level Results

We next present similar results using the multilevel MLC algorithm of Section 5 with three levels. Since we have demonstrated a clear advantage to using the 27-point stencil, in the remaining studies we will restrict our attention to that operator. In Table 8 we show the barrier fine mesh errors obtained using the MLC-0 method for $\beta=2,3.25$ . The errors for $\beta=3.25$ are more than 18.4 times smaller than the errors for $\beta=2$ and are nearly the same as the 2-level method errors (Table 4). As predicted by the error analysis in Section 5, the error of MLC-0 is insensitive to the number of levels.

In Table 9 the errors obtained with the 3-level MLC method are shown using $\alpha=1.75$ , $\beta=3.25$ . Unlike the two-level results, the $P=4$ errors are significantly poorer than the MLC-0 errors. For example, we recover the barrier errors only for $N=4096$ , as opposed to the $N=512$ results for MLC-0. We can improve matters somewhat by increasing $P$ , but even for this very smooth problem, we do not get close to the barrier errors until $N=2048$ . This is consistent with the analysis in Section 5, and indicates that using higher values of $P$ does not solve the problem. We will propose a different solution in Section 7.2.

7.2 An Oscillatory Charge Test Case

We further consider a case of three oscillatory charges that has been previously studied in [20]. The computational domain is again the unit cube $\Omega=[0,1]^{3}$ . Here we define a local charge density, whose support is a sphere of radius $R_{o}$ centered at point $\boldsymbol{x}_{o}$ , by:

[TABLE]

The exact solution associated with this charge density is given by:

[TABLE]

and is a pure monopole for $\>r\geq 1\>$ . For our test case we consider three charges of the form (55), of radius $R_{o}=\frac{5}{100}$ , centered at points $\boldsymbol{c}_{1}=\left(\frac{3}{16},\frac{7}{16},\frac{13}{16}\right)$ , $\boldsymbol{c}_{2}=\left(\frac{7}{16},\frac{13}{16},\frac{3}{16}\right)$ and $\boldsymbol{c}_{3}=\left(\frac{13}{16},\frac{3}{16},\frac{7}{16}\right)$ . The total charge and total potential are given via linear superposition by:

[TABLE]

We first present the results using three levels (Table 10) and four levels (Table 11) using MLC-0. The primary features of the convergence properties of the solution are that the errors are nearly uniform as a function of level, and are the same in both the three and four level cases. There is some indication of slowing down of the convergence rate on the finest two levels, but the convergence is still faster than $O(h^{2})$ .

In the MLC convergence results in Table 12, we see substantial deviations from the MLC-0 convergence results. The error shows no consistent behavior as a function of resolution, and in fact is worse at the finest resolution ( $N=8192$ ) in Table 12 than it is at the $N=2048$ resolution in Table 11. We see no analogous problems in the MLC-0 calculations. Examining the error analysis in Section 5, we identified the terms in a three-level calculation that might lead to problems. Even in the smooth example above, it is clear that increasing $P$ does not have sufficient impact to solve this problem. A different approach, suggested by the form of the error, is to reduce the difference $\beta-\alpha$ at coarser levels. In fact, there is likely a mechanism for defining a systematic strategy for doing this, since $(\mathbb{I}-\mathbb{P})f^{\boldsymbol{i}}$ is easily computed. We defer that to later work. For the moment, we demonstrate this by setting $\alpha$ as an empirically determined, slowly decreasing function of level, holding $\beta$ fixed (Table 13). We see that we can recover close to the errors in the MLC-0 calculation. In addition, increasing $\alpha$ slightly at coarser levels has a small impact on the overall cost of a multiresolution calculation, since the increase applies only to calculations at the coarser resolutions, which remain a small fraction of the overall cost of the method, even with the increased values of $\alpha$ .

8 Conclusions

We have presented a domain decomposition method for the numerical solution of Poisson’s equation with infinite-domain boundary conditions in three dimensions on a nested hierarchy of structured grids. The method is an extension of Anderson’s Method of Local Corrections for particles [3] to gridded data and generalizes the scheme of McCorquodale, et al. [20]. In the present method, local potentials are computed as volume potentials of local charges up to an inner localization radius, combined with volume potentials induced by order $P-1$ truncated Legendre expansions of the local charges up to an outer localization radius. The remaining global coupling is represented using a coarse-grid version of the same representation. This generalizes the method in [20], which corresponds to the $P=1$ special case in the current method. Also, in [20] the local potentials were computed by means of the James-Lackner representation [16, 17] of infinite-domain boundary conditions. In the present work, this is replaced by a representation using discrete convolution operators, which can be computed efficiently using FFTs via Hockney’s algorithm. This approach eliminates the complicated quadratures that are necessary for the extension of the James-Lackner algorithm to three dimensions, while the FFT-based approach leads to compact compute kernels that can be highly optimized. The resulting algorithm is well-suited for high performance on HPC platforms made up of multicore processors; in [21], we will present a systematic study of the performance and scaling of the algorithm on such systems.

In this paper, we have focused primarily on the analytical foundations of the MLC method and have provided a detailed error analysis. The errors are of the form $O(h^{P})+O(h^{4})+O(h^{2}\beta^{-q})+O(\beta^{-q})$ , where $h$ is the mesh spacing, $\beta$ is the nondimensionalized outer localization radius which is independent of $h$ , and $q$ is the order of accuracy of the Mehrstellen operator on harmonic functions. Numerical experiments indicate that the observed convergence behavior of the method is consistent with the analysis. For computationally practical values of the localization radius, and using the 27-point Mehrstellen operator (for which $q=6$ ), the barrier error corresponds to relative solution error norms of $10^{-8}-10^{-9}$ . While the $\beta^{-q}$ term looks like an $O(1)$ error relative to the mesh spacing $h$ , it is better to think of $\beta$ as a separate discretization parameter that governs the accuracy of the representation of the nonlocal coupling. Doubling $\beta$ reduces the barrier error by a factor of $2^{q}$ , analogous to the impact of halving $h$ .

For the two-level algorithm, the results indicate that, for a given choice of the Mehrstellen operator, the two localization radii, and for $P=4$ , the method converges at rates in the range $O(h^{4})$ to $O(h^{2})$ until the error reaches the barrier, i.e. consistent with the error analysis. We have also defined and implemented the extension to more than two levels, following the approach in [20]. A preliminary analysis of that algorithm indicates the need to control errors at coarser levels coming from the field induced between the inner and outer localization radii by the truncation of the Legendre expansion. The analysis suggests that these might be controlled by increasing the inner localization radius $\alpha$ at coarser levels. The numerical examples indicate that the problem is real, and that the proposed solution represents a viable approach. More generally, an important question that needs to be addressed is turning the error analysis in this work into practical strategies for choosing discretization parameters. For example, what are the tradeoffs between decreasing $\beta-\alpha$ and decreasing $h$ in order to improve the accuracy of a calculation, versus the cost of doing each? We will address these issues in [21].

There are various possible ways to extend the present work. Perhaps most straightforward are extensions to finite-volume discretizations and the implementation of other boundary conditions on rectangular domains (including periodic boundary conditions) using a method-of-images approach. Another possibility would be to apply even higher-order Mehrstellen discretizations of the Laplacian to see whether this results in smaller values of the barrier errors than those reported in this work. As was seen in Section 7, the $\>L_{27}^{h}\>$ ( $q=6$ ) Mehrstellen Laplacian leads to comparable barrier errors to those obtained using the $\>L_{19}^{h}\>$ ( $q=4$ ) stencil, but using smaller localization radii, in a manner consistent with the $O(\beta^{-q})$ scaling of that error. It is possible to derive Mehrstellen stencils for which $q=10$ , with the stencil contained in a $5\times 5\times 5$ block around the evaluation point. This leads to only a modest increase in the computational cost and complexity: for example, the per-patch computational cost of the most compute-intensive component of the algorithm, the local discrete convolutions, does not depend on the size of the stencil. Finally, it would be interesting to investigate extensions of this method to other elliptic problems in mathematical physics employing different Green’s functions and high-order discretizations of the associated differential operators. The error analysis of the method as extended to other kernels should be essentially the same as what is discussed in the present study. Moreover, Hockney’s algorithm is kernel-independent and can be readily applied with minor modifications. More generally, the present work uses some detailed analytic tools for understanding the discrete potential theory on locally-structured grids associated with the combination of finite difference localization in [19] and the local interactions / local corrections construction underlying [3]. It would be interesting to go back to the original MLC method for particles and to other particle-grid methods, such as particle-in-cell and immersed boundary methods, and apply these tools to better understand the error properties of these methods.

Appendix A

A.1 $L_{19}^{h}\>$ and $\>L_{27}^{h}\>$ Mehrstellen Discretizations of the Laplacian

The stencil coefficients for the $\>L_{19}^{h}\>$ and $\>L_{27}^{h}\>$ Mehrstellen Laplacians are $\>a_{\boldsymbol{j}}=\frac{1}{h^{2}}b_{|\boldsymbol{j}|}$ , where $\>|\boldsymbol{j}|\>$ is the number of non-zero components of $\>\boldsymbol{j}$ and $\>b_{k}\>$ are defined as:

[TABLE]

The corresponding expressions for the truncation errors $\tau_{19}^{h}$ , $\tau_{27}^{h}$ for $\>L_{19}^{h}\>$ , $\>L_{27}^{h}\>$ , are given by:

[TABLE]

and

[TABLE]

where the $L^{(q)}$ ’s are homogeneous constant–coefficient $q^{th}$ -order differential operators.
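The coefficient table did not survive in this copy, so the following sketch uses the commonly quoted 19-point and 27-point Mehrstellen coefficients; treat the values of `B19` and `B27` as an assumption rather than as the authors' table. The check relies on the leading truncation term being $\frac{h^{2}}{12}\Delta^{2}$ , which is the defining Mehrstellen property, so that the stencil applied to a degree-4 polynomial $u$ returns $\Delta u+\frac{h^{2}}{12}\Delta^{2}u$ up to round-off.

```python
from itertools import product

# Assumed coefficients b_k, where k is the number of non-zero offset components.
B19 = (-4.0, 1.0 / 3.0, 1.0 / 6.0, 0.0)
B27 = (-64.0 / 15.0, 7.0 / 15.0, 1.0 / 10.0, 1.0 / 30.0)


def apply_stencil(b, u, x, h):
    """Evaluate sum_j (b_|j| / h^2) * u(x + j*h) over offsets j in {-1, 0, 1}^3."""
    total = 0.0
    for j in product((-1, 0, 1), repeat=3):
        k = sum(1 for c in j if c != 0)
        total += b[k] * u(x[0] + j[0] * h, x[1] + j[1] * h, x[2] + j[2] * h)
    return total / h**2


def harmonic4(x, y, z):
    # Degree-4 harmonic polynomial: its Laplacian is identically zero.
    return x**4 + y**4 + z**4 - 3.0 * (x * x * y * y + y * y * z * z + z * z * x * x)


def quartic(x, y, z):
    # The Laplacian is 12 x^2 and the bilaplacian is 24.
    return x**4


point, h = (0.3, -0.2, 0.5), 0.1
for b in (B19, B27):
    print(apply_stencil(b, harmonic4, point, h))  # zero up to round-off
    print(apply_stencil(b, quartic, point, h), 12 * point[0] ** 2 + 2 * h**2)
```

Both operators agree on these low-degree polynomials; the difference between them appears only in the higher-order truncation terms, which is where the distinct values of $q$ come from.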

We need to compute an approximation to the discrete Green’s function (8) for the 19-point and 27-point operators, restricted to a domain of the form $\>D=[-n,\>n]^{3}$ . We do this by solving the following inhomogeneous Dirichlet problem on a larger domain $D_{\zeta}=[-\zeta n,\zeta n]^{3}$ .

[TABLE]

where $G=G({\boldsymbol{x}})$ is the Green’s function (2), and $L^{h}$ is either the 19-point or 27-point operator. Then our approximation to $G^{h=1}$ on $D$ is the solution computed on $D_{\zeta}$ , restricted to $D$ . To compute this solution, we put the inhomogeneous boundary condition into residual-correction form, and solve the resulting homogeneous Dirichlet problem using the discrete sine transform. The error estimate (12) applied here implies that the error in replacing the correct discrete boundary conditions with those of the exact Green’s function scales like $O({(\zeta n)}^{-4})$ in max norm. In the calculations presented here, we computed $G^{h=1}$ using $n\geq 128$ and $\zeta=2$ , leading to at least 10 digits of accuracy for $G^{h=1}$ .
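A one-dimensional analogue of this procedure is sketched below, using the standard three-point second difference in place of the 19-point or 27-point operator and a naive $O(m^{2})$ sine transform in place of a fast one. Moving the known boundary values to the right-hand side is one simple reading of the residual-correction step; the function names are hypothetical.

```python
import math


def dst1(v):
    """Naive type-I discrete sine transform, for illustration only."""
    m = len(v)
    return [sum(v[i] * math.sin(math.pi * (i + 1) * (k + 1) / (m + 1))
                for i in range(m)) for k in range(m)]


def solve_dirichlet_1d(f, h, left=0.0, right=0.0):
    """Solve (u[i-1] - 2u[i] + u[i+1]) / h^2 = f[i] with u = left, right at the ends."""
    m = len(f)
    rhs = list(f)
    # Move the known boundary values to the right-hand side, which leaves a
    # problem with homogeneous Dirichlet conditions.
    rhs[0] -= left / h**2
    rhs[-1] -= right / h**2
    rhs_hat = dst1(rhs)
    # Sine modes are eigenvectors of the second difference with these eigenvalues.
    u_hat = [rhs_hat[k] / ((2.0 * math.cos(math.pi * (k + 1) / (m + 1)) - 2.0) / h**2)
             for k in range(m)]
    # The type-I sine transform is its own inverse up to the factor 2 / (m + 1).
    return [2.0 / (m + 1) * val for val in dst1(u_hat)]


m, h = 7, 1.0 / 8
u_true = [math.exp(i * h) for i in range(m + 2)]
f = [(u_true[i - 1] - 2 * u_true[i] + u_true[i + 1]) / h**2 for i in range(1, m + 1)]
u = solve_dirichlet_1d(f, h, left=u_true[0], right=u_true[-1])
print(max(abs(a - b) for a, b in zip(u, u_true[1:-1])))  # round-off level
```

In three dimensions the same idea applies with tensor products of sine modes, since those are also eigenfunctions of the symmetric Mehrstellen stencils on a box with homogeneous Dirichlet conditions.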

A.2 Hockney’s Method for Fast Evaluation of Discrete Convolutions

Hockney ([15], pp. 180-181; see also [9]) observed that discrete convolutions with one of the functions having support on a bounded domain in $\mathbb{Z}^{\mathbf{D}}$ , and evaluated on a bounded domain, can be computed exactly in terms of discrete Fourier transforms. For completeness, we describe that method. We show this first for the case ${\mathbf{D}}=1$ , and then state the general result for any number of dimensions. Given $\Psi,f:\mathbb{Z}\rightarrow\mathbb{R}$ , $\mathrm{supp}(f)\subseteq[0,b]$ , we want to compute

[TABLE]

First, we observe that the infinite sum can be replaced by a finite sum.

[TABLE]

for any $b^{\prime}\geq b$ . Second, we observe that $\Psi$ , $f$ can be replaced in (57) by periodic extensions of those functions restricted to the interval $[-b^{\prime},n]$ .

[TABLE]

Finally, we express the periodic convolution in (58) in terms of discrete Fourier transforms.

[TABLE]

where $\mathcal{F}$ , $\mathcal{F}^{-1}$ are the discrete complex Fourier transform and its inverse on the interval $[-b^{\prime},n]\subset\mathbb{Z}$ .

This generalizes to rectangular domains in any number of dimensions. For example, for cubic domains, given $\Psi,f:\mathbb{Z}^{\mathbf{D}}\rightarrow\mathbb{R}$ , $\mathrm{supp}(f)\subseteq[0,b]^{\mathbf{D}}$ ,

[TABLE]

where $b^{\prime}\geq b$ and $\mathcal{F}$ , $\mathcal{F}^{-1}$ are the complex discrete Fourier transform and its inverse on the cube $[-b^{\prime},n]^{\mathbf{D}}\subset\mathbb{Z}^{\mathbf{D}}$ . In practice, this is efficient for a broad range of $(b,n)$ since we can choose $b^{\prime}$ so that the radices of the FFTs are highly composite, with the size of the problem changing by only a small amount. In the case where $b=n$ , the length of the domain doubles in each direction, hence this is often referred to as Hockney’s domain-doubling algorithm. However, in the present application, we want to use the more general case, since the size of the support of the localized charge distributions and the size of the grid on which the local fields are defined differ by a significant amount.
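The one-dimensional construction above can be sketched as follows, assuming the convolution is evaluated on $[0,n]$ . A naive $O(N^{2})$ discrete Fourier transform stands in for the FFT so that the example has no dependencies, and all function names are hypothetical. The result is compared against direct summation.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform; a real code would call an FFT library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def hockney_convolve_1d(psi, f, n, b_prime=None):
    """Return [(psi * f)(i) for i in 0..n], with f supported on [0, b], b = len(f) - 1."""
    b = len(f) - 1
    if b_prime is None:
        b_prime = b
    assert b_prime >= b
    period = n + b_prime + 1  # number of integers in [-b_prime, n]
    # Periodic extension of psi restricted to [-b_prime, n].
    psi_per = [0.0] * period
    for k in range(-b_prime, n + 1):
        psi_per[k % period] = psi(k)
    # f is zero outside [0, b], so zero padding is its periodic extension.
    f_per = [0.0] * period
    for j, value in enumerate(f):
        f_per[j] = value
    spectrum = [p * q for p, q in zip(dft(psi_per), dft(f_per))]
    conv = dft(spectrum, inverse=True)
    return [conv[i].real for i in range(n + 1)]


def direct_convolve_1d(psi, f, n):
    """Reference result by direct summation over the support of f."""
    return [sum(psi(i - j) * f[j] for j in range(len(f))) for i in range(n + 1)]


def kernel(k):
    return 1.0 / (1.0 + abs(k))


charges = [1.0, -2.0, 0.5]
fast = hockney_convolve_1d(kernel, charges, 6)
slow = direct_convolve_1d(kernel, charges, 6)
print(max(abs(a - c) for a, c in zip(fast, slow)))  # round-off level
```

Passing a larger `b_prime` changes only the transform length, which is how a highly composite FFT size can be selected without affecting the values on $[0,n]$ .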

Acknowledgments

The authors would like to thank Brian Van Straalen and Peter McCorquodale for a number of helpful discussions. This research was supported at the Lawrence Berkeley National Laboratory by the Office of Advanced Scientific Computing Research of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and at the National Energy Research Scientific Computing Center by the DOE Petascale Initiative in Computational Science and Engineering.

Bibliography

[1] A. S. Almgren, A Fast Adaptive Vortex Method using Local Corrections, Ph.D. Dissertation, University of California at Berkeley, Berkeley, 1991.
[2] A. S. Almgren, T. Buttke, P. Colella, A Fast Adaptive Vortex Method in Three Dimensions, Journal of Computational Physics, vol. 113, 1994.
[3] C. R. Anderson, A Method of Local Corrections for Computing the Velocity Field Due to a Distribution of Vortex Blobs, Journal of Computational Physics, vol. 62, 1986.
[4] G. T. Balls, A Finite Difference Domain Decomposition Method using Local Corrections for the Solution of Poisson’s Equation, Ph.D. Thesis, University of California at Berkeley, 1999.
[5] G. T. Balls, P. Colella, A Finite Difference Domain Decomposition Method using Local Corrections for the Solution of Poisson’s Equation, Journal of Computational Physics, vol. 180, 2002.
[6] J. Barnes, P. Hut, A Hierarchical O(N log N) Force-Calculation Algorithm, Nature, vol. 324, 1986.
[7] J. Carrier, L. Greengard, V. Rokhlin, A Fast Adaptive Multipole Algorithm for Particle Simulations, SIAM Journal on Scientific Computing, vol. 9, 1988.
[8] L. Collatz, The Numerical Treatment of Differential Equations, Springer-Verlag, Berlin-Heidelberg-New York, 1966.
