Modern Monte Carlo Variants for Uncertainty Quantification in Neutron Transport

Ivan G. Graham; Matthew J. Parkinson; Robert Scheichl

arXiv:1702.03561·math.NA·October 18, 2017

TL;DR

This paper advances Monte Carlo methods for uncertainty quantification in neutron transport, demonstrating significant computational gains through hybrid solvers and multilevel quasi-Monte Carlo techniques in complex stochastic settings.

Contribution

It introduces novel theoretical convergence results and practical algorithms for UQ in neutron transport with low-regularity random fields, including hybrid solvers and multilevel quasi-Monte Carlo methods.

Findings

01 Multilevel quasi-Monte Carlo reduces the computational $\varepsilon$-cost by up to two orders of magnitude relative to standard Monte Carlo.

02 The hybrid iterative/direct solver improves the efficiency of computing each realisation.

03 Numerical experiments confirm the theoretical gains on problems with up to several thousand stochastic dimensions.

Abstract

We describe modern variants of Monte Carlo methods for Uncertainty Quantification (UQ) of the Neutron Transport Equation, when it is approximated by the discrete ordinates method with diamond differencing. We focus on the mono-energetic 1D slab geometry problem, with isotropic scattering, where the cross-sections are log-normal correlated random fields of possibly low regularity. The paper includes an outline of novel theoretical results on the convergence of the discrete scheme, in the cases of both spatially variable and random cross-sections. We also describe the theory and practice of algorithms for quantifying the uncertainty of a linear functional of the scalar flux, using Monte Carlo and quasi-Monte Carlo methods, and their multilevel variants. A hybrid iterative/direct solver for computing each realisation of the functional is also presented. Numerical experiments show the…

Tables

Table 1: Summary of estimated rates in (28), (29), (34) and (38).

| Field | $\alpha$ | $\beta$ | $\gamma$ | $\lambda$ |
|---|---|---|---|---|
| Matérn field | 1.9 | 4.1 | 2.2 | 0.62 |
| Exponential field | 1.7 | 1.9 | 2.2 | 0.71 |

Table 2: Comparison of the estimated theoretical and actual computational $\epsilon$-cost rates, for different Monte Carlo methods, using the hybrid solver.

| Field | MC (estimated) | MC (actual) | QMC (estimated) | QMC (actual) | MLMC (estimated) | MLMC (actual) | MLQMC (estimated) | MLQMC (actual) |
|---|---|---|---|---|---|---|---|---|
| Matérn | 3.2 | 3.4 | 2.4 | 2.7 | 2.0 | 2.1 | 1.2 | 1.5 |
| Exponential | 3.3 | 3.6 | 2.7 | 2.4 | 2.2 | 2.5 | 1.9 | 1.9 |

Equations

$$\mu \frac{\mathrm{d}\psi}{\mathrm{d}x}(x,\mu) + \sigma(x)\,\psi(x,\mu) = \sigma_{S}(x)\,\phi(x) + f(x), \tag{1}$$

$$\text{where}\quad \phi(x) = \frac{1}{2}\int_{-1}^{1}\psi(x,\mu^{\prime})\,\mathrm{d}\mu^{\prime}. \tag{2}$$

$$\psi(0,\mu)=0 \ \text{ for } \mu>0 \quad\text{and}\quad \psi(1,\mu)=0 \ \text{ for } \mu<0. \tag{3}$$

$$\mu \frac{\mathrm{d}\psi}{\mathrm{d}x}(x,\mu,\omega) + \sigma(x,\omega)\,\psi(x,\mu,\omega) = \sigma_{S}(x,\omega)\,\phi(x,\omega) + f(x), \tag{4}$$

$$\text{where}\quad \phi(x,\omega) = \frac{1}{2}\int_{-1}^{1}\psi(x,\mu^{\prime},\omega)\,\mathrm{d}\mu^{\prime}. \tag{5}$$

$$0 < \sigma_{A,\min} \leq \sigma_{A}(x) \leq \sigma_{A,\max} < \infty, \quad\text{for all } x\in[0,1], \tag{6}$$

$$C_{\nu}(x,y) = \sigma_{var}^{2}\,\frac{2^{1-\nu}}{\Gamma(\nu)}\left(2\sqrt{\nu}\,\frac{|x-y|}{\lambda_{C}}\right)^{\nu} K_{\nu}\!\left(2\sqrt{\nu}\,\frac{|x-y|}{\lambda_{C}}\right). \tag{7}$$

$$\log\sigma_{S}(x,\omega) = \sum_{i=1}^{\infty}\sqrt{\xi_{i}}\,\eta_{i}(x)\,Z_{i}(\omega), \tag{8}$$

$$\mu_{k}\,\frac{\Psi_{k,j}-\Psi_{k,j-1}}{h_{j}} + \sigma_{j-1/2}\,\frac{\Psi_{k,j}+\Psi_{k,j-1}}{2} = \sigma_{S,j-1/2}\,\Phi_{j-1/2} + F_{j-1/2}, \quad j=1,\ldots,M,\ |k|=1,\ldots,N, \tag{9}$$

$$\Phi_{j-1/2} = \frac{1}{2}\sum_{|k|=1}^{N} w_{k}\,\frac{\Psi_{k,j}+\Psi_{k,j-1}}{2}, \quad j=1,\ldots,M. \tag{10}$$

$$\begin{pmatrix} T & -\Sigma_{S} \\ -P & I \end{pmatrix}\begin{pmatrix}\Psi \\ \Phi\end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix}. \tag{11}$$

$$\left(I - PT^{-1}\Sigma_{S}\right)\Phi = PT^{-1}F, \tag{12}$$

$$\text{theoretical cost of the direct solver} \ \sim\ \mathcal{O}\big(M^{2}(M+N)\big). \tag{13}$$

$$\Phi^{(k)} = PT^{-1}\left(\Sigma_{S}\Phi^{(k-1)} + F\right), \tag{14}$$

$$\text{theoretical cost of source iteration} \ \sim\ \mathcal{O}(MNK). \tag{15}$$

$$\left\|\sigma^{1/2}\left(\phi-\phi^{(K)}\right)\right\|_{2} \ \leq\ C^{\prime}\left(\eta\left\|\frac{\sigma_{S}}{\sigma}\right\|_{\infty}\right)^{K}, \tag{16}$$

$$\left\|\phi-\phi^{(K)}\right\|_{2} \ \leq\ C\left\|\frac{\sigma_{S}}{\sigma}\right\|_{\infty}^{K}, \tag{17}$$

$$\mu\frac{\mathrm{d}u}{\mathrm{d}x} + \sigma u = g, \quad\text{with } u(0)=0 \text{ when } \mu>0 \text{ and } u(1)=0 \text{ when } \mu<0, \tag{18}$$

$$\mu\left(\frac{U_{j}-U_{j-1}}{h_{j}}\right) + \sigma_{j-1/2}\left(\frac{U_{j}+U_{j-1}}{2}\right) = g_{j-1/2}, \quad\text{for } j=1,\ldots,M, \tag{19}$$

$$\int_{I_{j}}\left(\mu\frac{\mathrm{d}u^{h}}{\mathrm{d}x}+\widetilde{\sigma}u^{h}\right) = \int_{I_{j}}\widetilde{g}, \quad j=1,\ldots,M, \quad\text{where}\quad I_{j}=(x_{j-1},x_{j}),$$

$$u = \mathcal{S}_{\mu}g \quad\text{and}\quad u^{h} = \mathcal{S}^{h}_{\mu}g.$$

$$(\mathcal{K}g)(x) := \frac{1}{2}\int_{-1}^{1}(\mathcal{S}_{\mu}g)(x)\,\mathrm{d}\mu, \quad\text{and}\quad (\mathcal{K}^{h,N}g)(x) = \frac{1}{2}\sum_{|k|=1}^{N} w_{k}\,(\mathcal{S}^{h}_{\mu_{k}}g)(x).$$

$$(\mathcal{K}g)(x) = \frac{1}{2}\int_{0}^{1}E_{1}(|\tau(x,y)|)\,g(y)\,\mathrm{d}y,$$

$$\psi(x,\mu) = \mathcal{S}_{\mu}(\sigma_{S}\phi+f), \quad\text{so that}\quad \phi = \mathcal{K}(\sigma_{S}\phi+f). \tag{20}$$

$$\phi^{h,N} := \frac{1}{2}\sum_{|k|=1}^{N} w_{k}\,\psi^{h,N}_{k} \in V^{h},$$

$$\int_{I_{j}}\left(\mu_{k}\frac{\mathrm{d}\psi^{h,N}_{k}}{\mathrm{d}x}+\sigma\psi^{h,N}_{k}\right) = \int_{I_{j}}g^{h,N}, \quad\text{where}\quad g^{h,N} = \sigma_{S}\phi^{h,N}+f.$$

$$\psi^{h,N}_{k} = \mathcal{S}^{h}_{\mu_{k}}(\sigma_{S}\phi^{h,N}+f), \quad\text{so that}\quad \phi^{h,N} = \mathcal{K}^{h,N}(\sigma_{S}\phi^{h,N}+f). \tag{21}$$

$$\phi-\phi^{h,N} = (I-\mathcal{K}^{h,N}\sigma_{S})^{-1}(\mathcal{K}-\mathcal{K}^{h,N})(\sigma_{S}\phi+f), \tag{22}$$

$$\|\phi-\phi^{h,N}\|_{\infty} \leq \|(I-\mathcal{K}^{h,N}\sigma_{S})^{-1}\|_{\infty}\,\|(\mathcal{K}-\mathcal{K}^{h,N})(\sigma_{S}\phi+f)\|_{\infty}. \tag{23}$$

$$\|(I-\mathcal{K}^{h,N}\sigma_{S})^{-1}\|_{\infty} \leq C_{1},$$

$$\|(\mathcal{K}-\mathcal{K}^{h,N})(\sigma_{S}\phi+f)\|_{\infty} \leq \left(C_{2}\,h\log N + C_{3}\,h^{\eta} + C_{4}\,\frac{1}{N}\right)\|f\|_{\eta},$$

Taxonomy

Topics: Nuclear reactor physics and engineering · Probabilistic and Robust Engineering Design · Nuclear Physics and Applications

Full text

Ivan G. Graham (✉), Matthew J. Parkinson, Robert Scheichl: University of Bath, Claverton Down, Bath, BA2 7AY, UK

Abstract

We describe modern variants of Monte Carlo methods for Uncertainty Quantification (UQ) of the Neutron Transport Equation, when it is approximated by the discrete ordinates method with diamond differencing. We focus on the mono-energetic 1D slab geometry problem, with isotropic scattering, where the cross-sections are log-normal correlated random fields of possibly low regularity. The paper includes an outline of novel theoretical results on the convergence of the discrete scheme, in the cases of both spatially variable and random cross-sections. We also describe the theory and practice of algorithms for quantifying the uncertainty of a linear functional of the scalar flux, using Monte Carlo and quasi-Monte Carlo methods, and their multilevel variants. A hybrid iterative/direct solver for computing each realisation of the functional is also presented. Numerical experiments show the effectiveness of the hybrid solver and the gains that are possible through quasi-Monte Carlo sampling and multilevel variance reduction. For the multilevel quasi-Monte Carlo method, we observe gains in the computational $\varepsilon$-cost of up to two orders of magnitude over the standard Monte Carlo method, and we explain this theoretically. Experiments on problems with up to several thousand stochastic dimensions are included.

Dedicated to Ian H. Sloan on the occasion of his 80th birthday.

Keywords: Reactor Modelling, Neutron (Boltzmann) Transport Equation, Radiative Transport, Monte Carlo, QMC, MLMC, Source Iteration.

1 Introduction

In this paper we consider the Neutron Transport Equation (NTE), sometimes referred to as the Boltzmann transport equation. This is an integro-differential equation which models the flux of neutrons in a reactor. It has particular applications in nuclear reactor design, radiation shielding and astrophysics SaMc:82. There are many potential sources of uncertainty in a nuclear reactor, such as the geometry, material composition and reactor wear. Here, we consider the problem of random spatial variation in the coefficients (the cross-sections) in the NTE, represented by correlated random fields with potentially low smoothness. Our aim is to understand how uncertainty in the cross-sections propagates through to (functionals of) the neutron flux. This is the forward problem of Uncertainty Quantification.

We will quantify the uncertainty using Monte Carlo (MC) type methods, that is, by simulating a finite number of pseudo-random instances of the NTE and by averaging the outcome of those simulations to obtain statistics of quantities of interest. Each statistic can be interpreted as an expected value of some (possibly nonlinear) functional of the neutron flux with respect to the random cross-sections. The input random fields typically need to be parametrised with a significant number of random parameters leading to a problem of high-dimensional integration. MC methods are known to be particularly well-suited to this type of problem due to their dimension independent convergence rates.

However, convergence of the MC algorithm is slow and determined by $\sqrt{\mathbb{V}(\cdot)/N}$, where $\mathbb{V}(\cdot)$ is the variance of the quantity of interest and $N$ is the number of samples. For this reason, research is focussed on improving the convergence, whilst retaining dimension independence. Advances in MC methods can broadly be split into two main categories: improved sampling and variance reduction. Improved sampling methods attempt to find samples that perform better than the pseudo-random choice. Effectively, they aim to improve the $\sqrt{1/N}$ term in the error estimate. A major advance in sampling methods has come through the development of quasi-Monte Carlo (QMC) methods. Variance reduction methods, on the other hand, attempt to reduce the $\mathbb{V}(\cdot)$ term in the error estimate and thus reduce the number of samples needed for a desired accuracy. Multilevel Monte Carlo (MLMC) methods (initiated in He:01; Gi:08 and further developed in, e.g., GiWa:09; BaScZo:11; Cl:11; ChSc:13; KuScSl:15; Kuo:15; TeJa:15; HaAli:15) fall into this category. A comprehensive review of MLMC can be found in Gi:15.
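The $\sqrt{\mathbb{V}(\cdot)/N}$ behaviour can be seen in a few lines. The sketch below is only an illustration: `toy_qoi` is a hypothetical stand-in for one solve of the transport problem (a log-normal variable with known mean $e^{1/8}$), and `mc_estimate` is an assumed helper name, not something taken from the paper.

```python
import math
import random


def mc_estimate(sample_qoi, n_samples, rng):
    """Plain Monte Carlo mean together with its estimated standard error sqrt(V/N)."""
    values = [sample_qoi(rng) for _ in range(n_samples)]
    mean = sum(values) / n_samples
    # Unbiased sample variance of the quantity of interest.
    var = sum((v - mean) ** 2 for v in values) / (n_samples - 1)
    return mean, math.sqrt(var / n_samples)


def toy_qoi(rng):
    # Hypothetical quantity of interest: exp(Z / 2) with Z ~ N(0, 1),
    # whose exact mean is exp(1 / 8).
    return math.exp(0.5 * rng.gauss(0.0, 1.0))


rng = random.Random(0)
for n in (100, 1600, 25600):
    mean, std_err = mc_estimate(toy_qoi, n, rng)
    # Multiplying N by 16 should shrink the standard error by about 4.
    print(n, round(mean, 4), round(std_err, 4))
```

Each sixteen-fold increase in the number of samples buys roughly one extra factor of four in accuracy, which is the slow rate that QMC and multilevel methods aim to improve.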

The rigorous theory of all of the improvements outlined above requires regularity properties of the solution, the verification of which can be a substantial task. There are a significant number of published papers on the regularity of parametric elliptic PDEs, in physical and parameter space, as they arise, e.g., in flow in random models of porous media ChSc:13; KuScSl:12; DiKuGi:14a; DiKuGi:14; GrKu:15; KuScSl:15; Kuo:15. However, for the NTE, this regularity question is almost untouched. Our complementary paper GrPaSc:17 contains a full regularity and error analysis of the discrete scheme for the NTE with spatially variable and random coefficients. Here we restrict ourselves to a summary of those results.

The field of UQ has grown very quickly in recent years and its application to neutron transport theory is currently of considerable interest. There are a number of groups that already work on this problem, e.g. AyEa:15; Fi:11; Gi6:13 and references therein. Up to now, research has focussed on using the polynomial chaos expansion (PCE), which comes in two forms: the intrusive and non-intrusive approaches. Both approaches expand the random flux in a weighted sum of orthogonal polynomials. The intrusive approach considers the expansion directly in the differential equation, which in turn requires a new solver (‘intruding’ on the original solver). In contrast, the non-intrusive approach attempts to estimate the coefficients of the PCE directly, by projecting onto the PCE basis, cf. (AyEa:15, eq. (40)). This means the original solver can be used as a ‘black box’, as in MC methods. Both of the approaches then use quadrature to estimate the coefficients in the PCE. The main disadvantage of standard PCE is that typically the number of terms grows exponentially in the number of stochastic dimensions and in the order of the PCE, the so-called curse of dimensionality.
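To give a feel for the growth in the number of PCE terms, the following sketch counts basis polynomials for two common truncations. The function names are hypothetical, and the paper does not say which truncation the cited works use; the counts themselves are standard combinatorics.

```python
import math


def pce_terms_total_degree(d, p):
    """Number of polynomials of total degree <= p in d variables."""
    return math.comb(d + p, p)


def pce_terms_tensor(d, p):
    """Number of polynomials of degree <= p in each of d variables."""
    return (p + 1) ** d


for d in (2, 10, 50):
    # Order p = 3 in d stochastic dimensions.
    print(d, pce_terms_total_degree(d, 3), pce_terms_tensor(d, 3))
```

Already at $d=10$ and order 3 the tensor-product basis has more than a million terms, whereas an MC-type method only ever needs one solve per sample.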

Fichtl and Prinja Fi:11 were some of the first to numerically tackle the 1D slab geometry problem with random cross-sections. Gilli et al. Gi6:13 improved upon this work by using (adaptive) sparse grid ideas in the collocation method, to tackle the curse of dimensionality. Moreover, AyPaEa:14 constructed a hybrid PCE using a combination of Hermite and Legendre polynomials, observing superior convergence in comparison to the PCE with just Hermite polynomials. More recently AyEa:15 tackled the (time-independent) full criticality problem in three spatial, two angular and one energy variable. They consider a second expansion, the high-dimensional model representation (HDMR), which allows them to expand the response (e.g. functionals of the flux) in terms of low-dimensional subspaces of the stochastic variable. The PCE is used on the HDMR terms, each with their own basis and coefficients. We note however, that none of these papers provide any rigorous error or cost analysis.

The structure of this paper is as follows. In Section 2, we describe the model problem, a 1D slab geometry simplification of the Neutron Transport Equation with spatially varying and random cross-sections. We set out the discretisation of this equation and discuss two methods for solving the resultant linear systems: a direct and an iterative solver. In Section 3, the basic elements of a fully-discrete error analysis of the discrete ordinates method with diamond differencing applied to the model problem are summarised. The full analysis will be given in GrPaSc:17. In Section 4, we introduce a number of variations on the Monte Carlo method for quantifying uncertainty. This includes a summary of the theoretical computational costs for each method. Finally, Section 5 contains numerical results relating to the rest of the paper. We first present a hybrid solver that combines the benefits of both direct and iterative solvers. Its cost depends on the particular realisation of the cross-sections. Moreover, we present simulations for the UQ problem for the different variants of the Monte Carlo methods, and compare the rates with those given by the theory.

2 The Model Problem

The Neutron Transport Equation (NTE) is a physically derived balance equation that models the angular flux $\psi(\vec{r},\Theta,E)$ of neutrons in a domain, where $\vec{r}$ is position, $\Theta$ is angle and $E$ is energy. Neutrons are modelled as non-interacting particles travelling along straight line paths with some energy $E$. They interact with the larger nuclei via absorption, scattering and fission. The rates $\sigma_{A}$, $\sigma_{S}$ and $\sigma_{F}$ at which these events occur are called the absorption, scattering and fission cross-sections, respectively. They can depend on the position $\vec{r}$ and the energy $E$ of the neutron. The scattering cross-sections also depend on the energy $E^{\prime}$ after the scattering event, as well as on the angles $\Theta$ and $\Theta^{\prime}$ before and after the event.

The two main scenarios of interest in neutron transport are the so-called fixed source problem and the criticality problem. We will focus on the former, which concerns the transport of neutrons emanating from some fixed source term $f$. It has particular applications in radiation shielding. We will further simplify our model to the 1D slab geometry case by assuming

• no energy dependence;
• dependence only on one spatial dimension and infinite extent of the domain in the other two dimensions;
• no dependence of any cross-sections on angle;
• no fission.

The resulting simplified model is an integro-differential equation for the angular flux $\psi(x,\mu)$ such that

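$$\mu \frac{\mathrm{d}\psi}{\mathrm{d}x}(x,\mu) + \sigma(x)\,\psi(x,\mu) = \sigma_{S}(x)\,\phi(x) + f(x), \tag{1}$$

where

$$\phi(x) = \frac{1}{2}\int_{-1}^{1}\psi(x,\mu^{\prime})\,\mathrm{d}\mu^{\prime}, \tag{2}$$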

for any $x\in(0,1)$ and $\mu\in[-1,1]$, subject to the no in-flow boundary conditions

$$\psi(0,\mu)=0 \ \text{ for } \mu>0 \quad\text{and}\quad \psi(1,\mu)=0 \ \text{ for } \mu<0. \tag{3}$$

Here, the angular domain is reduced from $\mathbb{S}_{2}$ to the unit circle $\mathbb{S}_{1}$ and parametrised by the cosine $\mu\in[-1,1]$ of the angle. The equation degenerates at $\mu=0$, i.e. for neutrons moving perpendicular to the $x$-direction. The coefficient function $\sigma(x)$ is the total cross-section given by $\sigma=\sigma_{S}+\sigma_{A}$. For more discussion on the NTE see DaLi:12; LeMi:84.

2.1 Uncertainty Quantification

An important problem in industry is to quantify the uncertainty in the fluxes due to uncertainties in the cross-sections. Most materials, in particular shielding materials such as concrete, are naturally heterogeneous or change their properties over time through wear. Moreover, the values of the cross-sections are taken from nuclear data libraries across the world and they can differ significantly between libraries LeLe:07. This means there is a large amount of uncertainty in the coefficients, and this could have significant consequences for the system itself.

To describe the random model, let $(\Omega,\mathcal{A},\mathbb{P})$ be a probability space with $\omega\in\Omega$ denoting a random event from this space. Consider a (finite) set of partitions of the spatial domain, where on each subinterval we assume that $\sigma_{S}=\sigma_{S}(x,\omega)$ and $\sigma=\sigma(x,\omega)$ are two (possibly dependent or correlated) random fields. Then the angular flux and the scalar flux become random fields and the model problem (1), (2) becomes

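$$\mu \frac{\mathrm{d}\psi}{\mathrm{d}x}(x,\mu,\omega) + \sigma(x,\omega)\,\psi(x,\mu,\omega) = \sigma_{S}(x,\omega)\,\phi(x,\omega) + f(x), \tag{4}$$

where

$$\phi(x,\omega) = \frac{1}{2}\int_{-1}^{1}\psi(x,\mu^{\prime},\omega)\,\mathrm{d}\mu^{\prime}, \tag{5}$$

and $\psi(\cdot,\cdot,\omega)$ satisfies the boundary conditions (3). Equations (3), (4) and (5) have to hold for almost all realisations $\omega\in\Omega$.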

For simplicity, we restrict ourselves to deterministic $\sigma_{A}=\sigma_{A}(x)$ with

$$0 < \sigma_{A,\min} \leq \sigma_{A}(x) \leq \sigma_{A,\max} < \infty, \quad\text{for all } x\in[0,1], \tag{6}$$

and assume a log-normal distribution for $\sigma_{S}(x,\omega)$. The total cross-section $\sigma(x,\omega)$ is then simply the log-normal random field with values $\sigma(x,\omega)=\sigma_{S}(x,\omega)+\sigma_{A}(x)$. In particular, we assume that $\log\sigma_{S}$ is a correlated zero mean Gaussian random field, with covariance function defined by

$$C_{\nu}(x,y) = \sigma_{var}^{2}\,\frac{2^{1-\nu}}{\Gamma(\nu)}\left(2\sqrt{\nu}\,\frac{|x-y|}{\lambda_{C}}\right)^{\nu} K_{\nu}\!\left(2\sqrt{\nu}\,\frac{|x-y|}{\lambda_{C}}\right). \tag{7}$$

This class of covariances is called the Matérn class. It is parametrised by the smoothness parameter $\nu\geq 0.5$; $\lambda_{C}$ is the correlation length, $\sigma_{var}^{2}$ is the variance, $\Gamma$ is the gamma function and $K_{\nu}$ is the modified Bessel function of the second kind. The limiting case, i.e. $\nu\to\infty$, corresponds to the Gaussian covariance function $C_{\infty}(x,y)=\sigma_{var}^{2}\exp(-|x-y|^{2}/\lambda_{C}^{2})$.
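For half-integer $\nu$ the Bessel function in the Matérn covariance reduces to an exponential times a polynomial, which gives a quick way to evaluate (7) without special-function libraries. The sketch below assumes the argument of $K_{\nu}$ is $z=2\sqrt{\nu}\,|x-y|/\lambda_{C}$, the scaling that reproduces the Gaussian limit quoted above; `matern_cov` and `gaussian_cov` are hypothetical names.

```python
import math


def matern_cov(r, nu, lam_c=1.0, var=1.0):
    """Matern covariance at distance r for nu in {0.5, 1.5, 2.5}.

    Uses the closed forms of 2**(1-nu)/Gamma(nu) * z**nu * K_nu(z) for
    half-integer nu, with z = 2*sqrt(nu)*r/lam_c (assumed scaling).
    """
    z = 2.0 * math.sqrt(nu) * abs(r) / lam_c
    if nu == 0.5:
        poly = 1.0                      # exponential covariance
    elif nu == 1.5:
        poly = 1.0 + z
    elif nu == 2.5:
        poly = 1.0 + z + z * z / 3.0
    else:
        raise ValueError("closed form only implemented for nu = 0.5, 1.5, 2.5")
    return var * poly * math.exp(-z)


def gaussian_cov(r, lam_c=1.0, var=1.0):
    """Limiting case nu -> infinity."""
    return var * math.exp(-(r / lam_c) ** 2)


for nu in (0.5, 1.5, 2.5):
    print(nu, round(matern_cov(0.3, nu), 4))
print("gaussian", round(gaussian_cov(0.3), 4))
```

At a fixed distance the values increase with $\nu$ towards the Gaussian covariance, reflecting the smoother sample paths of the field for larger $\nu$.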

To sample from $\sigma_{S}$ we use the Karhunen-Loève (KL) expansion of $\log\sigma_{S}$, i.e.,

$$\log\sigma_{S}(x,\omega) = \sum_{i=1}^{\infty}\sqrt{\xi_{i}}\,\eta_{i}(x)\,Z_{i}(\omega), \tag{8}$$

where $Z_{i}\sim\mathcal{N}(0,1)$ i.i.d. Here $\xi_{i}$ and $\eta_{i}$ are the eigenvalues and the $L^{2}(0,1)$-orthogonal eigenfunctions of the covariance integral operator associated with the kernel given by the covariance function in (7). In practice, the KL expansion needs to be truncated after a finite number of terms (here denoted $d$). The accuracy of this truncation depends on the decay of the eigenvalues Lord:14. For $\nu<\infty$, this decay is algebraic and depends on the smoothness parameter $\nu$. In the Gaussian covariance case the decay is exponential. Note that for the Matérn covariance with $\nu=0.5$, the eigenvalues and eigenfunctions can be computed analytically Lord:14. For other values of $\nu$, we numerically compute the eigensystem using the Nyström method; see, for example, EiErUl:07.
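A truncated KL sampler can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses a midpoint-rule Nyström discretisation (the paper does not specify the quadrature), the exponential covariance ($\nu=0.5$) with the scaling assumed to be $\exp(-\sqrt{2}\,|x-y|/\lambda_{C})$, and hypothetical names `kl_nystrom` and `sample_sigma_s`.

```python
import numpy as np


def kl_nystrom(cov, n_quad, d):
    """Leading d KL eigenpairs on (0, 1) via a midpoint-rule Nystrom method."""
    x = (np.arange(n_quad) + 0.5) / n_quad      # quadrature nodes
    w = 1.0 / n_quad                            # equal midpoint weights
    C = cov(np.abs(x[:, None] - x[None, :]))    # covariance matrix at the nodes
    # With equal weights the discrete eigenproblem is (w * C) v = xi * v.
    xi, v = np.linalg.eigh(w * C)
    order = np.argsort(xi)[::-1][:d]            # keep the d largest eigenvalues
    xi = np.clip(xi[order], 0.0, None)
    eta = v[:, order] / np.sqrt(w)              # so that sum_j w * eta_i(x_j)**2 = 1
    return x, xi, eta


def sample_sigma_s(xi, eta, rng):
    """One realisation of sigma_S = exp(sum_i sqrt(xi_i) * eta_i * Z_i) at the nodes."""
    z = rng.standard_normal(xi.size)
    return np.exp(eta @ (np.sqrt(xi) * z))


lam_c, var = 1.0, 1.0
exp_cov = lambda r: var * np.exp(-np.sqrt(2.0) * r / lam_c)   # Matern with nu = 0.5
x, xi, eta = kl_nystrom(exp_cov, n_quad=200, d=20)
sigma_s = sample_sigma_s(xi, eta, np.random.default_rng(0))
print(xi[:3], sigma_s.min(), sigma_s.max())
```

The fraction `xi.sum() / var` indicates how much of the field's variance the $d$ retained modes capture, which is one practical way to choose the truncation dimension.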

The goal of stochastic uncertainty quantification is to understand how the randomness in $\sigma_{S}$ and $\sigma$ propagates to functionals of the scalar or angular flux. Such quantities of interest may be point values, integrals or norms of $\phi$ or $\psi$ . They are random variables and the focus is on estimating their mean, variance or distribution.

2.2 Discretisation

For each realisation $\omega\in\Omega$ , the stochastic 1D NTE (4), (5), (3) is an integro-differential equation in two variables, space and angle. For ease of presentation, we suppress the dependency on $\omega\in\Omega$ for the moment.

We use a $2N$-point quadrature rule $\int_{-1}^{1}f(\mu)\,d\mu\approx\sum_{|k|=1}^{N}w_{k}f(\mu_{k})$ with nodes $\mu_{k}\in[-1,1]\backslash\{0\}$ and positive weights $w_{k}$ to discretise in angle, assuming the (anti-)symmetry properties $\mu_{-k}=-\mu_{k}$ and $w_{-k}=w_{k}$. (In later sections, we construct such a rule by using $N$-point Gauss-Legendre rules on each of $[-1,0)$ and $(0,1]$.)
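The composite rule described in the parenthesis can be built directly from a standard Gauss-Legendre rule. The sketch below uses NumPy's `numpy.polynomial.legendre.leggauss`; the function name `double_gauss` is a hypothetical choice.

```python
import numpy as np


def double_gauss(n):
    """2n-point rule on [-1, 1] without 0: n-point Gauss-Legendre on each half."""
    t, w = np.polynomial.legendre.leggauss(n)   # nodes and weights on [-1, 1]
    mu_pos = 0.5 * (t + 1.0)                    # affine map to (0, 1)
    w_pos = 0.5 * w                             # weights scale with the interval length
    mu = np.concatenate([-mu_pos[::-1], mu_pos])
    weights = np.concatenate([w_pos[::-1], w_pos])
    return mu, weights


mu, w = double_gauss(4)
print(mu)
print(w.sum())                  # total weight equals the length of [-1, 1]
print(np.sum(w * np.abs(mu)))   # |mu| is integrated exactly on each half
```

Splitting at $\mu=0$ keeps the nodes away from the degenerate direction and integrates functions with a kink at $\mu=0$ accurately, which a single Gauss rule on $[-1,1]$ would not.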

To discretise in space, we introduce a mesh $0=x_{0}<x_{1}<\ldots<x_{M}=1$ which is assumed to resolve any discontinuities in the cross-sections $\sigma,\sigma_{S}$ and is also quasiuniform, i.e. the subinterval lengths $h_{j}:=x_{j}-x_{j-1}$ satisfy $\gamma h\leq h_{j}\leq h:=\max_{j=1,\ldots,M}h_{j}$ for some constant $\gamma>0$. Employing a simple Crank-Nicolson method for the transport part of (4), (5) and combining it with the angular quadrature rule above, we obtain the classical diamond-differencing scheme:

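$$\mu_{k}\,\frac{\Psi_{k,j}-\Psi_{k,j-1}}{h_{j}} + \sigma_{j-1/2}\,\frac{\Psi_{k,j}+\Psi_{k,j-1}}{2} = \sigma_{S,j-1/2}\,\Phi_{j-1/2} + F_{j-1/2}, \quad j=1,\ldots,M,\ |k|=1,\ldots,N, \tag{9}$$

where

$$\Phi_{j-1/2} = \frac{1}{2}\sum_{|k|=1}^{N} w_{k}\,\frac{\Psi_{k,j}+\Psi_{k,j-1}}{2}, \quad j=1,\ldots,M. \tag{10}$$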

Here $\sigma_{j-1/2}$ denotes the value of $\sigma$ at the mid-point of the interval $I_{j}=(x_{j-1},x_{j})$, with the analogous meaning for $\sigma_{S,j-1/2}$ and $F_{j-1/2}$. The notation reflects the fact that (in the next section) we will associate the unknowns $\Psi_{k,j}$ in (9) with the nodal values $\psi_{k,h}(x_{j})$ of continuous piecewise-linear functions $\psi_{k,h}\approx\psi(\cdot,\mu_{k})$.

Finally, (9) and (10) have to be supplemented with the boundary conditions $\Psi_{k,0}=0$, for $k>0$, and $\Psi_{k,M}=0$, for $k<0$. If the right-hand side of (9) were known, then (9) could be solved simply by sweeping from left to right (when $k>0$) and from right to left (when $k<0$). The appearance of $\Phi_{j-1/2}$ on the right-hand side means that (9) and (10) constitute a coupled system with solution $(\Psi,\Phi)\in\mathbb{R}^{2NM}\times\mathbb{R}^{M}$. It is helpful to think of $\Psi$ as being composed of $2N$ subvectors $\Psi_{k}$, each with $M$ entries $\Psi_{k,j}$, consisting of approximations to $\psi(x_{j},\mu_{k})$ with $x_{j}$ ranging over all free nodes.
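The sweep for a single direction can be sketched as below, assuming the right-hand side is known at the cell mid-points. The code solves the two-point diamond-differencing relation cell by cell, starting from the zero in-flow boundary; the name `sweep` and the list-based data layout are illustrative assumptions.

```python
import math


def sweep(mu, h, sigma_mid, g_mid):
    """Solve mu*(U_j - U_{j-1})/h_j + sigma_{j-1/2}*(U_j + U_{j-1})/2 = g_{j-1/2}.

    Returns the nodal values U_0, ..., U_M, with U_0 = 0 if mu > 0 and
    U_M = 0 if mu < 0 (no in-flow). Cell j (0-based) joins nodes j and j + 1.
    """
    m = len(h)
    u = [0.0] * (m + 1)
    a = abs(mu)
    # Sweep in the direction of travel: left to right for mu > 0, else right to left.
    cells = range(m) if mu > 0 else range(m - 1, -1, -1)
    for j in cells:
        inflow = u[j] if mu > 0 else u[j + 1]
        outflow = (g_mid[j] + (a / h[j] - 0.5 * sigma_mid[j]) * inflow) / (
            a / h[j] + 0.5 * sigma_mid[j])
        if mu > 0:
            u[j + 1] = outflow
        else:
            u[j] = outflow
    return u


# Constant data: the exact solution for mu > 0 is (g/sigma) * (1 - exp(-sigma*x/mu)).
m = 200
u = sweep(0.5, [1.0 / m] * m, [1.0] * m, [1.0] * m)
print(u[-1], 1.0 - math.exp(-2.0))
```

Each sweep costs $\mathcal{O}(M)$ operations, so sweeping all $2N$ directions costs $\mathcal{O}(MN)$.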

The coupled system (9) and (10) can be written in matrix form as

$$\begin{pmatrix} T & -\Sigma_{S} \\ -P & I \end{pmatrix}\begin{pmatrix}\Psi \\ \Phi\end{pmatrix} = \begin{pmatrix} F \\ 0 \end{pmatrix}. \tag{11}$$

Here, the vector $\Phi\in\mathbb{R}^{M}$ contains the approximations of the scalar flux at the $M$ midpoints of the spatial mesh. The matrix $T$ is a block diagonal $2NM\times 2NM$ matrix, representing the left-hand side of (9). The $2N$ diagonal blocks of $T$, one per angle, are themselves bi-diagonal. The $2NM\times M$ matrix $\Sigma_{S}$ simply consists of $2N$ identical diagonal blocks, one per angle, representing the multiplication of $\Phi$ by $\sigma_{S}$ at the midpoints of the mesh. The $M\times 2NM$ matrix $P$ represents the right-hand side of (10), i.e. averaging at the midpoints and quadrature. The matrix $I$ denotes the $M\times M$ identity matrix. The vector $F\in\mathbb{R}^{2NM}$ contains $2N$ copies of the source term evaluated at the $M$ midpoints of the spatial mesh.

2.3 Direct and Iterative Solvers

We now wish to find the (approximate) fluxes in the linear system (11). We note that the matrix $T$ is invertible and has a useful sparsity structure that allows its inverse to be calculated in $\mathcal{O}(MN)$ operations. However, the bordered system (11) is not as easy to invert, due to the presence of $\Sigma_{S}$ and $P$ .

To exploit the sparsity of $T$, we do block elimination on (11), obtaining the Schur complement system for the scalar flux, i.e.,

$$\left(I - PT^{-1}\Sigma_{S}\right)\Phi = PT^{-1}F, \tag{12}$$

which now requires the inversion of a smaller (dense) matrix. Note that (12) is a finite-dimensional version of the reduction of the integro-differential equation (4), (5) to the integral form of the NTE, see (20). In this case, the two dominant computations, with $\mathcal{O}(M^{2}N)$ and $\mathcal{O}(M^{3})$ operations respectively, are the triple matrix product $PT^{-1}\Sigma_{S}$ in the construction of the Schur complement and the $LU$ factorisation of the $M\times M$ matrix $\left(I-PT^{-1}\Sigma_{S}\right)$. This leads to a total

$$\text{theoretical cost of the direct solver} \ \sim\ \mathcal{O}\big(M^{2}(M+N)\big). \tag{13}$$

We note that for stability reasons (see §3, also PiSc:83 in a simpler context), the number of spatial and angular points should be related. A suitable choice is $M\sim N$, leading to a cost of the direct solver of $\mathcal{O}(M^{3})$ in general.

The second approach for solving (11) is an iterative solver commonly referred to as source iteration, cf. Bl:16 . The form of (12) naturally suggests the iteration

$$\Phi^{(k)} = PT^{-1}\left(\Sigma_{S}\Phi^{(k-1)} + F\right), \tag{14}$$

where $\Phi^{(k)}$ is the approximation at the $k$th iteration, with $\Phi^{(0)}=PT^{-1}F$. This can be seen as a discrete version of an iterative method for the integral equation (20).

In practice, we truncate after $K$ iterations. The dominant computations in the source iteration are the $K$ multiplications with $PT^{-1}\Sigma_{S}$. Exploiting the sparsity of all the matrices involved, these multiplications cost $\mathcal{O}(MN)$ operations, leading to an overall

$$\text{theoretical cost of source iteration} \ \sim\ \mathcal{O}(MNK). \tag{15}$$

Our numerical experiments in Section 5 show that for $N=2M$ the hidden constants in the two estimates (13) and (15) are approximately the same. Hence, whether the iterative solver is faster than the direct solver depends on whether the number of iterations $K$ to obtain an accurate enough solution is smaller or larger than $M$ .
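Both solvers can be compared on a small example. The sketch below is an illustration under stated assumptions rather than the authors' code: it applies $PT^{-1}$ matrix-free by sweeping each angle of the diamond-differencing scheme, forms the Schur complement column by column for the direct solve, and uses hypothetical names (`apply_PTinv`, `solve_direct`, `solve_source_iteration`) and made-up test data (uniform mesh, constant cross-sections, unit source).

```python
import numpy as np


def apply_PTinv(q, mu, w, h, sigma):
    """Return P T^{-1} q: sweep every angle, then average in space and angle.

    q, h and sigma hold one value per cell (mid-point values); mu and w hold
    the 2N quadrature nodes and weights.
    """
    m = h.size
    phi = np.zeros(m)
    for mu_k, w_k in zip(mu, w):
        a = abs(mu_k) / h
        psi = np.zeros(m + 1)                        # nodal values, zero in-flow
        cells = range(m) if mu_k > 0 else range(m - 1, -1, -1)
        for j in cells:
            i_in, i_out = (j, j + 1) if mu_k > 0 else (j + 1, j)
            psi[i_out] = (q[j] + (a[j] - 0.5 * sigma[j]) * psi[i_in]) / (
                a[j] + 0.5 * sigma[j])
        # Mid-point average in space, weighted quadrature sum in angle.
        phi += 0.5 * w_k * 0.5 * (psi[1:] + psi[:-1])
    return phi


def solve_direct(mu, w, h, sigma, sigma_s, f):
    """Form I - P T^{-1} Sigma_S column by column and solve the dense system."""
    m = h.size
    A = np.column_stack([apply_PTinv(sigma_s * e, mu, w, h, sigma)
                         for e in np.eye(m)])
    return np.linalg.solve(np.eye(m) - A, apply_PTinv(f, mu, w, h, sigma))


def solve_source_iteration(mu, w, h, sigma, sigma_s, f, n_iter):
    """Phi^(k) = P T^{-1} (Sigma_S Phi^(k-1) + F), starting from P T^{-1} F."""
    phi = apply_PTinv(f, mu, w, h, sigma)
    for _ in range(n_iter):
        phi = apply_PTinv(sigma_s * phi + f, mu, w, h, sigma)
    return phi


# Hypothetical test data: uniform mesh, constant cross-sections, unit source.
m, n = 32, 4
t, wt = np.polynomial.legendre.leggauss(n)
mu = np.concatenate([-0.5 * (t + 1.0), 0.5 * (t + 1.0)])
w = np.concatenate([0.5 * wt, 0.5 * wt])
h = np.full(m, 1.0 / m)
sigma_s = np.full(m, 0.5)
sigma = sigma_s + 0.5                                # sigma = sigma_S + sigma_A
f = np.ones(m)

phi_direct = solve_direct(mu, w, h, sigma, sigma_s, f)
phi_iter = solve_source_iteration(mu, w, h, sigma, sigma_s, f, n_iter=60)
print(np.max(np.abs(phi_direct - phi_iter)))
```

Forming the Schur complement takes $M$ applications of $PT^{-1}$, i.e. $\mathcal{O}(M^{2}N)$ work before the dense solve, whereas each source iteration is a single application. This is why the iteration wins whenever the required $K$ stays well below $M$.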

There are sharp theoretical results on the convergence of source iteration for piecewise smooth cross-sections (Bl:16, Thm 2.20). In particular, if $\phi^{(K)}(\omega)$ denotes the approximation to $\phi(\omega)$ after $K$ iterations, then

$$\left\|\sigma^{1/2}\left(\phi-\phi^{(K)}\right)\right\|_{2} \ \leq\ C^{\prime}\left(\eta\left\|\frac{\sigma_{S}}{\sigma}\right\|_{\infty}\right)^{K}, \tag{16}$$

for some constant $C^{\prime}$ and $\eta\leq 1$. That is, the error decays geometrically with rate no slower than the spatial maximum of $\sigma_{S}/\sigma$. This value depends on $\omega$ and will change pathwise. Using this result as a guide together with (6), we assume that the convergence of the $L^{2}$-error with respect to $K$ can be bounded by

$$\left\|\phi-\phi^{(K)}\right\|_{2} \ \leq\ C\left\|\frac{\sigma_{S}}{\sigma}\right\|_{\infty}^{K}, \tag{17}$$

for some constant $C$ that we will estimate numerically in Section 5.
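The assumed bound (17) gives a simple a priori estimate of the number of iterations needed for a given tolerance: the smallest $K$ with $C\rho^{K}\leq\text{tol}$, where $\rho=\|\sigma_{S}/\sigma\|_{\infty}$. The sketch below evaluates this for one realisation and compares $K$ with $M$. The selection rule in `choose_solver` is only a hypothetical heuristic suggested by comparing the costs (13) and (15); it is not claimed to be the criterion of the hybrid solver in Section 5, and the default $C=1$ is a placeholder.

```python
import math


def iterations_needed(sigma_s, sigma_a, tol, c=1.0):
    """Smallest K with c * rho**K <= tol, where rho = max_j sigma_S / (sigma_S + sigma_A)."""
    rho = max(s / (s + a) for s, a in zip(sigma_s, sigma_a))
    if tol >= c:
        return 0, rho
    return math.ceil(math.log(tol / c) / math.log(rho)), rho


def choose_solver(sigma_s, sigma_a, tol, m, c=1.0):
    """Hypothetical rule: iterate if the predicted K is below M, else solve directly."""
    k, _ = iterations_needed(sigma_s, sigma_a, tol, c)
    return ("iterative", k) if k < m else ("direct", k)


# Mid-point values of one (made-up) realisation of sigma_S with sigma_A = 1.
sigma_s, sigma_a = [1.0, 3.0], [1.0, 1.0]
print(iterations_needed(sigma_s, sigma_a, tol=1e-3))
print(choose_solver(sigma_s, sigma_a, tol=1e-3, m=16))
print(choose_solver(sigma_s, sigma_a, tol=1e-3, m=64))
```

Because $\rho$ depends on the sample $\omega$, the predicted $K$ changes from realisation to realisation, so a scattering-dominated sample may favour the direct solver while an absorption-dominated one favours iteration.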

3 Summary of Theoretical Results

The rigorous analysis of UQ for PDEs with random coefficients requires estimates for the error when discretisations in physical space (e.g. by finite differences) and probability space (e.g. by sampling techniques) are combined. The physical error estimates typically need to be probabilistic in form (e.g. estimates of the expectation of the physical error). Such estimates are quite well-developed for elliptic PDEs (see, for example, ChSc:13), but this question is almost untouched for the transport equation (or more specifically the NTE). We outline here some results which are proved in the forthcoming paper GrPaSc:17. That paper proceeds by first giving an error analysis for (1), (2) with variable cross-sections, which is explicit in $\sigma,\sigma_{S}$, and then uses this to derive probabilistic error estimates for the spatial discretisation (9), (10).

The numerical analysis of the NTE (and related integro-differential equation problems such as radiative transfer) dates back at least as far as the work of H.B. Keller Ke:60. After a huge growth in the mathematics literature in the 1970s and 1980s, progress has been slower since. This is perhaps surprising, since discontinuous Galerkin (DG) methods have enjoyed a massive recent renaissance and the solution of the neutron transport problem was one of the key motivations behind the original introduction of DG ReHi:73. Even today, an error analysis of the NTE with variable (even deterministic) cross-sections (with explicit dependence on the data) is still not available, even for the model case of mono-energetic 1D slab geometry considered here.

The fundamental paper on the analysis of the discrete ordinates method for the NTE is PiSc:83. Here a full analysis of the combined effect of angular and spatial discretisation is given under the assumption that the cross-sections $\sigma$ and $\sigma_{S}$ in (4) are constant. The delicate relation between spatial and angular discretisation parameters required to achieve stability and convergence is described there. Later research, e.g. As:98, As:09, produced analogous results for models of increasing complexity and in higher dimensions, but the proofs were mostly confined to the case of cross-sections that are constant in space. A separate and related sequence of papers (e.g. LaNe:82, Vi:84, and AlViGa:89) allows for variation in cross-sections, but error estimates explicit in this data are not available there.

The results outlined here are orientated to the case when $\sigma,\sigma_{S}$ have relatively rough fluctuations. As a precursor to attacking the random case, we first consider rough deterministic coefficients defined as follows. We assume that there is some partition of $[0,1]$ and that $\sigma,\sigma_{S}$ are $C^{\eta}$ functions on each subinterval of the partition (with $\eta\in(0,1]$), but that $\sigma,\sigma_{S}$ may be discontinuous across the break points. We assume that the mesh $x_{j}$ introduced in §2.2 resolves these break points. (Here $C^{\eta}$ is the usual Hölder space of index $\eta$ with norm $\|\cdot\|_{\eta}$.) We also assume that the source function $f\in C^{\eta}$.

When discussing the error when (9), (10) is applied to (1), (2), it is useful to consider the “pure transport” problem:

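$$\mu\frac{\mathrm{d}u}{\mathrm{d}x} + \sigma u = g, \quad\text{with } u(0)=0 \text{ when } \mu>0 \text{ and } u(1)=0 \text{ when } \mu<0, \tag{18}$$

and with $g\in C$ a generic right-hand side (where $\mu$ is now a parameter). Application of the Crank-Nicolson method (as in (9)) yields

$$\mu\left(\frac{U_{j}-U_{j-1}}{h_{j}}\right) + \sigma_{j-1/2}\left(\frac{U_{j}+U_{j-1}}{2}\right) = g_{j-1/2}, \quad\text{for } j=1,\ldots,M, \tag{19}$$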

with analogous boundary conditions, where, for any continuous function $c$ , we use $c_{j-1/2}$ to denote $c(x_{j-1/2})$ . Letting $V^{h}$ denote the space of continuous piecewise linear functions with respect to the mesh $\{x_{j}\}$ , (19) is equivalent to seeking a $u^{h}\in V^{h}$ (with nodal values $U_{j}$ ) such that

[TABLE]

and $\widetilde{c}$ denotes the piecewise constant function with respect to the grid $\{x_{j}\}$ which interpolates $c$ at the mid-points of subintervals.
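The sweep behind the discrete pure transport problem can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard diamond-difference (Crank-Nicolson) form $\mu(U_{j}-U_{j-1})/h_{j}+\sigma_{j-1/2}(U_{j}+U_{j-1})/2=g_{j-1/2}$ and a zero inflow boundary condition, and the function name `diamond_sweep` is hypothetical.

```python
import math

def diamond_sweep(mu, sigma_mid, g_mid, x):
    """Diamond-difference sweep for mu*u' + sigma*u = g on the mesh x.

    sigma_mid[j] and g_mid[j] are values at the midpoint of cell [x[j], x[j+1]].
    Assumed inflow condition: u = 0 at x[0] if mu > 0, at x[-1] if mu < 0.
    Returns the nodal values U[0..M].
    """
    if mu == 0.0:
        raise ValueError("mu must be non-zero")
    M = len(x) - 1
    U = [0.0] * (M + 1)
    # Sweep in the direction of travel: left to right for mu > 0, else right to left.
    cells = range(M) if mu > 0 else range(M - 1, -1, -1)
    for j in cells:
        a = abs(mu) / (x[j + 1] - x[j])   # |mu| / h_j
        s = 0.5 * sigma_mid[j]            # half the midpoint cross-section
        upwind, downwind = (j, j + 1) if mu > 0 else (j + 1, j)
        # |mu|*(U_down - U_up)/h + sigma*(U_down + U_up)/2 = g at the midpoint
        U[downwind] = (g_mid[j] + (a - s) * U[upwind]) / (a + s)
    return U

# Constant coefficients: compare with the exact solution (g/sigma)*(1 - exp(-sigma*x/mu)).
M = 64
x = [j / M for j in range(M + 1)]
U = diamond_sweep(0.5, [2.0] * M, [1.0] * M, x)
exact = 0.5 * (1.0 - math.exp(-4.0))   # exact value at x = 1
```

For constant $\sigma$ and $g$ the computed outflow value `U[-1]` should agree with `exact` up to the second-order discretisation error of the scheme.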

It is easy to show that both (18) and (19) have unique solutions and we denote the respective solution operators by $\mathcal{S}_{\mu}$ and $\mathcal{S}^{h}_{\mu}$ , i.e.

$$u=\mathcal{S}_{\mu}g\quad\text{and}\quad u^{h}=\mathcal{S}^{h}_{\mu}g.$$

Bearing in mind the angular averaging process in (2) and (10), it is useful to then introduce the corresponding continuous and discrete spatial operators:

[TABLE]

It is easy to see (and classically well known, e.g. KaKe:77) that

[TABLE]

where $E_{1}$ is the exponential integral and the function $\tau(x,y)=\int_{x}^{y}\sigma$ is known as the optical path. In fact (even when $\sigma$ is merely continuous), $\mathcal{K}$ is a compact Fredholm integral operator on a range of function spaces and $\mathcal{K}^{h,N}$ is a finite rank approximation to it. The study of these integral operators in the deterministic case is a classical topic, e.g. Sl:75 . In the case of random $\sigma$ , $\mathcal{K}$ is an integral operator with a random kernel which merits further investigation. Returning to (1), (2), we see readily that

[TABLE]

Moreover (9) and (10) correspond to a discrete analogue of (20) as follows. Introduce the family of functions $\psi^{h,N}_{k}\in V^{h}$ , $|k|=1,\ldots,N$ , by requiring $\psi^{h,N}_{k}$ to have nodal values $\Psi_{k,j}$ . Then set

[TABLE]

and it follows that (9) and (10) may be rewritten (for each $j=1,...,M$ )

[TABLE]

and thus

[TABLE]

The numerical analysis of (9) and (10) is done by analysing (the second equation in) (21) as an approximation of the second equation in (20). This is studied in detail in PiSc:83 for constant $\sigma,\sigma_{S}$ . In GrPaSc:17 we discuss the variable case, obtaining all estimates explicitly in $\sigma,\sigma_{S}$ . Elementary manipulation on (20) and (21) shows that

[TABLE]

and so

[TABLE]

The error analysis in GrPaSc:17 proceeds by estimating the two terms on the right-hand side of (23) separately. We summarise the results in the lemmas below. To avoid writing down the technicalities (which will be given in detail in GrPaSc:17 ), in the following results, we do not give the explicit dependence of the constants $C_{i},\ i=1,2,\ldots,$ on the cross sections $\sigma$ and $\sigma_{S}$ . For simplicity we restrict our summary to the case when the right-hand side of (19) is the average of $g$ over $I_{j}$ (rather than the point value $g_{j-1/2}$ ). The actual scheme (19) is then analysed by a perturbation argument, see GrPaSc:17 .

Lemma 1

Suppose $N$ is sufficiently large and $h\log N$ is sufficiently small. Then

[TABLE]

where $C_{1}$ depends on $\sigma$ and $\sigma_{S}$ , but is independent of $h$ and $N$ .

Sketch of proof. The proof proceeds by first establishing an estimate of the form (24) for the quantity $\|(I-\mathcal{K}\sigma_{S})^{-1}\|_{\infty}$, and then showing that the perturbation $\|\mathcal{K}-\mathcal{K}^{h,N}\|_{\infty}$ is small when $N$ is sufficiently large and $h\log N$ is sufficiently small. (The constraint linking $h$ and $\log N$ arises because the transport equation (1) has a singularity at $\mu=0$.) The actual values of $h,N$ which are sufficient to ensure that the bound (24) holds depend on the cross-sections $\sigma$, $\sigma_{S}$.

Lemma 2

[TABLE]

where $C_{2},C_{3},C_{4}$ depend again on $\sigma$ and $\sigma_{S}$ , but are independent of $h,N$ and $f$ .

Sketch of proof. Introducing the semidiscrete operator:

[TABLE]

(corresponding to applying quadrature in angle but no discretisation in space), we then write $\mathcal{K}-\mathcal{K}^{h,N}=(\mathcal{K}-\mathcal{K}^{N})+(\mathcal{K}^{N}-\mathcal{K}^{h,N})$ and consider, separately, the semidiscrete error due to quadrature in angle:

[TABLE]

and the spatial error for a given $N$ :

[TABLE]

The estimate for (25) uses estimates for the regularity of $\psi$ with respect to $\mu$ (which are explicit in the cross-sections), while (26) is estimated by proving stability of the Crank-Nicolson method and a cross-section-explicit bound on $\|\phi\|_{\eta}$ .

Putting together Lemmas 1 and 2, we obtain the following.

Theorem 3.1

Under the assumptions outlined above,

[TABLE]

Returning to the case when $\sigma,\sigma_{S}$ are random functions, this theorem provides pathwise estimates for the error. In GrPaSc:17 , these are turned into estimates in the corresponding Bochner space provided the coefficients $C_{i}$ are bounded in probability space. Whether this is the case depends on the choice of the random model for $\sigma,\sigma_{S}$ .

In particular, using the results in (ChSc:13, §2) and GrKu:15, it can be shown that $C_{i}\in L^{p}(\Omega)$, for all $1\leq p<\infty$, for the specific choices of $\sigma$ and $\sigma_{S}$ in §2. Hence, we have:

Corollary 1

For all $1\leq p<\infty$ ,

[TABLE]

where $C$ is independent of $h,N$ and $f$ .

4 Modern Variants of Monte Carlo

Let $Q(\omega)\in\mathbb{R}$ denote a functional of $\phi$ or $\psi$ representing a quantity of interest. We will focus on estimating ${\mathbb{E}}[Q]$ , the expected value of $Q$ . Since we are not specific about what functionals we are considering, this includes also higher order moments or CDFs of quantities of interest. The expected value is a high-dimensional integral and the goal is to apply efficient quadrature methods in high dimensions. We consider Monte Carlo type sampling methods.

As outlined above, to obtain samples of $Q(\omega)$ the NTE has to be approximated numerically. First, the random scattering cross section $\sigma_{S}$ in (4) is sampled using the KL expansion of $\log\sigma_{S}$ in (8) truncated after $d$ terms. The stochastic dimension $d$ is chosen sufficiently high so that the truncation error is smaller than the other approximation errors. For each $n\in\mathbb{N}$ , let $Z^{n}\in\mathbb{R}^{d}$ be a realisation of the multivariate Gaussian coefficient $Z:=(Z_{i})_{i=1,\ldots,d}$ in the KL expansion (8). Also, denote by $Q_{h}(Z^{n})$ the approximation of the $n$ th sample of $Q$ obtained numerically using a spatial grid with mesh size $h$ and $2N$ angular quadrature points. We assume throughout that $N\sim 1/h$ , so there is a single discretisation parameter $h$ .

We will consider various sample-based estimators $\widehat{Q}_{h}$ for the expected value ${\mathbb{E}}[Q]$, and we will quantify the accuracy of each estimator by its mean square error (MSE) $e(\widehat{Q}_{h})^{2}$. Since each $\widehat{Q}_{h}$ is assumed to be an unbiased estimator of ${\mathbb{E}}[Q_{h}]$, i.e. ${\mathbb{E}}[\widehat{Q}_{h}]={\mathbb{E}}[Q_{h}]$, the MSE can be expanded as

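$$e(\widehat{Q}_{h})^{2}:={\mathbb{E}}\big[(\widehat{Q}_{h}-{\mathbb{E}}[Q])^{2}\big]=\big({\mathbb{E}}[Q-Q_{h}]\big)^{2}+\mathbb{V}[\widehat{Q}_{h}],\qquad(27)$$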

i.e., the squared bias due to the numerical approximation plus the sampling (or quadrature) error $\mathbb{V}[\widehat{Q}_{h}]={\mathbb{E}}[(\widehat{Q}_{h}-{\mathbb{E}}[Q_{h}])^{2}]$. In order to compare the computational costs of the various methods we will consider their $\epsilon$-cost $\mathcal{C}_{\epsilon}$, that is, the number of floating point operations needed to achieve an MSE $e(\widehat{Q}_{h})^{2}$ less than $\epsilon^{2}$.

To bound the $\epsilon$ -cost for each method, we make the following assumptions on the discretisation error and on the average cost to compute a sample from $Q_{h}$ :

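$$|{\mathbb{E}}[Q-Q_{h}]|=\mathcal{O}(h^{\alpha}),\qquad(28)$$

$${\mathbb{E}}[\mathcal{C}(Q_{h})]=\mathcal{O}(h^{-\gamma}),\qquad(29)$$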

for some constants $\alpha,\gamma>0$ . We have seen in Section 2 that (29) holds with $\gamma$ between $2$ and $3$ . The new theoretical results in Section 3 guarantee that (28) also holds for some $0<\alpha\leq 1$ . Whilst the results of Section 3 (and GrPaSc:17 ) are shown to be sharp in some cases, the practically observed values for $\alpha$ in the numerical experiments here are significantly bigger, with values between 1.5 and 2.

In recent years, many alternative methods for high-dimensional integrals have emerged that use tensor product deterministic quadrature rules combined with sparse grid techniques to reduce the computational cost XiKa:02 ; BaNoTe:07 ; NoTeWe:08 ; GuWeZh:14 ; AyEa:15 ; Fi:11 ; Gi6:13 . The efficiency of these approaches relies on high levels of smoothness of the parameter to output map and in general their cost may grow exponentially with the number of parameters (the curse of dimensionality). Such methods are not competitive with Monte Carlo type methods for problems with low smoothness in the coefficients, where large numbers of parameters are needed to achieve a reasonable accuracy. For example, in our later numerical tests we will consider problems in up to 3600 stochastic dimensions.

However, standard Monte Carlo methods are notoriously slow to converge, requiring thousands or even millions of samples to achieve acceptable accuracies. In our application, where each sample involves the numerical solution of an integro-differential equation, this very easily becomes intractable. The novel Monte Carlo approaches that we present here aim to improve this situation in two complementary ways. Quasi-Monte Carlo methods dramatically reduce the number of samples needed to achieve a certain accuracy by using deterministic ideas to find well distributed samples in high dimensions. Multilevel methods use the available hierarchy of numerical approximations to our integro-differential equation to shift the bulk of the computations to cheap, inaccurate coarse models while providing the required accuracy with only a handful of expensive, accurate model solves.

4.1 Standard Monte Carlo

The (standard) Monte Carlo (MC) estimator for ${\mathbb{E}}[Q]$ is defined by

$$\widehat{Q}_{h}^{MC}:=\frac{1}{N_{MC}}\sum_{n=1}^{N_{MC}}Q_{h}(Z^{n}),\qquad(30)$$

where $N_{MC}$ is the number of Monte Carlo points/samples $Z^{n}\sim\mathcal{N}(0,I)$ . The sampling error of this estimator is $\mathbb{V}[\widehat{Q}_{h}^{MC}]=\mathbb{V}[Q_{h}]/N_{MC}$ .

A sufficient condition for the MSE to be less than $\epsilon^{2}$ is for both the squared bias and the sampling error in (27) to be less than $\epsilon^{2}/2$ . Due to assumption (28), a sufficient condition for the squared bias to be less than $\epsilon^{2}/2$ is $h\sim\epsilon^{1/\alpha}$ . Since $\mathbb{V}[Q_{h}]$ is bounded with respect to $h\to 0$ , the sampling error of $\widehat{Q}_{h}^{MC}$ is less than $\epsilon^{2}/2$ for $N_{MC}\sim\epsilon^{-2}$ . With these choices of $h$ and $N_{MC}$ , it follows from Assumption (29) that the mean $\epsilon$ -cost of the standard Monte Carlo estimator is

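$${\mathbb{E}}\big[\mathcal{C}_{\epsilon}(\widehat{Q}_{h}^{MC})\big]=\mathcal{O}\big(\epsilon^{-2-\gamma/\alpha}\big).$$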

Our aim is to find alternative methods that have a lower $\epsilon$ -cost.
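The parameter choices behind this cost estimate can be illustrated with a small sketch. Everything here is a stand-in chosen for illustration: `toy_q` is a hypothetical replacement for one NTE solve (its exact mean is $\exp(1/8)$ and its relative bias is $h^{2}$), and the constant of proportionality in $h\sim\epsilon^{1/\alpha}$ is simply set to 1.

```python
import math
import random

def toy_q(z, h):
    # Hypothetical stand-in for one NTE solve: exp(z[0]/2) with a relative O(h^2) bias
    return math.exp(0.5 * z[0]) * (1.0 + h * h)

def mc_parameters(eps, alpha, var_q):
    """Mesh size h ~ eps^(1/alpha) and a sample count with V[Q_h]/N <= eps^2/2."""
    h = eps ** (1.0 / alpha)               # constant of proportionality set to 1 here
    n = math.ceil(2.0 * var_q / eps ** 2)  # N_MC ~ eps^(-2)
    return h, n

def mc_estimate(q, h, n, d, rng):
    """Plain Monte Carlo average of q(Z, h) over n samples Z ~ N(0, I_d)."""
    total = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total += q(z, h)
    return total / n

h, n = mc_parameters(eps=0.05, alpha=2.0, var_q=0.4)
estimate = mc_estimate(toy_q, h, n, d=1, rng=random.Random(0))
```

With a cost per sample of order $h^{-\gamma}$, the total work in this sketch scales like `n * h ** (-gamma)`, which is the product $\epsilon^{-2}\cdot\epsilon^{-\gamma/\alpha}$ appearing in the $\epsilon$-cost.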

4.2 Quasi-Monte Carlo

The first approach to reduce the $\epsilon$ -cost is based on using quasi-Monte Carlo (QMC) rules, which replace the random samples in (30) by carefully chosen deterministic samples and treat the expected value with respect to the $d$ -dimensional Gaussian $Z$ in (8) as a high-dimensional integral with Gaussian measure.

Interest in QMC points initially arose within number theory in the 1950s, and the theory is still at the heart of good QMC point construction today. Nowadays, the fast component-by-component construction (CBC) NuCo:06 provides a quick method for generating good QMC points in very high dimensions. Further information on the best choices of deterministic points and QMC theory can be found in e.g. SlWo:98; DiPi:10; Ni:10; DiFr:13.

The choice of QMC points can be split into two categories: lattice rules and digital nets. We will only consider randomised rank-1 lattice rules here. In particular, given a suitable generating vector $z\in\mathbb{Z}^{d}$ and $R$ independent, uniformly distributed random shifts $(\Delta_{r})_{r=1}^{R}$ in $[0,1]^{d}$, we construct $N_{QMC}=R\,P$ lattice points in the unit cube $[0,1]^{d}$ (that is, $P$ points for each shift) using the simple formula

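$$v^{n}=\mathrm{frac}\left(\frac{k\,z}{P}+\Delta_{r}\right),\qquad n=P(r-1)+k,\quad k=1,\ldots,P,\quad r=1,\ldots,R,$$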

where “frac” denotes the fractional part function applied componentwise and the number of random shifts $R$ is fixed and typically small e.g. $R=8,16$ . To transform the lattice points $v^{n}\in[0,1]^{d}$ into “samples” $\widetilde{Z}^{n}\in\mathbb{R}^{d}$ , $n=1,\ldots,N_{QMC}$ , of the multivariate Gaussian coefficients $Z$ in the KL expansion (8) we apply the inverse cumulative normal distribution. See Gr:11 for details.

Finally, the QMC estimator is given by

$$\widehat{Q}_{h}^{QMC}:=\frac{1}{N_{QMC}}\sum_{n=1}^{N_{QMC}}Q_{h}(\widetilde{Z}^{n}).$$

Note that this is essentially identical in its form to the standard MC estimator (30), but crucially with deterministically chosen and then randomly shifted $\widetilde{Z}^{n}$ . The random shifts ensure that the estimator is unbiased, i.e. ${\mathbb{E}}[\widehat{Q}_{h}^{QMC}]={\mathbb{E}}[Q_{h}]$ .
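A randomly shifted rank-1 lattice estimator of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: the points are generated as $\mathrm{frac}(kz/P+\Delta_{r})$ for $k=1,\ldots,P$, the generating vector is a two-dimensional Fibonacci lattice ($P=610$, $z=(1,377)$) used purely for illustration, and the integrand `q` is a toy function with known mean $\exp(1/8)$.

```python
import math
import random
from statistics import NormalDist

def qmc_estimate(q, z, P, R, rng):
    """Randomly shifted rank-1 lattice estimate of E[q(Z)], Z ~ N(0, I).

    Returns the estimate and the sample variance of the mean over the R shifts.
    """
    d = len(z)
    inv_cdf = NormalDist().inv_cdf
    tiny = 1e-15  # keep points strictly inside (0, 1) before inverting the CDF
    shift_means = []
    for _ in range(R):
        shift = [rng.random() for _ in range(d)]
        total = 0.0
        for k in range(1, P + 1):
            # Lattice point frac(k*z/P + shift), componentwise
            v = [((k * zj) % P / P + sj) % 1.0 for zj, sj in zip(z, shift)]
            # Map to Gaussian samples with the inverse cumulative normal distribution
            gauss = [inv_cdf(min(max(vj, tiny), 1.0 - tiny)) for vj in v]
            total += q(gauss)
        shift_means.append(total / P)
    estimate = sum(shift_means) / R
    var = sum((m - estimate) ** 2 for m in shift_means) / (R * (R - 1))
    return estimate, var

rng = random.Random(1)
q = lambda g: math.exp(0.5 * (g[0] + g[1]) / math.sqrt(2.0))  # E[q] = exp(1/8)
est, var = qmc_estimate(q, z=[1, 377], P=610, R=8, rng=rng)
```

The spread of the $R$ shift averages gives a practical estimate of the sampling error, which is one reason for using several independent shifts rather than a single one.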

The bias for this estimator is identical to the MC case, leading again to a choice of $h\sim\epsilon^{1/\alpha}$ to obtain an MSE of $\epsilon^{2}$. Here the MSE corresponds to the mean square error of a randomised rank-1 lattice rule with $P$ points averaged over the shift $\Delta\sim\mathcal{U}([0,1]^{d})$. In many cases, it can be shown that the quadrature error, i.e., the square root of the second term in (27), converges with $\mathcal{O}(N_{QMC}^{-1/(2\lambda)})$, with $\lambda\in(\frac{1}{2},1]$. That is, we can potentially achieve $\mathcal{O}(N_{QMC}^{-1})$ convergence for $\widehat{Q}_{h}^{QMC}$ as opposed to the $\mathcal{O}(N_{MC}^{-1/2})$ convergence for $\widehat{Q}_{h}^{MC}$. A rigorous proof of the rate of convergence requires detailed analysis of the quantity of interest (the integrand) in an appropriate weighted Sobolev space, e.g. GrKu:15. Such an analysis is still an open question for this class of problems, and we do not attempt it here. Moreover, the generating vector $z$ should, in theory, be chosen in a problem-specific way. However, standard generating vectors, such as those available at Kuo:QMClattice, seem to also work well (and better than MC samples). Furthermore, we note the recent developments in “higher-order nets” GoDi:15; DiKuGi:14a, which potentially increase the convergence of QMC methods to $\mathcal{O}(N_{QMC}^{-q})$, for $q\geq 2$.

Given the improved rate of convergence of the quadrature error and fixing the number of random shifts to $R=8$, it suffices to choose $P\sim\epsilon^{-2\lambda}$ for the sampling variance, i.e. the second term in (27), to be $\mathcal{O}(\epsilon^{2})$. Therefore it follows again from Assumption (29) that the $\epsilon$-cost of the QMC method satisfies

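$${\mathbb{E}}\big[\mathcal{C}_{\epsilon}(\widehat{Q}_{h}^{QMC})\big]=\mathcal{O}\big(\epsilon^{-2\lambda-\gamma/\alpha}\big).$$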

When $\lambda\to\frac{1}{2}$ , this is essentially a reduction in the $\epsilon$ -cost by a whole order of $\epsilon$ . In the case of non-smooth random fields, we typically have $\lambda\approx 1$ and the $\epsilon$ -cost grows with the same rate as that of the standard MC method. However, in our experiments and in experiments for diffusion problems Gr:11 , the absolute cost is always reduced.

4.3 Multilevel Methods

The main issue with the above methods is the high cost of computing the samples $\{Q_{h}(Z^{n})\}$, each requiring us to solve the NTE. The idea of the multilevel Monte Carlo (MLMC) method is to use a hierarchy of discrete models of increasing cost and accuracy, corresponding to a sequence of decreasing discretisation parameters $h_{0}>h_{1}>...>h_{L}=h$. Here, only the most accurate model on level $L$ is designed to give a bias of $\mathcal{O}(\epsilon)$ by choosing $h_{L}=h\sim\epsilon^{1/\alpha}$ as above. The bias of the other models can be significantly higher.

MLMC methods were first proposed in an abstract way for high-dimensional quadrature by Heinrich He:01 and then popularised in the context of stochastic differential equations in mathematical finance by Giles Gi:08 . MLMC methods were first applied in uncertainty quantification in BaScZo:11 ; Cl:11 . The MLMC method has quickly gained popularity and has been further developed and applied in a variety of other problems. See Gi:15 for a comprehensive review. In particular, the multilevel approach is not restricted to standard MC estimators and can also be used in conjunction with QMC estimators GiWa:09 ; KuScSl:15 ; Kuo:15 or with stochastic collocation TeJa:15 . Here, we consider multilevel variants of standard MC and QMC.

MLMC methods exploit the linearity of the expectation, writing

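$${\mathbb{E}}[Q_{h}]={\mathbb{E}}[Q_{h_{0}}]+\sum_{\ell=1}^{L}{\mathbb{E}}[Q_{h_{\ell}}-Q_{h_{\ell-1}}]=\sum_{\ell=0}^{L}{\mathbb{E}}[Y_{\ell}],\qquad\text{where } Y_{0}:=Q_{h_{0}}\text{ and } Y_{\ell}:=Q_{h_{\ell}}-Q_{h_{\ell-1}}\text{ for }\ell\geq 1.$$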

Each of the expected values on the right hand side is then estimated separately. In particular, in the case of a standard MC estimator with $N_{\ell}$ samples for the $\ell$ th term, we obtain the MLMC estimator

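$$\widehat{Q}_{h}^{MLMC}:=\sum_{\ell=0}^{L}\widehat{Y}_{\ell}^{MC},\qquad\widehat{Y}_{\ell}^{MC}:=\frac{1}{N_{\ell}}\sum_{n=1}^{N_{\ell}}Y_{\ell}(Z^{\ell,n}).$$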

Here, $\{Z^{\ell,n}\}_{n=1}^{N_{\ell}}$ denotes the set of i.i.d. samples on level $\ell$ , chosen independently from the samples on the other levels.
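The telescoping structure of the MLMC estimator can be sketched as below. The model `toy_q` is a hypothetical stand-in for a solve on mesh size $h$ (mean $\exp(1/8)(1+h^{2})$), and the sample numbers are arbitrary illustrative values rather than optimised ones. The key point the sketch shows is that both terms of each correction $Y_{\ell}$ are evaluated with the same sample, while samples on different levels are independent.

```python
import math
import random

def toy_q(z, h):
    # Hypothetical stand-in for one NTE solve on mesh size h
    return math.exp(0.5 * z[0]) * (1.0 + h * h)

def mlmc_estimate(q, h_levels, n_levels, d, rng):
    """Sum over levels of sample means of Y_l = Q_{h_l} - Q_{h_{l-1}} (Y_0 = Q_{h_0})."""
    estimate = 0.0
    for ell, (h, n) in enumerate(zip(h_levels, n_levels)):
        total = 0.0
        for _ in range(n):
            # Fresh samples on every level, but the same z for both terms of Y_l
            z = [rng.gauss(0.0, 1.0) for _ in range(d)]
            y = q(z, h)
            if ell > 0:
                y -= q(z, h_levels[ell - 1])
            total += y
        estimate += total / n
    return estimate

h_levels = [0.25 * 2.0 ** -ell for ell in range(4)]   # h_0 = 1/4, halved on each level
estimate = mlmc_estimate(toy_q, h_levels, [4000, 400, 100, 25], d=1,
                         rng=random.Random(0))
```

Because the corrections have small variance in this toy model, far fewer samples are used on the fine levels than on the coarsest one.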

The key idea in MLMC is to avoid estimating ${\mathbb{E}}[Q_{h}]$ directly. Instead, the expectation ${\mathbb{E}}[Y_{0}]={\mathbb{E}}[Q_{h_{0}}]$ of a possibly strongly biased, but cheap approximation of $Q_{h}$ is estimated. The bias of this coarse model is then estimated by a sum of correction terms ${\mathbb{E}}[Y_{\ell}]$ using increasingly accurate and expensive models. Since the $Y_{\ell}$ represent small corrections between the coarse and fine models, it is reasonable to conjecture that there exists $\beta>0$ such that

$$\mathbb{V}[Y_{\ell}]=\mathcal{O}(h_{\ell}^{\beta}),\qquad(34)$$

i.e., the variance of $Y_{\ell}$ decreases as $h_{\ell}\to 0$ . This is verified for diffusion problems in ChSc:13 . Therefore the number of samples $N_{\ell}$ to achieve a prescribed accuracy on level $\ell$ can be gradually reduced, leading to a lower overall cost of the MLMC estimator. More specifically, we have the following cost savings:

• On the coarsest level, using (29), the cost per sample is reduced from $\mathcal{O}(h^{-\gamma})$ to $\mathcal{O}(h_{0}^{-\gamma})$. Provided $\mathbb{V}[Q_{h_{0}}]\approx\mathbb{V}[Q_{h}]$ and $h_{0}$ can be chosen independently of $\epsilon$, the cost of estimating ${\mathbb{E}}[Q_{h_{0}}]$ to an accuracy of $\epsilon$ in (33) is reduced to $\mathcal{O}(\epsilon^{-2})$.

• On the finer levels, the number of samples $N_{\ell}$ to estimate ${\mathbb{E}}[Y_{\ell}]$ to an accuracy of $\epsilon$ in (33) is proportional to $\mathbb{V}[Y_{\ell}]\epsilon^{-2}$. Now, provided $\mathbb{V}[Y_{\ell}]=\mathcal{O}(h_{\ell}^{\beta})$, for some $\beta>0$, which is guaranteed if $Q_{h_{\ell}}$ converges almost surely to $Q$ pathwise, then we can reduce the number of samples as $h_{\ell}\to 0$. Depending on the actual values of $\alpha,\;\beta$ and $\gamma$, the cost to estimate ${\mathbb{E}}[Y_{L}]$ on the finest level can, in the best case, be reduced to $\mathcal{O}(\epsilon^{-\gamma/\alpha})$.

The art of MLMC is to balance the number of samples across the levels to minimise the overall cost. This is a simple constrained optimisation problem to achieve $\mathbb{V}[\widehat{Q}_{h}^{MLMC}]\leq\epsilon^{2}/2$ . As shown in Gi:08 , using the technique of Lagrange Multipliers, the optimal number of samples on level $\ell$ is given by

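$$N_{\ell}=\left\lceil 2\,\epsilon^{-2}\sqrt{\frac{\mathbb{V}[Y_{\ell}]}{\mathcal{C}_{\ell}}}\;\sum_{k=0}^{L}\sqrt{\mathbb{V}[Y_{k}]\,\mathcal{C}_{k}}\right\rceil,\qquad(35)$$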

where $\mathcal{C}_{\ell}:={\mathbb{E}}\left[\mathcal{C}(Y_{\ell})\right]$ . In practice, it is necessary to estimate $\mathbb{V}[Y_{\ell}]$ and $\mathcal{C}_{\ell}$ in (35) from the computed samples, updating $N_{\ell}$ as the simulation progresses.
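The sample allocation can be sketched as a short function. The normalisation used here, $N_{\ell}=\lceil 2\epsilon^{-2}\sqrt{\mathbb{V}[Y_{\ell}]/\mathcal{C}_{\ell}}\sum_{k}\sqrt{\mathbb{V}[Y_{k}]\mathcal{C}_{k}}\rceil$, is the standard Lagrange multiplier solution for the constraint $\mathbb{V}[\widehat{Q}_{h}^{MLMC}]\leq\epsilon^{2}/2$ and is an assumption about the exact constants; the input variances and costs are made-up illustrative numbers.

```python
import math

def optimal_samples(variances, costs, eps):
    """Samples per level minimising total cost subject to sum(V_l / N_l) <= eps^2 / 2."""
    s = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
    return [max(1, math.ceil(2.0 * eps ** -2 * math.sqrt(v / c) * s))
            for v, c in zip(variances, costs)]

variances = [1.0, 0.1, 0.01]   # illustrative estimates of V[Y_l]
costs = [1.0, 4.0, 16.0]       # illustrative estimates of the cost per sample C_l
n_opt = optimal_samples(variances, costs, eps=0.1)
achieved = sum(v / n for v, n in zip(variances, n_opt))
```

In practice the inputs would be running estimates from the computed samples, so the allocation is recomputed as the simulation progresses.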

Using these values of $N_{\ell}$ it is possible to establish the following theoretical complexity bound for MLMC Cl:11 .

Theorem 4.1

Let us assume that (28), (34) and (29) hold with $\alpha,\beta,\gamma>0$. Then, with $L\sim\log(\epsilon^{-1})$ and with the choice of $\{N_{\ell}\}_{\ell=0}^{L}$ in (35) we have

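$${\mathbb{E}}\big[\mathcal{C}_{\epsilon}(\widehat{Q}_{h}^{MLMC})\big]=\mathcal{O}\Big(\epsilon^{-2-\max\left(0,\frac{\gamma-\beta}{\alpha}\right)}\Big).$$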

When $\beta=\gamma$ , then there is an additional factor $\log(\epsilon^{-1})$ .

Using lattice points $\widetilde{Z}^{\ell,n}$ , as defined in Section 4.2, instead of the random samples $Z^{\ell,n}$ we can in the same way define a multilevel quasi-Monte Carlo (MLQMC) estimator

[TABLE]

The optimal values for $\widetilde{N}_{\ell}$ can be computed in a similar way to those in the MLMC method. However, they depend strongly on the rate of convergence of the lattice rule and in particular on the value of $\lambda$ which is difficult to estimate accurately. We will give a practically more useful approach below.

It is again possible to establish a theoretical complexity bound, cf. KuScSl:15 ; Kuo:15 .

Theorem 4.2

Let us assume that (28) and (29) hold with $\alpha,\gamma>0$ and that there exists $\lambda\in(\frac{1}{2},1]$ and $\beta>0$ such that

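$$\mathbb{V}_{\Delta}[\widehat{Y}^{QMC}_{\ell}]=\mathcal{O}\big(\widetilde{N}_{\ell}^{-1/\lambda}\,h_{\ell}^{\beta}\big).\qquad(38)$$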

Let the number of random shifts on each level be fixed to $R$ and let $L\sim\log(\epsilon^{-1})$. Then, there exists a choice of $\{N_{\ell}\}_{\ell=0}^{L}$ such that

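$${\mathbb{E}}\big[\mathcal{C}_{\epsilon}(\widehat{Q}_{h}^{MLQMC})\big]=\mathcal{O}\Big(\epsilon^{-2\lambda-\max\left(0,\frac{\gamma-\beta\lambda}{\alpha}\right)}\Big).$$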

When $\beta\lambda=\gamma$ , then there is an additional factor $\log(\epsilon^{-1})^{1+\lambda}$ .

The convergence rate can be further improved by using higher order QMC rules DiKuGi:14 , but we will not consider this here.

It can be shown, for the theoretically optimal values of $N_{\ell}$ , that there exists a constant $C$ such that

[TABLE]

independently of the level $\ell$ and of the value of $\lambda$ (cf. Kuo:15, Sect. 3.3). The same holds for MLMC. This leads to the following adaptive procedure to choose $N_{\ell}$ suggested in GiWa:09, which we use in our numerical experiments below instead of (35).

In particular, starting with an initial number of samples on all levels, we alternate the following two steps until $\mathbb{V}[\widehat{Q}_{h}^{MLMC}]\leq\epsilon^{2}/2$ :

(i) Estimate $\mathcal{C}_{\ell}$ and $\mathbb{V}_{\Delta}[\widehat{Y}^{QMC}_{\ell}]$ (resp. $\mathbb{V}[\widehat{Y}^{MC}_{\ell}]$).

(ii) Compute

[TABLE]

and double the number of samples on level $\ell^{*}$ .

This procedure ensures that, on exit, (40) is roughly satisfied and the numbers of samples across the levels $N_{\ell}$ are quasi-optimal.
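The doubling loop can be sketched as follows. The selection rule is an assumption made for this sketch: the level to refine is taken to be the one with the largest estimator variance per unit of work, $\mathbb{V}[\widehat{Y}_{\ell}]/(N_{\ell}\,\mathcal{C}_{\ell})$, in the spirit of GiWa:09. The function name `adaptive_samples` and the callback `estimator_variance` are hypothetical, and in a real computation the variances would be re-estimated from the samples after every doubling.

```python
def adaptive_samples(estimator_variance, costs, eps, n_init=2, max_steps=200):
    """Double the samples on one level at a time until the total variance <= eps^2/2.

    estimator_variance(ell, n) returns the (estimated) variance of the level-ell
    estimator when n samples are used on that level.
    """
    n = [n_init] * len(costs)
    for _ in range(max_steps):
        var = [estimator_variance(ell, n[ell]) for ell in range(len(n))]
        if sum(var) <= 0.5 * eps ** 2:
            return n
        # Assumed rule: refine the level with the largest variance per unit of work
        ell_star = max(range(len(n)), key=lambda ell: var[ell] / (n[ell] * costs[ell]))
        n[ell_star] *= 2
    raise RuntimeError("tolerance not reached within max_steps doublings")

level_var = [1.0, 0.1, 0.01]   # illustrative per-sample variances V[Y_l]
n_adapt = adaptive_samples(lambda ell, m: level_var[ell] / m, [1.0, 4.0, 16.0], eps=0.1)
```

Starting from a power of 2, every level keeps a power-of-2 sample count throughout, at the price of overshooting the optimal allocation by at most a factor of two per level.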

We use this adaptive procedure for both the MLMC and the MLQMC method. The lack of optimality typically has very little effect on the actual computational cost. Since the optimal formula (35) for MLMC also depends on estimates of $\mathcal{C}_{\ell}$ and $\mathbb{V}[Y_{\ell}]$, the adaptive procedure sometimes even leads to better performance. An additional benefit in the case of MLQMC is that the quadrature error in rank-1 lattice rules is typically lowest when the number of lattice points is a power of 2, a property that repeated doubling preserves when the initial number is a power of 2.

5 Numerical Results

We now present numerical results to confirm the gains that are possible with the novel multilevel and quasi-Monte Carlo methods applied to our 1D NTE model (1), (2), (3). We assume that the scattering cross-section $\sigma_{S}$ is a log-normal random field as described in Section 2.1 and that the absorption cross-section is constant, $\sigma_{A}\equiv\exp(0.25)$. We assume no fission, $\sigma_{F}\equiv 0$, and a constant source term $f=\exp(1)$. We consider two cases, characterised by the choice of smoothness parameter $\nu$ in the Matérn covariance function (7). For the first case, we choose $\nu=0.5$. This corresponds to the exponential covariance and in the following is called the “exponential field”. For the second case, denoted the “Matérn field”, we choose $\nu=1.5$. The correlation length and the variance are $\lambda_{C}=1$ and $\sigma_{var}^{2}=1$, respectively. The quantity of interest we consider is

[TABLE]

For the discretisation, we choose a uniform spatial mesh with mesh width $h=1/M$ and a quadrature rule (in angle) with $2N=4M$ points. The KL expansion of $\log(\sigma_{S})$ in (8) is truncated after $d$ terms. We heuristically choose $d$ to ensure that the error due to this truncation is negligible compared to the discretisation error. In particular, we choose $d=8h^{-1}$ for the Matérn field and $d=225h^{-1/2}$ for the exponential field, leading to a maximum of 2048 and 3600 KL modes, respectively, for the finest spatial resolution in each case. Even for such large numbers of KL modes, the sampling cost does not dominate because the randomness only exists in the (one) spatial dimension.

We introduce a hierarchy of levels $\ell=0,...,L$ corresponding to a sequence of discretisation parameters $h_{\ell}=2^{-\ell}h_{0}$ with $h_{0}=1/4$ , and approximate the quantity of interest in (41) by

[TABLE]

To generate our QMC points we use an (extensible) randomised rank-1 lattice rule (as presented in Section 4.2), with $R=8$ shifts. We use the generating vector lattice-32001-1024-1048576.3600, which is downloaded from Kuo:QMClattice .

5.1 A Hybrid Direct-Iterative Solver

To compute samples of the neutron flux and thus of the quantity of interest, we propose a hybrid version of the direct and the iterative solver for the Schur complement system (12) described in Section 2.3.

The cost of the iterative solver depends on the number $K$ of iterations that we take. For each $\omega$ , we aim to choose $K$ such that the $L_{2}$ -error $\|\phi(\omega)-\phi^{(K)}(\omega)\|_{2}$ is less than $\epsilon$ . To estimate $K$ we fix $h=1/1024$ and $d=3600$ and use the direct solver to compute $\phi_{h}$ for each sample $\omega$ . Let $\rho(\omega):=\|\sigma_{S}(\cdot,\omega)/\sigma(\cdot,\omega)\|_{\infty}$ . For a sufficiently large number of samples, we then evaluate

[TABLE]

and find that this quotient is less than $\log(0.5)$ in more than 99% of the cases, for $K=1,\ldots,150$ , so that we can choose $C=0.5$ in (17). We repeat the experiment also for larger values of $h$ and smaller values of $d$ to verify that this bound holds in at least 99% of the cases independently of the discretisation parameter $h$ and of the truncation dimensions $d$ .

Hence, a sufficient, a priori condition to achieve $\|\phi_{h}(\omega)-\phi_{h}^{(K)}(\omega)\|_{2}<\epsilon$ in at least 99% of the cases is

[TABLE]

where $\lceil\cdot\rceil$ denotes the ceiling function. It is important to note that $K$ is no longer a deterministic parameter for the solver (like $M$ or $N$). Instead, $K$ is a random variable that depends on the particular realisation of $\sigma_{S}$. It follows from (42), using the results in (ChSc:13, §2) and GrKu:15 as in Section 3, that ${\mathbb{E}}[K(\epsilon,\cdot)]=\mathcal{O}(\log(\epsilon^{-1}))$ and $\mathbb{V}[K(\epsilon,\cdot)]=\mathcal{O}\left(\log(\epsilon^{-1})^{2}\right)$, with more variability in the case of the exponential field.

Recall from (13) and (15) that, in the case of $N=2M$ , the costs for the direct and iterative solvers are $C_{1}M^{3}$ and $C_{2}KM^{2}$ , respectively. In our numerical experiments, we found that in fact $C_{1}\approx C_{2}$ , for this particular relationship between $M$ and $N$ . This motivates a third “hybrid” solver, presented in Algorithm 1, where the iterative solver is chosen when $K(\omega)<M$ and the direct solver when $K(\omega)\geq M$ . This allows us to use the optimal solver for each particular sample.
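The per-sample choice between the two solvers can be sketched as below. This is not a reproduction of Algorithm 1: it assumes that the iteration error after $K$ steps is bounded by $C\rho^{K}$ with $C=0.5$ and $\rho=\|\sigma_{S}/\sigma\|_{\infty}<1$, so that the iteration count is the smallest $K\geq 1$ with $C\rho^{K}\leq\epsilon$, and it assumes equal constants in the cost models $M^{3}$ and $KM^{2}$. The function names are hypothetical.

```python
import math

def iterations_needed(eps, rho, C=0.5):
    """Smallest K >= 1 with C * rho**K <= eps, under the assumed bound C * rho**K."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    return max(1, math.ceil(math.log(eps / C) / math.log(rho)))

def choose_solver(K, M):
    """Iterative solver (cost ~ K*M^2) if K < M, otherwise direct solver (cost ~ M^3)."""
    return "iterative" if K < M else "direct"

K_easy = iterations_needed(1e-3, rho=0.5)     # strongly absorbing sample
K_hard = iterations_needed(1e-3, rho=0.999)   # scattering-dominated sample
solver_easy = choose_solver(K_easy, M=64)
solver_hard = choose_solver(K_hard, M=64)
```

Since $\rho$ depends on the realisation of $\sigma_{S}$, the choice is made afresh for every sample.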

We finish this section with a study of timings in seconds (here referred to as the cost) of the three solvers. In Fig. 1, we plot the average cost (over $2^{14}$ samples) divided by $M_{\ell}^{3}$ , against the level parameter $\ell$ . We observe that, as expected, the (scaled) expected cost of the direct solver is almost constant and the iterative solver is more efficient for larger values of $M_{\ell}$ . Over the range of values of $M_{\ell}$ considered in our experiments, a best fit for the rate of growth of the cost with respect to the discretisation parameter $h_{\ell}$ in (29) is $\gamma\approx 2.2$ , for both fields. Thus our solver has a practical complexity of $\mathcal{O}(n^{1.1})$ , where $n\sim M^{2}$ is the total number of degrees of freedom in the system.

5.2 A-Priori Error Estimates

Studying the complexity theorems of Section 4, we can see that the effectiveness of the various Monte Carlo methods depends on the parameters $\alpha$ , $\beta$ , $\gamma$ and $\lambda$ in (28), (29), (34) and (38). In this section, we will (numerically) estimate these parameters in order to estimate the theoretical computational cost for each approach.

We have already seen that $\gamma\approx 2.2$ for the hybrid solver. In Fig. 2, we present estimates of the bias ${\mathbb{E}}[Q-Q_{h_{\ell}}]$ , as well as of the variances of $Q_{h_{\ell}}$ and of $Y_{\ell}$ , computed via sample means and sample variances over a sufficiently large set of samples. We only explicitly show the curves for the Matérn field. The curves for the exponential field look similar. From these plots, we can estimate $\alpha\approx 1.9$ and $\beta\approx 4.1$ , for the Matérn field, and $\alpha\approx 1.7$ and $\beta\approx 1.9$ , for the exponential field.

To estimate $\lambda$ in (38), we need to study the convergence rate of the QMC method with respect to the number of samples $N_{QMC}$ . This study is illustrated in Fig. 3. As expected, the variance of the standard MC estimator converges with $\mathcal{O}(N_{MC}^{-1})$ . On the other hand, we observe that the variance of the QMC estimator converges approximately with $\mathcal{O}(N_{QMC}^{-1.6})$ and $\mathcal{O}(N_{QMC}^{-1.4})$ (or $\lambda=0.62$ and $\lambda=0.71$ ) for the Matérn field and for the exponential field, respectively.
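Rates of this kind are typically read off as slopes in a log-log plot. The sketch below fits such a slope by least squares and converts a variance decay rate into $\lambda$. The data are synthetic power laws constructed for illustration, not the paper's measurements, and `fit_rate` is a hypothetical helper.

```python
import math

def fit_rate(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic bias data |E[Q - Q_h]| = 0.3 * h^1.9, giving alpha = 1.9
hs = [2.0 ** -k for k in range(2, 9)]
alpha_fit = fit_rate(hs, [0.3 * h ** 1.9 for h in hs])

# Synthetic QMC variance data V = 2 * N^(-1.6); variance ~ N^(-1/lambda)
ns = [2 ** k for k in range(6, 14)]
lambda_fit = -1.0 / fit_rate(ns, [2.0 * n ** -1.6 for n in ns])
```

A variance decay of $\mathcal{O}(N_{QMC}^{-1.6})$ corresponds to $\lambda=1/1.6=0.625$, which rounds to the value $0.62$ quoted above.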

We summarise all the estimated rates in Table 1.

5.3 Complexity Comparison of Monte Carlo Variants

For a fair comparison of the complexity of the various Monte Carlo estimators, we now use the a priori bias estimates in Section 5.2 to choose a suitable tolerance $\epsilon_{L}$ for each choice of $h=h_{L}$. Let $\tau_{\ell}$ be the estimated bias on level $\ell$. Then, for each $L=2,\ldots,6$, we choose $h=h_{L}$ and $\epsilon_{L}:=\sqrt{2}\,\tau_{L}$, and we plot in Fig. 4 the actual cost of each of the estimators described in Section 4 against the estimated bias on level $L$. The numbers of samples for each of the estimators are chosen such that $\mathbb{V}[\widehat{Q}_{h}]\leq\epsilon_{L}^{2}/2$. The coarsest mesh size in the multilevel methods is always $h_{0}=1/4$. We can clearly see the benefits of the QMC sampling rule and of the multilevel variance reduction, and the excellent performance of the multilevel QMC estimator confirms that the two improvements are indeed complementary. As expected, the gains are more pronounced for the smoother (Matérn) field.

We finish by comparing the actual, observed $\epsilon$ -cost of each of the methods with the $\epsilon$ -cost predicted theoretically using the estimates for $\alpha$ , $\beta$ , $\gamma$ and $\lambda$ in Section 5.2. Assuming a growth of the $\epsilon$ -cost proportional to $\epsilon^{-r}$ , for some $r>0$ , we compare in Table 2 estimated and actual rates $r$ for all the estimators. Some of the estimated rates in Section 5.2 are fairly crude, so the good agreement between estimated and actual rates is quite impressive.
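The predicted exponents $r$ can be computed directly from the estimated rates. In the sketch below the MC and QMC exponents follow from the number of samples times the cost per sample, as in Sections 4.1 and 4.2. The multilevel exponents use the standard forms $2+\max(0,(\gamma-\beta)/\alpha)$ and $2\lambda+\max(0,(\gamma-\beta\lambda)/\alpha)$, which are assumed here and ignore logarithmic factors. The inputs are the estimates from Section 5.2; the outputs are not claimed to reproduce the table entries.

```python
def predicted_rates(alpha, beta, gamma, lam):
    """Exponents r in eps-cost ~ eps^(-r) for the four estimators (log factors ignored)."""
    return {
        "MC": 2.0 + gamma / alpha,
        "QMC": 2.0 * lam + gamma / alpha,
        "MLMC": 2.0 + max(0.0, (gamma - beta) / alpha),               # assumed standard form
        "MLQMC": 2.0 * lam + max(0.0, (gamma - beta * lam) / alpha),  # assumed standard form
    }

matern = predicted_rates(alpha=1.9, beta=4.1, gamma=2.2, lam=0.62)
exponential = predicted_rates(alpha=1.7, beta=1.9, gamma=2.2, lam=0.71)
```

For the Matérn field both $\beta>\gamma$ and $\beta\lambda>\gamma$ hold, so the multilevel exponents in this sketch reduce to $2$ and $2\lambda$, respectively.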

6 Conclusions

To summarise, we have presented an overview of novel error estimates for the 1D slab geometry simplification of the Neutron Transport Equation, with spatially varying and random cross-sections. In particular, we consider the discrete ordinates method with Gauss quadrature for the discretisation in angle, and a diamond differencing scheme on a quasi-uniform grid in space. We represent the spatial uncertainties in the cross-sections by log-normal random fields with Matérn covariances, including cases of low smoothness. These error estimates are the first of this kind. They allow us to satisfy key assumptions for the variance reduction in multilevel Monte Carlo methods.

We then use a variety of recent developments in Monte Carlo methods to study the propagation of the uncertainty in the cross-sections through to a linear functional of the scalar flux. We find that the multilevel quasi-Monte Carlo method gives us significant gains over the standard Monte Carlo method. These gains can be as large as almost two orders of magnitude in the computational $\epsilon$-cost for $\epsilon=10^{-4}$.

As part of the new developments, we present a hybrid solver, which automatically switches between a direct and an iterative method, depending on the rate of convergence of the iterative solver, which varies from sample to sample. Numerically, we observe that the hybrid solver is almost an order of magnitude cheaper than the direct solver on the finest mesh. On the other hand, the direct solver is almost an order of magnitude cheaper than the iterative solver on the coarsest mesh we considered.

We conclude that modern variants of Monte Carlo based sampling methods are extremely useful for the problem of Uncertainty Quantification in Neutron Transport. This is particularly the case when the random fields are non-smooth and a large number of stochastic variables are required for accurate modelling.

Acknowledgements.

We thank EPSRC and AMEC Foster Wheeler for financial support for this project and we particularly thank Professor Paul Smith (AMECFW) for many helpful discussions. Matthew Parkinson is supported by the EPSRC Centre for Doctoral Training in Statistical Applied Mathematics at Bath (SAMBa), under project EP/L015684/1. This research made use of the Balena High Performance Computing (HPC) Service at the University of Bath.

Bibliography


[1] Allen, E.J., Victory Jr, H.D., Ganguly, K.: On the convergence of finite-differenced multigroup, discrete-ordinates methods for anisotropically scattered slab media. SIAM J. Numer. Anal. 26, 88–106 (1989).
[2] Asadzadeh, M.: A finite element method for the neutron transport equation in an infinite cylindrical domain. SIAM J. Numer. Anal. 35, 1299–1314 (1998).
[3] Asadzadeh, M., Thevenot, L.: On discontinuous Galerkin and discrete ordinates approximations for neutron transport equation and the critical eigenvalue. Nuovo Cimento C 33, 21–29 (2010).
[4] Ayres, D.A.F., Eaton, M.D.: Uncertainty quantification in nuclear criticality modelling using a high dimensional model representation. Ann. Nucl. Energy 80, 379–402 (2015).
[5] Ayres, D.A.F., Park, S., Eaton, M.D.: Propagation of input model uncertainties with different marginal distributions using a hybrid polynomial chaos expansion. Ann. Nucl. Energy 66, 1–4 (2014).
[6] Babuska, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45, 1005–1034 (2007).
[7] Barth, A., Schwab, C., Zollinger, N.: Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer. Math. 119, 123–161 (2011).
[8] Blake, J.C.H.: Domain decomposition methods for nuclear reactor modelling with diffusion acceleration. PhD Thesis, University of Bath (2016).