Propagation of Uncertainties in Density-Driven Flow
Alexander Litvinenko, Dmitry Logashenko, Raul Tempone, Gabriel Wittum,, David Keyes

TL;DR
This paper presents a parallelized method using generalized polynomial chaos to efficiently quantify how uncertainties in permeability and porosity affect density-driven flow and pollution dispersal in subsurface environments.
Contribution
It introduces a low-cost gPC surrogate model combined with parallelized solvers to accurately propagate uncertainties in density-driven flow simulations.
Findings
Effective uncertainty quantification in subsurface flow models.
Parallelized gPC method reduces computational cost.
Validated approach on Elder-like benchmark problem.
Abstract
Accurate modeling of contamination in subsurface flow and water aquifers is crucial for agriculture and environmental protection. Here, we demonstrate a parallel method to quantify the propagation of the uncertainty in the dispersal of pollution in subsurface flow. Specifically, we consider the density-driven flow and estimate how uncertainty from permeability and porosity propagates to the solution. We take an Elder-like problem as a numerical benchmark and we use random fields to model the limited knowledge on the porosity and permeability. We construct a low-cost generalized polynomial chaos expansion (gPC) surrogate model, where the gPC coefficients are computed by projection on sparse and full tensor grids. We parallelize both the numerical solver for the deterministic problem based on the multigrid method, and the quadrature over the parametric space
| Parameters | Values and Units | Description |
|---|---|---|
| 0.1 [-] | mean value of the porosity | |
| [] | molecular diffusion | |
| [] | mean value of the permeability | |
| [] | gravity | |
| [] | density of pure water | |
| [] | density of brine | |
| [] | viscosity |
| time, years | point | mean | st.d. | five quantiles | ||||
|---|---|---|---|---|---|---|---|---|
| 1.375 | 0.021 | 0.04 | 0.058 | 0.08 | 0.124 | |||
| 2.75 | 0.273 | 0.305 | 0.322 | 0.337 | 0.356 | |||
| 5.5 | 0.339 | 0.362 | 0.379 | 0.397 | 0.420 | |||
| 1.375 | 0.122 | 0.124 | 0.126 | 0.128 | 0.13 | |||
| 2.75 | 0.0025 | 0.0028 | 0.0032 | 0.0035 | 0.0038 | |||
| 5.5 | 0.099 | 0.103 | 0.106 | 0.108 | 0.111 | |||
| 1.375 | 0.147 | 0.18 | 0.211 | 0.253 | 0.352 | |||
| 2.75 | 0.001 | 0.0016 | 0.002 | 0.003 | 0.005 | |||
| 5.5 | 0.016 | 0.020 | 0.024 | 0.028 | 0.035 | |||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsProbabilistic and Robust Engineering Design · Meteorological Phenomena and Simulations · Wind and Air Flow Studies
11institutetext: RWTH Aachen, Kackertstr. 9C, Aachen, Germany, Phone: +492418099203, 11email: litvinenko, [email protected]
KAUST, Thuwal/Jeddah, Saudi Arabia, 11email: dmitry.logashenko, gabriel.wittum, [email protected]
G-CSC, Frankfurt University, Kettenhofweg 139, Frankfurt, Germany
Propagation of Uncertainties in Density-Driven Flow
Alexander Litvinenko 11
Dmitry Logashenko 22
Raul Tempone 1122
Gabriel Wittum 2233
David Keyes 22112233
Abstract
Accurate modeling of contamination in subsurface flow and water aquifers is crucial for agriculture and environmental protection. Here, we demonstrate a parallel method to quantify the propagation of the uncertainty in the dispersal of pollution in subsurface flow. Specifically, we consider the density-driven flow and estimate how uncertainty from permeability and porosity propagates to the solution. We take an Elder-like problem as a numerical benchmark and we use random fields to model the limited knowledge on the porosity and permeability. We construct a low-cost generalized polynomial chaos expansion (gPC ) surrogate model, where the gPC coefficients are computed by projection on sparse and full tensor grids. We parallelize both the numerical solver for the deterministic problem based on the multigrid method, and the quadrature over the parametric space.
1 Introduction
Accidental contamination of groundwater can be extremely hazardous and thus, accurately predicting the fate of pollutants in groundwater is essential. Certain pollutants are soluble in water and can leak into groundwater systems, such as seawater into coastal aquifers or wastewater leaks. Indeed, some pollutants can change the density of a fluid and induce density-driven flows within the aquifer. This causes faster propagation of the contamination due to convection. Thus, simulation and analysis of this density-driven flow plays an important role in predicting how pollution can migrate through an aquifer Kobus2000 ; Density-driven_Fan97 .
In contrast to transport of pollution by molecular diffusion, convection due to density-driven flow is an unstable process that can undergo quite complicated patterns of distribution. The Elder problem is a simplified but comprehensive model that describes the intrusion of salt water from a top boundary into an aquifer Voss_Souza ; Elder_Overview17 . The evolution of the concentration profile is typically referred to as fingering. Note that due to the nonlinear nature of the model, the distribution of the contamination strongly depends on the model parameters, so that the system may have several stationary solutions Johannsen2003 .
Hydrogeological formations typically have a complicated and heterogeneous structure, and geological media may consist of layers of porous media with different porosities and permeability coefficients ScheiderKroehnPueschel2012 ; ReiterLogashenkoVogelWittum2017 . Difficulty in specifying hydrogeological parameters of the media, as well as measuring the position and configuration of the layers, gives rise to certain errors. Typically, the averaged values of these quantities are used. However, due to the nonlinearities of the flow model, averaging of the parameters does not necessarily lead to the correct mathematical solution. Uncertainties arise from various factors, such as inaccurate measurements of the different parameters and the inability to measure parameters at each spatial location at any given time. To model the uncertainties, we can use random variables, random fields, and random processes. However, uncertainties in the input data can spread through the model and therefore also make the solution uncertain or incorrect. Thus, an accurate estimation of the output uncertainties is crucial, for example, for optimal control and design of the experiment.
Many techniques are used to investigate the propagation of uncertainties associated with porosity and permeability into the solution, such as classical Monte Carlo sampling. Although Monte Carlo sampling is dimension-independent, it converges slowly and therefore requires a lot of samples and these simulations may be time-consuming and costly. Alternative, less costly techniques use collocations, sparse grids and surrogate methods, each with their own advantages and disadvantages. But even advanced techniques may require hundreds to thousands of simulations and assume a certain smoothness in the quantity of interest. Our simulations contain up to 0.5-2Mi spatial mesh points and 1000-3000 time-steps, and are therefore run on a massive parallel cluster. Here, we develop methods that require much fewer simulations, from a few dozen to a few hundred simulations.
Perturbation methods CREMER15_Fingers are another class of methods that decompose the quantiles of interest (QoIs) with respect to random parameters in a Taylor series. For small perturbations, the higher order terms can be neglected, thus simplifying the analysis and numerics. These methods assume that random perturbations are small, up to 5 of the mean, depending on the problem in question. For larger perturbations, these methods are usually not valid.
Here, we use a generalized polynomial chaos expansion (gPC ) technique, where we compute the gPC coefficients by applying a Clenshaw-Curtis quadrature rule Barthelmann2000 . We use gPC to compute QoIs such as the mean, the variance and the exceedance probabilities of the mass fraction (in this case, a saltwater solution). We validate our obtained results using the quasi-Monte Carlo approach. Both methods require the computation of multiple simulations (scenarios) for variable porosity and permeability coefficients.
To the best of our knowledge, no other reported works have solved the Elder’s problem Voss_Souza ; Elder_Overview17 with uncertain porosity and permeability parameters using gPC . The highly unstable free convective flow has been studied Xie12_Fingers , where the number of fingers and the deepest plume front, the vertical center of solute mass, total solute mass, and solute flux through the source zone was investigated. The authors concluded that the development of a stochastic framework for modeling free convection is required, which has significant consequences for model simulation and testing, as well as process prediction. Overviews of the uncertainties in modeling groundwater solute transport OverviewUncert93 and modeling soil processes SoilOverview16 have been performed, as well as reconnecting stochastic methods with hydrogeological applications NowakStochMethods18 , which included recommendations for optimization and risk assessment. Fundamentals of stochastic hydrogeology, an overview of stochastic tools and accounting for uncertainty have been described in rubin2003applHydro .
Recent advances in uncertainty quantification, probabilistic risk assessment, and decision-making under uncertainty in hydrogeologic applications have been reviewed in TARTAKOVSKYI_Risk13 , where the author reviewed probabilistic risk assessment methods in hydrogeology under parametric, geologic and model uncertainties. Density-driven vertical transport of saltwater through the freshwater lens has been modeled in POST17_Density-driven .
Various methods can be used to compute the desired statistics, such as direct integration methods (MC, qMC, collocation) and surrogate-based methods (generalized polynomial chaos approximation, stochastic Galerkin) Philipp12 ; matthiesKeese03cmame ; babuska2004galerkin . Direct integration methods compute statistics by sampling unknown input coefficients (they are undefined and therefore uncertain) and solving the corresponding PDE, while the surrogate-based method computes a low-cost functional (polynomial, exponential, trigonometrical) approximation of QoI. Examples of the surrogate-based methods include radial basis functions liu2014 ; bompard2010 ; Loeven2007 ; giunta2004 , sparse polynomials chkifa-adapt-stochfem-2015 ; Sudret_sparsePCE , polynomial chaos expansion Xiu ; Habib09_PCE ; ConradMarzouk13 ; Dongbin and low-rank tensor approximation DolgLitv15 ; Kim2006 ; dolgov2014computation ; LitvSampling13 ; Litvin11PAMM . A nonintrusive stochastic Galerkin was introduced in Giraldi2014 . The surrogate-based methods have an additional advantage if the gradients are available liu2017quantification . Sparse grids methods to integrate high-dimensional integrals are considered in smoljak63 ; Griebel_Bungartz ; Griebel ; spiterp ; novakRitter97 ; Novak1996 ; gerstnerGriebel98-numint ; novakRitter99-simple ; ConradMarzouk13 ; CONSTANTINE12 . A software package is available in petrasSmolpak .
The quantification of uncertainties in stochastic PDEs can pose a great challenge due to the possibly large number of random variables involved, and the high cost of each deterministic solution.
For problems with a large number of random variables, the methods based on a regular grid (in stochastic space) are less preferable due to the high computational cost. Instead, methods based on a scattered sampling scheme (MC, qMC) give more freedom in choosing the sample number and protect against sample failures. The MC quadrature and its variance-reduced variants have dimension-independent convergence , and qMC has the worst case convergence , where is the dimension of the stochastic space matthies2007 . Collocations on sparse or full grids are affected by the dimension of the integration domain babuska_collocation ; nobile-sg-mc-2015 ; NobileTemponeWebster08 . In this work, we use the Halton rule to generate quasi-Monte Carlo quadrature points caf1998 ; joe2008 . A numerical comparison of other qMC sequences has been described in radovic1996 .
The construction of a gPC -based surrogate model Matthies_encicl is similar to computing the Fourier expansion, where only the Fourier coefficients need to be computed. Some well-known functions, such as multivariate Legendre, Hermite, Chebyshev and Laguerre functions, are taken as a basis xiuKarniadakis02a . These functions should possess some useful properties, e.g., orthogonality. Usually a small number of quadrature points are used to compute these coefficients. Once the surrogate is constructed, its sampling could be performed at a lower cost than a sampling of the original stochastic PDE.
To tackle the high numerical complexity, we implement a two-level parallelization. In the first level, we run all available scenarios in parallel, with each scenario on a separate computing node. On the second level, each scenario is computed in parallel on cores.
This work is structured as follows: In Section 2, we outline the model of the density-driven groundwater flow in porous media and the numerical methods for this type of problem. We describe the stochastic modeling, integration methods and the generalized polynomial chaos expansion technique in Section 3. We present details of the parallelization of the computations in Section 4. Our multiple numerical results, described in Section 5, demonstrate the practical performance of the parallelized solver for the Elder-type problems with uncertainty coefficients in 2D computational domains. We conclude this work with a discussion in Section 6.
Our contribution. We applied an established gPC technique to estimate uncertainties in a density-driven flow. To this end, we solved a time-dependent, nonlinear, second order differential equation with uncertain coefficients. We estimated the propagation of input uncertainties associated with the porosity and permeability into the QoI, which could be a) the mean and the variance of the mass fraction in a sub-domain at each time step, b) the exceedance probability or quantiles in selected points, or c) probability density function in a point. We propose that these statistics could be further used for more efficient Bayesian inference, data assimilation, optimal design of the experiment and optimal control. Our overall goal is to model the dispersion of water pollution and monitor the movement of subsurface water flow. In addition, we used the novel technique of parallelization in spatial and stochastic spaces. We solved each deterministic problem and all stochastic realizations in parallel.
Although we obtained promissing preliminary results using the gPC technique, many questions remain. For example, a more advanced sparse grid rule may be needed to approximate the gPC coefficients of the solution.
2 Problem setup
2.1 Density-driven groundwater flow problem
In this section, we summarize the model of the density-driven groundwater flow. For further details, we refer the reader to Bear-2 ; Bear1979 ; Bear-Bachmat ; DierschKolditz2002 ; POST17_Density-driven .
We consider a domain , , filled with two phases: a solid phase and a liquid phase. The solid phase is an immobile solid porous matrix with a mobile liquid in its pores. The matrix is characterized by its porosity and its permeability . The liquid phase is a solution of salt (sodium chloride) in water. We denote the mass fraction of the brine (a saturated saltwater solution) in the liquid phase by . The liquid phase has density and viscosity .
The motion of the liquid phase through the solid matrix is characterized by the Darcy velocity . In these terms, the conservation laws for the liquid phase and the dissolved salt can be written as
[TABLE]
where the tensor field represents the molecular diffusion and the mechanical dispersion of the salt. In addition, some momentum equation must be considered for . We assume the Darcy’s law for :
[TABLE]
where denotes the hydrostatic pressure and is the gravity vector.
For simplicity, we assume an isotropic porous medium characterized by a scalar permeability
[TABLE]
We use the linear dependence for the density:
[TABLE]
where and denote the densities of pure water and the brine, respectively. Thus, , where corresponds to the pure water and to the saturated solution. Furthermore, we neglect the dispersion and assume that
[TABLE]
where is the coefficient of the molecular diffusion. The viscosity is considered to be constant. Fields and will depend on the stochastic variables. Values of the other parameters used in this work are presented in Table 1.
The problem (1–6) must be closed by the specification for the boundary conditions for equations (1) and (2), i.e. for and , as well as the initial conditions for . In our tests, we consider variations of the Elder problem Voss_Souza ; Elder_Overview17 : a heavily concentrated solution intrudes due to gravitation through the upper boundary into an aquifer filled initially with pure water. During this process, a complicated flow arises leading to a specific distribution of the intruding solution, in a process usually described as “fingering” Johannsen2003 . The time evolution of the solution in (1–3) is determined by the initial conditions for . However, the system (1–3) is nonlinear, thus the model can have several stationary states Johannsen2003 . In addition, for certain parameter settings, the flow could be very sensitive to small variations of the initial data. Small fluctuations may lead to a principally different asymptotic distribution of the mass fraction. Spatial variations of the porosity and permeability fields make the situation more complicated. The same initial and boundary data may correspond to principally different asymptotic behaviors of the solution for particular distributions of the parameters of the porous medium. The influence of this factor can be estimated by the application of the stochastic models.
In this work we consider a 2D reservoir geometry (cf. Fig. 1). We set the Dirichlet boundary conditions for on the top and bottom boundaries according to this scheme. On the left and right sides, we impose the no-flux boundary conditions for equation (2). For equation (1), we impose the no-flux boundary conditions on the whole . However, to fix , we set it to [math] at the two upper corners of the domain (denoted by the small circles on the scheme).
2.2 Modeling of porosity, permeability and mass fraction
We model unknown input parameters - the porosity () and permeability () of the solid phase by random fields. The mass fraction is the unknown output QoI, and is a function of and . We introduce a randomness in and assume to be isotropic and dependent on :
[TABLE]
The distribution of , , is determined by a set of stochastic parameters . Each component is a random variable. The dependence (7) is specific for every material and there is no a general law. In our experiments, we use a Kozeny-Carman-like dependence Costa_2006 :
[TABLE]
where the scaling factor is chosen to satisfy the equality . Further parameters of the standard Elder problem are listed in Table 1).
Quantities of interest. The random variability in the porosity may result in very different mass fraction distributions. The number of fingers, the velocity and time of mass fraction propagation in the media and the form and intensity may vary. We compute the mean value and the variance of the mass fraction. The regions with high variance indicate regions with a high variability/uncertainty. Such regions may require additional attention from specialists (a finer mesh or additional observations). For pre-selected points in , we evaluate the probability density function and exceedance probabilities. Such probabilities may help to monitor the quality of drinking water and to estimate the risks of contamination.
2.3 Numerical solution of the flow model
For the estimation of the uncertainty, we computed realizations of solutions (1–2) for sets of the stochastic parameters . We used plugin d3f of the simulation framework ug4 , ug4_ref1_2013 ; ug4_ref2_2013 . This framework presents a flexible tool for the numerical solution of non-stationary and nonlinear systems of PDEs, and is parallelized and highly scalable. For the solution, Equations 1–2 are discretized by a vertex-centered finite-volume method on unstructured grids in the geometric space, as presented in Frolkovic-MaxPrinciple . In particular, the consistent velocity approach is used for the approximation of the Darcy velocity (3), Frolkovic-ConsVel ; Frolkovic-Knaber-ConsVel . The implicit Euler scheme is used for the time discretization. The implicit time-stepping scheme is chosen for its unconditional stability, as the velocity depends on the unknowns of the system. This is especially important for variable coefficients as it is difficult to predict the range of the variation of the velocity in the realizations.
In this discretization, we obtain a large sparse system of nonlinear algebraic equations in every time step, and we solve it using the Newton method with a line search method. The sparse linear system, which appears in the nonlinear iterations, is solved by the BiCGStab method (Templates ) with the multigrid preconditioning (V-cycle, Hackbusch85 ), which proved to be very efficient in this case. In the multigrid cycle, we use the ILUβ-smoothers Hackbusch_Iter_Sol .
3 Stochastic Modeling and Methods
In CREMER15_Fingers , the authors compare six perturbation methods under saturated and unsaturated flow conditions to generate fingers in laboratory sand tank experiments. They showed that it is possible to reproduce dense plume fingering for saturated flows. Additionally, the authors studied various perturbation mechanisms to induce instabilities in the numerical solution of variable-density systems. In this section, we review well-known basic theories and generalized polynomial chaos surrogates.
3.1 Sampling Methods
Let be a vector of independent random variables. We begin by defining a probability space , where denotes a sample space, a -algebra on , and a probability measure on . As stated in wiener38 , any RV with finite variance can be represented in terms of the polynomial chaos expansion, e.g. as a multi-variate Hermite polynomial of Gaussian RVs. Alternatively to Hermite polynomials, other orthogonal polynomials (Legendre, Chebyshev, Laguerre) Askey85 ; xiuKarniadakis02a ; Xiu , or splines, for instance, can be used. We assume is a basis of . The cardinality of the multi-index set is infinite, therefore, we truncate it and obtain a finite set and a subspace . The solution lies in the tensor product space , where is a time interval, where the initial problem is solved. After a discretization, we obtain , where is the number of basis functions in the physical space, dimension of the stochastic space, and dimension of the temporal space.
The mean value of can be computed as an integral over the multidimensional domain :
[TABLE]
where is a probability measure. This integral can be computed numerically with some sparse or dense quadrature rules with points and weights :
[TABLE]
where is the approximate solution produced by the deterministic solver for the realization . The formula for the sampled variance is:
[TABLE]
Other well-known methods to compute Equations 10 and 11 are quasi-Monte Carlo, Monte Carlo and multi-level Monte Carlo methods.
3.2 Generalized Polynomial Chaos based surrogate model
As outlined in Dongbin ; xiuKarniadakis02a ; wiener38 , the random variable (field) can be represented as a series of multivariate Legendre polynomials in uncorrelated and independent uniformly distributed random variables :
[TABLE]
where is a multivariate Legendre basis, is a multiindex and is a multiindex set.
This expansion is called the generalized polynomial chaos expansion (gPC ) (see more in the Appendix or in wiener38 ; xiuKarniadakis02a ; Xiu ; ernst_mugler_starkloff_ullmann_2012 ). For numerical purposes, we truncate this infinite gPC series and obtain a finite series approximation, i.e., we keep only random variables, , and limit the maximal order of the multi-variate polynomial by . We denote the new multiindex subset as . Then the unknown random field can be approximated as follows:
[TABLE]
Here are multi-variate Legendre polynomials defined as follows
[TABLE]
are Legendre monomials, defined in Eq. A.1. The scalar product is
[TABLE]
where , , are the normalization constants and is the -dimensional Kronecker delta function.
Decomposition (12) can be understood as a response surface for . As soon as the response surface is built, the value can be evaluated for any almost for free (only by evaluating the polynomial (12)). It can be very practical if samples of are needed, for example. Since Legendre polynomials are -orthogonal, the coefficients can be computed by projection:
[TABLE]
where are weights and quadrature points, defined, for instance, by a Gauss-Legendre integration rule. Once all gPC coefficients are computed we substitute them into Eq. 12. For estimates of truncation and aliasing errors, see ConradMarzouk13 . There are different strategies to truncate the multi-index set to : a) or b) or c) , . Here is the multi-variate polynomial degree. In the multi-index set we keep only RVs, i.e., and is the monomial degree w.r.t. the variable .
Remark 1
If only the variance and the mean values are of interest, then we recommend to use the collocation method with some sparse or full-tensor grid. If some additional detail is required, such as sensitivity analysis, parameter identification, or computing cdf or pdf, then we recommend to compute the gPC expansion.
Quadrature grids for computing multi-dimensional integrals can be constructed from products of 1D integration rules. To keep computational costs to a minimum, a sparse grid technique can be used for certain class of smooth functions Griebel_Bungartz . To compute Eq. 15, we apply the well-known Clenshaw Curtis quadrature rules CC60 .
Since , it becomes apparent that the mean value is the first gPC coefficient
[TABLE]
The variance is the sum of squared gPC coefficients xiuKarniadakis02a ; GhanemBook17 ; Knio10 :
[TABLE]
where is a Kronecker product.
3.3 Computing probability density functions
After we calculated the gPC surrogate in Eq. 13, we can compute the probability density functions (see Fig. 7), exceedance probabilities, and quantilies (see Tab. 2) in the selected points.
One should evaluate the multi-variate polynomial in a sufficiently large number of points . This evaluations are of low cost. Later, in Sect. 5.2 we compute pdf in the given point.
Approximation of the probability density function in a point is computed by sampling the multivariate polynomial on the right-hand side in Eq. 19.
[TABLE]
The random variable can be sampled, e.g., times (no additional extensive simulations are required). The obtained sample can be used to evaluate the required statistics, and the probability density function. The exceedance probability for some critical value can be estimated as follows:
[TABLE]
Having large enough sample set it is also possible estimate quantiles (see Sect. 5.3). We remind that for a random variable and for a fixed value , the -quantile is the number such that .
3.4 Truncation and approximation errors
Analysis of the truncation and approximation errors of gPC has been studied in, e.g. CONSTANTINE12 ; Sinsbeck_2015 ; ConradMarzouk13 . From Eq. 13 it follows that the truncation error is
[TABLE]
Additionally, the coefficients are also approximated by . This gives the approximation error
[TABLE]
Summarizing the effect of both errors, we obtain
[TABLE]
Remark 2
Spatial resolution plays an important role in the simulations. Thus, the distribution of the parameters of the porous medium should be represented correctly. Furthermore, smoothing the flow on a coarse grid due to numerical diffusion can lead to a loss of some phenomena and therefore reduce the accuracy of the uncertainty quantification. Thus we must cover with sufficiently fine grids and use small time steps. To achieve reasonable computation times, this also assumes parallelization of the evaluation of every realization.
4 Parallelization
The computation of the stochastic properties can, therefore, be divided into two phases: computation of the realizations of the flow model at the points of the stochastic grid, and the numerical quadrature over the stochastic domain. For the computations presented in this work, we parallelized both phases. We performed computations on a Cray XC40 parallel cluster Shaheen II provided by the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. The cluster has 6174 nodes with 2 Intel Haswell 2.3 GHz CPUs per node, and 16 processor cores per CPU. Every node has 128 GB RAM. The nodes communicate via the Cray Aries interconnect network with Dragonfly topology. The system is equipped with the Sonexion Lustre 2000 storage appliance.
In the computations, we computed all the realizations concurrently, and the results for the essential time steps were stored in the parallel file system.
Parallelization of the framework ug4 is based on the distribution of the spatial grid between the processes and is implemented using MPI ug4_ref1_2013 . As the computation of the realizations involves the nonlinear iterations and the solution of the large sparse linear systems (see Section 2.3), quite intensive communication is required. We performed the computation of every realization on the 32 cores of one node. We computed different realizations on different nodes. Shaheen II used the SLURM system to manage parallel jobs, and the job arrays were used for the concurrent computations of the realizations. This was the most time-consuming phase, requiring hours of computation time.
Following computation of the realization, we computed the quadrature on 32 cores of one node. The saved data were from the file system. As the quadrature rules were local in the geometric space, they do not require much communication. The computation time for this quadrature required minutes, i.e., much shorter than the computation of the realizations desribed above.
5 Numerical Experiments
We performed several numerical experiments with random variables. The coefficients of gPC were computed with full and sparse Gauss-Legendre/Clenshaw-Curtis tensor grids CONSTANTINE12 . The maximal polynomial order lay between 2 and 5, depending on the porosity function.
5.1 Experiment 1. Elder’s problem with 3 RVs
In this experiment, we solve the problem from Section 2 (cf. Fig. 1) with the following smooth porosity field
[TABLE]
parametrized with 3 stochastic parameters . One realization of is shown in Fig. 2.
Dependence of the solution on the spatial and temporal grid.
The considered Elder’s problem is known to be sensitive to the spatial and temporal grid resolution. Choice of a too coarse grid may produce an essentially different numerical solution. For that, before we start to compare results of gPC and qMC , we should check if the chosen spatial mesh and the time step are sufficiently fine. Referring to the porosity in Eq. 24, we take , and compute the solution at time years on two different meshes: the first with quadrilaterals, and the second with quadrilaterals, . The second grid is obtained from the first by two regular refinements. The results are presented in Fig. 3. Both pictures looks very similar, i.e. the solution , computed on coarse and fine meshes are almost equal. Although this fact is verified only for one realization, we expect the same result for the other sufficiently smooth porosities in this domain. Thus, the chosen spacial and temporal year resolutions are assumed to be sufficient for the following experiments with uncertain porosity.
Comparison of gPC and qMC
Now, we compare solutions computed by quasi Monte Carlo and gPC methods. The number of the qMC simulations is 600. The gPC polynomial order is . The 69 Clenshaw-Curtis quadrature points were used to compute gPC coefficients. Every simulation have been computed on a structured grid of quadrilaterals, there were degrees of freedom per realization. The time step was year.
Every realization was computed in parallel on a separate node. Thus, to compute 600 qMC simulations we used 600 parallel nodes. Each node consists of 32 parallel cores. The computing time of one realization is 1-2 minutes. The total computing time of all realizations is also about 1-2 minutes (due to parallelism).
In Fig. 4(a)–(b) the mean values computed by qMC and gPC , respectively, at time years (1000 time steps for every realization) are shown. Both mean values are very similar. The variances are also computed by qMC and gPC and shown in Fig. 4(c)-(d). Both variances are very similar too: , . However, there are differences in the details, which can be explained by the approximation and integration errors.
5.2 Experiment 2. Elder’s problem with 5 RVs and three layers
The further example has more realistic settings. We consider the problem from Section 2 (cf. Fig. 1) with three hydrogeological layers. The porosity field depends on 5 random variables:
[TABLE]
Here corresponds to the bottom and to the top. The porosity has jumps between the layers. One of its realizations is shown in Fig. 6(e), where takes values in the interval .
In this experiment, every realization has been computed on the regular grid of quadrilaterals with degrees of freedom. The time step is year. Every realization was computed in parallel on a separate node with 32 parallel cores. The computing time on each node is hours.
Dependence of the solution on the spatial and temporal grid
Similar to the previous experiment, before we start to compare results of gPC and qMC , we check if the chosen spatial mesh and the time step are sufficiently fine. We take , and compute the solution at time years on two different meshes: the first with quadrilaterals, and the second with quadrilaterals, . The second grid is obtained from the first by one regular refinement. The results are presented in Fig. 5. Both pictures looks very similar, i.e. the solution , computed on coarse and fine meshes are almost equal. Thus, this experiment let us assume that quadrilaterals with the time step year are sufficient.
Comparison of gPC and qMC
We compare the mean values of computed by the gPC (Fig. 6(a)) and by the qMC method (Fig. 6(b)) at years (after 1100 time steps). We used 241 Clenshaw-Curtis quadrature points to compute gPC coefficients. Both mean values are very similar, and both values and variate in . Figures 6(c)-(d) show the variances and respectively. The difference between the deterministic solution (corresponding to ) and the mean value, computed for years, is presented in Fig. 6(f). This difference varies in the interval and indicates the error which we find when we replace stochastic by . This large difference provides further indicator that it is necessary to estimate the propagation of uncertainties.
Computing probability density functions
Approximation of the probability density function of in a fixed point is computed by sampling the multivariate polynomial on the right-hand side in Eq. 19. The random variable can be sampled, e.g., times (no additional extensive simulations are required). The obtained samples can be used to evaluate the required statistics, and the probability density function.
Figure 7(a) shows two pdfs computed with gPC of orders and , respectively, in point and after time iterations (one iteration is 0.005 year). This figure shows that gPC of order produces a pdf with a smaller variance. The mean values are equal. Figure 7(b) shows six pdfs, computed with gPC of orders in point and after time steps or years. We can observe how the pdf function in that point is changing in time and that the variance is growing with time. The mean values are equal.
5.3 Computing quantiles
We compute quantiles at three different points, e.g., , , . Table 2 presents the mean and the standard deviation for these three points after , and years. We can see that the variance is increasing with increasing time steps (from bottom to top).
Below we calculate the quantiles for the cumulative probabilities . For instance, the number in the first row and in the last column (Table 2) means that , i.e. the probability that the mass fraction is smaller than is almost 1.
6 Conclusion
We quantified the propagation of uncertainties in the density-driven groundwater flow problem with unknown porosity and permeability. Namely, we estimated the mean value, variance, exceedance probabilities and probability density function of the mass fraction. The considered problem is time-dependent and non-linear. The solution process requires fine resolution in time and space. Random perturbations in the porosity may change the stochastic solution, i.e., the number of fingers, their intensity, shape, propagation time and velocity.
We applied the already known generalized polynomial chaos expansion (gPC ) method to the Elder’s problem in a 2D rectangular domain, where the porosity was modeled as a random field with 3-5 random variables. We demonstrate that there is a significant difference between the deterministic and stochastic problem setups and solutions. The results, obtained by the gPC method were compared with the results, obtained by the qMC method (Halton sequence).
We demonstrate that by using the gPC method, we can significantly reduce the computational cost of the classical (quasi) Monte Carlo method. Both methods for a moderate variance of the porosity give similar values for the mean and the variance. We have also demonstrated how to use the parallel multigrid solver ug4 as a black-box solver in the stochastic framework.
The practical novelty of this work in the implementation of the gPC method on a distributed memory system, where all solutions and gPC coefficients are distributed over multiple parallel nodes. The maximal number of parallel processing units was , where 600 is the number of parallel nodes, and 32 is the number of computing cores on each node. We see great potential in using the ug4 library as a black box solver for quantification of uncertainties. Our results are reproducible, and the code is available online on the GitHub repository111https://github.com/UG4/ughub.wiki.git.
The obtained results can be helpful to further understanding of the dispersion of seawater intrusions into coastal aquifers, radioactive waste disposal or contaminant plumes.
In this work, we did not aim to determine the optimal parameters of the gPC method. We also did not aim to compare different UQ methods. This point will be addressed in our future publications. As a next step, we plan to couple the ug4 multi-grid solver with the Multi-Level Monte Carlo method. We intend to compute the majority of samples on low-cost coarse grids and just a few more on an expensive fine grid.
Acknowledgements.
This work was supported by the King Abdullah University of Science and Technology (KAUST) and by the Alexander von Humboldt Foundation. We used the resources of the Supercomputing Laboratory at KAUST, under the development project k1051. We would like to thank the KAUST core lab for the assistance with Shaheen II supercomputer. We would also like to acknowledge developers of the ug4 simulation framework from Frankfurt University.
Appendix A Legendre polynomials
For unknown parameters which have uniform distribution, we employ multivariate Legendre polynomials as the basis for our gPC . Multivariate Legendre polynomials are defined on , where is the dimension of the stochastic space. The first few Legendre polynomials are
[TABLE]
The Legendre polynomials are orthogonal with respect to the norm on the interval . The recursive formula is
[TABLE]
The scalar product is
[TABLE]
where the density function is a constant (is equal 1/2 on ), and where denotes the Kronecker delta. The norm and the weighted norm is . The gPC coefficients are
[TABLE]
And for we have
[TABLE]
Appendix B Difficulties in computing statistics
We consider the following uncertain porosity (input parameter)
[TABLE]
Figure B.1 shows four different realizations of the solution (the mass fraction). These realisations correspond to 1st, 7th, 12th and 16th quadrature points. The number of fingers, their shapes, location and propagation are different. This fact makes it difficult to compute and to interpret the mean value, the variance and other statistics.
The porosity fields, corresponding to Fig. B.1(a)-(b) are shown in Fig. B.2(a)-(b).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1) R. Askey, . Wilson, James Arthur, and A. M. Society. Some basic hypergeometric orthogonal polynomials that generalize Jacobi polynomials . Providence, R.I., U.S.A. : American Mathematical Society, 1985. March 1985, volume 54, number 319 (first of 6 numbers).
- 2(2) I. Babuška, F. Nobile, and R. Tempone. A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. , 45(3):1005–1034 (electronic), 2007.
- 3(3) I. Babuška, R. Tempone, and G. Zouraris. Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM Journal on Numerical Analysis , 42(2):800–825, 2004.
- 4(4) V. Barthelmann, E. Novak, and K. Ritter. High dimensional polynomial interpolation on sparse grids. Advances in Computational Mathematics , 12(4):273–288, 2000.
- 5(5) J. Bear. Dynamics of fluid in Porous Media . Dover Publications, INC. New York, 1972.
- 6(6) J. Bear. Hydraulics of Groundwater . Dover Publications. Inc., Mineola, 1979.
- 7(7) J. Bear and Y. Bachmat. Introduction to Modeling of Transport Phenomena in Porous Media . Kluwer Academic Publishers, Dordrecht, Boston, London, 1990.
- 8(8) G. Blatman and B. Sudret. An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probabilistic Engineering Mechanics , 25(2):183 – 197, 2010.
