Solution of the 3D density-driven groundwater flow problem with uncertain porosity and permeability
Alexander Litvinenko, Dmitry Logashenko, Raul Tempone, Gabriel Wittum, and David Keyes

TL;DR
This paper models 3D density-driven groundwater flow with uncertain properties, using advanced parallelized simulations and stochastic methods to analyze contaminant transport and build a benchmark for future studies.
Contribution
It introduces a parallelized simulation toolbox for 3D density-driven flow with uncertainty, employing stochastic expansions and benchmark solutions.
Findings
Successful simulation of 3D flow with uncertainty using exttt{myug} toolbox.
Quantitative analysis of mean, variance, and exceedance probabilities.
Benchmark solution for Elder-like problem in 3D domain.
Abstract
As groundwater is an essential nutrition and irrigation resource, its pollution may lead to catastrophic consequences. Therefore, accurate modeling of the pollution of the soil and groundwater aquifer is highly important. As a model, we consider a density-driven groundwater flow problem with uncertain porosity and permeability. This problem may arise in geothermal reservoir simulation, natural saline-disposal basins, modeling of contaminant plumes, and subsurface flow. This strongly nonlinear time-dependent problem describes the convection of the two-phase flow. This liquid streams under the gravity force, building so-called "fingers". The accurate numerical solution requires fine spatial resolution with an unstructured mesh and, therefore, high computational resources. Consequently, we run the parallelized simulation toolbox \myug with the geometric multigrid solver on Shaheen II…
| Parameters | Values and Units | Description |
|---|---|---|
| 0.1 [-] | mean value of the porosity | |
| [] | molecular diffusion | |
| [] | mean value of the permeability | |
| [] | gravity | |
| [] | density of pure water | |
| [] | density of brine | |
| [] | viscosity |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
11institutetext: RWTH Aachen, Kackertstr. 9C, Aachen, Germany, Phone: +492418099203, 11email: litvinenko, [email protected]
KAUST, Thuwal/Jeddah, Saudi Arabia, 11email: dmitry.logashenko, gabriel.wittum, [email protected]
G-CSC, Frankfurt University, Kettenhofweg 139, Frankfurt, Germany
Solution of the 3D density-driven groundwater flow problem with uncertain porosity and permeability
Alexander Litvinenko 11
Dmitry Logashenko 22
Raul Tempone 1122
Gabriel Wittum 2233
David Keyes 22112233
Abstract
As groundwater is an essential nutrition and irrigation resource, its pollution may lead to catastrophic consequences. Therefore, accurate modeling of the pollution of the soil and groundwater aquifer is highly important. As a model, we consider a density-driven groundwater flow problem with uncertain porosity and permeability. This problem may arise in geothermal reservoir simulation, natural saline-disposal basins, modeling of contaminant plumes, and subsurface flow. This strongly nonlinear time-dependent problem describes the convection of the two-phase flow. This liquid streams under the gravity force, building so-called “fingers”. The accurate numerical solution requires fine spatial resolution with an unstructured mesh and, therefore, high computational resources. Consequently, we run the parallelized simulation toolbox ug4 with the geometric multigrid solver on Shaheen II supercomputer. The parallelization is done in physical and stochastic spaces. Additionally, we demonstrate how the ug4 toolbox can be run in a black-box fashion for testing different scenarios in the density-driven flow. As a benchmark, we solve the Elder-like problem in a 3D domain. For approximations in the stochastic space, we use the generalized polynomial chaos expansion. We compute the mean, variance, and exceedance probabilities of the mass fraction. As a reference solution, we use the solution, obtained from the quasi-Monte Carlo method.
Keywords: uncertainty quantification, ug4, multigrid, density driven flow, reservoir, groundwater, salt formations
1 Motivation
This work is a 3D extension of our previous 2D work LitvElder2D . The problem setup is the same; the difference is only in the aquifer. In this work, we consider two 3D aquifers. These new 3D geometries contain a much larger number of degrees of freedom (DoFs), and, therefore, numerical experiments require significantly more computational resources.
Accurate prediction of the contamination of the groundwater is highly essential. Certain pollutants are soluble in water and can leak into groundwater systems, such as seawater into coastal aquifers or wastewater leaks. Indeed, some pollutants can change the density of a fluid and induce density-driven flows within the aquifer. This causes faster propagation of the contamination due to convection. Thus, simulation and analysis of this density-driven flow play an important role in predicting how pollution can migrate through an aquifer Kobus2000 ; Density-driven_Fan97 .
In contrast to the transport of pollution by molecular diffusion, convection due to density-driven flow is an unstable process that can undergo quite complicated patterns of distribution. The Elder problem is a simplified but comprehensive model that describes the intrusion of salt water from a top boundary into an aquifer Elder_1967a ; Elder_1967b ; Voss_Souza ; Elder_Overview17 . The evolution of the concentration profile is typically referred to as fingering. Note that due to the nonlinear nature of the model, the distribution of the contamination strongly depends on the model parameters, so that the system may have several stationary solutions Johannsen2003 .
Hydrogeological formations typically have a complicated and heterogeneous structure, and geological media may consist of layers of porous media with different porosities and permeability coefficients ScheiderKroehnPueschel2012 ; ReiterLogashenkoVogelWittum2017 . Difficulty in specifying hydrogeological parameters of the media, as well as measuring the position and configuration of the layers, gives rise to certain errors. Typically, the averaged values of these quantities are used. However, due to the nonlinearities of the flow model, the averaging of the parameters does not necessarily lead to the correct mathematical solution. Uncertainties arise from various factors, such as inaccurate measurements of the different parameters and the inability to measure parameters at each spatial location at any given time. To model the uncertainties, we can use random variables, random fields, and random processes. However, uncertainties in the input data spread through the model and make the solution uncertain too. Thus, an accurate estimation of the output uncertainties is crucial.
Many techniques are used to investigate the propagation of uncertainties associated with porosity and permeability into the solution, such as classical Monte Carlo sampling. Although Monte Carlo sampling is dimension-independent, it converges slowly and therefore requires a lot of samples, and these simulations may be time-consuming and costly. Alternative, less costly techniques use collocations, sparse grids, and surrogate methods, each with their advantages and disadvantages. But even advanced techniques may require hundreds to thousands of simulations and assume a certain smoothness in the quantity of interest. Our simulations contain up to 4.5MI spatial mesh points and 1000-3000 time-steps and are therefore run on a massive parallel cluster. Here, we develop methods that require much fewer simulations, from a few dozen to a few hundred.
Perturbation methods, which decompose the quantity of interest (QoI) with respect to random parameters in a Taylor series, were considered and compared in CREMER15_Fingers .
Here, we use the well-established generalized polynomial chaos expansion (gPC ) technique xiuKarniadakis02a , where we compute the gPC coefficients by applying a Clenshaw-Curtis quadrature rule Barthelmann2000 . We use gPC to compute QoIs such as the mean, the variance, and the exceedance probabilities of the mass fraction (in this case, a saltwater solution). We validate our obtained results using the quasi-Monte Carlo approach. Both methods require the computation of multiple simulations (scenarios) for variable porosity and permeability coefficients.
To the best of our knowledge, no other reported works have solved the Elder’s problem Voss_Souza ; Elder_Overview17 with uncertain porosity and permeability parameters using gPC .
Fingers in a highly unstable free convective flow have been studied in Xie12_Fingers . Overviews of the uncertainties in modeling groundwater solute transport OverviewUncert93 and modeling soil processes SoilOverview16 have been performed, as well as reconnecting stochastic methods with hydrogeological applications NowakStochMethods18 , which included recommendations for optimization and risk assessment. Fundamentals of stochastic hydrogeology, an overview of stochastic tools and accounting for uncertainty have been described in rubin2003applHydro .
Recent advances in uncertainty quantification, probabilistic risk assessment, and decision-making under uncertainty in hydrogeologic applications have been reviewed in TARTAKOVSKYI_Risk13 , where the author reviewed probabilistic risk assessment methods in hydrogeology under parametric, geologic, and model uncertainties. Density-driven vertical transport of saltwater through the freshwater lens has been modeled in POST17_Density-driven .
Various methods can be used to compute the desired statistics, such as direct integration methods (Monte Carlo, quasi Monte Carlo, collocation) and surrogate-based methods (generalized polynomial chaos approximation, stochastic Galerkin) Philipp12 ; matthiesKeese03cmame ; babuska2004galerkin . Direct integration methods compute statistics by sampling unknown input coefficients (they are undefined and therefore uncertain) and solving the corresponding PDE, while the surrogate-based method computes a low-cost functional (polynomial, exponential, trigonometrical) approximation of QoI. Examples of the surrogate-based methods include radial basis functions liu2014 ; bompard2010 ; Loeven2007 ; giunta2004 , sparse polynomials chkifa-adapt-stochfem-2015 ; Sudret_sparsePCE , polynomial chaos expansion Xiu ; Habib09_PCE ; ConradMarzouk13 ; Dongbin and low-rank tensor approximation DolgLitv15 ; Kim2006 ; dolgov2014computation ; LitvSampling13 ; Litvin11PAMM . A nonintrusive stochastic Galerkin was introduced in Giraldi2014 . The surrogate-based methods have an additional advantage if the gradients are available liu2017quantification . Sparse grids methods to integrate high-dimensional integrals are considered in smoljak63 ; Griebel_Bungartz ; Griebel ; spiterp ; novakRitter97 ; Novak1996 ; gerstnerGriebel98-numint ; novakRitter99-simple ; ConradMarzouk13 ; CONSTANTINE12 . The sparse grid software package is available in petrasSmolpak .
The quantification of uncertainties in stochastic PDEs can pose a great challenge due to the possibly large number of random variables involved, and the high cost of each deterministic solution.
For problems with a large number of random variables, the methods based on a regular grid (in stochastic space) are less preferable due to the high computational cost. Instead, methods based on a scattered sampling scheme (MC, qMC) give more freedom in choosing the sample number and protect against sample failures. The MC quadrature and its variance-reduced variants have dimension-independent convergence , and qMC has the worst case convergence , where is the dimension of the stochastic space matthies2007 . Collocations on sparse or full grids are affected by the dimension of the integration domain babuska_collocation ; nobile-sg-mc-2015 ; NobileTemponeWebster08 . In this work, we use the Halton rule to generate quasi-Monte Carlo quadrature points caf1998 ; joe2008 . A numerical comparison of other qMC sequences has been described in radovic1996 .
The construction of a gPC -based surrogate model Matthies_encicl is similar to computing the Fourier expansion, where only the Fourier coefficients need to be computed. Some well-known functions, such as multivariate Legendre, Hermite, Chebyshev, and Laguerre functions, are taken as a basis xiuKarniadakis02a . These functions should possess some useful properties, e.g., orthogonality. Usually, a small number of quadrature points are used to compute these coefficients. Once the surrogate is constructed, its sampling could be performed at a lower cost than a sampling of the original stochastic PDE.
To tackle the high numerical complexity, we implement a two-level parallelization. First, we run all available quadrature or sampling points (also say scenarios) in parallel, with each scenario on a separate computing node. Second, each scenario is computed in parallel on cores.
This work is structured as follows: In Section 2, we outline the model of the density-driven groundwater flow in porous media and the numerical methods for this type of problem. We describe the stochastic modeling, integration methods, and the generalized polynomial chaos expansion technique in Section 3. We present details of the parallelization of the computations in Section 4. Our multiple numerical results, described in Section 5, demonstrate the practical performance of the parallelized solver for the Elder-type problems with uncertainty coefficients in 3D computational domains. We conclude this work with a discussion in Section 6.
1.1 Our contribution
We applied the gPC expansion to approximate the solution of the density driven flow. This is a time-dependent, nonlinear, second order differential equation. We estimated propagation of input uncertainties in the porosity and permeability into the mass fraction.
2 Problem setup
2.1 Density-driven groundwater flow
The groundwater in the porous medium of the aquifers is subjected to gravitation. For this, any spatial variation of its density induces a flow. In this work, we regard the dependence of the density on the mass fraction of the dissolved salt. Traditional approaches for modelling of this system are described in Bear-2 ; Bear1979 ; Bear-Bachmat ; DierschKolditz2002 ; POST17_Density-driven . In this section, we briefly review the model to introduce the notation.
We consider a domain , consisting of two phases: the immobile solid porous matrix and a mobile liquid in its pores. We denote the porosity of the matrix by and its permeability and the permeability by . The liquid is a solution of in water. The mass fraction of the brine (a saturated saltwater solution) in the liquid phase is . The density liquid phase is and its viscosity .
The motion of the liquid phase through the solid matrix is characterized by the Darcy velocity . Mass conservation of the liquid phase and the salt can be written as
[TABLE]
where the tensor field represents the molecular diffusion and the mechanical dispersion of the salt. For the velocity , Darcy’s law is used:
[TABLE]
Here, is the hydrostatic pressure and the gravity vector.
In this work, we consider a special case of this general model. We assume that the porous medium is isotropic and characterized by the scalar permeability
[TABLE]
For the density, we set
[TABLE]
where and are the densities of pure water and the brine, respectively. Note that , where corresponds to pure water and to the saturated solution. The viscosity is assumed to be constant. Finally, the dispersion is neglected:
[TABLE]
where is the molecular diffusion coefficient.
Fields and will depend on the stochastic variables. Values of the other parameters used in this work are presented in Table 1.
In our numerical experiments, variants of the Elder problem Voss_Souza ; Elder_Overview17 are considered. In it, concentrated solution intrudes the upper boundary into an aquifer filled with pure water. Due to the difference in the density, a complicated, unstable flow arises. This leads to a specific distribution of the mass fraction typically described as “fingering” Johannsen2003 . The time evolution of the mass fraction and the pressure in (1–3) is determined by the initial conditions for . Note that the system (1–3) is nonlinear, and the model may have several stationary states Johannsen2003 . Any changes of the porosity and permeability fields may have an essential effect on the solution: The same initial and boundary data may correspond to principally different asymptotic behaviors. The influence of this factor can be estimated by the application of the stochastic models.
We consider two spatial 3D domains, cf. Fig. 1(a): A rectangular parallelepiped (Fig. 1(a)(a)) and an elliptic cylinder (Fig. 1(a)(b)) with (the long side), (the short side) and (the depth). We impose the Dirichlet boundary condition on the red area on the top of (), and on the green area on the top and on the other boundaries. For equation (1), we impose the no-flux boundary conditions on the bottom and the vertical walls. The boundary conditions for differ for both geometries: For the parallelepiped, we impose the no-flux boundary conditions on all the boundaries. However, to fix , we set on the edges between the green and the blue faces. For the cylinder, we set on the whole top boundary.
For the spatial discretization of (1)–(3), a vertex-centered FV scheme is used. Degrees of freedom (DoFs) for and are located in the grid nodes of the primary mesh. The domain in Fig. 1(a) is covered by a grid of hexahedra, so that the total number of the DoFs is . For the domain in Fig. 1(b), we use an unstructured mesh of tetrahedra with DoFs.
Resolution of the spatial and temporal discretizations play an important role in the accuracy of the simulations. The distribution of the hydrogeological parameters of the porous medium should be represented correctly. Besides that, smoothing the flow on a coarse grid due to the numerical diffusion can lead to a loss of some phenomena and therefore reduce the accuracy of the uncertainty quantification. For this, the grid convergence of the simulations w.r.t. the spatial mesh has been tested. We ran our simulation on the twice regularly refined spatial mesh for geometry from Fig. 1(b). It consists of cells and has DoFs. The obtained solution was similar to the solution, computed on a coarser mesh mentioned above. Thus the further refinement does not change the behaviour of the solution essentially. To achieve reasonable computation times, parallelization of the evaluation of every realization is performed.
2.2 Stochastic modeling of porosity, permeability and mass fraction
Due to lack of knowledge or inaccurate measurements, the two input parameters porosity and permeability are unknown. One of the approaches to deal with this situation is to say that both input parameters are uncertain. For example, we can introduce a prior distribution for these two values. We also can, based on expert knowledge, assume that the mean and the variance values are known.
In the following we model the porosity () by a random field (\phi(\mathbf{x},\mbox{\boldmath\theta})). We also assume that the permeability () is a function of as shown in (7) and (8):
[TABLE]
Since parameters are uncertain the solution is uncertain too, and this solution is a function of and . Here we assume to be isotropic. Usually the distribution of , , is non-trivial or even very complicated. Therefore, it is comfortable from the numerical point of view, to represent this complicated random field in some easy to compute basis, e.g. with Gaussian random variables. Often, many Gaussian random variables are required. We introduce \mbox{\boldmath\theta}=(\theta_{1},...,\theta_{M},...), where are Gaussian random variables.
The dependence (7) is specific for every material and there is no a general law. In our experiments, we use a Kozeny-Carman-like dependence Costa_2006 :
[TABLE]
where the scaling factor is chosen to satisfy the equality . Further parameters of the standard Elder problem are listed in Table 1.
2.3 Solution of the stochastic flow model
We start with sampling random variables \{\mbox{\boldmath\theta}_{i}\} and computing the realisations of porosity \phi_{i}(\mathbf{x})=\phi(\mathbf{x},\mbox{\boldmath\theta}_{i}) and permeability {\mathbf{K}}_{i}(\mathbf{x})={\mathbf{K}}(\mathbf{x},\mbox{\boldmath\theta}_{i}), and then solving equations (1–2). To compute the solution we used plugin d3f of the simulation framework ug4 , ug4_ref1_2013 ; ug4_ref2_2013 . This framework presents a flexible tool for the numerical solution of non-stationary and nonlinear systems of PDEs, and is parallelized and highly scalable. Equations 1–2 are discretized by a vertex-centered finite-volume method on unstructured grids in the geometric space, as presented in Frolkovic-MaxPrinciple . In particular, the consistent velocity approach is used for the approximation of the Darcy velocity (3), Frolkovic-ConsVel ; Frolkovic-Knaber-ConsVel . The implicit Euler scheme is used for the time discretization. The implicit time-stepping scheme is chosen for its unconditional stability, as the velocity depends on the unknowns of the system. This is especially important for variable coefficients as it is difficult to predict the range of the variation of the velocity in the realizations.
In this discretization, we obtain a large sparse system of nonlinear algebraic equations in every time step, and we solve it using the Newton method with a line search method. The sparse linear system, which appears in the nonlinear iterations, is solved by the BiCGStab method Templates with the multigrid preconditioning (V-cycle, Hackbusch85 ), which proved to be very efficient in this case. In the multigrid cycle, we use the ILUβ-smoothers Hackbusch_Iter_Sol .
3 Stochastic Modeling and Methods
Let \mbox{\boldmath\theta}=(\theta_{1},\ldots,\theta_{M}) be a vector of independent random variables. Let be a probability space. Here denotes a sample space, a -algebra on , and a probability measure on . Random variable with finite variance can be represented in terms of the polynomial chaos expansion, e.g. as a multi-variate Hermite polynomial of Gaussian RVs, wiener38 . Alternatively to Hermite polynomials, other orthogonal polynomials (Legendre, Chebyshev, Laguerre) Askey85 ; xiuKarniadakis02a ; Xiu , or splines, for instance, can be used. We assume is a basis of . The cardinality of the multi-index set is infinite, therefore, we truncate it and obtain a finite set and a subspace . The solution c(t,\mathbf{x},\mbox{\boldmath\theta}) lies in the tensor product space , where is a time interval, where the initial problem is solved. After a discretization, we obtain , where is the number of basis functions in the physical space, dimension of the stochastic space, and dimension of the temporal space.
The empirical mean value and the variance of c(t,\mathbf{x},\mbox{\boldmath\theta}) can be computed as follow
[TABLE]
with some quadrature points \mbox{\boldmath\theta}_{i} and weights . Here c(t,\mathbf{x},\mbox{\boldmath\theta}_{i}) is the solution evaluated at fixed point \mbox{\boldmath\theta}=\mbox{\boldmath\theta}_{i}\in\mathbb{R}^{M}. The sampled variance is computed as follow
[TABLE]
Other well-known methods to compute the mean and the variance are quasi-Monte Carlo, Monte Carlo and multi-level Monte Carlo methods.
3.1 Generalized Polynomial Chaos Expansion
As outlined in Dongbin ; xiuKarniadakis02a ; wiener38 , the random variable (field) can be represented as a series of multivariate Legendre polynomials in uncorrelated and independent uniformly distributed random variables :
[TABLE]
where is a multivariate Legendre basis, is a multiindex and is a multiindex set.
This expansion is called the generalized polynomial chaos expansion (gPC ) (see more in the Appendix or in wiener38 ; xiuKarniadakis02a ; Xiu ; ernst_mugler_starkloff_ullmann_2012 ). For numerical purposes, we truncate this infinite gPC series and obtain a finite series approximation, i.e., we keep only random variables, , and limit the maximal order of the multi-variate polynomial by . We denote the new multiindex subset as . Then the unknown random field can be approximated as follows:
[TABLE]
Here \Psi_{\beta}(\mbox{\boldmath\theta}) are multi-variate Legendre polynomials defined as follows
[TABLE]
are Legendre monomials, defined in (29). The scalar product is
[TABLE]
where , , are the normalization constants and is the -dimensional Kronecker delta function.
Decomposition (11) can be understood as a response surface for . As soon as the response surface is built, the value \mathbf{c}(t,\mathbf{x},\mbox{\boldmath\theta}) can be evaluated for any almost for free (only by evaluating the polynomial (11)). It can be very practical if samples of \mathbf{c}(t,\mathbf{x},\mbox{\boldmath\theta}) are needed, for example. Since Legendre polynomials are -orthogonal, the coefficients can be computed by projection:
[TABLE]
where are weights and quadrature points, defined, for instance, by a Gauss-Legendre integration rule. Once all gPC coefficients are computed we substitute them into (11). For estimates of truncation and aliasing errors, see ConradMarzouk13 . There are different strategies to truncate the multi-index set to . In the multi-index set we keep only RVs, i.e., and is the monomial degree w.r.t. the variable . For the total multi-variate polynomial degree we pose one of the following condition a) or b) or c) , .
Remark 1
If only the variance and the mean values are of interest, we recommend to use the collocation method with some sparse or full-tensor grid. If some additional statistics are required, such as sensitivity analysis, parameter identification, or computing cdf or pdf, we recommend to compute the gPC expansion.
Quadrature grids for computing multi-dimensional integrals can be constructed from products of 1D integration rules. To keep computational costs to a minimum, a sparse grid technique can be used for certain class of smooth functions Griebel_Bungartz . To compute (14), we apply the well-known Clenshaw Curtis quadrature rules CC60 .
Since \Psi_{\textbf{0}}(\mbox{\boldmath\theta})=1, it becomes apparent that the mean value is the first gPC coefficient
[TABLE]
. The variance is the sum of squared gPC coefficients xiuKarniadakis02a ; GhanemBook17 ; Knio10 :
[TABLE]
where is the Kronecker product.
3.2 Computing probability density functions
After we calculated the gPC surrogate in (12), we can compute the probability density functions, exceedance probabilities, and quantilies in the selected points. For this, one should evaluate the multi-variate polynomial in a sufficiently large number of points
[TABLE]
The random variable can be sampled, e.g., times (no additional extensive simulations are required). The obtained sample can be used to evaluate the required statistics, and the probability density function. The exceedance probability for some threshold can be estimated as follows:
[TABLE]
Having a sufficiently large sample set makes it possible to estimate quantiles LitvElder2D .
3.3 Numerical errors
Applying gPC approximation, we introduce the truncation and approximation errors CONSTANTINE12 ; Sinsbeck_2015 ; ConradMarzouk13 ; ernst_mugler_starkloff_ullmann_2012 . Truncating gPC coefficients , we introduce the truncation error
[TABLE]
Additionally, we approximate the coefficients by . This introduces the approximation error
[TABLE]
The sum of both errors is
[TABLE]
Remark 2
Spatial resolution plays an important role in the simulations. Thus, the distribution of the parameters of the porous medium should be represented correctly. Furthermore, smoothing the flow on a coarse grid due to numerical diffusion can lead to a loss of some phenomena and therefore reduce the accuracy of the uncertainty quantification. Thus we must cover with sufficiently fine grids and use small time steps. To achieve reasonable computation times, this also assumes parallelization of the evaluation of every realization.
4 Parallelization
The computation of the mean value, variance, and other statistics are done in parallel. There are two levels of parallelization. First, we compute the solution in all quadrature or sampling points in parallel no communication is needed). Second, the solution in each single quadrature point is computed also in parallel (communication is needed).
We performed computations on a Cray XC40 parallel cluster Shaheen II provided by the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. Shaheen II has 6174 nodes with 2 Intel Haswell 2.3 GHz CPUs per node, and 16 processor cores per CPU. Every node has 128 GB RAM.
On the first level, we allocated, depending on the experiment, up to 600 computing nodes. It allowed us to compute (up to) 600 qMC simulations simultaneously. All these simulations are independent, and, therefore, no communication between nodes was required. The total computing time is equal to the time, required for computing one simulation. On the second level, on each node, we allocated 32 computing cores to resolve each single simulation. Thus, in total, we run our code on up to parallel cores. If a simulation is large or requires a lot of memory, we can allocate more than one node for it. ug4 allows allocating all available nodes (6174) for solving just one expensive simulation. Theoretically, ug4 allows using all available nodes and cores, i.e., cores.
Parallelization of the framework ug4 is based on the distribution of the spatial grid between the processes and is implemented using MPI ug4_ref1_2013 . As the computation of the realizations involves the nonlinear iterations and the solution of the large sparse linear systems (see Section 2.3), quite intensive communication is required. Shaheen II used the SLURM system to manage parallel jobs, and the job arrays were used for the concurrent computations of the realizations. This was the most time-consuming phase, which required hours of computing time.
5 Numerical Experiments
5.1 Solving the deterministic problem
The deterministic solver ug4 uses the multigrid solver with a hierarchy of nested spatial grids in . There are nodes on each level . Each grid is distributed over all available processors on the node (32 in the tests below).
Equations 1–2 are discretized by a vertex-centered finite-volume method on unstructured grids in space, as presented in Frolkovic-MaxPrinciple . In particular, we use the “consistent velocity” for the approximation of the Darcy velocity (3), cf. Frolkovic-ConsVel ; Frolkovic-Knaber-ConsVel . The implicit Euler scheme is used for the time discretization. The choice of the implicit time stepping scheme is motivated by its unconditional stability, as the velocity depends on the unknowns of the system.
After discretization one obtains a large sparse system of nonlinear algebraic equations in every time step. It is solved by the Newton method with a line search method. The sparse linear system, which appears in the nonlinear iterations, is solved by the BiCGStab method (cf. Templates ) with the multigrid preconditioning (V-cycle, cf. Hackbusch85 ) which proved to be very efficient in this case. In the multigrid cycle, we use the ILUβ-smoothers Hackbusch_Iter_Sol . The matrices are assembled as the Jacobians of the discretized nonlinear systems for each grid. On each step of the Newton method the inversion of the Jacobian is computed. Developers of ug4 also efficiently solve the problem of optimal distribution of the hierarchy of grids onto parallel processes.
Software. ug4 is a novel flexible software system for simulating PDE based models on high performance parallel clusters ug4_ref1_2013 ; ug4_ref2_2013 . The software toolbox ug4 has been developed since the early 1990’s. This software framework has successfully been applied to a variety of problems such as density-driven and thermohaline flow in porous media, Navier-Stokes equation, drug diffusion through human skin, level-set methods. The aim of the software framework ug4 is to solve the above mentioned problems on adaptive hybrid grid hierarchies in 2D and 3D dimensions. Special care is taken to parallel performance issues, scalability, linear algebra algorithms, and data structures. In addition, ug4 is specially designed for user-friendly handling and functionality is made available to non-programming users by the usage of scripts and graphical representations. The numerical solution of the systems (1–2) has been performed using the D3F plugin of the simulation software package ug4 , cf. ug4_ref1_2013 ; ug4_ref2_2013 .
5.2 3D computational domains
The geometry of real-life 3D computational domains can be very complicated. Discretisation of such domains (reservoirs) may require non-trivial 3D meshes. This could be very challenging in the stochastic context, which we have here. Therefore, we start with two relatively trivial reservoir geometries: 1) a parallelepiped (see Fig. 1(a)); 2) an elliptic cylinder with the semi-axes and for the base ellipse and height . Thus, the longest radius is , , the shortest radius is , , the height and the center is in the point (see Fig. 1(b)). The polluted area is a circle with the center in , and the radius .
5.3 Parallelepiped with 3 RVs
We consider an experiment with three random variables and with the following porosity coefficient
[TABLE]
where (parallelepiped).
In Fig. 2, we visualize isosurfaces after 1500 time steps ( years), computed using (a) 200 qMC simulations (Halton sequence) and (b) gPC response surface of order 4. All gPC coefficients were computed via a 3D Gauss-Legendre quadrature rule over the domain with 125 quadrature points.
Note that in the present problem setting, the flow field is very complicated. The main process tends to represent the same behaviour as in the 2D version of the Elder problem. The heavy concentrated solution from the top boundary falls down and replaces the pure water located initially at the bottom. This pure water is pressed up. As the diffusion is very small, the transport of the salt by these opposite flows creates a non-trivial pattern of the concentration field typically described as “fingering”. The particular pattern depends on the profile of the boundary conditions for and the initial flow. In our case, we see 5 “fingers” (4 at the corners and one at the center). As we see, the isosurfaces are almost identical for both the methods.
The same holds for the variance. Fig. 3 presents the isosurface after these 1500 time steps for (a) 200 qMC simulations (Halton sequence), (b) gPC response surface of order 4 and (c) comparison of the isosurfaces from (a) and (b). We see that positions, numbers, and shapes of all fingers are similar. The isosurfaces, computed via qMC , are slightly larger, especially one in the middle (see Fig. 3(c)).
Five different contour lines (isosurfaces) of the deterministic solution c(\mathbf{x},\mbox{\boldmath\theta}=0) for the porosity, defined in (23), are shown in Figures 4.
5.4 Elliptical cylinder with 3 RVs
In this example we consider a cylindrical domain (see Fig. 1(b)). The goal is to see the influence of the computing domain on the solution. As we can see, the fingers have different shapes and numbers.
The porosity coefficient has three layers and is defined with three RVs as follows:
[TABLE]
Figure 5 shows two random realizations of the porosity field.
In Fig. 6 we present a comparison of the variance of , computed via qMC (200 simulations) and gPC of order . All =35 gPC coefficients are computed with a full Clenshaw-Curtis quadrature, containing 125 quadrature points. In Figures 6(a) and 6(b) the cutting plane has the normal vector , and in Fig. 6(c) and 6(d) vector . The variances, computed via qMC and gPC lie in the intervals , and respectively. The small difference is not really visible in the pictures. Thus, we conclude that the gPC surrogate model, computed on the full Clenshaw-Curtis quadrature grid can be used for approximating the mean and the variance of .
To demonstrate more similarities between the solutions, computed via qMC and gPC we visualize isosurfaces of the variance of the mass fraction in the cylindrical 3D reservoir (Figures 7 and 8).
In Figure 7 we demonstrate two isosurfaces computed via a) qMC method, and b) gPC of order 4 surrogate. The Figure 7(c) shows the comparison of both isosurfaces. As we can see the results are very similar, the isosurface computed via qMC is slightly larger.
In Figure 8 we demonstrate two isosurfaces computed via a) qMC method, Fig. 8(a), and b) gPC of order 4 surrogate, Fig. 8(b). The Figure 8(c) shows the comparison of both isosurfaces. As we can see the results are very similar, the isosurface computed via qMC is slightly larger.
In Figure 9 we compare two isosurfaces computed via qMC method and gPC of order 4 surrogate. As we can see the results (shape and number) are very similar again. The isosurface computed via qMC is slightly larger.
5.5 Time evolution of and
This is again an experiment with as in (24)-(28). The gPC coefficients were computed with 125 Gauss-Legendre quadrature points. Figure 10 shows evolution of the mean of the mass fraction in the cylindrical 3D reservoir after a) 0, b) 0.55, c) 1.1, d) 2.2 years. The cutting plane is . One can clearly see different stages of the ”finger" building.
Figure 11 shows the variance of the mass fraction in the cylindrical 3D reservoir after years (after 100, 200, and 400 time steps, respectively).
5.6 Three layers example with cylindrical domain
The porosity coefficient inside of each level is random and is defined by the following equation
\displaystyle\phi(t,\mathbf{x},\mbox{\boldmath\theta})
with c_{z}=\bigg{\{}\begin{array}[]{cc}0.01,&-150\leq z<-100\\ 0.1,&-100\leq z<-50\\ 1.0,&-50\leq z\leq 0.\\ \end{array}
Here, \mbox{\boldmath\theta}=(\theta_{1},\theta_{2},\theta_{3}) is a quadrature point on a full Gaussian-Legendre grid (level 3, dimension 3).
Fig. One realization of the porosity field, .
The maximum and the minimum over all porosity samples and over all are and , respectively. One random realization of the porosity is shown in Figure on the right. The random porosity has three horizontal layers along -axes: the first , second , and third .
Figure 12 shows three different isosurfaces of the mean value of the mass fraction: (a) , (b) , and (c) . The large blue area visualizes the reservoir contour and the smaller dark isosurface represents a isosurface of the mean value.
Three isosurfaces of the variance of the mass fraction after 500 time steps ( years) are shown in Fig. 13. These isosurfaces are , (b) , and (c) , respectively. The gPC surrogate model of order with random variables was used to compute the mean and the variance of the mass fraction.
5.7 Best practices
After numerous numerical tests, we created the best practice guide as below:
The number of uncertain variables to describe the porosity coefficient depends on the model and assumptions. We recommend to start with 1-3 variables to understand the phenomena better. We note that a heterogeneous porosity field in 3D may require hundreds of random variables. That means thousands of gPC coefficients. This is computationally un-affordable unless one uses some advance low-rank tensor techniques dolgov2014computation or multi-scale/homogenization strategy. In this work, we considered three random variables. The more random variables are present, the harder it could be to interpret the results, and additional sensitivity analysis may be required. 2. 2.
It is assumed a smooth dependence of the output QoI on the input uncertain parameters. We recommend to check it at least visually before computing gPC . 3. 3.
The gPC -based surrogate, for the cases when the variance of the porosity is not so high, showed reliable results. For the validation, one can use a qMC method. 4. 4.
We were not able to get a correct variance of with the gPC method in tests where the variance of the porosity was large. 5. 5.
We recommend the maximal polynomial order in gPC to be or . With our computational budget, we failed to get the correct variance of for . Probably the quadrature rule was not good enough to catch oscillatory behavior of gPC polynomials. This is a general known drawback of using global polynomials. 6. 6.
The coefficients of the gPC surrogate require calculating high-dimensional integrals. For stochastic dimensions 4 and higher, we recommend to use Gauss-Legendre (or Clenshaw-Curtis) sparse grid. For dimensions 1-3, we recommend to use a full grid. 7. 7.
The polynomial order in gPC is 2-4. Otherwise, the number of simulations will be too large. 8. 8.
The total numerical cost could be reduced by choosing the spatial and time step sizes adaptive, depending on the random perturbation. But in our implementation, it is hard to it do on the fly. Therefore, we recommend “conservative” settings, which are the same for all random simulations. A large time or large spatial step sizes may result in lack of convergence.
6 Conclusion
We solved the density driven groundwater flow problem with uncertain porosity and permeability. An accurate solution of this time-dependent and non-linear problem is impossible due to the presence of natural uncertainties in the porosity and permeability in the reservoir. Therefore, we estimated the mean value and the variance of the solution, as well as the propagation of uncertainties from the random input parameters to the solution. For this, we, first, computed a multi-variate polynomial approximations (gPC ) to the solution and then used it to estimate the required statistics. Utilizing the gPC method allows us to reduce the computational cost compare of the classical Monte Carlo method. gPC assumes that the output function is square-integrable and smooth w.r.t uncertain input variables .
We considered two different aquifers - a solid parallelepiped and a solid elliptic cylinder. One of our goals was to see how the domain geometry influences the shape of fingers.
Since the considered problem is nonlinear, a high variance in the porosity may result in totally different solutions, for instance, the number of fingers, their intensity, the shape, the propagation time, and the velocity may strongly variate.
From the implementation point of view, we have demonstrated how to use the parallel multigrid solver ug4 as a black-box solver. An additional novelty of this work is that all algorithms for uncertainty quantification are implemented on a distributed memory system, where all solutions and gPC coefficients are distributed over multiple nodes. We saw great potential in using the ug4 library as a black box solver for quantification of uncertainties. The results are reproducible, and the code is available online on GitHub repository111https://github.com/UG4, https://github.com/UG4/ughub.
The number of cells in the presented experiments varied from till for the cylindrical domain and from till for parallelepiped. The maximal number of parallel processing units was , where is the number of parallel nodes, and is the number of computing cores on each node. The computing time varied from 2 hours for the coarse mesh to 23 hours for the finest mesh (for one simulation).
Although this work let many questions open, it is an important step towards the investigation of the density driven flow phenomena. Potential further applications are an estimation of the contamination risks, monitoring the quality of drinking water, modeling of the seawater intrusion into coastal aquifers, radioactive waste disposal, and contaminant plumes.
As a next step, we will integrate the ug4 multi-grid solver in the Multi Level Monte Carlo framework with the idea to compute mostly samples on coarse grids and just a few on a fine grid.
Another idea is to infer the (unknown) porosity coefficient. We assume that there are some noisy measurements of at some locations are available. Then the prior probability density function of could be improved via Bayesian update formula hgmEzBvrAlOp2016 ; hgmEzBvrAl:2016 ; Rosic2013 ; bvrAlOpHgm11 ; litvinenko2013inverse . Herewith, the solution at a point (or at points or in a sub-domain) could be computed with a much smaller numerical cost and with the smaller memory requirement as the whole solution in the whole domain Litvinenko17_PatrInv ; LitDiss .
**Acknowledgements:
**We would like to thank KAUST HPC support team for the assistance with Shaheen II. This work was supported by the Extreme Computing Research Center, by the SRI-UQ Strategic Initiative and by Computational Bayesian group at King Abdullah University of Science and Technology.
7 Legendre polynomials
Since the uncertain input parameters have Uniform distribution, we employ multivariate Legendre polynomials as the gPC basis. Multivariate Legendre polynomials are defined on , where is the dimensionality of the stochastic space. The first few Legendre polynomials are
[TABLE]
The Legendre polynomials are orthogonal with respect to the norm on the interval . The recursive formula is
[TABLE]
The scalar product is
[TABLE]
or
[TABLE]
where the density function is a constant (is equal 1/2 on ), and where denotes the Kronecker delta. The norm and the weighted norm is . The gPC coefficients can be computed as follows
[TABLE]
and for we have
[TABLE]
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1) R. Askey and J. Wilson , Some basic hypergeometric orthogonal polynomials that generalize Jacobi polynomials , Providence, R.I., U.S.A. : American Mathematical Society, 1985.
- 2(2) I. Babuška, F. Nobile, and R. Tempone , A stochastic collocation method for elliptic partial differential equations with random input data , SIAM J. Numer. Anal., 45 (2007), pp. 1005–1034 (electronic).
- 3(3) I. Babuška, R. Tempone, and G. Zouraris , Galerkin finite element approximations of stochastic elliptic partial differential equations , SIAM Journal on Numerical Analysis, 42 (2004), pp. 800–825.
- 4(4) V. Barthelmann, E. Novak, and K. Ritter , High dimensional polynomial interpolation on sparse grids , Advances in Computational Mathematics, 12 (2000), pp. 273–288.
- 5(5) J. Bear , Dynamics of fluid in Porous Media , Dover Publications, INC. New York, 1972.
- 6(6) , Hydraulics of Groundwater , Dover Publications. Inc., Mineola, 1979.
- 7(7) J. Bear and Y. Bachmat , Introduction to Modeling of Transport Phenomena in Porous Media , Kluwer Academic Publishers, Dordrecht, Boston, London, 1990.
- 8(8) G. Blatman and B. Sudret , An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis , Probabilistic Engineering Mechanics, 25 (2010), pp. 183 – 197.
