Towards a Robust Parameterization for Conditioning Facies Models Using Deep Variational Autoencoders and Ensemble Smoother

Smith W. A. Canchumuni; Alexandre A. Emerick; Marco Aurélio C. Pacheco

arXiv:1812.06900·stat.ML·January 29, 2020

TL;DR

This paper introduces a deep variational autoencoder-based parameterization method for facies models, improving data assimilation in reservoir history matching, especially for complex channelized facies, by addressing limitations of traditional ensemble methods.

Contribution

It presents a novel deep neural network approach using convolutional variational autoencoders combined with ensemble smoothing for robust facies parameterization.

Findings

1. Outperforms previous methods in preserving channelized facies.
2. Generates well-defined, realistic facies models after data assimilation.
3. Shows promising results in synthetic history-matching problems.

Tables

Table 1: CVAE architecture. Test case 1.

Layer	Configuration	Comment
Encoder
Input	Shape = (45, 45, 2)	Two facies
2D convolution 1	Kernels = 32, size = (2, 2), stride = (2, 2), activation = ReLU	–
2D convolution 2	Kernels = 32, size = (3, 3), stride = (2, 2), activation = ReLU	–
2D convolution 3	Kernels = 16, size = (3, 3), stride = (1, 1), activation = ReLU	–
Flatten	–	Setup for the fully-connected layer
Fully-connected 1	Neurons = 1024, activation = ReLU	–
Dropout	10%	Strategy to avoid overfitting
Fully-connected 2	Neurons = 100, activation = linear	Mean of the VAE
Fully-connected 3	Neurons = 100, activation = linear	Log-variance of the VAE
Code
Lambda	–	Sampling $\mathbf{z}$
Decoder
Fully-connected 4	Neurons = 1024, activation = ReLU	–
Dropout	10%	Strategy to avoid overfitting
Fully-connected 5	Neurons = 2304, activation = ReLU	–
Reshape	Output size = (12, 12, 16)	Setup for the transpose convolution
2D transposed convolution 1	Kernels = 16, size = (3, 3), stride = (1, 1), activation = ReLU	–
2D transposed convolution 2	Kernels = 32, size = (3, 3), stride = (2, 2), activation = ReLU	–
2D transposed convolution 3	Kernels = 32, size = (2, 2), stride = (1, 2), activation = ReLU	–
Bilinear up-sampling	Output size = (45, 45, 32)	Resize output dimension
2D convolution 4	Kernels = 2, size = (3, 3), stride = (1, 1), activation = sigmoid	Output image
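
As a rough consistency check on Table 1, the sketch below traces the spatial size of the encoder feature maps. It assumes "same"-style padding, where a strided convolution maps a length n to ceil(n / stride); the padding mode is not stated in the table, so this is an assumption, and `conv_output_size` and `encoder_shapes` are hypothetical helpers written only for this illustration.

```python
import math


def conv_output_size(n, stride):
    """Output length of a strided convolution with 'same' padding (assumed)."""
    return math.ceil(n / stride)


def encoder_shapes(size=45, strides=(2, 2, 1), kernels=(32, 32, 16)):
    """Return (height, width, channels) after each encoder convolution of Table 1."""
    shapes = []
    for stride, n_kernels in zip(strides, kernels):
        size = conv_output_size(size, stride)
        shapes.append((size, size, n_kernels))
    return shapes


shapes = encoder_shapes()
height, width, channels = shapes[-1]
flattened = height * width * channels

print(shapes)     # [(23, 23, 32), (12, 12, 32), (12, 12, 16)]
print(flattened)  # 2304
```

Under this assumption the last encoder feature map has shape (12, 12, 16), which is the same shape as the output of the decoder's Reshape layer.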

Table 2: CVAE architecture. Test case 3.

Layer	Configuration	Comment
Encoder
Input	Shape = (100, 100, 10, 3)	Three facies
3D convolution 1	Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU	–
Max-pooling 1	Pool = (2, 2, 1)	Dimension reduction
3D convolution 2	Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU	–
3D convolution 3	Kernels = 32, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU	–
3D convolution 4	Kernels = 16, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU	–
Max-pooling 2	Pool = (2, 2, 1)	Dimension reduction
Flatten	–	Setup for the fully-connected layer
Fully-connected 1	Neurons = 2000, activation = linear	–
Batch normalization	–	Regularization
Activation	ReLU	–
Fully-connected 2	Neurons = 100, activation = linear	Mean of the VAE
Fully-connected 3	Neurons = 100, activation = linear	Log-variance of the VAE
Code
Lambda	–	Sampling $\mathbf{z}$
Decoder
Fully-connected 4	Neurons = 2000, activation = linear	–
Batch normalization	–	Regularization
Activation	ReLU	–
Fully-connected 5	Neurons = 5000, activation = ReLU	–
Reshape	Output size = (25, 25, 5, 16)	Setup for the transposed convolution
3D transposed convolution 1	Kernels = 16, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU	–
3D transposed convolution 2	Kernels = 32, size = (2, 2, 2), stride = (1, 1, 1), activation = ReLU	–
3D transposed convolution 3	Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU	–
3D transposed convolution 4	Kernels = 32, size = (3, 3, 3), stride = (2, 2, 2), activation = ReLU	–
3D transposed convolution 5	Kernels = 3, size = (3, 3, 3), stride = (1, 1, 2), activation = sigmoid	Output

Equations

$$\mathcal{L}(\mathbf{x})=\mathcal{L}_{\textrm{RE}}(\mathbf{x})+\lambda\,\mathcal{D}_{\textrm{KL}}\left(p(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})\right),\qquad(1)$$

$$\mathcal{L}_{\textrm{RE}}(\mathbf{x})=-\frac{1}{N_{x}}\sum_{i=1}^{N_{x}}\left[x_{i}\ln(\widehat{x}_{i})+(1-x_{i})\ln(1-\widehat{x}_{i})\right],\qquad(2)$$

$$\mathcal{D}_{\textrm{KL}}\left(p(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})\right)=\frac{1}{2}\sum_{i=1}^{N_{z}}\left(\mu_{i}^{2}+\sigma_{i}^{2}-\ln(\sigma_{i}^{2})-1\right),\qquad(3)$$

$$\mathbf{z}^{k+1}_{j}=\mathbf{z}^{k}_{j}+\mathbf{C}^{k}_{\mathbf{z}\mathbf{d}}\left(\mathbf{C}^{k}_{\mathbf{d}\mathbf{d}}+\alpha_{k}\mathbf{C}_{\mathbf{e}}\right)^{-1}\left(\mathbf{d}_{\textrm{obs}}+\mathbf{e}^{k}_{j}-\mathbf{d}^{k}_{j}\right),\quad\text{for }j=1,\ldots,N_{e},\qquad(4)$$

Code & Models

Repositories

smith31t/GeoFacies_DL (TensorFlow, official)

Full text

Towards a Robust Parameterization for Conditioning Facies Models Using Deep Variational Autoencoders and Ensemble Smoother

Smith W. A. Canchumuni⋆, Alexandre A. Emerick† and Marco Aurélio C. Pacheco⋆

⋆PUC-RIO

†Petrobras

Abstract

History matching is the term used for the data assimilation problem in oil and gas reservoirs. The literature on history matching is vast and, despite the impressive number of methods proposed and the significant progress reported in the last decade, conditioning reservoir models to dynamic data is still a challenging task. Ensemble-based methods are among the most successful and efficient techniques currently available for history matching. These methods are usually able to achieve reasonable data matches, especially if an iterative formulation is employed. However, they sometimes fail to preserve the geological realism of the model, which is particularly evident in reservoirs with complex facies distributions. This occurs mainly because of the Gaussian assumptions inherent in these methods. This fact has encouraged intense research activity to develop parameterizations for facies history matching. Despite the large number of publications, the development of robust parameterizations for facies remains an open problem.

Deep learning techniques have been delivering impressive results in a number of different areas, and the first applications in data assimilation in geoscience have started to appear in the literature. The present paper reports the current results of our investigations on the use of deep neural networks towards the construction of a continuous parameterization of facies which can be used for data assimilation with ensemble methods. Specifically, we use a convolutional variational autoencoder and the ensemble smoother with multiple data assimilation. We tested the parameterization in three synthetic history-matching problems with channelized facies. We focus on this type of facies because they are among the most challenging to preserve after the assimilation of data. The parameterization showed promising results, outperforming previous methods and generating well-defined channelized facies. However, more research is still required before deploying these methods for operational use.

1 Introduction

Ensemble-based methods have been applied with remarkable success for data assimilation in geosciences. However, these methods employ Gaussian assumptions in their formulation, which make them better suited for covariance-based (two-point statistics) models (Guardiano and Srivastava, 1993). This fact has led several researchers to propose a variety of parameterizations to adapt these methods for models with non-Gaussian priors, such as models generated with object-based (Deutsch and Journel, 1998) and multiple-point geostatistics (Mariethoz and Caers, 2014). Among these parameterizations, we can cite, for example, truncated plurigaussian simulation (Liu and Oliver, 2005; Agbalaka and Oliver, 2008; Sebacher et al., 2013; Zhao et al., 2008); level-set functions (Moreno et al., 2008; Chang et al., 2010; Moreno and Aanonsen, 2011; Lorentzen et al., 2012; Ping and Zhang, 2014); discrete cosine transform (Jafarpour and McLaughlin, 2008; Zhao et al., 2016; Jung et al., 2017); wavelet transforms (Jafarpour, 2010); K-singular value decomposition (Sana et al., 2016; Kim et al., 2018); kernel principal component analysis (KPCA) (Sarma et al., 2008; Sarma and Chen, 2009); PCA with thresholds defined to honor the prior cumulative density function (Chen et al., 2014, 2015; Gao et al., 2015; Honorio et al., 2015) and optimization-based PCA (OPCA) (Vo and Durlofsky, 2014; Emerick, 2017). There are also works based on updating probability maps followed by re-sampling steps with geostatistical algorithms (Tavakoli et al., 2014; Chang et al., 2015; Jafarpour and Khodabakhshi, 2011; Le et al., 2015; Sebacher et al., 2015). However, despite the significant number of works, the development of robust parameterizations for facies data assimilation remains an open problem. One clear indication that facies parameterization is an unsolved issue is the fact that the large majority of the publications consider only small 2D problems.

Deep learning has become the most popular research topic in machine learning, with revolutionary results in areas such as computer vision, natural language processing, voice recognition and image captioning, just to mention a few. The success of deep learning in different areas has inspired applications in inverse modeling for geosciences. Although the first investigations in this direction are very recent, the number of publications has grown very fast in the last two years. For example, Dubrule and Blunt (2017) used a generative adversarial network (GAN) (Goodfellow et al., 2014) to generate three-dimensional images of porous media. Laloy et al. (2017) used a variational autoencoder (VAE) (Kingma and Welling, 2013) to construct a low-dimensional parameterization of binary facies models for data assimilation with Markov chain Monte Carlo. Later, in (Laloy et al., 2018), the same authors extended the original work using spatial GANs. Canchumuni et al. (2017) used an autoencoder to parameterize binary facies values in terms of continuous variables for history matching with an ensemble smoother. Later, Canchumuni et al. (2018) extended the same parameterization using deep belief networks (DBN) (Hinton et al., 2006; Hinton and Salakhutdinov, 2006). Chan and Elsheikh (2017) used a Wasserstein GAN (Arjovsky et al., 2017) for generating binary channelized facies realizations. In (Chan and Elsheikh, 2018), the same authors coupled an inference network to a previously trained GAN to generate facies realizations conditioned to facies observations (hard data). Dupont et al. (2018) also addressed the problem of conditioning facies to hard data, using semantic inpainting with a GAN (Yeh et al., 2016). Liu et al. (2018) used the fast neural style transfer algorithm (Johnson et al., 2016) as a generalization of OPCA to generate conditional facies realizations using randomized maximum likelihood (Oliver et al., 1996).

The present work is a continuation of the investigation reported in (Canchumuni et al., 2017, 2018) in the sense that it is also based on using an autoencoder-type network to construct a continuous parameterization for facies. However, the present work addresses two limitations of our previous results: they were restricted to small problems because of difficulties in training the neural networks, and the resulting facies realizations did not preserve the desired geological realism. Here, we investigate the use of a convolutional VAE (CVAE) to construct the parameterization. Note that Laloy et al. (2017) also used a CVAE to parameterize facies. Unlike Laloy et al. (2017), we consider the use of this parameterization in conjunction with an ensemble smoother for assimilation of hard data and dynamic (production) data. A similar approach was recently applied by Liu and Grana (2018) to parameterize seismic data for history matching with an ensemble smoother.

The rest of the paper is organized as follows. In the next section, we briefly review generative models and describe autoencoders, VAE and convolutional layers. After that, we describe the proposed parameterization for data assimilation applied to petroleum reservoirs using the ensemble smoother with multiple data assimilation (ES-MDA) (Emerick and Reynolds, 2013). Then, we present three test problems with increasing level of complexity, followed by comments on potential issues in the parameterization. The last section of the paper summarizes the conclusions. All data and codes used in this paper are available for download at https://github.com/smith31t/GeoFacies_DL.

2 Generative Models

Generative models are machine learning methods designed to generate samples from complex probability distributions (often with no known closed form) in high-dimensional spaces. These methods use unsupervised and semi-supervised techniques to learn the structure of the input data so it can be used to generate new instances.

Let $\mathbf{x}\in\mathds{X}$ denote a vector in the space $\mathds{X}$ containing the facies values of a reservoir model and assume that each realization of $\mathbf{x}$ is distributed according to some probability density function (PDF) $p(\mathbf{x})$. Our goal is to construct a generative model that can create new random realizations of facies that are (hopefully) indistinguishable from samples of $p(\mathbf{x})$. For concreteness, consider a deterministic function, $\bm{f}(\mathbf{z};\mathbf{w}):\mathds{F}\rightarrow\mathds{X}$, which receives as argument a random vector $\mathbf{z}\in\mathds{F}$ with a known and easy-to-sample PDF $p(\mathbf{z})$. Here, we refer to $\mathbf{z}$ as the latent vector, which belongs to a feature space $\mathds{F}$. Moreover, let $\bm{f}(\mathbf{z};\mathbf{w})$ be parameterized by a deterministic vector $\mathbf{w}\in\mathds{W}$. Even though $\bm{f}$ is a function with deterministic parameters, $\bm{f}(\mathbf{z};\mathbf{w})$ is a random vector in $\mathds{X}$ because $\mathbf{z}$ is random. We want to replace $\bm{f}(\mathbf{z};\mathbf{w})$ by a deep neural network which can be trained (i.e., its parameters $\mathbf{w}$ determined) such that we can sample $\mathbf{z}\sim p(\mathbf{z})$ and generate samples $\widehat{\mathbf{x}}\sim p(\mathbf{x}|\mathbf{z};\mathbf{w})$ which are likely to resemble samples from $p(\mathbf{x})$. There are several generative models described in the machine learning literature, such as restricted Boltzmann machines, DBNs, GANs and VAEs, among others. Here, we focus our attention on a specific model based on convolutional neural networks (LeCun, 1989) and VAE (Kingma and Welling, 2013). An overview of generative models in the context of deep learning methods is presented in (Goodfellow et al., 2016, Chap. 20). Before we introduce the proposed method, we briefly review the concepts of autoencoders, VAE and convolutional layers.

2.1 Autoencoders

An autoencoder is an unsupervised neural network trained to learn complex data representations. The typical applications of autoencoders include data compression and noise removal. However, especially in the last decade, autoencoders have become widely used as building blocks of deep generative models (Goodfellow et al., 2016). Figure 1 illustrates a standard deep autoencoder network composed of six fully-connected layers. The first three layers (encoder) are responsible for mapping the input space to a feature space, $\bm{f}_{e}(\mathbf{x};\mathbf{w}_{e}):\mathds{X}\rightarrow\mathds{F}$. The last three layers (decoder) correspond to the inverse mapping $\bm{f}_{d}(\mathbf{z};\mathbf{w}_{d}):\mathds{F}\rightarrow\mathds{X}$. The central layer is called the code. The training process consists of minimizing a loss function that measures the dissimilarity between $\mathbf{x}$ and $\widehat{\mathbf{x}}=\bm{f}_{d}(\bm{f}_{e}(\mathbf{x};\mathbf{w}_{e});\mathbf{w}_{d})$, for example, the mean square error. After training, the autoencoder is able to represent (encode) the most important features of $\mathbf{x}$ in $\mathbf{z}$. When the decoder function is linear and the loss function is the mean square error, the autoencoder learns to span the same subspace as PCA (Goodfellow et al., 2016). Hence, autoencoders with nonlinear encoder and decoder functions may be interpreted as nonlinear generalizations of PCA (Deng et al., 2017).

2.2 Variational Autoencoders

A VAE is similar to a standard autoencoder in the sense that it is composed of an encoder and a decoder network. However, unlike standard autoencoders, a VAE has an extra layer responsible for sampling the latent vector $\mathbf{z}$ and an extra term in the loss function that forces the encoder to generate latent vectors that approximately follow a specified distribution, $p(\mathbf{z})$, usually assumed to be a standard Gaussian, $\mathcal{N}(\mathbf{0},\mathbf{I})$. This extra term corresponds to the Kullback–Leibler divergence, which measures how close the distribution of the encoded latent vectors $p(\mathbf{z}|\mathbf{x})$ is to the desired distribution $p(\mathbf{z})$, i.e.,

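$$\mathcal{L}(\mathbf{x})=\mathcal{L}_{\textrm{RE}}(\mathbf{x})+\lambda\,\mathcal{D}_{\textrm{KL}}\left(p(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})\right),\qquad(1)$$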

where $\mathcal{L}(\mathbf{x})$ is the total loss function. $\mathcal{L}_{\textrm{RE}}(\mathbf{x})$ is the reconstruction error. Here, we use the binary cross-entropy function given by

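$$\mathcal{L}_{\textrm{RE}}(\mathbf{x})=-\frac{1}{N_{x}}\sum_{i=1}^{N_{x}}\left[x_{i}\ln(\widehat{x}_{i})+(1-x_{i})\ln(1-\widehat{x}_{i})\right],\qquad(2)$$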

where $x_{i}$ assumes values 0 or 1 and $\widehat{x}_{i}$ assumes continuous values in $(0,1)$ . The term $\lambda$ in Eq. 1 is a weight factor (for the test cases presented in this paper we use $\lambda=1$ ). $\mathcal{D}_{\textrm{KL}}\left(p(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})\right)$ is the Kullback–Leibler divergence from $p(\mathbf{z}|\mathbf{x})$ to $p(\mathbf{z})$ . This term can be interpreted as a regularization imposed in the feature space. However, the term $\mathcal{D}_{\textrm{KL}}\left(\cdot\right)$ in Eq. 1 has a more theoretical basis and it is derived from a variational Bayesian framework (Kingma and Welling, 2013). For the case where $p(\mathbf{z}|\mathbf{x})=\mathcal{N}([\mu_{1},\ldots,\mu_{N_{z}}]^{\scriptsize\textrm{T}},\textrm{diag}[\sigma_{1}^{2},\ldots,\sigma^{2}_{N_{z}}]^{\scriptsize\textrm{T}})$ and $p(\mathbf{z})=\mathcal{N}(\mathbf{0},\mathbf{I})$ the Kullback-Leibler divergence becomes

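$$\mathcal{D}_{\textrm{KL}}\left(p(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})\right)=\frac{1}{2}\sum_{i=1}^{N_{z}}\left(\mu_{i}^{2}+\sigma_{i}^{2}-\ln(\sigma_{i}^{2})-1\right),\qquad(3)$$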

where $\mu_{i}$ and $\sigma_{i}$ are the $i$th components of the mean and standard deviation vectors. During training, instead of generating the latent vector $\mathbf{z}$ directly, the encoder generates vectors of means, $\bm{\mu}$, and log-variances, $\bm{\ln(\sigma^{2})}$. Then, the vector $\widehat{\mathbf{z}}$ is drawn from $\mathcal{N}(\mathbf{0},\mathbf{I})$ and rescaled to generate the latent vector $\mathbf{z}=\bm{\mu}+\bm{\sigma}\circ\widehat{\mathbf{z}}$, which is passed to the decoder to generate a reconstructed vector $\widehat{\mathbf{x}}$. Note that the minimization of the loss function forces $\widehat{\mathbf{x}}$ to be as close as possible to the input vector $\mathbf{x}$, while the term $\mathcal{D}_{\textrm{KL}}\left(p(\mathbf{z}|\mathbf{x})\|p(\mathbf{z})\right)$ pushes $\bm{\mu}$ and $\bm{\sigma}$ towards the zero and the unity vectors, respectively. After training, the decoder can be used to generate new realizations $\widehat{\mathbf{x}}$ by sampling $\mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$. Conceptually, we are generating samples $\widehat{\mathbf{x}}$ from a distribution $p(\mathbf{x}|\mathbf{z})=\mathcal{N}(\bm{f}_{d}(\mathbf{z};\mathbf{w}_{d}),\gamma^{2}\mathbf{I})$, which is a Gaussian with mean given by a trained decoder with parameters $\mathbf{w}_{d}$ and covariance equal to the identity multiplied by a scaling parameter $\gamma^{2}$ (Doersch, 2016). Figure 2 shows a VAE illustrating the main components. The encoder corresponds to an inference network and the decoder corresponds to a generative model. A detailed discussion about the principles behind VAE is presented in (Doersch, 2016).
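
The loss terms and the sampling step described in this section can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' Keras implementation: the function names are hypothetical, the small clipping constant `eps` is an assumption added for numerical safety, and the encoder outputs are replaced by fixed arrays.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * z_hat, with z_hat drawn from N(0, I)."""
    z_hat = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * z_hat


def binary_cross_entropy(x, x_hat, eps=1e-7):
    """Reconstruction error: binary cross-entropy averaged over the entries of x."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # assumed safeguard against log(0)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))


def kl_divergence(mu, log_var):
    """KL divergence from N(mu, diag(sigma^2)) to the standard Gaussian N(0, I)."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)


def vae_loss(x, x_hat, mu, log_var, lam=1.0):
    """Total loss: reconstruction error plus lam times the KL term."""
    return binary_cross_entropy(x, x_hat) + lam * kl_divergence(mu, log_var)


rng = np.random.default_rng(0)
mu = np.zeros(100)       # stand-in for the encoder mean output
log_var = np.zeros(100)  # stand-in for the encoder log-variance output
z = reparameterize(mu, log_var, rng)

print(z.shape)                     # (100,)
print(kl_divergence(mu, log_var))  # 0.0
```

With zero mean and unit variance the KL term vanishes, which is the state the regularization pushes the encoder towards. Working with the log-variance rather than the variance keeps $\sigma$ positive without constraining the encoder output.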

2.3 Convolutional Layers

The neural networks illustrated in Figs. 1 and 2 are based on fully-connected layers, i.e., each neuron is connected to all neurons in the previous layer. Unfortunately, fully-connected networks do not scale well, i.e., the number of training parameters (weights and bias terms) increases dramatically when the size of the input space is large, which is the case of facies realizations, where the number of gridblocks can easily be on the order of hundreds of thousands. This is one of the main limitations observed in our previous work with DBN (Canchumuni et al., 2018). For this reason, in this work we resort to convolutional neural networks (LeCun, 1989) to construct the encoder and decoder of our VAE network. These networks gained significant attention in the deep learning area after their very successful application in the ImageNet image classification challenge (Krizhevsky et al., 2012). Convolutional neural networks are specialized in data with grid structure such as images and time series (Goodfellow et al., 2016). Usually each layer of a convolutional network consists of a sequence of convolutional operations, followed by the application of activation functions (detection stage) and pooling operations, which modify the size of the outputs for the next layer and reduce the number of parameters and the processing time in the next layers. The convolutional operations consist of a series of trainable filters (kernels) which are convolved with the input image to generate activation maps. These convolutions are essentially dot products between the entries of the kernel and the input at any position. Because the size of the kernels is much smaller than the dimension of the input data, the use of convolutional layers vastly reduces the number of training parameters, allowing deeper architectures. The activation functions are applied over the activation maps generated by the convolutional operations. The most common is the rectified linear unit (ReLU) function. The pooling operation replaces the output by some statistic of the nearby outputs, typically the maximum output within a rectangular neighborhood (max-pooling). There are also hyperparameters, which include the size and the number of kernels and the level of overlapping in the kernel (stride). For a detailed discussion about convolutional networks we recommend (Goodfellow et al., 2016, Chap. 9) and (Dumoulin and Visin, 2018).
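
To make the scaling argument concrete, the sketch below counts trainable parameters for a single fully-connected layer and a single convolutional layer applied to the same input. The sizes (a $45\times 45$ input with two channels, 1024 neurons, 32 kernels of size $2\times 2$) are borrowed from the first test case purely for illustration, and the helper names are hypothetical.

```python
def dense_params(n_inputs, n_neurons):
    """Fully-connected layer: one weight per input-neuron pair plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons


def conv2d_params(kernel_h, kernel_w, in_channels, n_kernels):
    """Convolutional layer: each kernel has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * n_kernels


n_inputs = 45 * 45 * 2  # 45 x 45 gridblocks, two facies channels

print(dense_params(n_inputs, 1024))  # 4148224
print(conv2d_params(2, 2, 2, 32))    # 288
```

The convolutional layer's parameter count does not depend on the grid size at all, which is why the same kernels remain affordable on much larger models.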

3 ES-MDA-CVAE

Figure 3 illustrates the final CVAE architecture with convolutional and fully-connected layers. We implemented the CVAE using Keras (Chollet et al., 2015) with TensorFlow (Abadi et al., 2015) as backend engine. This network is trained using a large number of prior facies realizations, on the order of $\mathcal{O}(10^{4})$ realizations. Note that no reservoir simulations are required in this process. After training, the CVAE is conceptually equipped to generate new realizations by simply sampling the random vector $\mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and passing it to the decoder. At this point, the decoder works as a substitute model for the geostatistical algorithm used to construct initial realizations.

The data assimilation is done by combining the trained decoder with ES-MDA. Essentially, we use ES-MDA to update an ensemble of realizations of the latent vector $\mathbf{z}$ to account for reservoir data and use the decoder to reconstruct the corresponding facies models. Here, we refer to this procedure as ES-MDA-CVAE. Figure 4 illustrates this workflow. The data assimilation starts with a set of prior realizations of the latent vector, denoted as $\{\mathbf{z}^{0}_{j}\}_{j=1}^{N_{e}}$ in Fig. 4, where $N_{e}$ is the number of ensemble members. These prior latent vectors can be generated either by sampling $\mathcal{N}(\mathbf{0},\mathbf{I})$ or by applying the encoder to a set of $N_{e}$ prior facies realizations generated with geostatistics, which is the option adopted in the cases presented in this paper. The ensemble of latent vectors is passed to the decoder to generate an ensemble of facies $\{\mathbf{x}^{k}_{j}\}_{j=1}^{N_{e}}$, which is passed to the reservoir simulator to compute an ensemble of predicted data $\{\mathbf{d}^{k}_{j}\}_{j=1}^{N_{e}}$. The ES-MDA updating equation is used to update $\{\mathbf{z}^{k}_{j}\}_{j=1}^{N_{e}}$, and the process continues until the prescribed number of data assimilation iterations is reached. Because each iteration requires $N_{e}$ reservoir simulations to compute the vectors of predicted data, which can be very time consuming depending on the size of the model, we limit $N_{e}$ to the order of $\mathcal{O}(10^{2})$ realizations.

The resulting ES-MDA updating equation can be written as

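$$\mathbf{z}^{k+1}_{j}=\mathbf{z}^{k}_{j}+\mathbf{C}^{k}_{\mathbf{z}\mathbf{d}}\left(\mathbf{C}^{k}_{\mathbf{d}\mathbf{d}}+\alpha_{k}\mathbf{C}_{\mathbf{e}}\right)^{-1}\left(\mathbf{d}_{\textrm{obs}}+\mathbf{e}^{k}_{j}-\mathbf{d}^{k}_{j}\right),\quad\text{for }j=1,\ldots,N_{e},\qquad(4)$$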

where $\mathbf{C}^{k}_{\mathbf{z}\mathbf{d}}$ and $\mathbf{C}^{k}_{\mathbf{d}\mathbf{d}}$ are matrices containing the cross-covariances between $\mathbf{z}$ and predicted data $\mathbf{d}$ and the auto-covariances of $\mathbf{d}$, respectively. Both matrices are estimated using the current ensemble. $\mathbf{C}_{\mathbf{e}}$ is the data-error covariance matrix. $\mathbf{d}_{\textrm{obs}}$ is the vector containing the observations and $\mathbf{e}^{k}_{j}$ is a random vector sampled from $\mathcal{N}(\mathbf{0},\alpha_{k}\mathbf{C}_{\mathbf{e}})$, where $\alpha_{k}$ is the data-inflation factor. In a standard implementation of ES-MDA, Eq. 4 is applied a pre-defined number of times, $N_{a}$, and the values of $\alpha_{k}$ should be selected such that $\sum_{k=1}^{N_{a}}\alpha_{k}^{-1}=1$ (Emerick and Reynolds, 2013). Here, we wrote Eq. 4 in terms of only the latent vector for simplicity. However, we can easily introduce more uncertainty parameters of the reservoir in the data assimilation by updating an augmented vector.
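
The update of the latent ensemble can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `es_mda_update` is a hypothetical helper, the random linear map `G` stands in for the decoder followed by the reservoir simulator, and the inflation factors are taken constant, $\alpha_{k}=N_{a}$, which is one simple choice satisfying the constraint on their inverses.

```python
import numpy as np


def es_mda_update(Z, D, d_obs, C_e, alpha, rng):
    """One ES-MDA step on an ensemble of latent vectors.

    Z: (N_z, N_e) latent vectors, D: (N_d, N_e) predicted data,
    d_obs: (N_d,) observations, C_e: (N_d, N_d) data-error covariance.
    """
    n_e = Z.shape[1]
    dZ = Z - Z.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    C_zd = dZ @ dD.T / (n_e - 1)  # cross-covariance between z and d
    C_dd = dD @ dD.T / (n_e - 1)  # auto-covariance of d
    # One perturbation e_j ~ N(0, alpha * C_e) per ensemble member (columns).
    E = rng.multivariate_normal(np.zeros(len(d_obs)), alpha * C_e, size=n_e).T
    innovation = d_obs[:, None] + E - D
    # Solve the linear system instead of forming the explicit inverse.
    return Z + C_zd @ np.linalg.solve(C_dd + alpha * C_e, innovation)


# Toy problem: hypothetical linear forward model d = G z.
rng = np.random.default_rng(0)
N_z, N_d, N_e, N_a = 5, 3, 200, 4
G = rng.standard_normal((N_d, N_z))
d_obs = G @ rng.standard_normal(N_z)
C_e = 0.01 * np.eye(N_d)

Z = rng.standard_normal((N_z, N_e))  # prior ensemble sampled from N(0, I)
alphas = [float(N_a)] * N_a          # sum of 1 / alpha_k equals 1
prior_mismatch = np.abs(G @ Z.mean(axis=1) - d_obs).max()

for alpha in alphas:
    D = G @ Z                        # "simulate" every ensemble member
    Z = es_mda_update(Z, D, d_obs, C_e, alpha, rng)

posterior_mismatch = np.abs(G @ Z.mean(axis=1) - d_obs).max()
print(prior_mismatch, posterior_mismatch)
```

In the actual workflow the line `D = G @ Z` would be replaced by decoding each latent vector into a facies model and running the reservoir simulator, which is where essentially all of the computational cost lies.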

4 Test Cases

4.1 Test Case 1

The first test case corresponds to the same case used in (Canchumuni et al., 2018). This is a channelized facies model generated using the algorithm snesim (Strebelle, 2002). Figure 5 shows the reference (true) permeability field. The model has two facies: channels with constant permeability of 5,000 mD and background with permeability of 500 mD. The size of the model is $45\times 45$ gridblocks, all gridblocks with $100\ \text{ft}\times 100\ \text{ft}$ and constant thickness of 50 ft.

4.1.1 CVAE architecture and training

The training set consists of 24,000 facies realizations generated using snesim with the same training image of the reference model. We also use 6,000 additional realizations for validation. The architecture of the network is described in Table 1 in the Appendix. The source code is available at https://github.com/smith31t/GeoFacies_DL. The input data of the CVAE are pre-processed facies images where each facies type corresponds to a color channel with the value one at the corresponding facies. This process is analogous to the pre-processing applied to color pictures, where the image is divided into three color channels (red, green and blue). Essentially, the encoder is composed of three convolutional layers followed by three fully-connected layers and one dropout layer (Srivastava et al., 2014) to avoid overfitting. In the initial steps of the research, we tested different setups of the network, especially the dimension of the feature space. Because the encoder uses fully-connected layers to compute the latent vector, it is desirable to keep the size of this vector, $N_{z}$, as small as possible to reduce the computational requirements for training. Unfortunately, fully-connected layers are not efficiently parallelizable, even on GPUs. Our limited set of tests indicated no significant improvements for $N_{z}\geq 100$ in the problems presented in this paper. Hence, we selected $N_{z}=100$. The decoder mirrors the architecture of the encoder with transposed-convolutional layers (often referred to as deconvolutional layers (Dumoulin and Visin, 2018)). Before the last layer of the decoder, we introduced an up-sampling layer with bilinear interpolation to resize the output to the size of the final model. Note that only the last layer has a sigmoid activation function, which is used for classification of the facies type in each gridblock of the model.

The training required approximately 13 minutes in a cluster with four GPUs (NVIDIA TESLA P100) with 3584 CUDA cores each. The final reconstruction accuracy for the validation set was 96.7%. Figure 6 shows the first five realizations of the validation set before and after reconstruction. The results in this figure show that the designed CVAE was able to successfully reconstruct the facies. This figure also shows the corresponding histograms of the latent vectors showing nearly Gaussian marginal distributions.
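
The facies-to-channel pre-processing can be sketched with NumPy as below. The function names are hypothetical, the random facies map merely stands in for a snesim realization, and converting the decoder output back to facies by taking the most probable channel is an assumption made for this illustration rather than a step stated in the text.

```python
import numpy as np


def facies_to_channels(facies, n_facies):
    """Convert an integer facies map (H, W) into a one-hot array (H, W, n_facies)."""
    return np.eye(n_facies, dtype=np.float32)[facies]


def channels_to_facies(probabilities):
    """Assumed decoding rule: pick the channel with the largest value per gridblock."""
    return np.argmax(probabilities, axis=-1)


rng = np.random.default_rng(0)
facies = rng.integers(0, 2, size=(45, 45))  # 0 = background, 1 = channel (stand-in data)
x = facies_to_channels(facies, n_facies=2)

print(x.shape)  # (45, 45, 2)
```

Each gridblock carries the value one in exactly one channel, so the array matches the input shape (45, 45, 2) listed in Table 1.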

4.1.2 Conditioning to facies data

The facies realizations of the training and validation sets were generated without any hard data (facies type at well locations). However, in real-life applications, geological models are always constructed constrained to hard data. Our tests indicate that if we train the network with realizations conditioned to hard data, most of the reconstructed facies honor these data, but there is no guarantee. In fact, Laloy et al. (2017) reported that in one of their tests only 68% of the realizations honor all nine hard data points imposed in the training set. For this reason, here we investigate the ability of the proposed ES-MDA-CVAE to condition the prior realizations to facies data. For this test, we used an ensemble of $N_{e}=200$ prior realizations and $N_{a}=4$ MDA iterations. We assumed a small value for the data-error variance of $\sigma_{e}^{2}=0.01$ . Figure 7 shows the first 20 prior realizations. Figures 9 and 11 show the corresponding realizations conditioned to seven (Fig. 8) and 20 (Fig. 10) hard data points, respectively. The results in these figures show that ES-MDA-CVAE was able to honor the facies type for all data points. The posterior realizations show well-defined channels, although we observe some “broken” channels. Nevertheless, the final realizations preserve reasonably well the main geological characteristics of the prior ones.

4.1.3 Conditioning to production data

We tested the proposed ES-MDA-CVAE to assimilate production data. We considered four oil producing and three water injection wells as shown in Fig. 5. All producing wells operate at constant bottom-hole pressure of 3,000 psi. The water injection wells operate at 4,000 psi. The synthetic measurements correspond to oil and water rate data corrupted with Gaussian random noise with standard deviation of 5% of the data predicted by the reference model. We use a prior ensemble with $N_{e}=200$ realizations and $N_{a}=4$ iterations. We did not include any facies data (hard data) in order to make the problem more challenging for assimilation of production data. Figure 12 shows the first five prior and posterior realizations obtained with ES-MDA-CVAE. Clearly all posterior realizations are able to reproduce the main features of the reference model (Fig. 5). Figure 13 shows the observed and predicted water rate for four wells showing a good data match. In (Canchumuni et al., 2018), we used the same problem to test the standard ES-MDA and parameterizations with OPCA and DBN. Figure 14 shows the first realization obtained with each method. The results in this figure clearly show the superior performance of ES-MDA-CVAE.
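
The construction of the synthetic measurements can be sketched as follows. This is only an illustration: `perturb_observations` is a hypothetical helper, the rate values are made up, and taking the data-error covariance as diagonal (uncorrelated measurement errors) is an assumption.

```python
import numpy as np


def perturb_observations(d_true, rel_std, rng):
    """Add zero-mean Gaussian noise with standard deviation rel_std * |d_true|."""
    std = rel_std * np.abs(d_true)
    d_obs = d_true + std * rng.standard_normal(d_true.shape)
    C_e = np.diag(std**2)  # assumed diagonal data-error covariance
    return d_obs, C_e


rng = np.random.default_rng(0)
d_true = np.array([1200.0, 950.0, 400.0])  # hypothetical rates from a reference model
d_obs, C_e = perturb_observations(d_true, rel_std=0.05, rng=rng)

print(np.sqrt(np.diag(C_e)))
```

A purely relative noise model gives zero variance wherever the reference rate is zero (for example, water rate before breakthrough), so in practice a minimum standard deviation would be needed to keep the covariance matrix invertible.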

4.2 Test Case 2

The second test case is the same used in (Emerick, 2017). Figure 15 shows the reference permeability field and the corresponding histogram. The model has $100\times 100$ gridblocks with uniform size of 75 meters and constant thickness of 20 meters. Similarly to the first test case, this case has two facies (channel and background sand) generated with the snesim algorithm. However, in this case we update the facies type and the permeability within each facies simultaneously. The permeability values within each facies were obtained with sequential Gaussian simulation. More details about the construction of this problem can be found in (Emerick, 2017).

For this test case, we used a CVAE network with an architecture similar to that of the previous case, with only a few changes to accommodate the different model size. We used a training set with 32,000 realizations and a validation set with 8,000. The training required 42 minutes on a cluster with four GPUs (NVIDIA TESLA P100). The final reconstruction accuracy for the validation set was 93.3%. Figure 16 shows the first five realizations of the validation set before and after reconstruction and the corresponding histograms of the latent vectors. Again, the CVAE was able to achieve a reasonable reconstruction of the channels.

For this test problem, we assimilated water cut data at five oil producing wells and water rate at two water injection wells. The position of the wells is indicated in Fig. 15. The synthetic measurements were corrupted with Gaussian noise with standard deviation of 5% of the data predicted by the reference model. We use $N_{e}=200$ and $N_{a}=20$. Our tests showed that for this problem we needed more MDA iterations than usual, possibly because the parameterization makes the problem more nonlinear. None of the prior realizations include facies data at well locations. During the data assimilation, we update the latent vectors and the permeability values within each facies. Figure 17 shows the first five prior and posterior realizations, indicating that ES-MDA-CVAE was able to generate plausible facies distributions, i.e., facies with features similar to those of the prior ones. Figure 18 shows the water cut data for four wells, indicating reasonable data matches. Figure 19 shows the first realization obtained with ES-MDA, ES-MDA-OPCA, ES-MDA-DBN and ES-MDA-CVAE. The first three results were extracted from (Canchumuni et al., 2018). This figure shows that the standard ES-MDA was not able to preserve well-defined boundaries for the channels. ES-MDA-OPCA and ES-MDA-DBN resulted in better models, although with some discontinuous channel branches that are not present in the prior models. Again, ES-MDA-CVAE obtained a realization with a better representation of the channels.

4.3 Test Case 3

The last test case is a 3D model with fluvial channels generated with object-based simulation. The model has three facies: channel, levee and background sand. Figure 20 shows the permeability and the facies distribution of the reference case. We applied a transparency to the background sand in Fig. 20b to allow the visualization of the geometry of the channels. We assumed a constant permeability for each facies: 2,000 mD in the channels, 1,000 mD in the levees and 100 mD in the background. This model has $100\times 100\times 10$ gridblocks, each measuring 50 m $\times$ 50 m $\times$ 2 m. The reservoir is produced through six wells placed near the borders of the model and operated at a constant bottom-hole pressure of 10,000 kPa. There are also two water injection wells placed at the center of the model operating with a fixed bottom-hole pressure of 50,000 kPa.
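
The grid and property description above can be summarized in a few lines of Python. This is only a sketch: the integer facies codes and all names are hypothetical labels introduced for illustration, while the permeabilities and grid dimensions are those quoted in the text.

```python
# Facies codes are hypothetical labels; the permeabilities are those quoted above.
PERM_MD = {0: 100.0, 1: 1000.0, 2: 2000.0}  # background, levee, channel

NX, NY, NZ = 100, 100, 10     # number of gridblocks in each direction
DX, DY, DZ = 50.0, 50.0, 2.0  # gridblock dimensions in metres


def facies_to_permeability(facies):
    """Map a nested list facies[k][j][i] of facies codes to permeability in mD."""
    return [[[PERM_MD[f] for f in row] for row in layer] for layer in facies]


def grid_summary():
    """Total gridblock count and physical extent of the model."""
    return {
        "gridblocks": NX * NY * NZ,
        "extent_m": (NX * DX, NY * DY, NZ * DZ),
    }


# Tiny example: one layer with two rows of three gridblocks.
tiny = [[[0, 1, 2], [0, 0, 1]]]
perm = facies_to_permeability(tiny)
summary = grid_summary()
```

The gridblock count returned by `grid_summary()` is the product of the three grid dimensions, i.e. 100,000 gridblocks.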

The 3D geometry of the channels makes this problem particularly challenging because standard convolutional layers are designed for 2D images. One possible approach is to consider each layer of the reservoir model separately. This is the approach used in (Laloy et al., 2017). However, this procedure does not account for the geometry of the facies in the vertical direction, as the convolutional operations are performed in 2D. Instead, we used the 3D convolutional layers available in TensorFlow. Even though the extension of convolutional operations to three dimensions is conceptually simple, training the resulting network becomes computationally challenging. In fact, Geoffrey Hinton described the use of 3D convolutional networks as a “nightmare” (Hinton et al., 2012). The architecture of the network is described in Table 2 in the Appendix. In this network, we introduced batch normalization layers (Ioffe and Szegedy, 2018) to improve stability and reduce training times. This procedure removed the need for the dropout layers used in the previous networks. We considered a training set of 40,000 realizations and a validation set of 10,000. The training took 49 hours on a cluster with four GPUs (NVIDIA TESLA P100) and the reconstruction accuracy was 89.1%.
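
The difference between layer-by-layer 2D convolution and a full 3D convolution can be seen on a toy volume. The sketch below is a naive pure-Python cross-correlation with zero padding, written only for illustration; it is not the TensorFlow layer used in the paper, and the function name and kernels are assumptions. A kernel of depth 1 mimics the 2D, layer-by-layer treatment, while a kernel of depth 3 mixes information across neighboring layers.

```python
def conv3d_same(vol, kernel):
    """Naive zero-padded 3D cross-correlation.

    vol[z][y][x] is the input volume and kernel[a][b][c] the filter.
    The output has the same shape as the input.
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    kz, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    oz, oy, ox = kz // 2, ky // 2, kx // 2
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                acc = 0.0
                for a in range(kz):
                    for b in range(ky):
                        for c in range(kx):
                            zz, yy, xx = z + a - oz, y + b - oy, x + c - ox
                            # Cells outside the volume count as zero.
                            if 0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx:
                                acc += kernel[a][b][c] * vol[zz][yy][xx]
                out[z][y][x] = acc
    return out


# A 3x3x3 volume with a single "channel" cell in the middle layer.
vol = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
vol[1][1][1] = 1.0

kernel_2d = [[[1.0] * 3 for _ in range(3)]]                    # depth 1: per-layer 2D
kernel_3d = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]  # depth 3: true 3D

out_2d = conv3d_same(vol, kernel_2d)
out_3d = conv3d_same(vol, kernel_3d)

top_2d = sum(sum(row) for row in out_2d[0])  # response in the top layer, 2D case
top_3d = sum(sum(row) for row in out_3d[0])  # response in the top layer, 3D case
```

With the depth-1 kernel the top layer stays at zero, so nothing is learned about vertical continuity. With the depth-3 kernel the middle-layer cell also contributes to the layers above and below, at the cost of three times as many weights per filter in this example.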

We applied ES-MDA-CVAE with an ensemble of $N_{e}=200$ realizations and $N_{a}=20$ MDA iterations with constant inflation factors. The observations corresponded to oil and water rates predicted by the reference case and corrupted with random noise of 5%. Figures 21 and 22 show four realizations of the prior and posterior ensembles, respectively. Overall, ES-MDA-CVAE was able to preserve several channels with the desired characteristics. Figure 23 shows all ten layers of the reference model and the first realization before and after data assimilation. This figure shows that the posterior realization presents some facies with well-defined channel-levee sequences. However, the posterior model is clearly distinguishable from the prior and reference models, with some discontinuous channels and some oddly-shaped facies; see, e.g., the bottom layer of the posterior realization (Fig. 23c). Ideally, we would like the posterior realizations to be visually indistinguishable from prior realizations generated with the object-based algorithm. Nevertheless, these results are very encouraging and far superior to what would be obtained with standard ES-MDA or even with an OPCA parameterization (this case is not computationally feasible with our previous DBN implementation). In fact, it is worth mentioning that this type of model is extremely difficult to history match with the methods currently available. Figure 24 shows the oil rate at four wells, indicating significant improvements in the predictions, although there are still some realizations with poor data matches; for example, there are some models predicting zero oil rates.

5 Comments

One important limitation of the CVAE parameterization is the fact that we cannot apply distance-based localization (Houtekamer and Mitchell, 2001) to update $\mathbf{z}$ because this vector is in a different space. Hence, it does not make sense to compute the Euclidean distance between a component of $\mathbf{z}$ and the spatial position of a well. Yet, localization is important to mitigate the negative effects of sampling errors and limited degrees of freedom in ensemble data assimilation. In (Canchumuni et al., 2018), we tried to work around this issue by setting the number of neurons in the code layer equal to the number of reservoir gridblocks. However, this procedure does not ensure the existence of a direct relation between an entry of $\mathbf{z}$ and the corresponding spatial location of the gridblock in the reservoir model. In fact, because the convolutional layers share parameters, it is conceivable that each component of $\mathbf{z}$ may be associated with the reconstruction of the facies in different regions of the reservoir (the same weights of the convolutional kernels are applied to multiple locations of the input data). Moreover, making the size of the code layer equal to the size of the reservoir grid significantly increases the number of training parameters because of the fully-connected layers. In practice, this may make the application unfeasible for larger reservoir models. There are localization procedures which are not formulated in terms of spatial distances that could be applied in this case; see, e.g., (Lacerda et al., 2018) and references therein. Unfortunately, in our experience, these procedures are less effective than distance-dependent approaches. We did not use any type of localization in any of the test cases described in this paper. Nevertheless, this is definitely an issue that needs further investigation.
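
A back-of-the-envelope count illustrates how the fully-connected layers scale with the size of the code layer. The sketch below assumes a generic VAE layout with two dense heads on the encoder side (mean and log-variance) and one dense layer on the decoder side; the flattened feature size `N_FEAT` and the small code size are hypothetical values, not the architecture from the Appendix.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out


def code_layer_params(n_feat, n_z):
    """Dense parameters around the code layer under the assumed VAE layout."""
    encoder_heads = 2 * dense_params(n_feat, n_z)  # mean and log-variance heads
    decoder_input = dense_params(n_z, n_feat)      # first dense layer of the decoder
    return encoder_heads + decoder_input


N_FEAT = 2048          # hypothetical size of the flattened convolutional features
SMALL_CODE = 100       # hypothetical compact code layer
GRID_CODE = 100 * 100  # code layer as large as a 100 x 100 grid

small = code_layer_params(N_FEAT, SMALL_CODE)
large = code_layer_params(N_FEAT, GRID_CODE)
ratio = large / small
```

Under these assumptions the parameter count grows roughly in proportion to the code size, so a code layer as large as the grid multiplies the number of dense parameters by about two orders of magnitude.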

Another practical problem for the application of the method investigated in this paper is the need for a large number of prior realizations to train the CVAE. In the test cases considered in this paper, we used between 30,000 and 50,000 realizations. However, in practice, we may need larger numbers for more complex models. Unfortunately, generating several realizations of the geological model with standard geostatistical algorithms may be very challenging. One possible solution we intend to investigate in the future is to use data augmentation (Yaeger et al., 1997; Taylor and Nitschke, 2017) and transfer learning techniques (Hoo-Chang et al., 2018; Cheng and Malhi, 2017). Data augmentation consists of a series of affine transformations applied to the input data to increase the training set. Typical augmentation strategies include mirroring, cropping and rotating images. Transfer learning is a strategy that uses previously trained networks either as an initialization or as fixed parts of the implemented network. For example, in a preliminary test we applied the parameters of the network trained for the first test case as an initialization for the network in the second test case. This process resulted in a reduction of 50% in the training time. Finally, it is necessary to investigate procedures to reduce the computational requirements for training the networks. Note that our last test case has 100,000 gridblocks, which is relatively small compared to the size of the models employed operationally. Yet, the training required approximately two days on a cluster with four GPUs.
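
The mirroring and rotation strategies mentioned above can be sketched for a 2D facies image stored as a list of rows. This is a minimal illustration with hypothetical function names; cropping is omitted, and whether rotated or mirrored channels remain consistent with the geological prior (for example, a preferred channel orientation) would have to be checked case by case.

```python
def flip_lr(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the distinct images obtained by rotations and mirroring."""
    results = []
    current = [list(row) for row in img]
    for _ in range(4):
        for candidate in (current, flip_lr(current)):
            if candidate not in results:
                results.append(candidate)
        current = rot90(current)
    return results


# Tiny example with four distinct values so that all transforms differ.
example = [[1, 2], [3, 4]]
augmented = augment(example)
```

For an image without symmetries this yields eight variants per realization, the original included. Symmetric images yield fewer because duplicates are dropped.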

6 Conclusions

In this paper, we investigated the use of a CVAE to parameterize facies in geological models and used ES-MDA to condition these models to observed data. We tested the procedure in three synthetic reservoir history-matching problems with channelized features and increasing levels of complexity. The first two test problems corresponded to 2D cases. The proposed procedure outperformed previous results obtained with standard ES-MDA and with ES-MDA using OPCA and DBN parameterizations. The third test problem considered 3D channels and three facies. This case required the use of 3D convolutional layers in the network, which significantly increased the training time. There is also a noticeable decrease in the reconstruction accuracy for this case, and the conditional realizations exhibit some features not present in the prior geological description of the model. Nevertheless, the overall performance of the method is very encouraging and indicates that the use of deep-learning-based parameterizations is a research direction worth pursuing. In the continuation of this research, we intend to use our trained CVAEs as the generative models in GANs. The objective is to improve the reconstruction accuracy, especially for the third test case.

Acknowledgement

The authors thank Petrobras for the financial support.

Appendix: Architecture of the networks

Bibliography


1Abadi et al. (2015) Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y.,
2Agbalaka and Oliver (2008) Agbalaka, C. C. and Oliver, D. S. Application of the EnKF and localization to automatic history matching of facies distribution and production data. Mathematical Geosciences, 40(4):353–374, 2008. doi: 10.1007/s11004-008-9155-7.
3Arjovsky et al. (2017) Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. arXiv:1701.07875v3 [stat.ML], 2017. URL https://arxiv.org/abs/1701.07875.
4Canchumuni et al. (2017) Canchumuni, S. A., Emerick, A. A., and Pacheco, M. A. Integration of ensemble data assimilation and deep learning for history matching facies models. In Proceedings of the Offshore Technology Conference, Rio de Janeiro, Brazil, 24–26 October, number OTC-28015-MS, 2017. doi: 10.4043/28015-MS.
5Canchumuni et al. (2018) Canchumuni, S. A., Emerick, A. A., and Pacheco, M. A. History matching channelized facies models using ensemble smoother with a deep learning parameterization. In Proceedings of the 16th European Conference on the Mathematics of Oil Recovery (ECMOR XVI), Barcelona, Spain, 3–6 September, 2018. doi: 10.3997/2214-4609.201802277.
6Chan and Elsheikh (2017) Chan, S. and Elsheikh, A. H. Parametrization and generation of geological models with generative adversarial networks. arXiv:1708.01810v1 [stat.ML], 2017. URL https://arxiv.org/abs/1708.01810.
7Chan and Elsheikh (2018) Chan, S. and Elsheikh, A. H. Parametric generation of conditional geological realizations using generative neural networks. arXiv:1807.05207v1 [stat.ML], 2018. URL https://arxiv.org/abs/1807.05207.
8Chang et al. (2010) Chang, H., Zhang, D., and Lu, Z. History matching of facies distributions with the EnKF and level set parameterization. Journal of Computational Physics, 229:8011–8030, 2010. doi: 10.1016/j.jcp.2010.07.005.