Example Forgetting: A Novel Approach to Explain and Interpret Deep Neural Networks in Seismic Interpretation

Ryan Benkert, Oluwaseun Joseph Aribido, and Ghassan AlRegib

arXiv:2302.14644·cs.LG·March 1, 2023

TL;DR

This paper introduces a novel method called example forgetting to explain and improve deep neural network performance in seismic interpretation, addressing issues of trust and generalization by analyzing model forgetting and augmenting training data.

Contribution

It presents a new technique to relate model mispredictions to the neural network's representation and enhances generalization through targeted data augmentation.

Findings

01

Improved segmentation accuracy on underrepresented classes.

02

Reduced forgotten regions in seismic volume.

03

Enhanced understanding of model behavior during training.

Tables

Table I: Averaged class accuracy and overall MIoU.

Model	MIoU	Upper N. S.	Middle N. S.	Lower N. S.	Chalk	Scruff	Zechstein
Baseline	0.689	0.986	0.875	0.965	0.771	0.591	0.622
Baseline + Ours	0.687	0.983	0.886	0.962	0.772	0.585	0.619
Rand. Rotate	0.709	0.973	0.923	0.972	0.792	0.600	0.556
Rand. Rotate + Ours	0.724	0.974	0.930	0.973	0.811	0.646	0.593
Rand. Rotate + Rand. Flip	0.728	0.973	0.927	0.974	0.804	0.650	0.588
Rand. Rotate + Rand. Flip + Ours	0.732	0.971	0.927	0.975	0.822	0.664	0.641

Equations

$$\mathrm{acc}_{i}^{t} = \mathbb{1}_{\tilde{y}_{i}^{t} = y_{i}}$$

$$f_{i}^{t} = \mathrm{int}\left(\mathrm{acc}_{i}^{t+1} < \mathrm{acc}_{i}^{t}\right) \in \{1, 0\}$$

$$L_{i} = \sum_{t=0}^{T} f_{i}^{t}$$

$$U_{c_{k}} = \frac{\sum_{t=0}^{T} \sum_{i \in c_{k}} f_{i}^{t}}{N_{c_{k}}}$$

Full text

Citation

R. Benkert, O.J. Aribido, and G. AlRegib, "Example Forgetting: A Novel Approach to Explain and Interpret Deep Neural Networks in Seismic Interpretation," in IEEE Transactions on Geoscience and Remote Sensing (TGRS), May 12, 2022.

Review

Date of submission: October 2021

Date of acceptance: May 2022

Bib

@ARTICLE{benkert2022_TGRS,
author={R. Benkert and O.J. Aribido and G. AlRegib},
journal={IEEE Transactions on Geoscience and Remote Sensing},
title={Example Forgetting: A Novel Approach to Explain and Interpret Deep Neural Networks in Seismic Interpretation},
year={2022}
}

Copyright

©2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Contact

[email protected] OR [email protected]

http://ghassanalregib.info/

Example Forgetting: A Novel Approach to Explain and Interpret Deep Neural Networks in Seismic Interpretation

Ryan Benkert, Oluwaseun Joseph Aribido, and Ghassan AlRegib

Manuscript received February 08, 2022.

Abstract

In recent years, deep neural networks have significantly impacted the seismic interpretation process. Due to their simple implementation and low interpretation costs, deep neural networks are an attractive component of the common interpretation pipeline. However, neural networks are frequently met with distrust because they tend to produce semantically incorrect outputs when exposed to sections the model was not trained on. We address this issue by explaining model behaviour and improving generalization properties through example forgetting. First, we introduce a method that effectively relates semantically incorrect predictions to their respective positions within the neural network representation manifold. More concretely, our method tracks how models "forget" seismic reflections during training and establishes a connection to the decision boundary proximity of the target class. Second, we use our analysis technique to identify frequently forgotten regions within the training volume and augment the training set with state-of-the-art style transfer techniques from computer vision. We show that our method improves the segmentation performance on underrepresented classes while significantly reducing the forgotten regions in the F3 volume in the Netherlands.

Index Terms:

Example Forgetting, Seismic Interpretation, Deep Learning, Semantic Segmentation.

I Introduction

In the field of geophysics, interpreting processed seismic images is a challenging task. For decades, the process required expert oversight and a costly interpretation process. The introduction of deep learning to the field of geophysics significantly sped up this task and enabled accurate interpretation with limited human interference [1]. Instead of experts annotating volumes for weeks, the interpreter trains a deep model on an annotated training volume and subsequently infers geological information from similar test volumes in a matter of hours. The reason for this success is tied to the nature of the interpretation task. Traditionally, the interpreter extracts quantitative measures (attributes) of interesting characteristics and infers geological information based on the extracted attributes and the seismic section [2]. The choice of these attributes depends on the interpretation objective. For instance, several attributes are based on geometric properties [3], [4], [5] while others are derived from the human visual system [6, 7, 8]. At their core, deep models function in a very similar fashion. The model extracts complex features from the seismic section and classifies the features based on the trained loss objective. By construction, convolutional neural networks [9] possess the capability to model complex spatial features that human interpreters may overlook or that hand-engineered attributes may not capture. This specific characteristic is one of the biggest advantages of deep neural networks, but can also be a huge pitfall. On one hand, deep models automate the attribute extraction process and model complex seismic features easily omitted by interpreters or geometric attributes. This relieves the interpreter from selecting the appropriate attribute and significantly decreases the overall interpretation time. On the flip side, deep models lack interpretability.
Even though the model automates feature extraction, it is unclear how these features are related to the semantic interpretation of the subsurface. In other words, the interpreter is unable to explain the behavior of the network because the features are not necessarily based on geophysical information. In many cases, this leads to unexplainable predictions that undermine the confidence in deep models during inference. For instance, a trained machine learning model may predict deep subsurface structures as near-surface facies (Figure 1). Such a misinterpretation is significantly less likely in the traditional workflow, where attributes are based on relevant geophysical characteristics and interpretations are performed manually by humans.

In this paper, our objective is to make interpretations more explainable and predict the behavior of deep models when utilized on different input volumes. In the field of computer vision, several approaches to model uncertainty or interpretability involve Bayesian inference [10, 11] or gradient-based approaches [12, 13, 14, 15, 16]. Even though these approaches are useful visualization techniques, they fail to provide information about the relationship of samples with respect to the decision boundary. In seismic interpretation, this information is especially important as it can provide useful insights about the geophysical relationship of the extracted features. Based on these observations, we propose using example forgetting to explain seismic deep models. At its core, our method tracks the frequency with which amplitude reflections are forgotten during training and highlights difficult regions in heat maps. From an optimization standpoint, our algorithm contrasts difficult samples, which undergo frequent decision boundary shifts, with less difficult samples that are consistently mapped within class manifolds. This provides the interpreter with a powerful tool to visualize the generalization capabilities of the model and to verify model generalization with respect to the interpretation of the subsurface.

In summary, our contributions in this paper are as follows: First, we present a framework to explain deep model behavior by evaluating the learning dynamics during training. Second, we analyze deep models by visualizing challenging regions and interpret them with respect to model predictions and the geophysical properties of the subsurface. The framework allows us to categorize forgetting-prone regions and evaluate their contribution to the model performance. Third, we introduce a segmentation framework that explicitly targets forgotten samples and significantly reduces the number of difficult pixels. Our empirical findings show that our method impacts the representation space mapping and increases the distance of pixels to their respective class decision boundary. Moreover, our framework improves segmentation performance of underrepresented classes.

II Related Work

In seismic interpretation, deep learning models first surfaced in the form of fully supervised settings [17, 18]. However, due to the high annotation cost, fully annotated datasets are scarce in seismic interpretation. For fully supervised models, this frequently causes overfitting and poor prediction capabilities. As a result, several works explored semi-supervised and weakly supervised approaches [19, 20, 21, 22, 23, 24] as these methods are less dependent on costly data annotations. Apart from methodological shifts, deep models have been further diversified across different seismic applications. A few example applications include detection of faults [25, 26, 27, 28, 29, 30], delineation of salt bodies [31, 32, 6], classification of facies [33, 34, 35, 21, 36], prediction of rock lithology from well logs [23, 37, 38, 39] and seismic horizon interpretation [40, 41].

Although deep learning models are effective, they are hard to explain and mispredictions may follow a random pattern that is semantically incorrect. In computer vision, this is a well known issue of deep models and works on model uncertainty or explainability are ubiquitous. For instance, one branch focuses on utilizing gradient activations to infer information about the expected change a model witnesses when updating weight parameters [13, 14, 15, 16]. A more traditional approach is visualizing model uncertainty through Bayesian inference [10, 11]. Typically, this involves estimating the posterior probability of model parameters with respect to given data samples and their respective labels. Subsequently, model uncertainty is visualized by sampling from the parameter distribution and computing the entropy of the resulting prediction distribution. In seismic interpretation, most approaches concerning model uncertainty fall into this area of research [42, 43, 44].

In contrast to existing methods, we explain model behaviour and prediction uncertainty by investigating the learning dynamics in neural networks. In the literature, research in this area can be broadly classified into two categories. The first category explores the learning continuity when deep models are trained on new tasks. This behaviour is often referred to as catastrophic forgetting [45, 46]. In seismic interpretation, we frequently encounter this phenomenon in transfer learning scenarios where models are pretrained on one dataset and fine-tuned on another. The second category addresses the learning behaviour within a single task and analyzes sample forgetting within the training distribution [47]. In this paper, we generalize this concept to a seismic segmentation problem and visualize frequently forgotten regions in heat maps. Further, we exploit frequently forgotten regions by transferring their class characteristics to different sections within the seismic volume. To achieve this, we utilize state-of-the-art style transfer algorithms from computer vision.

In this context, several style transfer approaches are based on conditional generative adversarial networks [48]. Starting with [49], conditional generative adversarial networks (cGANs) have been widely deployed in many image-translation applications due to their high-quality image generation characteristics. Examples of such applications are high resolution image synthesis [50], multi-modal image synthesis [51, 52] and semantic image synthesis [53, 54, 55, 56, 57]. In seismic interpretation, style transfer has received little research attention. However, a few papers address the generation of synthetic subsurface models by applying style transfer techniques [58, 59].

Finally, we note that this work is a continuation of [60]. Beyond [60], we present significantly improved segmentation results as well as a more thorough analysis of our method.

III Explainability in Neural Networks with Example Forgetting

At the core of our technique stands the concept of example forgetting. Intuitively, samples that are more difficult to learn exhibit different properties than samples that are easy to distinguish and classify. In this section, we formalize this concept in the form of "forgetting".

III-A Forgetting Events

Deep neural networks cannot learn continually but forget samples during the optimization process. More generally, optimizing weight parameters causes a shift in the representation manifold that can result in misprediction (or "forgetting") of previously correct samples. Such a forgetting event occurs when a sample has been "learnt" (classified correctly) at some point $t$ and is subsequently "forgotten" (misclassified) at a time $t^{\prime}>t$.

Formally, we define $(x_{i},y_{i})\in I^{M\times N}$ as a (pixel, annotation) tuple in image $I$, where $x_{i}$ and $y_{i}$ correspond to the pixel and its annotation, respectively. In image segmentation, our goal is to calculate a prediction $\tilde{y}_{i}$ such that $\tilde{y}_{i}=y_{i}$. Based on this definition, the accuracy of a pixel in training epoch $t$ is defined as

$$\mathrm{acc}_{i}^{t} = \mathbb{1}_{\tilde{y}_{i}^{t} = y_{i}}.$$

Here, $\mathbb{1}_{\tilde{y}^{t}_{i}=y_{i}}$ refers to a binary variable indicating the correctness of the classified pixel in image $I$. With this definition, we say a pixel is forgotten at epoch $t+1$ if the accuracy at $t+1$ is strictly smaller than the accuracy at epoch $t$:

$$f_{i}^{t} = \mathrm{int}\left(\mathrm{acc}_{i}^{t+1} < \mathrm{acc}_{i}^{t}\right) \in \{1, 0\}.$$

Similar to [47], we define the binary event $f_{i}^{t}$ as a forgetting event at time $t$.

In contrast to other deep learning applications, the nature of the segmentation task enables visualization of forgetting events (Figure 2). Following our previous definition, we visualize forgetting events in a heat map by counting the number of forgetting events $f_{i}^{t}$ that occur per pixel during the time frame $T$. Mathematically, the heat map is $L\in\mathbb{N}_{0}^{M\times N}$, and every element of $L$ can be written as

$$L_{i} = \sum_{t=0}^{T} f_{i}^{t}.$$

Since frequently forgotten samples are repeatedly shifted across the decision boundary during training, we interpret the number of forgetting events as an approximate metric for decision boundary proximity. This view is complementary to [47], where frequently forgotten samples are considered support vectors within the representation space. Qualitatively, we note that frequently forgotten regions typically contain overlapping class features or a significant amount of annotation ambiguity. For this reason, we mostly find forgettable regions in underrepresented classes (e.g. salt domes) or at facies boundaries where annotations are the most ambiguous (Figure 2).
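The definitions of this subsection can be illustrated with a minimal NumPy sketch. The array layout (one predicted class map per epoch) and the function name `forgetting_heat_map` are assumptions made for illustration and are not taken from the original implementation.

```python
import numpy as np

def forgetting_heat_map(predictions, labels):
    """Count forgetting events per pixel.

    predictions: int array (E, M, N), predicted class per pixel for E
                 consecutive epochs (hypothetical layout).
    labels:      int array (M, N), ground-truth annotation.
    """
    # acc[t] is 1 where the pixel is classified correctly at epoch t, else 0.
    acc = (predictions == labels[None]).astype(np.int64)
    # f[t] is 1 where a pixel was correct at epoch t and incorrect at epoch t + 1.
    f = (acc[1:] < acc[:-1]).astype(np.int64)
    # The heat map L sums the forgetting events over the observation window.
    return f.sum(axis=0)

labels = np.array([[0, 1], [1, 1]])
predictions = np.array([
    [[0, 0], [1, 0]],  # epoch 0
    [[0, 1], [0, 0]],  # epoch 1
    [[1, 1], [1, 0]],  # epoch 2
    [[0, 0], [0, 0]],  # epoch 3
])
heat_map = forgetting_heat_map(predictions, labels)
```

In this toy example the bottom-right pixel is never classified correctly and therefore records no forgetting event, which shows that a low count alone does not imply that a pixel is easy.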

III-B Heat Map Computation

In our implementation, we calculate heat maps for different distribution sets. Following the previous definitions, we would have to track forgetting events for each model update. Practically, this would result in interpreting every volume set after each minibatch and updating the heat maps multiple times every epoch. Since this approach is computationally expensive, we update the heat maps of the current minibatch only. For the training set, we monitor forgetting events of each minibatch and update the model with the corresponding batch gradient. For validation and test sets, we track forgetting events after each epoch. Algorithm 1 outlines the tracking procedure. During training, we count the number of forgetting events for each pixel $(i,j)$ in set $D$ and store the result in a heat map for each image within the minibatch. If $D$ is the training set, we further update the model with the minibatch $B$. In all other cases (e.g. test), we do not perform model updates since this would alter the regular training procedure. Instead, we train the model for another epoch on the training set synchronously.
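Algorithm 1 is not reproduced in this text, so the following is only a rough sketch of how such a tracker could be organized. The class name `ForgettingTracker`, the section identifiers and the assumption that whole sections (not random crops) are evaluated are all hypothetical. Model updates are left out, since they happen in the regular training loop.

```python
import numpy as np

class ForgettingTracker:
    """Keeps one heat map per section and updates it whenever that section is evaluated."""

    def __init__(self):
        self.prev_acc = {}   # section id -> binary accuracy map from the last visit
        self.heat_maps = {}  # section id -> forgetting event counts

    def update(self, section_id, prediction, label):
        acc = (prediction == label).astype(np.int64)
        if section_id not in self.heat_maps:
            # First visit: nothing can have been forgotten yet.
            self.heat_maps[section_id] = np.zeros_like(acc)
        else:
            # Forgetting event: correct at the previous visit, incorrect now.
            self.heat_maps[section_id] += acc < self.prev_acc[section_id]
        self.prev_acc[section_id] = acc
        return self.heat_maps[section_id]

tracker = ForgettingTracker()
label = np.array([[0, 1, 1]])
for pred in ([[0, 1, 0]], [[0, 0, 0]], [[0, 1, 1]], [[1, 1, 0]]):
    heat_map = tracker.update("section_0", np.array(pred), label)
```

For the training set, `update` would be called for the sections of the current minibatch before the gradient step. For validation and test sections, it would be called once per epoch.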

IV Support Vector Augmentation

Building on our previous definitions, we use frequently forgotten regions within the training set to improve robustness and generalization capabilities of the segmentation model. Specifically, we identify frequently forgotten regions within the training set and add example variety through region-wise style transfer. Since we consider forgetting events as an approximate decision boundary proximity metric, our method can be interpreted as an augmentation technique that generates new samples around the class boundaries. Within the manifold, this results in a boundary shift (Figure 3). Based on the popular machine learning paradigm [61], we name our method Support Vector Augmentation.

Our augmentation workflow consists of a segmentation model, a transfer model and a data selection step (Figure 4). First, our method trains the segmentation model on the training data and produces a forgetting event heat map for every validation image in the training volume (Figure 4a). In principle, heat maps could be produced for the entire training set but that would be computationally inefficient.

In the next step of our workflow, we calculate the forgetting event density within each facies class of a heat map (Figure 4b). Specifically, we sum all forgetting events $f_{i}^{t}$ with $i\in c_{k}$ within class $c_{k}$ of a heat map and divide by the number of pixels $N_{c_{k}}$ of that class in the image:

$$U_{c_{k}} = \frac{\sum_{t=0}^{T} \sum_{i \in c_{k}} f_{i}^{t}}{N_{c_{k}}}.$$

This metric allows us to rank each heat map according to its density with regard to any class in the dataset.
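As a minimal sketch, the density $U_{c_{k}}$ and the resulting ranking could be computed as follows. The function names and the choice to assign a density of zero to sections that do not contain the class are assumptions for illustration.

```python
import numpy as np

def forgetting_density(heat_map, annotation, class_id):
    """U_{c_k}: forgetting events inside class c_k divided by the class pixel count."""
    mask = annotation == class_id
    n_pixels = mask.sum()
    if n_pixels == 0:
        return 0.0  # assumption: sections without the class rank last
    return float(heat_map[mask].sum() / n_pixels)

def rank_sections(heat_maps, annotations, class_id, top_k):
    """Return the indices of the top_k densest sections and all densities."""
    densities = [forgetting_density(h, a, class_id)
                 for h, a in zip(heat_maps, annotations)]
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    return order[:top_k], densities

heat_maps = [np.array([[0, 2], [4, 0]]),
             np.array([[1, 1], [0, 3]]),
             np.array([[5, 0], [0, 0]])]
annotations = [np.array([[0, 1], [1, 1]]),
               np.array([[1, 1], [0, 0]]),
               np.array([[0, 0], [0, 0]])]
top, densities = rank_sections(heat_maps, annotations, class_id=1, top_k=2)
```

Normalizing by the class pixel count keeps a large but rarely forgotten class region from outranking a small, frequently forgotten one.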

Finally, we transfer the visual features of a predefined class from the vertical sections with the highest density to randomly sampled training sections (Figure 4c). Our proposed architecture is a slightly altered version of [53]. In short, the model transfers facies characteristics on the batch-normalization activations within the image generator. Our approach enables class specific transfers without affecting the interpretation characteristics (texture, structure etc.) of other classes within the image. In our method, we transfer the underrepresented class facies (e.g. salt domes) due to the learning difficulty of the samples. After generation, the transferred images are added to our training pool and the segmentation model is trained from scratch (Figure 4d). In the remainder of this section, we will discuss our transfer model in more detail as this represents a crucial step in our workflow.

Our transfer model takes three inputs: a source image, a target image and a class list. The source and target images are two seismic sections specified by the user. They represent the facies source and the section to be altered by our algorithm, respectively. The class list contains the facies that our algorithm will transfer to the target image. We show several transfer examples in Figure 5. Here, we transfer the characteristics of the orange scruff class from the source image (Figure 5d) to the target image (Figure 5a) and evaluate the absolute difference between adjacent transfer images using different sources (Figure 5f). For instance, the difference image using source 2 (middle row) represents the absolute difference between the transfer images of the first and second row. As seen in the transfer output, the characteristics of the scruff class are clearly different from the original target image in all example sections. Moreover, we can see that varying the source image results in different scruff facies that depend on the scruff regions of the respective source image. For instance, the scruff facies of the transfer image resulting from source 2 is significantly smoother than in the other two examples due to the smooth scruff characteristics of subsurface source 2. Finally, the difference plots show which regions are affected by our transfer model when different source images are used. Since our class list only consisted of the scruff class, we observe that only this region of the target image is altered by our model. Note that the difference in the first row (column f) shows the difference between source 1 and source 0, an adjacent source image not shown in our examples.

To achieve the results in Figure 5 we employ a GAN architecture [62] consisting of an encoder, a generator as well as a discriminator (Figure 7). The discriminator (Figure 7c) predicts whether the presented images originate from the generator or training distribution and is used to derive the adversarial loss. Since this is a standard step in GAN frameworks [62] our explanations will focus on the other two architecture elements.

We encode subsurface characteristics in two steps (Figure 7a): First, we encode the source image to remove information irrelevant to subsurface characteristics. Second, we use a class-wise average pooling layer to produce subsurface representations (codes). Each code is a vector and represents the characteristics of one class in the input source. For instance, one code will represent the scruff class whereas another code will represent the zechstein class. These codes are used to transfer the facies within the generator. Simply using the target image as the encoder input will result in an image reconstruction in the generator, as the generated codes contain the target image characteristics. However, in our application we want the output to contain the subsurface characteristics of our source image. For this reason, we substitute the subsurface codes with our desired source subsurface codes during inference. This gives us full control over the transfer and allows us to produce examples as seen in Figure 5. In our example, we substitute the subsurface code of the scruff class with the scruff subsurface code from our source image. Note that we leave all other subsurface codes untouched because we want the other classes to share the same characteristics as the target image.
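The two encoding steps and the code substitution can be sketched in NumPy as follows. The encoder itself is omitted and replaced by given feature maps; the function names, the `(C, M, N)` feature layout and the zero code for absent classes are assumptions made for illustration only.

```python
import numpy as np

def class_codes(features, annotation, num_classes):
    """Class-wise average pooling: one code vector per class.

    features:   float array (C, M, N), stand-in for the encoder output.
    annotation: int array (M, N), class label per pixel.
    """
    codes = np.zeros((num_classes, features.shape[0]))
    for k in range(num_classes):
        mask = annotation == k
        if mask.any():
            # Average the feature vectors of all pixels that belong to class k.
            codes[k] = features[:, mask].mean(axis=1)
    return codes

def substitute_codes(target_codes, source_codes, class_list):
    """Replace the codes of the listed classes with the source codes."""
    mixed = target_codes.copy()
    for k in class_list:
        mixed[k] = source_codes[k]
    return mixed

annotation = np.array([[0, 0], [1, 1]])
target_codes = class_codes(np.ones((2, 2, 2)), annotation, num_classes=2)
source_codes = class_codes(np.full((2, 2, 2), 3.0), annotation, num_classes=2)
mixed = substitute_codes(target_codes, source_codes, class_list=[1])
```

Only the row of class 1 changes in `mixed`, mirroring how the scruff code is swapped while all other codes keep the target characteristics.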

In our generator (Figure 7b), we use semantic region-adaptive normalization (SEAN) layers to transfer the codes to their respective class regions [53]. In summary, SEAN modulates the subsurface codes as well as the structural information from the annotation onto the normalized output of the previous layer. In Figure 6 we show a simplified visualization of the modulation process. First, the codes are broadcast to the respective class regions using the structural information contained in the section annotation (characteristics mapping step). For instance, the scruff subsurface code is broadcast to all structural regions containing scruff, the zechstein code is broadcast to the zechstein regions, and so on. The resulting intermediate image is then used to scale and shift the output of a previous normalization layer.
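The broadcast-then-modulate idea can be sketched in a heavily simplified form. This is not the SEAN layer of [53]: the learned convolutions are replaced by the hypothetical linear maps `w_gamma` and `w_beta`, the separate structural branch is omitted, and normalization statistics are taken per channel over a single section.

```python
import numpy as np

def broadcast_codes(codes, annotation):
    """Characteristics mapping: place each class code on every pixel of that class.

    codes (K, C), annotation (M, N) -> style map (C, M, N).
    """
    return codes[annotation].transpose(2, 0, 1)

def modulate(activations, style_map, w_gamma, w_beta, eps=1e-5):
    """Scale and shift normalized activations (D, M, N) per pixel."""
    mean = activations.mean(axis=(1, 2), keepdims=True)
    std = activations.std(axis=(1, 2), keepdims=True)
    normalized = (activations - mean) / (std + eps)
    # Per-pixel scale and shift derived from the broadcast codes.
    gamma = np.einsum("dc,cmn->dmn", w_gamma, style_map)
    beta = np.einsum("dc,cmn->dmn", w_beta, style_map)
    return gamma * normalized + beta

rng = np.random.default_rng(0)
annotation = np.array([[0, 0], [1, 1]])
codes = rng.normal(size=(2, 3))
activations = rng.normal(size=(4, 2, 2))
w_gamma, w_beta = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))

out = modulate(activations, broadcast_codes(codes, annotation), w_gamma, w_beta)
new_codes = codes.copy()
new_codes[1] += 1.0  # change only the code of class 1
out_swapped = modulate(activations, broadcast_codes(new_codes, annotation), w_gamma, w_beta)
```

Because the scale and shift are computed per pixel from the broadcast codes, changing the code of class 1 only alters the output inside the class 1 region.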

We train the architecture by learning a simple image reconstruction problem. The subsurface encoder is trained to distinguish per region subsurface codes and the generator is forced to transfer these codes by region adaptive normalization. In inference, the image and annotation source can be different to produce other subsurface codes. In our model, we feed the target image as well as our source image into the encoder sequentially and hand-pick the desired subsurface codes.

V Empirical Analysis

All experiments in this paper were conducted on the F3 block dataset in the Netherlands [36]. We partition the volume into train and test sets according to the original benchmark paper and show the layout in Figure 8. When showing heat maps, we restrict the examples to the test sections in Figure 9. Here, sections one through four represent the Test 1 crosslines 234, 310, 556, and 622, respectively. Further, sections five and six represent the Test 2 inline sections 575 and 596. Our choice is based on the high presence of underrepresented classes (e.g. the orange scruff class) and complex facies structures. Even though this paper only considers these examples, we note that our observations and conclusions are consistent throughout the entire volume.

Throughout all of our experiments, we opt for a DeepLab-v3 [63] architecture with different backbone architectures. For optimization, we use the Adam variant of stochastic gradient descent with a learning rate of $10^{-4}$ in combination with a polynomial learning rate decay. We structure our experiments in two sections: First, we show the analysis benefits of forgetting event heat maps by displaying when pixels are forgotten during training. We distinguish different groups within forgettable pixels and relate them to the model interpretations. Second, we benchmark the generalization and robustness properties of our augmentation method. We analyze the impact of our method in terms of segmentation performance and heat map impact. As a comparison, we use common augmentation techniques from the literature.
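A polynomial learning rate decay is commonly written as the base rate multiplied by $(1 - \text{step}/\text{max\_steps})^{p}$. The sketch below uses this common form. The exponent `power=0.9` and the step counts are assumptions, since the text does not state them.

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial decay: base_lr * (1 - step / max_steps) ** power."""
    step = min(step, max_steps)  # clamp so the rate never becomes negative
    return base_lr * (1.0 - step / max_steps) ** power

# Learning rate at the start, middle and end of a hypothetical 100-step run.
schedule = [poly_lr(1e-4, s, 100) for s in (0, 50, 100)]
```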

V-A Analyzing Forgotten Regions

In our experiments, we train the segmentation architecture with a ResNet-18 backbone [64] for 60 epochs on the training volume. We track the forgetting events for the validation and test set and display the heat maps for different observation windows of the test set (Figure 10). The first three rows show different time frames in which forgetting events were tracked: In the first row, all forgetting events that occurred during the 60 epochs are displayed. The second and third row show the forgetting events that occurred between the 20th and the 60th epoch as well as between the 50th and 60th epoch, respectively. In the final row, we show the predictions of our model. Overall, we can classify forgettable regions into the following groups:

The first group consists of pixels that are rarely forgotten and that disappear from the heat maps after 20 epochs. We call these samples early-stage forgettable. These regions are learned at an early stage within the training cycle and the network does not have difficulties mapping them to the representation space. In the context of the interpretation task, these pixels are frequently found in areas that are structurally consistent throughout the volume and do not show a significant variety. An example of these areas can be found in the upper north sea group (dark blue class) and the chaotic middle north sea group (blue class) in images one, two and three. The heat maps within the first row clearly show highlighted regions within the upper classes that disappear after epoch 20 and that are correctly classified by the fully trained model.

The second group of pixels is more difficult for our network to characterize and consists of the most frequently forgotten pixels, which we call ambiguous forgettable samples. These areas are frequently shifted between the correct manifold and other class manifolds during the optimization process but are not necessarily misclassified. We interpret these samples to have been in close proximity to the decision boundary for a specific time frame during the optimization process. Examples of these regions are either the class boundaries or difficult textures within underrepresented classes (e.g. the frequently forgotten scruff regions of Section 1 in rows one and two). Note that these regions are predicted either correctly or incorrectly depending on the network initialization and are not directly visible in the network predictions.

The final group entails the most difficult pixels for our model. This group is consistently classified incorrectly throughout the training procedure and is forgotten rarely at the end of training. Due to this characteristic, we call these samples late-stage forgettable samples. In terms of the representation mapping, the network is unable to map these regions into the target manifold throughout training and starts to learn these pixels at a late stage when the model has already learned a large variety of textural and structural features. In our examples, these areas are visible in the third row showing the forgetting events at a late stage. Qualitatively, these areas contain difficult textures or salt dome structures that are not present in the training distribution in that form and hence present the most difficult regions within the test set. This is further confirmed by the false predictions in these regions.
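The observation windows used in this analysis (all 60 epochs, epochs 20 to 60, and epochs 50 to 60) amount to summing forgetting events over different epoch ranges. The sketch below uses synthetic events for three hypothetical pixels, one per forgetting pattern. The zero-based, half-open epoch indexing is an assumption.

```python
import numpy as np

def windowed_heat_map(events, start, stop):
    """Sum forgetting events f[t] for start <= t < stop; events has shape (T, M, N)."""
    return events[start:stop].sum(axis=0)

events = np.zeros((60, 1, 3), dtype=np.int64)
events[5, 0, 0] = 1        # forgotten once, early in training
events[10:58:4, 0, 1] = 1  # forgotten repeatedly throughout training
events[55, 0, 2] = 1       # forgotten once, late in training

full = windowed_heat_map(events, 0, 60)
mid = windowed_heat_map(events, 20, 60)
late = windowed_heat_map(events, 50, 60)
```

The first pixel vanishes from the later windows, the second dominates every window, and the third appears with a low count that persists into the last window, which matches the qualitative signatures of the three groups.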

V-B Subsurface Transfer

In this section, we benchmark our support vector augmentation method in combination with common augmentation techniques. Our qualitative heat map results are obtained by training the segmentation model with a resnet-18 backbone for 60 epochs with and without augmentations. For our segmentation performance comparison, we use a resnet-101 backbone and train our model for 80 epochs on five separate random seeds. Furthermore, we train all of our architectures by randomly cropping 255 pixel patches and test with full sections. We choose this setup to ensure a proper comparison to the original baseline of [36]. For consistency, we choose the validation set by selecting every fifth inline and every fifth crossline of the training volume for all seeds (Figure 8). We note that our numerical results are summarized over every inline and crossline of both Test 1 and Test 2. We query six images with the highest forgetting event density of our target class. Each section is used as a source image to generate 64 transfer images. For generation, we sample randomly to obtain the target image and retrain the segmentation model from scratch. We note that due to the random target image selection our technique is sensitive to the hyperparameter choice (number of sources, targets etc.) and that significant experimentation had to be performed to achieve the results in Table I. However, the investigation of different query methods is beyond the scope of this paper and we leave this topic for future research. In this paper, we report the results when transferring the scruff class (orange). For evaluation, we compare our subsurface transfer technique with common augmentation methods (random horizontal flip and random rotations; [65]) from computer vision in terms of segmentation performance (in class accuracy and mean intersection over union) and impact on forgetting events heat maps.
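For reference, class accuracy and mean intersection over union can both be derived from a confusion matrix. The sketch below follows the standard definitions and assumes that every class occurs in the labels. The exact averaging used by the benchmark of [36] (for example per section or over the full volume) may differ.

```python
import numpy as np

def confusion_matrix(pred, label, num_classes):
    """Rows index the true class, columns the predicted class."""
    idx = label.ravel() * num_classes + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def class_accuracy(cm):
    """Fraction of pixels of each true class that are predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

def mean_iou(cm):
    """Mean over classes of intersection / union."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    return float(np.mean(tp / union))

label = np.array([[0, 0, 1, 1], [1, 1, 2, 2]])
pred = np.array([[0, 1, 1, 1], [1, 1, 2, 0]])
cm = confusion_matrix(pred, label, num_classes=3)
acc = class_accuracy(cm)
miou = mean_iou(cm)
```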

We show the numerical results in Table I. Overall, every method matches or outperforms the baseline in terms of class accuracy. In particular, our method significantly increases the performance of the target class (scruff) as well as the neighboring underrepresented classes (zechstein and chalk) in the majority of cases. Our method affects the accuracy of the other classes (upper, middle, and lower north sea group) only mildly; these classes remain largely untouched by the algorithm. For instance, adding our method on top of random rotate increases the scruff class accuracy by 4.5%, while the upper north sea group accuracy increases by 0.1%, which we consider insignificant. This suggests that our method is spatially localized and affects the classes in direct proximity of the target class scruff.
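For reference, the two reported metrics can be computed from a confusion matrix as in the sketch below. It uses the standard definitions (per-class accuracy as correctly labelled pixels divided by ground-truth pixels of that class, IoU as intersection over union per class). The exact evaluation code behind Table I is not shown in the text, so the function `segmentation_metrics` and the toy labels are assumptions for illustration only.

```python
import numpy as np

def segmentation_metrics(pred, target, n_classes):
    """Per-class accuracy and mean IoU for integer label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows = ground truth, columns = prediction.
    cm = np.bincount(target * n_classes + pred,
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    tp = np.diag(cm).astype(float)
    gt = cm.sum(axis=1)    # ground-truth pixels per class
    pr = cm.sum(axis=0)    # predicted pixels per class
    # Classes absent from both maps yield NaN and are skipped by nanmean.
    with np.errstate(divide="ignore", invalid="ignore"):
        class_acc = tp / gt
        iou = tp / (gt + pr - tp)
    return class_acc, float(np.nanmean(iou))

# Toy example with three classes.
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
class_acc, miou = segmentation_metrics(pred, target, n_classes=3)
# class_acc -> [0.5, 1.0, 0.5], miou -> 0.5
```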

Further, we observe that our method matches the baseline accuracy when it is the only augmentation added to the baseline. Specifically, we observe a maximum difference of 0.6% from the baseline on the target class as well as its neighboring classes. We reason that the data variations introduced by our method are less profound than those of conventional augmentation techniques and are therefore less effective when paired with the baseline alone. While augmentations such as random rotate result in significant structural variation, our method adds slight subsurface variations to a single class and maintains all other components of the seismic image. Therefore, the augmentation alone does not strongly affect the numerical values. However, when combined with other augmentations, its effect is amplified and becomes more pronounced. For instance, we see a clear improvement when using our method in combination with random rotate.

Finally, we note that adding an augmentation can result in minor accuracy reductions for select classes. For instance, adding random rotate to the baseline results in a 1.3% reduction in accuracy on the upper north sea class. While augmentations frequently result in an overall accuracy improvement, several augmentations can have a negative effect on specific class groups or even on the performance of entire sections. In the example of random rotate, the upper north sea class does not share an upper boundary with another class and is therefore “cut off” when rotated. However, we note that adding our method does not result in such behavior and that any reductions are negligible. This affirms that our method introduces realistic data variations for seismic interpretation for every class and therefore matches or improves the baseline performance.

In addition to Table I, we further show the predictions of crossline 60 in Test 2 for different augmentation constellations (Figure 11). We highlight areas of improvement in green. Overall, the predictions further support our numerical analysis of Table I. In particular, we find that adding our method in any constellation typically results in more fine-grained predictions that are less smooth. For instance, the highlighted area when using random rotate and random flip contains significantly smoother scruff predictions than the model trained with our method. We reason that our method introduces style variations into the data that provide more boundary robustness. For this reason, the predictions are more fine-grained.

We further show the forgetting event heat maps of the different augmentations in Figure 12. Qualitatively, our method reduces the number of forgetting events significantly more than traditional augmentation methods, indicating a clear representation shift. Specifically, we find that several regions with a high forgetting event density transition to a low forgetting event density or disappear entirely (bottom scruff class in Section 2, entire left part of Section 6, or center of Section 4). These regions are shifted away from the decision boundary, so their classification accuracy is not significantly affected by model updates. In contrast, forgettable regions do not disappear with standard augmentation techniques. Instead, only the severity of the forgetting event regions is reduced or the texture of the regions is blurred. For instance, random rotation results in blurred edges around the forgettable regions.

Finally, we identify regions that transition from lower forgetting event densities to higher densities (e.g. Section 6 bottom left) when using our augmentation method. These regions become more difficult for the model and serve as examples of negative representation shifts. However, we also note empirically that these occasions are rare and that a reduction of forgettable regions is significantly more common than an increase.
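The comparison above is qualitative. One simple way to quantify how often forgettable regions shrink or grow is to compare two heat maps of the same section pixel by pixel, as in the hedged sketch below. The function `heat_map_shift` and the toy heat maps are hypothetical and are not part of the evaluation in this paper.

```python
import numpy as np

def heat_map_shift(baseline, augmented):
    """Fraction of pixels whose forgetting event count decreased,
    increased, or stayed the same between two heat maps."""
    baseline = np.asarray(baseline).astype(int)
    augmented = np.asarray(augmented).astype(int)
    diff = augmented - baseline
    return {
        "reduced": float((diff < 0).mean()),
        "increased": float((diff > 0).mean()),
        "unchanged": float((diff == 0).mean()),
    }

# Toy 2 x 2 heat maps (forgetting event counts per pixel).
baseline_map = [[3, 0], [2, 1]]
augmented_map = [[0, 0], [1, 2]]
shift = heat_map_shift(baseline_map, augmented_map)
# shift -> {"reduced": 0.5, "increased": 0.25, "unchanged": 0.25}
```

A large "reduced" fraction with a small "increased" fraction would correspond to the observation that reductions of forgettable regions are more common than increases.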

VI Conclusion

In this paper, we presented a novel framework that enhances explainability in deep seismic models. We track the frequency with which pixels are forgotten during training and analyze the relationship to the sample position within the feature space. We highlight forgotten pixels spatially in heat maps and interpret their semantic geologic meaning. Further, we consider different time frames in which samples are forgotten and are able to tie specific prediction properties to model behaviour. Finally, we exploit our framework to engineer an augmentation method that explicitly targets forgotten regions and increases the variety of difficult pixels through subsurface transfer. Our empirical evaluations clearly show the shift in the learned feature space when compared to traditional augmentation methods. We hope that this work will provide interpreters with a powerful concept for verifying model functionality and explaining model behaviour. Furthermore, we have shown that well-crafted methods can target regions prone to forgetting and allow explicit control over the decision boundary. Future work could explore further applications: forgetting events could be applied to other seismic tasks such as rock lithology prediction or salt body delineation.

Bibliography

[1] G. AlRegib, M. Deriche, Z. Long, H. Di, Z. Wang, Y. Alaudah, M. A. Shafiq, and M. Alfarraj, “Subsurface structure analysis using computational interpretation and learning: A visual signal processing perspective,” IEEE Signal Processing Magazine, vol. 35, no. 2, pp. 82–98, March 2018.
[2] S. Chopra and K. J. Marfurt, “Seismic attributes—a historical perspective,” Geophysics, vol. 70, no. 5, pp. 3SO–28SO, 2005.
[3] M. T. Taner, J. S. Schuelke, R. O’Doherty, and E. Baysal, “Seismic attributes revisited,” in SEG Technical Program Expanded Abstracts 1994. Society of Exploration Geophysicists, 1994, pp. 1104–1106.
[4] A. E. Barnes, “The calculation of instantaneous frequency and instantaneous bandwidth,” Geophysics, vol. 57, no. 11, pp. 1520–1524, 1992.
[5] Q. Chen and S. Sidney, “Seismic attribute technology for reservoir forecasting and monitoring,” The Leading Edge, vol. 16, no. 5, pp. 445–448, 1997.
[6] M. A. Shafiq, Z. Wang*, A. Amin, T. Hegazy, M. Deriche, and G. AlRegib, “Detection of salt-dome boundary surfaces in migrated seismic volumes using gradient of textures,” in SEG Technical Program Expanded Abstracts 2015. Society of Exploration Geophysicists, 2015, pp. 1811–1815.
[7] M. A. Shafiq, Y. Alaudah, H. Di, and G. AlRegib, “Salt dome detection within migrated seismic volumes using phase congruency,” in SEG Technical Program Expanded Abstracts 2017. Society of Exploration Geophysicists, 2017, pp. 2360–2365.
[8] M. A. Shafiq, T. Alshawi, Z. Long, and G. AlRegib, “The role of visual saliency in the automation of seismic interpretation,” Geophysical Prospecting, vol. 66, no. S1, pp. 132–143, 2018.